Some random C++ gotchas

November 14, 2008 by Tor Brede Vekterli

C++ is an extremely powerful programming language, but due to its flexibility and many features, it can take a long time to learn how to use it correctly. I'm going to try to write up some posts every now and then about aspects of the language that come to my mind that might not be immediately obvious to a lot of people, and if I'm lucky someone will actually learn something new from it :)

First up are some semantics of the standard delete-operator, how to properly treat initializer lists and a few words about return value optimizations and why C++0x will be awesome for this.

Update: for those reaching this post from searching Google for "random c++" et al, I assume you're looking for information on the generation of random numbers and not just random C++ nuggets :). For all your random number needs, please refer to the Boost.Random library, which comes packed with a veritable heap of algorithms, such as the famous Mersenne Twister.

delete does not have to be NULL-checked

The following is a common sight in destructors and cleanup code:

if (some_ptr)
  delete some_ptr;

Makes sense, right? Common C/C++ courtesy indicates that we wouldn't want to try to invoke an operator on a NULL pointer—that'd cast us into the pits of pagefault hellfire! Except that's not the case at all.

Let's turn to the C++ standard, §3.7.4.2 (basic.stc.dynamic.deallocation), paragraph 3:

(…) The value of the first argument supplied to a deallocation function may be a null pointer value; if so, and if the deallocation function is one supplied in the standard library, the call has no effect. (…)

Emphasis mine. Note that the standard implies that this does not necessarily apply for custom deallocators, but in the vast majority of cases it should hold true.
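To make this concrete, here is a minimal sketch of cleanup code without the NULL check. The names widget and widget_holder are made up for illustration, and the sketch assumes the default deallocation function from the standard library is in use:

```cpp
#include <cstddef>

// Hypothetical resource type, used only for illustration.
struct widget
{
  int value;
};

// Hypothetical owner that cleans up without a NULL check.
class widget_holder
{
  widget* ptr_;

  // Copying is disabled to keep the sketch short and avoid double deletes.
  widget_holder(const widget_holder&);
  widget_holder& operator=(const widget_holder&);

public:
  widget_holder() : ptr_(NULL) {}

  // Takes ownership of p and releases whatever was held before.
  void reset(widget* p)
  {
    delete ptr_; // No check needed: deleting a null pointer does nothing
    ptr_ = p;
  }

  bool empty() const { return ptr_ == NULL; }

  ~widget_holder()
  {
    delete ptr_; // Also fine when ptr_ is still null
  }
};
```

Calling reset(NULL) on an empty holder, or destroying a holder that never owned anything, should be perfectly harmless here.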

Of course, you should be using a more or less smart pointer rather than rolling your own memory management for most cases anyway!

Initializer-lists do not specify order of initialization

Now this is a potentially nasty one! When providing an initializer list in a constructor, the order of the list elements is ignored. Members are instead initialized in the order in which they are declared in the class. Not being aware of this can lead to some subtle yet dangerous bugs when one member variable depends on the initialization of another. Consider the following delicious example:

class pizza
{
  pizza_base base_;
  topping topping_;

public:
  pizza(const topping& t)
    : topping_(t), base_(topping_)
  {}
};

The pizza_base takes in a reference to the topping that we store in our pizza object. But oh dear! The pizza_base object is actually initialized before the topping since it's declared before it, so the pizza_base constructor is handed a reference to a topping that has not been constructed yet! Pizza disaster! The chef blew up with the kitchen!

As you can probably imagine, this is how a proper pizza is made:

class pizza
{
  topping topping_; // Proper ordering
  pizza_base base_;

public:
  pizza(const topping& t)
    : topping_(t), base_(topping_)
  {}
};

Bon appétit!
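If you'd like to see the ordering rule for yourself, the following sketch records the order in which members are actually constructed. Everything in it (construction_log, logged, declared_first_wins) is a hypothetical name invented for this demonstration:

```cpp
#include <string>
#include <vector>

// Records the order in which members are constructed (illustrative helper).
std::vector<std::string> construction_log;

// Hypothetical member type that logs its own construction.
struct logged
{
  explicit logged(const char* name) { construction_log.push_back(name); }
};

class declared_first_wins
{
  logged first_;  // Declared first, so constructed first
  logged second_; // Declared second, so constructed second

public:
  // The initializer list names second_ first, but that changes nothing.
  declared_first_wins()
    : second_("second"), first_("first")
  {}
};
```

After constructing a declared_first_wins object, the log holds "first" followed by "second", regardless of how the initializer list is written. GCC can warn about this kind of mismatch through -Wreorder, which is part of -Wall, so it is worth keeping your warnings switched on.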

Return value optimizations (or not)

Normally when you're writing C++ and using current (i.e. non-0x) compilers, you don't necessarily have to pay for the costs of copy-construction when returning an object by value from a function (such as std::vector<mybigobject_t> stuff()), as the compiler might be able to perform return value optimizations (RVO), allowing the returned object to be constructed directly in the caller, completely bypassing the need for a copy. This has its limitations, however, as the compiler is sometimes not capable of deducing when or how an optimization may safely be applied, causing it to fall back on copying.

The most common form of RVO is the unnamed return value optimization, which essentially takes the form return type(construction-args);, that is, the to-be-returned object is specified directly in the return-statement (and is as such unnamed, as it is not associated with any variable). Example:

std::string rvo_polka()
{
  return std::string("Three cheers for RVO");
}

...

// Returned string will be constructed directly into foo
std::string foo = rvo_polka();

A second, less supported form of RVO is the named return value optimization (NRVO). As you can probably deduce from the earlier description of its unnamed equivalent, this is when the to-be-returned value is stored in a variable before the actual return. Example:

std::string nrvo_waltz()
{
  std::string optimize_me("Four cheers for NRVO");
  return optimize_me;
}

...

std::string foo = nrvo_waltz();

You can find support for this in compilers like Visual C++ 2005 and beyond, but there exist several limitations as to when the optimization can be applied. Although regular RVO is easy to identify, NRVO can be more tricky. Either way, both imply that the actual behavior of your code can change between debug and release builds (even more so than usual), which can cause problems if your (copy-)constructors have side effects.
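One way to find out what your compiler actually does is to count the copies. The sketch below uses a made-up type, counted, whose copy constructor has exactly the kind of side effect mentioned above, plus a few hypothetical helper functions. The numbers you get depend on your compiler and its settings, so treat this as an experiment rather than a guarantee:

```cpp
// Hypothetical type that counts how often it is copied.
struct counted
{
  static int copies;

  counted() {}
  counted(const counted&) { ++copies; } // Side effect: visible when not elided
};

int counted::copies = 0;

// Unnamed form: the object is created in the return statement itself.
counted make_unnamed()
{
  return counted();
}

// Named form: the object lives in a local variable first.
counted make_named()
{
  counted result;
  return result;
}

// Number of copies made by one call to make_unnamed plus initializing c.
int copies_for_unnamed()
{
  counted::copies = 0;
  counted c = make_unnamed();
  (void)c; // Silence unused-variable warnings
  return counted::copies;
}

// Number of copies made by one call to make_named plus initializing c.
int copies_for_named()
{
  counted::copies = 0;
  counted c = make_named();
  (void)c;
  return counted::copies;
}
```

When the optimization kicks in, both functions should report zero copies. Without it, each can report up to two: one into the return value and one into c. With GCC you can compare the two worlds by compiling with and without -fno-elide-constructors.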

Sidenote: There are of course alternatives to the return-by-copy debacle, such as passing a non-const reference to the function that is modified rather than there being a return value, or returning an auto_ptr with an object allocated on the heap. These are often sub-optimal, however (at least for the sake of this argument). Moving the returned value to a reference parameter just for performance reasons is not really semantically elegant (since the function is not "returning" anything after all), and has that faint scent of premature optimization. Returning an auto_ptr (or similar) incurs a heap-allocation penalty and the need to handle pointers rather than just values in the caller (unless you want to return a pointer, of course, in which case you probably should be returning an auto_ptr!).
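For comparison, here is roughly what the reference-parameter alternative looks like next to plain return-by-value. The functions make_squares and fill_squares are hypothetical and exist only to show the difference in shape:

```cpp
#include <vector>

// Return-by-value: reads naturally, but relies on RVO to avoid a copy.
std::vector<int> make_squares(int count)
{
  std::vector<int> result;
  for (int i = 0; i < count; ++i)
    result.push_back(i * i);
  return result;
}

// Out-parameter: no returned object to copy, but the caller has to
// create the vector up front and the function no longer "returns" anything.
void fill_squares(int count, std::vector<int>& out)
{
  out.clear();
  for (int i = 0; i < count; ++i)
    out.push_back(i * i);
}
```

Both produce the same contents. The second one just makes the caller do more work and hides what the function's result really is.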

Clearly, what is needed is a more generalized solution!

Move semantics to the rescue

The major language upgrade that is C++0x brings along rvalue references. Amongst other things, these allow us to identify and modify temporary objects, which in the end is what return value objects often are. For a good description of how this works, see this description of rvalues, move semantics and perfect forwarding. Suddenly returning a non-RVO'd 1,000,000 element vector becomes a mere matter of swapping a pointer and a size variable! This would also maintain any strong exception safety guarantees. How delightfully convenient!
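To give an idea of what "swapping a pointer and a size variable" means in practice, here is a minimal sketch of a move constructor. The buffer class and make_buffer function are hypothetical, heavily simplified stand-ins for something like std::vector, and the sketch uses the rvalue reference syntax as it was eventually standardized in C++11:

```cpp
#include <cstddef>
#include <utility>

// Hypothetical buffer class; a simplified stand-in for a real container.
class buffer
{
  int* data_;
  std::size_t size_;

public:
  explicit buffer(std::size_t size)
    : data_(new int[size]()), size_(size)
  {}

  // Copy constructor: allocates and copies every single element.
  buffer(const buffer& other)
    : data_(new int[other.size_]), size_(other.size_)
  {
    for (std::size_t i = 0; i < size_; ++i)
      data_[i] = other.data_[i];
  }

  // Move constructor: takes over the pointer and size, leaving the
  // source empty. No allocation, no element-wise copying.
  buffer(buffer&& other) noexcept
    : data_(other.data_), size_(other.size_)
  {
    other.data_ = nullptr;
    other.size_ = 0;
  }

  // Assignment is left out to keep the sketch short.
  buffer& operator=(const buffer&) = delete;

  ~buffer() { delete[] data_; }

  const int* data() const { return data_; }
  std::size_t size() const { return size_; }
};

// Returning a local by value: either elided entirely or moved,
// so the deep copy is never needed here.
buffer make_buffer(std::size_t size)
{
  buffer result(size);
  return result;
}
```

Moving from a buffer with std::move hands its storage to the new object: the destination ends up with the very same data pointer, and the source is left empty but still safe to destroy.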
