C++ compile-time string hashing with Boost.MPL

April 18, 2009 by Tor Brede Vekterli

Being able to automatically compute a string hash at compile time can be handy in several situations, such as using them in a switch-case statement, where only integer cases are allowed. The problem with using strings in C++ metaprogramming is that you cannot pass a string literal as a template argument, so hashing solutions have often been done with macro preprocessor magic instead (still with the end result being more of an unrolled hash computation loop rather than just the resulting hash integer itself). It's of course possible to pass argument sequences of single chars to templates, and subsequently calculate a hash using standard template recursion. A lot of single chars as template args do not really make for very pretty reading, however.

Recently, long-time Boost-contributor Eric Niebler has created a string type for the Boost MPL, which can currently be found in the Boost SVN trunk (or, if you're reading this from the glorious world of the future, in the regular Boost distribution. Remember to also pet your robot dog and thank your cyborg butler). The string literal restriction is still very much in place, but mpl::string uses the fairly unknown—and arguably somewhat implementation-defined—existence of multi-character literals to make things more manageable.

I'm going to show how using the MPL allows for quickly and elegantly creating a generalized hashing metafunction that uses the same hashing algorithm as Boost.Functional/Hash and can as a result be compared against runtime values computed by its functions. Please note that this code is something I threw together pretty quickly and doesn't have any accompanied tests, library packaging or anything of that nature. It has only been tested on Visual C++ 2008.

If you want to experiment with mpl::string on Boost 1.38 or below, and you're not synched up against the trunk branch, get char.hpp, char_fwd.hpp, string.hpp and limits/string.hpp from the mpl directory of the SVN trunk and copy them to your mpl directory in the boost include directory.

Creating a sequence hashing metafunction

If we look at the implementation of Boost.Functional/Hash, we can see that it performs hashing by maintaining a running "seed" value that is combined with the hash values of the elements to hash, yielding a new seed value that is again combined etc, until the last element has been processed, at which point the seed is the result value. We can easily convert this into a metafunction that accepts a seed and value (both MPL integral wrappers) and "returns" a new seed integral value as its type. This will compute the seed for a single integer.

As an mpl::string is a valid MPL forward sequence, to compute the hash of any arbitrary integer type sequence, we fold the sequence with a starting seed of 0 and using the combinator metafunction as the binary, forward operation. mpl::_1 and mpl::_2 are MPL placeholder expressions for the current seed and sequence element being folded, respectively. For each sequence element, the fold metafunction is applied and its result is used as the seed for the next application and so on. When the sequence has ended, the seed of the last application is our fold result value.

The code 

#include <boost/mpl/string.hpp>
#include <boost/mpl/fold.hpp>
#include <boost/mpl/size_t.hpp>

namespace meta
{
#pragma warning(push)
// disable addition overflow warning
#pragma warning(disable:4307)

template <typename Seed, typename Value>
struct hash_combine
{
  typedef mpl::size_t<
    Seed::value ^ (static_cast<std::size_t>(Value::value)
      + 0x9e3779b9 + (Seed::value << 6) + (Seed::value >> 2))
  > type;
};

#pragma warning(pop)

// Hash any sequence of integral wrapper types
template <typename Sequence>
struct hash_sequence
  : mpl::fold<
        Sequence
      , mpl::size_t<0>
      , hash_combine<mpl::_1, mpl::_2>
    >::type
{};

// For hashing std::strings et al that don't include the zero-terminator
template <typename String>
struct hash_string
  : hash_sequence<String>
{};

// Hash including terminating zero for char arrays
template <typename String>
struct hash_cstring
  : hash_combine<
        hash_sequence<String>
      , mpl::size_t<0>
    >::type
{};

} // namespace meta

Note that the string hash metafunction comes in two flavors: hash_string for std::strings and hash_cstring for character arrays (not pointers). This is because a character array is treated like any other array, and its sequence of hashed elements includes its terminating zero character. This is not the case with std::strings, and the two metafunction variants compensate for this.

We can now use the metafunction in places where integers are required, and if you look at the resulting machine code, they will have been replaced verbatim.

switch (boost::hash_value("bunnies"))
{
case meta::hash_cstring< mpl::string<'bunn','ies'> >::value:
  std::cout << "bunnies!\n";
  break;
case meta::hash_cstring< mpl::string<'bear','s'> >::value:
  std::cout << "bears!\n";
  break;
default:
  std::cout << "no hash matched\n";
}

As you can see, the mpl strings have to be separated into groups of a maximum of 4 characters, but still much more elegant than having to do it character by character. It may be noted that the introduction of user-defined literals in C++0x combined with constexpr functions should most likely make all this obsolete.

All code covered by the Boost 1.0 license.

If you found this article worth reading, you might also be interested in reading the C++ compile-time string parser prototype article.

Post a comment