Portable, repeatable, pseudo-random number generation in C++

Problem statement

I wanted to generate pseudo-random numbers in C++, in such a way that the generated numbers are the same every time the program runs (i.e. they are repeatable). I also want the program to be portable i.e. the program works the same across many compilers (and possibly OS-es).

This surprisingly hard to do well in C++.

First you my want to ask: "Why you have such requirements", and this is a valid question. Abseil Authors claim that:

The legitimate use cases for an eternally unchanging pseudorandom sequence are uncommon within Google.

Source: http://abseil.io/docs/cpp/guides/random#guarantees-provided-by-the-abseil-random-library

There are many reasons for you to want repeatable pseudorandom numbers:

  • Monte-Carlo simulation where you want results to be the same every time you run it;
  • Generation of X for a game;
  • Fuzzing (where you want to get repeatable results);

I want to write a random landscape generator (and I want to be able to debug each scene if errors happen).

State of recent C++ standard

The good parts

C++ standard defines an array of pseudo-random number generators, that are well defined repeatable and portable. So whenever I want a random uint32 C++ standard has me covered.

Moreover the C++ standard has defined a reasonable API for pseudo-random number generators, so std::mt19937 and absl::BitGen can be called in the same way.

The bad parts

The problem however is that you usually need more, for my project I needed to be able to:

  • Generate floating-point numbers from uniform distribution in arbitrary ranges;

    E.g. I want to draw double from range 0.003 to 0.012 where each number has the same probability;

  • Generate floating-point numbers from normal distribution;

  • Generate random bools;

All these are non-trivial to do properly, C++ standard helpfully provides types that implement important distributions e.g. std::uniform_real_distribution or std::normal_distributions, but these algorithms are non-portable. There are good reasons for this non-portability (i.e. there are multiple algorithms to generate normal distribution, each with different tradeoffs).

Even if stream of values obtained from std::mt19937 is repeatable and portable, stream of values obtained from std::normal_distributions can differ between compilers.

Solution

Solution is to use a third-party library, however I did not found a library that promises that results will be unchanged between library version changes.

I found two libraries that provide what I needed:

Both do not promise that results numbers drawn from distributions defined there will be repeatable across versions, and both use the same algorithm for normal distribution. So I decided to use Abseil (mainly due to the fact that abseil has supported Cmake and boost has only experimental support for Cmake).

Worst case I will freeze version of abseil that I use (or just freeze version of absl::random).

How to make sure random numbers you use are repeatable

You need to do extensive testing.

Currently for my (hobby!) project I test the following:

  1. For each wrapper for generating random numbers, I test that distributions coming from this wrapper are correct (I test mean and standard deviation).

  2. For each wrapper I check that generated random numbers are the same as "known good" values;

  3. I have end-to-end tests where whole process is tested against golden images.

    End-to-end tests are super-important, as I wouldn't have caught some bugs otherwise.

Bonus

For my project sometimes I needed to generate Pairs of numbers, so I had a class that takes two random variables, and wraps them in a pair, it looked like that:

// Base class for reference
template <typename Output>
class RandomVariable {
 public:
  virtual ~RandomVariable() = default;
  virtual Output Generate(RandomGenerator &random) const = 0;


  Output operator()(RandomGenerator &random) const { return Generate(random); }
};


template <typename RealScalar>
class RandomPairImpl : public RandomPair<RealScalar> {
 public:
  explicit RandomPairImpl(std::shared_ptr<RandomVariable<RealScalar>> left,
                          std::shared_ptr<RandomVariable<RealScalar>> right)
      : left_(std::move(left)), right_(std::move(right)) {}
  ~RandomPairImpl() override = default;

  std::tuple<RealScalar, RealScalar> Generate(
      RandomGenerator &random) const override {
    return std::make_tuple(left_->Generate(random), right_->Generate(random));
  }

protected:
  std::shared_ptr<RandomVariable<RealScalar>> left_;
  std::shared_ptr<RandomVariable<RealScalar>> right_;
};

Can you spot the bug that makes the code non-portable?

The bug is as follows:

  • Calls to RandomVariable::Generate modify the random argument (passed as a reference). When you draw random number from a generator you need to modify it's state (as you expect next call to the random generator produce different random numbers).
  • This line: std::make_tuple(left_->Generate(random), right_->Generate(random)) modifies random two times, but the order of evaluation function parameters is undefined! And this was enough so my program returned different results on gcc and clang.

Solution was to replace the function with:

auto left = left_->Generate(random);
auto right = right_->Generate(random);
return std::make_tuple(left, right);