What's the best way to unit test code that generates random output?

Question

Specifically, I've got a method that picks n items from a list in such a way that a% of them meet one criterion, b% meet a second, and so on. A simplified example would be to pick 5 items where 50% have a given property with the value 'true', and 50% 'false'; 50% of the time the method would return 2 true/3 false, and the other 50%, 3 true/2 false.

Statistically speaking, this means that over 100 runs, I should get about 250 true/250 false, but because of the randomness, 240/260 is entirely possible.

What's the best way to unit test this? I'm assuming that even though technically 300/200 is possible, it should probably fail the test if this happens. Is there a generally accepted tolerance for cases like this, and if so, how do you determine what that is?

Edit: In the code I'm working on, I don't have the luxury of using a pseudo-random number generator, or a mechanism of forcing it to balance out over time, as the lists that are picked out are generated on different machines. I need to be able to demonstrate that over time, the average number of items matching each criterion will tend to the required percentage.

In the example, do you want at least/exactly 50%, or approximately 50%? To be more specific, what is the randomness in this test? — Gishu, Jun 18, 2010 at 9:41
I suspect that you actually are not writing unit tests. If you are testing servers running on different machines, you are actually in system test. I just can tell you: if you have unit tests for the classes running on these servers, you don't need so much (detailed) system testing anymore. — Stefan Steinegger, Jun 18, 2010 at 10:13
No, the point is to confirm that the algorithm does pick 2/3 50% of the time, and 3/2 the other 50% (in this simple example), regardless of where it's used. It's not one system that's going to be producing this; Running it 1000 times on one machine is considered an adequate test to demonstrate that if it's run 10 times on 100 machines round the world, it'll average out appropriately, hence the reason I plan on unit testing it. — Flynn1179, Jun 18, 2010 at 11:51
Possible duplicate of Unit testing - how do I test a function that returns random output? — Raedwald, Jan 23, 2016 at 15:54

Stefan Steinegger · Accepted Answer · 2010-06-18 09:46:34Z

21

Randomness and statistics are not favored in unit tests. Unit tests should always return the same result. Always. Not mostly.

What you could do is try to separate the random generator from the logic you are testing. Then you can mock the random generator and have it return predefined values.
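As a rough illustration of that idea in Python, the sketch below passes the random source in as a parameter and replaces it in the test with a stub that returns predefined values. The names `split_counts` and `FixedRandom` are hypothetical, and the rounding rule is only an assumption about how the 2/3 versus 3/2 split from the question might be decided.

```python
import math
import random


def split_counts(n, true_share, rng=random):
    """Return (n_true, n_false) for n picks (hypothetical helper).

    The fractional part of n * true_share is used as the probability of
    rounding up, so the long-run average number of 'true' picks is
    n * true_share.
    """
    exact = n * true_share
    low = math.floor(exact)
    n_true = low + (1 if rng.random() < exact - low else 0)
    return n_true, n - n_true


class FixedRandom:
    """Test stub: hands out predefined values instead of random ones."""

    def __init__(self, values):
        self._values = iter(values)

    def random(self):
        return next(self._values)


# With the stub, each outcome is fully determined by the supplied value.
assert split_counts(5, 0.5, FixedRandom([0.0])) == (3, 2)
assert split_counts(5, 0.5, FixedRandom([0.9])) == (2, 3)
```

Production code would call `split_counts(5, 0.5)` and get the standard `random` module by default, while tests supply the stub.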

Additional thoughts:

You could consider changing the implementation to make it more testable. Try to use as few random values as possible. You could, for instance, draw only one random value that determines the deviation from the average distribution. This would be easy to test: if the random value is zero, you should get exactly the distribution you expect on average; if the value is, for instance, 1.0, you miss the average by some defined factor, for instance 10%. You could also implement some Gaussian distribution, etc. I know this is not the topic here, but if you are free to implement it as you want, consider testability.

edited Jun 18, 2010 at 9:46

answered Jun 18, 2010 at 9:35

Stefan Steinegger


You wouldn't consider 'value must be < x' as a valid unit test? Anyway, as I clarified in an edit, it'll be different machines generating these lists, which makes any kind of forced distribution impossible :(
– Flynn1179
Jun 18, 2010 at 10:05
Randomness is perfectly permissible, unless it causes different runs to exercise different code paths. That makes tracing a failing test hellish and defeats the purpose of having the tests.
– Stefan Valianu
Jun 18, 2010 at 19:18
Yeah, I think I'm going to go with mocking the RNG to provide pre-determined values; my actual problem's a LOT more complex than my example, but I should be able to derive the expected result from a given list of 'random' numbers to test against.
– Flynn1179
Jun 24, 2010 at 23:15


Chathuranga Chandrasekara · Accepted Answer · 2010-06-18 09:31:47Z

4

Based on the statistical information you have, assert that the result falls within a range instead of matching one particular value.

answered Jun 18, 2010 at 9:31

Chathuranga Chandrasekara


3

As long as you have any randomness in the test, it could still produce a false positive or a false negative.
– Stefan Steinegger
Jun 18, 2010 at 9:35


polygenelubricants · Accepted Answer · 2010-06-18 09:47:12Z

Many probabilistic algorithms in e.g. scientific computing use pseudo-random number generators, instead of a true random number generator. Even though they're not truly random, a carefully chosen pseudo-random number generator will do the job just fine.

One advantage of a pseudo-random number generator is that the sequence it produces is fully reproducible. Since the algorithm is deterministic, the same seed always generates the same sequence. This is often the deciding factor in choosing one in the first place, because experiments need to be repeatable and results reproducible.

This concept is also applicable for testing. Components can be designed such that you can plug in any source of random numbers. For testing, you can then use generators that are consistently seeded. The result would then be repeatable, which is suitable for testing.

Note that if in fact a true random number is needed, you can still test it this way, as long as the component features a pluggable source of random numbers. You can re-plug in the same sequence (which may be truly random if need be) to the same component for testing.
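A minimal Python sketch of the pluggable-source idea, assuming the component accepts its generator as a parameter. `pick_indices` is a hypothetical stand-in for the code under test; `random.Random(seed)` is the standard library's seedable generator.

```python
import random


def pick_indices(population_size, n, rng):
    """Hypothetical component: choose n distinct indices using the given source."""
    return rng.sample(range(population_size), n)


# Two generators seeded identically produce the same sequence, so the
# component's output is repeatable from one test run to the next.
first = pick_indices(100, 5, random.Random(1234))
second = pick_indices(100, 5, random.Random(1234))
assert first == second
```

The sketch compares two runs with each other instead of hard-coding the expected indices, since the exact values produced for a given seed can depend on the Python version.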

walkytalky · Accepted Answer · 2010-06-18 13:47:55Z

It seems to me there are at least three distinct things you want to test here:

1. The correctness of the procedure that generates an output using the random source
2. That the distribution of the random source is what you expect
3. That the distribution of the output is what you expect

1 should be deterministic and you can unit test it by supplying a chosen set of known "random" values and inputs and checking that it produces the known correct outputs. This would be easiest if you structure the code so that the random source is passed as an argument rather than embedded in the code.

2 and 3 cannot be tested absolutely. You can test to some chosen confidence level, but you must be prepared for such tests to fail in some fraction of cases. Probably the thing you really want to look out for is test 3 failing much more often than test 2, since that would suggest that your algorithm is wrong.

The tests to apply will depend on the expected distribution. For 2 you most likely expect the random source to be uniformly distributed. There are various tests for this, depending on how involved you want to be, see for example Tests for pseudo-random number generators on this page.

The expected distribution for 3 will depend very much on exactly what you're producing. The simple 50-50 case in the question is exactly equivalent to testing for a fair coin, but obviously other cases will be more complicated. If you can work out what the distribution should be, a chi-square test against it may help.
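For the two-outcome case, a chi-square goodness-of-fit check can be sketched in plain Python as below. The function names are hypothetical, and 3.841 is the usual 5% critical value for one degree of freedom; a stricter or looser level would need a different constant.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 5% critical value for 1 degree of freedom (two categories).
CRITICAL_5_PERCENT_DF1 = 3.841


def consistent_with_fair_split(true_count, false_count):
    """True if the counts do not contradict a 50/50 split at the 5% level."""
    total = true_count + false_count
    expected = [total / 2, total / 2]
    statistic = chi_square_statistic([true_count, false_count], expected)
    return statistic < CRITICAL_5_PERCENT_DF1


assert consistent_with_fair_split(240, 260)      # statistic is 0.8
assert not consistent_with_fair_split(300, 200)  # statistic is 20.0
```

At this level, a correct implementation would still be rejected in roughly 5% of runs, which is the trade-off described above.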

Kilian Foth · Accepted Answer · 2010-06-18 09:35:34Z

That depends on the use you make of your test suite. If you run it every few seconds because you embrace test-driven development and aggressive refactoring, then it is very important that it doesn't fail spuriously, because this causes major disruption and lowers productivity, so you should choose a threshold that is practically impossible to reach for a well-behaved implementation. If you run your tests once a night and have some time to investigate failures you can be much stricter.

Under no circumstances should you deploy something that will lead to frequent uninvestigated failures - this defeats the entire purpose of having a test suite, and dramatically reduces its value to the team.

Péter Török · Accepted Answer · 2010-06-18 09:39:11Z

You should test the distribution of results in a "single" unit test, i.e. that the result is as close to the desired distribution as possible in any individual run. For your example, 2 true / 3 false is OK, 4 true / 1 false is not OK as a result.
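Such a per-run check might look like the following Python sketch. `pick_half_true` is a hypothetical stand-in for the real method, written here only so the test has something to run against. The assertion holds on every single run, so the test cannot fail by chance.

```python
import random


def pick_half_true(items, n, rng):
    """Hypothetical picker: about half of the n picked flags are True."""
    trues = [item for item in items if item]
    falses = [item for item in items if not item]
    # For odd n, one random draw decides which side gets the extra item.
    n_true = n // 2 + (1 if n % 2 and rng.random() < 0.5 else 0)
    return rng.sample(trues, n_true) + rng.sample(falses, n - n_true)


def test_every_run_is_within_one_item_of_target():
    items = [True] * 20 + [False] * 20
    rng = random.Random(0)
    for _ in range(1000):
        picked = pick_half_true(items, 5, rng)
        assert len(picked) == 5
        # 2 true / 3 false or 3 true / 2 false are the only valid outcomes.
        assert sum(picked) in (2, 3)


test_every_run_is_within_one_item_of_target()
```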

You could also write tests which execute the method e.g. 100 times and check that the average of the distributions is "close enough" to the desired rate. This is a borderline case: running bigger batches may take a significant amount of time, so you might want to run these tests separately from your "regular" unit tests. Also, as Stefan Steinegger points out, such a test is going to fail every now and then if you define "close enough" too strictly, or become meaningless if you define the threshold too loosely. So it is a tricky case...

AndersK · Accepted Answer · 2010-06-18 10:21:34Z

1

I think if I had the same problem I would probably construct a confidence interval to detect anomalies, provided you have some statistics such as the mean and standard deviation. So in your case, if the expected value is 250, create a 95% confidence interval around it using a normal distribution. If the results fall outside that interval, the test fails.
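For the numbers in the question (500 items, each 'true' with probability 0.5), the interval could be computed along these lines. This is a sketch that assumes the count is approximately normal; `normal_interval` is a hypothetical helper, and 1.96 is the standard two-sided 95% multiplier.

```python
import math


def normal_interval(n, p, z=1.96):
    """Approximate interval for the number of successes in n trials."""
    mean = n * p
    std_dev = math.sqrt(n * p * (1 - p))
    return mean - z * std_dev, mean + z * std_dev


low, high = normal_interval(500, 0.5)  # roughly 228.1 to 271.9

assert low <= 240 <= high        # 240/260 is within tolerance
assert not low <= 300 <= high    # 300/200 is rejected
```

With a 95% interval, a correct implementation still fails about one run in twenty. Passing a larger `z` widens the interval and makes spurious failures rarer.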

answered Jun 18, 2010 at 10:21

AndersK


Fanatic23 · Accepted Answer · 2010-06-18 10:54:43Z

0

Why not refactor the random number generation code out so that both the unit test framework and the production code can use it? You are trying to test your algorithm, not the randomized sequence, right?

answered Jun 18, 2010 at 10:54

Fanatic23


Grembo · Accepted Answer · 2010-06-18 19:15:24Z

First you have to know what distribution should result from your random number generation process. In your case you are generating a result which is either 0 or 1, each with probability 0.5. The number of 1s over n such results follows a binomial distribution with p=0.5.

Given the sample size of n, you can construct (as an earlier poster suggested) a confidence interval around the mean. You can also make various statements about the probability of getting, for instance, 240 or less of either outcome when n=500.

You could use a normal distribution assumption for values of N greater than 20 as long as p is not very large or very small. The Wikipedia post has more on this.
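The exact binomial probability mentioned above can also be computed directly, without the normal approximation. The following Python sketch uses `math.comb` from the standard library; `binomial_cdf` is a hypothetical helper, and it assumes independent outcomes. Plain floats are adequate for n=500 but may overflow or underflow for much larger n.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Probability of seeing 240 or fewer 'true' results out of 500 fair draws.
prob_240_or_fewer = binomial_cdf(240, 500, 0.5)

# Probability of seeing 200 or fewer, i.e. a 200/300 split or worse.
prob_200_or_fewer = binomial_cdf(200, 500, 0.5)

assert prob_200_or_fewer < prob_240_or_fewer < 0.5
```

A tolerance can then be chosen by picking the k whose tail probability is as small as the acceptable rate of spurious failures.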
