Specifically, I've got a method picks n items from a list in such a way that a% of them meet one criterion, and b% meet a second, and so on. A simplified example would be to pick 5 items where 50% have a given property with the value 'true', and 50% 'false'; 50% of the time the method would return 2 true/3 false, and the other 50%, 3 true/2 false.
Statistically speaking, this means that over 100 runs, I should get about 250 true/250 false, but because of the randomness, 240/260 is entirely possible.
What's the best way to unit test this? I'm assuming that even though technically 300/200 is possible, it should probably fail the test if this happens. Is there a generally accepted tolerance for cases like this, and if so, how do you determine what that is?
Edit: In the code I'm working on, I don't have the luxury of using a pseudo-random number generator, or a mechanism of forcing it to balance out over time, as the lists that are picked out are generated on different machines. I need to be able to demonstrate that over time, the average number of items matching each criterion will tend to the required percentage.