Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.
This is equivalent to Adam Rosenfield's solution, but may be a bit more clear for some readers. It assumes rand5() is a function that returns a statistically random integer in the range 1 through 5 inclusive.
int rand7()
{
    int vals[5][5] = {
        { 1, 2, 3, 4, 5 },
        { 6, 7, 1, 2, 3 },
        { 4, 5, 6, 7, 1 },
        { 2, 3, 4, 5, 6 },
        { 7, 0, 0, 0, 0 }
    };
    int result = 0;
    while (result == 0)
    {
        int i = rand5();
        int j = rand5();
        result = vals[i-1][j-1];
    }
    return result;
}
How does it work? Think of it like this: imagine printing out this double-dimension array on paper, tacking it up to a dart board and randomly throwing darts at it. If you hit a non-zero value, it's a statistically random value between 1 and 7, since there are an equal number of non-zero values to choose from. If you hit a zero, just keep throwing the dart until you hit a non-zero. That's what this code is doing: the i and j indexes randomly select a location on the dart board, and if we don't get a good result, we keep throwing darts.
Like Adam said, this can run forever in the worst case, but statistically the worst case never happens. :)
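For anyone who wants to check the dart board argument by counting rather than by intuition, here is a minimal Python sketch. It only assumes that rand5() is uniform, so that each of the 25 cells is equally likely; the names VALS, cell_counts and conditional are mine, not part of the answer above.

```python
from collections import Counter
from fractions import Fraction

# Same 5x5 lookup table as in the C code above; 0 marks a "throw again" cell.
VALS = [
    [1, 2, 3, 4, 5],
    [6, 7, 1, 2, 3],
    [4, 5, 6, 7, 1],
    [2, 3, 4, 5, 6],
    [7, 0, 0, 0, 0],
]

# Each (i, j) pair is equally likely when rand5() is uniform, so counting
# cells is the same as computing probabilities (each cell has weight 1/25).
cell_counts = Counter(v for row in VALS for v in row)

# Probability of each output, conditioned on the throw being accepted.
accepted = sum(n for v, n in cell_counts.items() if v != 0)
conditional = {v: Fraction(n, accepted) for v, n in cell_counts.items() if v != 0}

print(cell_counts)
print(conditional)
```

Every value from 1 to 7 occupies the same number of cells, so after discarding the zero cells each value has conditional probability exactly 1/7.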
-
I understand the logic behind this solution, but I can't see how it results in a uniform probability. Can someone explain the math? Nov 15, 2012 at 8:37
-
@user1071840 - if rand5 is uniform, every cell in the vals grid has an equal probability of being picked. The grid contains exactly three copies of each integer in the interval [1, 7], plus four zeroes. So the "raw" stream of results tends to an even mixture of [1, 7] values, plus some zeroes that occur a tad more frequently than any individual allowed value. But that doesn't matter because the zeroes are stripped out, leaving just an even mixture of [1, 7] values. Nov 22, 2012 at 15:40
-
The shortcut way to realising the problem with that: if you're only calling rand5() once, then you only have 5 possible outcomes. There is obviously no way to turn that into more than 5 possible outcomes without adding more randomness. Dec 7, 2012 at 10:43
-
The longer version: rand5() can only have the values (1, 2, 3, 4, 5). Therefore rand5() * 5 can only have the values (5, 10, 15, 20, 25), which is not the same as a complete range (1...25). If it did, subtracting 4 would make it (-3...21), but in this case it becomes (1, 6, 11, 16, 21), so the end points are correct but there are four big holes: (2..5), (7..10), (12..15), (17..20). Finally you do mod 7 and add 1, giving (2, 7, 5, 3, 1). So neither 4 nor 6 ever occur. But (see above shortcut) we knew there could only be 5 numbers in the resulting range all along, so there had to be two gaps. Dec 7, 2012 at 10:51
There is no (exactly correct) solution which will run in a constant amount of time, since 1/7 is an infinite decimal in base 5. One simple solution would be to use rejection sampling, e.g.:
int i;
do
{
    i = 5 * (rand5() - 1) + rand5();  // i is now uniformly random between 1 and 25
} while (i > 21);
// i is now uniformly random between 1 and 21
return i % 7 + 1;  // result is now uniformly random between 1 and 7
This has an expected runtime of 25/21 = 1.19 iterations of the loop, but there is an infinitesimally small probability of looping forever.
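As a sanity check on those numbers, the following Python sketch enumerates all 25 equally likely outcomes of the two rand5() calls in one loop iteration. It assumes rand5() is uniform on 1..5; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 25 equally likely (first, second) results of two rand5() calls.
accepted_outputs = []
for a, b in product(range(1, 6), repeat=2):
    i = 5 * (a - 1) + b          # uniform on 1..25
    if i <= 21:                  # same acceptance test as the loop above
        accepted_outputs.append(i % 7 + 1)

# How many accepted outcomes map to each result 1..7.
counts = {k: accepted_outputs.count(k) for k in range(1, 8)}

p_accept = Fraction(len(accepted_outputs), 25)   # 21/25
expected_iterations = 1 / p_accept               # mean of a geometric distribution
expected_rand5_calls = 2 * expected_iterations   # two rand5() calls per iteration

print(counts, expected_iterations, expected_rand5_calls)
```

Each result is produced by exactly three accepted outcomes, the loop runs 25/21 times on average, and that works out to 50/21 (about 2.38) calls to rand5() per result.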
-
The -1 is not needed if the >21 is flipped to >26, because it doesn't matter where i's lower bound maps to. – BCS, Jan 15, 2009 at 18:01
-
My take on explaining why this is correct: Say that I want to write a program that outputs a stream of uniform random numbers from 1 to 25; for that I'd just return 5 * (rand5() - 1) + rand5() as in the code in the answer. Now, if I want to build a stream of uniform random numbers between 1 and 21, if I just use the first stream but filter it so that numbers in [22, 25] are rejected, I can build that stream too. Next, if I take this stream and filter it so that for each element x I output x % 7 + 1, I have a stream of uniform random numbers from 1 to 7! Quite simple, isn't it? :D – Paggas, May 5, 2009 at 6:14
-
And you're correct that it boils down to whether you want a perfect distribution with unbounded worst-case runtime, or an imperfect distribution with a bounded runtime. This is a consequence of the fact that no power of 5 is divisible by 7, or equivalently, if you have 5^n equally probable sequences of length n, there is no way to assign to each sequence a number from 1 to 7 such that each of 1..7 is equally probable. May 8, 2009 at 4:27
-
@Jules Olléon: Suppose there were a solution running in constant time that was guaranteed to make no more than N calls to rand5() in the worst case. Then, there are 5^N possible outcomes of the sequence of calls to rand5, each of which has an output of 1-7. So, if you add up all of the possible sequences of calls whose output is k for each 1≤k≤7, then the probability that the output is k is m/5^N, where m is the number of such sequences. So, m/5^N = 1/7, but there are no possible integer solutions (N,m) to this ==> contradiction. Jan 30, 2011 at 19:45
-
@paxdiablo: You are incorrect. The chance of a true RNG generating an infinite sequence of 5's is exactly 0, using similar reasoning to the fact that flipping a coin an infinite number of times is guaranteed not to generate an infinite number of consecutive heads. This also means the chance of this code looping forever is exactly 0 (though there is a positive chance it will loop for any arbitrary number of iterations). May 23, 2011 at 16:45
I'd like to add another answer, in addition to my first answer. This answer attempts to minimize the number of calls to rand5() per call to rand7(), to maximize the usage of randomness. That is, if you consider randomness to be a precious resource, we want to use as much of it as possible, without throwing away any random bits. This answer also has some similarities with the logic presented in Ivan's answer.
The entropy of a random variable is a well-defined quantity. For a random variable which takes on N states with equal probabilities (a uniform distribution), the entropy is log2 N. Thus, rand5() has approximately 2.32193 bits of entropy, and rand7() has about 2.80735 bits of entropy. If we hope to maximize our use of randomness, we need to use all 2.32193 bits of entropy from each call to rand5(), and apply them to generating 2.80735 bits of entropy needed for each call to rand7(). The fundamental limit, then, is that we can do no better than log(7)/log(5) = 1.20906 calls to rand5() per call to rand7().
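The entropy figures quoted above are easy to reproduce. This is just a small Python check using the standard math module; the variable names are illustrative.

```python
import math

bits_per_rand5 = math.log2(5)   # entropy of one uniform draw from 5 states
bits_per_rand7 = math.log2(7)   # entropy of one uniform draw from 7 states

# Lower bound on the average number of rand5() calls per rand7() output.
min_calls_per_output = bits_per_rand7 / bits_per_rand5

print(round(bits_per_rand5, 5), round(bits_per_rand7, 5), round(min_calls_per_output, 5))
```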
Side notes: all logarithms in this answer will be base 2 unless specified otherwise. rand5() will be assumed to return numbers in the range [0, 4], and rand7() will be assumed to return numbers in the range [0, 6]. Adjusting the ranges to [1, 5] and [1, 7] respectively is trivial.
So how do we do it? We generate an infinitely precise random real number between 0 and 1 (pretend for the moment that we could actually compute and store such an infinitely precise number -- we'll fix this later). We can generate such a number by generating its digits in base 5: we pick the random number 0.a_1 a_2 a_3..., where each digit a_i is chosen by a call to rand5(). For example, if our RNG chose a_i = 1 for all i, then ignoring the fact that that isn't very random, that would correspond to the real number 1/5 + 1/5^2 + 1/5^3 + ... = 1/4 (sum of a geometric series).
Ok, so we've picked a random real number between 0 and 1. I now claim that such a random number is uniformly distributed. Intuitively, this is easy to understand, since each digit was picked uniformly, and the number is infinitely precise. However, a formal proof of this is somewhat more involved, since now we're dealing with a continuous distribution instead of a discrete distribution, so we need to prove that the probability that our number lies in an interval [a, b] equals the length of that interval, b - a. The proof is left as an exercise for the reader =).
Now that we have a random real number selected uniformly from the range [0, 1], we need to convert it to a series of uniformly random numbers in the range [0, 6] to generate the output of rand7(). How do we do this? Just the reverse of what we just did -- we convert it to an infinitely precise decimal in base 7, and then each base 7 digit will correspond to one output of rand7().
Taking the example from earlier, if our rand5() produces an infinite stream of 1's, then our random real number will be 1/4. Converting 1/4 to base 7, we get the infinite decimal 0.15151515..., so we will produce as output 1, 5, 1, 5, 1, 5, etc.
Ok, so we have the main idea, but we have two problems left: we can't actually compute or store an infinitely precise real number, so how do we deal with only a finite portion of it? Secondly, how do we actually convert it to base 7?
One way we can convert a number between 0 and 1 to base 7 is as follows:
- Multiply by 7
- The integral part of the result is the next base 7 digit
- Subtract off the integral part, leaving only the fractional part
- Goto step 1
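The four steps above can be sketched in a few lines of Python. This uses exact rational arithmetic from the standard fractions module so that no precision is lost; base7_digits is a name made up for this illustration and is not part of the implementation further down.

```python
import math
from fractions import Fraction

def base7_digits(x, count):
    """Return the first `count` base-7 digits of x, where 0 <= x < 1."""
    digits = []
    for _ in range(count):
        x *= 7                   # step 1: multiply by 7
        digit = math.floor(x)    # step 2: the integral part is the next digit
        digits.append(digit)
        x -= digit               # step 3: keep only the fractional part
    return digits                # step 4 is the loop itself

quarter_digits = base7_digits(Fraction(1, 4), 6)
print(quarter_digits)
```

For 1/4 this reproduces the repeating 1, 5, 1, 5, ... pattern from the example above.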
To deal with the problem of infinite precision, we compute a partial result, and we also store an upper bound on what the result could be. That is, suppose we've called rand5() twice and it returned 1 both times. The number we've generated so far is 0.11 (base 5). Whatever the rest of the infinite series of calls to rand5() produce, the random real number we're generating will never be larger than 0.12: it is always true that 0.11 ≤ 0.11xyz... < 0.12.
So, keeping track of the current number so far, and the maximum value it could ever take, we convert both numbers to base 7. If they agree on the first k digits, then we can safely output the next k digits -- regardless of what the infinite stream of base 5 digits are, they will never affect the next k digits of the base 7 representation!
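To make the bounds idea concrete, here is a rough Python sketch that takes the base 5 digits seen so far, builds the lower and upper bounds as exact fractions, and reports the base 7 digits on which the two bounds agree. The helper names (base7_digits, known_base7_prefix) are hypothetical, and this is a slow illustration of the principle rather than the integer-only algorithm described in this answer.

```python
import math
from fractions import Fraction

def base7_digits(x, count):
    """First `count` base-7 digits of x (multiply by 7, take the integer part)."""
    digits = []
    for _ in range(count):
        x *= 7
        digit = math.floor(x)
        digits.append(digit)
        x -= digit
    return digits

def known_base7_prefix(base5_digits, count=8):
    """Base-7 digits that are already fixed by the given base-5 digits (each 0..4)."""
    # Lower bound: the number if every remaining base-5 digit were 0.
    lower = sum(Fraction(d, 5 ** (k + 1)) for k, d in enumerate(base5_digits))
    # Upper bound: one unit in the last place above the lower bound.
    upper = lower + Fraction(1, 5 ** len(base5_digits))
    prefix = []
    for lo_digit, hi_digit in zip(base7_digits(lower, count), base7_digits(upper, count)):
        if lo_digit != hi_digit:
            break
        prefix.append(lo_digit)
    return prefix

print(known_base7_prefix([1]))      # one base-5 digit seen
print(known_base7_prefix([1, 1]))   # the 0.11 (base 5) example from the text
```

With a single digit 1 the bounds 1/5 and 2/5 already disagree on the first base 7 digit, so nothing can be output yet. After the second 1 the bounds are 6/25 and 7/25, which both start with the base 7 digit 1, so that digit is safe to emit.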
And that's the algorithm -- to generate the next output of rand7(), we generate only as many digits of rand5() as we need to ensure that we know with certainty the value of the next digit in the conversion of the random real number to base 7. Here is a Python implementation, with a test harness:
import math
import random

rand5_calls = 0

def rand5():
    global rand5_calls
    rand5_calls += 1
    return random.randint(0, 4)

def rand7_gen():
    state = 0
    pow5 = 1
    pow7 = 7
    while True:
        # Integer division (//) keeps all arithmetic exact.
        if state // pow5 == (state + pow7) // pow5:
            result = state // pow5
            state = (state - result * pow5) * 7
            pow7 *= 7
            yield result
        else:
            state = 5 * state + pow7 * rand5()
            pow5 *= 5

if __name__ == '__main__':
    r7 = rand7_gen()
    N = 10000
    x = list(next(r7) for i in range(N))
    distr = [x.count(i) for i in range(7)]
    expmean = N / 7.0
    expstddev = math.sqrt(N * (1.0/7.0) * (6.0/7.0))
    print('%d TRIALS' % N)
    print('Expected mean: %.1f' % expmean)
    print('Expected standard deviation: %.1f' % expstddev)
    print()
    print('DISTRIBUTION:')
    for i in range(7):
        print('%d: %d (%+.3f stddevs)' % (i, distr[i], (distr[i] - expmean) / expstddev))
    print()
    print('Calls to rand5: %d (average of %f per call to rand7)' % (rand5_calls, float(rand5_calls) / N))
Note that rand7_gen() returns a generator, since it has internal state involving the conversion of the number to base 7. The test harness calls next(r7) 10000 times to produce 10000 random numbers, and then it measures their distribution. Only integer math is used, so the results are exactly correct.
Also note that the numbers here get very big, very fast. Powers of 5 and 7 grow quickly. Hence, performance will start to degrade noticeably after generating lots of random numbers, due to bignum arithmetic. But remember here, my goal was to maximize the usage of random bits, not to maximize performance (although that is a secondary goal).
In one run of this, I made 12091 calls to rand5() for 10000 calls to rand7(), achieving the minimum of log(7)/log(5) calls on average to 4 significant figures, and the resulting output was uniform.
In order to port this code to a language that doesn't have arbitrarily large integers built-in, you'll have to cap the values of pow5 and pow7 to the maximum value of your native integral type -- if they get too big, then reset everything and start over. This will increase the average number of calls to rand5() per call to rand7() very slightly, but hopefully it shouldn't increase too much even for 32- or 64-bit integers.
-
+1 for a really interesting answer. Would it be possible, rather than resetting at a certain value, to simply shift off bits that have been used, and move the other bits up, and basically only keeping the bits that are going to be used? Or am I missing something? May 21, 2009 at 3:54
-
I'm not 100% sure, but I believe if you did that, you would skew the distribution ever so slightly (although I doubt that such skew would be measurable without trillions of trials). May 21, 2009 at 4:44
-
FTW! I tried to make the bignums smaller but it can't be done because no power of 5 has factors in common with a power of 7! Also, good use of the yield keyword. Very well done. – Eyal, Sep 2, 2009 at 7:05
-
Very nice! Can we retain the extra entropy without growing state? The trick is to notice that both upper- and lower- bounds are at all times rational numbers. We can add, subtract, and multiply these without losing precision. If we do it all in base-35, we're nearly there. The remainder (multiplying by seven and retaining the fractional part) is left as an exercise. – Ian, Aug 14, 2011 at 8:27
-
@Isaac: No, this doesn't have bounded running time. No exactly correct answer can have bounded running time. Mar 1, 2014 at 18:16
(I have stolen Adam Rosenfield's answer and made it run about 7% faster.)
Assume that rand5() returns one of {0,1,2,3,4} with equal distribution and the goal is to return one of {0,1,2,3,4,5,6} with equal distribution.
int rand7() {
    int i = 5 * rand5() + rand5();
    int max = 25;
    // i is uniform among {0 ... max-1}
    while (i < max % 7) {
        // i is uniform among {0 ... (max%7 - 1)}
        i *= 5;
        i += rand5();  // i is uniform among {0 ... (((max%7)*5) - 1)}
        max %= 7;
        max *= 5;      // once again, i is uniform among {0 ... max-1}
    }
    return i % 7;
}
We're keeping track of the largest value that the loop can make in the variable max. If the result so far is between max%7 and max-1 then the result will be uniformly distributed in that range. If not, we use the remainder, which is random between 0 and max%7-1, and another call to rand5() to make a new number and a new max. Then we start again.
Edit: The expected number of calls to rand5() is x in this equation:
x = 2 * 21/25
+ 3 * 4/25 * 14/20
+ 4 * 4/25 * 6/20 * 28/30
+ 5 * 4/25 * 6/20 * 2/30 * 7/10
+ 6 * 4/25 * 6/20 * 2/30 * 3/10 * 14/15
+ (6+x) * 4/25 * 6/20 * 2/30 * 3/10 * 1/15
x = about 2.21 calls to rand5()
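The equation above is linear in x, so it can be solved exactly. Here is a short Python sketch that does this with exact fractions; the stage probabilities are copied from the equation, and the names reject, calls_at_stage, reach and constant are just for illustration.

```python
from fractions import Fraction as F

# Probability of falling through each successive stage of the loop,
# i.e. landing in the leftover values below max % 7 (taken from the equation).
reject = [F(4, 25), F(6, 20), F(2, 30), F(3, 10), F(1, 15)]
# Number of rand5() calls made so far if the result is returned at that stage.
calls_at_stage = [2, 3, 4, 5, 6]

constant = F(0)   # sum of the terms that do not involve x
reach = F(1)      # probability of reaching the current stage
for calls, r in zip(calls_at_stage, reject):
    constant += calls * reach * (1 - r)
    reach *= r

# The last line of the equation is (6 + x) * reach, so:
#   x = constant + reach * (6 + x)   =>   x = (constant + 6 * reach) / (1 - reach)
x = (constant + 6 * reach) / (1 - reach)

print(x, float(x))
```

Solving it this way gives a value just over 2.21, in line with the figure quoted above.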
-
Results cataloged in 1,000,000 tries: 1=47216; 2=127444; 3=141407; 4=221453; 5=127479; 6=167536; 7=167465. As you can see, distribution is lacking in respect to the odds of getting a 1. – Robert K, Jun 1, 2009 at 14:02
-
@The Wicked Flea: I think you're mistaken. Are you sure that the input rand5() you were using for your test produced 0-4 instead of 1-5, as specified in this solution? Jun 10, 2009 at 0:38
-
Adding uniformly distributed numbers does not result in a uniformly distributed number. In fact, you only need to sum 6 such uniformly distributed variables to get a reasonable approximation to a normal distribution. Feb 15, 2013 at 8:12
-
@MitchWheat - Adding two uniformly distributed integers does, in fact, result in a uniformly distributed random integer provided each possible sum can be generated in exactly one way. That happens to be the case in the expression 5 * rand5() + rand5(). – Ted Hopp, Jun 15, 2015 at 13:06
Algorithm:
7 can be represented in a sequence of 3 bits
Use rand(5) to randomly fill each bit with 0 or 1.
For e.g: call rand(5) and
if the result is 1 or 2, fill the bit with 0
if the result is 4 or 5, fill the bit with 1
if the result is 3 , then ignore and do it again (rejection)
This way we can fill 3 bits randomly with 0/1 and thus get a number from 0-7; rejecting 0 and trying again leaves a uniform number from 1-7.
EDIT: This seems like the simplest and most efficient answer, so here's some code for it:
public static int random_7() {
    int returnValue = 0;
    while (returnValue == 0) {
        for (int i = 1; i <= 3; i++) {
            returnValue = (returnValue << 1) + random_5_output_2();
        }
    }
    return returnValue;
}

private static int random_5_output_2() {
    while (true) {
        int flip = random_5();
        if (flip < 3) {
            return 0;
        }
        else if (flip > 3) {
            return 1;
        }
    }
}
-
There's always the faint spectre of the halting problem, since a poor random number generator could just generate a lot of threes at some point. Apr 18, 2012 at 13:31
-
"if the result is 1 or 2, fill the bit with 0 if the result is 4 or 5, fill the bit with 1" What is the logic by which 1,2,4,5 were accepted and 3 was rejected? Can you explain this? – gkns, Dec 11, 2013 at 9:38
-
@gkns There is no logic; you could have 1 and 2 mean fill with a 0 bit and 3 and 4 mean fill with 1. The important thing is that each option has a 50% chance of occurring, thus guaranteeing that the randomness of your function is at least as random as the original rand(5) function. It's a great solution! – Mo Beigi, Apr 6, 2015 at 8:24
-
This is neither simple nor efficient. The number of calls to random_5 per random_7 is at best 3, usually more. Other solutions on this page are closer to the actual best, which is around 2.2. – Eyal, Sep 3, 2015 at 7:48
int randbit( void )
{
    while( 1 )
    {
        int r = rand5();
        if( r <= 4 ) return(r & 1);
    }
}

int randint( int nbits )
{
    int result = 0;
    while( nbits-- )
    {
        result = (result<<1) | randbit();
    }
    return( result );
}

int rand7( void )
{
    while( 1 )
    {
        int r = randint( 3 ) + 1;
        if( r <= 7 ) return( r );
    }
}
-
A correct solution, making an average of 30/7 = 4.29 calls to rand5() per call to rand7(). May 8, 2009 at 3:30
rand7() = (rand5()+rand5()+rand5()+rand5()+rand5()+rand5()+rand5())%7+1
Edit: That doesn't quite work. It's off by about 2 parts in 1000 (assuming a perfect rand5). The buckets get:
value Count Error%
1 11158 -0.0035
2 11144 -0.0214
3 11144 -0.0214
4 11158 -0.0035
5 11172 +0.0144
6 11177 +0.0208
7 11172 +0.0144
By switching to a sum of
n Error%
10 +/- 1e-3,
12 +/- 1e-4,
14 +/- 1e-5,
16 +/- 1e-6,
...
28 +/- 3e-11
seems to gain an order of magnitude for every 2 added
BTW: the table of errors above was not generated via sampling but by the following recurrence relation:
p[x,n] is the number of ways output=x can happen given n calls to rand5.
p[1,1] ... p[5,1] = 1
p[6,1] ... p[7,1] = 0
p[1,n] = p[7,n-1] + p[6,n-1] + p[5,n-1] + p[4,n-1] + p[3,n-1]
p[2,n] = p[1,n-1] + p[7,n-1] + p[6,n-1] + p[5,n-1] + p[4,n-1]
p[3,n] = p[2,n-1] + p[1,n-1] + p[7,n-1] + p[6,n-1] + p[5,n-1]
p[4,n] = p[3,n-1] + p[2,n-1] + p[1,n-1] + p[7,n-1] + p[6,n-1]
p[5,n] = p[4,n-1] + p[3,n-1] + p[2,n-1] + p[1,n-1] + p[7,n-1]
p[6,n] = p[5,n-1] + p[4,n-1] + p[3,n-1] + p[2,n-1] + p[1,n-1]
p[7,n] = p[6,n-1] + p[5,n-1] + p[4,n-1] + p[3,n-1] + p[2,n-1]
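For readers who want to run the recurrence themselves, here is a compact Python sketch of the same idea. It assumes rand5() returns 1..5 and tracks residues 0..6 of the running sum instead of the labels 1..7 used above (the final output would be the residue plus 1); sum_mod7_counts is a name invented for this example.

```python
def sum_mod7_counts(n):
    """counts[r] = number of length-n sequences of rand5() values (1..5) whose sum is r mod 7."""
    counts = [1] + [0] * 6          # the empty sum has residue 0, in exactly one way
    for _ in range(n):
        nxt = [0] * 7
        for residue, ways in enumerate(counts):
            for roll in range(1, 6):
                # Each previous residue feeds the five residues reachable by one more roll.
                nxt[(residue + roll) % 7] += ways
        counts = nxt
    return counts

counts7 = sum_mod7_counts(7)
print(counts7, sum(counts7))
```

The counts always add up to 5^n, which is never a multiple of 7, so the seven buckets can get close to each other as n grows but cannot all be equal.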
-
This is not a uniform distribution. It's very close to uniform, but not perfectly uniform. Jan 15, 2009 at 18:06
-
Ah! Dice and 7's. If you are going to say I'm wrong, you shouldn't leave the proof as an exercise for the reader. – BCS, Jan 25, 2009 at 0:22
-
The proof that it's not uniform is simple: there are 5^7 possible ways the randomness can go, and as 5^7 is not a multiple of 7, it's not possible that all 7 sums are equally likely. (Basically, it boils down to 7 being relatively prime to 5, or equivalently 1/7 not being a terminating decimal in base 5.) In fact it's not even the "most uniform" possible under this constraint: direct computation shows that of the 5^7=78125 sums, the number of times you get values 1 to 7 is {1: 11145, 2: 11120, 3: 11120, 4: 11145, 5: 11190, 6: 11215, 7: 11190}. Apr 30, 2009 at 16:05
-
@ShreevatsaR So what if instead of taking the sum of rand5() seven times, we did it 5*7 times - wouldn't that work? 35^7 % 7 = 35^5 % 7 = 0. – kba, Jan 1, 2012 at 18:08
-
@KristianAntonsen: However many times you call rand5(), you won't get a uniform distribution. If you do it N times, there are 5^N possible outputs, which is not divisible by 7. (If you do it 35 times, there are 5^35, not 35^7.) You'll get closer and closer to uniform the larger the number of calls you use (and it can be any number, doesn't have to be divisible by 7), but IMHO instead of using a very large number of calls to rand(), you may as well use the probabilistic algorithm in the top answers, which gives an exact uniform distribution and whose expected number of calls to rand() is small. Jan 2, 2012 at 2:10
int ans = 0;
int r;
while (ans == 0)
{
    for (int i = 0; i < 3; i++)
    {
        while ((r = rand5()) == 3) {}  // reject 3 so the other four values split evenly
        ans += (r < 3) << i;           // set bit i with probability 1/2
    }
}
-
A correct solution, making an average of 30/7 = 4.29 calls to rand5() per call to rand7(). May 8, 2009 at 4:12
-
Needs to be a left shift for the algorithm to work: ans += (r < 3) << i – woolfie, Jul 14, 2016 at 17:25
The following produces a uniform distribution on {1, 2, 3, 4, 5, 6, 7} using a random number generator producing a uniform distribution on {1, 2, 3, 4, 5}. The code is messy, but the logic is clear.
public static int random_7(Random rg) {
    int returnValue = 0;
    while (returnValue == 0) {
        for (int i = 1; i <= 3; i++) {
            returnValue = (returnValue << 1) + SimulateFairCoin(rg);
        }
    }
    return returnValue;
}

private static int SimulateFairCoin(Random rg) {
    while (true) {
        int flipOne = random_5_mod_2(rg);
        int flipTwo = random_5_mod_2(rg);
        if (flipOne == 0 && flipTwo == 1) {
            return 0;
        }
        else if (flipOne == 1 && flipTwo == 0) {
            return 1;
        }
    }
}

private static int random_5_mod_2(Random rg) {
    return random_5(rg) % 2;
}

private static int random_5(Random rg) {
    return rg.Next(5) + 1;
}
-
A correct solution (which puts you way ahead of the curve), although not very efficient. This makes an average of 25/6 = 4.17 calls to random_5_mod_2 per fair coin flip, for a total average of 100/7 = 14.3 calls to random_5() per call to random_7(). May 8, 2009 at 3:28
-
The advantage of this solution over the others is that it can be easily expanded to produce any other uniformly distributed range. Just randomly select each one of the bits, re-rolling on invalid values (like the 0 value in our current solution that produces 8 numbers). Jan 16, 2011 at 3:51
int rand7() {
    int value = rand5()
              + rand5() * 2
              + rand5() * 3
              + rand5() * 4
              + rand5() * 5
              + rand5() * 6;
    return value % 7;
}
Unlike the chosen solution, the algorithm will run in constant time. It does however make 2 more calls to rand5 than the average run time of the chosen solution.
Note that this generator is not perfect (the number 0 has 0.0064% more chance than any other number), but for most practical purposes the guarantee of constant time probably outweighs this inaccuracy.
Explanation
This solution is derived from the fact that the number 15,624 is divisible by 7 and thus if we can randomly and uniformly generate numbers from 0 to 15,624 and then take mod 7 we can get a near-uniform rand7 generator. Numbers from 0 to 15,624 can be uniformly generated by rolling rand5 6 times and using them to form the digits of a base 5 number as follows:
rand5 * 5^5 + rand5 * 5^4 + rand5 * 5^3 + rand5 * 5^2 + rand5 * 5 + rand5
Properties of mod 7 however allow us to simplify the equation a bit:
5^5 = 3 mod 7
5^4 = 2 mod 7
5^3 = 6 mod 7
5^2 = 4 mod 7
5^1 = 5 mod 7
So
rand5 * 5^5 + rand5 * 5^4 + rand5 * 5^3 + rand5 * 5^2 + rand5 * 5 + rand5
becomes
rand5 * 3 + rand5 * 2 + rand5 * 6 + rand5 * 4 + rand5 * 5 + rand5
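Both the table of powers and the size of the resulting bias can be checked by brute force, since there are only 5^6 = 15,625 combinations of six rolls. The Python sketch below assumes rand5 returns 0..4, as the base 5 digit construction requires; powers_mod7, weights and tally are names chosen for this example.

```python
from collections import Counter
from itertools import product

# Powers of 5 reduced mod 7, matching the table above.
powers_mod7 = {k: pow(5, k, 7) for k in range(1, 7)}

# Exhaustively evaluate the simplified expression for every combination of
# six rand5 results (each assumed to be in 0..4).
weights = (3, 2, 6, 4, 5, 1)
tally = Counter(
    sum(w * d for w, d in zip(weights, digits)) % 7
    for digits in product(range(5), repeat=6)
)

print(powers_mod7)
print(sorted(tally.items()))
```

The result 0 comes up in 2233 of the 15,625 combinations and every other result in 2232, which is the small bias towards 0 mentioned at the top of this answer.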
Theory
The number 15,624 was not chosen randomly, but can be discovered using Fermat's little theorem, which states that if p is a prime number and a is not divisible by p, then
a^(p-1) = 1 mod p
So this gives us,
(5^6)-1 = 0 mod 7
(5^6)-1 is equal to
4 * 5^5 + 4 * 5^4 + 4 * 5^3 + 4 * 5^2 + 4 * 5 + 4
This is a number in base 5 form and thus we can see that this method can be used to go from any random number generator to any other random number generator. Though a small bias towards 0 is always introduced when using the exponent p-1.
To generalize this approach and to be more accurate we can have a function like this:
def getRandomconverted(frm, to):
    # getRandomUniform(frm) is assumed to return a uniform integer in [0, frm - 1]
    s = 0
    for i in range(to):
        s += getRandomUniform(frm) * frm**i
    mx = 0
    for i in range(to):
        mx += (frm - 1) * frm**i  # largest value s can take
    mx = (mx // to) * to  # maximum value till which we can take mod
    if s < mx:
        return s % to
    else:
        return getRandomconverted(frm, to)
-
This generator is accurate, but not perfectly uniform. To see this, consider the fact that a uniform generator in [0,15624] has 15625 possible outcomes, which isn't divisible by 7. This introduces a bias to the number 0 (which has 2233/15625 chance, and the others just 2232/15625). After all, while using Fermat's little theorem might seem correct at first glance, it says that (5^6)%7=1, and not (5^6)%7=0. The latter is obviously impossible for any exponent because 5 and 7 are both prime numbers. I think it's still an acceptable solution, and I've edited your post to reflect this. – aviator, Jun 4, 2017 at 11:20
Are homework problems allowed here?
This function does crude "base 5" math to generate a number between 0 and 6, then adds 1 to return a value from 1 to 7.
function rnd7() {
    do {
        r1 = rnd5() - 1;
        do {
            r2 = rnd5() - 1;
        } while (r2 > 1);
        result = r2 * 5 + r1;
    } while (result > 6);
    return result + 1;
}
-
A correct solution (which puts you way ahead of the curve), although not very efficient. This makes an average of 5 calls to rnd5() for each call to rnd7(). May 8, 2009 at 3:24
-
@Barry - First, you can't just add two random numbers together, you don't get a linear solution (consider a pair of dice). Now consider "Base 5": 00, 01, 02, 03, 04, 10, 11. That's 0-6 in base 5. So, we simply need to generate 2 digits of the base 5 number, and add them up until we get one that's within the range. That's what the r2*5+r1 does. The r2 > 1 loop is there because we would never want a high digit of > 1. Dec 14, 2011 at 4:12
-
This solution does not generate a uniform distribution. The numbers 1 and 7 can only be generated in one way, but 2 through 6 can each be generated in two ways: with r1 equal to the number minus 1 and r2 equal 0 or with r1 equal to the number minus 2 and r2 equal to 1. Thus 2 through 6 will be returned on average twice as often as 1 or 7. – Ted Hopp, Jun 15, 2015 at 12:53
Consider the additional constraint of trying to give the most efficient answer, i.e. one that, given an input stream I of uniformly distributed integers from 1-5 of length m, outputs a stream O of uniformly distributed integers from 1-7 of the longest length relative to m, say L(m).
The simplest way to analyse this is to treat the streams I and O as 5-ary and 7-ary numbers respectively. This is achieved by the main answer's idea of taking the stream a1, a2, a3,... -> a1+5*a2+5^2*a3+.. and similarly for stream O.
Then if we take a section of the input stream of length m, choose n s.t. 5^m-7^n=c where c>0 and is as small as possible. Then there is a uniform map from the input stream of length m to integers from 1 to 5^m and another uniform map from integers from 1 to 7^n to the output stream of length n, where we may have to lose a few cases from the input stream when the mapped integer exceeds 7^n.
So this gives a value for L(m) of around m (log5/log7), which is approximately 0.827m.
The difficulty with the above analysis is the equation 5^m-7^n=c, which is not easy to solve exactly, and the case where the uniform value from 1 to 5^m exceeds 7^n and we lose efficiency.
The question is how close to the best possible value of m (log5/log7) we can get. For example, when this number approaches close to an integer, can we find a way to achieve this exact integral number of output values?
If 5^m-7^n=c then from the input stream we effectively generate a uniform random number from 0 to (5^m)-1 and don't use any values higher than 7^n. However these values can be rescued and used again. They effectively generate a uniform sequence of numbers from 1 to 5^m-7^n. So we can then try to use these and convert them into 7-ary numbers so that we can create more output values.
Let T7(X) be the average length of the output sequence of random(1-7) integers derived from a uniform input of size X, and assume that 5^m=7^n0+7^n1+7^n2+...+7^nr+s, s<7.
Then T7(5^m)=n0x7^n0/5^m + ((5^m-7^n0)/5^m) T7(5^m-7^n0), since we have a length n0 sequence with probability 7^n0/5^m, with a residual of length 5^m-7^n0 with probability (5^m-7^n0)/5^m.
If we just keep substituting we obtain:
T7(5^m) = n0x7^n0/5^m + n1x7^n1/5^m + ... + nrx7^nr/5^m = (n0x7^n0 + n1x7^n1 + ... + nrx7^nr)/5^m
Hence
L(m)=T7(5^m)=(n0x7^n0 + n1x7^n1 + ... + nrx7^nr)/(7^n0+7^n1+7^n2+...+7^nr+s)
Another way of putting this is:
If 5^m has 7-ary representation a0+a1*7 + a2*7^2 + a3*7^3+...+ar*7^r
Then L(m) = (a1*7 + 2a2*7^2 + 3a3*7^3+...+rar*7^r)/(a0+a1*7 + a2*7^2 + a3*7^3+...+ar*7^r)
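The digit formula for L(m) is simple enough to evaluate directly, which makes it possible to see how far it falls short of m (log5/log7) for particular m. Below is a small Python sketch under that reading of the formula; base7_digits_of, expected_outputs and ideal_outputs are hypothetical helper names.

```python
import math
from fractions import Fraction

def base7_digits_of(n):
    """Least-significant-first base-7 digits a0, a1, ..., ar of a positive integer."""
    digits = []
    while n:
        n, d = divmod(n, 7)
        digits.append(d)
    return digits

def expected_outputs(m):
    """L(m) = (sum over i of i * a_i * 7^i) / 5^m, using the 7-ary digits of 5^m."""
    digits = base7_digits_of(5 ** m)
    numerator = sum(i * a * 7 ** i for i, a in enumerate(digits))
    return Fraction(numerator, 5 ** m)

def ideal_outputs(m):
    """The bound m * log 5 / log 7 discussed above."""
    return m * math.log(5) / math.log(7)

for m in (2, 6, 10, 20):
    print(m, float(expected_outputs(m)), ideal_outputs(m))
```

For example, 25 = 3*7 + 4, so L(2) = 21/25 = 0.84 against an ideal of about 1.65; the gap between the two columns is the error term e(m).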
The best possible case is my original one above where 5^m=7^n+s, where s<7.
Then T7(5^m) = nx(7^n)/(7^n+s) = n+o(1) = m (Log5/Log7)+o(1) as before.
The worst case is when we can only find k and s s.t. 5^m = kx7+s.
Then T7(5^m) = 1x(k.7)/(k.7+s) = 1+o(1)
Other cases are somewhere in between. It would be interesting to see how well we can do for very large m, i.e. how good can we get the error term:
T7(5^m) = m (Log5/Log7)+e(m)
It seems impossible to achieve e(m) = o(1) in general but hopefully we can prove e(m)=o(m).
The whole thing then rests on the distribution of the 7-ary digits of 5^m for various values of m.
I'm sure there is a lot of theory out there that covers this; I may have a look and report back at some point.
-
+2 (if I could)--this was the only good answer (as opposed to merely adequate). You've got the second best answer that will fit in 32 bit integers. – Rex Kerr, Mar 10, 2010 at 19:39
Here is a working Python implementation of Adam's answer.
import random

def rand5():
    return random.randint(1, 5)

def rand7():
    while True:
        r = 5 * (rand5() - 1) + rand5()
        # r is now uniformly random between 1 and 25
        if r <= 21:
            break
    # result is now uniformly random between 1 and 7
    return r % 7 + 1
I like to throw algorithms I'm looking at into Python so I can play around with them. I thought I'd post it here in the hope that it is useful to someone out there, not that it took long to throw together.
-
No, that is quite dissimilar from my answer. You're looping 21 times and discarding the first 20 iterations' results. You're also using a rand4() and a rand5() as input, which quite obviously breaks the rules of using only rand5(). Finally, you produce a non-uniform distribution. May 5, 2009 at 13:28
-
Sorry about that. I was pretty tired when I looked this question over, tired enough that I completely misread your algorithm. I actually threw it into Python because I couldn't understand why you were looping 21 times. Makes a lot more sense now. I did the random.randint(1, 4) thing as a shorthand but I guess you are correct, it is against the spirit of the question. I've corrected the code. May 6, 2009 at 0:12
-
@robermorales - As Adam Rosenfield explained in his answer, every solution that gives a true uniform distribution on [1, 7] will involve some sort of accept-reject loop that is potentially infinite. (However, if rand5() is a decent PRNG, then the loop will not be infinite because eventually 5*(rand5() - 1) + rand5() will definitely be <= 21.) – Ted Hopp, Jan 31, 2019 at 3:02
Why not do it simple?
int random7() {
    return random5() + (random5() % 3);
}
The chances of getting 1 and 7 in this solution are lower due to the modulo; however, if you just want a quick and readable solution, this is the way to go.
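To see how uneven the result is, the 25 equally likely pairs of random5() results can simply be enumerated. This Python sketch assumes random5() returns 1..5, as in the question; distribution is just an illustrative name.

```python
from collections import Counter
from itertools import product

# Every (a, b) pair of random5() results is equally likely, so counting pairs
# gives the exact distribution of a + (b % 3) in units of 1/25.
distribution = Counter(a + (b % 3) for a, b in product(range(1, 6), repeat=2))

for value in range(1, 8):
    print(value, "%d/25" % distribution[value])
```

Under that assumption the probabilities range from 1/25 for a 1 up to 5/25 for 3, 4 and 5, so the skew is considerably larger than the wording above might suggest.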
-
This does not produce a uniform distribution. This produces the numbers 0-6 with probabilities 2/25, 4/25, 5/25, 5/25, 5/25, 3/25, 1/25, as can be verified by counting all 25 possible outcomes. Dec 5, 2009 at 3:40
Assuming that rand(n) here means "random integer in a uniform distribution from 0 to n-1", here's a code sample using Python's randint. Note that randint(a, b) includes both endpoints, so randint(0, 4) plays the role of rand(5). It uses only randint(0, 4), and constants, to produce the effect of rand(7). A little silly, actually.
from random import randint

sum = 7
while sum >= 7:
    first = randint(0, 4)
    toadd = 9999
    while toadd > 1:
        toadd = randint(0, 4)
    if toadd:
        sum = first + 5
    else:
        sum = first
assert 7 > sum >= 0
print(sum)
-
1@robermorales Because Python doesn't have do ... while. It could have been 1337, or 12345, or any number > 1. – tckmn Jul 6, 2014 at 19:43
The premise behind Adam Rosenfield's correct answer is:
- x = 5^n (in his case: n=2)
- manipulate n rand5 calls to get a number y within range [1, x]
- z = ((int)(x / 7)) * 7
- if y > z, try again. else return y % 7 + 1
When n equals 2, you have 4 throw-away possibilities: y = {22, 23, 24, 25}. If you use n equals 6, you only have 1 throw-away: y = {15625}.
5^6 = 15625
7 * 2232 = 15624
You call rand5 more times. However, you have a much lower chance of getting a throw-away value (or an infinite loop). If there is a way to get no possible throw-away value for y, I haven't found it yet.
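The premise above can be sketched for any n. This is a minimal Python illustration, not Adam Rosenfield's actual code: `rand5`, `map_draws`, `rand7` and `throwaways` are names made up here, and `rand5` is a stand-in built on the standard `random` module.

```python
import random

def rand5():
    """Stand-in for the given generator: uniform integer in 1..5."""
    return random.randint(1, 5)

def map_draws(draws):
    """Map a tuple of n rand5 results to 1..7, or None for a throw-away."""
    x = 5 ** len(draws)
    y = 0
    for d in draws:            # read the draws as base-5 digits
        y = y * 5 + (d - 1)
    y += 1                     # y is now uniform in [1, x]
    z = (x // 7) * 7           # largest multiple of 7 that is <= x
    if y > z:
        return None            # throw-away value, caller must try again
    return y % 7 + 1

def rand7(n=2):
    """Repeat n-draw rounds until one is accepted."""
    while True:
        result = map_draws(tuple(rand5() for _ in range(n)))
        if result is not None:
            return result

def throwaways(n):
    """Number of rejected values out of 5**n."""
    return 5 ** n % 7
```

Here `throwaways(2)` is 4 and `throwaways(6)` is 1, matching the figures in the answer. Larger n lowers the chance of a retry but costs more rand5 calls per round.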
-
1There is provably no case without throwaway values--if there was no throwaway, 5^n and 7^m would have a factor in common. But they're (powers of) primes, so they don't. – Rex Kerr Mar 10, 2010 at 19:28
Here's my answer:
static struct rand_buffer {
unsigned v, count;
} buf2, buf3;
void push (struct rand_buffer *buf, unsigned n, unsigned v)
{
buf->v = buf->v * n + v;
++buf->count;
}
#define PUSH(n, v) push (&buf##n, n, v)
int rand16 (void)
{
int v = buf2.v & 0xf;
buf2.v >>= 4;
buf2.count -= 4;
return v;
}
int rand9 (void)
{
int v = buf3.v % 9;
buf3.v /= 9;
buf3.count -= 2;
return v;
}
int rand7 (void)
{
if (buf3.count >= 2) {
int v = rand9 ();
if (v < 7)
return v % 7 + 1;
PUSH (2, v - 7);
}
for (;;) {
if (buf2.count >= 4) {
int v = rand16 ();
if (v < 14) {
PUSH (2, v / 7);
return v % 7 + 1;
}
PUSH (2, v - 14);
}
// Get a number between 0 & 24
int v = 5 * (rand5 () - 1) + rand5 () - 1;
if (v < 21) {
PUSH (3, v / 7);
return v % 7 + 1;
}
v -= 21;
PUSH (2, v & 1);
PUSH (2, v >> 1);
}
}
It's a little more complicated than others, but I believe it minimises the calls to rand5. As with other solutions, there's a small probability that it could loop for a long time.
-
This produces a distribution not much different from the other solutions but has the added disadvantage of being needlessly complex. It also suffers from the provably incorrect non-deterministic loop-forever possibility if the numbers are truly random. I still think the ones that produce a slightly less uniform distribution (though still far more than adequate) but guarantee deterministic behavior are better. Sep 9, 2009 at 5:37
-
@Pax: Please enlighten me as to how this produces a non-uniform distribution. My analysis of the code, as well as my own testing, indicates that this produces a uniform distribution. As we've previously discussed, it's impossible to both produce a perfectly uniform distribution and have a guaranteed constant time upper bound of the running time. Sep 18, 2009 at 15:53
Simple and efficient:
int rand7 ( void )
{
return 4; // this number has been calculated using
// rand5() and is in the range 1..7
}
(Inspired by What's your favorite "programmer" cartoon?).
I don't like ranges starting from 1, so I'll start from 0 :-)
unsigned rand5()
{
return rand() % 5;
}
unsigned rand7()
{
int r;
do
{
r = rand5();
r = r * 5 + rand5();
r = r * 5 + rand5();
r = r * 5 + rand5();
r = r * 5 + rand5();
r = r * 5 + rand5();
} while (r > 15623);
return r / 2232;
}
-
This is a winner. This produces all 7 outcomes with equal probability.
from collections import defaultdict
def r7(n):
    if not n:
        yield []
    else:
        for i in range(1, 6):
            for j in r7(n-1):
                yield [i] + j
def test_r7():
    d = defaultdict(int)
    for x in r7(6):
        s = (((((((((x[5] * 5) + x[4]) * 5) + x[3]) * 5) + x[2]) * 5) + x[1]) * 5) + x[0]
        if s <= 15623:
            d[s % 7] += 1
    print(d)
Dec 1, 2010 at 18:34
As long as there aren't seven possibilities left to choose from, draw another random number, which multiplies the number of possibilities by five. In Perl:
$num = 0;
$possibilities = 1;
sub rand7
{
while( $possibilities < 7 )
{
$num = $num * 5 + int(rand(5));
$possibilities *= 5;
}
my $result = $num % 7;
$num = int( $num / 7 );
$possibilities /= 7;
return $result;
}
-
your distribution is not uniform, at least on the first call. Indeed, $possibilities has always to grow to 25 to exit the loop and return. So, your first result is [0-24] % 7, which is not uniformly distributed because 25 % 7 != 0 (this is 4, actually). Jan 31, 2013 at 16:28
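The first-call bias can be checked by hand. Tracing the Perl code from $num = 0 and $possibilities = 1, the loop runs twice ($possibilities goes 1, 5, 25), so $num is uniform over 0..24 when `% 7` is applied. A small Python sketch of that count (`first_call_counts` is a made-up name):

```python
from collections import Counter

def first_call_counts():
    """Outcome counts for the first rand7 call of the Perl answer.

    Assumes the trace described above: two draws, so $num is uniform
    over 0..24 before `% 7` is taken.
    """
    return Counter(num % 7 for num in range(25))

counts = first_call_counts()
print([counts[k] for k in range(7)])
```

Results 0 to 3 each occur 4 times out of 25 and results 4 to 6 only 3 times, so the first value returned is not uniform.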
I know it has been answered, but this seems to work OK, although I cannot tell you whether it has a bias. My 'testing' suggests it is, at least, reasonable.
Perhaps Adam Rosenfield would be kind enough to comment?
My (naive?) idea is this:
Accumulate rand5's until there are enough random bits to make a rand7. This takes at most 2 rand5's. To get the rand7 number I use the accumulated value mod 7.
To avoid the accumulator overflowing, and since only its value mod 7 matters, I reduce the accumulator mod 7:
(5a + rand5) % 7 = (k*7 + (5a%7) + rand5) % 7 = ( (5a%7) + rand5) % 7
The rand7() function follows:
(I let the range of rand5 be 0-4 and rand7 is likewise 0-6.)
int rand7(){
static int a=0;
static int e=0;
int r;
a = a * 5 + rand5();
e = e + 5; // added 5/7ths of a rand7 number
if ( e<7 ){
a = a * 5 + rand5();
e = e + 5; // another 5/7ths
}
r = a % 7;
e = e - 7; // removed a rand7 number
a = a % 7;
return r;
}
Edit: Added results for 100 million trials.
'Real' rand functions mod 5 or 7
rand5 : avg=1.999802 0:20003944 1:19999889 2:20003690 3:19996938 4:19995539
rand7 : avg=3.000111 0:14282851 1:14282879 2:14284554 3:14288546 4:14292388 5:14288736 6:14280046
My rand7
Average looks ok and number distributions look ok too.
randt : avg=3.000080 0:14288793 1:14280135 2:14287848 3:14285277 4:14286341 5:14278663 6:14292943
-
You should probably look at sequential correlation. I think if you take successive pairs (each "random" number paired with its predecessor) then you might find surprising things. You haven't explained WHY it should keep the distribution uniform, at any rate. A working program normally should start with an explanation of why it works. – Ian Aug 14, 2011 at 8:06
-
Would sequential correlation apply to many of these solutions? It has been a while since I attempted this and I thought I explained it. Looking at it now, it looks like I am accumulating random bits in a pool from rand5, ensuring enough have been accumulated before withdrawing enough to make a rand7 number and ensuring I don't overflow my accumulator. Aug 29, 2011 at 7:44
There are elegant algorithms cited above, but here's one way to approach it, although it might be roundabout. I am assuming values generated from 0.
R2 = random number generator giving values less than 2 (sample space = {0, 1})
R8 = random number generator giving values less than 8 (sample space = {0, 1, 2, 3, 4, 5, 6, 7})
In order to generate R8 from R2, you run R2 three times and read the combined result of all 3 runs as a 3-digit binary number. Here is the range of values when R2 is run three times:
0 0 0 --> 0
.
.
1 1 1 --> 7
Now to generate R7 from R8, we simply run R8 again if it returns 7:
int R7() {
    int x;
    do {
        x = R8();
    } while (x > 6);
    return x;
}
The roundabout solution is to generate R2 from R5 (just like we generated R7 from R8), then R8 from R2 and then R7 from R8.
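The whole chain can be sketched in a few lines. This is a Python illustration under the answer's assumption that values start at 0. The function names and the optional `source` parameter are made up here so the chain can be driven by a scripted input, and the choice of the low bit in `r2` is one possible way to get R2 from R5.

```python
import random

def r5():
    """Stand-in for the given generator: uniform integer in 0..4."""
    return random.randrange(5)

def r2(source=r5):
    """Uniform bit: reject 4, then keep the low bit of 0..3."""
    while True:
        x = source()
        if x < 4:
            return x & 1

def r8(source=r5):
    """Three R2 bits read as a 3-digit binary number (0..7)."""
    return (r2(source) << 2) | (r2(source) << 1) | r2(source)

def r7(source=r5):
    """Run R8 again whenever it returns 7."""
    while True:
        x = r8(source)
        if x <= 6:
            return x
```

Keeping a single bit from each accepted draw throws away randomness, which is why the answer calls the approach roundabout.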
-
like a number of others, this approach could take an arbitrarily long time per R7 call, since you could get a long string of sevens from R8. Apr 18, 2012 at 13:57
There you go, uniform distribution and zero rand5 calls.
def rand7():
    global seed
    seed += 1
    if seed >= 7:
        seed = 0
    return seed
Need to set seed beforehand.
Here's a solution that fits entirely within integers and is within about 4% of optimal (i.e. uses 1.26 random numbers in {0..4} for every one in {0..6}). The code's in Scala, but the math should be reasonably clear in any language: you take advantage of the fact that 7^9 + 7^8 is very close to 5^11. So you pick an 11-digit number in base 5, and then interpret it as a 9-digit number in base 7 if it's in range (giving 9 base-7 digits), or as an 8-digit number if it's over the 9-digit range, etc.:
abstract class RNG {
def apply(): Int
}
class Random5 extends RNG {
val rng = new scala.util.Random
var count = 0
def apply() = { count += 1 ; rng.nextInt(5) }
}
class FiveSevener(five: RNG) {
val sevens = new Array[Int](9)
var nsevens = 0
val to9 = 40353607;
val to8 = 5764801;
val to7 = 823543;
def loadSevens(value: Int, count: Int) {
nsevens = 0;
var remaining = value;
while (nsevens < count) {
sevens(nsevens) = remaining % 7
remaining /= 7
nsevens += 1
}
}
def loadSevens {
var fivepow11 = 0;
var i=0
while (i<11) { i+=1 ; fivepow11 = five() + fivepow11*5 }
if (fivepow11 < to9) { loadSevens(fivepow11 , 9) ; return }
fivepow11 -= to9
if (fivepow11 < to8) { loadSevens(fivepow11 , 8) ; return }
fivepow11 -= to8
if (fivepow11 < 3*to7) loadSevens(fivepow11 % to7 , 7)
else loadSevens
}
def apply() = {
if (nsevens==0) loadSevens
nsevens -= 1
sevens(nsevens)
}
}
If you paste a test into the interpreter (REPL actually), you get:
scala> val five = new Random5
five: Random5 = Random5@e9c592
scala> val seven = new FiveSevener(five)
seven: FiveSevener = FiveSevener@143c423
scala> val counts = new Array[Int](7)
counts: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0)
scala> var i=0 ; while (i < 100000000) { counts( seven() ) += 1 ; i += 1 }
i: Int = 100000000
scala> counts
res0: Array[Int] = Array(14280662, 14293012, 14281286, 14284836, 14287188,
14289332, 14283684)
scala> five.count
res1: Int = 125902876
The distribution is nice and flat (within about 10k of 1/7 of 10^8 in each bin, as expected from an approximately-Gaussian distribution).
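The "about 4% of optimal" and 1.26 figures can be reproduced from the three acceptance bands in loadSevens. The sketch below is a Python back-of-envelope check, not part of the Scala answer. `expected_fives_per_seven` is a made-up name, and "optimal" is taken to mean log(7)/log(5) draws per output.

```python
from math import log

def expected_fives_per_seven():
    """Expected rand5 draws per base-7 digit for the 11-draw scheme."""
    total = 5 ** 11                      # values of an 11-digit base-5 number
    digits_out = (9 * 7 ** 9             # first 7^9 values yield 9 digits
                  + 8 * 7 ** 8           # next 7^8 values yield 8 digits
                  + 7 * 3 * 7 ** 7)      # next 3*7^7 values yield 7 digits
    # The remaining values yield nothing and force a retry.
    return 11 * total / digits_out

ratio = expected_fives_per_seven()
optimal = log(7) / log(5)                # assumed meaning of "optimal"
print(round(ratio, 4), round(optimal, 4), round(ratio / optimal, 4))
```

The ratio comes out near 1.259, in line with the measured five.count of 125902876 for 10^8 outputs.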
By using a rolling total, you can both
- maintain an equal distribution; and
- not have to sacrifice any element in the random sequence.
Both these problems are an issue with the simplistic rand(5)+rand(5)...-type solutions. The following Python code shows how to implement it (most of this is proving the distribution).
import random
x = []
for i in range (0,7):
    x.append (0)
t = 0
tt = 0
for i in range (0,700000):
    ########################################
    ##### qq.py #####
    r = int (random.random () * 5)
    t = (t + r) % 7
    ########################################
    ##### qq_notsogood.py #####
    #r = 20
    #while r > 6:
        #r = int (random.random () * 5)
        #r = r + int (random.random () * 5)
    #t = r
    ########################################
    x[t] = x[t] + 1
    tt = tt + 1
high = x[0]
low = x[0]
for i in range (0,7):
    print ("%d: %7d %.5f" % (i, x[i], 100.0 * x[i] / tt))
    if x[i] < low:
        low = x[i]
    if x[i] > high:
        high = x[i]
diff = high - low
print ("Variation = %d (%.5f%%)" % (diff, 100.0 * diff / tt))
And this output shows the results:
pax$ python qq.py
0: 99908 14.27257
1: 100029 14.28986
2: 100327 14.33243
3: 100395 14.34214
4: 99104 14.15771
5: 99829 14.26129
6: 100408 14.34400
Variation = 1304 (0.18629%)
pax$ python qq.py
0: 99547 14.22100
1: 100229 14.31843
2: 100078 14.29686
3: 99451 14.20729
4: 100284 14.32629
5: 100038 14.29114
6: 100373 14.33900
Variation = 922 (0.13171%)
pax$ python qq.py
0: 100481 14.35443
1: 99188 14.16971
2: 100284 14.32629
3: 100222 14.31743
4: 99960 14.28000
5: 99426 14.20371
6: 100439 14.34843
Variation = 1293 (0.18471%)
A simplistic rand(5)+rand(5), ignoring those cases where this returns more than 6, has a typical variation of 18%, 100 times that of the method shown above:
pax$ python qq_notsogood.py
0: 31756 4.53657
1: 63304 9.04343
2: 95507 13.64386
3: 127825 18.26071
4: 158851 22.69300
5: 127567 18.22386
6: 95190 13.59857
Variation = 127095 (18.15643%)
pax$ python qq_notsogood.py
0: 31792 4.54171
1: 63637 9.09100
2: 95641 13.66300
3: 127627 18.23243
4: 158751 22.67871
5: 126782 18.11171
6: 95770 13.68143
Variation = 126959 (18.13700%)
pax$ python qq_notsogood.py
0: 31955 4.56500
1: 63485 9.06929
2: 94849 13.54986
3: 127737 18.24814
4: 159687 22.81243
5: 127391 18.19871
6: 94896 13.55657
Variation = 127732 (18.24743%)
And, on the advice of Nixuz, I've cleaned the script up so you can just extract and use the rand7... stuff:
import random

# rand5() returns 0 through 4 inclusive.
def rand5():
    return int (random.random () * 5)

# rand7() generator returns 0 through 6 inclusive (using rand5()).
def rand7():
    rand7ret = 0
    while True:
        rand7ret = (rand7ret + rand5()) % 7
        yield rand7ret

# Number of test runs.
count = 700000

# Work out distribution.
distrib = [0,0,0,0,0,0,0]
rgen = rand7()
for i in range (0,count):
    r = next(rgen)
    distrib[r] = distrib[r] + 1

# Print distributions and calculate variation.
high = distrib[0]
low = distrib[0]
for i in range (0,7):
    print ("%d: %7d %.5f" % (i, distrib[i], 100.0 * distrib[i] / count))
    if distrib[i] < low:
        low = distrib[i]
    if distrib[i] > high:
        high = distrib[i]
diff = high - low
print ("Variation = %d (%.5f%%)" % (diff, 100.0 * diff / count))
-
2Err, let me rephrase that. Given that a particular x was produced at some point in the sequence, only 5 of 7 numbers can be produced for the next number in the sequence. A true RNG would have all samples be independent of one another, but in this case they are clearly not. May 8, 2009 at 3:20
-
3It's true that the original question doesn't specify if the input and output functions produce independent and identically-distributed (iid) samples, but I think it's a reasonable expectation that if the input rand5() is iid, then the output rand7() should also be iid. If you don't think that's reasonable, have fun using your non-iid RNG. May 8, 2009 at 19:54
-
1So, what's the word from the mathematicians at the university? May 12, 2009 at 2:52
-
1This solution is clearly broken. It's obvious that you need to be calling rand5 (on average) more than once per call to rand7 and this solution doesn't. Therefore the results cannot be random by any sane definition of random. Sep 9, 2009 at 4:11
-
1@Pax At every iteration of your function, it can only return one of five different values (albeit in the range 0-6). The very first iteration can only return a number in the range 0-4. So, it should be clear that whilst your function may have uniform distribution, the samples are not independent i.e. they're correlated which isn't something you want in a random number generator. Sep 9, 2009 at 5:57
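The dependence the comments describe can be made concrete by listing which outputs can follow a given output. A minimal Python sketch, assuming rand5() returns 0..4 as in the answer (`successors` is a made-up name):

```python
def successors(prev):
    """Values the rolling-total generator can emit right after `prev`.

    Mirrors `rand7ret = (rand7ret + rand5()) % 7` with rand5() in 0..4.
    """
    return sorted({(prev + r) % 7 for r in range(5)})

for prev in range(7):
    print(prev, "->", successors(prev))
```

Every row has exactly five entries, so after seeing x the values (x + 5) % 7 and (x + 6) % 7 can never come next. The long-run histogram is flat, but consecutive samples are not independent.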
This answer is more an experiment in obtaining the most entropy possible from the Rand5 function. It is therefore somewhat unclear and almost certainly a lot slower than other implementations.
Assuming the uniform distribution from 0-4 and resulting uniform distribution from 0-6:
public class SevenFromFive
{
public SevenFromFive()
{
// this outputs a uniform distribution but for some reason including it
// screws up the output distribution
// open question Why?
this.fifth = new ProbabilityCondensor(5, b => {});
this.eigth = new ProbabilityCondensor(8, AddEntropy);
}
private static Random r = new Random();
private static uint Rand5()
{
return (uint)r.Next(0,5);
}
private class ProbabilityCondensor
{
private readonly int samples;
private int counter;
private int store;
private readonly Action<bool> output;
public ProbabilityCondensor(int chanceOfTrueReciprocal,
Action<bool> output)
{
this.output = output;
this.samples = chanceOfTrueReciprocal - 1;
}
public void Add(bool bit)
{
this.counter++;
if (bit)
this.store++;
if (counter == samples)
{
bool? e;
if (store == 0)
e = false;
else if (store == 1)
e = true;
else
e = null;// discard for now
counter = 0;
store = 0;
if (e.HasValue)
output(e.Value);
}
}
}
ulong buffer = 0;
const ulong Mask = 7UL;
int bitsAvail = 0;
private readonly ProbabilityCondensor fifth;
private readonly ProbabilityCondensor eigth;
private void AddEntropy(bool bit)
{
buffer <<= 1;
if (bit)
buffer |= 1;
bitsAvail++;
}
private void AddTwoBitsEntropy(uint u)
{
buffer <<= 2;
buffer |= (u & 3UL);
bitsAvail += 2;
}
public uint Rand7()
{
uint selection;
do
{
while (bitsAvail < 3)
{
var x = Rand5();
if (x < 4)
{
// put the two low order bits straight in
AddTwoBitsEntropy(x);
fifth.Add(false);
}
else
{
fifth.Add(true);
}
}
// read 3 bits
selection = (uint)((buffer & Mask));
bitsAvail -= 3;
buffer >>= 3;
if (selection == 7)
eigth.Add(true);
else
eigth.Add(false);
}
while (selection == 7);
return selection;
}
}
The number of bits added to the buffer per call to Rand5 is currently 4/5 * 2 so 1.6. If the 1/5 probability value is included that increases by 0.05 so 1.65 but see the comment in the code where I have had to disable this.
Bits consumed by call to Rand7 = 3 + 1/8 * (3 + 1/8 * (3 + 1/8 * (...
This is 3 + 3/8 + 3/64 + 3/512 ... so approx 3.43
By extracting information from the sevens I reclaim 1/8*1/7 bits per call so about 0.018
This gives a net consumption of about 3.4 bits per call, which means the ratio is 2.125 calls to Rand5 for every Rand7. The optimum should be 2.1.
I would imagine this approach is significantly slower than many of the other ones here unless the cost of the call to Rand5 is extremely expensive (say calling out to some external source of entropy).
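The arithmetic above can be replayed without rounding. This Python sketch only reproduces the answer's own estimates and does not analyse the C# code independently. `entropy_budget` and its variable names are made up for the illustration.

```python
def entropy_budget():
    """Reproduce the back-of-envelope numbers from the answer."""
    bits_in_per_rand5 = 4 / 5 * 2        # 4 of 5 outcomes contribute 2 bits
    # 3 + 3/8 + 3/64 + ... is a geometric series with ratio 1/8.
    bits_used_per_rand7 = 3 / (1 - 1 / 8)
    reclaimed = 1 / 8 * 1 / 7            # estimate quoted in the answer
    net = bits_used_per_rand7 - reclaimed
    calls_per_rand7 = net / bits_in_per_rand5
    return bits_in_per_rand5, bits_used_per_rand7, reclaimed, net, calls_per_rand7

print(entropy_budget())
```

Unrounded, this gives 24/7 (about 3.43) bits consumed, about 3.41 net, and about 2.13 calls to Rand5 per Rand7. The 2.125 in the answer comes from rounding the net figure to 3.4 first.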
-
Your solution appears correct, aside from some simple errors: "if(count > 1)" should be "if(count <= 1)", and the "i++" that occurs shortly thereafter should be inside the curly braces that precede it. I'm not sure whether or not BitsSet() is correct, but that's somewhat irrelevant. May 13, 2009 at 18:51
-
Overall, though, your function is very difficult to understand. It does make a slightly better use of entropy than it otherwise could, at the cost of more complication. There's also no reason to initially fill the buffer with 35 random bits on the first call, when 3 would suffice. May 13, 2009 at 18:56
-
I corrected the <= thanks, the i++ really should be there though. It should happen on the zero and the 1 case (adding a 1 or a zero respectively to the buffer). This is absolutely not what I would suggest using, it's horribly complicated. I was just interested in how close I could get to the theoretical entropy limits inherent in the problem... Thanks for the feedback. Ironically the filling of the buffer on the first call was to make it simpler to write :) May 13, 2009 at 20:35
-
I reworked this to be easier to understand (at the cost of speed) but also made it correct. It is not optimum yet, for some reason the 1/5 bits cause issues even though they are uniform in count. May 14, 2009 at 10:18
just scale your output from your first function
0) you have a number in range 1-5
1) subtract 1 to make it in range 0-4
2) multiply by (7-1)/(5-1) to make it in range 0-6
3) add 1 to increment the range: Now your result is in between 1-7
-
5Sorry, this would only work if you are working with real numbers or doubles etc... Randomizing is a tricky subject! – cartonn May 27, 2010 at 8:22
-
At step (1), you have 5 distinct values. Step (2) expands the range but does not increase the number of distinct values, so you still have only 5 values at the end. Dec 1, 2010 at 18:13
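Applying the scaling steps to each of the five possible inputs shows the problem directly. A tiny Python 3 sketch (`scaled` is a made-up name):

```python
def scaled(x):
    """Steps 1-3 of the answer applied to one rand5 result x in 1..5."""
    return (x - 1) * (7 - 1) / (5 - 1) + 1

outputs = [scaled(x) for x in range(1, 6)]
print(outputs)
```

The five inputs map to 1.0, 2.5, 4.0, 5.5 and 7.0. However these are rounded, at most five of the seven integers can ever come out.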
in php
function rand1to7() {
    do {
        $output_value = 0;
        for ($i = 0; $i < 28; $i++) {
            $output_value += rand1to5();
        }
    } while ($output_value == 140); // reject the maximum so the sum stays in 28..139
    $output_value -= 12;
    return floor($output_value / 16);
}
loops to produce a random number between 16 and 127, divides by sixteen to create a float between 1 and 7.9375, then rounds down to get an int between 1 and 7. if I am not mistaken, there is a 16/112 chance of getting any one of the 7 outcomes.
-
although there is probably an easier answer similar to this using no conditional loop, and modulo instead of floor. i just can't crunch the numbers right now. Apr 1, 2011 at 15:09
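Whether each outcome really gets 16/112 can be checked exactly, because the distribution of a sum of 28 draws is easy to build up by repeated convolution. The Python sketch below assumes the loop rejects only a sum of exactly 140, as the description of the 16-127 range suggests. The function names are made up.

```python
def sum_distribution(rolls=28):
    """Exact number of ways to reach each sum of `rolls` draws from 1..5."""
    counts = {0: 1}
    for _ in range(rolls):
        nxt = {}
        for total, ways in counts.items():
            for face in range(1, 6):
                nxt[total + face] = nxt.get(total + face, 0) + ways
        counts = nxt
    return counts

def output_probabilities():
    """Probability of each result 1..7, rejecting a sum of exactly 140."""
    counts = sum_distribution()
    counts.pop(140)                      # assumed rejection case
    accepted = sum(counts.values())
    probs = {k: 0.0 for k in range(1, 8)}
    for total, ways in counts.items():
        probs[(total - 12) // 16] += ways / accepted
    return probs

for value, p in sorted(output_probabilities().items()):
    print(value, "%.6f" % p)
```

The 112 possible sums are not equally likely: sums near 84 dominate, so 4 comes out most of the time while 1 and 7 almost never appear.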
extern int r5();
int r7() {
return ((r5() & 0x01) << 2 ) | ((r5() & 0x01) << 1 ) | (r5() & 0x01);
}
-
problem: this returns non-uniformly in range 0-7, not 0-6. Indeed, you can have 7 = 111b with p(7) = 8 / 125 Jan 31, 2013 at 2:10
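The exact distribution of the three-bit construction follows from enumerating all 125 triples. The sketch below is Python and `bit_trick_distribution` is a made-up name. Since the C snippet does not say whether r5() returns 0-4 or 1-5, both are tried.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def bit_trick_distribution(low):
    """Exact distribution of the three-low-bits construction.

    `low` is the smallest value the assumed r5() returns (0 or 1).
    """
    values = range(low, low + 5)
    counts = Counter(((a & 1) << 2) | ((b & 1) << 1) | (c & 1)
                     for a, b, c in product(values, repeat=3))
    return {k: Fraction(n, 125) for k, n in counts.items()}

for low in (0, 1):
    dist = bit_trick_distribution(low)
    print(low, [str(dist[k]) for k in range(8)])
```

With a 0-4 source, p(7) is 8/125 as the comment says. With a 1-5 source, odd values are more likely than even ones and p(7) rises to 27/125. In both cases 7 can occur and the eight outcomes are unequal.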
I think I have four answers: two giving exact solutions like that of @Adam Rosenfield but without the infinite loop problem, and the other two giving almost perfect solutions with a faster implementation than the first one.
The best exact solution requires 7 calls to rand5, but let's proceed in order to understand.
Method 1 - Exact
The strength of Adam's answer is that it gives a perfect uniform distribution, and there is a very high probability (21/25) that only two calls to rand5() will be needed. However, the worst case is an infinite loop.
The first solution below also gives a perfect uniform distribution, but requires a total of 42 calls to rand5. No infinite loops.
Here is an R implementation:
rand5 <- function() sample(1:5,1)
rand7 <- function() (sum(sapply(0:6, function(i) i + rand5() + rand5()*2 + rand5()*3 + rand5()*4 + rand5()*5 + rand5()*6)) %% 7) + 1
For people not familiar with R, here is a simplified version:
rand7 = function(){
r = 0
for(i in 0:6){
r = r + i + rand5() + rand5()*2 + rand5()*3 + rand5()*4 + rand5()*5 + rand5()*6
}
return(r %% 7 + 1)
}
The distribution of rand5 will be preserved. If we do the math, each of the 7 iterations of the loop has 5^6 possible combinations, thus total number of possible combinations are (7 * 5^6) %% 7 = 0. Thus we can divide the random numbers generated in equal groups of 7. See method two for more discussion on this.
Here are all the possible combinations:
table(apply(expand.grid(c(outer(1:5,0:6,"+")),(1:5)*2,(1:5)*3,(1:5)*4,(1:5)*5,(1:5)*6),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
15625 15625 15625 15625 15625 15625 15625
I think it's straight forward to show that Adam's method will run much much faster. The probability that there are 42 or more calls to rand5 in Adam's solution is very small ((4/25)^21 ~ 10^(-17)).
Method 2 - Not Exact
Now the second method, which is almost uniform, but requires 6 calls to rand5:
rand7 <- function() (sum(sapply(1:6,function(i) i*rand5())) %% 7) + 1
Here is a simplified version:
rand7 = function(){
r = 0
for(i in 1:6){
r = r + i*rand5()
}
return(r %% 7 + 1)
}
This is essentially one iteration of method 1. If we generate all possible combinations, here is resulting counts:
table(apply(expand.grid(1:5,(1:5)*2,(1:5)*3,(1:5)*4,(1:5)*5,(1:5)*6),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
2233 2232 2232 2232 2232 2232 2232
One number will appear once more in 5^6 = 15625 trials.
Now, in Method 1, by adding 1 to 6, we move the number 2233 to each of the successive point. Thus the total number of combinations will match up. This works because 5^6 %% 7 = 1, and then we do 7 appropriate variations, so (7 * 5^6 %% 7 = 0).
Method 3 - Exact
If the argument of method 1 and 2 is understood, method 3 follows, and requires only 7 calls to rand5. At this point, I feel this is the minimum number of calls needed for an exact solution.
Here is an R implementation:
rand5 <- function() sample(1:5,1)
rand7 <- function() (sum(sapply(1:7, function(i) i * rand5())) %% 7) + 1
For people not familiar with R, here is a simplified version:
rand7 = function(){
r = 0
for(i in 1:7){
r = r + i * rand5()
}
return(r %% 7 + 1)
}
The distribution of rand5 will be preserved. If we do the math, each of the 7 iterations of the loop has 5 possible outcomes, thus total number of possible combinations are (7 * 5) %% 7 = 0. Thus we can divide the random numbers generated in equal groups of 7. See method one and two for more discussion on this.
Here are all the possible combinations:
table(apply(expand.grid(0:6,(1:5)),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
5 5 5 5 5 5 5
I think it's straight forward to show that Adam's method will still run faster. The probability that there are 7 or more calls to rand5 in Adam's solution is still small ((4/25)^3 ~ 0.004).
Method 4 - Not Exact
This is a minor variation of the second method. It is almost uniform, but requires 7 calls to rand5, that is one additional to method 2:
rand7 <- function() ((rand5() + sum(sapply(1:6,function(i) i*rand5()))) %% 7) + 1
Here is a simplified version:
rand7 = function(){
r = 0
for(i in 1:6){
r = r + i*rand5()
}
return((r + rand5()) %% 7 + 1)
}
If we generate all possible combinations, here is resulting counts:
table(apply(expand.grid(1:5,(1:5)*2,(1:5)*3,(1:5)*4,(1:5)*5,(1:5)*6,1:5),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
11160 11161 11161 11161 11161 11161 11160
Two numbers will appear once less in 5^7 = 78125 trials. For most purposes, I can live with that.
-
1I'm not familiar with R, but unless I'm misunderstanding how these work, then method 1 is not exact. It has (5^6)^7 = 5^42 possible outcomes, not (5^6)*7; 5^42 is not divisible by 7. Likewise method 3 is not exact. It has 5^7 possible outcomes, not 5*7. (The last loop iteration in method 3 with i=7 also has no effect, since adding 7*rand5() to r does not change the value of r mod 7.) Jan 31, 2018 at 22:32
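The comment's point about method 3 can be confirmed by brute force, since there are only 5^7 = 78125 combinations of draws. The Python sketch below mirrors the simplified R loop (sum of i * rand5() for i = 1..7, then mod 7 plus 1). `method3_counts` is a made-up name.

```python
from collections import Counter
from itertools import product

def method3_counts():
    """Tally (sum of i * x_i for i = 1..7) % 7 + 1 over all 5**7 draws."""
    counts = Counter()
    for draws in product(range(1, 6), repeat=7):
        r = sum(i * x for i, x in enumerate(draws, start=1))
        counts[r % 7 + 1] += 1
    return counts

counts = method3_counts()
print(sorted(counts.items()))
print(pow(5, 42, 7))   # remainder of 5^42 modulo 7, relevant to method 1
```

The tally is 11165 for the value 1 and 11160 for each of the others, which is method 2's table multiplied by 5. So the seventh draw adds nothing, and method 3 is close to uniform but not exact. For method 1, 5^42 leaves remainder 1 modulo 7, so its outcomes cannot split evenly into seven groups either.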
7 * rand5() / 5?