Generate random number with given probability matlab

Question

I want to generate a random number with a given probability but I'm not sure how to:

I need a number between 1 and 3

num = ceil(rand*3);

but I need the different values to be generated with different probabilities, e.g.:

0.5 chance of 1
0.1 chance of 2
0.4 chance of 3

I'm sure this is straightforward but I can't think of how to do it.

Does this answer your question? Draw random numbers from pre-specified probability mass function in Matlab — SecretAgentMan, Oct 29, 2019 at 16:11

Eitan T · Accepted Answer · 2015-01-12 16:58:05Z

The simple solution is to generate a number with a uniform distribution (using rand), and manipulate it a bit:

r = rand;
prob = [0.5, 0.1, 0.4];
x = sum(r >= cumsum([0, prob]));

or in a one-liner:

x = sum(rand >= cumsum([0, 0.5, 0.1, 0.4]));

Explanation

Here r is a uniformly distributed random number between 0 and 1. To generate an integer number between 1 and 3, the trick is to divide the [0, 1] range into 3 segments, where the length of each segment is proportional to its corresponding probability. In your case, you would have:

Segment [0, 0.5), corresponding to number 1.
Segment [0.5, 0.6), corresponding to number 2.
Segment [0.6, 1], corresponding to number 3.

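The probability of r falling within a segment equals that segment's length, which is exactly the probability you want for the corresponding number. sum(r >= cumsum([0, prob])) is just a compact way of finding the segment: it counts how many segment start points are less than or equal to r, and that count is the number you are after.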
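
To make the mapping concrete, here is a small sketch that pushes three hand-picked values of r (one per segment) through the same expression. The names edges, r_demo and x_demo are only illustrative and are not part of the original answer.

```matlab
prob   = [0.5, 0.1, 0.4];
edges  = cumsum([0, prob]);          % segment boundaries: 0, 0.5, 0.6, 1
r_demo = [0.30, 0.55, 0.90];         % one hand-picked value per segment
x_demo = zeros(size(r_demo));
for k = 1:numel(r_demo)
    % count how many boundaries lie at or below r
    x_demo(k) = sum(r_demo(k) >= edges);
    fprintf('r = %.2f -> x = %d\n', r_demo(k), x_demo(k));
end
```

The three values land in segments 1, 2 and 3 respectively. The final boundary (1) never counts, because rand only returns values strictly below 1.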

Extension

If you're interested in creating a vector/matrix of random numbers, you can use a loop or arrayfun:

r = rand(3); % any size you want
x = arrayfun(@(z)sum(z >= cumsum([0, prob])), r);

Of course, there's also a vectorized solution, I'm just too lazy to write it.
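
For reference, one possible vectorized form is sketched below. It assumes MATLAB R2016b or later, where >= expands a column against a row implicitly; on older releases the bsxfun call would be needed instead. The variable edges is an illustrative name.

```matlab
prob  = [0.5, 0.1, 0.4];
edges = cumsum([0, prob]);           % row vector of segment boundaries
r     = rand(4, 5);                  % any size you want
% Compare every sample (as a column) against every boundary (as a row),
% count the boundaries at or below each sample, then restore r's shape.
x = reshape(sum(r(:) >= edges, 2), size(r));
```

This builds a temporary numel(r)-by-(n+1) logical matrix, so it trades memory for the absence of a loop.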

Thanks for your help, is there something I can do when the probabilities are [0,0,1] in this case I need the answer to be 3, but keep getting 1 ? — Eamonn McEvoy, Dec 17, 2012 at 12:42
PS: You forgot to add the 0 on this line -> x = sum(rand >= cumsum([0.5, 0.1, 0.4])); — Eamonn McEvoy, Dec 17, 2012 at 13:05
Vectorized solution: sum(bsxfun(@ge, r, cumsum([0, prob])), 2) where r is a column vector and prob a row vector. — Oleg, Oct 20, 2014 at 18:47

Alec Jacobson · Accepted Answer · 2019-02-07 23:12:55Z

The answers so far are correct, but slow for large inputs: O(m*n), where n is the number of values and m is the number of random samples. Here is an O(m*log(n)) version that takes advantage of the monotonicity of the cumsum result and the binary search used in histc:

% assume n = numel(prob) is large and sum(prob) == 1
r = rand(m,1);
[~,x] = histc(r,cumsum([0,prob]));
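
Newer MATLAB documentation marks histc as not recommended and points to discretize for obtaining bin indices. The following is a sketch of the same idea with discretize, assuming R2015a or later; pinning the last edge to 1 and the freq check are additions for illustration, not part of the original answer.

```matlab
prob  = [0.5, 0.1, 0.4];
m     = 1e5;                         % number of samples
edges = cumsum([0, prob]);
edges(end) = 1;                      % guard against round-off in the last edge
r = rand(m, 1);
x = discretize(r, edges);            % bin index of each sample, 1..numel(prob)
% Empirical frequencies, to compare against prob
freq = accumarray(x, 1, [numel(prob), 1])' / m;
```

With m this large, freq should sit close to prob. Whether discretize has the same running time as histc is not something this sketch claims.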

edited Feb 7, 2019 at 23:12

answered Dec 4, 2013 at 12:37

Small note, depending on how histc is implemented this could be O(n+m*log(n)). Though I'd hope since the first output is not used, this isn't the case. – Alec Jacobson, Dec 6, 2013 at 18:23
There's an even better O(n + m) solution using the Alias Method. I've implemented it in the sample_discrete.m function. – Alec Jacobson, May 8, 2014 at 1:43
The link is broken, fyi. – kauffmanes, Feb 7, 2019 at 2:21
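
Since the link to sample_discrete.m no longer works, here is a rough sketch of how the alias method (in Vose's formulation) could look in MATLAB. It is not the linked function; every variable name below is illustrative, and prob is assumed to sum to 1.

```matlab
prob = [0.5, 0.1, 0.4];          % target probabilities, assumed to sum to 1
m    = 1e5;                      % number of samples to draw
n    = numel(prob);

% --- Setup, O(n): build threshold table q and alias table J ---
q = prob(:) * n;                 % scaled so the average entry is 1
J = (1:n)';                      % alias of each column (defaults to itself)
small = zeros(n, 1); ns = 0;     % stack of columns with q < 1
large = zeros(n, 1); nl = 0;     % stack of columns with q >= 1
for k = 1:n
    if q(k) < 1
        ns = ns + 1; small(ns) = k;
    else
        nl = nl + 1; large(nl) = k;
    end
end
while ns > 0 && nl > 0
    s = small(ns); ns = ns - 1;  % pop an under-full column
    l = large(nl); nl = nl - 1;  % pop an over-full column
    J(s) = l;                    % column s is topped up by column l
    q(l) = q(l) + q(s) - 1;      % l gave away (1 - q(s)) of its mass
    if q(l) < 1
        ns = ns + 1; small(ns) = l;
    else
        nl = nl + 1; large(nl) = l;
    end
end
q(small(1:ns)) = 1;              % leftovers are full up to round-off
q(large(1:nl)) = 1;

% --- Sampling, O(m): pick a column, then keep it or take its alias ---
col  = randi(n, m, 1);
keep = rand(m, 1) < q(col);
x    = col;
x(~keep) = J(col(~keep));

freq = accumarray(x, 1, [n, 1])' / m;   % empirical frequencies
```

For the probabilities in the question the setup ends with q = [1; 0.3; 0.5] and J = [1; 3; 1], so each of the three columns is chosen with probability 1/3 and then either kept or swapped for its alias.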

Serg · Accepted Answer · 2012-12-17 18:54:21Z

>> c = cumsum([0.5, 0.1, 0.4]);
>> r = rand(1e5, 1);
>> x = arrayfun(@(x) find(x <= c, 1, 'first'), r);
>> h = hist(x, 1:3)

h =

       49953       10047       40000

x is distributed as desired.

answered Dec 17, 2012 at 18:54

@EitanT, I don't think that sum() is faster than find(..., 'first'). Also, there is no need to add the zero. Please test it. In the general case I would only add: assert(c(end) == 1); – Serg, Dec 18, 2012 at 20:45
Now that I think about it, my comment is out of place. My apologies. – Eitan T, Dec 18, 2012 at 21:10

mamaj · Accepted Answer · 2016-10-30 14:50:48Z

Using the randsample function from the Statistics and Machine Learning Toolbox, you can generate random numbers with a specified probability mass function (pmf):

pmf = [0.5, 0.1, 0.4];
population = 1:3;
sample_size = 1;

random_number = randsample(population,sample_size,true,pmf);

I think this is the easiest method.

answered Oct 30, 2016 at 14:50

user1860611 · Accepted Answer · 2012-12-17 16:24:13Z

A slightly more general solution would be:

r=rand;
prob=[.5,.1,.4];
prob=cumsum(prob);
value=[1,2,3];    %values corresponding to the probabilities
ind=find(r<=prob,1,'first');
x=value(ind)

answered Dec 17, 2012 at 16:24

Cris Luengo · Accepted Answer · 2020-07-09 02:03:45Z

When the probabilities are nice numbers like this it is possible to do a very simple and performant selection. We repeat population elements such that a uniform selection yields the desired probability distribution. In this case we create a population of 10, with 5 times 1 (0.5 probability of being selected), etc.

p = [1,1,1,1,1,2,3,3,3,3];
x = p(randi(numel(p)));

randi takes a second input argument that determines the size of the output (the default is 1), so it’s simple to generate many values from this distribution.
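
As a sketch of both points, the population can be built from integer counts with repelem (available from R2015a) rather than typed out by hand, and several values can be drawn in one call. The counts vector and the sample size of 8 are illustrative choices.

```matlab
counts = [5, 1, 4];                  % 0.5, 0.1, 0.4 scaled by 10
p = repelem(1:3, counts);            % [1 1 1 1 1 2 3 3 3 3]
x = p(randi(numel(p), 1, 8));        % 1-by-8 row of draws from the population
```

This only works when the probabilities scale to small whole numbers; otherwise the population becomes large or cannot represent the distribution exactly.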

aimor · Accepted Answer · 2021-12-31 02:56:47Z

A vector solution using rand, cumsum, and min.

r = rand(10,1);
p = [0.5 0.1 0.4];
[~, ind] = min(r >= cumsum(p), [], 2)

Randomly sample r from 0..1 using rand. In this case I put my data into a column vector.
Put the probabilities for each output index into p.
r >= cumsum(p) compares every combination of r and the cumulative probabilities of p. In this case the result is a 2D matrix where each row begins with a series of 1s and ends with a series of 0s. The first 0 indicates the element of p that was randomly selected.
min is performed for all rows and returns the column index of the first 0. The third input to min defines the dimension over which to calculate the minimum.

If you want to extend this to n dimensions of r: change the shape of p so that it extends into one more dimension than what r has, and give that dimension as min's third input.

r = rand(3, 5, 7);
p = []; 
p(1,1,1,:) = [0.5 0.1 0.4];
[~, ind] = min(r >= cumsum(p), [], 4)
