How can I make a random choice according to probabilities stored in a list (weighted random distribution)?

Question

Given a list of probabilities like:

P = [0.10, 0.25, 0.60, 0.05]

(I can ensure that the sum of all the variables in P is always 1)

How can I write a function that randomly returns a valid index, according to the values in the list? In other words, for this specific input, I want it to return 0 10% of the time, 1 25% of the time, 2 60% of the time and 3 the remaining 5% of the time.

Actually, starting from Python 3.6 there is random.choices (note the 's' at the end) which allows submitting relative weights. — Nick, Aug 3, 2020 at 3:44
@NickstandswithUkraine would you please add an answer about this? — Karl Knechtel, Jul 5, 2022 at 8:04
See also stackoverflow.com/questions/352670/…. I think this question is probably the better canonical. — Karl Knechtel, Jul 5, 2022 at 8:06
There is also stackoverflow.com/questions/2140787 to consider, as the specific case of repeated sampling without replacement is somewhat trickier. — Karl Knechtel, Jul 5, 2022 at 21:02
Maybe it's better to edit the random.choices information into the top answer, since the interface is substantially the same. — Karl Knechtel, Jul 5, 2022 at 21:08

a.t. · Accepted Answer · 2020-12-30 17:56:34Z

63

You can easily achieve this with numpy. Its np.random.choice function accepts a p parameter giving the probability of each entry.

import numpy as np

# Draw 5 samples (with replacement) according to the probabilities in p
np.random.choice(
  ['pooh', 'rabbit', 'piglet', 'Christopher'],
  5,
  p=[0.5, 0.1, 0.1, 0.3]
)
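The question asks for an index rather than a label. A minimal sketch of that variant, assuming NumPy is installed and using its newer Generator interface (np.random.default_rng); the seed and the sample size are arbitrary choices made only for illustration:

```python
import numpy as np

P = [0.10, 0.25, 0.60, 0.05]

# Seeded only so the illustration is repeatable; drop the seed in real use.
rng = np.random.default_rng(0)

# Passing an int n as the first argument samples from range(n),
# so the result is an index into P.
index = rng.choice(len(P), p=P)

# size= draws many indices at once.
indices = rng.choice(len(P), size=10_000, p=P)
counts = np.bincount(indices, minlength=len(P))
```

With that many draws, counts should come out roughly proportional to P.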

edited Dec 30, 2020 at 17:56

a.t.


answered Nov 12, 2016 at 11:22

Salvador Dali


Concise, although I think numpy is overkill here, especially if the script had no dependencies beyond the standard library
– salezica
May 16, 2020 at 23:29


Justin Peel · Accepted Answer · 2010-12-14 18:21:38Z

Make a cumulative distribution function (CDF) array: the value of the CDF at a given index is the sum of all values in P at indices less than or equal to that index. Then generate a random number between 0 and 1 and do a binary search (or a linear search if you prefer) for it in the CDF. Here's some simple code for it.

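from bisect import bisect
from random import random

P = [0.10, 0.25, 0.60, 0.05]

# Build the running totals: cdf[i] == P[0] + ... + P[i]
cdf = [P[0]]
for i in range(1, len(P)):
    cdf.append(cdf[-1] + P[i])

# Binary search for the first CDF entry greater than the random number
random_ind = bisect(cdf, random())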

Of course, you can generate a bunch of random indices with something like

rs = [bisect(cdf, random()) for i in range(20)]

yielding

[2, 2, 3, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2]

(results will, and should, vary). Binary search is rather unnecessary for so few possible indices, but it is definitely recommended for distributions with more possible indices.
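One caveat with the CDF approach: floating-point rounding can leave the last cumulative value slightly below 1.0, in which case bisect may return an index one past the end of the list. A minimal sketch that guards against this, assuming the probabilities sum to 1; the function names and the r parameter are hypothetical, added here only to make the behavior easy to check:

```python
from bisect import bisect
from itertools import accumulate
from random import random


def build_cdf(probabilities):
    """Return the running totals of the probabilities."""
    return list(accumulate(probabilities))


def weighted_index(cdf, r=None):
    """Return an index chosen according to the cumulative weights in cdf.

    r is a hypothetical testing hook; by default it is drawn from random().
    """
    if r is None:
        r = random()
    # Rounding can leave cdf[-1] slightly below 1.0, so clamp the result
    # to the last valid index.
    return min(bisect(cdf, r), len(cdf) - 1)


P = [0.10, 0.25, 0.60, 0.05]
cdf = build_cdf(P)
samples = [weighted_index(cdf) for _ in range(20)]
```

Building the CDF once and reusing it keeps each draw at O(log n).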

salezica · Accepted Answer · 2010-12-14 08:39:38Z

12

Hmm interesting, how about...

Generate a number between 0 and 1.
Walk the list, subtracting the probability of each item from your number.
Pick the item that, after subtraction, took your number down to 0 or below.

That's simple, O(n) and should work :)
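The steps above can be sketched as a small function. This is an illustration under assumptions: pick_index and its r parameter are hypothetical names, and the comparison is strict (below 0 rather than 0 or below) so that an item with probability 0 is never picked when the random number is exactly 0.

```python
import random


def pick_index(probabilities, r=None):
    """Walk the list, subtracting each probability from r.

    r is a hypothetical testing hook; by default it is drawn from
    random.random(), which returns a float in [0.0, 1.0).
    """
    if r is None:
        r = random.random()
    for index, p in enumerate(probabilities):
        r -= p
        if r < 0:
            return index
    # Only reached if rounding left r non-negative after the last item.
    return len(probabilities) - 1


P = [0.10, 0.25, 0.60, 0.05]
chosen = pick_index(P)
```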

answered Dec 14, 2010 at 8:39

salezica


It will help if the probabilities are pre-sorted in a descending way - the iteration is likely to terminate faster.
– Nick
Aug 4, 2020 at 0:02


animus144 · Accepted Answer · 2014-05-16 20:28:45Z

This problem is equivalent to sampling from a categorical distribution. This distribution is commonly conflated with the multinomial distribution, which models the result of multiple samples from a categorical distribution.

In numpy, it is easy to sample from the multinomial distribution using numpy.random.multinomial, but a specific categorical version of this does not exist. However, it can be accomplished by sampling from the multinomial distribution with a single trial and then returning the non-zero element in the output.

import numpy as np
pvals = [0.10,0.25,0.60,0.05]
ind = np.where(np.random.multinomial(1,pvals))[0][0]

I think using argmax() instead of where()[0][0] is simpler and does the same. — Hawk, Oct 16, 2018 at 19:09

sje397 · Accepted Answer · 2012-09-27 12:57:02Z

3

import random

probs = [0.1, 0.25, 0.6, 0.05]
r = random.random()
index = 0
# Subtract each probability in turn until r drops below zero
while r >= 0 and index < len(probs):
    r -= probs[index]
    index += 1
print(index - 1)

edited Sep 27, 2012 at 12:57

answered Dec 14, 2010 at 8:39

sje397


Haha and here I thought ~2 seconds before you posted that I was being original
– salezica
Dec 14, 2010 at 8:41
This always takes O(n) time where n is the len(probs). Can we do better?
– Sush
Aug 28, 2019 at 20:31
@Sush Yes: we can sort the probs and do a binary search. This would reduce to O(log n).
– Nick
Aug 4, 2020 at 0:04


Nick · Accepted Answer · 2022-07-07 22:22:32Z

2

Starting from Python 3.6, the random module has a choices method (note the 's' at the end).

Quoting from the documentation:

random.choices(population, weights=None, *, cum_weights=None, k=1)
Return a k sized list of elements chosen from the population with replacement.

So the solution would look like this:

>>> from random import choices
>>> choices(['option1', 'option2', 'option3', 'option4'], [0.10, 0.25, 0.60, 0.05])
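Since the question asks for an index, one way to adapt this is to sample from range(len(P)) instead of a list of labels. A minimal sketch, assuming Python 3.6 or later; the variable names and the number of draws are illustrative only:

```python
import random
from collections import Counter

P = [0.10, 0.25, 0.60, 0.05]

# Sampling from range(len(P)) yields indices rather than labels.
# choices() always returns a list, so take element 0 for a single draw.
index = random.choices(range(len(P)), weights=P)[0]

# k= draws several indices at once (with replacement).
draws = random.choices(range(len(P)), weights=P, k=10_000)
frequencies = Counter(draws)
```

The weights are relative, so they do not have to sum to 1.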

answered Jul 7, 2022 at 22:22

Nick
