38

Given a list of probabilities like:

P = [0.10, 0.25, 0.60, 0.05]

(I can ensure that the sum of all the variables in P is always 1)

How can I write a function that randomly returns a valid index, according to the values in the list? In other words, for this specific input, I want it to return 0 10% of the time, 1 25% of the time, 2 60% of the time and 3 the remainind 5% of the time.

5
  • Actually, starting from Python 3.6 there is random.choices (note the 's' at the end) which allows submitting relative weights.
    – Nick
    Aug 3, 2020 at 3:44
  • @NickstandswithUkraine would you please add an answer about this? Jul 5, 2022 at 8:04
  • See also stackoverflow.com/questions/352670/…. I think this question is probably the better canonical. Jul 5, 2022 at 8:06
  • There is also stackoverflow.com/questions/2140787 to consider, as the specific case of repeated sampling without replacement is somewhat trickier. Jul 5, 2022 at 21:02
  • Maybe it's better to edit the random.choices information into the top answer, since the interface is substantially the same. Jul 5, 2022 at 21:08

6 Answers 6

63

You can easily achieve this with numpy. It has a choice function which accepts the parameter of probabilities.

np.random.choice(
  ['pooh', 'rabbit', 'piglet', 'Christopher'], 
  5,
  p=[0.5, 0.1, 0.1, 0.3]
)
1
  • Concise, although I think numpy is overkill here, especially if the script had no dependencies beyond the standard library
    – salezica
    May 16, 2020 at 23:29
15

Basically, make a cumulative probability distribution (CDF) array. Basically, the value of the CDF for a given index is equal to the sum of all values in P equal to or less than that index. Then you generate a random number between 0 and 1 and do a binary search (or linear search if you want). Here's some simple code for it.

from bisect import bisect
from random import random

P = [0.10,0.25,0.60,0.05]

cdf = [P[0]]
for i in xrange(1, len(P)):
    cdf.append(cdf[-1] + P[i])

random_ind = bisect(cdf,random())

of course you can generate a bunch of random indices with something like

rs = [bisect(cdf, random()) for i in xrange(20)]

yielding

[2, 2, 3, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2]

(results will, and should vary). Of course, binary search is rather unnecessary for so few of possible indices, but definitely recommended for distributions with more possible indices.

0
12

Hmm interesting, how about...

  1. Generate a number between 0 and 1.

  2. Walk the list substracting the probability of each item from your number.

  3. Pick the item that, after substraction, took your number down to 0 or below.

That's simple, O(n) and should work :)

1
  • It will help if the probabilities are pre-sorted in a descending way - the iteration is likely to terminate faster.
    – Nick
    Aug 4, 2020 at 0:02
6

This problem is equivalent to sampling from a categorical distribution. This distribution is commonly conflated with the multinomial distribution which models the result of multiple samples from a categorical distribution.

In numpy, it is easy to sample from the multinomial distribution using numpy.random.multinomial, but a specific categorical version of this does not exist. However, it can be accomplished by sampling from the multinomial distribution with a single trial and then returning the non-zero element in the output.

import numpy as np
pvals = [0.10,0.25,0.60,0.05]
ind = np.where(np.random.multinomial(1,pvals))[0][0]
1
  • I think using argmax() instead of where()[0][0] is simpler and does the same.
    – Hawk
    Oct 16, 2018 at 19:09
3
import random

probs = [0.1, 0.25, 0.6, 0.05]
r = random.random()
index = 0
while(r >= 0 and index < len(probs)):
  r -= probs[index]
  index += 1
print index - 1
3
  • Haha and here I thought ~2 seconds before you posted that I was being original
    – salezica
    Dec 14, 2010 at 8:41
  • This always takes O(n) time where n is the len(probs). Can we do better?
    – Sush
    Aug 28, 2019 at 20:31
  • @Sush Yes: we can sort the probs and do a binary search. This would reduce to O(log n).
    – Nick
    Aug 4, 2020 at 0:04
2

Starting from Python 3.6 there is the choices method (note the 's' at the end) in random

Quoting from the documentation:

random.choices(population, weights=None, *, cum_weights=None, k=1) Return a k sized list of elements chosen from the population with replacement

So the solution would look like this:

>> choices(['option1', 'option2', 'option3', 'option4'], [0.10, 0.25, 0.60, 0.05])

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.