How to pick an item by its probability?

Question

I have a list of items. Each of these items has its own probability.

Can anyone suggest an algorithm to pick an item based on its probability?

Brent Worden · Accepted Answer · 2013-05-22 13:18:39Z

94

Generate a uniformly distributed random number.
Iterate through your list until the cumulative probability of the visited elements is greater than the random number

Sample code:

double p = Math.random();
double cumulativeProbability = 0.0;
for (Item item : items) {
    cumulativeProbability += item.probability();
    if (p <= cumulativeProbability) {
        return item;
    }
}

edited May 22, 2013 at 13:18

answered Feb 17, 2012 at 15:12

Brent Worden

10.8k8 gold badges55 silver badges58 bronze badges

1

Here p is any random number so how can we say that most probability item get selected first .. eg:[{A,10},{B,20}] so how can you we say that suppose in first iteration p=2 so 2<=10 is true and first item {A,10} is getting selected first even though second item have more probabilty
– HybrisHelp
Sep 4, 2014 at 11:58
10

@U2Answer, This algorithm requires the probabilities to be normalized (divided by the sum of all probabilities). By doing that you ensure that Σp_i=1 and then a random number between 0 and 1 will do the trick.
– aioobe
Jan 11, 2016 at 10:21
1

The order of the items in the list doesn't matter, because the value of r is a uniformly distributed random number, which means that the probability r be a certain value is equal to all other values r may be. Thus the items in the list are not "favored" and it doesn't matter where they are in the list.
– matthaeus
Feb 5, 2016 at 14:51

Add a comment |

Ilia Choly · Accepted Answer · 2015-08-10 21:56:16Z

37

So with each item store a number that marks its relative probability, for example if you have 3 items one should be twice as likely to be selected as either of the other two then your list will have:

 [{A,1},{B,1},{C,2}]

Then sum the numbers of the list (i.e. 4 in our case). Now generate a random number and choose that index. int index = rand.nextInt(4); return the number such that the index is in the correct range.

Java code:

class Item {
    int relativeProb;
    String name;

    //Getters Setters and Constructor
}

...

class RandomSelector {
    List<Item> items = new List();
    Random rand = new Random();
    int totalSum = 0;

    RandomSelector() {
        for(Item item : items) {
            totalSum = totalSum + item.relativeProb;
        }
    }

    public Item getRandom() {

        int index = rand.nextInt(totalSum);
        int sum = 0;
        int i=0;
        while(sum < index ) {
             sum = sum + items.get(i++).relativeProb;
        }
        return items.get(Math.max(0,i-1));
    }
}

edited Aug 10, 2015 at 21:56

Ilia Choly

18.3k14 gold badges93 silver badges162 bronze badges

answered Feb 17, 2012 at 15:02

Usman Ismail

18.3k14 gold badges83 silver badges165 bronze badges

1

thanks Usman. but I wonder should I take i-th item or (i-1)th item ? I mean items.get(i-1) instead of items.get(i)
– Ruzanna
Feb 17, 2012 at 16:05
2

Here p is any random number so how can we say that most probability item get selected first .. eg:[{A,10},{B,20}] so how can you we say that suppose in first iteration p=2 so 2<=10 is true and first item {A,10} is getting selected first even though second item have more probabilty
– HybrisHelp
Sep 4, 2014 at 12:00
p is a random number from 0 to "Sum of all weights" so in your example there is a 10/30 = 1/3 chance of p being an number from 0-9 and there is a 20/30 = 2/3 change of p being a number from 10-29. Hence the chance of getting B is still twice as likely as getting A
– Usman Ismail
Sep 4, 2014 at 14:40
1

Hey great answer but you have to be carefull! The function nextInt(...) returns a value from 0 (inclusive) to totalSum (exclusive). (See docs.oracle.com/javase/7/docs/api/java/util/…). So if 0 is picked (index = 0), you will get an IndexOutOfBoundsException, because you try to access items.get(-1), since you skip the while loop entirely and i = 0. Therefore you should return items.get(Math.max(0, i-1)).
– TehQuila
Nov 28, 2014 at 14:48
1

Based on some experimentation this only worked for me if I replaced int index = rand.nextInt(totalSum) with int index = rand.nextInt(totalSum) + 1 and then of course took out the redundant Math.max. Without this +1 I was getting twice as many of the "base" (and first) element in the list than expected, i.e. the first element in my list has a relative likelihood 1 by default, and everything else is with respect to that.
– Jxek
Feb 24, 2015 at 15:26

Add a comment |

mj_ · Accepted Answer · 2012-02-17 15:02:29Z

33

pretend that we have the following list

Item A 25%
Item B 15%
Item C 35%
Item D 5%
Item E 20%

Lets pretend that all the probabilities are integers, and assign each item a "range" that calculated as follows.

Start - Sum of probability of all items before
End - Start + own probability

The new numbers are as follows

Item A 0 to 25
Item B 26 to 40
Item C 41 to 75
Item D 76 to 80
Item E 81 to 100

Now pick a random number from 0 to 100. Lets say that you pick 32. 32 falls in Item B's range.

mj

answered Feb 17, 2012 at 15:02

mj_

6,3577 gold badges45 silver badges81 bronze badges

This faster than @Brent's answer for selection, but it would take up way too much memory if say, the ranges were from 0 to a million.
– stan
Jul 16, 2014 at 17:39
Probabilities doesn't have to be percentages. We can just pick a random number between 0 to sum of the numbers.
– WVrock
Nov 15, 2014 at 17:01
should Item A be 1 to 25 instead of 0 to 25?
– Snackdex
Dec 11, 2020 at 21:19
I believe it should be 1 to 25 as well, 0 to 25 would mean that Item A has a 26% probability.
– user6307384
Dec 19, 2020 at 2:04

Add a comment |

Sufian Latif · Accepted Answer · 2022-12-09 00:15:48Z

16

You can try the Roulette Wheel Selection.

First, add all the probabilities, then scale all the probabilities in the scale of 1, by dividing each one by the sum. Suppose the scaled probabilities are A(0.4), B(0.3), C(0.25) and D(0.05). Then you can generate a random floating-point number in the range [0, 1). Now you can decide like this:

random number in [0.00, 0.40) -> pick A
              in [0.40, 0.70) -> pick B
              in [0.70, 0.95) -> pick C
              in [0.95, 1.00) -> pick D

You can also do it with random integers - say you generate a random integer between 0 to 99 (inclusive), then you can make decision like the above.

edited Dec 9, 2022 at 0:15

answered Feb 17, 2012 at 15:05

Sufian Latif

13.2k3 gold badges35 silver badges71 bronze badges

(+1) It bothers me that this algorithm almost always seems to be described in terms of GAs (your link on Wikipedia and see here also). The weighted roulette wheel algorithm has all kinds of uses that have nothing to do with GAs (such as this very question).
– Michael McGowan
Feb 17, 2012 at 15:16
Yeah, that's weird. I also learned it's name while studying GAs, but I used the technique much before that for some other reason.
– Sufian Latif
Feb 17, 2012 at 15:18
What will you pick if the random number is exact 0.40?
– Huy
Mar 4, 2022 at 16:12
@Huy To resolve cases like this, the interval should be closed to the left and open to the right, i.e. if 0.00 <= p < 0.40: pick A, else if 0.40 <= p < 0.70: pick B, ....
– Sufian Latif
Mar 6, 2022 at 1:49

Add a comment |

Md. Abu Nafee Ibna Zahid · Accepted Answer · 2017-11-11 11:41:44Z

13

Algorithm described in Ushman's, Brent's and @kaushaya's answers are implemented in Apache commons-math library.

Take a look at EnumeratedDistribution class (groovy code follows):

def probabilities = [
   new Pair<String, Double>("one", 25),
   new Pair<String, Double>("two", 30),
   new Pair<String, Double>("three", 45)]
def distribution = new EnumeratedDistribution<String>(probabilities)
println distribution.sample() // here you get one of your values

Note that sum of probabilities doesn't need to be equal to 1 or 100 - it will be normalized automatically.

edited Nov 11, 2017 at 11:41

Md. Abu Nafee Ibna Zahid

4521 gold badge6 silver badges19 bronze badges

answered Jul 10, 2014 at 12:35

Dmitry

2,9731 gold badge24 silver badges26 bronze badges

Who is @kaushaya? May be s/he has changed name. So it is better to hyperlink the respective answers.
– Md. Abu Nafee Ibna Zahid
Nov 11, 2017 at 7:40

Add a comment |

Community · Accepted Answer · 2017-05-23 12:26:23Z

5

My method is pretty simple. Generate a random number. Now since the probabilities of your items are known,simply iterate through the sorted list of probability and pick the item whose probability is lesser than the randomly generated number.

For more details,read my answer here.

edited May 23, 2017 at 12:26

CommunityBot

11 silver badge

answered Jun 3, 2014 at 12:28

HackCode

1,8376 gold badges36 silver badges67 bronze badges

Add a comment |

WVrock · Accepted Answer · 2017-06-07 20:58:20Z

A slow but simple way to do it is to have every member to pick a random number based on its probability and pick the one with highest value.

Analogy:

Imagine 1 of 3 people needs to be chosen but they have different probabilities. You give them die with different amount of faces. First person's dice has 4 face, 2nd person's 6, and the third person's 8. They roll their die and the one with the biggest number wins.

Lets say we have the following list:

[{A,50},{B,100},{C,200}]

Pseudocode:

 A.value = random(0 to 50);
 B.value = random(0 to 100);
 C.value = random (0 to 200);

We pick the one with the highest value.

This method above does not exactly map the probabilities. For example 100 will not have twice the chance of 50. But we can do it in a by tweaking the method a bit.

Method 2

Instead of picking a number from 0 to the weight we can limit them from the upper limit of previous variable to addition of the current variable.

[{A,50},{B,100},{C,200}]

Pseudocode:

 A.lowLimit= 0; A.topLimit=50;
 B.lowLimit= A.topLimit+1; B.topLimit= B.lowLimit+100
 C.lowLimit= B.topLimit+1; C.topLimit= C.lowLimit+200

resulting limits

A.limits = 0,50
B.limits = 51,151
C.limits = 152,352

Then we pick a random number from 0 to 352 and compare it to each variable's limits to see whether the random number is in its limits.

I believe this tweak has better performance since there is only 1 random generation.

There is a similar method in other answers but this method does not require the total to be 100 or 1.00.

Your method does not map to probabilities as easily as it first seems. Suppose we want two choices, A and B. A has 2/5 = 0 to 40, B has 3/5 = 0 to 60. If we simulate this with your method, 1/3 of the time B will select between 41 and 60, guaranteeing success. Then the other 2/3 of the time, B will be 0 to 40 and A will be 0 to 40, giving them even odds, so that 1/2 of 2/3 of the time, B will win. That's 1/3 + 1/3 = 2/3, which is different from the expected 3/5 that B wins. — Mark Peschel, May 7, 2017 at 1:32
@MauveRanger It was never intended to exactly replicate the possibilities but the math seems to be misleading. I will edit and add another method of doing it that will hopefully be correct with the math. — WVrock, Jun 7, 2017 at 20:32

SMUsamaShah · Accepted Answer · 2021-05-26 19:21:57Z

5

A space-costly way is to clone each item the number of times its probability. Selection will be done in O(1).

For example

//input
[{A,1},{B,1},{C,3}]

// transform into
[{A,1},{B,1},{C,1},{C,1},{C,1}]

Then simply pick any item randomly from this transformed list.

answered May 26, 2021 at 19:21

SMUsamaShah

7,77524 gold badges92 silver badges131 bronze badges

This should have a higher placement than it does, since it's important to be aware of the existince of an O(1) solution.
– Marcus
Mar 14, 2022 at 22:53

Add a comment |

Md. Abu Nafee Ibna Zahid · Accepted Answer · 2018-04-18 04:28:11Z

2

Brent's answer is good, but it doesn't account for the possibility of erroneously choosing an item with a probability of 0 in cases where p = 0. That's easy enough to handle by checking the probability (or perhaps not adding the item in the first place):

double p = Math.random();
double cumulativeProbability = 0.0;
for (Item item : items) {
    cumulativeProbability += item.probability();
    if (p <= cumulativeProbability && item.probability() != 0) {
        return item;
    }
}

edited Apr 18, 2018 at 4:28

Md. Abu Nafee Ibna Zahid

4521 gold badge6 silver badges19 bronze badges

answered Oct 14, 2013 at 2:18

Dallas Edwards

5527 silver badges11 bronze badges

3

I don't think you need to worry about the case where the item probability is zero. You should either have already exited the loop or you would continue over as the cumulative probability wouldn't have changed.
– rrs
Apr 29, 2014 at 14:23

Add a comment |

Florian Wolf · Accepted Answer · 2020-06-27 12:13:04Z

Adapted the code from https://stackoverflow.com/a/37228927/11257746 into a general extention method. This will allow you to get a weighted random value from a Dictionary with the structure <TKey, int>, where int is a weight value.

A Key that has a value of 50 is 10 times more likely to be chosen than a key with the value of 5.

C# code using LINQ:

/// <summary>
/// Get a random key out of a dictionary which has integer values treated as weights. 
/// A key in the dictionary with a weight of 50 is 10 times more likely to be chosen than an element with the weight of 5.
/// 
/// Example usage to get 1 item: 
/// Dictionary<MyType, int> myTypes;
/// MyType chosenType = myTypes.GetWeightedRandomKey<MyType, int>().First();
/// 
/// Adapted into a general extention method from https://stackoverflow.com/a/37228927/11257746
/// </summary>
public static IEnumerable<TKey> GetWeightedRandomKey<TKey, TValue>(this Dictionary<TKey, int> dictionaryWithWeights)
{
    int totalWeights = 0;
    foreach (KeyValuePair<TKey, int> pair in dictionaryWithWeights)
    { 
        totalWeights += pair.Value;
    }

    System.Random random = new System.Random();
    while (true)
    {
        int randomWeight = random.Next(0, totalWeights);
        foreach (KeyValuePair<TKey, int> pair in dictionaryWithWeights)
        {
            int weight = pair.Value;

            if (randomWeight - weight > 0)
                randomWeight -= weight;
            else
            {
                yield return pair.Key;
                break;
            }
        }
    }
}

Example usage:

public enum MyType { Thing1, Thing2, Thing3 }
public Dictionary<MyType, int> MyWeightedDictionary = new Dictionary<MyType, int>();

public void MyVoid()
{
    MyWeightedDictionary.Add(MyType.Thing1, 50);
    MyWeightedDictionary.Add(MyType.Thing2, 25);
    MyWeightedDictionary.Add(MyType.Thing3, 5);
    
    // Get a single random key
    MyType myChosenType = MyWeightedDictionary.GetWeightedRandomKey<MyType, int>().First();
    
    // Get 20 random keys
    List<MyType> myChosenTypes = MyWeightedDictionary.GetWeightedRandomKey<MyType, int>().Take(20).ToList();
}

Andrei Ciobanu · Accepted Answer · 2018-09-12 12:35:23Z

0

If you don't mind adding a third party dependency in your code you can use the MockNeat.probabilities() method.

For example:

String s = mockNeat.probabilites(String.class)
                .add(0.1, "A") // 10% chance to pick A
                .add(0.2, "B") // 20% chance to pick B
                .add(0.5, "C") // 50% chance to pick C
                .add(0.2, "D") // 20% chance to pick D
                .val();

Disclaimer: I am the author of the library, so I might be biased when I am recommending it.

edited Sep 12, 2018 at 12:35

answered Sep 12, 2018 at 12:26

Andrei Ciobanu

12.7k24 gold badges86 silver badges119 bronze badges

Add a comment |

Marc Dzaebel · Accepted Answer · 2020-04-09 22:01:49Z

0

All mentioned solutions have linear effort. The following has only logarithmic effort and deals also with unnormalized probabilities. I'd reccommend to use a TreeMap rather than a List:

    import java.util.*;
    import java.util.stream.IntStream;

    public class ProbabilityMap<T> extends TreeMap<Double,T>{
        private static final long serialVersionUID = 1L;
        public static Random random = new Random();
        public double sumOfProbabilities;
        public Map.Entry<Double,T> next() {
            return ceilingEntry(random.nextDouble()*sumOfProbabilities);
        }
        @Override public T put(Double key, T value) {
            return super.put(sumOfProbabilities+=key, value);
        }
        public static void main(String[] args) {
            ProbabilityMap<Integer> map = new ProbabilityMap<>();
            map.put(0.1,1);  map.put(0.3,3);    map.put(0.2,2);
            IntStream.range(0, 10).forEach(i->System.out.println(map.next()));
        }
    }

edited Apr 9, 2020 at 22:01

answered Apr 9, 2020 at 21:39

Marc Dzaebel

4254 silver badges13 bronze badges

That's a horrible case of inheritance for implementation, but a totally valid point. If you're going to do a lot of look ups and have a lot of entries then TreeMap looks like a fun approach. (You'd probably have to profile it to decide if it actually offered any runtime benefit - I'm not arguing the O log(n) vs O(n) win here but there's a mental overhead in figuring out how TreeMap works and it might not be worth it.) I like it and hate it simultaneously :)
– Phil Brock
May 12, 2020 at 18:30
Yes, TreeMap has drawbacks e.g. for iteration and construction. However, if you have a lot of entries, picking should be clearly faster than linear approaches.
– Marc Dzaebel
May 14, 2020 at 22:21
1

I roughly tested the performance of TreeMap picking. It's about 5 times faster as the accepted solution if I pick in the middle of the items. For 100 items TreeMap is 5 times faster, for 10000 items TreeMap is 17 times faster.
– Marc Dzaebel
May 16, 2020 at 16:41

Add a comment |

Jihui Han · Accepted Answer · 2015-08-12 00:05:02Z

-3

You could use the Julia code:

function selrnd(a::Vector{Int})
    c = a[:]
    sumc = c[1]
    for i=2:length(c)
        sumc += c[i]
        c[i] += c[i-1]
    end
    r = rand()*sumc
    for i=1:length(c)
        if r <= c[i]
            return i
        end
    end
end

This function returns the index of an item efficiently.

edited Aug 12, 2015 at 0:05

answered Aug 6, 2015 at 13:02

Jihui Han

11 bronze badge

Add a comment |

Collectives™ on Stack Overflow

How to pick an item by its probability?

13 Answers 13

Your Answer

Not the answer you're looking for? Browse other questions tagged
java
list
random
probability
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

13 Answers 13

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged javalistrandomprobability or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
java
list
random
probability
or ask your own question.