Questions tagged [probability]
Consider if your question would be better at stats.stackexchange.com. Probability touches upon uncertainty, random phenomena, random numbers, random variables, probability distributions, sampling, combinatorics.
probability
4,065
questions
25
votes
4
answers
13k
views
Is there a Python equivalent to R's sample() function?
I want to know if Python has an equivalent to the sample() function in R.
The sample() function takes a sample of the specified size from the elements of x using either with or without replacement.
...
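As a rough sketch of the two sampling modes the excerpt mentions, Python's standard library offers `random.sample` for draws without replacement and `random.choices` for draws with replacement. The variable names and the seed below are illustrative assumptions, not part of the question.

```python
import random

rng = random.Random(0)  # seeded only so the sketch is reproducible
x = list(range(1, 11))

# Without replacement: each element of x can appear at most once.
without_replacement = rng.sample(x, k=5)

# With replacement: the same element may be drawn more than once.
with_replacement = rng.choices(x, k=5)
```

`random.sample` raises `ValueError` when `k` exceeds the population size, which matches the idea that a draw without replacement cannot be larger than the population.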
24
votes
3
answers
28k
views
How to properly hash the custom struct?
In the C++ language there is the default hash-function template std::hash<T> for the simplest types, like std::string, int, etc. I suppose that these functions have good entropy and the ...
24
votes
7
answers
12k
views
Computing similarity between two lists
EDIT:
As everyone is getting confused, I want to simplify my question. I have two ordered lists. Now, I just want to compute how similar one list is to the other.
E.g.,
1,7,4,5,8,9
1,7,5,4,9,6
What ...
23
votes
5
answers
47k
views
Algorithm to generate Poisson and binomial random numbers?
I've been looking around, but I'm not sure how to do it.
I've found this page which, in the last paragraph, says:
A simple generator for random numbers taken from a Poisson distribution is obtained ...
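The page quoted above is cut off, so the following is not necessarily the method it describes. It is a minimal sketch of two well-known textbook generators: Knuth's multiplication method for Poisson variates (practical only for small rates) and a sum of Bernoulli trials for binomial variates. The function names are hypothetical.

```python
import math
import random

def poisson_knuth(lam, rng=random):
    """Draw one Poisson(lam) variate by multiplying uniforms until the
    running product drops to exp(-lam) or below. Cost grows with lam."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def binomial_draw(n, p, rng=random):
    """Draw one Binomial(n, p) variate as the number of successes
    in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)
```

Both run in time proportional to the expected value or to `n`, so for large parameters a library routine is usually the better choice.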
23
votes
2
answers
21k
views
Fitting distributions, goodness of fit, p-value. Is it possible to do this with Scipy (Python)?
INTRODUCTION: I'm a bioinformatician. In my analysis which I perform on all human genes (about 20 000) I search for a particular short sequence motif to check how many times this motif occurs in each ...
23
votes
4
answers
24k
views
Probability Random Number Generator
Let's say I'm writing a simple luck game - each player presses Enter and the game assigns them a random number between 1 and 6, just like a die. At the end of the game, the player with the highest number ...
23
votes
1
answer
3k
views
sampling multinomial from small log probability vectors in numpy/scipy
Is there a function in numpy/scipy that lets you sample multinomial from a vector of small log probabilities, without losing precision? example:
# sample element randomly from these log probabilities
...
22
votes
10
answers
4k
views
Representing continuous probability distributions
I have a problem involving a collection of continuous probability distribution functions, most of which are determined empirically (e.g. departure times, transit times). What I need is some way of ...
22
votes
2
answers
7k
views
PyMC3 Bayesian Linear Regression prediction with sklearn.datasets
I've been trying to implement Bayesian Linear Regression models using PyMC3 with REAL DATA (i.e. not from linear function + gaussian noise) from the datasets in sklearn.datasets. I chose the ...
22
votes
1
answer
15k
views
How to simulate bimodal distribution?
I have the following code to generate a bimodal distribution, but when I graph the histogram, I don't see the two modes. I am wondering if there's something wrong with my code.
mu1 <- log(1)
mu2 <- ...
21
votes
3
answers
30k
views
How to compute the probability of a value given a list of samples from a distribution in Python?
Not sure if this belongs in statistics, but I am trying to use Python to achieve this. I essentially just have a list of integers:
data = [300,244,543,1011,300,125,300 ... ]
And I would like to know ...
21
votes
12
answers
21k
views
Probability distribution in Python
I have a bunch of keys that each have an unlikeliness variable. I want to randomly choose one of these keys, yet I want it to be more unlikely for unlikely (key, values) to be chosen than a less ...
21
votes
5
answers
13k
views
permutation & combinations interview
This is a good one because it's so counter-intuitive:
Imagine an urn filled with balls, two-thirds of which are of one color and one-third of which are of another. One individual has drawn 5 balls ...
21
votes
9
answers
28k
views
Optimal Algorithm for Winning Hangman
In the game Hangman, is it the case that a greedy letter-frequency algorithm is equivalent to a best-chance-of-winning algorithm?
Is there ever a case where it's worth sacrificing preservation of ...
20
votes
7
answers
12k
views
Unbiased random number generator using a biased one
You have a biased random number generator that produces a 1 with a probability p and 0 with a probability (1-p). You do not know the value of p. Using this make an unbiased random number generator ...
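One classic approach to this puzzle is von Neumann's trick: draw two bits, keep the first if they differ, and retry if they match. Since P(1,0) = P(0,1) = p(1-p), the result is fair without knowing p. The sketch below assumes independent draws; `make_biased_bit` is a hypothetical stand-in for the biased generator.

```python
import random

def make_biased_bit(p, rng):
    """Hypothetical biased source: returns 1 with probability p, else 0."""
    return lambda: 1 if rng.random() < p else 0

def unbiased_bit(biased_bit):
    """Von Neumann extractor: draw pairs until the two bits differ,
    then return the first bit of that pair."""
    while True:
        a, b = biased_bit(), biased_bit()
        if a != b:
            return a
```

The expected number of pairs consumed per output bit is 1 / (2p(1-p)), so the method slows down as p approaches 0 or 1.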
20
votes
7
answers
16k
views
What is the probability of guessing (matching) a Guid?
Just curious but what is the probability of matching a Guid?
Say a Guid from SQL server: 5AC7E650-CFC3-4534-803C-E7E5BBE29B3D
is it a factorial?: (36*32)! = (1152)!
discuss
=D
19
votes
6
answers
12k
views
Generating N numbers that sum to 1
Given an array of size n I want to generate random probabilities for each index such that Sigma(a[0]..a[n-1])=1
One possible result might be:
0 1 2 3 4
0.15 0.2 0.18 0.22 0.25
...
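One way to get such a vector, sketched here under the assumption that "random" means uniform over all non-negative vectors summing to 1, is to draw independent exponential variates and divide by their total. The function name is hypothetical.

```python
import random

def random_probabilities(n, rng=random):
    """Return n non-negative floats that sum to 1.
    Normalised Exp(1) draws are uniformly distributed on the simplex."""
    weights = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]
```

Normalising plain uniform draws also sums to 1, but it does not give a uniform distribution over the possible vectors, which is why the sketch uses exponentials.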
19
votes
2
answers
25k
views
How does the predict_proba() function in LightGBM work internally?
This is in reference to understanding, internally, how the probabilities for a class are predicted using LightGBM.
Other packages, like sklearn, provide thorough detail for their classifiers. For ...
19
votes
4
answers
14k
views
Calculate the number of ways to roll a certain number
I'm a high school Computer Science student, and today I was given a problem to:
Program Description: There is a belief among dice players that in
throwing three dice a ten is easier to get than ...
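For a handful of dice, the number of ways to reach each total can simply be enumerated. The sketch below counts ordered outcomes, which is what matters for probability; the function name and defaults are illustrative.

```python
from collections import Counter
from itertools import product

def ways_to_roll(num_dice=3, sides=6):
    """Map each possible total to the number of ordered rolls producing it."""
    faces = range(1, sides + 1)
    return Counter(sum(roll) for roll in product(faces, repeat=num_dice))

ways = ways_to_roll()
total_outcomes = sum(ways.values())  # 6 ** 3 ordered outcomes for three dice
```

Dividing `ways[t]` by `total_outcomes` gives the probability of rolling total `t`.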
18
votes
6
answers
19k
views
What is the probability of collision with a 6 digit random alphanumeric code?
I'm using the following Perl code to generate random alphanumeric strings (uppercase letters and numbers, only) to use as unique identifiers for records in my MySQL database. The database is likely to ...
18
votes
10
answers
30k
views
How can I efficiently calculate the binomial cumulative distribution function?
Let's say that I know the probability of a "success" is P. I run the test N times, and I see S successes. The test is akin to tossing an unevenly weighted coin (perhaps heads is a success, tails is ...
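As a baseline, the binomial CDF can be written as a direct sum of the probability mass function. This is a minimal sketch using exact binomial coefficients from `math.comb` (Python 3.8+); it is not tuned for very large N, where a library routine such as `scipy.stats.binom.cdf` is likely the better option.

```python
from math import comb

def binomial_cdf(s, n, p):
    """P(X <= s) for X ~ Binomial(n, p), by summing the pmf term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s + 1))
```

For example, `binomial_cdf(1, 2, 0.5)` is the chance of at most one head in two fair tosses.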
18
votes
6
answers
8k
views
Nth Combination
Is there a direct way of getting the Nth combination of an ordered set of all combinations of nCr?
Example: I have four elements: [6, 4, 2, 1]. All the possible combinations by taking three at a time ...
18
votes
2
answers
2k
views
Effective Java Item 47: Know and use your libraries - Flawed random integer method example
In the example Josh gives of the flawed random method that generates a positive random number with a given upper bound n, I don't understand two of the flaws he states.
The method from the book ...
18
votes
3
answers
4k
views
Probability of Outcomes Algorithm
I have a probability problem, which I need to simulate in a reasonable amount of time. In simplified form, I have 30 unfair coins each with a different known probability. I then want to ask things ...
18
votes
2
answers
10k
views
Sigmoid output - can it be interpreted as probability?
Sigmoid function outputs a number between 0 and 1. Is this a probability or is it merely a 'yes or no' depending on whether it's above or below 0.5?
Minimal example:
Cats vs dogs binary ...
18
votes
3
answers
776
views
How was there no collision among 50,000 random 7-digit hex strings? (The Birthday Problem)
I've encountered some code that generates a number of UUIDs via UUID.randomUUID(), takes the last 7 digits of each (recent versions of UUID are uniformly distributed in terms of entropy), and uses ...
18
votes
3
answers
3k
views
Combining individual probabilities in Naive Bayesian spam filtering
I'm currently trying to build a spam filter by analyzing a corpus I've amassed.
I'm using the wikipedia entry http://en.wikipedia.org/wiki/Bayesian_spam_filtering to develop my classification ...
17
votes
8
answers
12k
views
Creating your own Tinyurl style uid
I'm writing a small article on human-readable alternatives to Guids/UIDs, for example those used on TinyURL for the url hashes (which are often printed in magazines, so need to be short).
The ...
17
votes
5
answers
11k
views
Generate random numbers distributed by Zipf
The Zipf probability distribution is often used to model file size distributions or item access distributions in P2P systems, e.g. "Web Caching and Zipf-like Distributions: Evidence and ...
17
votes
6
answers
22k
views
How to shorten UUID V4 without making it non-unique/guessable
I have to generate a unique URL part which will be "unguessable" and "resistant" to brute-force attacks. It also has to be as short as possible :) and all generated values have to be of the same length. I was ...
16
votes
4
answers
24k
views
Probability of 64bit Hash Code Collisions
The book Numerical Recipes offers a method to calculate 64bit hash codes in order to reduce the number of collisions.
The algorithm is shown at http://www.javamex.com/tutorials/collections/...
16
votes
3
answers
7k
views
Probability and Neural Networks
Is it a good practice to use sigmoid or tanh output layers in Neural networks directly to estimate probabilities?
i.e. the probability of a given input occurring is the output of the sigmoid function in the ...
16
votes
3
answers
25k
views
Percentage Based Probability
I have this code snippet:
Random rand = new Random();
int chance = rand.Next(1, 101);
if (chance <= 25) // probability of 25%
{
Console.WriteLine("You win");
}
else
{
Console.WriteLine("...
16
votes
4
answers
2k
views
Seeking suggestions for data representation of a probability distribution
I'm looking for an elegant and efficient way to represent and store an arbitrary probability distribution constructed by explicit sampling.
The distribution is expected to have the following ...
15
votes
15
answers
13k
views
Is this a good or bad 'simulation' for Monty Hall? How come? [closed]
Through trying to explain the Monty Hall problem to a friend during class yesterday, we ended up coding it in Python to prove that if you always swap, you will win 2/3 times. We came up with this:
...
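The asker's code is not shown in this excerpt, so the following is an independent minimal sketch of a Monty Hall simulation rather than a reconstruction of theirs. It assumes the standard rules: the host always opens a goat door that the player did not pick. Function names and the default seed are hypothetical.

```python
import random

def monty_hall_trial(switch, rng):
    """Play one round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won over the given number of trials."""
    rng = random.Random(seed)
    return sum(monty_hall_trial(switch, rng) for _ in range(trials)) / trials
```

With enough trials, `win_rate(True)` should settle near 2/3 and `win_rate(False)` near 1/3.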
15
votes
2
answers
9k
views
Chance of a duplicate hash when using first 8 characters of SHA1
If I have an index of URLs, and ID them by the first 8 characters of a SHA1 hash, what is the probability of two different URLs having identical IDs?
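Eight hexadecimal characters carry 32 bits, so this is a birthday problem over 2^32 possible IDs. The sketch below uses the standard approximation 1 - exp(-n(n-1)/(2N)) and assumes the truncated SHA1 output behaves like a uniform random value; the function name is hypothetical.

```python
import math

def collision_probability(num_items, num_bits):
    """Approximate chance that at least two of num_items uniformly random
    num_bits-bit values coincide (birthday approximation)."""
    space = 2 ** num_bits
    exponent = num_items * (num_items - 1) / (2 * space)
    return -math.expm1(-exponent)  # equals 1 - exp(-exponent), stable when small
```

Under these assumptions the probability reaches roughly one half at about 77,000 URLs, far fewer than the 2^32 available IDs.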
15
votes
3
answers
8k
views
Fast weighted random selection from very large set of values
I'm currently working on a problem that requires the random selection of an element from a set. Each of the elements has a weight(selection probability) associated with it.
My problem is that for ...
15
votes
5
answers
4k
views
Estimating/forecasting download completion time
We've all poked fun at the 'X minutes remaining' dialog which seems to be too simplistic, but how can we improve it?
Effectively, the input is the set of download speeds up to the current time, and ...
15
votes
4
answers
12k
views
Divide each cell of large matrix by sum of its row
I have a site by species matrix. The dimensions are 375 x 360. Each value represents the frequency of a species in samples of that site.
I am trying to convert this matrix from frequencies to ...
14
votes
3
answers
11k
views
how to show that NDCG score is significant
Suppose the NDCG score for my retrieval system is .8. How do I interpret this score? How do I tell the reader that this score is significant?
14
votes
2
answers
19k
views
Implementation of sequential monte carlo method (particle filters)
I'm interested in the simple algorithm for particle filters given here: http://www.aiqus.com/upfiles/PFAlgo.png It seems very simple but I have no idea how to do it practically.
Any idea on how to ...
14
votes
5
answers
3k
views
Choose random array element satisfying certain property
Suppose I have a list, called elements, each of which does or does not satisfy some boolean property p. I want to choose one of the elements that satisfies p at random with uniform distribution. I ...
14
votes
3
answers
15k
views
How to generate random numbers with predefined probability distribution?
I would like to implement a function in Python (using NumPy) that takes a mathematical function (e.g. p(x) = e^(-x), like below) as input and generates random numbers that are distributed according ...
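For the example density p(x) = e^(-x) on x >= 0, the CDF is F(x) = 1 - e^(-x), which inverts to x = -ln(1 - u). A minimal sketch of inverse transform sampling for that one case follows, using only the standard library; it does not cover the general case of an arbitrary input function, where the CDF may have to be inverted numerically.

```python
import math
import random

def sample_exponential(rng=random):
    """Draw from the density p(x) = exp(-x), x >= 0, by inverting its CDF."""
    u = rng.random()            # uniform on [0, 1)
    return -math.log(1.0 - u)   # 1 - u lies in (0, 1], so the log is finite
```

The same recipe applies to any density whose CDF has a closed-form inverse.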
14
votes
6
answers
3k
views
Implementation of a simple algorithm (to calculate probability)
I've been asked (as part of homework) to design a Java program that does the following:
Basically there are 3 cards:
Black coloured on both sides
Red coloured on both sides
Black on one side, red on ...
14
votes
2
answers
3k
views
Plot probability heatmap/hexbin with different sized bins
This is related to another question: Plot weighted frequency matrix.
I have this graphic (produced by the code below in R):
#Set the number of bets and number of trials and % lines
numbet <- 36
...
13
votes
6
answers
17k
views
How to choose keys from a python dictionary based on weighted probability? [duplicate]
I have a Python dictionary where keys represent some item and values represent some (normalized) weighting for said item. For example:
d = {'a': 0.0625, 'c': 0.625, 'b': 0.3125}
# Note that sum([v ...
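A minimal sketch of weighted key selection using `random.choices` from the standard library (Python 3.6+), with the dictionary from the excerpt. The helper name is hypothetical.

```python
import random

d = {'a': 0.0625, 'c': 0.625, 'b': 0.3125}

def weighted_key(weights, rng=random):
    """Pick one key with probability proportional to its value."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]
```

`random.choices` accepts relative weights, so the values do not strictly need to be normalized first.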
13
votes
4
answers
18k
views
Chance of winning PHP percent calculation
I have a "battle" system, the attacker has a battle strength of e.g. 100, the defender has a strength of e.g. 75.
But I'm stuck now, I can't figure out how to find the winner.
I know the attacker has ...
13
votes
3
answers
22k
views
C# probability and random numbers
I want to trigger an event with a probability of 25% based on a random number generated between 1 and 100 using:
int rand = random.Next(1,100);
Will the following achieve this?
if (rand<=25)
{
...
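In .NET, `Random.Next(minValue, maxValue)` treats the upper bound as exclusive, so `Next(1,100)` yields 1 through 99. The sketch below, written in Python for illustration, counts outcomes over a half-open range to show how that affects the chance of `rand <= 25`. The helper name is hypothetical.

```python
from fractions import Fraction

def chance_at_most(threshold, low, high_exclusive):
    """Exact probability that a uniform integer in [low, high_exclusive)
    is less than or equal to threshold."""
    outcomes = range(low, high_exclusive)
    hits = sum(1 for value in outcomes if value <= threshold)
    return Fraction(hits, len(outcomes))

half_open_1_100 = chance_at_most(25, 1, 100)  # values 1..99
half_open_1_101 = chance_at_most(25, 1, 101)  # values 1..100
```

The first case gives 25/99, slightly above 25%, while an exclusive upper bound of 101 gives exactly 1/4.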
13
votes
6
answers
6k
views
Probability based on quicksort partition
I have come across this question:
Let 0<α<.5 be some constant (independent of the input array length n). Recall the Partition subroutine employed by the QuickSort algorithm, as explained in ...
13
votes
2
answers
5k
views
Get true or false with a given probability
I'm trying to write a function in C++ that will return true or false based on a given probability. So, for example, if the probability given was 0.634, then 63.4% of the time the function would return ...
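The usual idea is to compare a uniform draw on [0, 1) against the target probability. Here is a minimal sketch, written in Python rather than C++ and with a hypothetical function name; the same comparison carries over to any language with a uniform generator.

```python
import random

def true_with_probability(p, rng=random):
    """Return True with probability p, where 0 <= p <= 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so p = 0 never fires and p = 1 always does.
    return rng.random() < p
```

Using a strict `<` against a half-open interval keeps the edge cases p = 0 and p = 1 exact.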