How to calculate probability in a normal distribution given mean & standard deviation?

Question

How to calculate probability in normal distribution given mean, std in Python? I can always explicitly code my own function according to the definition like the OP in this question did: Calculating Probability of a Random Variable in a Distribution in Python

Just wondering if there is a library function call will allow you to do this. In my imagine it would like this:

nd = NormalDistribution(mu=100, std=12)
p = nd.prob(98)

There is a similar question in Perl: How can I compute the probability at a point given a normal distribution in Perl?. But I didn't see one in Python.

Numpy has a random.normal function, but it's like sampling, not exactly what I want.

DSM · Accepted Answer · 2012-09-13 19:03:15Z

164

There's one in scipy.stats:

>>> import scipy.stats
>>> scipy.stats.norm(0, 1)
<scipy.stats.distributions.rv_frozen object at 0x928352c>
>>> scipy.stats.norm(0, 1).pdf(0)
0.3989422804014327
>>> scipy.stats.norm(0, 1).cdf(0)
0.5
>>> scipy.stats.norm(100, 12)
<scipy.stats.distributions.rv_frozen object at 0x928352c>
>>> scipy.stats.norm(100, 12).pdf(98)
0.032786643008494994
>>> scipy.stats.norm(100, 12).cdf(98)
0.43381616738909634
>>> scipy.stats.norm(100, 12).cdf(100)
0.5

[One thing to beware of -- just a tip -- is that the parameter passing is a little broad. Because of the way the code is set up, if you accidentally write scipy.stats.norm(mean=100, std=12) instead of scipy.stats.norm(100, 12) or scipy.stats.norm(loc=100, scale=12), then it'll accept it, but silently discard those extra keyword arguments and give you the default (0,1).]

answered Sep 13, 2012 at 19:03

DSM

348k66 gold badges597 silver badges497 bronze badges

4

How would you get probabilities from ranges? Say from 98 - 102?
– Leon
Aug 15, 2014 at 23:13
3

@DSM: In your above example, when you say scipy.stats.norm(100, 12).pdf(98), does that mean the probability of getting 98 in a distribution with mean 100 and stddev 12 is 0.032 ?
– Srivatsan
May 12, 2015 at 12:15
20

@ThePredator: no, the probability of getting 98 in a normal distribution with mean 100 and stddev 12 is zero. :-) The probability density is 0.032.
– DSM
May 14, 2015 at 21:20
Probability density in that case means the y-value, given the x-value 1.42 for the normal distribution. cdf means what we refer to as the area under the curve.
– shredding
May 9, 2017 at 15:20
9

@Leon, that's rv.cdf(102) - rv.cdf(98) where rv = scipy.stats.norm(100, 12).
– fuglede
Nov 24, 2019 at 15:22

| Show 1 more comment

Josh Bleecher Snyder · Accepted Answer · 2019-01-15 20:51:06Z

61

Scipy.stats is a great module. Just to offer another approach, you can calculate it directly using

import math
def normpdf(x, mean, sd):
    var = float(sd)**2
    denom = (2*math.pi*var)**.5
    num = math.exp(-(float(x)-float(mean))**2/(2*var))
    return num/denom

This uses the formula found here: http://en.wikipedia.org/wiki/Normal_distribution#Probability_density_function

to test:

>>> normpdf(7,5,5)  
0.07365402806066466
>>> norm(5,5).pdf(7)
0.073654028060664664

edited Jan 15, 2019 at 20:51

Josh Bleecher Snyder

8,3323 gold badges36 silver badges38 bronze badges

answered Sep 13, 2012 at 19:32

jiminy_crist

2,4252 gold badges18 silver badges23 bronze badges

Hey, this is a really nice answer. Would you mind providing a step-by step explanation, perhaps?
– Llamageddon
Nov 5, 2016 at 9:14
This method needs less computation time than scipy
– mkm
May 25, 2018 at 11:02
But scipy can handle arrays of means, stdevs and samples: mean = [ 5, 10, 20] stddev = [20, 30, 40] for x in ( [ 5, 10, 20], [10, 20, 40], [15, 30, 50], ): prob = scipy.stats.norm(mean, stddev).cdf(x) print(f'prob = {prob}') outputs: prob = [0.5 0.5 0.5] prob = [0.59870633 0.63055866 0.69146246] prob = [0.69146246 0.74750746 0.77337265]
– John Deighan
Apr 25, 2020 at 15:14

Add a comment |

user · Accepted Answer · 2018-07-11 12:44:55Z

Here is more info. First you are dealing with a frozen distribution (frozen in this case means its parameters are set to specific values). To create a frozen distribution:

import scipy.stats
scipy.stats.norm(loc=100, scale=12)
#where loc is the mean and scale is the std dev
#if you wish to pull out a random number from your distribution
scipy.stats.norm.rvs(loc=100, scale=12)

#To find the probability that the variable has a value LESS than or equal
#let's say 113, you'd use CDF cumulative Density Function
scipy.stats.norm.cdf(113,100,12)
Output: 0.86066975255037792
#or 86.07% probability

#To find the probability that the variable has a value GREATER than or
#equal to let's say 125, you'd use SF Survival Function 
scipy.stats.norm.sf(125,100,12)
Output: 0.018610425189886332
#or 1.86%

#To find the variate for which the probability is given, let's say the 
#value which needed to provide a 98% probability, you'd use the 
#PPF Percent Point Function
scipy.stats.norm.ppf(.98,100,12)
Output: 124.64498692758187

I can't thank enough whoever wrote this answer. I was looking everywhere to solve this but couldn't able to find it. And adding the comments with the code really helped me understand what is happening. Thanks a lot. — bhola prasad, Sep 24, 2020 at 13:13
Just want to ask one question, how to calculate these probabilities when the data is not normally distributed? What I have to do in this case? — bhola prasad, Sep 24, 2020 at 13:57

Xavier Guihot · Accepted Answer · 2019-02-28 19:47:32Z

31

Starting Python 3.8, the standard library provides the NormalDist object as part of the statistics module.

It can be used to get the probability density function (pdf - likelihood that a random sample X will be near the given value x) for a given mean (mu) and standard deviation (sigma):

from statistics import NormalDist

NormalDist(mu=100, sigma=12).pdf(98)
# 0.032786643008494994

Also note that the NormalDist object also provides the cumulative distribution function (cdf - probability that a random sample X will be less than or equal to x):

NormalDist(mu=100, sigma=12).cdf(98)
# 0.43381616738909634

answered Feb 28, 2019 at 19:47

Xavier Guihot

58.3k24 gold badges304 silver badges195 bronze badges

1

This is the best answer because it uses the native library. Not everyone wants to use scipy
– Alec
Apr 12, 2023 at 20:34

Add a comment |

Prashanth · Accepted Answer · 2019-06-19 04:36:14Z

12

In case you would like to find the area between 2 values of x mean = 1; standard deviation = 2; the probability of x between [0.5,2]

import scipy.stats
scipy.stats.norm(1, 2).cdf(2) - scipy.stats.norm(1,2).cdf(0.5)

answered Jun 19, 2019 at 4:36

Prashanth

1211 silver badge2 bronze badges

Thanks - this formula is very hard to find online, but very useful.
– Pendragon
Dec 8, 2022 at 10:49

Add a comment |

tsveti_iko · Accepted Answer · 2021-01-08 13:58:03Z

Note that probability is different than probability density pdf(), which some of the previous answers refer to. Probability is the chance that the variable has a specific value, whereas the probability density is the chance that the variable will be near a specific value, meaning probability over a range. So to obtain the probability you need to compute the integral of the probability density function over a given interval. As an approximation, you can simply multiply the probability density by the interval you're interested in and that will give you the actual probability.

import numpy as np
from scipy.stats import norm

data_start = -10
data_end = 10
data_points = 21
data = np.linspace(data_start, data_end, data_points)

point_of_interest = 5
mu = np.mean(data)
sigma = np.std(data)                                   
interval = (data_end - data_start) / (data_points - 1)
probability = norm.pdf(point_of_interest, loc=mu, scale=sigma) * interval

The code above will give you the probability that the variable will have an exact value of 5 in a normal distribution between -10 and 10 with 21 data points (meaning interval is 1). You can play around with a fixed interval value, depending on the results you want to achieve.

I think the questioner is referring to "likelihood" instead of real "probability". — Z.LI, Oct 24, 2021 at 11:27

user2724943 · Accepted Answer · 2015-05-25 15:44:03Z

4

The formula cited from wikipedia mentioned in the answers cannot be used to calculate normal probabilites. You would have to write a numerical integration approximation function using that formula in order to calculate the probability.

That formula computes the value for the probability density function. Since the normal distribution is continuous, you have to compute an integral to get probabilities. The wikipedia site mentions the CDF, which does not have a closed form for the normal distribution.

answered May 25, 2015 at 15:44

user2724943

774 bronze badges

3

Thank you for your contribution, although it would fit better as a comment to the answer you are referring at: if I understand well, you aren't really answering to the original question. This way, everyone will see at a first glance what you are talking about.
– Pierre Prinetti
May 25, 2015 at 16:11

Add a comment |

Z.LI · Accepted Answer · 2021-10-24 11:25:23Z

I would like to say: the questioner is asking "How to calculate the likelihood of a given data point in a normal distribution given mean & standard deviation?" instead of "How to calculate probability in a normal distribution given mean & standard deviation?".

For "probability", it must be between 0 and 1, but for "likelihood", it must be non-negative (not necessarily between 0 and 1).

You could use multivariate_normal.pdf(x, mean= mean_vec, cov=cov_matrix) in scipy.stats.multivariate_normal to calculate it.

K.Mulier · Accepted Answer · 2019-12-03 05:30:54Z

I wrote this program to do the math for you. Just enter in the summary statistics. No need to provide an array:

One-Sample Z-Test for a Population Proportion:

To do this for mean rather than proportion, change the formula for z accordingly

EDIT:
Here is the content from the link:

import scipy.stats as stats
import math

def one_sample_ztest_pop_proportion(tail, p, pbar, n, alpha):
    #Calculate test stat

    sigma = math.sqrt((p*(1-p))/(n))
    z = round((pbar - p) / sigma, 2)

    if tail == 'lower':
        pval = round(stats.norm(p, sigma).cdf(pbar),4)
        print("Results for a lower tailed z-test: ")


    elif tail == 'upper':
        pval = round(1 - stats.norm(p, sigma).cdf(pbar),4)
        print("Results for an upper tailed z-test: ")


    elif tail == 'two':
        pval = round(stats.norm(p, sigma).cdf(pbar)*2,4)
        print("Results for a two tailed z-test: ")


    #Print test results
    print("Test statistic = {}".format(z))   
    print("P-value = {}".format(pval))
    print("Confidence = {}".format(alpha))

    #Compare p-value to confidence level
    if pval <= alpha:
        print("{} <=  {}. Reject the null hypothesis.".format(pval, alpha))
    else:
        print("{} > {}. Do not reject the null hypothesis.".format(pval, alpha))


#one_sample_ztest_pop_proportion('upper', .20, .25, 400, .05)

#one_sample_ztest_pop_proportion('two', .64, .52, 100, .05)

While the link might provide a valuable answer, SO asks users to post their code here on SO Links are useful as a reference, but they tend to break after a while, making solutions inaccessible for future visitors. — Mr. T, Jan 24, 2018 at 23:40

Undo · Accepted Answer · 2013-05-01 16:52:33Z

0

You can just use the error function that's built in to the math library, as stated on their website.

edited May 1, 2013 at 16:52

Undo♦

25.6k37 gold badges110 silver badges131 bronze badges

answered May 1, 2013 at 16:33

user2340146

1

Nope @user2340146, also in Python 2: docs.python.org/2/library/math.html#math.erf
– BoltzmannBrain
Jun 18, 2016 at 17:41

Add a comment |

Collectives™ on Stack Overflow

How to calculate probability in a normal distribution given mean & standard deviation?

10 Answers 10

Not the answer you're looking for? Browse other questions tagged
python
statistics
scipy
probability
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

10 Answers 10

Not the answer you're looking for? Browse other questions tagged pythonstatisticsscipyprobability or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
python
statistics
scipy
probability
or ask your own question.