Multivariate normal density in Python?

Question

Is there any python package that allows the efficient computation of the PDF (probability density function) of a multivariate normal distribution?

It doesn't seem to be included in Numpy/Scipy, and surprisingly a Google search didn't turn up any useful thing.

@pyCthon Yes, I know my covariance matrix is positive definite from the way it is constructed — Benno, Jul 23, 2012 at 15:51

juliohm · Accepted Answer · 2014-01-03 10:43:07Z

99

The multivariate normal is now available on SciPy 0.14.0.dev-16fc0af:

from scipy.stats import multivariate_normal
var = multivariate_normal(mean=[0,0], cov=[[1,0],[0,1]])
var.pdf([1,0])

answered Jan 3, 2014 at 10:43

juliohm

3,7412 gold badges18 silver badges23 bronze badges

I tried to evaluate it on a grid the numbers kind of resemble a gaussian but the sum is bigger then 1! Not what I expected...
– Nils
Apr 28, 2023 at 21:09
2

Be aware that densities can be greater than one. You need to integrate the density times the infinitesimal rectangles to get one.
– juliohm
May 5, 2023 at 18:21
Thanks for the quick answer. Thinking a bit more about it: In a 1d PDF probabilities are actually only defined for a range and not a point value. Somehow the .pdf (..) method returns values that make sense after normalizing them again. But the SciPy docs do not talk much about it.
– Nils
May 8, 2023 at 11:38

Add a comment |

bougui · Accepted Answer · 2014-12-08 08:58:09Z

I just made one for my purposes so I though I'd share. It's built using "the powers" of numpy, on the formula of the non degenerate case from http://en.wikipedia.org/wiki/Multivariate_normal_distribution and it aso validates the input.

Here is the code along with a sample run

from numpy import *
import math
# covariance matrix
sigma = matrix([[2.3, 0, 0, 0],
           [0, 1.5, 0, 0],
           [0, 0, 1.7, 0],
           [0, 0,   0, 2]
          ])
# mean vector
mu = array([2,3,8,10])

# input
x = array([2.1,3.5,8, 9.5])

def norm_pdf_multivariate(x, mu, sigma):
    size = len(x)
    if size == len(mu) and (size, size) == sigma.shape:
        det = linalg.det(sigma)
        if det == 0:
            raise NameError("The covariance matrix can't be singular")

        norm_const = 1.0/ ( math.pow((2*pi),float(size)/2) * math.pow(det,1.0/2) )
        x_mu = matrix(x - mu)
        inv = sigma.I        
        result = math.pow(math.e, -0.5 * (x_mu * inv * x_mu.T))
        return norm_const * result
    else:
        raise NameError("The dimensions of the input don't match")

print norm_pdf_multivariate(x, mu, sigma)

Is there a reason you use math.pow(x, 1.0/2) rather than math.sqrt(x), and similarly, why use math.pow(math.e, x) over math.exp(x)? — lericson, May 20, 2015 at 15:04

user2489252user2489252 · Accepted Answer · 2014-04-16 05:59:27Z

If still needed, my implementation would be

import numpy as np

def pdf_multivariate_gauss(x, mu, cov):
    '''
    Caculate the multivariate normal density (pdf)

    Keyword arguments:
        x = numpy array of a "d x 1" sample vector
        mu = numpy array of a "d x 1" mean vector
        cov = "numpy array of a d x d" covariance matrix
    '''
    assert(mu.shape[0] > mu.shape[1]), 'mu must be a row vector'
    assert(x.shape[0] > x.shape[1]), 'x must be a row vector'
    assert(cov.shape[0] == cov.shape[1]), 'covariance matrix must be square'
    assert(mu.shape[0] == cov.shape[0]), 'cov_mat and mu_vec must have the same dimensions'
    assert(mu.shape[0] == x.shape[0]), 'mu and x must have the same dimensions'
    part1 = 1 / ( ((2* np.pi)**(len(mu)/2)) * (np.linalg.det(cov)**(1/2)) )
    part2 = (-1/2) * ((x-mu).T.dot(np.linalg.inv(cov))).dot((x-mu))
    return float(part1 * np.exp(part2))

def test_gauss_pdf():
    x = np.array([[0],[0]])
    mu  = np.array([[0],[0]])
    cov = np.eye(2) 

    print(pdf_multivariate_gauss(x, mu, cov))

    # prints 0.15915494309189535

if __name__ == '__main__':
    test_gauss_pdf()

In case I make future changes, the code is here on GitHub

Sven Marnach · Accepted Answer · 2012-07-23 15:55:56Z

8

In the common case of a diagonal covariance matrix, the multivariate PDF can be obtained by simply multiplying the univariate PDF values returned by a scipy.stats.norm instance. If you need the general case, you will probably have to code this yourself (which shouldn't be hard).

edited Jul 23, 2012 at 15:55

answered Jul 23, 2012 at 15:48

Sven Marnach

587k120 gold badges951 silver badges851 bronze badges

Could you be more specific of how to use norm to calculate the probability? Big thanks!!
– Cyrus
Nov 11, 2021 at 9:50

Add a comment |

Freq · Accepted Answer · 2020-05-21 00:00:34Z

You can easily compute using numpy. I have implemented as below for the purpose of machine learning course and would like to share, hope it helps to someone.

import numpy as np
X = np.array([[13.04681517, 14.74115241],[13.40852019, 13.7632696 ],[14.19591481, 15.85318113],[14.91470077, 16.17425987]])

def est_gaus_par(X):
    mu = np.mean(X,axis=0)
    sig = np.std(X,axis=0)
    return mu,sig

mu,sigma = est_gaus_par(X)

def est_mult_gaus(X,mu,sigma):
    m = len(mu)
    sigma2 = np.diag(sigma)
    X = X-mu.T
    p = 1/((2*np.pi)**(m/2)*np.linalg.det(sigma2)**(0.5))*np.exp(-0.5*np.sum(X.dot(np.linalg.pinv(sigma2))*X,axis=1))

    return p

p = est_mult_gaus(X, mu, sigma)

This was helpful for me when trying to convert the identical function to cupy. I did need to use cp.diag(cp.diag(sigma)) when using a covariance matrix. Thx, — jsfa11, May 29, 2020 at 18:28

Josef · Accepted Answer · 2012-07-28 07:00:06Z

I know of several python packages that use it internally, with different generality and for different uses, but I don't know if any of them are intended for users.

statsmodels, for example, has the following hidden function and class, but it's not used by statsmodels:

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/miscmodels/try_mlecov.py#L36

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/mv_normal.py#L777

Essentially, if you need fast evaluation, rewrite it for your use case.

ajn · Accepted Answer · 2014-01-17 14:55:31Z

I use the following code which calculates the logpdf value, which is preferable for larger dimensions. It also works for scipy.sparse matrices.

import numpy as np
import math
import scipy.sparse as sp
import scipy.sparse.linalg as spln

def lognormpdf(x,mu,S):
    """ Calculate gaussian probability density of x, when x ~ N(mu,sigma) """
    nx = len(S)
    norm_coeff = nx*math.log(2*math.pi)+np.linalg.slogdet(S)[1]

    err = x-mu
    if (sp.issparse(S)):
        numerator = spln.spsolve(S, err).T.dot(err)
    else:
        numerator = np.linalg.solve(S, err).T.dot(err)

    return -0.5*(norm_coeff+numerator)

Code is from pyParticleEst, if you want the pdf value instead of the logpdf just take math.exp() on the returned value

Andrew Mao · Accepted Answer · 2013-01-06 02:37:19Z

The density can be computed in a pretty straightforward way using numpy functions and the formula on this page: http://en.wikipedia.org/wiki/Multivariate_normal_distribution. You may also want to use the likelihood function (log probability), which is less likely to underflow for large dimensions and is a little more straightforward to compute. Both just involve being able to compute the determinant and inverse of a matrix.

The CDF, on the other hand, is an entirely different animal...

tsveti_iko · Accepted Answer · 2021-01-08 14:35:51Z

Here I elaborate a bit more on how exactly to use the multivariate_normal() from the scipy package:

# Import packages
import numpy as np
from scipy.stats import multivariate_normal

# Prepare your data
x = np.linspace(-10, 10, 500)
y = np.linspace(-10, 10, 500)
X, Y = np.meshgrid(x,y)

# Get the multivariate normal distribution
mu_x = np.mean(x)
sigma_x = np.std(x)
mu_y = np.mean(y)
sigma_y = np.std(y)
rv = multivariate_normal([mu_x, mu_y], [[sigma_x, 0], [0, sigma_y]])

# Get the probability density
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X
pos[:, :, 1] = Y
pd = rv.pdf(pos)

Community · Accepted Answer · 2020-06-20 09:12:55Z

The following code helped me to solve,when given a vector what is the likelihood that vector is in a multivariate normal distribution.

import numpy as np
from scipy.stats import multivariate_normal

data with all vectors

d= np.array([[1,2,1],[2,1,3],[4,5,4],[2,2,1]])

mean of the data in vector form, which will have same length as input vector(here its 3)

mean = sum(d,axis=0)/len(d)

OR
mean=np.average(d , axis=0)
mean.shape

finding covarianve of vectors which will have shape of [input vector shape X input vector shape] here it is 3x3

cov = 0
for e in d:
  cov += np.dot((e-mean).reshape(len(e), 1), (e-mean).reshape(1, len(e)))
cov /= len(d)
cov.shape

preparing a multivariate Gaussian distribution from mean and co variance

dist = multivariate_normal(mean,cov)

finding probability distribution function.

print(dist.pdf([1,2,3]))

3.050863384798471e-05

The above value gives the likelihood.

in d, are the rows or the columns the signals? If you have 4 signals with 3 elements each, then the covariance should be 4x4 — seralouk, Feb 25, 2020 at 12:37

Collectives™ on Stack Overflow

Multivariate normal density in Python?

10 Answers 10

data with all vectors

mean of the data in vector form, which will have same length as input vector(here its 3)

finding covarianve of vectors which will have shape of [input vector shape X input vector shape] here it is 3x3

preparing a multivariate Gaussian distribution from mean and co variance

finding probability distribution function.

Not the answer you're looking for? Browse other questions tagged
python
numpy
scipy
probability
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

10 Answers 10

data with all vectors

mean of the data in vector form, which will have same length as input vector(here its 3)

finding covarianve of vectors which will have shape of [input vector shape X input vector shape] here it is 3x3

preparing a multivariate Gaussian distribution from mean and co variance

finding probability distribution function.

Not the answer you're looking for? Browse other questions tagged pythonnumpyscipyprobability or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
python
numpy
scipy
probability
or ask your own question.