72

Is there any python package that allows the efficient computation of the PDF (probability density function) of a multivariate normal distribution?

It doesn't seem to be included in Numpy/Scipy, and surprisingly a Google search didn't turn up any useful thing.

1
  • @pyCthon Yes, I know my covariance matrix is positive definite from the way it is constructed
    – Benno
    Jul 23, 2012 at 15:51

10 Answers 10

99

The multivariate normal is now available on SciPy 0.14.0.dev-16fc0af:

from scipy.stats import multivariate_normal
var = multivariate_normal(mean=[0,0], cov=[[1,0],[0,1]])
var.pdf([1,0])
3
  • I tried to evaluate it on a grid the numbers kind of resemble a gaussian but the sum is bigger then 1! Not what I expected...
    – Nils
    Apr 28, 2023 at 21:09
  • 2
    Be aware that densities can be greater than one. You need to integrate the density times the infinitesimal rectangles to get one.
    – juliohm
    May 5, 2023 at 18:21
  • Thanks for the quick answer. Thinking a bit more about it: In a 1d PDF probabilities are actually only defined for a range and not a point value. Somehow the .pdf (..) method returns values that make sense after normalizing them again. But the SciPy docs do not talk much about it.
    – Nils
    May 8, 2023 at 11:38
36

I just made one for my purposes so I though I'd share. It's built using "the powers" of numpy, on the formula of the non degenerate case from http://en.wikipedia.org/wiki/Multivariate_normal_distribution and it aso validates the input.

Here is the code along with a sample run

from numpy import *
import math
# covariance matrix
sigma = matrix([[2.3, 0, 0, 0],
           [0, 1.5, 0, 0],
           [0, 0, 1.7, 0],
           [0, 0,   0, 2]
          ])
# mean vector
mu = array([2,3,8,10])

# input
x = array([2.1,3.5,8, 9.5])

def norm_pdf_multivariate(x, mu, sigma):
    size = len(x)
    if size == len(mu) and (size, size) == sigma.shape:
        det = linalg.det(sigma)
        if det == 0:
            raise NameError("The covariance matrix can't be singular")

        norm_const = 1.0/ ( math.pow((2*pi),float(size)/2) * math.pow(det,1.0/2) )
        x_mu = matrix(x - mu)
        inv = sigma.I        
        result = math.pow(math.e, -0.5 * (x_mu * inv * x_mu.T))
        return norm_const * result
    else:
        raise NameError("The dimensions of the input don't match")

print norm_pdf_multivariate(x, mu, sigma)
1
  • 12
    Is there a reason you use math.pow(x, 1.0/2) rather than math.sqrt(x), and similarly, why use math.pow(math.e, x) over math.exp(x)?
    – lericson
    May 20, 2015 at 15:04
18

If still needed, my implementation would be

import numpy as np

def pdf_multivariate_gauss(x, mu, cov):
    '''
    Caculate the multivariate normal density (pdf)

    Keyword arguments:
        x = numpy array of a "d x 1" sample vector
        mu = numpy array of a "d x 1" mean vector
        cov = "numpy array of a d x d" covariance matrix
    '''
    assert(mu.shape[0] > mu.shape[1]), 'mu must be a row vector'
    assert(x.shape[0] > x.shape[1]), 'x must be a row vector'
    assert(cov.shape[0] == cov.shape[1]), 'covariance matrix must be square'
    assert(mu.shape[0] == cov.shape[0]), 'cov_mat and mu_vec must have the same dimensions'
    assert(mu.shape[0] == x.shape[0]), 'mu and x must have the same dimensions'
    part1 = 1 / ( ((2* np.pi)**(len(mu)/2)) * (np.linalg.det(cov)**(1/2)) )
    part2 = (-1/2) * ((x-mu).T.dot(np.linalg.inv(cov))).dot((x-mu))
    return float(part1 * np.exp(part2))

def test_gauss_pdf():
    x = np.array([[0],[0]])
    mu  = np.array([[0],[0]])
    cov = np.eye(2) 

    print(pdf_multivariate_gauss(x, mu, cov))

    # prints 0.15915494309189535

if __name__ == '__main__':
    test_gauss_pdf()

In case I make future changes, the code is here on GitHub

8

In the common case of a diagonal covariance matrix, the multivariate PDF can be obtained by simply multiplying the univariate PDF values returned by a scipy.stats.norm instance. If you need the general case, you will probably have to code this yourself (which shouldn't be hard).

1
  • Could you be more specific of how to use norm to calculate the probability? Big thanks!!
    – Cyrus
    Nov 11, 2021 at 9:50
7

You can easily compute using numpy. I have implemented as below for the purpose of machine learning course and would like to share, hope it helps to someone.

import numpy as np
X = np.array([[13.04681517, 14.74115241],[13.40852019, 13.7632696 ],[14.19591481, 15.85318113],[14.91470077, 16.17425987]])

def est_gaus_par(X):
    mu = np.mean(X,axis=0)
    sig = np.std(X,axis=0)
    return mu,sig

mu,sigma = est_gaus_par(X)

def est_mult_gaus(X,mu,sigma):
    m = len(mu)
    sigma2 = np.diag(sigma)
    X = X-mu.T
    p = 1/((2*np.pi)**(m/2)*np.linalg.det(sigma2)**(0.5))*np.exp(-0.5*np.sum(X.dot(np.linalg.pinv(sigma2))*X,axis=1))

    return p

p = est_mult_gaus(X, mu, sigma)
1
  • This was helpful for me when trying to convert the identical function to cupy. I did need to use cp.diag(cp.diag(sigma)) when using a covariance matrix. Thx,
    – jsfa11
    May 29, 2020 at 18:28
3

I know of several python packages that use it internally, with different generality and for different uses, but I don't know if any of them are intended for users.

statsmodels, for example, has the following hidden function and class, but it's not used by statsmodels:

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/miscmodels/try_mlecov.py#L36

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/mv_normal.py#L777

Essentially, if you need fast evaluation, rewrite it for your use case.

3

I use the following code which calculates the logpdf value, which is preferable for larger dimensions. It also works for scipy.sparse matrices.

import numpy as np
import math
import scipy.sparse as sp
import scipy.sparse.linalg as spln

def lognormpdf(x,mu,S):
    """ Calculate gaussian probability density of x, when x ~ N(mu,sigma) """
    nx = len(S)
    norm_coeff = nx*math.log(2*math.pi)+np.linalg.slogdet(S)[1]

    err = x-mu
    if (sp.issparse(S)):
        numerator = spln.spsolve(S, err).T.dot(err)
    else:
        numerator = np.linalg.solve(S, err).T.dot(err)

    return -0.5*(norm_coeff+numerator)

Code is from pyParticleEst, if you want the pdf value instead of the logpdf just take math.exp() on the returned value

0
2

The density can be computed in a pretty straightforward way using numpy functions and the formula on this page: http://en.wikipedia.org/wiki/Multivariate_normal_distribution. You may also want to use the likelihood function (log probability), which is less likely to underflow for large dimensions and is a little more straightforward to compute. Both just involve being able to compute the determinant and inverse of a matrix.

The CDF, on the other hand, is an entirely different animal...

1

Here I elaborate a bit more on how exactly to use the multivariate_normal() from the scipy package:

# Import packages
import numpy as np
from scipy.stats import multivariate_normal

# Prepare your data
x = np.linspace(-10, 10, 500)
y = np.linspace(-10, 10, 500)
X, Y = np.meshgrid(x,y)

# Get the multivariate normal distribution
mu_x = np.mean(x)
sigma_x = np.std(x)
mu_y = np.mean(y)
sigma_y = np.std(y)
rv = multivariate_normal([mu_x, mu_y], [[sigma_x, 0], [0, sigma_y]])

# Get the probability density
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X
pos[:, :, 1] = Y
pd = rv.pdf(pos)
0

The following code helped me to solve,when given a vector what is the likelihood that vector is in a multivariate normal distribution.

import numpy as np
from scipy.stats import multivariate_normal

data with all vectors

d= np.array([[1,2,1],[2,1,3],[4,5,4],[2,2,1]])

mean of the data in vector form, which will have same length as input vector(here its 3)

mean = sum(d,axis=0)/len(d)

OR
mean=np.average(d , axis=0)
mean.shape

finding covarianve of vectors which will have shape of [input vector shape X input vector shape] here it is 3x3

cov = 0
for e in d:
  cov += np.dot((e-mean).reshape(len(e), 1), (e-mean).reshape(1, len(e)))
cov /= len(d)
cov.shape

preparing a multivariate Gaussian distribution from mean and co variance

dist = multivariate_normal(mean,cov)

finding probability distribution function.

print(dist.pdf([1,2,3]))

3.050863384798471e-05

The above value gives the likelihood.

1
  • in d, are the rows or the columns the signals? If you have 4 signals with 3 elements each, then the covariance should be 4x4
    – seralouk
    Feb 25, 2020 at 12:37

Not the answer you're looking for? Browse other questions tagged or ask your own question.