Probability mass function in statistics.

In probability theory a probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value.

Probability mass functions with Python.

It is also known as the discrete density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.

Calculating probability mass function for drawing marbles from a jar:

The following Python code shows probabilities and proportions calculation for case of drawing marbles of different colors - blue, yellow and orange - out of the box.

import matplotlib.pyplot as plt
import numpy as np

# colored marble counts
blue   = 40
yellow = 30
orange = 20
totalMarbs = blue + yellow + orange

# put them all in a jar
jar = np.hstack((1*np.ones(blue),2*np.ones(yellow),3*np.ones(orange)))

# now we draw 500 marbles (with replacement)
numDraws = 500
drawColors = np.zeros(numDraws)

for drawi in range(numDraws):
    # generate a random integer to draw
    randmarble = int(np.random.rand()*len(jar))
    # store the color of that marble
    drawColors[drawi] = jar[randmarble]

# now we need to know the proportion of colors drawn
propBlue = sum(drawColors==1) / numDraws
propYell = sum(drawColors==2) / numDraws
propOran = sum(drawColors==3) / numDraws

# plot those against the theoretical probability[1,2,3],[ propBlue, propYell, propOran ],label='Proportion')
plt.plot([0.5, 1.5],[blue/totalMarbs, blue/totalMarbs],'b',linewidth=3,label='Probability')
plt.plot([1.5, 2.5],[yellow/totalMarbs,yellow/totalMarbs],'b',linewidth=3)
plt.plot([2.5, 3.5],[orange/totalMarbs,orange/totalMarbs],'b',linewidth=3)

plt.xlabel('Marble color')

Calculating probabilities for drawing marbles from a jar

Calculating probability density (technically mass) function:

A probability density function (PDF) differes from probability mass function and associated with continuous rather than discrete random variables.

import matplotlib.pyplot as plt
import numpy as np
# continous signal (technically discrete!)
N = 10004
datats1 = np.cumsum(np.sign(np.random.randn(N)))
datats2 = np.cumsum(np.sign(np.random.randn(N)))

# let's see what they look like

# discretize using histograms
nbins = 50

y,x = np.histogram(datats1,nbins)
x1 = (x[1:]+x[:-1])/2
y1 = y/sum(y)

y,x = np.histogram(datats2,nbins)
x2 = (x[1:]+x[:-1])/2
y2 = y/sum(y)

plt.plot(x1,y1, x2,y2,linewidth=3)
plt.xlabel('Data value')

Calculating probability density (technically mass) function

See also related topics: