CDF in statistics.

The cumulative distribution function or CDF is another method to describe the distribution of random variables. The advantage of the CDF is that it can be defined for any kind of random variable, both discrete and continuous.
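As a small illustration of that point, the sketch below evaluates a CDF for one discrete and one continuous variable with `scipy.stats`. The binomial parameters (10 trials, success probability 0.5) and the evaluation points are arbitrary choices made for this example, not values taken from the text.

```python
import numpy as np
import scipy.stats as stats

# discrete example: number of successes in 10 trials with p = 0.5 (assumed values)
k = np.arange(0, 11)
pmf = stats.binom.pmf(k, 10, 0.5)
cdf_discrete = stats.binom.cdf(k, 10, 0.5)

# for a discrete variable the cdf is the running sum of the pmf
cdf_from_pmf = np.cumsum(pmf)

# continuous example: standard normal evaluated at a few points
x = np.array([-1.0, 0.0, 1.0])
cdf_continuous = stats.norm.cdf(x)

print(np.allclose(cdf_discrete, cdf_from_pmf))
print(cdf_continuous)
```

The same `.cdf` call pattern works for both kinds of distribution; only the density side differs (`.pmf` for discrete, `.pdf` for continuous).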

The cumulative distribution function is used to evaluate probability as area. Mathematically, the cumulative distribution function is the integral of the probability density function (pdf) from negative infinity up to x, and the probability that a continuous random variable falls between two values is the integral of the pdf between those values: the area under the curve between them.
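The relationship P(a < X <= b) = F(b) - F(a) can be checked numerically. The sketch below is only an illustration: it assumes a standard normal variable and the interval from -1 to 1, neither of which comes from the text, and compares the difference of two CDF values with a direct numerical integral of the pdf.

```python
import scipy.stats as stats
from scipy.integrate import quad

# interval endpoints (assumed for this example)
a, b = -1.0, 1.0

# probability as a difference of two cdf values
prob_from_cdf = stats.norm.cdf(b) - stats.norm.cdf(a)

# probability as the area under the pdf between a and b
prob_from_integral, abs_err = quad(stats.norm.pdf, a, b)

print(prob_from_cdf, prob_from_integral)
```

Both numbers should agree to within the integration tolerance.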

Cumulative distribution function with Python.

The cumulative distribution function (cdf) gives the probability that the variable takes a value less than or equal to x. By definition it accumulates all of the probability at values less than or equal to x (the pdf integrated, or the individual probabilities summed for a discrete variable), so graphically it starts at 0 and ends at a probability of 1, or 100%.
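These properties are easy to verify on a grid. The following sketch assumes a log-normal distribution with shape parameter 1 and a grid from 0 to 50; both are choices made for illustration.

```python
import numpy as np
import scipy.stats as stats

# a wide grid so the upper tail is almost fully covered (assumed range)
x = np.linspace(0, 50, 2001)
c = stats.lognorm.cdf(x, 1)

starts_at_zero = c[0] == 0.0               # no probability at or below 0
never_decreases = np.all(np.diff(c) >= 0)  # the cdf only accumulates
last_value = c[-1]                         # approaches 1 as x grows

print(starts_at_zero, never_decreases, last_value)
```

On a finite grid the last value is close to 1 rather than exactly 1, because a small amount of probability still lies beyond the right edge.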

Calculating and drawing PDFs and CDFs:

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

## this example uses log-normal distribution

# variable to evaluate the functions on
x = np.linspace(0,5,1001)

# note the function call pattern...
p1 = stats.lognorm.pdf(x,1)
c1 = stats.lognorm.cdf(x,1)

p2 = stats.lognorm.pdf(x,.1)
c2 = stats.lognorm.cdf(x,.1)

# draw the pdfs
fig,ax = plt.subplots(2,1,figsize=(4,7))

# question: why divide by sum here?
ax[0].plot(x,p1/sum(p1), x,p2/sum(p2))
ax[0].set_title('pdf')

# draw the cdfs
ax[1].plot(x,c1, x,c2)
ax[1].set_title('cdf')
plt.show()
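One possible answer to the question in the code comment: dividing by the sum rescales the sampled pdf values so that they add up to exactly 1 on this grid, which puts both curves on a common, probability-like scale. The sketch below is an interpretation rather than a statement of the original author's intent; it reuses the same grid and shape parameter as the example above.

```python
import numpy as np
import scipy.stats as stats

x = np.linspace(0, 5, 1001)
dx = x[1] - x[0]
p1 = stats.lognorm.pdf(x, 1)

# approximate area under the pdf on [0, 5]; below 1 because the tail is cut off
area = np.sum(p1) * dx

# after dividing by the sum, the sampled values add up to exactly 1
p1_normalized = p1 / np.sum(p1)

print(area, np.sum(p1_normalized))
```

Note that the raw pdf values are densities, not probabilities, so their plain sum depends on how fine the grid is.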


Computing the cdf from the pdf (overlapping):

The following plot of overlapping curves illustrates that the cumulative sum of the pdf values, each multiplied by the grid spacing, closely approximates the cumulative distribution function at every point of the grid, here for the log-normal example above.

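# compute the cdf as a running sum of pdf times grid spacing
c1x = np.cumsum( p1*(x[1]-x[0]) )
# overlay it on the cdf returned by scipy
plt.plot(x,c1, x,c1x,'--')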
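How close the running sum gets to the true CDF depends on the grid spacing and on the integration rule. The sketch below measures the largest deviation for the running sum and, as an alternative not used in the original code, for the trapezoidal rule via `scipy.integrate.cumulative_trapezoid`. The grid and shape parameter match the example above.

```python
import numpy as np
import scipy.stats as stats
from scipy.integrate import cumulative_trapezoid

x = np.linspace(0, 5, 1001)
dx = x[1] - x[0]
p1 = stats.lognorm.pdf(x, 1)
c1 = stats.lognorm.cdf(x, 1)

# running sum of pdf times grid spacing (rectangle rule)
c1_riemann = np.cumsum(p1 * dx)

# trapezoidal rule; initial=0 keeps the output the same length as x
c1_trapz = cumulative_trapezoid(p1, x, initial=0)

# largest absolute deviation from the cdf returned by scipy
err_riemann = np.max(np.abs(c1_riemann - c1))
err_trapz = np.max(np.abs(c1_trapz - c1))

print(err_riemann, err_trapz)
```

Both approximations improve as the grid gets finer; the trapezoidal version typically has the smaller error on the same grid.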


