CDF in statistics.
The cumulative distribution function or CDF is another method to describe the distribution of random variables.
The advantage of the CDF is that it can be defined for any kind of random variable, both discrete and continuous.
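As a small illustration of that point, the sketch below evaluates a cdf for one discrete and one continuous variable using scipy.stats. The binomial (10 fair coin flips) and the standard normal are assumptions chosen for the example, not distributions used elsewhere in this text.

```python
import numpy as np
import scipy.stats as stats

# Discrete case: number of heads in 10 fair coin flips (illustrative choice)
k = np.arange(0, 11)
pmf = stats.binom.pmf(k, 10, 0.5)
cdf_discrete = stats.binom.cdf(k, 10, 0.5)

# For a discrete variable the cdf is the running sum of the pmf
cdf_from_pmf = np.cumsum(pmf)

# Continuous case: standard normal evaluated at a few points
x = np.array([-1.0, 0.0, 1.0])
cdf_continuous = stats.norm.cdf(x)

print(np.allclose(cdf_discrete, cdf_from_pmf))
print(cdf_continuous)
```

In both cases the same question is answered: what is the probability of observing a value less than or equal to a given point.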
The cumulative distribution function is used to evaluate probability as area. Mathematically, the cumulative distribution function is the integral of the pdf, and the probability that a continuous random variable falls between two values is the integral of the pdf between those two values: the area under the curve between them.
The cumulative distribution function (cdf) evaluated at x is the probability that the variable takes a value less than or equal to x. By definition it accumulates all of the probability at values less than or equal to x, so graphically it never decreases: it starts at 0 and ends at a probability of 1, or 100%.
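The area interpretation can be checked numerically. The following is a minimal sketch, assuming a log-normal distribution with shape parameter 1 and the arbitrary interval from 0.5 to 2.0: it integrates the pdf with scipy.integrate.quad and compares the result with the difference of two cdf values.

```python
import scipy.stats as stats
from scipy.integrate import quad

# Illustrative interval and shape parameter (assumptions for this sketch)
a, b = 0.5, 2.0
s = 1.0

# Area under the pdf between a and b (quad returns the value and an error estimate)
area, _ = quad(lambda t: stats.lognorm.pdf(t, s), a, b)

# The same probability from the cdf: P(a < X <= b) = cdf(b) - cdf(a)
prob = stats.lognorm.cdf(b, s) - stats.lognorm.cdf(a, s)

print(area, prob)
```

The two numbers should agree up to the tolerance of the numerical integration.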
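These properties can be verified directly. The sketch below assumes a log-normal distribution with shape parameter 1 evaluated on a grid from 0 to 100; the grid limits and the thresholds are illustrative choices.

```python
import numpy as np
import scipy.stats as stats

# Grid wide enough that almost all of the probability mass is covered
x = np.linspace(0, 100, 2001)
c = stats.lognorm.cdf(x, 1)

# The cdf is 0 at the lower edge of the support
starts_at_zero = c[0] == 0.0

# The cdf never decreases (small tolerance for floating-point rounding)
nondecreasing = bool(np.all(np.diff(c) >= -1e-12))

# The cdf approaches 1 for large x
approaches_one = c[-1] > 0.999

print(starts_at_zero, nondecreasing, approaches_one)
```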
Calculating and drawing PDFs and CDFs:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

## this example uses log-normal distribution

# variable to evaluate the functions on
x = np.linspace(0,5,1001)

# note the function call pattern...
p1 = stats.lognorm.pdf(x,1)
c1 = stats.lognorm.cdf(x,1)
p2 = stats.lognorm.pdf(x,.1)
c2 = stats.lognorm.cdf(x,.1)

# plt.subplots(2,1) returns an array of two axes: ax[0] (top) and ax[1] (bottom)
fig,ax = plt.subplots(2,1,figsize=(4,7))

# draw the pdfs
ax[0].plot(x,p1/sum(p1), x,p2/sum(p2)) # question: why divide by sum here?
ax[0].set_ylabel('probability')
ax[0].set_title('pdf(x)')

# draw the cdfs
ax[1].plot(x,c1, x,c2)
ax[1].set_ylabel('probability')
ax[1].set_title('cdf(x)')
plt.show()
Computing the cdf from the pdf (overlapping):
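The following plot overlays two curves: the cdf returned by scipy and the cumulative sum of the pdf values multiplied by the grid spacing. The curves overlap, which shows that accumulating the pdf reproduces the cdf at every point of x in this example (up to the error of the discrete approximation).
# compute the cdf as a running sum of pdf * dx, where x[1]-x[0] is the grid spacing
c1x = np.cumsum( p1*(x[1]-x[0]) )

# overlay the scipy cdf (solid) and the cumulative sum (dashed)
plt.plot(x,c1)
plt.plot(x,c1x,'--')
plt.show()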