Partial correlation in statistics.

Partial correlation measures the strength of a relationship or degree of association between two variables, while controlling/removing the effect of one or more other variables.
Partial correlation is used when correlation coefficient will give misleading results if there is another, confounding, variable that is numerically related to both variables of interest. This misleading information can be avoided by controlling for the confounding variable, which is done by computing the partial correlation coefficient.

Partial correlation with Python.

Like the correlation coefficient, the partial correlation coefficient may take values in the range from –1 to 1 wiyh exactly the same interpretations:
- the value 1 tells a perfect positive linear relationship.
- the value – 1 tells a perfect negative correlation.
- the value 0 tells that there is no linear relationship.

Partial correlation - removing the effect of other variables:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats

# pip install pingouin
import pingouin as pg

# raw correlations
rmg = .7
rsg = .8
rms = .9

# partial correlations
rho_mg_s = (rmg - rsg*rms) / ( np.sqrt(1-rsg**2)*np.sqrt(1-rms**2) )
rho_sg_m = (rsg - rmg*rms) / ( np.sqrt(1-rmg**2)*np.sqrt(1-rms**2) )



Partial correlation calculation - 3 datasets:

N = 76

# correlated datasets
x1 = np.linspace(1,10,N) + np.random.randn(N)
x2 = x1 + np.random.randn(N)
x3 = x1 + np.random.randn(N)

# let's convert these data to a pandas frame
df = pd.DataFrame()
df['x1'] = x1
df['x2'] = x2
df['x3'] = x3

# compute the "raw" correlation matrix
cormatR = df.corr()

# print out one value
print(' ')

# partial correlation
pc = pg.partial_corr(df,x='x3',y='x2',covar='x1')
print(' ')

Partial correlation calculation from datasets

Visualizing the matrices - correlation VS partial correlation:

fig,ax = plt.subplots(1,2,figsize=(6,3))

# raw correlations

# add text 
for i in range(3):
    for j in range(3):
        ax[0].text(i,j,np.round(cormatR.values[i,j],2), horizontalalignment='center')

# partial correlations
partialCorMat = df.pcorr()

for i in range(3):
    for j in range(3):
        ax[1].text(i,j,np.round(partialCorMat.values[i,j],2), horizontalalignment='center')

correlation vs partial correlation

See also related topics: