Relation metrics in statistics.
The covariance between two random variables measures the degree to which the two variables move together – it captures the linear relationship.
Properties of covariance:
▪ positive covariance: variables move together
▪ negative covariance: variables move in opposite directions
▪ covariance of a variable with itself equals its variance

Pitfalls of covariance:
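▪ the magnitude depends on the scales of the variables, so the raw value is hard to interpret
▪ unbounded: can range from minus to plus infinity
▪ expressed in the product of the two variables' units (squared units only when both share the same unit)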
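The scale dependence can be seen in a small sketch. The data below are made up for illustration (the names height_m, weight_kg and the numbers used to generate them are assumptions, not part of these notes): converting one variable from metres to centimetres multiplies the covariance by 100 even though the relationship itself is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# hypothetical data: heights in metres, weights in kilograms
height_m = rng.normal(1.75, 0.10, size=200)
weight_kg = 70 + 50 * (height_m - 1.75) + rng.normal(0, 5, size=200)

# the same heights expressed in centimetres
height_cm = height_m * 100

# np.cov returns a 2x2 matrix; entry [0, 1] is the covariance of the pair
cov_m = np.cov(height_m, weight_kg)[0, 1]    # units: m * kg
cov_cm = np.cov(height_cm, weight_kg)[0, 1]  # units: cm * kg

# the ratio is 100 (up to rounding): only the unit changed, not the data
print(cov_m, cov_cm, cov_cm / cov_m)
```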

The correlation coefficient (r) measures the strength of the linear relationship (correlation) between two variables. It's the standardized covariance (the covariance divided by the product of the two standard deviations) and is easier to interpret because its values always lie between -1 and +1.
Correlation Coefficient (r) Interpretation:
r = 1 - perfect positive correlation
0 < r < 1 - positive linear relationship
r = 0 - no linear relationship
-1 < r < 0 - negative linear relationship
r = -1 - perfect negative correlation
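The boundary cases in this list can be checked on toy data. The sketch below uses made-up deterministic inputs (an assumption for illustration): an exact increasing line, an exact decreasing line, and a parabola over a symmetric range. The parabola shows that r near 0 only rules out a linear relationship; y still depends entirely on x.

```python
import numpy as np

x = np.linspace(-1, 1, 21)  # evenly spaced and symmetric around zero

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is r for the pair
r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]   # exact increasing line
r_neg = np.corrcoef(x, -3 * x)[0, 1]      # exact decreasing line
r_zero = np.corrcoef(x, x ** 2)[0, 1]     # dependent on x, but not linearly

# approximately 1, -1 and 0 (up to floating-point rounding)
print(r_pos, r_neg, r_zero)
```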
Simulating correlated data:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
N = 66
# generate correlated data
x = np.random.randn(N)
y = x + np.random.randn(N)
# plot the data
plt.plot(x,y,'kp',markerfacecolor='b',markersize=12)
plt.xlabel('Variable X')
plt.ylabel('Variable Y')
plt.xticks([])
plt.yticks([])
plt.show()

3 ways to calculate covariance:
## compute covariance
# precompute the means
meanX = np.mean(x)
meanY = np.mean(y)
### the loop method
covar1 = 0
for i in range(N):
    covar1 = covar1 + (x[i]-meanX)*(y[i]-meanY)
# and now for the normalization
covar1 = covar1/(N-1)
### the linear algebra method
xCent = x-meanX
yCent = y-meanY
covar2 = np.dot(xCent,yCent) / (N-1)
### the Python method
# np.cov returns the 2x2 covariance matrix: variances on the diagonal, cov(x,y) off the diagonal
covar3 = np.cov(np.vstack((x,y)))
print(covar1,covar2,covar3)
OUT:
0.9609676940493194 0.9609676940493196 [[1.03431923 0.96096769]
[0.96096769 2.32630356]]
2 ways to calculate correlation:
## now for correlation
### the long method
corr_num = sum( (x-meanX) * (y-meanY) )
corr_den = sum((x-meanX)**2) * sum((y-meanY)**2)
corr1 = corr_num/np.sqrt(corr_den)
### the Python method
# np.corrcoef returns the 2x2 correlation matrix: ones on the diagonal, r off the diagonal
corr2 = np.corrcoef(np.vstack((x,y)))
print(corr1,corr2)
OUT:
0.6195099623133035 [[1. 0.61950996]
[0.61950996 1. ]]
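The link between the two quantities can also be checked directly: dividing the covariance by the product of the sample standard deviations should reproduce r. This is a minimal sketch on freshly simulated data (the seed and variable names are assumptions, so the numbers will differ from the output above).

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
x = rng.standard_normal(66)
y = x + rng.standard_normal(66)

# covariance and standard deviations, all with the N-1 normalization
cov_xy = np.cov(x, y)[0, 1]
sx = np.std(x, ddof=1)
sy = np.std(y, ddof=1)

# correlation as standardized covariance
r_from_cov = cov_xy / (sx * sy)

# equivalent view: the covariance of the z-scored variables
zx = (x - np.mean(x)) / sx
zy = (y - np.mean(y)) / sy
r_from_z = np.cov(zx, zy)[0, 1]

r_numpy = np.corrcoef(x, y)[0, 1]
print(r_from_cov, r_from_z, r_numpy)  # all three agree up to rounding
```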
Calculating correlation with statistical significance:
r,p = stats.pearsonr(x,y)
print(r,p)
OUT:
0.6195099623133037 2.926584255137327e-08
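The p-value refers to the null hypothesis that the true correlation is zero. One common way to obtain a two-sided p-value by hand is to convert r into a t statistic with N-2 degrees of freedom. The sketch below does this on freshly simulated data (seed and names are assumptions) and compares it with stats.pearsonr. It illustrates the relationship and assumes roughly normally distributed data; it is not a description of SciPy's internal implementation.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)  # fixed seed, chosen arbitrarily
n = 66
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)

r, p = stats.pearsonr(x, y)

# t statistic for H0: no correlation, with n-2 degrees of freedom
t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))

# two-sided p-value from the upper tail of the t distribution
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(r, p, p_manual)
```

A small p-value only says that a correlation of this size would be unlikely if the true correlation were zero; the strength of the relationship is still given by r.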