Z-score standardization in statistics.


A standard score, or z-score, puts values on a common scale by dividing each value's deviation from the mean by the standard deviation (std) of the dataset. The result measures how many standard deviations a given data point lies from the center of the data, that is, from the mean.
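Written as a formula, the definition looks as follows. The symbols are chosen here for illustration: x_i is a data point, \bar{x} the mean, s the standard deviation, and n the number of data points.

```latex
z_i = \frac{x_i - \bar{x}}{s},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

The n-1 in s gives the sample standard deviation. This is an assumption made to match the ddof=1 setting used in the code in this post; dividing by n instead gives the population version.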

Z-score standardization with Python.


A standardized data set has mean 0 and standard deviation 1, and it retains the shape of the original data set (same skewness and kurtosis).
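These properties can be checked numerically. The snippet below is a minimal sketch: the seeded sample and the variable names (raw, z) are illustrative choices rather than part of the original example. It compares skewness and excess kurtosis before and after z-scoring.

```python
import numpy as np
from scipy import stats

# illustrative data: squared Poisson counts, seeded for reproducibility
rng = np.random.default_rng(0)
raw = rng.poisson(3, 1000) ** 2

# z-score using the sample standard deviation (ddof=1)
z = (raw - raw.mean()) / raw.std(ddof=1)

# mean is 0 and std is 1, up to floating-point error
print(f'mean = {z.mean():.6f}; std = {z.std(ddof=1):.6f}')

# shifting and rescaling leave the shape statistics unchanged
print(f'skewness: {stats.skew(raw):.4f} -> {stats.skew(z):.4f}')
print(f'kurtosis: {stats.kurtosis(raw):.4f} -> {stats.kurtosis(z):.4f}')
```

Skewness and kurtosis are themselves built from standardized deviations, so subtracting a constant and dividing by a positive constant cannot change them.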



Creating data:



import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

data = np.random.poisson(3,1000)**2

# compute the mean and std
datamean = np.mean(data)
datastd  = np.std(data,ddof=1)

# the previous two lines are equivalent to the following two lines
#datamean = data.mean()
#datastd  = data.std(ddof=1)



plt.plot(data,'s',markersize=3)
plt.xlabel('Data index')
plt.ylabel('Data value')
plt.title(f'Mean = {np.round(datamean,2)}; std = {np.round(datastd,2)}')

plt.show()  

[Figure: Creating data for z-score]


Z-scoring and visualisation:



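# z-score is data minus mean, divided by the sample std
dataz = (data-datamean) / datastd

# can also use the SciPy function; pass ddof=1 to match the line above
# (stats.zscore defaults to ddof=0, the population std)
dataz = stats.zscore(data, ddof=1)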

# compute the mean and std
dataZmean = np.mean(dataz)
dataZstd  = np.std(dataz,ddof=1)

plt.plot(dataz,'s',markersize=3)
plt.xlabel('Data index')
plt.ylabel('Z-scored data value')
plt.title(f'Mean = {np.round(dataZmean,2)}; std = {np.round(dataZstd,2)}')

plt.show()

[Figure: Z-scoring and visualisation]
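One detail is worth checking when swapping the manual formula for stats.zscore. SciPy divides by the population standard deviation (ddof=0) unless a ddof argument is passed, whereas the manual computation in this post uses ddof=1. The sketch below uses an assumed seeded sample and illustrative variable names. It shows that the two results agree once ddof=1 is passed, and that the default differs by a constant factor.

```python
import numpy as np
from scipy import stats

# illustrative data, seeded for reproducibility
rng = np.random.default_rng(0)
sample = rng.poisson(3, 1000) ** 2
n = sample.size

# manual z-score with the sample standard deviation
z_manual = (sample - sample.mean()) / sample.std(ddof=1)

# SciPy with its default (ddof=0) and with ddof=1
z_default = stats.zscore(sample)
z_sample = stats.zscore(sample, ddof=1)

# manual and ddof=1 results coincide
print(np.allclose(z_manual, z_sample))

# the default result is larger by a factor of sqrt(n / (n - 1))
print(np.allclose(z_default, z_manual * np.sqrt(n / (n - 1))))
```

With 1000 points the factor is about 1.0005, so the difference disappears in the plot title after rounding, but it becomes noticeable for small samples.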



