Data Science X-Validation use case.
When it comes to model performance estimation in most case we trust it's KPIs like accuracy, MAPE, R2 etc. But it's easy to fall in wrong conclusions as we can be just lucky or even unlucky to get some numbers with existing dataset split in train and test set with fixed random_state.
Python Knowledge Base: Make coding great again.
- Updated:
2024-11-20 by Andrey BRATUS, Senior Data Analyst.
To ovecome this situation k-Fold Cross Validation technique is used which runs models performanse estimation k-Fold times, for classification tasks accuracy metric is calculated, for regression case R2 is evaluated. The cv = 10 determines number of split times wich is default and recommended number for most of the cases. Python code example below is for classification case, k-Fold Cross Validation part for regression task is very similar.
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('my_dataset.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)
#Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#Training the model - Kernel SVM
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)
#Prediction and Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)
#Calculating k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))