Embrace Classification with Python and R!

Data Science Classification use case.

Python with its libraries, together with R, provides powerful tools for solving classification tasks.

Classification is one of the most fundamental concepts in data science.

Python Knowledge Base: Make coding great again.
- Updated: 2024-07-26 by Andrey BRATUS, Senior Data Analyst.

Classification is a supervised machine learning task in which an algorithm learns to assign a class label to examples from the problem domain. From a modeling perspective, it requires a training dataset with many examples of inputs and their known outputs. The model uses this dataset to work out how best to map input data to specific class labels. The training dataset must therefore be sufficiently representative of the problem and contain many examples of each class label.
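To make the point about having enough examples of each class concrete, here is a minimal sketch that counts the examples per class and then splits the data so both subsets keep the same class proportions. The toy labels and the placeholder feature are assumptions made for illustration; the `stratify` argument of `train_test_split` is not used elsewhere on this page.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy labels: 80 examples of class 0 and 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder single feature

# Count the examples available for each class label.
classes, counts = np.unique(y, return_counts=True)
class_counts = dict(zip(classes.tolist(), counts.tolist()))
print(class_counts)

# stratify=y keeps the class proportions the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
train_share = y_train.mean()  # share of class 1 in the Training set
test_share = y_test.mean()    # share of class 1 in the Test set
print(train_share, test_share)
```

If one class turns out to have only a handful of examples, a model may not learn it well, whichever classifier is chosen.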

Logistic Regression classification model

Logistic Regression in Python


# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset (features in all columns but the last, target in the last column)
dataset = pd.read_csv('my_dataset.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling (fit on the Training set only, then apply the same scaling to the Test set)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Training the Logistic Regression model on the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
# Predicting a new result - 2 features
# (replace feature1_value and feature2_value with actual numbers)
print(classifier.predict(sc.transform([[feature1_value, feature2_value]])))
# Predicting the Test set results (predicted labels next to the true labels)
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))
# Displaying the Confusion Matrix and the accuracy
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))
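The R version of this model predicts probabilities and then applies a 0.5 threshold. A rough Python counterpart is sketched below using `predict_proba`. The synthetic data from `make_classification` is an assumption that stands in for 'my_dataset.csv', so the block runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data stands in for 'my_dataset.csv' (an assumption).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Same scaling steps as in the listing above.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

# Column 1 of predict_proba holds the estimated probability of class 1.
prob_pred = classifier.predict_proba(X_test)[:, 1]

# Thresholding at 0.5 is what predict() effectively does for two classes.
y_pred_manual = (prob_pred > 0.5).astype(int)
y_pred = classifier.predict(X_test)
print(np.array_equal(y_pred_manual, y_pred))
```

Working with the probabilities directly allows a different threshold when the two kinds of error have different costs.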

Logistic Regression in R


# Importing the dataset
dataset = read.csv('my_dataset.csv')
# Keeping columns 3 to 5: two features followed by the Target column
dataset = dataset[3:5]
# Splitting the dataset into the Training set and Test set in R
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Target, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# Feature Scaling (features only: column 3 is the Target and must stay 0/1)
train_scaled = scale(training_set[-3])
training_set[-3] = train_scaled
# Scaling the Test set with the Training set means and standard deviations
test_set[-3] = scale(test_set[-3],
                     center = attr(train_scaled, 'scaled:center'),
                     scale = attr(train_scaled, 'scaled:scale'))
# Fitting Logistic Regression to the Training set in R
classifier = glm(formula = Target ~ .,
                 family = binomial,
                 data = training_set)
# Predicting the Test set results
prob_pred = predict(classifier, type = 'response', newdata = test_set[-3])
y_pred = ifelse(prob_pred > 0.5, 1, 0)
# Making the Confusion Matrix in R (rows: actual classes, columns: predicted classes)
cm = table(test_set[, 3], y_pred)

K-Nearest Neighbors (K-NN) classification model

K-Nearest Neighbors (K-NN) in Python


# The only difference in code from Logistic Regression (above) is the model training step
#Training the K-NN model on the Training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)
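The parameters `metric = 'minkowski'` and `p = 2` select the ordinary Euclidean distance, while `p = 1` would select the Manhattan distance. The sketch below checks the Euclidean equivalence on synthetic data, which is an assumption made only so the example runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-feature data (an assumption, not the page's dataset).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)

# Minkowski distance with p = 2 is the Euclidean distance.
knn_minkowski = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
knn_euclidean = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn_minkowski.fit(X, y)
knn_euclidean.fit(X, y)

# Both classifiers should vote with the same neighbors.
same_as_euclidean = np.array_equal(knn_minkowski.predict(X),
                                   knn_euclidean.predict(X))
print(same_as_euclidean)
```

Because K-NN relies on distances, the feature scaling step from the Logistic Regression listing matters here: an unscaled feature with a large range would dominate the distance.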

K-Nearest Neighbors (K-NN) in R


# The differences in code from Logistic Regression (above) are the model training and prediction steps
# Fitting K-NN to the Training set and Predicting the Test set results in a single call
# install.packages('class')
library(class)
y_pred = knn(train = training_set[, -3],
     test = test_set[, -3],
     cl = training_set[, 3],
     k = 5,
     prob = TRUE)

Support Vector Machine (SVM) classification model

Support Vector Machine (SVM) in Python


# The only difference in code from Logistic Regression (above) is the model training step
#Training the SVM model on the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)

Support Vector Machine (SVM) in R


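# The differences in code from Logistic Regression (above) are the model training and prediction steps
# Fitting SVM to the Training set
# install.packages('e1071')
library(e1071)
classifier = svm(formula = Target ~ .,
         data = training_set,
         type = 'C-classification',
         kernel = 'linear')
# Predicting the Test set results (predict returns class labels directly, no 0.5 threshold needed)
y_pred = predict(classifier, newdata = test_set[-3])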

Kernel SVM classification model

Kernel SVM in Python


# The only difference in code from Logistic Regression (above) is the model training step
#Training the Kernel SVM model on the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)
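To illustrate when the `rbf` kernel helps, the sketch below trains both the linear and the kernel SVM on data that no straight line can separate. The concentric-circles dataset from `make_circles` is an assumption chosen for illustration and is not part of the original page.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two concentric rings: a class boundary that is not a straight line.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Train one SVM per kernel and record its accuracy on the Test set.
scores = {}
for kernel in ('linear', 'rbf'):
    classifier = SVC(kernel=kernel, random_state=0)
    classifier.fit(X_train, y_train)
    scores[kernel] = classifier.score(X_test, y_test)
print(scores)
```

On data like this the `rbf` kernel should score clearly higher. On data that is close to linearly separable, the two kernels often perform similarly.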

Kernel SVM in R


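# The differences in code from Logistic Regression (above) are the model training and prediction steps
# Fitting Kernel SVM to the Training set
# install.packages('e1071')
library(e1071)
classifier = svm(formula = Target ~ .,
         data = training_set,
         type = 'C-classification',
         kernel = 'radial')
# Predicting the Test set results (predict returns class labels directly, no 0.5 threshold needed)
y_pred = predict(classifier, newdata = test_set[-3])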

Naive Bayes classification model

Naive Bayes in Python


# The only difference in code from Logistic Regression (above) is the model training step
#Training the Naive Bayes model on the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

Naive Bayes in R


# The differences in code from Logistic Regression (above) are the target encoding, model training and prediction steps
# Encoding the target as factor (do this before splitting the dataset)
dataset$Target = factor(dataset$Target, levels = c(0, 1))
# Fitting Naive Bayes to the Training set
# install.packages('e1071')
library(e1071)
classifier = naiveBayes(x = training_set[-3],
                y = training_set$Target)
# Predicting the Test set results
y_pred = predict(classifier, newdata = test_set[-3])

Decision Tree classification model

Decision Tree in Python


# The only difference in code from Logistic Regression (above) is the model training step
#Training the Decision Tree model on the Training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)
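A fitted decision tree can be printed as plain if/else rules, which is one reason the model is easy to interpret. The sketch below uses `export_text` from `sklearn.tree`. The synthetic data, the feature names `feature1` and `feature2`, and the `max_depth = 2` limit are assumptions added to keep the output short.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic two-feature data (an assumption, not the page's dataset).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)

# max_depth=2 is an illustrative limit so the printed tree stays small.
classifier = DecisionTreeClassifier(criterion='entropy', max_depth=2,
                                    random_state=0)
classifier.fit(X, y)

# Render the learned splits as indented text rules.
rules = export_text(classifier, feature_names=['feature1', 'feature2'])
print(rules)
```

Each indented line is a threshold test on one feature, and each leaf reports the predicted class.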

Decision Tree in R


# The differences in code from Logistic Regression (above) are the model training and prediction steps
# Target must be encoded as a factor before splitting, otherwise rpart fits a regression tree
# Fitting Decision Tree to the Training set
# install.packages('rpart')
library(rpart)
classifier = rpart(formula = Target ~ .,
           data = training_set)
# Predicting the Test set results
y_pred = predict(classifier, newdata = test_set[-3], type = 'class')

Random Forest classification model

Random Forest in Python


# The only difference in code from Logistic Regression (above) is the model training step
#Training the Random Forest model on the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)
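Since the Python listings differ only in the training step, all seven models can be compared in one loop. The sketch below is one way to do that. The synthetic data is an assumption standing in for 'my_dataset.csv', and the resulting ranking says nothing about how the models would rank on a real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data (an assumption, not the page's dataset).
X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# The same constructors and parameters as in the listings on this page.
models = {
    'Logistic Regression': LogisticRegression(random_state=0),
    'K-NN': KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2),
    'SVM': SVC(kernel='linear', random_state=0),
    'Kernel SVM': SVC(kernel='rbf', random_state=0),
    'Naive Bayes': GaussianNB(),
    'Decision Tree': DecisionTreeClassifier(criterion='entropy',
                                            random_state=0),
    'Random Forest': RandomForestClassifier(n_estimators=10,
                                            criterion='entropy',
                                            random_state=0),
}

# Only the training step changes; splitting, scaling and scoring are shared.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name}: {acc:.3f}')
```

A single train/test split gives a noisy estimate, so small differences between the models in this output should not be over-interpreted.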

Random Forest in R


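# The differences in code from Logistic Regression (above) are the model training and prediction steps
# Target must be encoded as a factor before splitting, otherwise randomForest fits a regression
# Fitting Random Forest to the Training set
# install.packages('randomForest')
library(randomForest)
set.seed(123)
classifier = randomForest(x = training_set[-3],
                  y = training_set$Target,
                  ntree = 100)
# Predicting the Test set results
y_pred = predict(classifier, newdata = test_set[-3])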

Classification models tips and features.

