Dive into Clustering Models with Python and R!

Data Science Clustering use case.

Python with its libraries, together with R, provides powerful tools for cluster analysis tasks.

Cluster analysis, or simply clustering, is a branch of machine learning (ML) that deals mainly with unsupervised tasks and usually involves automatically discovering natural groupings in data.


Python Knowledge Base: Make coding great again.
- Updated: 2025-07-11 by Andrey BRATUS, Senior Data Analyst.

Unlike supervised learning (like predictive modeling), clustering algorithms only interpret the input data and find natural groups or clusters using given features.

In other words, clustering techniques apply when there is no class to be predicted and the instances are instead to be divided into natural groups, or clusters.

A cluster is often an area of density in the feature space where examples from the domain (observations or rows of data) are closer to that cluster than to any other cluster. A cluster may have a center (the centroid), which is either a sample or a point in the feature space, and it may have a boundary or extent.
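
To make the centroid idea concrete, here is a minimal, dependency-free sketch. The helper names `centroid` and `nearest_centroid` and the sample points are illustrative assumptions, not part of any library used in this article.

```python
import math

def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

# Two hypothetical groups of 2-D observations (made-up values).
group_a = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
group_b = [(8.0, 8.0), (8.0, 10.0), (10.0, 8.0), (10.0, 10.0)]

centroids = [centroid(group_a), centroid(group_b)]
print(centroids)                                # [(2.0, 2.0), (9.0, 9.0)]
print(nearest_centroid((2.5, 2.0), centroids))  # 0
print(nearest_centroid((7.0, 9.5), centroids))  # 1
```

A new observation belongs to the cluster whose centroid is nearest, which is the rule K-Means applies at every iteration.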

K-Means Clustering model

K-Means Clustering in Python


#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('my_dataset.csv')
#Selecting 2 feature columns (zero-based positions 2 and 3) so the clusters can be plotted in 2-D
X = dataset.iloc[:, [2, 3]].values
#Elbow method: fit K-Means for k = 1..10 and record the within-cluster sum of squares (WCSS)
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', n_init = 10, random_state = 42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  #inertia_ is the WCSS of the fitted model
plt.plot(range(1, 11), wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
#Training the K-Means model on the dataset (5 clusters as an example; use the value at the elbow of the plot)
kmeans = KMeans(n_clusters = 5, init = 'k-means++', n_init = 10, random_state = 42)
y_kmeans = kmeans.fit_predict(X)
#Visualising the clusters using the 2 selected features
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow', label = 'Centroids')
plt.title('Clusters')
plt.xlabel('Feature1')
plt.ylabel('Feature2')
plt.legend()
plt.show()
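
To show what the elbow plot measures, the following sketch implements a naive K-Means (Lloyd's algorithm) and the WCSS by hand, using only the standard library. It is not how scikit-learn works internally: the function names `lloyd_kmeans` and `wcss`, the sample points, and the seeding with the first k points (instead of `k-means++`) are simplifying assumptions for illustration.

```python
import math

def wcss(points, labels, centroids):
    """Within-cluster sum of squares: squared distance of each point to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def lloyd_kmeans(points, k, max_iter=100):
    """Naive K-Means (Lloyd's algorithm), seeded with the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical 2-D observations forming two obvious groups (made-up values).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

for k in (1, 2):
    labels, centroids = lloyd_kmeans(points, k)
    print(k, round(wcss(points, labels, centroids), 3))
```

On this toy data the WCSS falls sharply when going from one cluster to two. The elbow plot looks for the point after which such drops flatten out.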

K-Means Clustering in R


#Importing the dataset
dataset = read.csv('my_dataset.csv')
#specifying 2 features for further visualisation
dataset = dataset[3:4]
#Elbow method to find the optimal number of clusters
set.seed(123)  # kmeans() starts from random centres, so fix the seed for reproducible WCSS values
wcss = vector()
for (i in 1:10) wcss[i] = sum(kmeans(dataset, i)$withinss)
plot(1:10,
     wcss,
     type = 'b',
     main = paste('Elbow Method'),
     xlab = 'Clusters',
     ylab = 'WCSS')

# Fitting K-Means to the dataset
set.seed(123)
kmeans_model = kmeans(x = dataset, centers = 5)  # named kmeans_model to avoid masking the kmeans() function
y_kmeans = kmeans_model$cluster
# Visualising the clusters
# install.packages('cluster')
library(cluster)
clusplot(dataset,
         y_kmeans,
         lines = 0,
         shade = TRUE,
         color = TRUE,
         labels = 2,
         plotchar = FALSE,
         span = TRUE,
         main = paste('Clusters'),
         xlab = 'Feature1',
         ylab = 'Feature2')

Hierarchical Clustering model

Hierarchical Clustering in Python


#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('my_dataset.csv')
#specifying 2 features for further visualisation
X = dataset.iloc[:, [2, 3]].values
#Dendrogram usage to find the optimal number of clusters
import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Observation points')
plt.ylabel('Euclidean distances')
plt.show()
#Training the Hierarchical Clustering model on the dataset
from sklearn.cluster import AgglomerativeClustering
#Ward linkage always uses Euclidean distance; the old affinity argument was removed in scikit-learn 1.4
hc = AgglomerativeClustering(n_clusters = 5, linkage = 'ward')
y_hc = hc.fit_predict(X)
#Visualising the clusters using the 2 selected features
plt.scatter(X[y_hc == 0, 0], X[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_hc == 1, 0], X[y_hc == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_hc == 2, 0], X[y_hc == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_hc == 3, 0], X[y_hc == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_hc == 4, 0], X[y_hc == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.title('Clusters')
plt.xlabel('Feature1')
plt.ylabel('Feature2')
plt.legend()
plt.show()
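
The merge rule behind Ward linkage can be sketched in a few lines of plain Python. This is a naive illustration under stated assumptions, not the optimised routine used by SciPy or scikit-learn: the names `ward_cost` and `ward_clusters` and the sample points are hypothetical. At each step it merges the two clusters whose union increases the within-cluster sum of squares the least.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca = [sum(c) / len(a) for c in zip(*a)]  # centroid of cluster a
    cb = [sum(c) / len(b) for c in zip(*b)]  # centroid of cluster b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering: merge the cheapest pair until n_clusters remain."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical 2-D observations forming two obvious groups (made-up values).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

for cluster in ward_clusters(points, 2):
    print(cluster)
```

Every pass re-evaluates all pairs, so the sketch is only practical for small inputs. It is meant to show the criterion, not to replace the library calls above.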

Hierarchical Clustering in R


#Importing the dataset
dataset = read.csv('my_dataset.csv')
#specifying 2 features for further visualisation
dataset = dataset[3:4]
# Dendrogram to find the optimal number of clusters
# 'ward.D2' applies Ward's criterion to unsquared Euclidean distances, matching 'ward' in the Python example
hc = hclust(d = dist(dataset, method = 'euclidean'), method = 'ward.D2')
plot(hc,
     main = 'Dendrogram',
     xlab = 'Observation points',
     ylab = 'Euclidean distances')

# Fitting Hierarchical Clustering to the dataset: cut the tree into 5 clusters
y_hc = cutree(hc, 5)
# Visualising the clusters
# install.packages('cluster')
library(cluster)
clusplot(dataset,
         y_hc,
         lines = 0,
         shade = TRUE,
         color = TRUE,
         labels = 2,
         plotchar = FALSE,
         span = TRUE,
         main = paste('Clusters'),
         xlab = 'Feature1',
         ylab = 'Feature2')
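
In either language, a common heuristic for reading the dendrogram is to cut it across the tallest vertical stretch that no merge crosses. The Python sketch below applies that heuristic to a list of merge heights. The function name `suggest_k` and the sample heights are assumptions for illustration. With real data the heights could be taken from the third column of the SciPy linkage matrix or from the `height` component of an R `hclust` object.

```python
def suggest_k(merge_heights):
    """Suggest a cluster count by cutting the dendrogram at its largest height gap."""
    heights = sorted(merge_heights)
    if len(heights) < 2:
        raise ValueError("need at least two merges to compare gaps")
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    # Cutting inside the widest gap keeps the first (widest + 1) merges.
    # n points joined by (widest + 1) merges leave n - (widest + 1) clusters,
    # and n equals len(heights) + 1.
    return len(heights) - widest

# Hypothetical merge heights for 6 observations (5 merges), made-up values.
heights = [0.7, 0.7, 1.3, 1.3, 17.0]
print(suggest_k(heights))  # 2
```

The result is only a starting point: when several gaps are of similar size, the choice of cluster count remains a judgment call.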

Clustering models tips and features.

Grouping the Unseen: Elevate Your Data Analysis with Python and R!

