Hub based K-Means Subspace Clustering for Improved Efficiency

Authors

  • Anuradha Sanapala, B. Jaya Lakshmi, K. B. Madhuri

DOI:

https://doi.org/10.17762/msea.v72i1.2398

Abstract

Clustering is a primary data mining task that groups data points based on their similarities. As the dimensionality of a dataset increases, data points appear nearly equidistant from one another, which makes distance metrics less meaningful. Clustering in subspaces attempts to mitigate this curse of dimensionality to some extent. However, determining the clusters relevant to a subspace is a challenging task. Hubs are data points that appear as neighbours of most other data points. Clusters are therefore usually surrounded by such hubs, and it is efficient to use these hubs as seed points when performing partitional clustering. In this paper, Hub based K-means Subspace Clustering (HKSC) is proposed, where K refers to the number of clusters to be identified. The initial seed points are selected using Hubness Scores on each subspace, and clusters are found using the partitional method. The proposed algorithm is evaluated and compared with state-of-the-art subspace clustering algorithms such as SUBCLU, SCHISM, and PCoC in terms of two cluster quality metrics, purity and silhouette coefficient. The results show that the proposed algorithm outperforms the existing algorithms. With regard to purity, HKSC shows an average improvement of 71%, 18%, and 15% over SUBCLU, SCHISM, and PCoC respectively. With respect to silhouette coefficient, the clustering result is 300% better than that of SUBCLU and 54% better than that of SCHISM. The execution time of HKSC is 56% less than that of SUBCLU. The proposed approach uses the concept of hubs to mine subspace clusters efficiently in partitional subspace clustering.
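The abstract does not spell out the HKSC procedure, so the following is only a minimal sketch of the general idea under stated assumptions: the hubness score of a point is taken to be the number of times it occurs in the k-nearest-neighbour lists of the other points, the highest-scoring points in a given subspace are used as K-means seeds, and purity is computed as the fraction of points that belong to the majority class of their cluster. All function names, the toy data, and the choice of subspace are hypothetical and are not taken from the paper.

```python
import math

def project(points, dims):
    """Restrict every point to a subspace given as a tuple of dimension indices."""
    return [tuple(float(p[d]) for d in dims) for p in points]

def hubness_scores(points, k):
    """Count how often each point occurs in the k-NN lists of the other points."""
    scores = [0] * len(points)
    for i, p in enumerate(points):
        # Rank all other points by distance; ties are broken by index.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in ranked[:k]:
            scores[j] += 1
    return scores

def hub_seeds(points, n_clusters, k):
    """Assumed seeding rule: take the n_clusters points with the highest hubness."""
    scores = hubness_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: (-scores[i], i))
    return [points[i] for i in order[:n_clusters]]

def kmeans(points, seeds, max_iter=50):
    """Plain Lloyd iterations starting from the given seed points."""
    centroids = list(seeds)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def purity(labels, truth):
    """Fraction of points that fall in the majority true class of their cluster."""
    total = 0
    for c in set(labels):
        classes = [t for lab, t in zip(labels, truth) if lab == c]
        total += max(classes.count(t) for t in set(classes))
    return total / len(labels)

# Toy data: two clusters visible in dimensions (0, 1); dimension 2 is noise.
data = [(0, 0, 7), (1, 0, 1), (-1, 0, 9), (0, 1, 3), (0, -1, 5),
        (10, 10, 2), (11, 10, 8), (9, 10, 4), (10, 11, 6), (10, 9, 0)]
truth = ["a"] * 5 + ["b"] * 5

sub = project(data, (0, 1))           # hypothetical subspace of interest
seeds = hub_seeds(sub, n_clusters=2, k=1)
labels, centroids = kmeans(sub, seeds)
score = purity(labels, truth)
```

In this toy data the central point of each group is the nearest neighbour of all its surrounding points, so it receives the highest hubness score and is chosen as a seed. The sketch does not guard against the top-scoring hubs all falling inside the same cluster; a real implementation would need some rule to spread the seeds, and the paper may handle this differently.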

Published

2023-05-26

How to Cite

Anuradha Sanapala, B. Jaya Lakshmi, K. B. Madhuri. (2023). Hub based K-Means Subspace Clustering for Improved Efficiency. Mathematical Statistician and Engineering Applications, 72(1), 1679–1691. https://doi.org/10.17762/msea.v72i1.2398

Section

Articles