Analysis of Clusters With Indian Patent Data Using Different Word Embedding Techniques

Pankaj Beldar; Mohansingh Pardeshi; Rahul Rakhade; Shilpa Mene

doi:10.53555/sfs.v10i3.2110

Published: Dec 16, 2023

Keywords: K-means, Agglomerative clustering, Word embedding, Patents, Silhouette Score

Abstract

This study applies Unsupervised Machine Learning (UML) techniques, specifically K-means and Agglomerative clustering, to descriptive Indian patent data. The optimal number of clusters is determined using silhouette score evaluation, the elbow method, and dendrogram analysis. Several word embedding methods, including TF-IDF, Word2Vec, and CountVectorizer, are explored in combination with rigorous text processing. Testing across categorical and numerical features yields a high silhouette score of 0.8965 for 2 clusters, demonstrating the effectiveness of Agglomerative clustering. The research emphasizes the role of UML techniques, word embedding methods, and comprehensive text processing in revealing complex structures within Indian patent data. Beyond advancing unsupervised learning methodology, this work helps scholars, practitioners, and policymakers understand the Indian patent landscape, fostering innovation and technological progress.
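For readers unfamiliar with the silhouette score used above to compare clusterings, the following is a minimal sketch of how the metric is computed. It is not the authors' code: the function names `euclidean` and `silhouette_score`, the Euclidean distance choice, and the four toy points are all assumptions made for illustration, and the abstract does not say which implementation or distance measure the study used. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to any other cluster, and the point's score is `(b - a) / max(a, b)`; the overall score is the mean over all points and lies between -1 and 1.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least 2 clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a singleton cluster contributes 0.
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:  # identical points would give 0/0; count them as 0
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy vectors standing in for embedded patent descriptions.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
bad_labels = [0, 1, 0, 1]   # splits each natural group

good_score = silhouette_score(toy_points, good_labels)
bad_score = silhouette_score(toy_points, bad_labels)
print(f"well-separated labeling: {good_score:.4f}")
print(f"mismatched labeling:     {bad_score:.4f}")
```

A score close to 1, such as the 0.8965 reported for 2 clusters, indicates that points sit much closer to their own cluster than to the nearest other cluster. In practice the vectors would come from an embedding step such as TF-IDF or Word2Vec rather than hand-written coordinates.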

Issue

Vol. 10 No. 3 (2023)

Section

Articles

Author Biographies

Pankaj Beldar

K.K.Wagh Institute of Engineering Education and Research

Mohansingh Pardeshi

K.K.Wagh Institute of Engineering Education and Research

Rahul Rakhade

K.K.Wagh Institute of Engineering Education and Research

Shilpa Mene

K. K. Wagh Institute of Engineering Education & Research, Nashik
