Analysis of Clusters With Indian Patent Data Using Different Word Embedding Techniques

Pankaj Beldar
Mohansingh Pardeshi
Rahul Rakhade
Shilpa Mene

Abstract

This study applies Unsupervised Machine Learning (UML) techniques, specifically K-means and Agglomerative clustering, to descriptive Indian Patent data. The optimal number of clusters is determined using silhouette score evaluation, the elbow method, and dendrogram analysis. Several word embedding methods, including TF-IDF, Word2Vec, and CountVectorizer, are explored in combination with rigorous text processing. Testing on categorical and numerical features yields a high silhouette score of 0.8965 for 2 clusters, demonstrating the effectiveness of Agglomerative clustering. The research emphasizes the crucial role of UML techniques, word embedding methodologies, and comprehensive text processing in revealing complex structures within Indian Patent data. Besides advancing unsupervised learning methodologies, this work helps scholars, practitioners, and policymakers understand the Indian patent landscape, fostering innovation and technological progress.
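The silhouette score used above to choose the number of clusters can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, Euclidean distance, and four hypothetical 2-D points that stand in for embedded patent descriptions. For each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to any other cluster, and the coefficient is (b - a) / max(a, b).

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least 2 clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a singleton cluster contributes 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical toy data (assumption): two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# A labelling that matches the two groups scores close to 1.
score = silhouette_score(points, [0, 0, 1, 1])

# A labelling that splits each group across clusters scores below 0.
bad_score = silhouette_score(points, [0, 1, 0, 1])

print(f"good labelling: {score:.4f}, bad labelling: {bad_score:.4f}")
```

Values near 1 indicate compact, well-separated clusters, which is how a score such as the reported 0.8965 for 2 clusters would be read. In practice the score would be computed for each candidate cluster count, and the count with the highest score would be kept.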

Author Biographies

Pankaj Beldar

K. K. Wagh Institute of Engineering Education and Research

Mohansingh Pardeshi

K. K. Wagh Institute of Engineering Education and Research

Rahul Rakhade

K. K. Wagh Institute of Engineering Education and Research

Shilpa Mene

K. K. Wagh Institute of Engineering Education and Research, Nashik