Analysis of Clusters With Indian Patent Data Using Different Word Embedding Techniques
Main Article Content
Abstract
This study employs advanced Unsupervised Machine Learning (UML) techniques, including K-means and Agglomerative clustering, to analyze descriptive Indian Patent data. Utilizing silhouette score evaluation, elbow method, and dendrogram analysis, optimal cluster numbers are determined. Various word embedding methods like TF-IDF, Word2Vec, and Countvectorizer, combined with rigorous text processing, are explored. Robust testing of categorical and numerical features yields a high silhouette score of 0.8965 for 2 clusters, showcasing Agglomerative clustering's effectiveness. The research emphasizes the crucial role of UML techniques, word embedding methodologies, and comprehensive text processing in revealing complex structures within Indian Patent data. Besides advancing unsupervised learning methodologies, this work aids scholars, practitioners, and policymakers in comprehending the Indian patent landscape, fostering innovation, and technological progress