Website Clustering and Classification with ML
This project clusters webpages by their text content and classifies new pages into the resulting clusters. Page text is converted into word vectors, which capture semantic relationships between words, so pages on similar topics end up close together in vector space. A clustering algorithm then groups pages with similar vector representations, which makes related content easier to organize and navigate. The cluster assignments serve as labels for training a classification model, which assigns previously unseen pages to one of the existing clusters or categories. The goal is to improve information retrieval and content categorization, supporting better content management and a smoother web browsing experience.
Natural Language Processing
Machine Learning
Python
NLTK
fastText
scikit-learn
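
The pipeline described above (embed page text, cluster the embeddings, train a classifier on the cluster labels) could look roughly like the following minimal sketch. It is not the project's actual code. The tiny `TOY_VECTORS` table is a hypothetical stand-in for fastText word vectors, the regex tokenizer stands in for NLTK tokenization, and the choice of k-means and logistic regression is an assumption, since the description does not name specific algorithms.

```python
import re

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Hypothetical 2-D embeddings standing in for real fastText word vectors.
# In a real setup these would come from a trained model, for example via
# fasttext.load_model(...) and model.get_word_vector(word).
TOY_VECTORS = {
    "football": [1.0, 0.0], "goal": [0.9, 0.1],
    "match": [0.8, 0.2], "team": [0.9, 0.0],
    "recipe": [0.0, 1.0], "bake": [0.1, 0.9],
    "oven": [0.0, 0.8], "flour": [0.2, 0.9],
}


def embed(text):
    """Average the vectors of all known words to get one vector per page."""
    tokens = re.findall(r"[a-z]+", text.lower())  # stand-in for NLTK tokenization
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vectors:
        return np.zeros(2)  # page with no known words
    return np.mean(vectors, axis=0)


# Illustrative page texts (made up for this sketch).
pages = [
    "Football team scores late goal in match",
    "The team won the football match",
    "Bake the recipe in a hot oven",
    "Recipe needs flour before you bake",
]

# Step 1: turn each page into a fixed-size vector.
X = np.vstack([embed(p) for p in pages])

# Step 2: group similar pages. The number of clusters is chosen by hand here.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_

# Step 3: train a classifier that uses the cluster ids as labels.
clf = LogisticRegression().fit(X, cluster_ids)

# Step 4: assign a previously unseen page to one of the clusters.
new_page = "Preheat the oven and add flour"
predicted = clf.predict(embed(new_page).reshape(1, -1))[0]

print("cluster ids:", cluster_ids)
print("new page assigned to cluster:", predicted)
```

Averaging word vectors is only one simple way to represent a page. The cluster numbers themselves are arbitrary, so what matters is which pages share a cluster, not the id each cluster receives.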