This document discusses using a K-means clustering algorithm to extract concepts from ambiguous text documents. It involves preprocessing the text by tokenizing, removing stop words, and stemming words. The words are then represented as vectors and dimensionality reduction using PCA is applied. Finally, K-means clustering is used to group similar words into clusters to identify the overall concepts in the document without reading the entire text. The aim is to help users understand the key topics in a document in a time-efficient manner without having to read the full text.
Report
Share
Report
Share
1 of 14
Download to read offline
More Related Content
IRJET- Concept Extraction from Ambiguous Text Document using K-Means