Relevance Feature Discovery for Text Mining
DOI:
https://doi.org/10.17762/msea.v70i1.2303
Abstract
Because text documents contain very large numbers of terms and data patterns, it is difficult to guarantee the quality of the relevant features discovered in them to describe user preferences. Most widely used text mining and classification techniques have adopted term-based approaches, but all of them suffer from the problems of polysemy and synonymy. It has long been hypothesised that pattern-based approaches should describe user preferences better than term-based ones, yet how to use large-scale patterns effectively remains a hard problem in text mining. This research introduces a novel model for relevance feature discovery to address this problem. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also organises terms into categories and updates term weights according to their specificity and their distribution in the patterns. Substantial experiments with this model on the RCV1, TREC topics, and Reuters-21578 datasets show that it performs noticeably better than both state-of-the-art term-based methods and pattern-based methods.
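The abstract does not give the model's formulas, so the following is only a minimal sketch of the idea of organising terms into categories by specificity. It assumes, purely for illustration, that a term's specificity is the fraction of positive (relevant) documents containing it minus the fraction of negative (irrelevant) documents containing it, and that two thresholds split terms into "positive", "general", and "negative" groups. The function names, the thresholds, and the use of raw terms in place of mined patterns are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))  # set(): count each term once per document
    return counts


def specificity(term, pos_df, neg_df, n_pos, n_neg):
    """Assumed score: coverage in positive docs minus coverage in negative docs."""
    pos_rate = pos_df[term] / n_pos if n_pos else 0.0
    neg_rate = neg_df[term] / n_neg if n_neg else 0.0
    return pos_rate - neg_rate


def categorise_terms(pos_docs, neg_docs, upper=0.6, lower=-0.6):
    """Label every term as positive, general, or negative (hypothetical thresholds)."""
    pos_df = document_frequency(pos_docs)
    neg_df = document_frequency(neg_docs)
    categories = {}
    for term in set(pos_df) | set(neg_df):
        score = specificity(term, pos_df, neg_df, len(pos_docs), len(neg_docs))
        if score > upper:
            label = "positive"
        elif score < lower:
            label = "negative"
        else:
            label = "general"
        categories[term] = (label, score)
    return categories


if __name__ == "__main__":
    # Toy data: "bank" is polysemous and appears in both relevant and irrelevant documents.
    relevant = [["stock", "market", "bank"], ["stock", "bank"]]
    irrelevant = [["river", "bank"], ["river", "water"]]
    for term, (label, score) in sorted(categorise_terms(relevant, irrelevant).items()):
        print(f"{term}: {label} ({score:+.2f})")
```

In this toy example an ambiguous term such as "bank" ends up in the general group, while terms that occur only on one side are treated as specific. A weighting step could then boost or penalise terms according to their group.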