Relevance Feature Discovery for Text Mining

Authors

  • Vikrant Sharma

DOI:

https://doi.org/10.17762/msea.v70i1.2303

Abstract

Because text documents contain a large number of terms and data patterns, it is difficult to guarantee the quality of the relevance features discovered in them for describing user preferences. Most popular text mining and classification techniques have adopted term-based strategies, but all of them suffer from the problems of polysemy and synonymy. It has long been hypothesized that pattern-based approaches should outperform term-based ones in describing user preferences; however, how to use large-scale patterns effectively remains an open problem in text mining. This research introduces a novel model for relevance feature discovery to address this problem. It discovers both positive and negative patterns in text documents as higher-level features and uses them, rather than low-level features (terms) alone, to describe user preferences. It also organises terms into categories and updates term weights according to each term's specificity and its distribution in patterns. Extensive experiments with this model on the RCV1, TREC topics, and Reuters-21578 datasets show that it performs noticeably better than both state-of-the-art term-based methods and pattern-based methods.
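The abstract only outlines how discovered patterns are turned into term weights. The following Python sketch illustrates the general idea under explicit assumptions and is not the paper's published algorithm. It assumes the patterns are already mined and given per document as termsets with relative supports; each pattern's support is spread evenly over its terms ("deployed"). It also takes a term's specificity to be (positive coverage minus negative coverage) divided by the number of positive documents. The thresholds `low`/`high`, the base weight, and the update rule are hypothetical choices for illustration, as are all function names.

```python
from collections import defaultdict

# Hypothetical representation: a pattern is a termset mined from one document
# plus its relative support in that document. Pattern mining is out of scope.
Pattern = tuple[frozenset[str], float]
Doc = list[Pattern]


def deploy(docs: list[Doc]) -> dict[str, float]:
    """Spread each pattern's support evenly over its terms, summed per term."""
    weights: dict[str, float] = defaultdict(float)
    for patterns in docs:
        for terms, support in patterns:
            for term in terms:
                weights[term] += support / len(terms)
    return dict(weights)


def coverage(docs: list[Doc], term: str) -> int:
    """Count documents having at least one pattern that contains `term`."""
    return sum(any(term in terms for terms, _ in patterns) for patterns in docs)


def specificity(term: str, pos: list[Doc], neg: list[Doc]) -> float:
    """Assumed form: (positive coverage - negative coverage) / |pos|."""
    return (coverage(pos, term) - coverage(neg, term)) / len(pos)


def discover_features(pos: list[Doc], neg: list[Doc],
                      low: float = 0.2, high: float = 0.6) -> dict[str, tuple[str, float]]:
    """Return {term: (category, weight)} for every term seen in any pattern."""
    pos_w, neg_w = deploy(pos), deploy(neg)
    result: dict[str, tuple[str, float]] = {}
    for term in pos_w.keys() | neg_w.keys():
        # Hypothetical base weight: positive minus negative deployed support.
        w = pos_w.get(term, 0.0) - neg_w.get(term, 0.0)
        spe = specificity(term, pos, neg)
        if spe > high:
            # Concentrated in relevant documents: boost the weight.
            result[term] = ("positive_specific", w + w * spe)
        elif spe >= low:
            # Spread across documents: keep the weight unchanged.
            result[term] = ("general", w)
        else:
            # Weak or negative specificity: penalize the weight.
            result[term] = ("negative_specific", w - abs(w * spe))
    return result


if __name__ == "__main__":
    # Toy relevant (positive) and irrelevant (negative) documents.
    pos_docs = [
        [(frozenset({"market", "stock"}), 0.6), (frozenset({"stock"}), 0.4)],
        [(frozenset({"stock", "price"}), 1.0)],
    ]
    neg_docs = [[(frozenset({"price", "soup"}), 1.0)]]
    for term, (category, weight) in sorted(discover_features(pos_docs, neg_docs).items()):
        print(f"{term:8s} {category:18s} {weight:+.2f}")
```

With these toy inputs, `stock` comes out positive specific, `market` general, and `price` and `soup` negative specific. Splitting support evenly lets a term that occurs in many patterns accumulate weight, while the specificity step boosts terms concentrated in relevant documents and penalizes those that are also common in irrelevant ones.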


Published

2021-01-31

How to Cite

Sharma, V. (2021). Relevance Feature Discovery for Text Mining. Mathematical Statistician and Engineering Applications, 70(1), 225–233. https://doi.org/10.17762/msea.v70i1.2303

Issue

Vol. 70 No. 1 (2021)

Section

Articles