
TEXT MINING – STOP WORDS – A BRIEF INTRODUCTION

  • By admin
  • February 29, 2016

While reading a document, you will have come across words like ‘a’, ‘an’ and ‘the’, which occur very frequently but carry little significance for the text in question. Granted, such words are needed to form grammatical sentences, but they carry little weight as far as search results are concerned.

What Are Stop Words?

Stop words are commonly repeated words such as ‘is’, ‘are’, ‘an’, ‘this’, ‘that’, ‘these’, ‘those’, ‘who’ and ‘if’, which occur very frequently in almost any text. For example, in a search query like ‘What are the top 10 best short stories of all time?’ the key phrase is ‘short stories’. However, because words like ‘what’, ‘are’, ‘the’, ‘of’ and ‘all’ occur far more frequently, a search engine that treats every word alike could end up returning results that are irrelevant to the question you asked.

To counteract this, a stop word list is used so that such words are excluded. This helps the search engine return relevant results instead of matching on inconsequential words. The search engine can then concentrate on important keywords like ‘best’ and ‘stories’, producing results in line with what the user is looking for.
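The filtering step described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine works: the stop word list is a small hand-picked sample based on the words mentioned in this post, and the function name remove_stop_words and the simple regex tokenizer are assumptions made for the example.

```python
import re

# Illustrative sample only; real stop word lists are much longer.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "this", "that", "these",
    "those", "who", "if", "what", "of", "all",
}


def remove_stop_words(query, stop_words=STOP_WORDS):
    """Lowercase the query, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [token for token in tokens if token not in stop_words]


keywords = remove_stop_words("What are the top 10 best short stories of all time?")
print(keywords)  # ['top', '10', 'best', 'short', 'stories', 'time']
```

Only the remaining keywords would then be passed on to the search or analysis step.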

Stop Words – Domain Specific Cases

In some domains, such as business or economics texts, certain words like consumer, feasibility and cost are used very frequently. Similarly, in clinical texts, terms like Dr and mcg are very common. In such cases the usual stop word list is not enough: words like consumer and cost have to be added to it, because they recur so often that they no longer help distinguish one document from another. Here, a domain-specific list is much more effective than a generic, ready-made stop word list.
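One possible way to build such a domain-specific list is to add every word that appears in a large share of the documents in a collection to the generic list. The sketch below assumes this document-frequency approach. The function name domain_stop_words, the 80% threshold, the tiny generic list and the sample documents are all made up for illustration, and in practice the resulting list would normally be reviewed by hand.

```python
import re
from collections import Counter

# Small generic list for illustration; a real one would be longer.
GENERIC_STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "for", "on"}


def domain_stop_words(documents, min_doc_fraction=0.8, generic=GENERIC_STOP_WORDS):
    """Return the generic list plus words found in at least
    min_doc_fraction of the documents (hypothetical threshold)."""
    doc_freq = Counter()
    for doc in documents:
        # Use a set so each word is counted once per document.
        doc_freq.update(set(re.findall(r"[a-z]+", doc.lower())))
    threshold = min_doc_fraction * len(documents)
    frequent = {word for word, count in doc_freq.items() if count >= threshold}
    return set(generic) | frequent


# Made-up business-style snippets, for demonstration only.
docs = [
    "The consumer cost of fuel is rising",
    "Consumer confidence and the cost of credit",
    "Feasibility study on consumer cost savings",
    "The cost to the consumer for imported goods",
]
stop_words = domain_stop_words(docs)
print(sorted(stop_words - GENERIC_STOP_WORDS))  # ['consumer', 'cost']
```

In this sample, consumer and cost appear in every document and are added to the list, while feasibility appears only once and is kept as a potentially useful keyword.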

At Market Quotient, we provide customized solutions for text analysis and information processing using NLP. In short, we can crawl and extract data and carry out automatic analysis using our world-class infrastructure.

Get to know more about us. Write to us at contact@marketquotient.com
