Stop words are frequently recurring words that carry little meaning on their own, so they are filtered out to help search engines return relevant results. Listed below are some common stop words found in stop word lists.
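As an illustration, the following minimal Python sketch filters stop words out of a search query. It assumes a simple regular-expression tokenizer and a tiny hand-picked stop word list; `STOP_WORDS`, `tokenize` and `remove_stop_words` are hypothetical names, not part of any library. Production systems typically use longer, curated lists such as those shipped with NLTK or spaCy.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = frozenset({"a", "an", "the", "and", "but", "or", "on", "at", "of", "in", "is"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(text: str, stop_words: frozenset[str] = STOP_WORDS) -> list[str]:
    """Return the tokens of `text` that are not in `stop_words`."""
    return [token for token in tokenize(text) if token not in stop_words]


if __name__ == "__main__":
    query = "The price of a house in the city"
    print(remove_stop_words(query))  # ['price', 'house', 'city']
```

Only the content-bearing words survive, which is what a search engine would then match against its index.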
The Different Kinds Of Stop Words
Stop words can be of different types, and ready-made stop word lists are available online. You can also build your own unique stop word list according to your needs. Listed below are some of the different types of stop words:
- Determiners – Words such as ‘a’, ‘an’ and ‘the’ are known as determiners. A determiner precedes a noun or is used in connection with one.
- Coordinating Conjunctions – Coordinating conjunctions join words, phrases or clauses together. These are simply joining words such as ‘and’, ‘but’, ‘or’ and ‘so’.
- Prepositions – These words show how a noun or pronoun relates to the rest of the sentence, often in terms of place, time or direction. For example: on, at, in, of, with.
Therefore, determiners, prepositions, coordinating conjunctions and sometimes even adjectives are usually included in a stop word list.
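To make the categories concrete, here is a minimal sketch that assembles a custom stop word list from category sets. The set names and the `build_stop_list` helper are hypothetical; the words are examples from this article plus a few other common members of each class. The `keep` parameter lets you exempt words your application still needs.

```python
# Hypothetical category sets; extend or trim them to suit your own documents.
DETERMINERS = frozenset({"a", "an", "the", "this", "that"})
COORDINATING_CONJUNCTIONS = frozenset({"and", "but", "or", "nor", "so", "yet", "for"})
# "for" is both a conjunction and a preposition; the union below removes the duplicate.
PREPOSITIONS = frozenset({"on", "at", "in", "of", "with", "to", "from", "for"})


def build_stop_list(*categories: frozenset[str], keep: frozenset[str] = frozenset()) -> frozenset[str]:
    """Union the given word categories, then drop any words listed in `keep`."""
    combined = frozenset().union(*categories)
    return combined - keep


if __name__ == "__main__":
    stop = build_stop_list(
        DETERMINERS, COORDINATING_CONJUNCTIONS, PREPOSITIONS,
        keep=frozenset({"but"}),  # e.g. keep "but" if contrast matters in your analysis
    )
    print("but" in stop, len(stop))  # False 18
```

Keeping the categories separate makes it easy to include or exclude a whole word class (for example, adjectives) depending on the task.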
However, if adjectives such as ‘good’ or ‘bad’, or negations such as ‘not’, are blocked as stop words, they can throw a sentiment analysis algorithm completely off track: once ‘not’ is removed, ‘not good’ reads as ‘good’. Care should be taken to avoid such errors, because they feed the analysis misleading data that is essentially useless.
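The risk can be demonstrated with a toy lexicon-based scorer. This is a simplified sketch, not a real sentiment model: the `LEXICON` scores, the `NEGATIONS` set and the rule that a negation flips the next sentiment word are all illustrative assumptions. Stop words are removed before scoring, which is exactly where an over-aggressive list does its damage.

```python
import re

# Toy lexicon for illustration; real systems use much larger lexicons or trained models.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}
NEGATIONS = {"not", "no", "never"}


def sentiment_score(text: str, stop_words: set[str]) -> int:
    """Sum lexicon scores, flipping the sign of a sentiment word that follows a negation.

    Tokens in `stop_words` are removed before scoring.
    """
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in stop_words]
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True  # the next sentiment word will be flipped
        elif token in LEXICON:
            score += -LEXICON[token] if negate else LEXICON[token]
            negate = False
    return score


if __name__ == "__main__":
    review = "The service was not good"
    careful = {"the", "was"}
    aggressive = careful | {"not"}
    print(sentiment_score(review, careful))     # -1: "not good" is negative
    print(sentiment_score(review, aggressive))  # 1: "not" was stripped, so it reads as "good"
```

The same review flips from negative to positive purely because of what the stop word list contains.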
Stop phrases are closely related to stop words, except that instead of a single word you block an entire phrase. For example, in a book of fairytales written for children, the phrase ‘once upon a time’ appears often. You could add this phrase to your list of stop phrases to prevent it from being indexed by your search engine.
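A stop phrase filter can be sketched with regular expressions. In this minimal example, `remove_stop_phrases` is a hypothetical helper that assumes each phrase starts and ends with a word character; matching is case-insensitive, tolerates any whitespace between words, and only matches whole words.

```python
import re


def remove_stop_phrases(text: str, stop_phrases: list[str]) -> str:
    """Delete every occurrence of each stop phrase (case-insensitive, whole words only)."""
    # Longest phrases first, so a shorter phrase cannot break up a longer one it overlaps.
    for phrase in sorted(stop_phrases, key=len, reverse=True):
        # Allow any run of whitespace between the words of the phrase.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    # Collapse the extra spaces left behind by the deletions.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(remove_stop_phrases("Once upon a time there lived a fox", ["once upon a time"]))
    # there lived a fox
```

The word boundaries (`\b`) prevent a phrase such as ‘time’ from matching inside ‘sometime’.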
A book of fairytales is only a hypothetical example, but real business documents also contain commonly repeated phrases that are better off blocked. Expressions such as ‘the price of this article’ or ‘consumer opinion of this product’, which occur often in your documents, could be added to your stop phrase list. These are only examples, not an exhaustive list: you could build your own list of stop phrases that occur frequently in your documents.
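One way to find such phrases is to count word n-grams and review the most frequent ones. The sketch below is an assumption about how you might do this, not a prescribed method; `candidate_stop_phrases` is a hypothetical helper, and its output is a list of candidates to check by hand, since frequent phrases are not always meaningless.

```python
import re
from collections import Counter


def candidate_stop_phrases(text: str, n: int = 3, min_count: int = 2) -> list[tuple[str, int]]:
    """Return word n-grams occurring at least `min_count` times, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the text and count each resulting phrase.
    ngrams = Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(phrase, count) for phrase, count in ngrams.most_common() if count >= min_count]


if __name__ == "__main__":
    doc = "The price of this article is low. We reviewed the price of this article again."
    print(candidate_stop_phrases(doc, n=5))  # [('the price of this article', 2)]
```

In practice you would run this over a whole collection of documents and try a few values of `n`.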
We at Market Quotient have helped clients with customized requirements ranging from text analysis, news analysis and sentiment analysis to information processing, using Natural Language Processing (NLP) and Machine Learning (ML) algorithms. In short, we can crawl and extract data and carry out automatic analysis using ML and NLP on our world-class infrastructure.
Learn more about us at www.marketquotient.com