Problems with sentiment analysis: spam, sarcasm, and human error

This article is the second in a series on problems with sentiment analysis – describing common pitfalls and difficulties that need to be understood in order to correctly use these tools/models. Enjoy!

The previous article in this series offered a brief overview of sentiment analysis and the kinds of datasets we work with. That overview highlighted some of the problems that occur when your model trains on data significantly different from the data you’ll actually make predictions about. For example, if you use Yelp restaurant reviews to predict the sentiment of Amazon reviews for pest control products, “Dead bugs all over!” means two very different things.

This article will deal with problems inherent to text itself. I’ll give a few quick definitions before we dive into actual examples.

Problems with sentiment analysis: Domain

This article is the first in a series on problems with sentiment analysis – describing common pitfalls and difficulties that need to be understood in order to correctly use these tools/models. Enjoy!

Short background to sentiment models

A classical sentiment model learns the sentiment value of individual words. For example, “FANTASTIC” is generally positive (so it’d have a high sentiment score) and “WORST” is usually negative (meaning a low sentiment score). The model then combines those word scores to form an overall sentiment score for the document. A document with lots of negative words should probably have a negative score, and a document with lots of positive words should probably have a positive one.
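As a rough illustration of the idea of combining word scores, here is a minimal sketch. The word scores, the 0-to-1 scale, the `WORD_SCORES` table, and the `score_document` helper are all made up for this example; a real model would learn its scores from labeled data and would likely combine them in a more careful way than a plain average.

```python
# Hypothetical word scores on a 0-to-1 scale (0 = very negative, 1 = very positive).
# These numbers are invented for illustration, not learned from any dataset.
WORD_SCORES = {
    "fantastic": 0.95,
    "great": 0.85,
    "terrible": 0.10,
    "worst": 0.05,
}

NEUTRAL = 0.5  # assumed score for a document with no known sentiment words


def score_document(text, word_scores=WORD_SCORES):
    """Average the scores of the known words in a document."""
    # Lowercase each token and strip surrounding punctuation.
    tokens = [token.strip(".,!?\"'").lower() for token in text.split()]
    known = [word_scores[token] for token in tokens if token in word_scores]
    if not known:
        return NEUTRAL
    return sum(known) / len(known)


print(score_document("FANTASTIC food, great service!"))
print(score_document("The WORST meal I have ever had."))
```

Averaging is only one possible way to combine the scores. It ignores word order and negation, so “not fantastic” would still come out positive under this sketch.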

A primer on Naive Bayes for sentiment analysis

This guide is intended to be a very unsophisticated, very broad overview of the most basic kind of sentiment analysis. You can use this to get results fast, but they’ll be rough results. I’ll start by laying out the broad outline and then address several problems. The basic steps are: (1) Seeding, (2) Training, and (3) Evaluation.

For our purposes, we’re going to assume that all texts have a sentiment score somewhere between 0 and 1, where 0 is very negative and 1 is very positive. A neutral text has a sentiment score of 0.5 under this system.
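To make the 0-to-1 scale concrete, here is a minimal sketch of a two-class Naive Bayes scorer that returns the probability that a text is positive. It is an illustration only and not necessarily the procedure this primer goes on to describe. The function names `train` and `sentiment`, the `"pos"`/`"neg"` labels, the add-one smoothing, and the two-document training set are all assumptions made for the example.

```python
import math
from collections import Counter


def train(labeled_docs):
    """Count words per label. Assumes each label appears at least once."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        word_counts[label].update(text.lower().split())
        doc_counts[label] += 1
    return word_counts, doc_counts


def sentiment(text, word_counts, doc_counts):
    """Return P(positive | text): 0 is very negative, 1 is very positive."""
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    n_docs = sum(doc_counts.values())
    log_prob = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        # Start from the log prior for this label.
        lp = math.log(doc_counts[label] / n_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids log(0) for rare words.
                count = word_counts[label][word] + 1
                lp += math.log(count / (total_words + len(vocab)))
        log_prob[label] = lp
    # Normalize in a numerically stable way to get a probability.
    largest = max(log_prob.values())
    pos = math.exp(log_prob["pos"] - largest)
    neg = math.exp(log_prob["neg"] - largest)
    return pos / (pos + neg)


# Tiny invented training set, for illustration only.
docs = [("great fantastic food", "pos"), ("worst terrible food", "neg")]
word_counts, doc_counts = train(docs)
print(sentiment("fantastic", word_counts, doc_counts))
print(sentiment("worst", word_counts, doc_counts))
```

With equal numbers of positive and negative training documents, a text made up entirely of unseen words falls back to the prior and scores 0.5, which matches the neutral point on this scale.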

 


What’s the difference between Natural Language Processing and Computational Linguistics?

Nearly any time that two disciplines meet, some new group will emerge. As the group begins to formulate the problems and approaches that they find interesting, they’ll often label themselves.

  • A chemist who is interested in living things might study biochemistry.
  • Economists who study the effects of psychology on decision-making are called behavioral economists.
  • Linguists who use computational techniques to define language models are called…what, exactly?


Sentiment mining when you’ve got no labels

What’s sentiment mining?

Sentiment mining is a way of computing how positive or negative a text is. It’s useful when you’ve got too much text to read by yourself, but want to know the overall feeling of certain texts. For example, a company that is curious about the public reception of its newest product can collect all tweets mentioning the product’s name and get a sentiment score back.