Problems with sentiment analysis: spam, sarcasm, and human error

This article is the second in a series on problems with sentiment analysis – describing common pitfalls and difficulties that need to be understood in order to correctly use these tools/models. Enjoy!

The previous article in this series offered a brief overview of sentiment analysis and the kinds of datasets we work with. This overview highlighted some of the problems that occur when your model trains on data significantly different from the data you’ll actually make predictions about – for example, using Yelp restaurant reviews to predict Amazon pest control product reviews (“Dead bugs all over!” means two very different things).

This article will deal with problems inherent to text itself. I’ll give a few quick definitions before we dive into actual examples.

Problems with sentiment analysis: Domain

This article is the first in a series on problems with sentiment analysis – describing common pitfalls and difficulties that need to be understood in order to correctly use these tools/models. Enjoy!

Short background to sentiment models

A classical sentiment model learns the sentiment value of given words. For example, “FANTASTIC” is generally positive (so it’d have a high sentiment score) and “WORST” is usually negative (meaning a low sentiment score). The model then combines those words to form an overall sentiment score. A document with lots of negative words should probably have a negative score and the opposite is also true.
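To make the word-score idea concrete, here is a minimal sketch. The word list, the score values, the signed scale (above zero is positive, below zero is negative), and the choice to average are all assumptions made for illustration; a real model would learn its scores from labeled data.

```python
import re

# Hypothetical word scores, invented for this example.
# Values above 0 lean positive, values below 0 lean negative.
WORD_SCORES = {
    "fantastic": 1.0,
    "great": 0.8,
    "worst": -1.0,
    "terrible": -0.9,
}


def document_score(text, word_scores=WORD_SCORES):
    """Average the scores of the known words in a text.

    Words missing from the lexicon are ignored. A text with no known
    words gets 0.0, which this sketch treats as neutral.
    """
    words = re.findall(r"[a-z']+", text.lower())
    known = [word_scores[w] for w in words if w in word_scores]
    return sum(known) / len(known) if known else 0.0


print(document_score("The food was FANTASTIC and the service was great"))
print(document_score("WORST meal ever, terrible service"))
```

Averaging is only one way to combine word scores. It also shows why domain matters: a lexicon built from restaurant reviews has no way of knowing that the same word can carry the opposite sentiment elsewhere.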

Should academics or professionals teach programmers?

One popular topic in Computer Science (among other disciplines) is whether professors are out-of-touch with “real world” programming. Those who argue for fewer academics in CS education claim that university degrees prepare students poorly for industrial roles. Plagued by an environment in which theory and research are preferred over practical skills for daily software development, these students enter the workforce ill-equipped to write professional software.

Free markets: the economic and technical arguments for strong net neutrality

The modern anti-competitive potential of Internet Service Providers is nuclear. Far from being merely an industry-siloed cartel, they control both the products and the means of discovery.

Today the FCC voted to overturn Obama-era policy that prevented Internet Service Providers from blocking or slowing access to certain web content. Chairman Ajit Pai, who spearheaded the successful campaign, argued that the repealed rules stifled competition and represented government interference in the otherwise free market. “The internet wasn’t broken in 2015. We weren’t living in a digital dystopia. To the contrary, the internet is perhaps the one thing in American society we can all agree has been a stunning success.”

What is automation?


In August 2015, The Economist published an article entitled “Automation angst” in which they explored the dichotomy of feelings about automation – one side representing the thrill of cheaper production and the other warning of an impending existential crisis. When repetitive human labor is replaced, do the laborers feel better off?


Nash Equilibrium and Graph Theory

Every once in a while, there are really big ideas in academia, ideas that change the way we think about the world. Nash equilibrium is one of those ideas.


John Nash wrote about games where people make decisions based on the way they think other people will behave, eventually reaching an equilibrium where no individual can improve their own situation by changing. This equilibrium, however, does not mean that the entire group has achieved an optimal result.
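A small sketch can illustrate both halves of that claim. It uses the prisoner's dilemma, which is my choice of example rather than something taken from this post, with made-up payoffs (negated years in prison, so higher is better). The names `ACTIONS`, `PAYOFFS`, `is_nash`, and `pure_nash_equilibria` are hypothetical.

```python
from itertools import product

ACTIONS = ["cooperate", "defect"]

# PAYOFFS[(a, b)] = (payoff to player 1, payoff to player 2).
# Illustrative values only: negated prison years, so higher is better.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}


def is_nash(a, b):
    """True if neither player can do better by changing only their own action."""
    p1, p2 = PAYOFFS[(a, b)]
    player1_ok = all(PAYOFFS[(alt, b)][0] <= p1 for alt in ACTIONS)
    player2_ok = all(PAYOFFS[(a, alt)][1] <= p2 for alt in ACTIONS)
    return player1_ok and player2_ok


def pure_nash_equilibria():
    """List every pure-strategy action pair that is a Nash equilibrium."""
    return [(a, b) for a, b in product(ACTIONS, repeat=2) if is_nash(a, b)]


for a, b in pure_nash_equilibria():
    print(a, b, "group total:", sum(PAYOFFS[(a, b)]))
print("both cooperate, group total:", sum(PAYOFFS[("cooperate", "cooperate")]))
```

With these payoffs the only equilibrium is for both players to defect, even though the group's combined payoff is higher when both cooperate. The check covers pure strategies only; mixed strategies are outside this sketch.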


2016: The Year the NBA Played Like Steph Curry

Despite Stephen Curry’s recent downturn that cost his team the NBA Finals, he remains the most prolific three-point shooter of our day. With 482 three-pointers made in the most recent season, he eclipses both second (teammate Klay Thompson at 374) and third place (Damian Lillard at 271). This makes the Golden State Warriors a pain in the neck to defend – they score from farther out than any other team in the league. So how does everybody else stay competitive? They play like Steph Curry.


A primer on Naive Bayes for sentiment analysis

This guide is intended to be a very unsophisticated, very broad overview of the most basic kind of sentiment analysis. You can use this to get results fast, but they’ll be dirty results. I’ll start with the broad outline and then address several problems. The basic steps are: (1) Seeding, (2) Training, and (3) Evaluation.

For our purposes, we’re going to assume that all texts have a sentiment somewhere between 0 and 1 where 0 is very negative and 1 is very positive. A neutral text has a sentiment score of 0.5 under this system.
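As a rough illustration of where a score between 0 and 1 can come from, here is a minimal Naive Bayes sketch using only the standard library. It reads "seeding" as supplying a few hand-labeled texts, which is my assumption and may differ from how the full guide defines the step. The seed texts, the labels "pos" and "neg", and the function names are all invented for the example, and the score is the model's estimate of P(positive | text).

```python
import math
import re
from collections import Counter

# (1) Seeding: a tiny hand-labeled set, invented for illustration.
SEED = [
    ("fantastic food great service", "pos"),
    ("loved it fantastic", "pos"),
    ("worst meal ever", "neg"),
    ("terrible service worst", "neg"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_texts):
    """(2) Training: count words per label and documents per label.

    Assumes at least one example of each label is present.
    """
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def sentiment(text, model):
    """(3) Evaluation: return P(positive | text), a score in [0, 1]."""
    word_counts, doc_counts = model
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        lp = math.log(doc_counts[label] / total_docs)  # class prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids zero probabilities.
                count = word_counts[label][word] + 1
                lp += math.log(count / (total_words + len(vocab)))
        log_prob[label] = lp
    # Turn the two log scores into a normalized probability.
    top = max(log_prob.values())
    weights = {k: math.exp(v - top) for k, v in log_prob.items()}
    return weights["pos"] / (weights["pos"] + weights["neg"])


model = train(SEED)
print(sentiment("fantastic service", model))
print(sentiment("worst", model))
```

Because the seed set has equal numbers of positive and negative texts, a text containing no known words falls back to the prior and scores 0.5, which matches the neutral point described above.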



Does LeBron James play worse when he’s tired?

The problem

LeBron James is widely considered one of the best NBA athletes of the past decade, consistently scoring about 0.8 points per minute in play (for comparison, Kobe Bryant scored 0.62 and Blake Griffin scored 0.63 in their last 55 games). He’s also one of the best-paid athletes, contracted for 24 million USD in the 2016-2017 season. Last year, Quora user Shane Hiller calculated that LeBron makes about $107 per second of gameplay. That’s a lot of money, and a good coach should try to maximize LeBron’s performance.