VADER, IBM Watson or TextBlob: Which is Better for Unsupervised Sentiment Analysis?
Sentiment analysis, also called opinion mining, is a technique for inferring the emotion or opinion a person expresses from the words they write. The text can be anything from a social media comment to a review, and it can concern a product, a service, a movie or even the work environment.
Sentiment analysis can be used to improve customer service, promote employee engagement and serve other specific purposes. Interestingly, it has been observed that social media sentiment can also influence stock prices.
Understanding Sentiment Analysis
Sentiment analysis can be performed using one of two machine learning approaches: unsupervised or supervised. In the simplest framing, a sentiment is either positive or negative, and the task is to decide which of the two a series of words reflects.
The unsupervised approach uses rules and a sentiment lexicon to score a comment, so it needs no labeled training data. The supervised approach trains a classification model on labeled examples, using traditional machine learning or deep learning methods.
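To make the rule-based idea concrete, here is a minimal sketch of a lexicon scorer. The word list, the negation rule and the function name are all hypothetical and invented for illustration; the libraries discussed in this blog use much larger lexicons and more rules than this.

```python
import re

# Hypothetical toy lexicon: word -> valence. Real libraries ship far larger lists.
TOY_LEXICON = {"good": 1.0, "great": 1.0, "love": 1.0,
               "bad": -1.0, "terrible": -1.0, "hate": -1.0}
NEGATIONS = {"not", "no", "never"}


def toy_sentiment_score(text):
    """Return the mean valence of the lexicon words found in text, in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    valences = []
    for i, token in enumerate(tokens):
        if token in TOY_LEXICON:
            valence = TOY_LEXICON[token]
            # Rule: a negation word directly before a sentiment word flips it.
            if i > 0 and tokens[i - 1] in NEGATIONS:
                valence = -valence
            valences.append(valence)
    # Text with no lexicon words is scored as 0.0.
    return sum(valences) / len(valences) if valences else 0.0


print(toy_sentiment_score("The plot was great"))
print(toy_sentiment_score("The acting was not good"))
```

No training data is involved at any point: the score comes only from the word list and the rule, which is why this style of analysis works out of the box but cannot adapt to domain-specific vocabulary.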
Using Pre-built Libraries for Sentiment Analysis
To help you get started, this blog focuses on the unsupervised approach and shows how to begin with pre-built libraries for sentiment analysis. We discuss a few libraries that follow a lexicon-based approach: TextBlob, VADER and IBM Watson. While TextBlob and NLTK-VADER are open source, IBM Watson is a paid service whose API can be called a few thousand times on a trial basis.
IBM Watson
IBM Watson is IBM's AI platform. It provides a Natural Language Understanding API that can perform sentiment analysis. Here are the steps to call the IBM Watson API from your local system:
Step 1
Visit https://www.ibm.com/watson/services/natural-language-understanding/ and choose ‘Get Started Free’ for trying the pre-built library.
Step 2
Acquire the unique API key and URL provided by IBM on successful registration. This API Key will be used to call IBM Watson API remotely in the next step.
Step 3
Install the Python ‘watson_developer_cloud’ package using pip with the following command:
$ pip install watson_developer_cloud
Import the Natural Language Understanding module, create a client with your credentials and define a function that returns the sentiment score:
from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 import (
    Features, SentimentOptions)

natural_language_understanding = NaturalLanguageUnderstandingV1(
    version='2018-11-16',
    iam_apikey='{API key}',  # Use your API key here
    url='{url}'  # Paste the URL here
)

def Sentiment_score(input_text):
    # Input text can be a sentence, paragraph or document
    response = natural_language_understanding.analyze(
        text=input_text,
        features=Features(sentiment=SentimentOptions())).get_result()
    # From the response, extract the document score, which is between -1 and 1
    res = response.get('sentiment').get('document').get('score')
    return res
To interpret the result, look at the score that is returned. If the score is greater than 0, the sentiment is considered positive; otherwise it is considered negative.
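The decision rule above can be sketched as a small helper. The function name is hypothetical, and the scores in the loop are made-up values for illustration, not real API output.

```python
def label_from_score(score):
    """Map a sentiment score in [-1, 1] to a binary label.

    Follows the rule in the text: greater than 0 is positive, anything else
    (including exactly 0) is negative.
    """
    return "positive" if score > 0 else "negative"


for score in (0.82, 0.0, -0.35):  # made-up scores, not real API output
    print(score, label_from_score(score))
```

Note that this rule sends a score of exactly 0 to the negative class. If your data contains many neutral texts, a separate neutral band around 0 may be worth considering, though that is a design choice outside what this blog evaluates.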
TextBlob
TextBlob is an open-source Python library that can be used to perform sentiment analysis. Use the following steps to get a sentiment score for your data:
Step 1
Install the Python ‘textblob’ package using pip:
$ pip install textblob
Step 2
Import TextBlob to get sentiment score using the following code snippet:
from textblob import TextBlob

# Get the polarity score using the function below
def get_textBlob_score(sent):
    # This polarity score is between -1 and 1
    polarity = TextBlob(sent).sentiment.polarity
    return polarity
TextBlob’s sentiment property returns a named tuple of polarity and subjectivity. The polarity score is a float in the range -1.0 to 1.0, where anything greater than 0 is positive and anything below 0 is negative.
NLTK-VADER
VADER (Valence Aware Dictionary and Sentiment Reasoner) is a rule- and lexicon-based, open-source sentiment analyzer released under the MIT license. To use the library, follow the steps below:
Step 1
First, install the NLTK package using pip.
$ pip install nltk
Import the package and download the VADER lexicon that NLTK’s VADER wrapper needs, using the following code snippet:
import nltk
nltk.download('vader_lexicon')
Step 2
Import NLTK’s VADER analyzer and get the sentiment scores for the available input or dataset using the following code:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

def get_vader_score(sent):
    # polarity_scores returns a dictionary with 'neg', 'neu', 'pos' and 'compound' keys
    ss = sid.polarity_scores(sent)
    for k in sorted(ss):
        print('{0}: {1}, '.format(k, ss[k]), end='')
    print()
    # Return the dictionary so the caller can use the scores
    return ss
VADER’s sentiment analyzer returns the polarity scores as a dictionary: the proportions of the text rated negative ('neg'), neutral ('neu') and positive ('pos'), plus a normalized 'compound' score. If the positive score is higher than the negative score, the text can be given the positive label; otherwise it is labeled negative.
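As a rough sketch of that labeling rule, the helper below takes a dictionary in the shape that polarity_scores returns and compares the two scores. The function name is hypothetical and the example dictionary is hand-written with made-up values, so the snippet runs without NLTK installed.

```python
def label_from_vader_scores(scores):
    """Binary label from a VADER-style dict, using the pos-vs-neg rule in the text."""
    # Ties (including texts with no sentiment words at all) fall to "negative".
    return "positive" if scores["pos"] > scores["neg"] else "negative"


# Hand-written dictionary in the shape polarity_scores returns (values are made up).
example = {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.7}
print(label_from_vader_scores(example))
```

Thresholding the 'compound' value is a common alternative to comparing 'pos' and 'neg'; which of the two works better is likely to depend on your data.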
Comparing Three Sentiment Analysis Libraries Using Real-life Movie Review Data
To gauge the efficacy of these libraries, we performed sentiment analysis on open-source movie review data. The dataset, labeled positive or negative, consisted of 1,600 samples split equally between the two sentiment labels.
Running the same data through all three packages, we tabulated the number of correct positive and negative predictions. In total, the dataset had 800 negative and 800 positive reviews. Here is how all three libraries fared in their analysis:
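For a labeled dataset like this, a per-class tally of correct predictions could be computed along the lines of the sketch below. The function name and the four sample labels are hypothetical and made up for illustration; they are not the data or results from our comparison.

```python
def tally_correct(actual, predicted):
    """Count correct predictions per class and compute overall accuracy."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    correct = {"positive": 0, "negative": 0}
    for truth, guess in zip(actual, predicted):
        if truth == guess:
            correct[truth] += 1
    accuracy = sum(correct.values()) / len(actual)
    return correct, accuracy


# Made-up labels for illustration only; these are not the article's results.
actual = ["positive", "positive", "negative", "negative"]
predicted = ["positive", "negative", "negative", "negative"]
print(tally_correct(actual, predicted))
```

Keeping the counts separate per class matters because a library can look accurate overall while being much weaker on one of the two labels.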
Here are a few sample reviews, their actual sentiment and what sentiment all three libraries predicted:
We saw how to use different pre-built libraries for sentiment analysis using an unsupervised approach. We described the three packages under discussion, i.e. IBM Watson Sentiment Analyzer, TextBlob and the NLTK VADER-based sentiment analyzer, and explained how to use them in Python.
While the unsupervised approach is built on generic rules and suits general-purpose use, the supervised approach is the next step up and is better suited to a specific domain when a large amount of labeled data is available. The unsupervised approach is a good starting point, but open-source unsupervised libraries won’t produce consistent results for domain-specific requirements. In our analysis, we observed that some libraries are better at detecting positive sentiment, while others are better at detecting negative sentiment.
In upcoming posts, we will focus on implementing the supervised approach using RNNs and classification methods for sentiment analysis.