Aspect-based Sentiment Analysis — Everything You Wanted to Know!

Intellica.AI
9 min read · Feb 26, 2020


Have you noticed how text-based, opinionated user-generated content is on the rise? Opinions on hot-button topics, from politics to climate change, can be found almost everywhere. Be it online forums, social networking platforms, or news networks on television, opinion polls on trending topics are the new fad.

All thanks to Sentiment Analysis (also known as opinion mining).

What is Sentiment Analysis — Application, Benefits, and Limitations

Sentiment Analysis is a computational technique to distinguish positive and negative opinions in textual data programmatically. Cutting-edge technologies such as Natural Language Processing (NLP), Machine Learning (ML), Text Processing, and Deep Learning (DL) are now used to automate sentiment analysis. It makes it possible to score a piece of text within a quantified range as positive, negative, or neutral, with far less human effort.

Sentiment analysis can be applied in many areas, such as customer feedback, movie or product reviews, and political commentary. Large enterprises perform sentiment analysis to analyze public opinion, conduct market research, monitor brand and product reputation, and understand customer experiences.

Many products integrate sentiment analysis APIs or plugins for customer experience management, social media monitoring, or workforce analysis, in order to deliver useful insights to their customers.

Just Sentiment Analysis is not Enough…

While sentiment analysis can identify the sentiment behind an opinion or statement, several different aspects may have triggered that sentiment, and untangling them is a real challenge for a computer program. When analyzing reviews, for instance, it is not enough to label a review as positive or negative; we also need to determine which 'aspect' of the review generated the negative opinion.

A customer might give a restaurant or hotel a completely negative review just because of the rude behaviour of one staff member, even though they liked the food. In such a case, the negative feedback about the staff outweighs the positive feedback about the food. This is where Aspect Based Sentiment Analysis saves the day.

Decoding Aspect Based Sentiment Analysis

Aspect Based Sentiment Analysis (ABSA) is a technique that takes into consideration the terms related to each aspect and identifies the sentiment associated with it. An ABSA model requires aspect categories and their corresponding aspect terms to extract a sentiment for each aspect from the text corpus. One can create a domain-specific model for a specific implementation; however, general language models can also be used.

Typical ABSA requires labeled data containing the aspect terms and aspect categories for each statement, along with its sentiment score. However, the problem can also be solved with an unsupervised approach, without labeled data or a predefined list of aspect terms. For example: what was the overall experience of customers with the hotel staff, food variety, price, taste, and location?
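For illustration, a single labeled record in the supervised setting might look like the sketch below (the field names and the review sentence are hypothetical); the unsupervised approach described in this post needs neither these labels nor a predefined term list.

# a hypothetical labeled ABSA record (field names and sentence are illustrative only)
labeled_example = {
    'text': 'The staff was rude but the food was delicious.',
    'aspect_terms': {'staff': 'negative', 'food': 'positive'},
    'aspect_categories': {'service': 'negative', 'food quality': 'positive'},
}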

A business needs to identify which aspects of a product or service attract customers and which drive them away. ABSA identifies the sentiment for each aspect category, i.e., hotel staff, food variety, price, taste, and location. It helps a business track how end-user sentiment changes toward specific features and attributes of a service or product.

Implementation Details

We at Intellica.ai have implemented ABSA using an unsupervised approach. In this blog, we showcase a case study that extracts tweets for a given hashtag using the Twitter search APIs and identifies the sentiment associated with a given list of aspects. The only inputs required from the user are a hashtag and a list of aspects.

We have considered the below inputs:

  1. Hashtag: #oneplus7pro
  2. Aspects: performance, money, camera, screen, speed

Following are the implementation steps:

  1. Data preparation
  2. Extract aspect terms
  3. Sentiment analysis for each aspect

Tools & Frameworks Used:

  1. spaCy (tokenization, sentence boundary detection, dependency parser, etc.)
  2. NLTK
  3. word2vec pre-trained model
  4. Gensim

Dataset Preparation

We have programmatically extracted recent tweets containing the specific hashtag by using Twitter's developer APIs. Apart from the tweet text, Twitter also provides other details such as the timestamp, location, hashtags, and usernames of tagged accounts. Here, we have used the Python wrapper tweepy to fetch tweets containing a given hashtag or from a given account handle. Twitter only returns a maximum of 100 tweets per request, but tweepy's Cursor handles pagination and lets us extract tweets up to the allowed limit through the Twitter search API. It has good community support too.

Data Acquisition

You need to create a developer account to get the credentials for accessing the Twitter APIs. After creating an app in the developer account, you will find four credentials: a Consumer API key and secret, and an Access token and secret.

In this example, we have extracted tweets with #oneplus7pro.

#import libraries
import pandas as pd
import tweepy
from tweepy import OAuthHandler

#twitter credentials (replace the placeholders with your own keys)
consumer_key = 'your_consumer_key'
consumer_secret = 'your_consumer_secret'
access_token = 'your_access_token'
access_secret = 'your_access_secret'

#authenticate credentials
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

#page through the search results and collect the timestamp and text of each tweet
tweet_created = []
tweet_text = []
tweets = tweepy.Cursor(api.search, q='#oneplus7pro', count=100, lang="en", until='2019-09-10')
for tweet in tweets.items():
    tweet_created.append(tweet.created_at)
    tweet_text.append(tweet.text)

df = pd.DataFrame(columns=['created_on', 'tweets'])
df['created_on'] = tweet_created
df['tweets'] = tweet_text

Data Cleaning

Extracted tweet text may contain hashtags, tagged accounts, emojis, URLs, and HTML tags that need to be removed in the pre-processing step.

#imports for cleaning (string for punctuation, NLTK stop words)
import string
from nltk.corpus import stopwords

#NLTK stop words (nltk.download('stopwords') may be needed once)
stop = set(stopwords.words('english'))

#work on the dataframe of extracted tweets
data = df.copy()

#convert tweets to lower case
data['preprocess_data'] = data['tweets'].str.lower()

#remove urls
data['preprocess_data'] = data['preprocess_data'].str.replace(r'(https|http)?:\/(\w|\.|\/|\?|\=|\&|\%)*\b', '', regex=True)
data['preprocess_data'] = data['preprocess_data'].str.replace(r'www\.\S+\.com', '', regex=True)

#remove retweet and cc markers (word boundaries so words like 'start' are untouched)
data['preprocess_data'] = data['preprocess_data'].str.replace(r'\brt\b|\bcc\b', '', regex=True)

#remove hashtags
data['preprocess_data'] = data['preprocess_data'].str.replace(r'#\S+', '', regex=True)

#remove user mentions
data['preprocess_data'] = data['preprocess_data'].str.replace(r'@\S+', '', regex=True)

#remove emojis and other non-ASCII characters
data['preprocess_data'] = data['preprocess_data'].str.replace(r'[^\x00-\x7F]+', '', regex=True)

#remove html tags
data['preprocess_data'] = data['preprocess_data'].str.replace(r'<.*?>', '', regex=True)

#collapse extra spaces
data['preprocess_data'] = data['preprocess_data'].str.replace(r' +', ' ', regex=True)

#remove punctuation
data['preprocess_data'] = data['preprocess_data'].str.replace('[{}]'.format(string.punctuation), '', regex=True)

#remove stop words (split each tweet into words first)
data['preprocess_data'] = data['preprocess_data'].apply(lambda x: [word for word in x.split() if word not in stop])

#convert the list of preprocessed words back to a string
data['preprocess_str'] = data['preprocess_data'].apply(' '.join)

Aspect-terms Extraction

To extract aspect terms from the text, we take the NOUNs from the text corpus and identify the ones most similar to the given aspect categories, using the semantic similarity between each NOUN and an aspect category. We use POS tagging (Part-of-Speech tagging) to extract the NOUNs from the text.

These terms will be highly relevant to the given input aspect, or may be the aspect itself. First, we collect all the NOUNs from the pre-processed tweets. We consider only NOUNs because a noun names a specific thing or set of things, so the domain-relevant words will come from the NOUNs.

import spacy

#load the spaCy English model used for POS tagging and dependency parsing
nlp = spacy.load('en_core_web_sm')

def pos_tagging(data):
    print("pos tagging")
    req_tag = ['NN']
    extracted_words = []
    try:
        for x in data['preprocess_str']:
            doc = nlp(x)
            for token in doc:
                #keep NOUNs longer than three characters (shapes 'x', 'xx', 'xxx' are 1-3 letter words)
                if token.tag_ in req_tag and token.shape_ not in ('x', 'xx', 'xxx'):
                    extracted_words.append(token.lemma_)
        return extracted_words
    except Exception as e:
        return extracted_words

extract_words = pos_tagging(data)

We have extracted word2vec features to compute the semantic similarity between each NOUN extracted from the text and each of the given aspect categories. Word2vec is a technique that converts text into vector representations. Here we have used a pre-trained word2vec model trained on the Google News dataset, which returns a 300-dimensional vector for each word.
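A minimal loading sketch, assuming gensim 3.x and a locally downloaded copy of the Google News vectors (the file path is an assumption, and the variable name model_wiki simply matches the function below):

from gensim.models import KeyedVectors

# load the pre-trained Google News word2vec vectors (300 dimensions);
# adjust the path to wherever the .bin file was downloaded
model_wiki = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)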

def listoflist(words):
    # helper (assumed, not shown in the original snippet): wrap each extracted word in its own list
    return [[w] for w in words]

def word2vec(data):
    terms = listoflist(data)
    try:
        # remove words not present in the word2vec model vocabulary (e.g., misspelled words);
        # model_wiki is the gensim KeyedVectors object loaded above
        filtered_terms = []
        for i in range(len(terms)):
            correct_words = [token for token in terms[i] if token in model_wiki.vocab]
            if len(correct_words) > 0:
                filtered_terms.append(correct_words[0])

        # convert the remaining words into 300-dimensional vectors
        vector_of_terms = []
        for x in range(len(filtered_terms)):
            vector_of_terms.append(model_wiki[filtered_terms[x]])
        return vector_of_terms, filtered_terms
    except Exception as e:
        # on failure, return empty results
        return [], []

We have used cosine similarity to calculate the similarity between extracted feature vectors. Also, we have considered the top 15 NOUNS for each aspect category.
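A minimal sketch of this selection step, assuming extract_words from pos_tagging above and the loaded model_wiki; the aspect list comes from our inputs, and the helper name top_terms_per_aspect is ours:

aspects = ['performance', 'money', 'camera', 'screen', 'speed']

def top_terms_per_aspect(nouns, aspect_list, model, top_n=15):
    # for each aspect category, rank the corpus NOUNs by the cosine similarity
    # of their word2vec vectors and keep the top_n most similar ones
    aspect_terms = {}
    for aspect in aspect_list:
        if aspect not in model:
            continue
        scored = [(noun, model.similarity(aspect, noun))
                  for noun in set(nouns) if noun in model]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        aspect_terms[aspect] = [term for term, _ in scored[:top_n]]
    return aspect_terms

aspect_terms = top_terms_per_aspect(extract_words, aspects, model_wiki)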

Applying Sentiment Analysis

We have used an unsupervised approach for sentiment analysis, in which we use lists of positive and negative words from the open-source NLTK toolkit.

To apply this lexicon-based sentiment analysis, built on dictionaries of negative and positive words, we use the pre-processed tweets. The following function takes a sentence and returns a score for each aspect term based on the positive and negative lexicon words. It also uses POS tagging (Part-of-Speech) and the dependency parse to identify adverbial modifiers, adjectival modifiers, nouns, and verbs.

# create globally defined lists of positive and negative words to identify sentiment,
# using the opinion lexicon shipped with NLTK (nltk.download('opinion_lexicon') may be needed once)
from nltk.corpus import opinion_lexicon
pos = list(opinion_lexicon.positive())
neg = list(opinion_lexicon.negative())

def feature_sentiment(sentence, pos, neg):
    '''
    input: sentence, lists of positive and negative lexicon words
    function: appends the dictionary with new features if the feature
              did not exist previously, then updates the sentiment of
              each new or existing feature
    output: dictionary mapping aspect terms to sentiment scores
    '''
    sent_dict = dict()
    sentence = nlp(sentence)
    opinion_words = neg + pos
    for token in sentence:
        # check if the word is an opinion word, then assign sentiment
        if token.text in opinion_words:
            sentiment = 1 if token.text in pos else -1
            # if target is an adverb modifier (i.e. pretty, highly, etc.)
            # but happens to be an opinion word, ignore and pass
            if token.dep_ == "advmod":
                continue
            elif token.dep_ == "amod":
                sent_dict[token.head.text] = sentiment
            # for opinion words that are adjectives, adverbs, verbs...
            else:
                for child in token.children:
                    # if there's an adj modifier (i.e. very, pretty, etc.), add more weight to sentiment
                    # this could be better updated for modifiers that either positively or negatively emphasize
                    if ((child.dep_ == "amod") or (child.dep_ == "advmod")) and (child.text in opinion_words):
                        sentiment *= 1.5
                    # check for negation words and flip the sign of sentiment
                    if child.dep_ == "neg":
                        sentiment *= -1
                for child in token.children:
                    # if verb, check if there's a direct object
                    if (token.pos_ == "VERB") and (child.dep_ == "dobj"):
                        sent_dict[child.text] = sentiment
                        # check for conjugates (a AND b), then add both to dictionary
                        subchildren = []
                        conj = 0
                        for subchild in child.children:
                            if subchild.text == "and":
                                conj = 1
                            if (conj == 1) and (subchild.text != "and"):
                                subchildren.append(subchild.text)
                                conj = 0
                        for subchild in subchildren:
                            sent_dict[subchild] = sentiment

                # check for modifiers and negation attached to the opinion word's head
                for child in token.head.children:
                    if ((child.dep_ == "amod") or (child.dep_ == "advmod")) and (child.text in opinion_words):
                        sentiment *= 1.5
                    # check for negation words and flip the sign of sentiment
                    if child.dep_ == "neg":
                        sentiment *= -1

                # check for nouns attached to the opinion word's head
                for child in token.head.children:
                    noun = ""
                    if (child.pos_ == "NOUN") and (child.text not in sent_dict):
                        noun = child.text
                        # check for compound nouns
                        for subchild in child.children:
                            if subchild.dep_ == "compound":
                                noun = subchild.text + " " + noun
                        sent_dict[noun] = sentiment
    return sent_dict

# example
tweet = "food was good but service was disappointing"
print (feature_sentiment(tweet, pos, neg))
## Output: {'food': 1, 'service': -1}

This returns aspect terms with their sentiment scores, which we then aggregate for each aspect category.
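A minimal aggregation sketch, assuming aspect_terms is the aspect-category-to-terms mapping built during aspect-term extraction (as in the sketch above) and pos/neg are the lexicon lists defined earlier; the helper name aggregate_by_aspect is ours:

from collections import defaultdict

def aggregate_by_aspect(tweets, aspect_terms, pos, neg):
    # sum the aspect-term sentiment scores under their aspect category
    scores = defaultdict(float)
    counts = defaultdict(int)
    for tweet in tweets:
        term_scores = feature_sentiment(tweet, pos, neg)
        for aspect, terms in aspect_terms.items():
            for term, score in term_scores.items():
                if term == aspect or term in terms:
                    scores[aspect] += score
                    counts[aspect] += 1
    return scores, counts

aspect_scores, aspect_counts = aggregate_by_aspect(data['preprocess_str'], aspect_terms, pos, neg)
print(dict(aspect_scores))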

Result Analysis

We performed the result analysis by looking at high-frequency keywords, the extracted aspect terms for each aspect category, and the sentiment score for each aspect across the tweet collection.

The word cloud below represents the keywords with the highest frequency.

Extracted aspect-terms for each aspect-category from the text corpus

The following graph shows identified sentiment scores for each aspect.

In total, 414 tweets containing the given hashtag were extracted through the Twitter API, of which only 207 tweets expressed sentiment toward any of the given aspects. Among all five aspect categories, the tweets show that the performance of the product is the one users like most.

Key Takeaway

Aspect-based sentiment analysis is an essential capability for a business that wants to listen to customers, understand their feelings, analyze their feedback, improve customer experiences, and meet their expectations for its products and services. In short, it helps businesses be customer-centric.

The approach explained in this article works on any unseen domain by using NLP and Machine Learning techniques. It can, however, be improved by using supervised data for a specific domain and applying Deep Learning techniques to aspect-based sentiment analysis, at the cost of being limited to a single domain and a pre-defined list of aspect categories.

Want to know more about how your business can benefit from Aspect-based Sentiment Analysis? Talk to our experts now.
