Sentiment Analysis and Opinion Mining in the AI4PublicPolicy Platform

In recent years, organizations have used Natural Language Processing (NLP) and opinion-mining methods, chiefly sentiment analysis, to detect, extract, compute, and examine information in order to evaluate and act on customer feedback (Soussan T. & Trovati M., 2022). The growing accessibility of online applications and the surge in social platforms for opinion sharing and online reviews have drawn the attention of stakeholders such as customers, organizations, and governments, who want to analyse and explore these opinions (Saberi B. & Saad S., 2017).

The Sentiment Analysis component aims to provide the Pilot Municipalities with the sentiment of citizens’ feedback, drawing on data from conversations, declarations, or even tweets. To process the extensive content collected from the Pilots’ different data sources, the sentiment of each sentence or text body is evaluated using Machine Learning models together with robust, load-balanced data processing pipelines. The sentiment analysis models cover polarity (positive, negative, neutral) as well as emotions (anger, joy, sadness, etc.).

The input data is raw text. It is processed by the Natural Language Processing (NLP) text-classification pipeline developed for this task. The task aims to improve evidence-based policy making by leveraging the large volume of user-generated content available in multiple external sources, beyond government websites and social media accounts, to support the formulation of public policies.

The component makes use of three classification methods:

Text Classification

Text classification is the problem of constructing models that can assign new documents to predefined classes. It is a sophisticated process involving not only the training of models, but also numerous additional procedures, e.g., data pre-processing, transformation, and dimensionality reduction. Text classification remains a prominent research topic, utilizing various techniques and their combinations in complex systems (Mirończuk M. & Protasiewicz J., 2018). One of the most popular forms of text classification is sentiment analysis, which assigns a label such as positive, negative, or neutral to a sequence of text. To run inference on a text, a tokenization scheme and a model must be chosen.
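
The two choices named above, a tokenization scheme and a model, can be illustrated with a minimal sketch. It is not the platform's pipeline: the regular-expression tokenizer, the multinomial Naive Bayes model with add-one smoothing, the class name NaiveBayesTextClassifier, and the toy training sentences are all assumptions chosen only to keep the example self-contained.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                # Smoothed log likelihood of the token given the class.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
TRAIN = [
    ("the new bike lanes are great and safe", "positive"),
    ("I love the renovated park", "positive"),
    ("the bus service is terrible and always late", "negative"),
    ("I hate the noisy roadworks", "negative"),
    ("the council meeting is on Tuesday", "neutral"),
    ("the library opens at nine", "neutral"),
]

classifier = NaiveBayesTextClassifier().fit(*zip(*TRAIN))
print(classifier.predict("the park is great"))
```

A production system would typically replace both pieces, for example with a subword tokenizer and a pretrained neural model, but the division of labour stays the same: the tokenizer turns raw text into units, and the model maps those units to a class.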

Sentiment Classification

Sentiment analysis is a major task in the Natural Language Processing (NLP) field that classifies the polarity of a given text. It is used to understand how customers or citizens feel about products, services, or topics, and it helps companies and organizations interpret the feedback they receive. Much of the earlier work on text categorization focused on topical categorization, attempting to sort documents according to their subject matter. However, recent years have seen rapid growth in online discussions and reviews, where a crucial characteristic of the posted comments is their sentiment, or overall opinion, towards the subject matter. Labeling these comments with their sentiment provides succinct summaries to readers. Sentiment classification is also helpful in business intelligence applications and recommender systems (Terveen et al., 1997; Tatemura, 2000), where user input and feedback can be quickly summarized. Likewise, free-form survey responses given in natural language can be processed using sentiment categorization. The task investigates how applicable sentiment analysis techniques can improve evidence-based policymaking. In addition to information on government websites, sentiment analysis can be performed on the large volume of user-generated content that exists in multiple external online sources.
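
As a rough illustration of how sentiment labels can turn many comments into a succinct summary, the sketch below aggregates already-labelled feedback into per-topic polarity shares. The function name summarize_by_topic, the (topic, label) record layout, and the sample records are hypothetical and are not taken from the platform.

```python
from collections import Counter, defaultdict

POLARITIES = ("positive", "negative", "neutral")


def summarize_by_topic(records):
    """Return, for each topic, the share of comments carrying each polarity label.

    `records` is an iterable of (topic, polarity_label) pairs.
    """
    counts = defaultdict(Counter)
    for topic, label in records:
        counts[topic][label] += 1

    summary = {}
    for topic, label_counts in counts.items():
        total = sum(label_counts.values())
        # A Counter returns 0 for labels that never occurred for this topic.
        summary[topic] = {p: label_counts[p] / total for p in POLARITIES}
    return summary


# Invented sample of labelled citizen feedback.
records = [
    ("transport", "negative"),
    ("transport", "negative"),
    ("transport", "positive"),
    ("transport", "neutral"),
    ("parks", "positive"),
]

for topic, shares in summarize_by_topic(records).items():
    print(topic, shares)
```

A policy maker reading such a summary sees at a glance that, in this invented sample, half of the transport comments are negative, without reading each comment.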

Emotion Classification

Emotion classification is similar to sentiment analysis, but instead of assigning one of only three polarity classes (positive, negative, and neutral), it breaks the positive and negative classes down into finer-grained emotions. The emotion classification here focuses on primary emotions such as sadness, joy, anger, disgust, and surprise, alongside a neutral class. More generally, emotion classification, or emotion categorization, labels any given input as ‘neutral’ or as the one of several given emotions that best represents the subject’s mental state as conveyed through facial expression, words, and so on.
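
The relationship between the fine-grained emotion classes and the three polarity classes can be sketched as a simple roll-up. Everything here is an assumption made for illustration: the score dictionary stands in for the output of some emotion model, the names top_emotion and polarity_of are hypothetical, and the mapping itself is a judgment call; in particular, surprise can be pleasant or unpleasant, so it is mapped to neutral.

```python
# Assumed mapping from primary emotions to coarse polarity (not from the platform).
EMOTION_TO_POLARITY = {
    "joy": "positive",
    "anger": "negative",
    "sadness": "negative",
    "disgust": "negative",
    "surprise": "neutral",  # assumption: ambiguous, so not counted as positive or negative
    "neutral": "neutral",
}


def top_emotion(scores):
    """Return the emotion label with the highest score."""
    return max(scores, key=scores.get)


def polarity_of(emotion):
    """Roll a fine-grained emotion label up to its coarse polarity class."""
    return EMOTION_TO_POLARITY[emotion]


# Hypothetical per-emotion scores for one comment, e.g. from a classifier.
scores = {"joy": 0.1, "anger": 0.6, "sadness": 0.2, "neutral": 0.1}

emotion = top_emotion(scores)
print(emotion, polarity_of(emotion))
```

The roll-up loses information by design: anger and sadness both become negative, which is exactly the distinction emotion classification is meant to preserve.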