Published by FirstAlign
Do you want to create a Machine Learning API but don’t know how to code it? Then you are in the right place. In this blog we are going to create an API that performs sentiment analysis of tweets using minimal code, so let’s start.
Sentiment analysis is one of the fundamental tasks of Natural Language Processing (NLP), so let’s first understand – What is NLP?
What is Natural Language Processing?
Natural language processing is a combination of linguistics, computer science, and artificial intelligence whose purpose is to allow a computer to understand human language. This can be both the text that humans write and the language humans speak. The most popular examples of NLP are chatbots, spam detection software, word processor auto-correction, etc.
One such use case of NLP is sentiment analysis, so let’s understand it.
What is sentiment analysis?
Sentiment analysis, also known as ‘opinion mining’, is an application of Natural Language Processing that derives an opinion from text. For example, you have a piece of text in the form of a tweet or comment, and you want the computer to understand whether it says something positive or negative, or whether it is neutral. This is one of the most prominent NLP techniques.
Why Sentiment Analysis – What is a typical use case?
You are running an e-commerce store and the people buying from you leave reviews and comments. You would like to understand which products people like or dislike, but you have too many products and comments, and there is no way to analyze them manually. Using sentiment analysis, your dataset can be reviewed in minutes, collating feedback into positive, negative, and neutral. Automating this task speeds up Business Intelligence and allows the continuous improvement of customer-related experiences.
Let’s go through a typical example of how this is done. We will assume for the purpose of this exercise that a relevant dataset is ready and available.
How to do Sentiment Analysis?
There are different ways and tools to perform sentiment analysis, but the basic operation behind all of them is text classification, i.e. we classify text into either a positive or a negative class. The most common tools for creating a sentiment analysis model are Python, NLTK, scikit-learn, and TensorFlow. There are other ways, with many APIs available, but we will stick to this method.
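To make the idea of text classification concrete, here is a minimal sketch in Python using scikit-learn on a toy dataset. The tiny example texts and the choice of a bag-of-words model with logistic regression are my own illustration, not part of the Azure pipeline we build below; it only shows the underlying principle of classifying text into classes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data: 0 = positive, 1 = negative, matching the labeling used later in this blog
texts = ["i love this product", "great quality, very happy",
         "terrible experience", "i hate it, awful",
         "love the service", "worst purchase ever"]
labels = [0, 0, 1, 1, 0, 1]

# Turn each text into a bag-of-words feature vector, then fit a classifier
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression().fit(X, labels)

# Classify a new, unseen piece of text
prediction = model.predict(vectorizer.transform(["i love the quality"]))[0]
```

Every sentiment analysis tool, whether drag-and-drop or code-first, is doing some variant of these three steps: vectorize text, train a classifier, predict a class.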
How are we performing Sentiment Analysis?
To perform sentiment analysis we will use Azure Machine Learning Studio. This is a drag-and-drop tool, so a technical coding background is not necessarily needed. We are using a Twitter dataset to train the model. We will use the same drag-and-drop tool to create an API, which can be used to predict sentiment in the future.
Before starting, let’s be clear on some basic terminology:
True Positive: when the model correctly predicts the positive class.
True Negative: when the model correctly predicts the negative class.
False Positive: when the model incorrectly predicts the positive class.
False Negative: when the model incorrectly predicts the negative class.
ROC Curve: the Receiver Operating Characteristic curve, a plot that shows the diagnostic ability of a binary classifier as its discrimination threshold is varied.
Lift Curve: a curve that shows the model’s performance compared to random guessing.
Precision Recall Curve: a curve that plots precision against recall for various thresholds.
Feature Hashing: according to Microsoft’s official documentation, feature hashing works by converting unique tokens into integers. It operates on the exact input strings that you provide and does not perform any linguistic analysis or pre-processing.
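Outside of Azure, the same feature hashing idea can be sketched with scikit-learn’s `HashingVectorizer`. The parameters below (a 10-bit hash space, i.e. 2**10 columns, and up to 2-grams) mirror the settings we will use in the Feature Hashing module later; the example sentence is my own.

```python
from sklearn.feature_extraction.text import HashingVectorizer

# 2**10 = 1024 hash buckets ("hashing bitsize 10"), unigrams and bigrams ("n-grams = 2")
hasher = HashingVectorizer(n_features=2**10, ngram_range=(1, 2),
                           alternate_sign=False, norm=None)

# Each token and bigram is hashed to a column index; no linguistic analysis happens
features = hasher.transform(["this movie was great"])
```

The sentence has four unigrams and three bigrams, so at most seven of the 1024 columns are non-zero (fewer if hashes collide). Hashing trades a small risk of collisions for a fixed-size feature space and no vocabulary to store.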
What is Azure Machine Learning Studio?
According to the Azure documentation, Microsoft Azure Machine Learning Studio (classic) is a collaborative, drag-and-drop tool you can use to build, test, and deploy predictive analytics solutions on your data. Azure Machine Learning Studio (classic) publishes models as web services that can easily be consumed by custom apps or BI tools such as Excel.
The dataset we used has 29,530 rows and two columns. The first contains the raw tweets; the second contains a label of either 0 or 1, where 0 means the tweet is positive and 1 means it is negative. Negative sentiment is defined as a tweet containing hate speech or racist or sexist sentiment; otherwise it is considered positive. We will train our model using this dataset.
Now we are going to create the web service step by step, let us begin!
Step 1: Go to Azure ML Studio and sign in.
Step 2: Upload the dataset to be used. When you are logged in you will see a dashboard. Click on ‘new’ at the bottom left, then click on ‘dataset’; you will see a screen like the one shown below.
Now select the file and the dataset will be imported.
Step 3: Create a new experiment. Click on ‘new’ at the bottom left, then click on ‘experiment’ and ‘blank experiment’, as can be seen in the screen below.
Now that we have created a blank experiment, it’s time to start working. On the left we have the modules, and on the right side we have the properties of the modules we will be using.
Step 4: Add the dataset module. To retrieve the dataset, click on ‘saved dataset’, then ‘my dataset’, and drag and drop the dataset you uploaded into the experiment window. Drag in the stop-words dataset as well; it will be used in the next steps.
We used custom Python and R scripts to complete the models wherever the required functionality was not available in the system.
Step 5: Add custom functionality. To add custom functionality we are going to use the Execute Python Script module. Add the script shown below to the Python script property on the right.
Now this script requires stop-words. We get these from the datasets module and connect them as shown in the figure below.
Custom Python script: an example of the Python script used to pre-process the data is shown below. If you want to know more about pre-processing, check out one of my previous blogs, Classification of hate speech and offensive language using machine learning.
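A sketch of such a pre-processing script is below. The `azureml_main` entry point is what the Execute Python Script module in Azure ML Studio (classic) calls, with each input port delivered as a pandas DataFrame; the column names (`tweet`, and the first column of the stop-words dataset) and the specific cleaning steps are assumptions for illustration, so adapt them to your own dataset.

```python
import re
import pandas as pd

def azureml_main(tweets=None, stopwords=None):
    # Azure ML Studio (classic) passes each input port as a pandas DataFrame
    # and expects a tuple of DataFrames back.
    stop = set(stopwords.iloc[:, 0].str.lower()) if stopwords is not None else set()

    def clean(text):
        text = text.lower()
        text = re.sub(r"@\w+", "", text)       # strip user handles
        text = re.sub(r"http\S+", "", text)    # strip URLs
        text = re.sub(r"[^a-z\s]", " ", text)  # keep letters only
        tokens = [t for t in text.split() if t not in stop]
        return " ".join(tokens)

    tweets["tweet"] = tweets["tweet"].apply(clean)
    return tweets,

# Quick local check of the cleaning logic
sample = pd.DataFrame({"tweet": ["@user I LOVE this product! http://t.co/abc"]})
stop_df = pd.DataFrame({"word": ["this"]})
cleaned = azureml_main(sample, stop_df)[0]
```

Running the check leaves `"i love product"` in the tweet column: the handle, URL, punctuation, and stop-word are all removed before the text reaches feature hashing.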
Step 6: Add feature hashing. To add feature hashing we will use the Feature Hashing module. Drag and drop it, and on the right side in the properties section set N-grams to 2 and Hashing bitsize to 10. Select the tweet column using the launch column selector.
Now click on ‘run’, then right-click on the feature hashing module’s output (the circle at the bottom) and click ‘visualize’. You will see something like the figure below.
Now we have a feature set, but this data frame also contains the tweets as text, which we don’t want. So we will use the Select Columns in Dataset module, launch the column selector, and select every column except the tweet column, excluding it from the dataset.
Step 7: Split the data into train and test sets. Now that we have the feature set and labels, we want to split the data into a training set for training and a test set for evaluating the results. We do this with the Split Data module by setting the fraction of rows in the first output dataset to 0.8. The module then has two outputs: one with 80% of the rows and one with the remaining 20%. This can be seen below.
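The Split Data module corresponds to a standard train/test split. A minimal equivalent in scikit-learn, on a stand-in feature matrix of my own, looks like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in feature matrix and labels (100 rows) in place of the hashed tweet features
X = np.arange(100).reshape(100, 1)
y = np.tile([0, 1], 50)

# train_size=0.8 mirrors "fraction of rows in the first output dataset = 0.8"
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42)
```

As with the module's two output ports, 80 rows land in the training split and 20 in the test split.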
Step 8: Train the model. Now that we have the training split, use the Train Model module together with the Two-Class Logistic Regression module. In Train Model, select the label column (here, sentiment) and connect the Split Data output, training the model with Two-Class Logistic Regression as shown below.
Step 9: Evaluate the model. Now that the model is trained, it’s time to evaluate how well it has done, using the test set from the Split Data module. We use two modules for this: Score Model, which produces predictions, and Evaluate Model, which computes the metrics. They are connected as shown below.
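In code, the Train Model / Score Model / Evaluate Model trio maps onto fit, predict, and metric computation. The synthetic data below is my own stand-in so the sketch is runnable; in the experiment these steps run on the hashed tweet features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in data: the label is determined by the first feature
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(80, 5)), rng.normal(size=(20, 5))
y_train = (X_train[:, 0] > 0).astype(int)
y_test = (X_test[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X_train, y_train)  # "Train Model" (two-class logistic regression)
scores = model.predict(X_test)                      # "Score Model": predictions on the test split
accuracy = accuracy_score(y_test, scores)           # "Evaluate Model": compare predictions to labels
```

Evaluate Model additionally reports precision, recall, F1, and the curves defined earlier, all derived from the same scored test set.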
Step 10: Run the model. Click on the ‘Run’ button at the bottom. You will see a green tick on every module that has run correctly, as shown in the figure below.
Now right-click on the output knob of Evaluate Model and click ‘visualize’. You will be able to see the model’s performance.
Our model has an accuracy of 93.5%, a precision of 76.9%, a recall of 18.8%, and an F1 score of 30.3%. It produced 90 true positives, 388 false negatives, 27 false positives, and 5,887 true negatives. We have also plotted the ROC curve, lift curve, and precision-recall curve, as shown below.
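You can verify these percentages yourself from the four confusion-matrix counts, using the standard definitions of the metrics:

```python
# Confusion-matrix counts reported by Evaluate Model
tp, fn, fp, tn = 90, 388, 27, 5887

accuracy = (tp + tn) / (tp + tn + fp + fn)       # fraction of all predictions that are correct
precision = tp / (tp + fp)                       # of predicted positives, how many are right
recall = tp / (tp + fn)                          # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy * 100, 1), round(precision * 100, 1),
      round(recall * 100, 1), round(f1 * 100, 1))
# → 93.5 76.9 18.8 30.3
```

The numbers also show why accuracy alone is misleading on imbalanced data: negative tweets are rare, so the model scores 93.5% accuracy while catching only 18.8% of them.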
Step 11: Create the web service. Hover your mouse over ‘Set Up Web Service’ and click on ‘Predictive Web Service’. After completion you will see the screen shown in the figure below.
Step 12: Deploy the web service. Click on ‘Run’. When the run is complete, click on ‘Deploy Web Service’ and you will be taken to the page shown below.
Step 13: Test the web service. Click on the ‘Test’ button. You will be shown an input field. Provide the relevant text and the results will be displayed.
In this blog we created a web service, tested it, and achieved good accuracy. The test we carried out used Azure ML Studio; we could equally have called this web service from outside. We will create clients for it in our next blog, including a Python client, and apply some MLOps, so stay tuned.
Until then happy coding ❤