Published by FirstAlign
This is the era of artificial intelligence and machine learning. The applications we use in our daily lives have gone from purely mobile to highly intelligent, but to gain this intelligence they require a lot of data. This data is used to train machine learning models.
However, to create a labeled dataset, data annotation is needed. In this article, we will review what data annotation is and why it is used, and then walk through some hands-on use cases of data annotation tools in NLP.
What is Data Annotation?
Data annotation is the procedure of labeling data, which is available in various formats such as voice, images, videos, and text. The labeling for each format occurs in its own way. For our example we will look at annotation of textual data for various NLP operations such as text classification, named entity recognition (NER), or part-of-speech (POS) tagging.
For each operation the data is labeled differently. For example, here are three different ways of tagging:
If we are tagging data for classification, we will take a complete text block and create a label for that entire block.
If we are labeling a dataset for POS tagging we will instead tag individual words, creating a label for each word's part of speech.
For named entity recognition we will tag words or phrases according to their named entity type.
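To make the difference concrete, here is a minimal Python sketch of what the three kinds of labels might look like for one sentence. The sentence, the label names, and the `find_span` helper are all assumptions made up for illustration; they are not taken from any particular tool or dataset.

```python
# Hypothetical sentence and labels, for illustration only.
text = "Alice visited Paris in April"
tokens = text.split()

# 1. Text classification: a single label for the entire block of text.
classification = {"text": text, "label": "neutral"}

# 2. POS tagging: one label per word (tag names follow the Universal POS style).
pos_tags = list(zip(tokens, ["PROPN", "VERB", "PROPN", "ADP", "PROPN"]))


def find_span(source, phrase, label):
    """Return (start, end, label) character offsets of the first match of phrase."""
    start = source.index(phrase)
    return (start, start + len(phrase), label)


# 3. Named entity recognition: labels attached to character spans.
entities = [
    find_span(text, "Alice", "PERSON"),
    find_span(text, "Paris", "LOCATION"),
    find_span(text, "April", "DATE"),
]

print(classification)
print(pos_tags)
print(entities)
```

Note how the unit being labeled changes each time: the whole text, each word, and then arbitrary spans inside the text.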
Why is Data Annotation Used?
Labeled data is a requirement of the supervised machine learning process, and it simplifies classification tasks. Annotation and labeling create a reference library that the model learns from, which improves the accuracy of the process's outcome.
An Example of Data Annotation
So let us perform a text annotation exercise. For the annotation, we are going to use an open-source project, Doccano.
Doccano provides annotation features for text classification, sequence labeling, and sequence-to-sequence tasks. With it, labeled data can be created for sentiment analysis, named entity recognition, text summarization, and so on. Just create a project, upload data, and start annotating. You can build a dataset in hours.
A live demo example can be seen here (https://doccano.herokuapp.com/demo/text-classification/).
We will go through this step by step below. This will demonstrate how we can annotate text with a ‘positive’ or ‘negative’ label so that it can be used for sentiment analysis.
Doccano interface for sentiment annotation
In the picture above we can see the text to be annotated, along with two buttons to label it positive or negative. The screenshot shows the sample data provided by Doccano.
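Once texts have been labeled this way, the annotations are typically exported and checked before training. The following is a small sketch of that step, assuming a JSON Lines export where each line has a `text` and a `label` field. These field names and the sample records are assumptions for illustration; check the actual export format of the tool version you use.

```python
import json
from collections import Counter

# Hypothetical export content: one JSON object per line.
# The field names "text" and "label" are assumed, not guaranteed by any tool.
SAMPLE_EXPORT = """\
{"text": "Great battery life.", "label": "positive"}
{"text": "The screen cracked after a week.", "label": "negative"}
{"text": "Fast delivery and friendly support.", "label": "positive"}
"""


def load_annotations(jsonl_text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def label_distribution(records):
    """Count how many examples carry each label."""
    return Counter(record["label"] for record in records)


records = load_annotations(SAMPLE_EXPORT)
distribution = label_distribution(records)
print(distribution)
```

Checking the label distribution early is a cheap way to spot a heavily imbalanced dataset before it is used to train a sentiment model.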
Interface for Labeling Named Entity Recognition
In this interface, we can select a piece of text and click on an entity type such as person, location, organization, event, date, or others to label the selection with that entity.
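Span selections like these are often stored as character offsets and later converted to per-token tags for model training. Below is a minimal sketch of one common conversion, to BIO tags (B = beginning of an entity, I = inside, O = outside). The `spans_to_bio` function, the whitespace tokenization, and the example spans are assumptions for illustration, not part of Doccano.

```python
import re


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans into per-token BIO tags.

    Tokens are split on whitespace, which is a simplifying assumption:
    a token with attached punctuation that extends past a span is tagged "O".
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.start(), match.end()
        tag = "O"
        for start, end, label in spans:
            # A token belongs to a span if it lies fully inside it.
            if tok_start >= start and tok_end <= end:
                prefix = "B-" if tok_start == start else "I-"
                tag = prefix + label
                break
        tagged.append((match.group(), tag))
    return tagged


# Hypothetical annotated sentence: offsets point at "Barack Obama" and "Berlin".
text = "Barack Obama visited Berlin"
spans = [(0, 12, "PERSON"), (21, 27, "LOCATION")]

bio = spans_to_bio(text, spans)
for token, tag in bio:
    print(token, tag)
```

A production pipeline would normally use the same tokenizer as the downstream model rather than whitespace splitting.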
Doccano uses a very simple interface to make labeling easy. You can run Doccano locally with a small set of dependencies, including:
- Docker Compose
After installing these dependencies, simply follow the installation guidelines given in its GitHub repository.
You can run Doccano in one of two modes: development or production.
Why use Doccano?
The reasons for using Doccano for the annotation of textual data are:
- It is easy to use
- It is open source
- It provides collaborative annotation
- It has multi-language support
- It has mobile support
In this article we have discussed data annotation, its different types, and why it is needed. We used an open-source tool (Doccano) to walk through a data annotation example, including its features and the reasons for using it.
I hope you have enjoyed the article. Until the next one, stay tuned and happy coding ❤