Round 1: PMX
In association with Playment
Data Annotation - An Overview
Thirty years ago, computer vision systems could barely recognize handwritten
digits. Today, AI-powered machines drive autonomous vehicles, detect
malignant tumors in pathology slides, and review legal contracts. Along with
advanced algorithms and powerful compute resources, fine-grained labeled
datasets play a key role in AI's renaissance. The burgeoning demand for
labeled data has driven the growth of an industry that employs armies of
highly trained data labelers, whether in-house or crowdsourced, and develops
advanced annotation tools for professional labeling services.
Consider an image of a street with 5 cars and 2 street lights. Annotating that
image means drawing a labeled bounding box around each object, distinguishing
the 5 cars and the 2 street lights, i.e., 7 bounding boxes/annotations.
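To make this concrete, here is a minimal sketch of how such annotations are
commonly represented. The schema (a label plus a bbox in [x, y, width, height]
pixel coordinates) and all coordinate values are illustrative assumptions, not
Playment's actual format.

    # A minimal sketch of bounding-box annotations for the street image above.
    # The schema (label + [x, y, width, height] in pixels) and the coordinate
    # values are assumed for illustration, not Playment's actual format.
    annotations = [
        {"label": "car", "bbox": [34, 210, 120, 60]},
        {"label": "car", "bbox": [180, 205, 115, 58]},
        {"label": "car", "bbox": [320, 212, 118, 61]},
        {"label": "car", "bbox": [460, 208, 122, 59]},
        {"label": "car", "bbox": [600, 215, 117, 57]},
        {"label": "street_light", "bbox": [90, 40, 18, 150]},
        {"label": "street_light", "bbox": [520, 35, 18, 155]},
    ]

    # 5 cars + 2 street lights = 7 annotations in total.
    print(len(annotations))  # 7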
About Playment
Crowd-Sourcing of Data Annotators
Playment works via crowdsourcing: a simple model in which anyone can join the
platform, learn how to annotate, and contribute. Playment rewards them with
monetary benefits depending on how many tasks they perform. These data
annotators generally come from tier-2 and tier-3 cities and are students,
housewives, and others in search of an extra source of income.
The Question of Quality
For artificial intelligence, the quality of the output depends on the quality
of the input. With bad data, applications with AI capabilities, such as
chatbots or personal assistants, will produce results that are inaccurate,
incomplete, or incoherent. Good data is especially important for AI subsets
like machine learning and deep learning, which gain greater capabilities over
time by analyzing large datasets, learning from them, and ultimately making
adjustments that make the applications more intelligent.
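One common primitive for checking label quality in a crowdsourced setting is
inter-annotator agreement: give the same image to two or more annotators and
compare their boxes. The sketch below is one illustrative way to score such
agreement using intersection-over-union (IoU), not Playment's actual pipeline;
the box format and the 0.5 cutoff are assumptions.

    # A minimal sketch of inter-annotator agreement via intersection-over-union
    # (IoU). This is an illustrative quality check, not Playment's actual
    # method. Boxes use the assumed [x, y, width, height] pixel format above.

    def iou(box_a, box_b):
        """IoU of two boxes given as [x, y, width, height]."""
        ax, ay, aw, ah = box_a
        bx, by, bw, bh = box_b
        # Intersection rectangle.
        ix = max(ax, bx)
        iy = max(ay, by)
        ix2 = min(ax + aw, bx + bw)
        iy2 = min(ay + ah, by + bh)
        inter = max(0, ix2 - ix) * max(0, iy2 - iy)
        union = aw * ah + bw * bh - inter
        return inter / union if union else 0.0

    # Two annotators label the same car; IoU >= 0.5 is a common agreement
    # cutoff in detection work (the threshold here is an assumption).
    annotator_1 = [34, 210, 120, 60]
    annotator_2 = [38, 214, 118, 58]
    print(f"agreement: {iou(annotator_1, annotator_2):.2f}")  # 0.86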
Problem Statement
For Playment, scale means many different tasks running simultaneously, each
with a different set of annotators working to complete it. Playment has a
community of around 30K annotators working on these tasks daily.
So, your task for this round is to help Playment with solutions that ensure
quality in crowdsourced data labeling.
More Information
Annotators are paid for the labeling work they have done. In most cases,
Playment pays them per annotation made, but this is not a hard and fast rule
and can change from use case to use case. Let's take an example to understand
this in detail: at 20 paise (20p) per bounding box/annotation, an image that
requires 20 bounding boxes earns the annotator 20 x 20p = Rs. 4.
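A quick worked version of this arithmetic is sketched below; treating the
per-box rate as a parameter is an assumption that reflects the
use-case-to-use-case caveat above.

    # A minimal sketch of the per-annotation payout model described above.
    # The 20-paise rate comes from the example; making it a parameter is an
    # assumption, since the brief says the rule can change per use case.

    def payout_rupees(num_boxes, rate_paise_per_box=20):
        """Total payout in rupees for one image (100 paise = 1 rupee)."""
        return num_boxes * rate_paise_per_box / 100

    print(payout_rupees(20))  # 20 boxes x 20p = 4.0 rupees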
With this payment model in mind, what we have noticed so far is as follows:
Your team needs to think of solutions that can assure good quality of
labeling. Each solution must be given a heading (e.g., "A community of
skilled laborers"), followed by an elaborate description and the problem it
will solve. At the beginning, you must include user personas/major problems
identified, or anything else you wish to describe, and you must describe all
deliverables (the things your solutions will cater to). Use references
throughout.
Submission format:
A deck (in PDF format) with a maximum of 10 slides
Deadline:
25 Dec, EOD (End of Day)
Sample:
http://bit.ly/pmx_sample_r1
Feel free to reach out at edc@iitg.ac.in for any queries. The problem
statement was made in association with Product Managers at Playment!
THE END