
Round 1: PMx, IIT Guwahati

In association with
Data Annotation - An Overview

Thirty years ago, computer vision systems could barely recognize handwritten
digits. Today, AI-powered systems drive autonomous vehicles, detect malignant
tumors in pathology slides, and review legal contracts. Along with advanced
algorithms and powerful compute resources, fine-grained labeled datasets play
a key role in AI's renaissance.

The burgeoning demand for labeled data has driven the growth of an industry
that employs armies of highly trained data labelers, whether in-house or
crowdsourced, and develops advanced annotation tools for professional
labeling services.

About Playment

Playment is a fully managed data labeling platform that provides high-quality
training data for computer vision models at scale.

Its mission is to help companies accelerate their AI development by providing
core data-related solutions, significantly reducing time to market. Playment
enables computer vision and perception teams across the world to effortlessly
build and manage ground-truth data with its smart annotation tools and fully
managed labeling services. Today, Playment forms an essential part of a
computer vision engineer's toolkit as a reliable training and validation data
bank.
Partner Companies

Playment partners with self-driving and other large computer vision companies
building compute-intensive neural networks, providing them with quality
labeled data that can be used to train deep learning models efficiently.
Here, quality is key: to get the most effective results, the labeled data
needs to be close to 100% correct.

How annotation at Playment works

Data Labelling and tagging:

Consider an image of a street with 5 cars and 2 street lights. Annotating
that image means adding labels with bounding boxes that distinguish the 5
cars and the 2 street lights, i.e. 7 bounding boxes/annotations (see the
example images referenced below).

[Example annotated images appeared here.]
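To make the example concrete, here is a minimal sketch in Python of how those 7 annotations might be represented; the field names and pixel coordinates are invented for illustration and are not Playment's actual schema.

# Hypothetical representation of the street-scene example:
# 5 cars + 2 street lights = 7 bounding-box annotations.
annotations = [
    # Each annotation: a label plus pixel coordinates (x_min, y_min, x_max, y_max).
    {"label": "car", "box": (34, 210, 180, 320)},
    {"label": "car", "box": (200, 215, 330, 325)},
    {"label": "car", "box": (350, 208, 470, 318)},
    {"label": "car", "box": (500, 212, 615, 322)},
    {"label": "car", "box": (640, 218, 760, 330)},
    {"label": "street_light", "box": (120, 20, 140, 200)},
    {"label": "street_light", "box": (560, 25, 580, 205)},
]

assert len(annotations) == 7  # 5 cars + 2 street lights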
Crowd-Sourcing of Data Annotators

Playment works via crowdsourcing: anyone can join the platform, learn how to
annotate, and contribute. Playment rewards contributors monetarily, depending
on how many tasks they complete. These data annotators generally come from
tier 2 and tier 3 cities and are students, homemakers, and others looking for
an extra source of income.

The app: https://app.playment.io/#/ (currently inaccessible)

Six annotation types supported: https://playment.io/ (also check out the
steps involved in a project, the industries supported, etc.)

Garbage In Garbage Out

For artificial intelligence, this means the quality of the output depends on the
quality of the input. With bad data, applications with AI capabilities, such as
chatbots or personal assistants, will produce results that are inaccurate,
incomplete or incoherent. Having good data is especially important for AI
subsets like machine learning and deep learning, which gain greater
capabilities over time by analyzing large sets of data, learning from them and
ultimately making adjustments that make the applications more intelligent.
The Question of Quality

One of the core challenges for data labeling platforms is maintaining the
quality of annotations, and this is something Playment keeps optimizing. For
building effective solutions to real-world problems, self-driving cars, and
other demanding deep neural network training tasks, you cannot compromise on
quality: maintaining close to 100% accuracy in annotations is a fundamental
requirement.
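One way to make "quality" concrete is to score a submitted box against a trusted reference box using intersection-over-union (IoU), a standard overlap measure for bounding boxes. The sketch below is a minimal Python illustration; the coordinates, the 0.9 threshold, and the acceptance rule are assumptions for illustration, not Playment's actual pipeline.

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Example: accept a submitted box only if it overlaps the reference box
# tightly enough (0.9 is an illustrative threshold, not a Playment number).
submitted = (198, 213, 332, 327)
reference = (200, 215, 330, 325)
print(iou(submitted, reference) >= 0.9)  # True for this pair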

The biggest challenge is providing quality in a crowdsourcing model at scale.

Problem Statement

How would you ensure quality in crowdsourcing annotations at scale?

For Playment, scale refers to different tasks being run simultaneously with
different sets of annotators trying to complete the task. Playment has a
community of around 30K annotators working on these tasks daily.

So, your task for this round is to help Playment with solutions that ensure
quality in crowdsourced data labeling.
More Information

Annotators are paid for the labeling work they have done. In most cases,
Playment pays them per annotation, but this is not a hard and fast rule and
can change from use case to use case. As an example: at 20p per bounding
box/annotation, an annotator who draws the 20 bounding boxes required on an
image earns 20 x 20p = Rs 4 (a small payout sketch appears after the list
below). With this in mind, here is what we have observed so far:

1. It requires a lot of patience to make accurate bounding boxes on an image
(and it becomes roughly three times as difficult when one has to draw 3D
bounding boxes on 3D point cloud data; see the image referenced above for
more information). But users often don't pay attention to detail and submit
annotations with inaccurate boundaries or incorrect labels.
2. Playment generally creates easy-to-understand guidelines for these users
and shares them beforehand. This is a required step and helps users
understand not just the labeling guidelines but the nuances and corner cases
as well. However, users often don't take the guidelines seriously and skip
them, which lowers the accuracy of the annotated data. Also, since Playment's
crowdsourcing model is distributed across the country, it is difficult for
everyone to understand guidelines written in English.
3. Users sometimes spam the image by making unnecessary annotations in order
to earn more money. This is the worst possible input to have in a model.
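Here is the payout arithmetic from the example above as a small sketch; the 20p rate comes from the example, while the function and constant names are illustrative, not Playment's billing logic.

# Illustrative per-box rate from the example: 20 paise = Rs 0.20 per bounding box.
RATE_PER_BOX_RS = 0.20

def payout(num_boxes: int) -> float:
    """Earnings in rupees for a single image annotated with num_boxes boxes."""
    return num_boxes * RATE_PER_BOX_RS

print(payout(20))               # 20 boxes x Rs 0.20 = Rs 4, as in the example
print(payout(30) - payout(20))  # the spam incentive from observation 3: 10 extra boxes add Rs 2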

Important note: Today we maintain continuity across the platform, i.e. we
have a predicted supply that helps us fulfil our partner requirements. Make
sure your solution does not affect this and does not increase the cost at
which we run operations today.
Hints

Even before building solutions, understanding users is an essential step.
From the information given in this document, try to draw user personas of a
few typical users.

Then, solutions can be built around ideas such as:

1. Creating a community of skilled labelers (e.g. in-house annotators on the
platform).
2. Implementing a classroom- or university-type experience for annotators,
such as a teach-and-test mechanism.
3. Building heuristics inside the system to identify spam (or spammers);
check how Facebook or Uber do this to get some ideas (a toy sketch of one
such heuristic follows this list).
4. Building a checking system to prevent false positives (annotations with
inaccurate boundaries or incorrect labels) and spam; check how high-risk
transactional systems and banks do this.
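As a starter for hint 3, here is a toy heuristic; all names, thresholds, and the data shape are illustrative assumptions, not Playment's system. It compares each annotator's box count per image against the crowd median and flags annotators who repeatedly submit far more boxes than everyone else, the spam pattern described under "More Information".

from statistics import median

# Hypothetical input: for each image, the number of boxes each annotator drew.
submissions = {
    "img_001": {"anno_a": 7, "anno_b": 7, "anno_c": 15},
    "img_002": {"anno_a": 4, "anno_b": 5, "anno_c": 12},
}

def flag_possible_spammers(submissions, ratio=1.5, min_flags=2):
    """Flag annotators who repeatedly draw far more boxes than the crowd median."""
    flags = {}
    for counts in submissions.values():
        crowd_median = median(counts.values())
        for annotator, n_boxes in counts.items():
            if n_boxes > ratio * crowd_median:
                flags[annotator] = flags.get(annotator, 0) + 1
    return [annotator for annotator, n in flags.items() if n >= min_flags]

print(flag_possible_spammers(submissions))  # ['anno_c'] in this toy example

A real system would combine such count-based signals with label and boundary checks (hint 4), for example the IoU score sketched earlier, before withholding payment or routing work for re-review.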

This is just a list of hints; all we wanted to convey is a few starter ideas.
Your final solutions must explore newer areas (those will fetch more marks)
and must also be highly detailed. Copying solutions will lead to
disqualification.
Submission Details

Your team needs to think of solutions that can assure good labeling quality.
Each solution must be given a heading (e.g. "A community of skilled
labelers"), followed by an elaborate description and the problem it will
solve. At the beginning, include user personas, the major problems you
identified, or anything else you wish to describe, and be sure to describe
all deliverables (the things your solutions will cater to). Use references
throughout.

Submission format:
A deck (in PDF format) with a maximum of 10 slides

Deadline:
25 Dec, EOD (End of Day)

Sample:
http://bit.ly/pmx_sample_r1

Feel free to reach out at edc@iitg.ac.in for any queries. The problem
statement was made in association with Product Managers at Playment!

THE END
