Build Accurate Training Datasets With Amazon SageMaker Ground Truth AIM308
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Data labeling is tedious and difficult
Custom
Dataset collection → Model training → Evaluation → Deployment
Machine learning – bird’s-eye view
Dataset collection → Model training → Evaluation → Deployment
“Data is the new oil”
{
"label": "Hand",
"score": 0.99794453
}
👍
Your application’s data is different
{
"label": "Hand",
"score": 0.9703855
}
👎
New problem
We have relevant images we can learn from, but no labels
Solution
Amazon SageMaker Ground Truth
What we liked:
• It runs in AWS
  • We already use AWS services in our training workflows (Amazon SageMaker training, Amazon S3), so it's easy to point Ground Truth at our data
• Speed
  • We can get images labeled quickly, on demand
• Flexibility to use public or private workforces to label data
• Ability to start labeling jobs programmatically or through the UI
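Starting a job programmatically assumes the data to label is already described in S3. The Go sketch below covers that pre-processing step for an image-classification job, under stated assumptions: it builds the JSON Lines input manifest (one {"source-ref": ...} object per image) and the label category file that LabelCategoryConfigS3Uri points to, using the "document-version" value from the AWS documentation examples. BuildInputManifest and BuildLabelCategoryConfig are hypothetical helpers, and uploading the output to S3 is left out.

```go
package main

import (
	"encoding/json"
	"strings"
)

// manifestLine is one input-manifest entry: a standalone JSON object whose
// "source-ref" points at the S3 object to be labeled.
type manifestLine struct {
	SourceRef string `json:"source-ref"`
}

// BuildInputManifest returns JSON Lines text with one entry per image URI.
func BuildInputManifest(uris []string) (string, error) {
	var b strings.Builder
	for _, uri := range uris {
		line, err := json.Marshal(manifestLine{SourceRef: uri})
		if err != nil {
			return "", err
		}
		b.Write(line)
		b.WriteByte('\n') // JSON Lines: exactly one object per line
	}
	return b.String(), nil
}

// labelCategoryConfig mirrors the file referenced by LabelCategoryConfigS3Uri.
type labelCategoryConfig struct {
	DocumentVersion string          `json:"document-version"`
	Labels          []categoryLabel `json:"labels"`
}

type categoryLabel struct {
	Label string `json:"label"`
}

// BuildLabelCategoryConfig lists the classes workers can choose from,
// in the order given.
func BuildLabelCategoryConfig(classes []string) ([]byte, error) {
	cfg := labelCategoryConfig{DocumentVersion: "2018-11-28"}
	for _, c := range classes {
		cfg.Labels = append(cfg.Labels, categoryLabel{Label: c})
	}
	return json.Marshal(cfg)
}
```

Both helpers return in-memory bytes so they can be unit-tested. Writing the results to the bucket that InputConfig and LabelCategoryConfigS3Uri reference is a separate step.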
Integrating with Amazon SageMaker Ground Truth
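smInput := sagemaker.CreateLabelingJobInput{ // opening line not on the slide; type assumed from aws-sdk-go v1
	InputConfig:              &inputConfig,
	LabelAttributeName:       &attrName,
	LabelCategoryConfigS3Uri: &job.CategoryUri,
	LabelingJobName:          &fullJobName,
	OutputConfig:             &outputConfig,
	RoleArn:                  roleArn,
	// HumanTaskConfig, which the CreateLabelingJob API requires, is not shown on the slide
}
client.CreateLabelingJob(&smInput)
Diagram: Pre-processing → Submit Ground Truth job → Extract labels → Create TF record (also labeled: model file, BigQuery)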
👍
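The "Extract labels" stage reads the output manifest that a finished job writes to S3. The Go sketch below assumes an image-classification job whose output lines carry the chosen class in a "<LabelAttributeName>-metadata" object with "class-name" and "confidence" fields. Check a real output manifest before relying on those field names. ExtractLabels and its minConfidence filter are illustrative, not part of any AWS SDK.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Example is one labeled image pulled from the output manifest.
type Example struct {
	SourceRef  string
	ClassName  string
	Confidence float64
}

// ExtractLabels parses a JSON Lines output manifest. attr is the job's
// LabelAttributeName. Items whose label confidence is below minConfidence
// are skipped.
func ExtractLabels(manifest, attr string, minConfidence float64) ([]Example, error) {
	var out []Example
	sc := bufio.NewScanner(strings.NewReader(manifest))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var rec map[string]json.RawMessage
		if err := json.Unmarshal([]byte(line), &rec); err != nil {
			return nil, err
		}
		var src string
		if err := json.Unmarshal(rec["source-ref"], &src); err != nil {
			return nil, fmt.Errorf("source-ref: %w", err)
		}
		raw, ok := rec[attr+"-metadata"]
		if !ok {
			return nil, fmt.Errorf("%s: missing %s-metadata", src, attr)
		}
		var meta struct {
			ClassName  string  `json:"class-name"`
			Confidence float64 `json:"confidence"`
		}
		if err := json.Unmarshal(raw, &meta); err != nil {
			return nil, err
		}
		if meta.Confidence < minConfidence {
			continue // low-confidence label: leave it out of the training set
		}
		out = append(out, Example{SourceRef: src, ClassName: meta.ClassName, Confidence: meta.Confidence})
	}
	return out, sc.Err()
}
```

bufio.Scanner rejects lines longer than 64 KiB by default. If manifest lines can exceed that, for example with many bounding boxes, call sc.Buffer with a larger limit.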
Tips for getting high-quality labels
Provide clear instructions
• Show images of good and bad examples
• Bootstrap the examples by running a small test set and collecting common mistakes
☐ Yes
☑ No
Learn more @ https://amzn.to/33IbiyL
Auto-segment uses the Deep Extreme Cut (DEXTR) algorithm
Use multiplicity (several workers per item) to improve accuracy
https://amzn.to/2N9PrsD
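Multiplicity only helps if the answers are combined. Ground Truth's built-in consolidation for classification is more elaborate: per the AWS documentation, it uses an expectation-maximization approach that accounts for worker reliability. The Go sketch below swaps in a plain majority vote to show the idea. MajorityVote is a hypothetical helper, not the service's algorithm.

```go
package main

import "sort"

// MajorityVote returns the most frequent label among worker answers and the
// fraction of workers who chose it. Ties are broken alphabetically so the
// result is deterministic.
func MajorityVote(labels []string) (string, float64) {
	if len(labels) == 0 {
		return "", 0
	}
	counts := make(map[string]int)
	for _, l := range labels {
		counts[l]++
	}
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	best := keys[0]
	for _, k := range keys[1:] {
		if counts[k] > counts[best] { // strict ">" keeps the alphabetically first label on ties
			best = k
		}
	}
	return best, float64(counts[best]) / float64(len(labels))
}
```

With two workers who disagree, as in the "Athlete" vs. "Animal" responses on the next slide, a vote can only break the tie arbitrarily. An odd number of workers per item avoids that for two-class tasks.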
Measure accuracy and throughput of labelers
Raw worker responses emitted to S3
{
  "answers": [
    {
      "answerContent": {
        "crowd-classifier": {"label": "Athlete"}
      },
      "submissionTime": "2019-10-16T03:25:56.656Z",
      "workerId": "private.us-west-2.2fa5a9d73ef73ba0",
      "workerMetadata": {
        "identityData": {
          "identityProviderType": "Cognito",
          "issuer": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_K2Rl3SHuq",
          "sub": "c9a8f4a4-ed4a-4dad-a722-8532d0d6016e"
        }
      }
    },
    {
      "answerContent": {
        "crowd-classifier": {"label": "Animal"}
      },
      "submissionTime": "2019-10-16T03:27:31.048Z",
      "workerId": "private.us-west-2.7dcbcca1ce3117d8",
      "workerMetadata": {
        "identityData": {
          "identityProviderType": "Cognito",
          "issuer": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_K2Rl3SHuq",
          "sub": "7eb0d3bc-2da5-4244-b14f-d9ec6ffe2e17"
        }
      }
    }
  ]
}
The first answer is the response from worker 1 ("Athlete"); the second is the response from worker 2 ("Animal").
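The raw responses above are enough to track each labeler. The Go sketch below parses that exact shape and accumulates per-worker accuracy and a rough throughput. It assumes you hold a known-correct (gold) label for the item, for example from a small hand-checked set. Score, WorkerStats, and the gold-label input are hypothetical, and workerMetadata is ignored.

```go
package main

import (
	"encoding/json"
	"time"
)

// workerResponse mirrors the raw worker-response JSON written to S3.
type workerResponse struct {
	Answers []struct {
		AnswerContent struct {
			CrowdClassifier struct {
				Label string `json:"label"`
			} `json:"crowd-classifier"`
		} `json:"answerContent"`
		SubmissionTime time.Time `json:"submissionTime"` // RFC 3339 timestamp
		WorkerID       string    `json:"workerId"`
	} `json:"answers"`
}

// WorkerStats accumulates counts and submission times for one worker.
type WorkerStats struct {
	Answered, Correct int
	First, Last       time.Time
}

// Accuracy is the fraction of answers that matched the gold label.
func (s *WorkerStats) Accuracy() float64 {
	if s.Answered == 0 {
		return 0
	}
	return float64(s.Correct) / float64(s.Answered)
}

// PerHour estimates submissions per hour from the gaps between the first
// and last submission; it returns 0 until there is a time span to divide by.
func (s *WorkerStats) PerHour() float64 {
	h := s.Last.Sub(s.First).Hours()
	if h <= 0 {
		return 0
	}
	return float64(s.Answered-1) / h
}

// Score folds one raw response file into stats, comparing each answer with
// the gold label for that item.
func Score(raw []byte, gold string, stats map[string]*WorkerStats) error {
	var resp workerResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return err
	}
	for _, a := range resp.Answers {
		s, ok := stats[a.WorkerID]
		if !ok {
			s = &WorkerStats{}
			stats[a.WorkerID] = s
		}
		s.Answered++
		if a.AnswerContent.CrowdClassifier.Label == gold {
			s.Correct++
		}
		if s.First.IsZero() || a.SubmissionTime.Before(s.First) {
			s.First = a.SubmissionTime
		}
		if a.SubmissionTime.After(s.Last) {
			s.Last = a.SubmissionTime
		}
	}
	return nil
}
```

Calling Score once per response file and then reading Accuracy and PerHour for each workerId yields a per-labeler table.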
70+ free digital ML courses from AWS experts let you learn from
real-world challenges tackled at AWS
Visit https://aws.training/machinelearning
Thank you!