CMP311-R1 - How NextRoll Leverages AWS Batch For Daily Business Operations
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Introducing AWS Batch
Gene sequencing
Financial market/risk analysis
Application image + configuration
Scheduler
New: Allocation strategies for AWS Batch
Make capacity/throughput/cost tradeoffs
• CE1 (On-Demand): allocation strategy BEST_FIT_PROGRESSIVE
• CE2 (Spot): allocation strategy SPOT_CAPACITY_OPTIMIZED
Jobs
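The two compute environments above could be described roughly as follows. This is a minimal sketch, not the configuration shown in the talk: only the two allocation strategy values come from the slide, while the helper names, the vCPU limits, the "optimal" instance type choice, and the queue ordering are assumptions for illustration. The dictionaries follow the shape of the `computeResources` and `computeEnvironmentOrder` parameters of the AWS Batch API, but omit fields a real request needs (subnets, security groups, instance role).

```python
# Sketch only: sizes, names, and ordering are hypothetical.
# The allocationStrategy values are the ones named on the slide.

def compute_resources(kind, strategy, max_vcpus):
    """Build a partial computeResources dict for a managed compute environment."""
    if kind not in ("EC2", "SPOT"):
        raise ValueError("kind must be 'EC2' (On-Demand) or 'SPOT'")
    return {
        "type": kind,
        "allocationStrategy": strategy,
        "minvCpus": 0,
        "maxvCpus": max_vcpus,          # hypothetical limit
        "instanceTypes": ["optimal"],   # assumption: let Batch pick instance types
    }


def queue_order(environment_names):
    """Build computeEnvironmentOrder: lower order values are tried first."""
    return [
        {"order": position, "computeEnvironment": name}
        for position, name in enumerate(environment_names, start=1)
    ]


ce1 = compute_resources("EC2", "BEST_FIT_PROGRESSIVE", max_vcpus=256)
ce2 = compute_resources("SPOT", "SPOT_CAPACITY_OPTIMIZED", max_vcpus=1024)

# Assumption: the queue prefers CE1 and falls back to CE2.
order = queue_order(["CE1", "CE2"])
```

Which environment a queue should try first is a capacity, throughput, and cost tradeoff, so the ordering here is only one possible choice.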
• D2C
• B2B
• Platform Services
• Rakuten, Springbot
Cloud processing
$4,000,000
Batch and Spot savings
Why AWS Batch?
• Freedom of stack, since jobs run in Docker containers
• Python, C, Rust, Go, Haskell, Java, etc.
• Data processing
• In-house custom file formats and processing where open-source tools such as Apache Hadoop or Apache Spark do not scale well enough
• Ease of deployment
• Deploying only requires pushing a Docker image
Teams
Job purposes
• 80B+ events per day
• Each event is 1KB in size
• Need to process 80TB of data per day
• Attribution
• Processing billions of events to attribute 500,000 conversions every day
• Machine learning
• Training models every night
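The 80TB figure above follows directly from the event count and event size. A short worked calculation, assuming 1KB means 1,000 bytes and 1TB means 10^12 bytes (the slide does not say which convention it uses), and treating "80B+" as exactly 80 billion:

```python
# Worked arithmetic for the daily data volume.
# Assumptions: decimal units (1KB = 1,000 bytes, 1TB = 10**12 bytes),
# and exactly 80 billion events per day.

EVENTS_PER_DAY = 80 * 10**9
BYTES_PER_EVENT = 1_000
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = EVENTS_PER_DAY * BYTES_PER_EVENT
daily_terabytes = daily_bytes / 10**12

# Average rate if the events were spread evenly over the day.
average_events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

print(f"{daily_terabytes:.0f} TB per day")
print(f"{average_events_per_second:,.0f} events per second on average")
```

Spread evenly, that is roughly 926,000 events per second, although real traffic is unlikely to be uniform across the day.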
AWS Batch is good for:
• Monitoring
• Checking logs
• Managed queues
• AWS Batch sets the desired vCPUs itself, but it scales up slowly
• Batchiepatchie sets the minimum vCPUs (Min vCPUs) based on the jobs in the queue
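The idea of deriving Min vCPUs from the queue can be sketched as below. This is not Batchiepatchie's actual logic, which the slides do not show: the `Job` type, the choice of which job states count as demand, and the cap at the environment's maximum are all assumptions. In a real setup the result could be passed as `minvCpus` inside `computeResources` to the AWS Batch `UpdateComputeEnvironment` API.

```python
from dataclasses import dataclass

# Assumption: jobs in these AWS Batch states need capacity now or very soon.
ACTIVE_STATES = {"RUNNABLE", "STARTING", "RUNNING"}


@dataclass
class Job:
    """Hypothetical view of a queued job: its Batch status and vCPU request."""
    status: str
    vcpus: int


def min_vcpus_for_queue(jobs, max_vcpus):
    """Return a Min vCPUs value covering active jobs, capped at max_vcpus."""
    wanted = sum(job.vcpus for job in jobs if job.status in ACTIVE_STATES)
    return min(wanted, max_vcpus)


queue = [
    Job(status="RUNNABLE", vcpus=4),
    Job(status="RUNNING", vcpus=8),
    Job(status="SUCCEEDED", vcpus=16),  # finished, so it adds no demand
]
target = min_vcpus_for_queue(queue, max_vcpus=256)
```

Raising the minimum forces the compute environment to hold at least that much capacity, which sidesteps the slow scale-up; lowering it again when the queue drains avoids paying for idle instances.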
Batchiepatchie
• Open source - https://github.com/AdRoll/batchiepatchie
• Monitoring jobs submitted
• Reviewing logs for jobs
• Searching jobs by name and command line
• Estimating cost of jobs submitted
• Instance and job data available in PostgreSQL
Monitoring jobs
Reviewing logs for jobs
Searching jobs by name and command line
Review the holistic view of queues
Cost Analysis
batchiepatchie=> \d+
                         List of relations
             Name              |   Type   |     Owner      |    Size
-------------------------------+----------+----------------+------------
 activated_job_queues          | table    | batchiepatchie | 16 kB
 compute_environment_event_log | table    | batchiepatchie | 381 MB
 goose_db_version              | table    | batchiepatchie | 40 kB
 goose_db_version_id_seq       | sequence | batchiepatchie | 8192 bytes
 instance_event_log            | table    | batchiepatchie | 28 GB
 instances                     | table    | batchiepatchie | 1523 MB
 job_status_events             | table    | batchiepatchie | 24 kB
 job_summary_event_log         | table    | batchiepatchie | 296 MB
 jobs                          | table    | batchiepatchie | 25 GB
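With instance and job records like those in the tables above, one simple way to estimate a job's cost is to charge it for its share of the instance's hourly price over the time it ran. The sketch below is an assumption about how such an estimate could work, not the method Batchiepatchie uses; the function, its parameters, and the example price are hypothetical, and the slides do not show the table columns.

```python
from datetime import datetime, timedelta


def estimate_job_cost(started, stopped, job_vcpus, instance_vcpus, hourly_price):
    """Estimate a job's cost as runtime x vCPU share x instance hourly price.

    All inputs are hypothetical fields that would have to be read from the
    job and instance records; none of them are confirmed column names.
    """
    if stopped < started:
        raise ValueError("stopped must not be earlier than started")
    hours = (stopped - started).total_seconds() / 3600
    share = job_vcpus / instance_vcpus
    return hours * share * hourly_price


# Example: a 30-minute job using 4 of 16 vCPUs on an instance
# assumed to cost 0.80 per hour.
start = datetime(2019, 12, 1, 0, 0)
cost = estimate_job_cost(
    started=start,
    stopped=start + timedelta(minutes=30),
    job_vcpus=4,
    instance_vcpus=16,
    hourly_price=0.80,
)
```

Splitting by vCPU share ignores memory-bound jobs and idle instance time, so an estimate like this is best read as a lower bound on what a job really costs.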
AWS Batch Roadmap
Roadmap
And more…
Thank you!
Roozbeh Zabihollahi, roozbeh@nextroll.com
Steve Kendrex, kendrexs@amazon.com