Enterprise Data Science & Analytics: Taking You To New Heights Powered by Topcoder

LETTER FROM THE GLOBAL DIRECTOR
Every company wants to innovate, move faster, reduce risk, protect markets, boost safety, and evolve their organizations to get incredible results. For most, the challenge is staffing the right people, powering up with emerging technologies, and introducing change without disrupting operations.

When it comes time to find the right people and implement the latest technologies, most companies go about it basically the same way they have for decades, weighing the costs and benefits of hiring, contracting, and outsourcing. At a time when home lives are increasingly defined by efficient on-demand services like Amazon Prime, UberEats, and Lyft, have you ever wondered why the process of bringing new digital capabilities and talent into organizations is so often accepted as-is? Why do we accept practices in our work lives that aren’t equally convenient and effective? Why are digital opportunities so often turned into hiring problems? Why can’t we get the digital solutions we need, when we need them?

“IF ONE BRAIN IS GOOD, AND TWO BRAINS ARE BETTER, THEN HOW GOOD IS 1.3 MILLION BRAINS?”
Kendrick Burson, T-Mobile

Our customers know they can get them with Topcoder. They understand the need for next-generation solutions and experience without sacrificing quality or culture. That’s why they’ve used Topcoder to get groundbreaking design, data science, and development work done with crowdsourcing, a delivery model we’ve pioneered and honed over nearly twenty years.

Crowdsourcing allows companies to experiment, scale, and get results from a global talent pool of over 1.4 million developers, designers, data scientists, and testers, all available 24/7, on demand. With the skill, speed, scale, flexibility, and outcome-based pricing that Topcoder provides, customers have been able to innovate, grow, and accomplish more. Together, we can think bigger, aim higher, and create winning solutions. The future is here, and we’re thrilled to be with you.

Andy LaMora
Global Director, Crowd Analytics & AI
AN INTRODUCTION TO TOPCODER AND CROWDSOURCING
ON-DEMAND GLOBAL TALENT AND DIGITAL SOLUTIONS AT SCALE

As technology changes, so do expectations. But the time and technical expertise needed to exceed evolving customer expectations and get to market faster can be difficult to find. Topcoder, the leading crowdsourcing platform, makes it easy to quickly turn ideas and requirements into incredible digital solutions with the help of the world’s largest talent network. Not only do you pay for results rather than hours, but you also get a secure project management process, a dedicated project manager, and a team of experts available 24/7.

WHY CREATE SOMETHING GREAT WITH TOPCODER?

SEE HIGH-QUALITY RESULTS — FAST


With the ability to run concurrent workstreams
on the same platform, get design, development,
data science, and QA work done in record time.

GET A TEAM FOR THE PRICE OF A FREELANCER


Don’t tie your project’s success to one
freelancer’s time and expertise. Get a team
of innovative designers and developers, as
well as end-to-end project management.

PAY FOR OUTCOMES, NOT HOURS


No matter how many hours or
iterations it takes to achieve success,
you pay only for the result.
DATA SCIENCE & ANALYTICS CASE STUDIES
COMPUTER VISION - RADIOLOGY

HARVARD TUMOR HUNT
TOPCODER COMMUNITY DEVELOPED AI TO TREAT LUNG TUMOR PATIENTS.
Cancer is the second leading cause of death in the United States, and lung cancer alone claims over 150,000 lives every year. Topcoder joined forces with Harvard to tackle one of the most ambitious healthcare initiatives ever undertaken in the crowdsourcing world: creating and testing automatic delineation algorithms to help improve treatment of cancerous tumors in patients’ lungs.

Manual tumor delineation, which marks the borders of the treatment field around a tumor, is a time-consuming and complicated process. Among other pain points, it introduces individual bias as well as person-to-person inconsistencies. The Topcoder Harvard Tumor Hunt aimed to produce an automatic tumor delineation algorithm that matched the accuracy of the average radiology expert while exceeding experts in both processing speed and consistency. Over three challenges, the project produced an algorithm that provides real, substantial tumor delineation results without individual bias or expert-to-expert inconsistencies.

The first stage of this marathon challenge involved producing an automatic delineation algorithm that was as accurate as experts in the field. During this stage, 31 competitors from various countries (e.g., United States, Bulgaria, Poland, Brazil) participated in creating a real, actionable algorithmic solution that can be applied on a grand scale. In the second stage, 11 competitors worked to improve the credibility of that algorithm by pairing it with expert feedback, training the algorithm to avoid mistakes that experts wouldn’t make. A final, private challenge tasked five privately selected Topcoder members with incorporating final feedback from physicians and experts.

In the end, the Harvard Tumor Hunt brought 31 of Topcoder’s best and brightest data scientists together to create an actionable solution that rapidly and automatically delineates cancerous tumors with the same level of skill as experts. By accurately detecting the size and scope of cancerous tumors, radiologists can maximize the impact of treatment on cancerous cells and minimize its impact on non-cancerous tissues. They can also more accurately determine the correct treatment options for patients in need, potentially saving lives and building better practices into cancer treatment and identification.
ALGORITHM OPTIMIZATION - PHARMACEUTICALS

PFIZER GWAS SPEED-UP
STUDYING THE HUMAN GENOME IS THE WORLD’S LARGEST PROJECT, BUT THE WORK CAN BE SLOW, EVEN WITH COMPUTERS.
In the fight against deadly diseases, fast and accurate algorithmic solutions are helping pharmaceutical companies get new medications to market sooner. One powerful weapon in this fight is genome-wide association studies (GWAS), which analyze large sets of genetic markers across large cohorts of individuals to locate genetic variants contributing to the heritability of phenotypes (i.e., traits) of interest. GWAS analysis is computationally challenging because of the scale of the data involved and the modeling algorithms required.

Pfizer’s existing GWAS solution had proven to be accurate, but the long run time of each experiment hampered researchers. They wanted to speed up the logistic regression modeling that determines which markers explain specific phenotypes, which is the most computationally demanding component of many GWAS analyses.

A 10-day marathon challenge was held in which participants were asked to optimize Pfizer’s existing GWAS algorithm for speed and accuracy. A total of 56 participants submitted 292 solutions over the course of the challenge, and the results were astounding. The optimization delivered through the competition improved the CPU performance of Pfizer’s GWAS algorithm by 1200x and reduced processing times from approximately 5 hours to 28 seconds.

This extreme-value solution changed how fast Pfizer could perform its crucial GWAS research. It enabled the team to expand research by completing more experiments faster, putting them one giant step closer to solving some of the world’s most pressing healthcare challenges.
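The per-marker logistic regression that the Pfizer challenge targeted can be illustrated with a small sketch. This is not Pfizer’s code or the winning solution. It assumes additively coded genotypes (0, 1, or 2 copies of an allele), a binary phenotype, and no covariates. The function names are hypothetical. Each marker gets its own two-parameter model, logit(p) = b0 + b1·g, fitted by Newton-Raphson. Repeating that fit for every marker in a study is what makes this step computationally demanding.

```python
import math
from typing import Dict, List, Tuple


def _score_and_information(b0: float, b1: float, genotypes: List[int], phenotypes: List[int]):
    """Score vector (u0, u1) and 2x2 Fisher information for logit(p) = b0 + b1*g."""
    u0 = u1 = 0.0
    i00 = i01 = i11 = 0.0
    for g, y in zip(genotypes, phenotypes):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
        w = p * (1.0 - p)
        u0 += y - p
        u1 += (y - p) * g
        i00 += w
        i01 += w * g
        i11 += w * g * g
    return u0, u1, i00, i01, i11


def fit_marker(genotypes: List[int], phenotypes: List[int],
               max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit one marker by Newton-Raphson and return (b1, Wald z for b1).

    Assumes the marker is not constant and the phenotype is not perfectly
    separated by genotype (otherwise the maximum-likelihood estimate does not exist)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        u0, u1, i00, i01, i11 = _score_and_information(b0, b1, genotypes, phenotypes)
        det = i00 * i11 - i01 * i01
        # Newton step: solve I * delta = U for the 2x2 information matrix I
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    # Standard error of b1 is sqrt of the [1,1] entry of the inverse information
    _, _, i00, i01, i11 = _score_and_information(b0, b1, genotypes, phenotypes)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return b1, b1 / se_b1


def gwas_scan(markers: Dict[str, List[int]], phenotypes: List[int]) -> Dict[str, Tuple[float, float]]:
    """Run the single-marker test for every marker; each fit is independent."""
    return {name: fit_marker(g, phenotypes) for name, g in markers.items()}
```

Each marker’s fit is independent of the others, so a scan like this can be split across cores or machines. The sketch makes no claim about how the competition’s 1200x speed-up was actually achieved.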
OPTICAL CHARACTER RECOGNITION - MIXED FORMATS

MUD LOG DIGITIZATION


CONVERT COMPLEX PAPER RECORDS TO ACTIONABLE DATA.
Converting paper records into digital format can be challenging, particularly when the records contain various types of data, each with its own important meaning.

Mud logging, with its varied formats, diversity of graphed data, and hand-written annotations, presents a significant challenge for digitization efforts. A recent Topcoder customer approached us for help with exactly this challenge. They recognized that the valuable data in mud logs, once digitized, can unlock new analytical capabilities, speed up decision-making, and increase confidence in geological interpretations.

The mixed format of mud log data does not necessarily lend itself well to fully unassisted, automated approaches. Re-entering all data manually would be cost- and time-prohibitive (to say nothing of the errors and other issues it would introduce), so a hybrid approach was needed.

Through a series of competitions in the Topcoder community, a solution was built that combined human-assisted work, automated OCR, and machine learning algorithms to extract structured meaning from mud log records. This included digitized logging of shows, stains, traces, and negatives at their respective drilling depths. By turning to Topcoder, this customer has saved 50+ years in mud log processing time and added millions in potential revenue through the value generated.

THE OUTCOMES
50+ YEARS SAVED IN MUD LOG PROCESSING
MACHINE LEARNING - DATA LABELING

FAULT IDENTIFICATION
USE THE CROWD TO ENRICH YOUR DATA AND DEVELOP AI ON DIFFICULT TASKS.
Fault identification in oil and gas fields is critical for locating promising drilling sites and directions, as well as for avoiding unnecessary economic losses and environmental risks during operations. A customer approached Topcoder wanting to both automate and enhance the time-consuming manual work of labeling 3D seismic volumes, and to tease out the relationships between intersecting faults.

To make this customer’s fault identification goals a reality, Topcoder first turned to its community to build a visualization tool capable of taking 3D array inputs and displaying the probabilities of various labelings and interrelationships. After this, a larger competition was held. It used the customer’s sample data along with the newly created visualizer to apply labels to each discrete crack found in the dataset and tease out their relationships.

With 150 submissions representing a variety of problem-solving approaches, a winner was found by comparing competitors’ work against existing ground truth labels. The top solution was adept at identifying multiple classes of faults and their relationships to one another. Its speed and accuracy provide a significant benefit to our customer in the form of agility, decision-making, and profitability.
PREDICTIVE ANALYTICS - BANKING AND FINANCIAL SERVICES

PREDICTOR ON TRADED SECURITIES
As part of the Dodd-Frank Wall Street Reform and Consumer Protection Act, all registered swap dealers active in credit and interest rate trading have been required to send trade data to public swap data repositories (SDRs) within 5 to 15 minutes of swap execution. The goal is to facilitate centralized data collection and reporting. In addition to increasing transparency in the swap market, this aggregation of trade data has created new opportunities for analysis and price prediction.

Credit Suisse enlisted Topcoder’s predictive analytics capabilities to create models and algorithms that use volumetric trading data (of vanilla US$/Libor spot start swap transactions of full-year maturities) to predict the prices of those same instruments over relatively short time intervals. Many factors influence swap prices over time, from basic supply and demand to the maturity of the swap (tenor) and more. Gaining even an incremental edge in predictive abilities therefore has enormous potential to boost Credit Suisse’s competitive advantage.

Topcoder structured a 12-month Data Science and Analytics program to explore this big data problem and develop potential solutions. Each challenge completed during the course of the year lasted just 12 days, which enabled agile decision-making and rapid iteration. Trading data from a 6-month period in early 2016 was used to develop and test potential solutions. Topcoder ultimately delivered four winning solutions with error rates of only 4.3% over short periods of time.

Because Data Science and Analytics programs are both executional and consultative in nature, Topcoder also recommended specific features from the winning solutions that Credit Suisse could combine in production to fine-tune its predictive analytics capabilities. By leveraging the power of the crowd through Topcoder, Credit Suisse was able to circumvent internal resource constraints and skill gaps. At the same time, it measurably improved its ability to transact in the swap market at favorable rates.
7 LAYERS OF DATA AND PROCESS PROTECTIONS

Universal Internet access and the rise of the gig economy are delivering on the promise of as-needed, when-needed expert workforces. The benefits of these workforces are increasingly compelling. Customers can expect their production capacity to flex beyond core teams to match real-time demand. They can also access hard-to-hire skills instantly and as needed, instead of grappling with a job market that is increasingly difficult to access.

But with the power of a thousand minds on tap comes the risk of sharing your data and work with countless strangers. Each of these workers may see a slice of your data or strategic intentions. Concerns over IP sharing, IP contamination, disclosure, and privacy naturally follow. Our crowdsourcing platform was founded in 2001 and has dealt with these concerns every day since. Both the tools we use and the methods we employ to control these risks change year by year as new tools emerge and ways of doing business change. We answer questions about these methods in every Q&A and every deal cycle.

As Topcoder’s Global Director of Crowd Analytics & AI, I thought it would be helpful to share the latest basics on how Topcoder mitigates these concerns today with seven distinct layers of security.

Andy LaMora
Global Director, Crowd Analytics & AI

LAYER 1: AGREEMENTS
It begins with agreements. When you’re a customer of Wipro and Topcoder, we sign an agreement with you that sets the rules for what we can and can’t disclose, as well as the process for disclosing it, exactly like any other prudent commercial transaction. These terms are typically handled in the MSA, and more stringent requirements can be layered on top when needed, SOW by SOW. For projects that require them, our contestants digitally sign NDAs as a condition of access to the challenge. There’s sometimes a misconception that crowdsourcing is different in this regard. In reality, customers experience a commercial relationship with us, complete with standard NDAs and contract terms.

LAYER 2: ATOMIZATION
Topcoder handles projects according to the skill types required, through a process called atomization. We take the project you’d like to build and break it down into bite-sized segments, which become separate challenges (e.g., app design, coding) that we run through our community. While this process was designed to allow us to control time and delivery, atomization also adds obvious protection. Think of it like this: members of our global crowd don’t get to work on Voltron as a whole; they work on a single robot lion (or limb) at a time. Workers won’t know there are other robot lions that assemble into Voltron unless you want them to. Atomization drastically and naturally reduces the number of people who see your entire project. That alone provides more protection than a traditional contractor engagement typically does.

LAYER 3: PSEUDONYM
We don’t disclose our customers’ identities to the Topcoder Community; we assign a pseudonym instead. Generally, it’s the same pseudonym across all projects for any one customer. We can also assign pseudonyms project by project, or even component by component, as the project’s security goals require. As a result, our members may not even realize that two projects they’re working on are for the same customer.

LAYER 4: OBFUSCATION OF DATA
Obfuscation is an important and complex topic. It is the best-practice-driven scrubbing of personally identifiable information (PII) and other sensitive information. The goal is to mask that data and reduce or eliminate the likelihood that a worker can correlate it with anything else in the field, or even work out who it’s for. Obfuscation is always a partnership exercise: either the data is treated before it’s handed to us, or Wipro and Topcoder work with the customer to prepare it. We have adopted and developed several approaches to obfuscation. They range from simple scrambling of PII or key identifiers (e.g., product codes, warehouse IDs) to statistically rigorous replication of reference data to create a fabricated but still relevant data set.

LAYER 5: METAPHORS
A metaphor transposes the domain. Metaphors have long played a role in gamification (see FoldIT and Play To Cure for examples). They have also been used to abstract the problem domain from the solution in order to find new approaches. They also help with protection. We apply metaphors when even the basic project domain or purpose shouldn’t be exposed. To the extent that any position data is needed, Topcoder preserves relative but not exact spatial relationships while moving the scene to another continent or even planet. We might also present the problem as coming from a widget manufacturer instead. This way, we further distance the data and topic from their presentation to competitors on our platform. Together, Wipro and Topcoder then unwind those metaphors when we return results to clients.
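The “simple scrambling of PII or key identifiers” described under Layer 4 can be sketched with a keyed hash. This is an illustrative assumption, not Topcoder’s or Wipro’s actual tooling. A secret key held by the data owner is fed to HMAC-SHA256, so each product code or warehouse ID maps to a stable pseudonym. The same ID always yields the same pseudonym, which means joins across tables still work. The function names, the prefix, and the 16-character truncation are hypothetical choices.

```python
import hashlib
import hmac
from typing import Callable, Dict, Iterable, List


def make_scrambler(secret_key: bytes, prefix: str, length: int = 16) -> Callable[[str], str]:
    """Return a function mapping an identifier to a stable, keyed pseudonym."""
    def scramble(identifier: str) -> str:
        digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        return f"{prefix}-{digest[:length]}"  # truncated hex digest
    return scramble


def scramble_records(records: List[dict], field: str, scramble: Callable[[str], str]) -> List[dict]:
    """Return copies of the records with one identifier field replaced; other fields are untouched."""
    return [{**record, field: scramble(record[field])} for record in records]


def build_reverse_map(identifiers: Iterable[str], scramble: Callable[[str], str]) -> Dict[str, str]:
    """Pseudonym -> original map, kept by the data owner to unwind results later."""
    reverse: Dict[str, str] = {}
    for original in identifiers:
        pseudonym = scramble(original)
        # Truncation makes collisions possible in principle, so check for them
        if reverse.get(pseudonym, original) != original:
            raise ValueError(f"pseudonym collision for {original!r}")
        reverse[pseudonym] = original
    return reverse
```

Keeping the key and the reverse map on the customer side is what allows results to be translated back to real identifiers later. Using a different key per project keeps pseudonyms from being linked across projects. Without the key, a worker cannot recompute the mapping even for IDs they could guess.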
LAYER 6: DIRECT REVIEWS AND DIRECT TESTING
Our review process uses a two-pronged approach. One prong is direct, manual review performed by no fewer than two expert reviewers in our community: members who’ve proven to be not only technical masters, but also trustworthy on our platform. For critical code reviews, they inspect code line by line and complete lengthy scorecards, searching for best practices and security flaws. (Reviewers are unable to see the identity of the submitters.) A contestant must first get past those sentinels if they want a chance at victory. The second prong is technology. When necessary, we also run the code through static application security testing (SAST) tools, as well as IP detection platforms. Mike Morris, our CEO, has written on this subject, describing crowdsourcing as more secure than traditional means of development.

LAYER 7: RING-FENCED CROWDS
There are times when the project or data simply cannot be shared in any form with the public crowd. Fortunately, this is quite rare. But in those cases, we are able to develop a sub-crowd to work on projects. We first qualify workers who are interested, available, and capable of working on the given solution. These workers are then asked to complete additional paperwork; past examples include data use agreements, network use agreements, and even background checks. The worker pool may even include consultants from Wipro or the client’s other trusted vendors. Project details are shared only after these paperwork conditions (and, if necessary, location-specific conditions) are met. When necessary, we can also set up virtual private clouds with I/O restrictions for their use. But it’s worth noting that this is a last-resort step; reducing the size of the addressable workforce always has an impact.

UP NEXT AT TOPCODER: DIFFERENTIAL PRIVACY
Concerns about the risks of data sharing are on the rise, but fortunately, so are methods for dealing with them. One promising obfuscation technique on the ascent is called Differential Privacy (“DP”). DP seeks to replicate important data in a way that breaks the ability to triangulate the data back to reality while preserving key relationships.

To illustrate the point: imagine being able to replicate a data set of disease patients across an entire state. Hundreds of data scientists could then run tests to seek precursor signals, without the risk that some bad actor could figure out patient or provider identities. Through our innovation contract with NASA and in partnership with NIST, Topcoder will be hosting a Differential Privacy challenge this November and is exploring methods to refine these techniques into our standard practice. If you’re a data scientist and would like a chance to contribute to the solution, click here to stay in the loop and join the contest!

TOPCODER’S COMMITMENT TO PRIVACY AND SECURITY
As any security professional will tell you, privacy and security protection is a dynamic field that requires constant diligence. Rest assured, our methods of protecting our clients, our members, and ourselves are always evolving.

Security is an intrinsic component of Topcoder’s offering. It exists in all aspects of the business, from a customer’s first interaction with the platform, to members registering and competing, to ultimately delivering solutions. Our platform enables collaboration yet preserves privacy, allowing for experimentation with limited risk.

CROWD-POWERED CONSULTING: ANALYTICS COE, ENERGY SECTOR
One of the world’s largest independent upstream oil and natural gas companies needed to meet a growing demand from internal customers for data science, design, and solution development without adding in-house resources. The answer: a crowd-powered Analytics Center of Excellence (CoE) that combines Wipro and Topcoder resources and expertise.

CROWDSOURCING ADVANTAGES
• Fast start: no waiting for hires
• No turnover: the crowd never quits
• On-demand access to hard-to-find skills (e.g., data science, development, and UX/UI design) with repeatable results

THE RESULTS
Since December 2017, the Analytics Consulting team has run over 318 competitions on 39 ideas, with an average opportunity ROI of 20x.
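The Differential Privacy section above describes DP in terms of replicating data sets. One of the simplest building blocks behind DP is the Laplace mechanism, which adds noise calibrated to a query’s sensitivity and a privacy budget ε (epsilon). The sketch below is a swapped-in textbook illustration, not the method used in the NIST challenge or in Topcoder’s practice. `dp_count` and the record layout are hypothetical. It releases a noisy count of patients matching a condition. Any one patient’s presence changes the true count by at most 1.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records: Iterable[dict], predicate: Callable[[dict], bool],
             epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/epsilon satisfies epsilon-DP for this query."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical patient records; only the noisy count would be shared
    patients = [{"diagnosis": "flu"}] * 30 + [{"diagnosis": "none"}] * 70
    print(dp_count(patients, lambda r: r["diagnosis"] == "flu", epsilon=0.5, rng=random.Random()))
```

A smaller ε means more noise and stronger privacy. Each additional query spends more of the privacy budget. Production systems should rely on vetted DP libraries, since naive floating-point noise generation has known weaknesses.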
ADDITIONAL TOPCODER CASE STUDIES
DATA VISUALIZATION - EXECUTIVE DASHBOARD

PEPSICO 3D MARKET LEADERSHIP COCKPIT
Like many large organizations, PepsiCo measures the state of its business across several KPIs to give executives important up-to-date information and allow them to make the best-informed decisions possible. Business performance data from a range of markets and product categories needed to be consolidated in one place, so PepsiCo asked Topcoder for a new approach to data visualization.

PepsiCo executives needed to be able to quickly scan the information, view key drivers of the business, and take action based on what the information was telling them. Because of this, any dashboard displaying the variety of indicators had to be consistent, clear, and concise. When PepsiCo turned to Topcoder, they received all of this and more.

With a community of designers and business intelligence specialists at the ready, Topcoder delivered 5 separate data visualization solutions and allowed PepsiCo to choose the elements that appealed to them the most. The final, combined solution streamlined the user experience for PepsiCo executives. It made clear which parts of the business were doing well and which areas were underperforming and required additional focus. 3D visualization techniques allowed users to click into surface data and see a visualization of secondary information. Across markets, product categories, and channels, PepsiCo’s leaders now have the information they need at their fingertips to make data-driven decisions.
MOBILE APP DEVELOPMENT - FOOD INTAKE TRACKER, SPACE

NASA ISS FOOD INTAKE TRACKER APP
Tracking the food astronauts eat while aboard the International Space Station (ISS) is essential for NASA to combat the health risks of long-term spaceflight. When the ISS first took flight, astronauts were given an Excel spreadsheet to track food intake, but NASA quickly realized that they needed a better solution. Typing in zero gravity is difficult and time-consuming, and many meals went unrecorded. In response, NASA wanted to design and build a fast, accurate, and simple iPad application to drive adoption among the brave men and women on the ISS.

With Topcoder, NASA went from application idea to a production-ready, astronaut-approved iPad app in just eight months. Topcoder broke the project down into small units of work spanning everything from UX design to testing the final application code. This attracted hyper-specialized members from the Topcoder Community to solve the challenges that come with building an app for use in space. The application needed to support facial recognition and voice commands in the always-noisy, low-lit ISS. It also had to recognize and record food and beverage packaging from countries whose barcodes differ from those used in the United States (or that carry no barcodes at all).

After passing with stars (pun intended) through multiple rounds of human testing here on Earth, the ISS FIT (Food Intake Tracker) app developed by Topcoder was loaded onto a rocket in late 2016. The ISS FIT app now passes over our heads 15.54 times per day, ensuring that every morsel consumed by astronauts is recorded for analysis.

In 2017, the app won NASA’s coveted SC Director’s Innovation Group Achievement Award.

Name of Contracting Organization:
US Department of Energy, National Renewable Energy Lab

Prime/Sub:
Prime Contractor

Contract Number, Period of Performance, Total Contract Value:
Contract Number: NNJ15HK30B, 06/04/2015-06/03/2020, NOIS-TO-23, 11/01/2016-03/01/2017, $28,330

KEY ACCOMPLISHMENTS

MISSION ACCOMPLISHMENT
Crowdsourced design concept, development, and QA for NASA ISS FIT App

SERVICE/PRODUCT QUALITY
Repeat customer with 505 submissions from the community in the first challenge

DAY TO DAY MANAGEMENT
Managed through Topcoder direct and project leaderboard.

STAFF RECRUITMENT AND RETENTION
Implemented a minisite and newsletter mention to leverage the NASA brand as a recruitment aid

TIMELINESS
Challenge completion took 10 weeks

COST CONTROL
Project was completed on budget.
PREDICTIVE ANALYTICS & MACHINE LEARNING - COMMUNICATION/MEDIA

MEDIA DEMOGRAPHIC MEMBERSHIP
The National Readership Survey (NRS) Social Grade is a UK-based system of demographic classification that is a standard for market research among UK organizations. The ABC1 classification identifies individuals in the upper-middle, middle, and lower-middle classes. Media companies generate revenue by selling advertising space, and advertisers pay for access to the specific audiences they target as their ideal consumer demographic. It is therefore critical for these companies to accurately classify and predict their readers’ demographics.

Our customer, a European media company, had an existing model that predicted ABC1 membership with 80% accuracy. They wanted to validate whether their model was the best possible solution or whether they could do better. Topcoder ran a two-week Marathon Match using a dataset of over 20,000 media consumers. In that short span of time, over 100 competitors generated 1,066 submissions.

The top 10 solutions averaged 81% precision and 84% recall. This was a 3% improvement over the media company’s model, and it also validated the relative strength and accuracy of the existing solution.
COMPUTER VISION - THREAT IDENTIFICATION

PIPELINE THREAT DETECTION
With millions of miles of pipeline across the country, protecting critical fuel supplies is more difficult than ever.

The U.S. government wanted to develop an algorithmic solution to detect and classify objects within a certain range of energy pipelines and to make monitoring more efficient and secure. Thanks to advances in satellite capabilities and drone technologies, the necessary aerial imagery was abundant. But automatically differentiating between downed tree limbs and enemy vehicles in those images required a new approach to the problem.

With an Analytics Starter Pack from Topcoder, the government client received on-demand access to more than 100 data scientists who worked to develop algorithmic solutions that detect and evaluate potential risks. A crowd-powered delivery team managed all logistics during the three-week project, including testing the 500+ possible solutions submitted.

The winning solutions rapidly process tens of thousands of images and tag objects with an appropriate threat level. Today, the final algorithmic solution delivered by Topcoder is also being used to drive other government research in planetary satellite classification, Mars reconnaissance, federal disaster response and recovery, and beyond.
I’m Andy LaMora, your point person for all things Data Analytics and AI at Topcoder. Questions? Ideas? Projects? Reach out and let’s get started.

IF YOU’RE READY, WE’RE READY!
Andy LaMora
Global Director, Crowd Analytics & AI
alamora@topcoder.com

THE SIMPLEST WAY TO ACCESS AND EXECUTE WITH INCREDIBLE DIGITAL TALENT
topcoder.com
