Enterprise Data Science & Analytics: Taking You To New Heights Powered by Topcoder
Our customers know that they can with Topcoder. Our customers
understand the need for next-generation solutions and
experience without sacrificing quality or culture. That’s why
they’ve used Topcoder to get groundbreaking design, data science
and development work done with crowdsourcing, a delivery
model we’ve pioneered and honed over nearly twenty years.
Andy LaMora
Global Director, Crowd Analytics & AI
AN INTRODUCTION TO TOPCODER
AND CROWDSOURCING
ON-DEMAND GLOBAL TALENT AND DIGITAL SOLUTIONS AT SCALE
HARVARD TUMOR HUNT
TOPCODER COMMUNITY DEVELOPED AI TO TREAT LUNG TUMOR PATIENTS.

The second leading cause of death in the United States is cancer. Among those cancers, lung cancer claims over 150,000 people’s lives every single year. Topcoder joined forces with Harvard to tackle one of the most ambitious healthcare initiatives ever undertaken in the crowdsourcing world: creating and testing automatic delineation algorithms to help improve treatments of cancerous tumors in patients’ lungs.

Manual tumor delineation (the measure of treatment field borders of tumors) is a time-consuming and complicated process. Among other pain points, it introduces individual bias as well as person-to-person inconsistencies. The Topcoder Harvard Tumor Hunt aimed to produce an automatic tumor delineation algorithm that met the accuracy of the average radiology expert but exceeded them in both processing speed and consistency. These three challenges produced an algorithm that provides real, substantial tumor delineation results without individual bias and expert-to-expert inconsistencies.

The first stage of this marathon challenge involved producing an automatic delineation algorithm that was as accurate as experts in the field. During this stage, 31 competitors from various countries (e.g., United States, Bulgaria, Poland, Brazil, etc.) participated in creating a real, actionable algorithmic solution that can be applied on a grand scale. In the second stage, 11 competitors worked to target the credibility of that algorithm by pairing it with expert feedback to train the algorithm to avoid mistakes that experts wouldn’t make. A final, private challenge tasked five privately selected Topcoder members to incorporate final feedback from physicians and experts.

In the end, the Harvard Tumor Hunt brought 31 of Topcoder’s best and brightest data scientists together to create an actionable solution that was successful at both rapidly and automatically delineating cancerous tumors with the same level of skill as experts. By accurately detecting the size and scope of cancerous tumors, radiologists can maximize the impact of treatment on cancerous cells, minimize its impact on non-cancerous tissues, and accurately determine the correct treatment options for in-need patients, potentially saving lives and breeding superior practices into cancer treatment and identification.
ALGORITHM OPTIMIZATION - PHARMACEUTICALS
PFIZER GWAS SPEED-UP
STUDYING THE HUMAN GENOME IS THE WORLD’S LARGEST PROJECT, BUT THE WORK CAN BE SLOW, EVEN WITH COMPUTERS.

In the fight against deadly diseases, fast and accurate algorithmic solutions are helping pharmaceutical companies get new medications to market sooner. One powerful weapon in this fight is genome-wide association studies (GWAS), which analyze large sets of genetic markers across large cohorts of individuals to locate genetic variants contributing to the heritability of phenotypes (i.e., traits) of interest. GWAS analysis is computationally challenging because of the scale of the data involved and the modeling algorithms required.

A 10-day marathon challenge was held in which participants were asked to optimize Pfizer’s existing GWAS algorithm for speed and accuracy. With a total of 56 participants submitting 292 solutions over the course of the challenge, the results were astounding: the optimization delivered through the competition improved the CPU performance of Pfizer’s GWAS algorithm by 1200x and reduced processing times from approximately 5 hours to 28 seconds.

This extreme value solution provided a shift in how fast Pfizer could perform their crucial GWAS research. It enabled them to expand research by completing more experiments faster, putting them one giant step closer to solving some of the world's most pressing healthcare challenges.
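
As a rough illustration of the per-marker scan that a genome-wide association study performs, the sketch below scores each genetic marker against a trait and ranks the markers. It is a toy example on simulated data, not Pfizer's algorithm or any challenge submission: the marker names, the `scan_markers` and `simulate` helpers, and the use of a simple correlation-based score (n times r squared) are all assumptions made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant marker or trait carries no association signal
    return cov / math.sqrt(var_x * var_y)

def scan_markers(genotypes, phenotype):
    """Score every marker against the trait (hypothetical helper).

    genotypes: dict of marker name -> allele counts (0, 1 or 2), one per individual.
    phenotype: trait values, one per individual.
    Returns (marker, score) pairs, strongest first, with score = n * r^2.
    """
    n = len(phenotype)
    scores = []
    for marker, counts in genotypes.items():
        r = pearson_r(counts, phenotype)
        scores.append((marker, n * r * r))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def simulate(n_individuals=200, n_markers=50, causal="snp_7", seed=0):
    """Build toy data in which exactly one marker drives the trait."""
    rng = random.Random(seed)
    genotypes = {
        f"snp_{i}": [rng.choice([0, 1, 2]) for _ in range(n_individuals)]
        for i in range(n_markers)
    }
    # The trait depends on the causal marker plus Gaussian noise.
    phenotype = [2.0 * g + rng.gauss(0, 1) for g in genotypes[causal]]
    return genotypes, phenotype

genotypes, phenotype = simulate()
ranked = scan_markers(genotypes, phenotype)
top_marker, top_score = ranked[0]
print(top_marker, round(top_score, 1))
```

Even this toy scan does work proportional to markers times individuals, which hints at why real studies, with far more markers, larger cohorts, and heavier statistical models, reward the kind of optimization the challenge targeted.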
50+ YEARS SAVED IN MUD LOG PROCESSING

valuable data contained in mud logs, if digitized, can unlock new analytical capabilities, speed up decision-making, and increase confidence in geological interpretations.

The mixed format of mud log data does not necessarily lend itself well to fully unassisted, automated approaches. To re-enter all data manually would be cost- and time-prohibitive (to say nothing of error-prone, among other issues), so another hybrid approach was needed.
FAULT IDENTIFICATION
USE THE CROWD TO ENRICH YOUR DATA
AND DEVELOP AI ON DIFFICULT TASKS.
The identification of faults in oil and gas fields is critical
to identifying promising locations and directions
for drilling as well as avoiding unnecessary economic
losses and environmental risks during operations. A
customer approached Topcoder with an interest in both
automating and enhancing the time-consuming manual
work of labeling 3D seismic volumes as well as teasing
out the relationships between intersecting faults.
PREDICTOR ON TRADED
SECURITIES
As part of the Dodd-Frank Wall Street Reform and Consumer
Protection Act, all registered swap dealers active in credit and
interest rate trading have been required to send trade data
to public swap data repositories (SDRs) within 5 to 15 minutes
of swap executions as a means of facilitating centralized data
collection and reporting. This aggregation of trade data, in
addition to increasing transparency in the swap market, has
created new opportunities for analysis and price prediction.
DATA AND PROCESS PROTECTIONS
Andy LaMora
Global Director, Crowd Analytics & AI

Universal Internet access and the rise of the gig economy are delivering on the promise of as-needed, when-needed expert workforces. The benefits of these workforces are increasingly compelling; customers can expect their production capacity to flex beyond core teams with their real-time demand for it, and can access hard-to-hire skills instantly and as needed, instead of grappling with a job market that is increasingly difficult to access.

But with the power of a thousand minds on tap comes the risk of sharing your data and work with countless strangers. Each of these workers may see a slice of your data or strategic intentions. Concerns over IP sharing, IP contamination, disclosure, and privacy naturally follow. Our crowdsourcing platform was founded in 2001 and has dealt with these concerns every day since. Both the tools that we use and the methods we employ to control these risks change year by year as new tools emerge and ways of doing business change. We answer questions about these methods in every Q&A, and every deal cycle.

As our Global Director of Crowd Analytics & AI, I thought it would be helpful to share the latest basics on how Topcoder mitigates these concerns today with seven distinct layers of security.

any other prudent commercial transaction. These terms are typically handled in the MSA, and more stringent requirements can be layered on top when needed, SOW by SOW. For projects that require them, our contestants digitally sign NDAs as a condition of access to the challenge. There’s sometimes a misconception that crowdsourcing is unique in this regard. In reality, customers experience a commercial relationship with us, complete with standard NDAs and contract terms.

LAYER 2: ATOMIZATION
Topcoder handles projects according to the skill types required, through a process called atomization. We take the project you’d like to build and break it down into bite-sized segments, which become separate challenges (e.g., app design, coding, etc.) that we run through our community. While this process was designed to allow us to control time and delivery, atomization also adds obvious protection. Think of it like this: members of our global crowd don’t get to work on Voltron as a whole; they work on a single robot lion (or limb) at a time. Workers won’t know there are other lion robots that assemble into Voltron unless you want them to. Atomization drastically and naturally reduces the number of people who see your entire project, which is already more protection than a traditional contractor engagement typically provides.

LAYER 3: PSEUDONYM
We don’t disclose the identity of our customer to the Topcoder Community. We assign a pseudonym instead. Generally, it’s the same pseudonym across all projects for any one customer, but we will also assign them project by project, or even component

Obfuscation is an important, very complex topic. Obfuscation is a best practice-driven scrubbing of personally identifiable information (PII) and other sensitive information in order to mask that data and reduce or eliminate the likelihood that a worker can correlate it with anything else in the field, or even who it’s for. Obfuscation is always a partnership exercise, and either the data is treated before it’s handed to us, or Wipro and Topcoder work with the customer to prepare it. We have adopted and developed several approaches for obfuscation. They range from simple scrambling of PII or key identifiers (e.g., product codes, warehouse IDs, etc.), to statistically rigorous replication of reference data to create a fabricated, but still relevant data set.

LAYER 5: METAPHORS
A metaphor transposes the domain. Metaphors have long played a role in gamification (see FoldIT and Play To Cure for examples), or abstracting the problem domain from the solution in order to find new approaches. They also help in protection. We’ll apply metaphors when even the basic project domain or purpose shouldn’t be exposed. To the extent that any position data is needed, Topcoder preserves relative but not exact spatial relationships while moving the scene to another continent or even planet, and might present the problem as a widget manufacturer instead. This way, we further distance the data and topic from its presentation to competitors on our platform. Together, Wipro and Topcoder then go on to unwind those metaphors when we return results to clients.
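
Two of the ideas above, scrambling key identifiers and preserving relative but not exact spatial relationships, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Topcoder's actual tooling: the record fields (`warehouse_id`, `x`, `y`, `units`), the helper names, the secret key, and the choice of a keyed hash (HMAC-SHA256) plus a constant coordinate shift are all hypothetical.

```python
import hashlib
import hmac
import math

def pseudonymize(value, secret_key):
    """Replace an identifier with a stable keyed pseudonym (hypothetical helper)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "id_" + digest[:12]

def relocate(points, offset):
    """Shift every (x, y) point by the same offset, so distances are unchanged."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in points]

def obfuscate_records(records, secret_key, offset):
    """Return copies of the records with scrambled IDs and relocated positions."""
    positions = relocate([(r["x"], r["y"]) for r in records], offset)
    return [
        {
            "warehouse_id": pseudonymize(r["warehouse_id"], secret_key),
            "x": x,
            "y": y,
            "units": r["units"],
        }
        for r, (x, y) in zip(records, positions)
    ]

# Made-up sample data; the key and offset would stay with the data owner.
records = [
    {"warehouse_id": "WH-0001", "x": 10.0, "y": 20.0, "units": 40},
    {"warehouse_id": "WH-0002", "x": 13.0, "y": 24.0, "units": 15},
    {"warehouse_id": "WH-0001", "x": 10.0, "y": 20.0, "units": 7},
]
masked = obfuscate_records(
    records, secret_key=b"hypothetical-key", offset=(5000.0, -3000.0)
)

original_gap = math.dist((10.0, 20.0), (13.0, 24.0))
masked_gap = math.dist(
    (masked[0]["x"], masked[0]["y"]), (masked[1]["x"], masked[1]["y"])
)
print(masked[0]["warehouse_id"], original_gap, masked_gap)
```

The same input identifier always maps to the same pseudonym, so joins and group-bys still work on the masked data, while the distance between any two sites is unchanged by the shift. Subtracting the offset again is one simple way such a transformation could be unwound when results come back.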
LAYER 6: DIRECT REVIEWS AND DIRECT TESTING
Our review process uses a two-pronged approach. One prong is direct, manual review performed by no fewer than two expert reviewers in our community: members who’ve proven to be not only technical masters, but also trustworthy on our platform. For critical code reviews, they inspect code line by line and complete lengthy scorecards, searching for best practices and security flaws. (Reviewers are unable to see the identity of the submitters.) A contestant must first get past those sentinels if they want a chance at victory. The second prong is technology. We also run the code through static application security testing (SAST) tools when necessary, as well as IP detection platforms. Mike Morris, our CEO, wrote on this subject in relation to crowdsourcing as being more secure than traditional means of development.

LAYER 7: RING-FENCED CROWDS
There are times when the project or data simply cannot be shared in any form with the public crowd. Fortunately, this is quite rare. But in those cases, we are able to develop a sub-crowd to work on projects. We will first qualify workers who are interested, available, and capable of working on the given solution. These workers are then asked to complete additional paperwork; past examples include data use agreements, network use agreements, even background checks. The worker pool may even include consultants from Wipro or the client’s other trusted vendors. Only after these paperwork (and, if necessary, location-specific) conditions are met are project details shared. When necessary, we can also set up virtual private clouds with I/O restrictions for their use. But it’s worth noting that this is a last resort step; reducing the size of the addressable workforce always has an impact.

UP NEXT AT TOPCODER: DIFFERENTIAL PRIVACY
If concerns about the risks of data sharing are on the rise, so fortunately are methods for dealing with them. One promising technique for obfuscation on the ascent is called Differential Privacy (“DP”). DP seeks to replicate important data in a manner that breaks the ability to triangulate data back to reality while preserving key relationships.

To illustrate the point: imagine being able to replicate a data set of disease patients in an entire state in a manner where hundreds of data scientists can perform tests to seek precursor signals, without the risk that some bad actor can figure out patient or provider identities. Through our innovation contract with NASA and in partnership with NIST, Topcoder will be hosting a Differential Privacy challenge this November and is exploring methods to refine these techniques into our standard practice. If you’re a data scientist and would like a chance to contribute to the solution, stay in the loop and join the contest!

TOPCODER’S COMMITMENT TO PRIVACY AND SECURITY
As any security professional will tell you, privacy and security protection is a dynamic field that requires constant diligence. Rest assured, our methods of protecting our clients, our members, and ourselves are always evolving.

Security is an intrinsic component of Topcoder’s offering. It exists in all aspects of the business, from a customer’s first interaction with the platform, to members registering and competing, to ultimately delivering solutions. Our platform enables collaboration yet preserves privacy, allowing for experimentation with limited risk.

CROWD-POWERED CONSULTING: ANALYTICS COE, ENERGY SECTOR
One of the world’s largest independent upstream oil and natural gas companies needed to meet a growing demand from internal customers for data science, design, and solution development without adding in-house resources. The answer: a crowd-powered Analytics CoE that combines Wipro and Topcoder resources and expertise.

CROWDSOURCING ADVANTAGES
• Fast start: no waiting for hires
• No turnover: the crowd never quits
• On-demand access to hard-to-find skills (e.g., data science, development, and UX/UI design) with repeatable results

THE RESULTS
Since December 2017, the Analytics Consulting team has run over 318 competitions on 39 ideas, with an average opportunity ROI of 20x.
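
Returning briefly to the Differential Privacy technique described under UP NEXT AT TOPCODER: the sketch below shows one standard DP building block, the Laplace mechanism applied to a counting query. It is deliberately simpler than replicating a whole data set, and it is not a description of the planned challenge or of Topcoder's practice. The patient records, the `private_count` helper, and the epsilon value are made-up assumptions for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Answer "how many records match?" with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1, so Laplace
    noise with scale 1 / epsilon is enough to mask any single individual.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def has_condition(record):
    return record["has_condition"]

# Hypothetical patient table: every fourth patient has the condition (25 of 100).
patients = [{"patient": i, "has_condition": i % 4 == 0} for i in range(100)]

rng = random.Random(42)
noisy = private_count(patients, has_condition, epsilon=1.0, rng=rng)
print(noisy)
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means more accurate answers. Each released answer also consumes part of a privacy budget, which is one reason turning such techniques into standard practice takes care.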
ADDITIONAL
TOPCODER CASE
STUDIES
DATA VISUALIZATION - EXECUTIVE DASHBOARD
PEPSICO 3D MARKET LEADERSHIP COCKPIT

Like many large organizations, PepsiCo measures the state of its business across several KPIs to give executives important up-to-date information and allow them to make the best-informed decisions possible. Since business performance data from a range of markets and product categories needed to be consolidated in one place, a new approach to data visualization was requested from Topcoder.

With a community of designers and business intelligence specialists at the ready, Topcoder delivered 5 separate data visualization solutions and allowed PepsiCo to choose the elements that appealed to them the most. The final, combined solution streamlined the user experience for PepsiCo executives and made it clear what parts of the business were doing well and what areas were underperforming and required additional focus. 3D visualization techniques were used to allow users to click into surface data and see a visualization of secondary information. Across markets, product categories, and channels, PepsiCo’s leaders now have the information they need at their fingertips to allow them to make data-driven decisions.
In 2017, the app won NASA's coveted SC Director's Innovation Group Achievement Award.
STAFF RECRUITMENT AND RETENTION
Implemented a minisite and newsletter mention to leverage the NASA brand as a recruitment aid
TIMELINESS
Challenge completion was 10 weeks
COST CONTROL
Project was completed on budget.
PREDICTIVE ANALYTICS & MACHINE LEARNING - COMMUNICATION/MEDIA
MEDIA DEMOGRAPHIC
MEMBERSHIP
The National Readership Survey (NRS) Social Grade is a UK-based
grading system for demographic classification that
is a standard for market research among UK organizations.
ABC1 demographic classification calls out the individuals
that are classified as upper/middle class and lower/middle
class. Since media companies generate revenue by
selling advertising space, and advertisers pay for access
to specific audiences they may target as their ideal consumer
demographic, it is critical for these companies to
be able to accurately classify and predict their readers.
PIPELINE THREAT
With millions of miles of pipeline across
the country, protecting critical fuel