Adam Devine and David Bernat


WorkFusion provides software-as-a-service to enterprise companies.

The WorkFusion platform automates human data analysis by leveraging machine learning algorithms.

WorkFusion provides business users all of the tools they need to optimize information
processes and better manage global workforces.

Your brain just performed a spectacularly insightful operation. Using unique human cognition, you
extracted meaning and created relationships between the ideas in those sentences to construct a
conceptual definition of what WorkFusion does.

This capability has eluded machines and machine learning because human computing and machine
computing have been approached in silos. Human computing refers to the processes by which
individuals interpret data and draw insight from it. Machine computing requires
strict representations of quantitative and structured information to produce statistical conclusions.
Integrating the best of human and machine capabilities, and delegating work between them through
workflow management, is what WorkFusion does.

People make decisions, and WorkFusion synthesizes the information that enables human decisions.

A primer on machine learning.

Imagine teaching a child how to throw a Frisbee. Provided you’ve perfected the toss yourself, it’s
simple: you tell the child to watch as you demonstrate. You then give the Frisbee to the child and
ask her to give it a go. Maybe she’s uniquely adept, and the Frisbee glides through the air, level and
stable, but it likely shudders sideways a few feet before plummeting to the ground. You pick it up
and throw it again yourself to reinforce her learning. Her second attempt is better. A crosswind picks
up, and you show her how to compensate for the gusts. After this pattern of watching, learning, and
improving in a variety of conditions repeats a few times, the child is able to flawlessly toss a Frisbee.
This pairing of experience with learning is how WorkFusion uses machine learning to perfect
automation, only the Frisbee toss is a data processing task, the instructor is the quality data generated
by human analysts, and the child is an algorithm. By watching human workers collect data,
WorkFusion learns to automate.

Machine learning (ML) has the power to radically improve the quality, efficiency, and speed of data
work and eliminate manual data entry. WorkFusion puts this power into the hands of enterprise business
operations by providing an intuitive software platform for configuring and managing workflows,
programmatically sourcing, training, and quality-controlling a large human workforce, and seamlessly
using the accurate output of human work to train automation. This paper explains how the platform
overcomes the common challenges of data monitoring, collection, extraction and analysis by pairing
machine learning with human data analysts.

The problems of high-volume data collection: tedious work and unreliable results.

Processing data at massive scale is like harvesting wheat without a combine. A human-only data
analyst workforce is an expensive, fixed and finite resource, unable to elastically scale up to meet
bursts of demand or scale down in troughs. Skilled as they may be, people naturally make mistakes,
and sometimes these mistakes are incredibly costly. Supplementing or replacing a full-time equivalent
(FTE) workforce with outsourced workers provides moderate cost relief and incremental scalability,
but as both data volumes and global labor rates rise, the benefits of business and knowledge process
outsourcing (BPO / KPO) fall.

Scrapers, optical character recognition (OCR), and parsers are common rules-based automation
(RBA) point solutions. They work well if the underlying business process and data sources never
change, but change is the only constant in business. RBA requires upfront programming and
configuration by IT, and it’s virtually impossible to program every potential variable in a data process
or account for variations in the formats of PDFs, websites, and other unstructured sources. When
the process or sources change, IT must re-write the rules. The data supply chain halts while the point
solution is repaired, and business continuity is compromised.

WorkFusion brings together on one platform the best of human and machine data processing
capabilities, and ML is what ties them together.

Taming and tuning machine learning for the enterprise.

Despite its power, machine learning on its own is needy and complex. Big businesses in financial
services, healthcare, insurance, and retail have devoted entire data science and engineering teams to
the extensive and expensive work of developing and training machine learning algorithms. These
IT projects often fail, not for lack of in-house talent, but for reasons that are beyond the control of a
single business.

There are three fundamental reasons for these failures.

• Machine learning requires a large volume of high-quality training data.
• Matching the right data problem with the right algorithm solution is a time-consuming puzzle.
• Smart as it is, machine learning must be seamlessly integrated with human workers who handle
exceptions, and whose work in turn tunes the algorithms.

WorkFusion levels these barriers, allowing business users to self-sufficiently leverage the power of
machine learning.

WorkFusion generates lots of quality training data by ensuring quality work from an agile human workforce.

Both IT and Operations departments struggle with this problem. IT needs Operations’ processes and
data sets in order to train algorithms, but Ops doesn’t want to distract their data analysts from
business-as-usual with an IT project. WorkFusion solves both of their problems. WorkFusion provides
robust workflow design tools and process templates for Ops, letting users configure the ideal data
process. WorkFusion automatically delegates the tasks within the process to the right worker. Rather
than requiring human data analysts to change their method of work or divert their attention to feeding
ML algorithms, WorkFusion lets workers perform their business-as-usual data work on the platform.

The platform creates an agile workforce to perform tasks by optimally combining and managing a
customer’s FTE human data analysts, BPO workers, and on-demand workers sourced from online
talent markets such as Elance, Upwork (formerly oDesk), and Amazon Mechanical Turk. The software
makes even the largest human workforce nimble, elastic, and scalable and ensures that individual
workers quickly, efficiently, and accurately complete work.

WorkFusion uses a combination of statistical quality control [1], plurality [2], gold data [3], and signals from each
worker (e.g., historical performance, keystrokes, speed) to assess accuracy. The platform turns these
countless and constant quality assessments into a dashboard for users to truly understand the
performance of their workers and the KPIs of human work.
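
Two of the quality signals named above, plurality and gold data, can be illustrated with a short Python sketch. This is not WorkFusion's implementation: the function names, the 0.6 agreement threshold, and the sample answers are all assumptions chosen for illustration.

```python
from collections import Counter

def plurality_answer(answers, min_agreement=0.6):
    """Return the most common answer if enough workers agree, else None.

    `min_agreement` is a hypothetical threshold, not a documented setting.
    """
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes / len(answers) >= min_agreement else None

def gold_accuracy(worker_answers, gold_answers):
    """Share of gold tasks (tasks with a known answer) a worker got right."""
    scored = [task for task in gold_answers if task in worker_answers]
    if not scored:
        return None
    correct = sum(worker_answers[t] == gold_answers[t] for t in scored)
    return correct / len(scored)

# Three workers complete the same task; two of them agree.
votes = ["Acme Corp", "Acme Corp", "ACME Inc"]
consensus = plurality_answer(votes)

# One worker is scored against two tasks with pre-defined answers.
gold = {"task-1": "NY", "task-2": "CA"}
worker = {"task-1": "NY", "task-2": "TX", "task-3": "WA"}
accuracy = gold_accuracy(worker, gold)
```

In a real deployment these two signals would presumably be combined with the per-worker signals mentioned above; the sketch keeps them separate so each check is easy to read.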

The quality-controlled output of human workers feeds ML algorithms, turning collective human intelligence
into machine intelligence. As the algorithms build confidence, humans are evaluated not just against one
another but against an established collective intelligence without requiring workers to do the same task. This
incremental process of intelligence transfer and quality control radically reduces costs and increases speed.

As WorkFusion identifies quality data from human workers, the platform’s built-in machine learning
algorithms replicate the pattern of work that generated it.

[1] For more information about how the platform controls quality, please see WorkFusion’s paper, “The Knowledge Work Revolution.”
[2] Plurality: engaging multiple workers to complete a task simultaneously, comparing results and selecting the common answer.
[3] Gold Data: a pre-defined correct answer to a data task, which is used to validate a worker’s answer.


WorkFusion: a task + algorithm matchmaker.

Just as with human workers, the performance of machine learning models varies with the
nature of the work. Coding, configuring, and testing algorithm models against data sets is PhD-caliber
data science work and amounts to a six-to-nine-month IT project for companies attempting to do
this in-house. WorkFusion automates this algorithm matchmaking process entirely.

When a human data analyst begins a task, WorkFusion deploys hundreds of distinct models of a
number of general-purpose learning algorithms (e.g., Markov models or conditional random fields) to “watch”
and replicate the patterns of human workers. Each algorithm essentially competes with human workers
and with other algorithms to consistently meet the required standards of accuracy. Once WorkFusion
identifies the winning algorithm, the platform sends an “Automation Notification” to the user,
an indication that ML has perfected the task. With one click, a business user can shift from a human
workforce to automation, lift human workers to higher-value work, and eliminate the labor cost of the
data task.
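
The matchmaking idea described above, in which candidate models compete against human output until one meets the required accuracy, can be sketched as follows. This is a toy illustration under stated assumptions: the candidates are hand-written rules standing in for trained models, and `pick_winner`, the 0.95 accuracy bar, and the sample data are hypothetical rather than part of the platform.

```python
def accuracy(model, labelled_examples):
    """Fraction of human-labelled examples the model reproduces."""
    hits = sum(model(text) == label for text, label in labelled_examples)
    return hits / len(labelled_examples)

def pick_winner(candidates, labelled_examples, required_accuracy=0.95):
    """Return (name, score) of the best candidate that meets the bar, or None."""
    scores = {name: accuracy(model, labelled_examples)
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= required_accuracy:
        return best, scores[best]
    return None  # no candidate is good enough yet; humans keep doing the task

# Toy task labelled by human analysts: does a line of text contain a price?
human_labels = [
    ("Total: $40", True),
    ("Call us today", False),
    ("Only $5 each", True),
    ("Free shipping", False),
    ("Open 24 hours", False),
]

# Hypothetical competing "models".
candidates = {
    "has_dollar_sign": lambda text: "$" in text,
    "has_digit": lambda text: any(ch.isdigit() for ch in text),
    "always_true": lambda text: True,
}

winner = pick_winner(candidates, human_labels)
```

A production system would score candidates on human work they were not trained on; the sketch scores on a single small set only to keep the example short.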

WorkFusion dynamically evaluates the best models for new tasks and automatically transfers the
rules of proven algorithms to perform similar tasks, drawing in additional knowledge repositories,
features, toolkits, and worker results necessary for executing and optimizing the process. Once
automation algorithms are engaged, WorkFusion shifts focus from algorithm training to performance
analysis and improvement, all without disrupting business as usual. The system redirects the focus
of workers from doing the routine work used to train algorithms to performing the exceptions that
algorithms cannot perform.

Building a wide variety of distinct machine learning algorithms into one platform and automating the
trial-and-error matchmaking effort eliminates the need for ML IT projects within operations and gives
business users the power of a data science team.

WorkFusion keeps humans in the loop to handle exceptions.

Even when they’re given the budget and time to execute, IT projects often fail after deployment for lack
of a rapid and efficient means of identifying and elevating exceptions to human data analysts. Exceptions
generally create yet another burden on IT departments by necessitating IT projects to handle
quality checks and algorithm retraining.

WorkFusion eliminates the need for IT quality checks and automation maintenance by automatically
identifying and elevating exceptions to an available human data analyst. This pairing of machine and
human intelligence is commonly referred to as human-in-the-loop computing. Each time an algorithm
encounters a task it cannot perform, the platform engages a human worker to perform it. Just as it did
during initial training, the algorithm watches, learns, and programs itself, dynamically turning an
exception into a new rule. WorkFusion calls this continuous cycle of incremental machine learning the
Virtuous Loop (see Figure 1). Every data curve ball makes WorkFusion more adept at catching them.

Figure 1. A high-level view of how ML and data analysts create a Virtuous Loop. Raw data enters the
loop, which proceeds in four steps:
1. WorkFusion ensures that data analysts produce quality data, which trains algorithms.
2. WorkFusion’s algorithms learn to automate tasks.
3. Exceptions are programmatically sent to analysts, and algorithms retrain.
4. Supervised machine learning and automation produce quality data in any format.
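
The exception-handling cycle described above can be sketched in Python. This is a minimal illustration, not the platform's code: `LookupModel` is a toy stand-in for a learning algorithm, and the class, the `process` function, and the 0.9 confidence threshold are assumptions introduced for the example.

```python
class LookupModel:
    """Toy stand-in for a learning algorithm: it memorises answers it has seen."""

    def __init__(self):
        self.known = {}

    def train(self, item, answer):
        self.known[item] = answer

    def predict(self, item):
        """Return (answer, confidence); unseen items get zero confidence."""
        if item in self.known:
            return self.known[item], 1.0
        return None, 0.0

def process(items, model, ask_human, min_confidence=0.9):
    """Automate confident items; send the rest to a human and learn from them."""
    results, exceptions = {}, 0
    for item in items:
        answer, confidence = model.predict(item)
        if confidence < min_confidence:
            answer = ask_human(item)     # exception: a human performs the task
            model.train(item, answer)    # the exception becomes training data
            exceptions += 1
        results[item] = answer
    return results, exceptions

# A dictionary lookup plays the role of the human data analyst.
human = {"N.Y.": "New York", "Calif.": "California"}.get
model = LookupModel()

_, first_pass = process(["N.Y.", "Calif.", "N.Y."], model, human)
_, second_pass = process(["N.Y.", "Calif."], model, human)
```

On the first pass two items are escalated to the human; on the second pass none are, because each earlier exception has already been learned.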

The Web contains the collective knowledge of the world. While search engines provide entry points
into general subject matter queries, such as locating a specific company’s website, building searchable
databases of domain knowledge has been an expensive mash-up of blunt-instrument automation and
cumbersome human collection, extraction, validation, and enrichment.

Whether extracting a descriptive sentence from a company’s website or making sense of internal
streams of data, the problem and the intent are identical: turning free-form text into whole, meaningful
information by identifying the relationships between the parts. Traditional human-only or machine-only
solutions don’t scale, cost a lot, and break frequently. WorkFusion’s unique combination of human and
machine computing solves this problem.


Real-world examples of how it works: turning free-form text into machine-searchable databases.

A common application of WorkFusion is extracting descriptive information about real-world entities,
such as companies, a role traditionally requiring expert human analysts. Writing programmatic rules
to extract this information is nearly impossible, simply due to the vast variation writers use to express
their ideas. Consider these descriptions of WorkFusion:

WorkFusion provides software-as-a-service to enterprise companies.

The WorkFusion platform automates human data analysis by leveraging machine learning algorithms.

WorkFusion provides business users all of the tools they need to optimize information processes and
better manage global workforces.

To tell whether or not two phrases are similar is an inherently complex, non-quantitative, subjective task.

Provided the business user can articulate their intention through instructions for human
data analysts to follow, WorkFusion can match the intent and meaning of the text selected by human data
analysts and continue the work automatically and with consistent quality. The algorithms reflexively
adapt throughout the process.

WorkFusion accomplishes this by fusing advanced natural language processing (NLP) with ML to
identify the essential meaning of the sentences and choose those which best describe the company.
As human data analysts extract the sentences the business user needs, WorkFusion guides algorithms
to program themselves. The algorithms build a consensus of correct decisions by weighing hundreds
of thousands of features, learned after watching a wide enough variety of patterns to reliably
automate the many variations in the work.


Step 1: A WorkFusion business user uploads a batch of company names or connects the platform to
a database via API.

Step 2: The user selects pre-built templates for machine-assisted human data analysts to locate and verify
company websites.

Step 3: The user selects templates for human data analysts to extract sentences that describe
the company, its products and services, geographic service areas, executives, etc.


Step 4: The user deploys the workflow into production. WorkFusion distributes the work
to human data analysts.

Step 5: WorkFusion’s learning algorithms train on approximately 200 human extractions,
and an architecture of SVM models proves successful at identifying quality descriptions
and ignoring the other text on the company website.

Figure 2. This company website contains a quality description of the company. Human analysts
transcribe these sentences into a searchable database. WorkFusion’s learning algorithms automate
this process, reducing cost by an order of magnitude.

Semi-automated Phase:

Step 6: WorkFusion’s learning algorithm begins automatically selecting potential
descriptions from company websites. WorkFusion uses the original template to distribute these
sentences to human data analysts. These validation tasks are faster and simpler, resulting in an
immediate drop in cost from 25 cents/company to 5 cents/company. Additional learning algorithms
begin training on approximately 2,000 additional companies using this data.

Automated Phase:

Step 7: After additional training, WorkFusion’s learning algorithm begins automatically processing
23% of company websites by extracting descriptions at a quality equal to or higher than the human
workers. The remaining 77% of websites are handled using Step 6, resulting in a final cost of 4 cents/company.

As human workers continue to solve the harder cases and the learning algorithm improves, the
percentage of websites processed automatically increases and the costs drop.
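
The cost figure in Step 7 follows from a simple weighted average, which the sketch below works through. The 23%, 77%, and 5 cents figures come from Steps 6 and 7; treating fully automated extraction as having zero marginal cost is an assumption made here, and the function name is hypothetical.

```python
def blended_cost_cents(automated_share, automated_cost, assisted_cost):
    """Average cost per company when part of the volume is fully automated."""
    return automated_share * automated_cost + (1 - automated_share) * assisted_cost

# 23% of websites automated (assumed to cost nothing per company),
# 77% handled by the semi-automated process at 5 cents per company.
step7 = blended_cost_cents(automated_share=0.23, automated_cost=0.0, assisted_cost=5.0)
print(f"{step7:.2f} cents/company")
```

Under that assumption the blended cost comes to 3.85 cents, which rounds to the 4 cents/company stated in Step 7. As the automated share grows, the same formula shows the cost per company continuing to fall.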

Statistical Quality Control:

Step 8: Quality automation uses well-defined processes of statistical quality control to ensure
production quality does not waver, even when website sources change, different domains are entered, or
workers churn.

Case vitals and KPIs:

Process: Create a description of a company from disparate text

Source volume: 1 million companies
Source format: websites and annual reports


                       Human-only workforce     WorkFusion
Headcount required     20 workers for 36 days   20 workers for 6 days
Time per description   10.4 minutes             27 seconds
Cost per description   $0.57                    $0.05
Accuracy rate          84%                      99.1%

WorkFusion produces new company descriptions at a rate of 1,000 per hour at a cost of $15 per hour.


Another common application of WorkFusion is extracting information from text and identifying meaningful
relationships between the extracted pieces of information. Take for example the extraction of specific education
and career information from the biographies of company executives. Again, writing programmatic
rules to extract information such as degrees, majors, and academic institutions from unstructured
text is challenging because of variations in formats and lexicons. The mistakes that rules-based
automation would make would outnumber its correct outputs, and constantly re-writing rules to
accommodate exceptions would be a full-time job for an engineer.

WorkFusion’s machine learning overcomes this challenge by guiding machine learning
algorithms to program themselves after watching a wide enough variety of patterns
to reliably automate the many variations in the work. The process is as follows.

Figure 3. A GUI configured to help workers quickly and accurately extract data and
simultaneously train ML automation on the content and context of collected data.

Step 1: A WorkFusion business user uploads a batch of source files or connects the
platform to a database via API.

Step 2: The user selects a pre-built instructions template for human data analysts for extracting
executive information from corporate bios – e.g., degrees, majors, school names, year graduated, and

Step 3: The user adds the desired data attributes to a modular graphical user interface for the human
workforce. See Figure 3.

Step 4: The user deploys the workflow into production, and WorkFusion distributes the work to human
data analysts.


Step 5: WorkFusion’s learning algorithms train on approximately 200 human extractions, and a
Markov model proves to be successful at extracting the attributes from the text.
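
To give a feel for how a Markov-style model can learn to label attributes from human extractions, here is a small hidden Markov model tagger in Python. It is a sketch under stated assumptions: the paper does not say which Markov variant the platform uses, and the tag names, training phrases, add-one smoothing, and function names are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from human-tagged sentences."""
    transitions = defaultdict(Counter)   # previous tag -> next-tag counts
    emissions = defaultdict(Counter)     # tag -> word counts
    vocabulary = set()
    for sentence in tagged_sentences:
        previous = "<start>"
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word.lower()] += 1
            vocabulary.add(word.lower())
            previous = tag
    return transitions, emissions, vocabulary

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under the counts."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, transitions, emissions, vocabulary):
    """Return the most likely tag sequence for the words (Viterbi decoding)."""
    tags = list(emissions)
    n_words = len(vocabulary) + 1        # +1 leaves room for unseen words
    best = {"<start>": (0.0, [])}        # tag -> (log score, path so far)
    for word in words:
        step = {}
        for tag in tags:
            emit = log_prob(emissions[tag], word.lower(), n_words)
            step[tag] = max(
                (score + log_prob(transitions[prev], tag, len(tags)) + emit,
                 path + [tag])
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1]

# Hypothetical human extractions: each word is tagged by an analyst.
training = [
    [("MBA", "DEGREE"), ("from", "O"), ("Harvard", "SCHOOL")],
    [("BS", "DEGREE"), ("from", "O"), ("Cornell", "SCHOOL")],
    [("PhD", "DEGREE"), ("from", "O"), ("Stanford", "SCHOOL")],
]
model = train_hmm(training)
tags = viterbi(["MBA", "from", "Yale"], *model)
```

Although "Yale" never appears in the training phrases, the learned transition pattern (a degree, then a connecting word, then a school) lets the tagger label it as a school. That is the kind of generalization hand-written rules struggle to provide.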

Case vitals and KPIs:

Process: Extract essential executive bio information from text

Source volume: 300 bios
Source format: PDFs

                      Human-only workforce   WorkFusion
Headcount required    11                     3
Time per extraction   4 minutes              5 seconds
Cost per extraction   $2.10                  $0.19
Accuracy rate         84%                    97%

Applying WorkFusion to your operation.

Using WorkFusion for data work radically improves business KPIs and gives expert data analysts
more time to focus on higher value work, incrementally raising the application of human intelligence
by automating the work that’s beneath it.

WorkFusion has automated repetitive work and optimized human work for the world’s largest data
vendors, global banks and investment businesses, retailers, and consumer packaged goods companies.
Use cases range from simple but high-volume scraping and analysis of website data to highly
complex extraction of high-stakes financial data from digitized documents.

WorkFusion is optimal for collecting data that takes human workers little time
to find but much longer to manually enter into a structured format. For example,
finding the 12-character alphanumeric International Securities Identification Number (ISIN) code within a
paragraph of text might take only a second for a human, but keying it in might take 30 seconds with
variable accuracy. WorkFusion quickly and confidently automates this kind of find-and-key-in work.
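
For a field as regular as an ISIN, the find-and-key-in step can be illustrated with a few lines of Python. The sketch below is not WorkFusion's method. It relies only on the standard ISIN layout (two letters, nine alphanumeric characters, one check digit validated with the Luhn algorithm), and the function names and sample sentence are assumptions.

```python
import re

# Two-letter country code, nine alphanumeric characters, one numeric check digit.
ISIN_PATTERN = re.compile(r"\b[A-Z]{2}[A-Z0-9]{9}[0-9]\b")

def is_valid_isin(code):
    """Check the ISIN check digit (Luhn algorithm over the expanded digits)."""
    # Letters expand to two digits: A=10, B=11, ..., Z=35.
    digits = "".join(str(int(ch, 36)) for ch in code)
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:        # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0

def find_isins(text):
    """Return every well-formed ISIN in the text whose check digit is valid."""
    return [match for match in ISIN_PATTERN.findall(text) if is_valid_isin(match)]

sample = ("The notes (ISIN US0378331005) mature in 2030; "
          "the reference US0378331006 contains a keying error.")
found = find_isins(sample)
```

The check digit matters because it catches exactly the kind of keying error described above: the second code in the sample differs by one digit and is rejected.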

Machine learning and the pairing of smart machines with smart people are quickly evolving from a
competitive advantage for early adopters to standard operating procedure across data-driven
industries. This shift will change the shape of the human workforce at enterprise businesses from a triangle
to a diamond: machines will automate the tedious work at the bottom of the workforce pyramid and elevate
data analysts to serving customers and driving revenue for the business. Gartner’s esteemed
Digital Workplace analyst, Tom Austin, succinctly summarizes the enterprise mandate for adopting
machine learning in his report, “Top 10 Strategic Technologies – The Rise of Smart Machines”:

"IT leaders need to aggressively examine and act on the promise,
threat and effects of smart machines on work patterns (man-machine
collaboration), staffing shifts and enterprise business opportunities."

Machine Learning is an exponential and transformational technology when deployed in the right way,
and WorkFusion has proven that the right way is a Virtuous Loop of man-machine collaboration
enabled by an intuitive, built-for-business platform. We welcome your questions, interests and ideas at

About the Authors

Adam Devine
VP Product Marketing

Adam Devine leads product marketing for WorkFusion, a software platform that automates manual
work and optimizes human work through machine learning. He is responsible for identifying and
educating data-intensive businesses seeking new ways to radically reduce data operations costs and
improve data quality. Adam has 15 years of experience growing businesses through product marketing,
including product positioning, market intelligence, messaging, and content creation. He began
his career in management consulting at BearingPoint’s Banking & Capital Markets practice. Adam
speaks frequently about human-in-the-loop computing, machine learning, and smart automation at financial
industry conferences, including FIMA, FISD, Massolution, MarketTech, NAFIS, NFAIS, and SIIA.
He can be reached at

David Bernat, Ph.D.
Chief Scientist

David Bernat leads research and development for WorkFusion. The concept driving our research
team is simple: WorkFusion built our platform by integrating human annotators, automation pipelines,
and statistical quality control for limitlessly general workflow architectures and unstructured information
sources. Our human computation platform gives us the power to reinvent machine learning solutions
for web scale language processing, real-world image recognition, crowdsourcing task management,
and operations research. We build scalable and innovative technology in-house, and build strong
connections to complementary technology and academic faculty.


David has a doctorate in physics and astrophysics from Cornell University and a bachelor of science in
physics from Caltech, and has previous experience as a research engineer at Google AI, as an FX/FICC
strategist at Goldman Sachs, and as chief executive officer of a research team designing small satellites
for agriculture applications. He frequently speaks at conferences and university seminars. He can be
reached at

