Smart People +
Smart Machines
How WorkFusion Uses Machine Learning
Adam Devine and David Bernat
Abstract
WorkFusion provides software-as-a-service to enterprise companies.
The WorkFusion platform automates human data analysis by leveraging machine learning algorithms.
WorkFusion provides business users all of the tools they need to optimize information
processes and better manage global workforces.
Your brain just performed a spectacularly insightful operation. Using unique human cognition, you
extracted meaning and created relationships between the ideas in those sentences to construct a
conceptual definition of what WorkFusion does.
This capability has eluded machines and machine learning because of the siloed approaches of
human computing and machine computing. Human computing refers to optimal processes for
individuals to account for, and create insight into, the processing of data. Machine computing requires
strict representations of quantitative and structured information to produce statistical conclusions.
Collectively integrating the best of human and machine capabilities delegated through workflow
management is what WorkFusion does.
People make decisions, and WorkFusion synthesizes the information that enables human decisions.
Machine learning (ML) has the power to radically improve the quality, efficiency, and speed of data
work and eliminate manual data entry. WorkFusion puts this power into the hands of enterprise business
operations by providing an intuitive software platform for configuring and managing workflows,
programmatically sourcing, training, and quality-controlling a large human workforce, and seamlessly
using the accurate output of human work to train automation. This paper explains how the platform
overcomes the common challenges of data monitoring, collection, extraction and analysis by pairing
machine learning with human data analysts.
Scrapers, optical character recognition (OCR), and parsers are common rules-based automation
(RBA) point solutions. They work well if the underlying business process and data sources never
change, but change is the only constant in business. RBA requires upfront programming and
configuration by IT, and it’s virtually impossible to program every potential variable in a data process
or account for variations in the formats of PDFs, websites, and other unstructured sources. When
the process or sources change, IT must re-write the rules. The data supply chain halts while the point
solution is repaired, and business continuity is compromised.
WorkFusion brings together on one platform the best of human and machine data processing
capabilities, and ML is what ties them together.
• Matching the right data problem with the right algorithm solution is a time-consuming puzzle.
• Smart as it is, machine learning must be seamlessly integrated with human workers who handle
exceptions, which in turn tunes the algorithms.
WorkFusion levels these barriers, allowing business users to self-sufficiently leverage the power of
machine learning.
The platform creates an agile workforce to perform tasks by optimally combining and managing a
customer’s FTE human data analysts, BPO workers, and on-demand workers sourced from online
talent markets such as Elance, Upwork (formerly oDesk), and Amazon Mechanical Turk. The software
makes even the largest human workforce nimble, elastic, and scalable and ensures that individual
workers quickly, efficiently, and accurately complete work.
WorkFusion uses a combination of statistical quality control¹, plurality², gold data³, and signals from each
worker (e.g., historical performance, keystrokes, speed) to assess accuracy. The platform turns these
countless and constant quality assessments into a dashboard for users to truly understand the performance
of their workers and the KPIs of human work.
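As a rough illustration of two of the checks named above, the sketch below computes a plurality answer and a worker's accuracy on gold data. It is a minimal example under stated assumptions, not WorkFusion's implementation: the function names, task identifiers, and answers are all hypothetical.

```python
from collections import Counter

def plurality_answer(answers):
    """Return the most common answer and the share of workers who gave it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

def gold_accuracy(worker_answers, gold):
    """Fraction of gold-data tasks the worker answered correctly."""
    scored = [task for task in gold if task in worker_answers]
    if not scored:
        return None  # the worker has not seen any gold tasks yet
    correct = sum(worker_answers[task] == gold[task] for task in scored)
    return correct / len(scored)

# Hypothetical data: three workers answer the same task.
answers = ["Acme Corp", "Acme Corp", "ACME Inc"]
consensus, agreement = plurality_answer(answers)

# Hypothetical gold data: pre-defined correct answers for two tasks.
gold = {"task-1": "New York", "task-2": "London"}
worker = {"task-1": "New York", "task-2": "Paris", "task-3": "Berlin"}
accuracy = gold_accuracy(worker, gold)
```

Here two of three workers agree, so the plurality answer carries a two-thirds agreement share, and the worker scores 0.5 on the two gold tasks they answered. A production system would presumably combine such scores with the other signals the paper lists.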
The quality-controlled output of human workers feeds ML algorithms, turning collective human intelligence
into machine intelligence. As the algorithms build confidence, humans are evaluated not just against one
another but against an established collective intelligence without requiring workers to do the same task. This
incremental process of intelligence transfer and quality control radically reduces costs and increases speed.
As WorkFusion identifies quality data from human workers, the platform’s built-in machine learning
algorithms replicate the pattern of work that generated it.
¹ For more information about how the platform controls quality, please see WorkFusion’s paper, “The Knowledge Work Revolution.”
² Plurality: engaging multiple workers to complete a task simultaneously, comparing results and selecting the common answer.
³ Gold Data: a pre-defined correct answer to a data task, which is used to validate a worker’s answer.
When a human data analyst begins a task, WorkFusion deploys hundreds of distinct models of a
number of general-purpose learning algorithms (e.g., Markov or Conditional Random Field) to “watch”
and replicate the patterns of human workers. Each algorithm essentially competes with human workers
and with other algorithms to consistently meet the required standards of accuracy. Once WorkFusion
identifies the winning algorithm, the platform sends an “Automation Notification” to the user,
an indication that ML has perfected the task. With one click, a business user can shift from a human
workforce to automation, lift human workers to higher value work, and eliminate the labor cost of the
data task.
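The competition described above can be pictured as scoring every candidate model against human answers and promoting one only if it clears the required accuracy. The sketch below is a simplified illustration: the two toy "models", the example sentences, and the 0.95 threshold are all assumptions made for the example, not details from the platform.

```python
def accuracy(model, examples):
    """Share of examples where the model reproduces the human answer."""
    hits = sum(model(text) == label for text, label in examples)
    return hits / len(examples)

def pick_winner(models, examples, required_accuracy):
    """Return (name, score) of the best model if it meets the bar, else None."""
    scores = {name: accuracy(fn, examples) for name, fn in models.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= required_accuracy:
        return best, scores[best]
    return None

# Hypothetical task: does a sentence mention a founding year?
# Labels stand in for quality-controlled human answers.
examples = [
    ("Founded in 1999 in Boston.", True),
    ("We sell software.", False),
    ("Established 2005.", True),
    ("Room 12, floor 3.", False),
]

# Two hypothetical candidate models competing on the same task.
models = {
    "any_digit": lambda s: any(c.isdigit() for c in s),
    "four_digit_year": lambda s: any(
        w.strip(".,").isdigit() and len(w.strip(".,")) == 4 for w in s.split()
    ),
}

winner = pick_winner(models, examples, required_accuracy=0.95)
```

In this toy run the `four_digit_year` rule matches every human label while `any_digit` misses one, so only the former would trigger a notification. If no candidate reaches the threshold, `pick_winner` returns `None` and the work stays with people.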
WorkFusion dynamically evaluates the best models for new tasks and automatically transfers the
rules of proven algorithms to perform similar tasks, drawing in additional knowledge repositories,
features, toolkits, and worker results necessary for executing and optimizing the process. Once
automation algorithms are engaged, WorkFusion shifts focus from algorithm training to performance
analysis and improvement, all without disrupting business as usual. The system redirects the focus
of workers from doing the routine work used to train algorithms to performing the exceptions that
algorithms cannot perform.
Building a wide variety of distinct machine learning algorithms into one platform and automating the
trial-and-error matchmaking effort eliminates the need for ML IT projects within operations and gives
business users the power of a data science team.
WorkFusion eliminates the need for IT quality checks and automation maintenance by automatically
identifying and elevating exceptions to an available human data analyst. This pairing of machine and
Figure 1. A high level view of how ML and data analysts create a Virtuous Loop.
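One way to picture the loop in Figure 1 is confidence-based routing: the machine keeps the items it is confident about, and everything else is escalated to a person whose answer becomes new training data. The sketch below assumes a model that returns a label with a confidence score; the `predict` and `ask_human` stand-ins, the file names, and the 0.9 threshold are hypothetical.

```python
def route(items, predict, threshold, ask_human):
    """Automate confident predictions and escalate the rest to a person.

    Returns the final answers plus the human-labeled exceptions, which
    can be fed back to the model as new training data.
    """
    results, new_training = {}, []
    for item in items:
        label, confidence = predict(item)
        if confidence >= threshold:
            results[item] = (label, "machine")
        else:
            human_label = ask_human(item)
            results[item] = (human_label, "human")
            new_training.append((item, human_label))
    return results, new_training

# Hypothetical stand-in for a trained model: returns (label, confidence).
def predict(item):
    known = {"invoice.pdf": ("invoice", 0.98), "scan_017.pdf": ("receipt", 0.55)}
    return known[item]

# Hypothetical stand-in for a human data analyst.
def ask_human(item):
    return "purchase order"

results, new_training = route(
    ["invoice.pdf", "scan_017.pdf"], predict, threshold=0.9, ask_human=ask_human
)
```

The low-confidence document goes to the analyst, and the corrected label lands in `new_training`, which is the part of the loop that lets the algorithms improve on exactly the cases they could not handle.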
The Web contains the collective knowledge of the world. While search engines provide entry points
into general subject matter queries, such as locating a specific company’s website, building searchable
databases of domain knowledge has been an expensive mash-up of blunt-instrument automation and
cumbersome human collection, extraction, validation, and enrichment.
Whether extracting a descriptive sentence from a company’s website or making sense of internal
streams of data, the problem and the intent are identical: turning free-form text into whole, meaningful
information by identifying the relationships between the parts. Traditional human or machine solutions
don’t scale, cost a lot, and break frequently. WorkFusion’s unique combination of human-machine
computing solves this problem.
Real-world examples of how it works: turning free-form text into
machine-searchable databases.
A common application of WorkFusion is extracting descriptive information about real-world entities,
such as companies, a role traditionally requiring expert human analysts. Writing programmatic rules
to extract this information is nearly impossible, simply due to the vast variation writers use to express
their ideas. Consider these descriptions of WorkFusion:
The WorkFusion platform automates human data analysis by leveraging machine learning algorithms.
WorkFusion provides business users all of the tools they need to optimize information processes and
better manage global workforces.
Telling whether two phrases are similar is an inherently complex, non-quantitative, subjective
decision.
Provided the business user can articulate their intent through instructions for human data
analysts to follow, WorkFusion can match the intent and meaning of the text those analysts
select and continue the work automatically and with consistent quality. The algorithms reflexively
adapt throughout the process.
WorkFusion accomplishes this by fusing advanced natural language processing (NLP) with ML to
identify the essential meaning of the sentences and choose those which best describe the company.
As human data analysts extract the sentences the business user needs, WorkFusion guides algorithms
to program themselves and build a consensus of correct decisions. The algorithms weigh hundreds
of thousands of features, learned by watching a wide enough variety of patterns to reliably automate
the many variations in the work.
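To see why surface rules struggle here, consider a deliberately simple lexical baseline: bag-of-words cosine similarity applied to the two WorkFusion descriptions quoted earlier. This is not the NLP and ML combination the paper describes, only a contrast case; the function names are made up for the example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase word counts; punctuation is ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0

first = ("The WorkFusion platform automates human data analysis "
         "by leveraging machine learning algorithms.")
second = ("WorkFusion provides business users all of the tools they need to "
          "optimize information processes and better manage global workforces.")

score = cosine_similarity(first, second)
```

The two sentences share only the words "the" and "WorkFusion", so the score comes out at roughly 0.13 even though both describe the same company. Word overlap alone says little about meaning, which is why learned features are needed for this kind of work.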
Step 2: The user selects pre-built templates for machine-assisted human data analysts to locate and verify
company websites.
Step 3: The user selects templates for human data analysts to extract sentences that describe
the company, its products and services, geographic service areas, executives, etc.
Figure 2. This company website contains a quality description of the company. Human analysts
transcribe these sentences into a searchable database. WorkFusion’s learning algorithms automate
this process, reducing cost by an order of magnitude.
Semi-automated Phase:
Step 6: WorkFusion’s learning algorithm begins automatically selecting potential descriptions from
company websites. WorkFusion uses the original template to distribute these sentences to human
data analysts. These tasks of validation are faster and simpler, resulting in an immediate drop of
cost from 25 cents/company to 5 cents/company. Additional learning algorithms begin training on
approximately 2000 additional companies using this data.
Automated Phase:
Step 7: After additional training, WorkFusion’s learning algorithm begins automatically processing
23% of company websites by extracting descriptions at a quality equal to or higher than the human workers.
The remaining 77% of websites are handled using Step 6, resulting in a final cost of 4 cents/company.
As human workers continue to solve the harder cases and the learning algorithm improves, the
percentage of websites processed automatically increases and the costs drop.
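The 4 cents/company figure can be checked with a simple blended-cost calculation. The sketch below assumes that fully automated websites cost close to nothing per company, which the paper does not state explicitly; the function and parameter names are illustrative.

```python
def blended_cost(automated_share, assisted_cost, automated_cost=0.0):
    """Average cost per company when a share of websites is fully automated.

    automated_cost defaults to 0.0, an assumption made for this example.
    """
    if not 0.0 <= automated_share <= 1.0:
        raise ValueError("automated_share must be between 0 and 1")
    return automated_share * automated_cost + (1 - automated_share) * assisted_cost

# Figures from the steps above: 23% automated, 5 cents for assisted validation.
cost = blended_cost(automated_share=0.23, assisted_cost=5.0)
```

With these inputs the blended cost is 0.77 × 5 = 3.85 cents, which rounds to the 4 cents/company quoted in Step 7. Raising `automated_share` lowers the result linearly, matching the claim that costs drop as more websites are processed automatically.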
WorkFusion produces new company descriptions at a rate of 1,000 per hour at a cost of $15 per hour.
Step 2: The user selects a pre-built instructions template for human data analysts for extracting executive
information from corporate bios – e.g., degrees, majors, school names, year graduated, and
distinctions.
Step 3: The user adds the desired data attributes to a modular graphical user interface for the human
workforce. See Figure 3.
Step 4: The user deploys the workflow into production, and WorkFusion distributes the work to human
data analysts.
Step 5: WorkFusion’s learning algorithms train on approximately 200 human extractions, and a
Markov model proves to be successful at extracting the attributes from the text.
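To give a feel for what a Markov model does in this setting, the sketch below trains a toy hidden Markov model tagger by counting on human-labeled tokens and decodes new text with the Viterbi algorithm. It is an illustration only: the paper does not specify the model details, and the label set, the three training sentences (the paper mentions approximately 200 extractions), and the smoothing constant are all assumptions.

```python
import math
from collections import Counter, defaultdict

SMOOTH = 0.01  # small pseudo-count so unseen events stay possible (assumed value)

def train(sequences):
    """Count label transitions and word emissions from human-labeled tokens."""
    trans, emit, vocab = defaultdict(Counter), defaultdict(Counter), set()
    for seq in sequences:
        prev = "<s>"  # start-of-sequence marker
        for word, label in seq:
            trans[prev][label] += 1
            emit[label][word.lower()] += 1
            vocab.add(word.lower())
            prev = label
    return trans, emit, vocab

def log_prob(counts, key, n_outcomes):
    """Smoothed log probability of `key` among `n_outcomes` possibilities."""
    total = sum(counts.values()) + SMOOTH * n_outcomes
    return math.log((counts[key] + SMOOTH) / total)

def viterbi(words, trans, emit, vocab):
    """Return the most probable label sequence for `words`."""
    labels, n_words = list(emit), len(vocab) + 1  # +1 bucket for unseen words
    best = {
        lab: (log_prob(trans["<s>"], lab, len(labels))
              + log_prob(emit[lab], words[0].lower(), n_words), [lab])
        for lab in labels
    }
    for word in words[1:]:
        step = {}
        for lab in labels:
            # Best previous label to arrive from, judged by path score so far.
            score, path = max(
                (best[p][0] + log_prob(trans[p], lab, len(labels)), best[p][1])
                for p in labels
            )
            step[lab] = (score + log_prob(emit[lab], word.lower(), n_words),
                         path + [lab])
        best = step
    return max(best.values())[1]

def tag(sentence, labels):
    """Pair each token with its label (hypothetical human extraction)."""
    return list(zip(sentence.split(), labels.split()))

training = [
    tag("She holds an MBA in finance from Harvard", "O O O DEGREE O MAJOR O SCHOOL"),
    tag("He earned a BS in physics from Cornell", "O O O DEGREE O MAJOR O SCHOOL"),
    tag("PhD in chemistry from Stanford", "DEGREE O MAJOR O SCHOOL"),
]
model = train(training)
predicted = viterbi("She earned an MBA in physics from Stanford".split(), *model)
```

The test sentence never appears in the training data, yet the tagger recovers the degree, major, and school because it has seen each word and each label transition in other bios. Real bios vary far more, which is presumably why the platform trains on many more extractions and tries several model families.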
WorkFusion has automated repetitive work and optimized human work for the world’s largest data
vendors, global banks and investment businesses, retailers, and consumer packaged goods companies.
Use cases range from simple but high-volume scraping and analysis of website data to highly
complex extraction of high-stakes financial data from digitized documents.
WorkFusion is optimal for collecting data that takes human workers a short time to find
but much longer to manually enter into a structured format. For example,
finding the 12-character alphanumeric International Securities Identification Number (ISIN) code within a
paragraph of text might take only a second for a human, but keying it in might take 30 seconds with variable
accuracy. WorkFusion quickly and confidently automates this kind of find-and-key-in work.
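For a concrete sense of the find-and-key-in task, the sketch below locates ISIN candidates in free text with a regular expression and validates each one with the standard check digit (the Luhn algorithm applied after letters are expanded to numbers). This is a rules-based illustration of the task itself, not the learning approach the paper describes, and the sample sentence is invented.

```python
import re

# Two-letter country code, nine alphanumerics, one numeric check digit.
ISIN_PATTERN = re.compile(r"\b[A-Z]{2}[A-Z0-9]{9}[0-9]\b")

def is_valid_isin(code):
    """Check the ISIN check digit with the Luhn algorithm."""
    # Letters expand to two digits each: A=10, B=11, ..., Z=35.
    digits = "".join(str(int(ch, 36)) for ch in code)
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0

def find_isins(text):
    """Return every well-formed ISIN in the text that passes the check digit."""
    return [m for m in ISIN_PATTERN.findall(text) if is_valid_isin(m)]

# Invented sample text: one valid code and one with a wrong check digit.
sample = "The notes (ISIN US0378331005) settle on T+2; ref US0378331006 is a typo."
found = find_isins(sample)
```

Only the first code survives, because the second fails the check digit. The same check is a cheap way to catch keying errors in human-entered values, which is where the "variable accuracy" of manual entry shows up.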
Machine learning and the pairing of smart machines with smart people is quickly evolving from a
competitive advantage for early adopters to standard operating procedure across data-driven industries.
It will change the shape of the human workforce at enterprise businesses from a triangle to a diamond:
machines will automate the tedious work at the bottom of the workforce pyramid and elevate
data analysts to serving customers and driving revenue for the business. Gartner’s esteemed
Digital Workplace analyst, Tom Austin, succinctly summarizes the enterprise mandate for adopting
machine learning in his report, “Top 10 Strategic Technologies – The Rise of Smart Machines:”
Machine Learning is an exponential and transformational technology when deployed in the right way,
and WorkFusion has proven that the right way is a Virtuous Loop of man-machine collaboration
enabled by an intuitive, built-for-business platform. We welcome your questions, interests and ideas at
learn@workfusion.com.
Adam Devine leads product marketing for WorkFusion, a software platform that automates manual
work and optimizes human work through machine learning. He is responsible for identifying and
educating data-intensive businesses seeking new ways to radically reduce data operations costs and
improve data quality. Adam has 15 years of experience growing businesses through product marketing,
including product positioning, market intelligence, messaging, and content creation. He began
his career in management consulting at BearingPoint’s Banking & Capital Markets practice. Adam
speaks frequently about human-in-the-loop computing, machine learning, and smart automation at financial
industry conferences, including FIMA, FISD, Massolution, MarketTech, NAFIS, NFAIS, and SIIA.
He can be reached at adam@workfusion.com.
David Bernat leads research and development for WorkFusion. The concept driving our research
team is simple: WorkFusion built our platform by integrating human annotators, automation pipelines,
and statistical quality control for limitlessly general workflow architectures and unstructured information
sources. Our human computation platform gives us the power to reinvent machine learning solutions
for web scale language processing, real-world image recognition, crowdsourcing task management,
and operations research. We build scalable and innovative technology in-house, and build strong
connections to complementing technology and academic faculty.
David has a doctorate in physics and astrophysics from Cornell University and a bachelor of science in
physics from Caltech, and has previous experience as a research engineer at Google AI, as an FX/FICC
Strategist at Goldman Sachs, and as chief executive officer of a research team designing small satellites
for agriculture applications. He frequently speaks at conferences and university seminars. He can be
reached at david@workfusion.com.