Capturing & Analyzing High Velocity, High Volume Machine Data

December 3, 2013
Jason Lobel

Internet of Endpoints

Everything (IOE)
Data & Machines
Data is

Primarily sensor-based



Machine readable (API)

Accessible on-demand
Possibly even open (Public)

Includes non-machine generated data or streaming data (catalogs, locations, historical data, etc.)

Collect > Unify > Transform > Report > Predict
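The five stages above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the sample events, the catalog, and every function name are hypothetical, and the "predict" step is a naive daily average rather than a real model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: point-of-sale events and a product catalog.
EVENTS = [
    {"sku": "A1", "qty": 2, "day": 1},
    {"sku": "B2", "qty": 1, "day": 1},
    {"sku": "A1", "qty": 4, "day": 2},
    {"sku": "A1", "qty": 3, "day": 3},
]
CATALOG = {"A1": "Diapers", "B2": "Beer"}


def collect():
    """Collect: yield raw events (a real system would read a stream or queue)."""
    yield from EVENTS


def unify(events, catalog):
    """Unify: attach catalog attributes to each machine-generated event."""
    for event in events:
        yield {**event, "name": catalog.get(event["sku"], "unknown")}


def transform(events):
    """Transform: aggregate units sold per product per day."""
    totals = defaultdict(lambda: defaultdict(int))
    for event in events:
        totals[event["name"]][event["day"]] += event["qty"]
    return totals


def report(totals):
    """Report: total units sold per product."""
    return {name: sum(days.values()) for name, days in totals.items()}


def predict(totals):
    """Predict: naive forecast, the mean of observed daily units."""
    return {name: mean(days.values()) for name, days in totals.items()}


totals = transform(unify(collect(), CATALOG))
summary = report(totals)
forecast = predict(totals)
```

Because each stage only consumes the output of the previous one, any stage can be swapped (for example, a trained model in place of the average) without touching the others.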

Capturing Streaming Data Considerations

Smart storage / backend setup is a key catalyst for downstream analysis

Backend Architecture and Why It Is Important

NoSQL datastore
Long-term scale with data volume

High availability

Ideal for unpredictable demand
No joins for queries in reporting

Auto scaling cloud hosting

(AppEngine, AWS)

Spend less time on server tuning

Enable REST APIs

Enable JavaScript & mobile applications

Writeable and Retrievable

Real-time data


Power dashboards or visualizations

APIs for history, real-time, query (SQL), and even predictive
Tracking / How is data consumed
Unify with other sources
OAuth2.0 Security

API management
Multi-party (internet/external) access

Dedicated caching

Faster data retrieval speed
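As one way to picture the "dedicated caching" row, here is a minimal read-through cache with a time-to-live, placed in front of a slow lookup. It is a sketch under assumptions, not a description of any particular product: `TTLCache`, `query_datastore`, the store ID, and the returned values are all hypothetical.

```python
import time


class TTLCache:
    """Read-through cache: serve fresh results from memory, else hit the backend."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch        # function that performs the slow backend query
        self._ttl = ttl_seconds    # how long a cached entry stays fresh
        self._clock = clock        # injectable clock, handy for testing
        self._entries = {}         # key -> (expiry time, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: query the backend and remember the result.
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def query_datastore(store_id):
    """Hypothetical stand-in for a slow NoSQL lookup."""
    return {"store_id": store_id, "orders_today": 128}


cache = TTLCache(query_datastore, ttl_seconds=30.0)
first = cache.get("store-42")    # miss: goes to the backend
second = cache.get("store-42")   # hit: served from memory
```

A short TTL keeps dashboards close to real time while still absorbing repeated identical queries.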

APIs Fuel Any Channel & Big Data Analytics

Public vs. Private: an estimated 10x more private APIs than public ones
Open: Gartner predicts 75% of the Fortune 500 will have open APIs by 2014
Competition: By 2015, APIs will be default, like websites in 2000 (Kin Lane, ex White House Fellow)

Growth In Public APIs

Unify IOT Data with Other Sources

APIs Fuel Interactive Visualizations

D3.js: JavaScript library for manipulating documents using HTML, SVG and CSS

APIs => Programmable => Smart Controls

Make Apps Smarter with Machine Learning

Recommendation: analyzes users' preferences and finds items users might like
Frequent Pattern Mining: discovers sets of items that frequently co-occur in a transaction list
Classification: learns from existing categorized data and assigns a category to uncategorized data
Clustering: organizes items from a large volume of data into groups of similar items and features
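To make frequent pattern mining concrete, the sketch below counts how often each pair of items appears in the same transaction and keeps the pairs that meet a minimum support. It is deliberately a brute-force pair count, not Apriori or FP-Growth, and the baskets and the `frequent_pairs` name are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each pair is counted once per transaction
        # and always in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical baskets for illustration only.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["beer", "chips"],
    ["diapers", "wipes"],
]
pairs = frequent_pairs(baskets, min_support=2)
# pairs == {("beer", "chips"): 2, ("beer", "diapers"): 2}
```

Pair counting grows quadratically with basket size, which is why production systems typically rely on pruning algorithms instead.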

Machine Learning Algorithm APIs?





Finding a data scientist

Finding an engineer who can use an API

Training (if needed)

Database selection
Algorithm(s) selection
Model training & iteration
Embedding predictions into applications
Query speed / caching
On-Demand Access


Common ML Applications for Retail

Item Recommendation: observes what the user likes and finds similar items (e.g., I like the Chicago Bulls, so I may like the Chicago Bears)
User Recommendation: finds similar users and recommends items they like (e.g., Kin and I are friends. He likes IPAs, so I may like IPAs)
Item/Action Affinity: if a user wants X, what else (Y) is that user likely to want, based on the relationship between X and Y (e.g., men who buy diapers also buy beer)
Predict Inventory: based on history, predict future sales (next 7 days, 30 days, etc.)
Discover Customer Segments: examine purchasing habits to identify clusters of shoppers
Prevent Fraud: identify anomalies in cashier activity, such as voids (is this likely fraud? yes/no)
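The fraud item in the list above can be illustrated with a simple outlier check on void rates. The deck does not say which method is used, so this is an assumption: a modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5. The cashier IDs, counts, and function name are hypothetical.

```python
from statistics import median


def flag_unusual_void_rates(activity, threshold=3.5):
    """Return cashier IDs whose void rate is unusually high compared with peers.

    activity maps cashier ID -> (voids, transactions). The score is a modified
    z-score based on the median absolute deviation (MAD).
    """
    rates = {cid: voids / total for cid, (voids, total) in activity.items() if total > 0}
    if not rates:
        return []
    center = median(rates.values())
    mad = median(abs(rate - center) for rate in rates.values())
    if mad == 0:
        return []  # too little spread to score anyone reliably
    return sorted(
        cid for cid, rate in rates.items()
        if 0.6745 * (rate - center) / mad > threshold
    )


# Hypothetical (voids, transactions) per cashier for one week.
activity = {
    "cashier_1": (4, 400),
    "cashier_2": (6, 500),
    "cashier_3": (5, 450),
    "cashier_4": (3, 380),
    "cashier_5": (40, 420),
    "cashier_6": (5, 410),
}
flagged = flag_unusual_void_rates(activity)
# flagged == ["cashier_5"]
```

A flag here is only a prompt for review (the "yes/no" question), not proof of fraud; a median-based score is used so that one extreme cashier does not distort the baseline.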

What We Do with Streaming Data

Focus: transform at least one massive data source into many insights that were not possible before, at a fraction of the cost of legacy tools
Supermarkets: point-of-sale data, product catalog, sensors, etc.
eCommerce: web behavior, point-of-sale data, product catalog, etc.

Supermarket / C-Store

Before SwiftIQ
Unable to store POS order and cashier history
After SwiftIQ
Detailed transaction history available on-demand
Able to pursue real-time supply chain initiatives
Able to analyze product affinity to plan merchandising strategies and promotions and to optimize localization
Capable of visualizing data or generating interactive reports
Able to better predict inventory requirements
Better optimize hiring
Identify cashier fraud

eCommerce

Before SwiftIQ
Unable to unify disparate data (POS, web, mobile, CRM)
Unlikely to store web behavior
After SwiftIQ
Enable relevant, personalized digital experiences
Know specific customer segments vs. using intuition
Analyze product affinity to plan merchandising strategies and promotions and to optimize localization
Capable of visualizing data or generating interactive reports
Able to better predict inventory requirements

