

Copyright 2013 Splunk Inc.

Big Data at the Speed of Business

Isaac Mosquera
Director of Mobile, ShareThis

Clint Sharp

Principal Big Data Product Manager, Splunk

What We'll Talk About

Our quest for visibility
Analyzing at scale
Splunk and Big Data
Where do you start?
Q&A

About Splunk
Company (NASDAQ: SPLK)

Founded 2004, first software release in 2006
HQ: San Francisco
Industry-leading machine data platform
On-premise, in the cloud and SaaS
63 of the Fortune 100
Largest license: 100 Terabytes per day

Business Model / Products

5,600+ Customers

#1 Big Data Innovator*

* Fast Company's Most Innovative Companies Issue (March 2013)

About ShareThis and Socialize


ShareThis makes the world more connected, trusted and valuable through sharing
Powers the social web, touching the lives of 95 percent of U.S.
Acquires Socialize, which makes mobile and social more engaging
Socialize integrated into thousands of iOS and Android apps
Installed on 80M+ devices





Evaluating 20 Billion Ad Impressions Monthly

Little Bit About Real-Time Bidding

(Diagram: an ad request reaches the Socialize Bidder, which returns a bid response; the winning bidder's ad is served as an ad impression and may lead to an ad click.)

All this needs to happen in less than 100 milliseconds!
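One way to picture the 100-millisecond budget is a bidder that answers with "no bid" whenever its own decision logic overruns. The sketch below is a minimal illustration, not the Socialize bidder's actual code: `decide_bid` is a hypothetical stand-in for the bid algorithm, and the whole 100 ms is given to it for simplicity (a real bidder would reserve part of the budget for network time).

```python
import asyncio
from typing import Optional

BID_BUDGET_SECONDS = 0.100  # the 100 ms budget from the slide (illustrative)


async def decide_bid(ad_request: dict, delay: float) -> float:
    """Hypothetical bid logic; `delay` simulates how long the decision takes."""
    await asyncio.sleep(delay)
    return 1.25  # illustrative CPM price in dollars


async def respond(ad_request: dict, delay: float) -> Optional[float]:
    """Return a bid price, or None (no bid) if the decision misses the budget."""
    try:
        return await asyncio.wait_for(
            decide_bid(ad_request, delay), timeout=BID_BUDGET_SECONDS
        )
    except asyncio.TimeoutError:
        # Too slow: the exchange would ignore a late response anyway.
        return None
```

For example, `asyncio.run(respond({"app": "demo"}, delay=0.5))` returns `None`, while a fast decision returns the bid price.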

So What Are Some of the Problems?

Ingesting more than 10,000 queries per second
Identifying which bids take longer than 100 ms
Quickly finding any errors within the system
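The operational questions above, which bids exceed 100 ms and how often errors occur, come down to simple filters over per-request records. A minimal sketch, assuming a hypothetical `BidRecord` with `request_id`, `response_ms` and `error` fields (the deck does not show the real log format):

```python
from dataclasses import dataclass
from typing import List

SLOW_BID_MS = 100  # bids slower than this miss the auction


@dataclass
class BidRecord:
    """Hypothetical per-request log record; field names are assumptions."""
    request_id: str
    response_ms: float
    error: bool = False


def slow_bids(records: List[BidRecord]) -> List[str]:
    """IDs of bids whose response time exceeded the 100 ms budget."""
    return [r.request_id for r in records if r.response_ms > SLOW_BID_MS]


def error_rate(records: List[BidRecord]) -> float:
    """Fraction of records flagged as errors (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(r.error for r in records) / len(records)
```

At 10,000+ queries per second the hard part is running these filters continuously over the stream, which is the motivation for the rest of the talk.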

Decision Making (Bid Algorithms)

Campaign spending
Campaign efficiency
Dissect data by apps, users and devices
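Dissecting campaign data by app, user or device amounts to grouping the same events by different keys. A minimal sketch, assuming hypothetical event dicts with `app`, `user`, `device`, `cost` (dollars) and `clicked` keys, and using cost per click as one possible efficiency measure (the deck does not say which measure ShareThis used):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def spend_by(events: Iterable[dict], dimension: str) -> Dict[str, Tuple[float, int]]:
    """Total spend and click count per value of `dimension` (e.g. "app", "device")."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0])
    for e in events:
        bucket = totals[e[dimension]]
        bucket[0] += e["cost"]
        bucket[1] += int(e["clicked"])
    return {key: (spend, int(clicks)) for key, (spend, clicks) in totals.items()}


def cost_per_click(spend: float, clicks: int) -> float:
    """One possible efficiency measure; infinite when nothing was clicked."""
    return spend / clicks if clicks else float("inf")
```

Because the grouping key is a parameter, the same events can be sliced by app, user or device without restructuring the data.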

Analyzing Big Data Efficiently

(Four-step diagram; the only recoverable step label is Analysis / Aggregation.)


Some Options
RDBMS: SQL functions like count() present problems at scale.
RDBMS: write operations are too high for a single database, which is also a single point of failure.
NoSQL: would work well for high insert and query rates; however, we would need to build alerting, charting and reporting dashboards ourselves.
Hive: easy to set up and query; however, we would have to set up a new environment and learn a new technology.


Splunk Fits the Bill

Operational Reporting: Easily identify problems and prevent erroneous spending. When an alert goes off, we hit a script which shuts off the bidder.
Ad Hoc Queries: Allows us to find patterns in the data to improve our bid algorithms.
Application Reporting: Instantly know campaign metrics for us and our clients.
Scalability: Adding new RTB service providers means billions of new ad requests. Scaling horizontally is key.
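The alert-driven shutoff can be reduced to one decision: if spend over the last window exceeds a cap, flip a kill switch. The deck does not show the script Splunk calls, so this is a minimal sketch under stated assumptions: a per-minute spend figure, a per-minute cap, and a hypothetical `BidderSwitch` object that the bidder checks before each bid.

```python
from typing import Callable


class BidderSwitch:
    """Hypothetical kill switch the bidder would check before every bid."""

    def __init__(self) -> None:
        self.enabled = True

    def shut_off(self) -> None:
        self.enabled = False


def check_spend_and_stop(spent_last_minute: float, max_per_minute: float,
                         stop_bidder: Callable[[], None]) -> bool:
    """Call `stop_bidder` if spend exceeds the cap; return True if it was called."""
    if spent_last_minute > max_per_minute:
        stop_bidder()
        return True
    return False
```

Passing the shutoff as a callback keeps the threshold logic separate from however the real bidder is actually stopped.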

index=ad_events displayed_ad | bin _time span=1m | stats count(meta.displayed_ad) as displays sum(eval(price/1000)) as dollars_spent avg(price) as avg_cpm_price by campaign_id _time | mysqloutput spec=ads-prod table=ads_analytics insert="campaign_id, stat_date, displays, dollars_spent, avg_cpm_price"
(Diagram components: three indexers, a search head, and an RDBMS for generated reports.)
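To make the SPL search on this slide concrete, here is a rough Python equivalent of its `bin` and `stats` stages. It assumes each input event is already a displayed ad with hypothetical `_time` (epoch seconds), `campaign_id` and `price` (CPM, i.e. cost per 1,000 impressions) keys, and it omits the `mysqloutput` step that writes rows to the reporting database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def campaign_minute_stats(events: Iterable[dict]) -> Dict[Tuple[str, int], dict]:
    """Per-campaign, per-minute stats loosely mirroring the SPL search."""
    prices_by_key: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for e in events:
        minute = int(e["_time"]) // 60 * 60  # like `bin _time span=1m`
        prices_by_key[(e["campaign_id"], minute)].append(e["price"])

    stats = {}
    for key, prices in prices_by_key.items():
        stats[key] = {
            "displays": len(prices),                         # one event per displayed ad
            "dollars_spent": sum(p / 1000 for p in prices),  # CPM -> cost of each impression
            "avg_cpm_price": sum(prices) / len(prices),
        }
    return stats
```

Dividing the CPM price by 1,000 gives the cost of a single impression, which is why summing `price/1000` yields dollars spent.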

Using Splunk to Analyze Operational Data

Interactive analysis with the Search Processing Language:
source="nginx-prod.log" | stats avg(ResponseTime) as avg_rtime, p95(ResponseTime) as p95_rtime, stdev(ResponseTime) as stdev_rtime
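For readers less familiar with SPL, the same three statistics can be computed in Python. This is a minimal sketch over a plain list of response times. It uses the nearest-rank method for the 95th percentile and the sample standard deviation; Splunk's own percentile algorithm may give slightly different values on large data sets.

```python
import math
import statistics
from typing import Dict, List


def percentile_nearest_rank(values: List[float], pct: float) -> float:
    """Smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def response_time_summary(response_times: List[float]) -> Dict[str, float]:
    """avg, p95 and stdev of response times (needs at least two values)."""
    return {
        "avg_rtime": statistics.mean(response_times),
        "p95_rtime": percentile_nearest_rank(response_times, 95),
        "stdev_rtime": statistics.stdev(response_times),  # sample standard deviation
    }
```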

Easily digest information through charts

Final Architecture
(Diagram components: Socialize Bidder; three indexers; a cache cluster of three Memcache nodes; S3 snapshots; search head; RDBMS for generated reports.)

So, What is Splunk?


Expanding Universe of Data Sources

2012-12-05 07:04:44 Id=00Q000000Rd910EAJ City=New York Country=US CreatedDate=2012-12-05 07:06:44 Email_Opt_In_c Customer_Street _Address_c=123 Main St. purchased_product_id= product_i BD-01 twitter_username john_t_doe

Data sources range from highly structured business application data, through machine-generated data, to arbitrarily structured human-generated data.

Industry Leading Platform for Machine Data

(Diagram: any machine data flows into a platform offering ad hoc search, monitoring and alerting, reporting and analysis, custom dashboards and a developer platform, built on HA indexes and storage running on commodity servers, to deliver operational intelligence.)

Analyzing Heterogeneous Data

Universal Index: No data normalization. Automatically handles timestamps. Parsers not required. Index every term & pattern blindly. No attempt to understand up front.

Schema-on-the-fly: Structure applied at search time. No brittle schema to work around. Automatically find transactions, patterns and trends.

Flexibility and Fast Time to Value: Normalization as it's needed. Faster implementation. Easy search language. Multiple views into the same data.
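Schema-on-the-fly means raw events are stored exactly as they arrive, and fields are pulled out only when a search runs. This is a minimal sketch of the idea, not how Splunk implements it internally: it assumes events contain `key=value` pairs, and the two sample lines are hypothetical (the first is loosely modeled on the sample event in the data-sources slide).

```python
import re
from typing import Dict, List

# Raw events are kept verbatim; nothing is parsed at write time.
RAW_EVENTS: List[str] = [
    "2012-12-05 07:04:44 City=New York Country=US twitter_username=john_t_doe",
    "2012-12-05 07:05:10 status=500 uri=/api/share latency_ms=87",
]

# key=value pairs; a value runs until the next " key=" or the end of the line.
KV_PATTERN = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def extract_fields(raw: str) -> Dict[str, str]:
    """Apply structure at search time by pulling key=value pairs out of a raw line."""
    return dict(KV_PATTERN.findall(raw))


def search(events: List[str], **criteria: str) -> List[Dict[str, str]]:
    """Return parsed events whose extracted fields match every criterion."""
    results = []
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(k) == v for k, v in criteria.items()):
            results.append(fields)
    return results
```

Because parsing happens per search, a new field can be queried immediately without re-ingesting or migrating anything. For example, `search(RAW_EVENTS, status="500")` finds the second event.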

Gain Critical Insights in Real-time

(Diagram: events from Order Processing, a Middleware Error, the Care IVR and a Customer's Tweet, linked through shared fields such as Customer ID, Order ID, Product ID, Twitter ID, Time Waiting On Hold and the Company's Name.)
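Correlating events across systems works by following shared identifiers: an error that carries only an Order ID can be tied to a customer through the order-processing event with the same Order ID. This is a minimal sketch with hypothetical event dicts (`time`, `source`, `customer_id`, `order_id`). In Splunk itself this kind of grouping would typically be done at search time, for example with the `transaction` or `stats` commands.

```python
from collections import defaultdict
from typing import Dict, List


def correlate_by_customer(events: List[dict]) -> Dict[str, List[str]]:
    """Group events from several systems into one time-ordered list per customer."""
    # Learn which customer each order belongs to from events that carry both IDs.
    order_to_customer = {
        e["order_id"]: e["customer_id"]
        for e in events
        if "order_id" in e and "customer_id" in e
    }
    timelines: Dict[str, List[str]] = defaultdict(list)
    for e in sorted(events, key=lambda ev: ev["time"]):
        # Use the event's own customer ID, or fall back to its order's customer.
        customer = e.get("customer_id") or order_to_customer.get(e.get("order_id"))
        if customer is not None:
            timelines[customer].append(e["source"])
    return dict(timelines)
```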

Deep Visibility and Insight for IT and Business

IT Operations Management, Application Management, Security and Compliance, Web Intelligence, Business Analytics, Industrial Data / Internet of Things

Over 5,600 organizations using Splunk across IT and business users

Driving Insights from Big Data

The ShareThis Insights Platform

On Father's Day: What were the most shared-about topics? What types of beer do people drink?

(Diagram components: Hadoop, API, ETL, pre-aggregation, analytics.)

Finding the Optimal Approach

What should be the core focus or competency of your team?

Hadoop and MapReduce are great for complex data science on data at rest, but the previous architecture took 9 months to build with a team of engineers, data architects, etc.
The Splunk platform delivers real-time, interactive analysis: we can build many of the same insights within 1 hour.
Conclusion: find the optimal approach for the business.


What About Ad Hoc Analysis?

PR Insights Example

What was the situation? A fast-moving business that needed real-time insights.
What was the PR team struggling with? It was difficult to find useful data to build interesting use cases.
What did they want? A flexible, real-time reporting environment to extract insights useful for the market.
How did my team help? We delivered a single dashboard containing real-time data on sharing behaviors across our network.

PR Insights Dashboard

Let's not forget the low-hanging fruit

Operational Analytics for an Online World

Driving Superior Customer Experience

How many 500 errors have I had over time?

Look for anomalies and spikes!

Zone in directly to the customer!
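Counting 500 errors over time and spotting spikes can be sketched as bucketing by minute and flagging buckets far above the average. This is a minimal illustration, assuming input as hypothetical `(epoch_seconds, status)` pairs and a simple "mean plus k standard deviations" rule. The deck does not say which anomaly rule was used.

```python
import statistics
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def errors_per_minute(events: Iterable[Tuple[int, int]], start: int, end: int) -> Dict[int, int]:
    """Count HTTP 500 responses per minute in [start, end), including empty minutes."""
    counts = Counter(ts // 60 * 60 for ts, status in events
                     if status == 500 and start <= ts < end)
    return {minute: counts.get(minute, 0) for minute in range(start // 60 * 60, end, 60)}


def spikes(counts: Dict[int, int], k: float = 2.0) -> List[int]:
    """Minutes whose count exceeds mean + k * population standard deviation."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    threshold = statistics.mean(values) + k * statistics.pstdev(values)
    return sorted(minute for minute, c in counts.items() if c > threshold)
```

Filling empty minutes with zeros matters: leaving them out would inflate the baseline and hide real spikes.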

Online Device Notifications

Notification Systems
(Diagram components: API, Notification, Apple (APNS), Feedback Processor, Google (GCM).)

One More Thing



Announcing Hunk Beta

New product from Splunk delivers interactive data exploration, analysis and visualizations for Hadoop

Splunk Analytics for Hadoop

Derive Actionable Insights from Raw Data


Point Splunk at Hadoop Cluster

Explore, Analyze, Visualize, Dashboards, Share

Immediately start exploring, analyzing and visualizing raw data in Hadoop

Hadoop Storage

Learn More


