Splunk 4 Ninjas - ML: Hands On Intro To Splunk Machine Learning Toolkit

© 2019 SPLUNK INC.

Splunk 4 Ninjas - ML
Hands on Intro to Splunk Machine Learning Toolkit

14 October 2020

Forward-Looking Statements

During the course of this presentation, we may make forward-looking statements regarding
future events or plans of the company. We caution you that such statements reflect our
current expectations and estimates based on factors currently known to us and that actual
events or results may differ materially. The forward-looking statements made in this
presentation are being made as of the time and date of its live presentation. If reviewed after
its live presentation, it may not contain current or accurate information. We do not assume
any obligation to update any forward-looking statements made herein.

In addition, any information about our roadmap outlines our general product direction and is
subject to change at any time without notice. It is for informational purposes only, and shall
not be incorporated into any contract or other commitment. Splunk undertakes no obligation
either to develop the features or functionalities described or to include any such feature or
functionality in a future release.

Splunk, Splunk>, Turn Data Into Doing, The Engine for Machine Data, Splunk Cloud, Splunk
Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States
and other countries. All other brand names, product names, or trademarks belong to their
respective owners. © 2019 Splunk Inc. All rights reserved.

• Welcome / Introduction
• Intro Machine Learning @ Splunk
• Demo Machine Learning Toolkit with Q&A
• Intro to the Trackday Dataset
• Four Different Challenges (~ 30min each)
• Challenge 1
– Explore the track_day.csv Dataset
• Challenge 2
– Detect Numeric Outliers
• Challenge 3
– Supervised Learning: Predict Categorical Fields
• Challenge 4
– Unsupervised Learning: Clustering

• Wrap Up, Discussion and Feedback


Who are we?

Tanzil Kazi (Host)
Jenny Seow (Panelist)
Dean Moreton (Panelist)

What this session is not about and what it is about

• NO replacement for a PhD in machine learning, data science or AI

• NO replacement for Splunk’s Education class for Data Science
• NO comprehensive lecture about all possible concepts and algorithms in ML … but,

• YES a first introduction to Machine Learning @ Splunk

• YES getting to know Splunk’s Machine Learning Toolkit
• YES guided hands-on challenges to explore a few typical ML tasks
Start your own Splunk


Machine Learning Tour


Splunk customers want answers from their data

Anomaly detection
► Deviation from past behavior
► Deviation from peers (aka Multivariate AD or Cohesive AD)
► Unusual change in features
► ITSI Metric Anomaly Detection

Predictive Analytics
► Predict Service Health Score/Churn
► Predicting Events
► Trend Forecasting
► Detecting influencing entities
► Early warning of failure

Clustering
► Identify peer groups
► Event Correlation
► Reduce alert noise
► Behavioral Analytics
► ITSI Event Analytics

Joined late? Register on http://splunk4ninjas.com/5110/self_register/ to set up your own Splunk instance.

Environment takes 5-10 minutes to spin up. Credentials are admin/changeme, available for 24 hours.

Types of Machine Learning

Supervised Learning (labeled data)
► Regression
► Classification

Unsupervised Learning (unlabeled data)
► Clustering
► Anomaly Detection

Semi-Supervised Learning (with reinforcement or feedback)
► Human in the Loop
► Autonomous Systems


Overview of Machine Learning at Splunk


Platform for Operational Intelligence


Skill Areas for Machine Learning @ Splunk

Domain Expertise (IT, Security…)
• Identify use cases
• Drive decisions
• Understanding of business

Science Expertise
• Statistics/math background
• Algorithm selection
• Model building

• Reporting
• Alerting
• Workflow

Premium solutions (ITSI, UBA) provide out of the box ML capabilities.
Splunk ML Toolkit facilitates and simplifies via examples & guidance.


What Data Scientists Really Do

Data Preparation accounts for about 80% of the work of data scientists

“Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says”, Forbes Mar 23, 2016


Custom ML with the Splunk Platform

Ecosystem: Splunk’s App Ecosystem contains thousands of free add-ons for getting data in,
applying structure and visualizing your data, giving you faster time to value.
MLTK: The Machine Learning Toolkit delivers new SPL commands, custom visualizations,
assistants, and examples to explore a variety of ML concepts.
Splunk: Splunk Enterprise is the mission-critical platform for indexing, searching,
analyzing, alerting and visualizing machine data.

Operationalized Data Science Pipeline

1. Collect Data
2. Clean & Munge
3. Search & Explore
4. Pre-processing, Feature Selection
5. Choose Algorithm
6. Build, Test, Improve Models
7. Operationalize, Monitor, Alert
8. Visualize & Share

The pipeline runs on Splunk throughout, with the Ecosystem and the MLTK each covering part of the stages.

Platform for Operational Intelligence


Continuous Data Ingest at Scale

[Diagram: data sources feed Splunk in real time; users search, alert, visualize, predict and develop on top.]

Users: Engineers, Data Analysts, Security Analysts, Business Users
Capabilities: Search, Alert, Visualize, Predict, Develop
Data sources: Industrial Assets, Consumer and Mobile Devices, OT, IT, Asset Info, Maintenance Info, Data Stores
Inputs:
• Industrial Data: SCADA, AMI, Meter Reads
• Native Inputs: TCP, UDP, Logs, Scripts, Wire, Mobile
• Modular Inputs
• HTTP Event Collector: Token Authenticated Events
• Technology Partnerships: Kepware, AWS IoT, Cisco, Palo Alto


Sense and Respond

Every Search Can Use Machine Learning

[Diagram: data from Industrial Assets, Consumer and Mobile Devices, OT and IT flows into real-time
search and alerting. Alerts trigger responses in applications and devices, such as flashing lights,
filing a ticket, sending a text or starting a process flow.]


MLTK + Python for Scientific Computing

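| fit y from x* into "model"
| apply "model"

[Diagram: data from Industrial Assets and Consumer and Mobile Devices flows into real-time search
and alerting. The fit command trains a model through Python for Scientific Computing, the model is
persisted, and the apply command reuses the persisted model.]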





Splunk Machine Learning Toolkit (MLTK)

Extends Splunk platform functions and provides a guided modeling environment

Built for the Citizen Data Scientist

• Experiments and Assistants: Guided model building,
testing, and deployment for common objectives
• Algorithms: 80+ standard algorithms (supervised &

Extensible to operationalize any use case

• Python for Scientific Computing Library:
Access to 300+ open source algorithms
• Deep Learning Toolkit: Supports neural networks (NN) and
GPU-accelerated machine learning
• ML-SPL API: Import any open-source or proprietary


Example: MLTK powered DGA App for Splunk

Detect Malicious Domain Names using Machine Learning


Overview of ML including DL at Splunk

(not covered in this workshop)


Platform for Operational Intelligence


DLTK for Splunk

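| fit y from x* into "model"
| apply "model"

[Diagram: data from Industrial Assets and Consumer and Mobile Devices flows into real-time search
and alerting. The fit command trains a model, the model is persisted, and the apply command reuses
the persisted model.]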





Hyatt & Splunk machine learning

1. Improve online check-in experience by using ML to determine
potential issues before they occur.

2. Predictive analytics used for hotel room occupancy.

3. Forecasting the likely Wi-Fi logins two days into the future, based on each property, day of
week, and local and global holidays, to show our expectations to executives.

4. Anomaly detection for security purposes.


Machine Learning Toolkit

Before we get started…

> Follow along for the labs using the ‘handrail’ guide:
> Username: admin
> Password: changeme

> Quick reference guide, if you want to use the MLTK on your own later:
https://bit.ly/MLTK_guide

Hands-on Challenges

Fun Facts about the Track Day dataset

A popular private event among racing and sports car enthusiasts at Splunk in the early days.

Simple concept: go on a race track, have fun and collect some car data to get insights about
driving behavior etc.

A subset of this data is available in MLTK!

Image Source: https://www.youtube.com/watch?v=meBjI-ay9-U

Today’s Challenges

We are going to create four dashboards:

1 Explore the Dataset: Create a sample dataset and explore it using different types of
visualizations such as SPL

2 Detect Numeric Outliers: Explore the MLTK showcase and adapt it to start a new experiment
with your own dataset

3 Use a Classification Model: Create a classification model and use it to predict vehicle
types from your sensor data

4 Use a Clustering Model: Create a clustering model and use it to analyze your dataset

We’re aiming for a dashboard like this!

Workshop Goals

• Getting to know Splunk in the context of Machine Learning

• Prepare and analyze a dataset and summarize results on 4 dashboards


Challenge 1: Explore the dataset

Create a Sample Dataset

1. Access the Splunk Machine Learning Toolkit
2. Change to the Search Tab
3. Insert your search query

? What’s the benefit of renaming variables?
Use Fieldsummary to Explore your Dataset

Eliminate unwanted fields with | fields - values

What’s going on
with the engine
Explore your Dataset with Visualizations

Using Splunk MLTK’s Histogram Macro


Check Macro in Settings
Check Macro with Cmd + Shift + E (Mac) or Ctrl + Shift + E (Windows)
3 Adjusting the Histogram Macro

? How can we get from the top to the bottom?
3 Adjust the Macro to Split by Vehicle Type

| stats count by x_batteryVoltage

x_batteryVoltage    count
12.78-12.79         1
13.16-13.17         3
13.46-13.47         1

| chart count over x_batteryVoltage by y_vehicleType

x_batteryVoltage    Ferrari    Audi    BMW    Chevrolet    Ford
13                  0          0       0      0            1
14                  0          0       0      1            1
15                  1          0       1      0            0
16                  1          1       1      0            0
17                  1          0       0      0            1
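
The difference between the two result tables can be sketched outside Splunk. The following minimal Python sketch uses made-up sample rows (not the real track_day data) and hypothetical helper names. It first counts events per voltage bin, then splits those counts by vehicle type, which is roughly what moving from stats count by to chart count over ... by does.

```python
from collections import Counter, defaultdict

# Hypothetical sample events: (battery voltage bin, vehicle type).
events = [
    (13, "Ford"), (14, "Chevrolet"), (14, "Ford"),
    (15, "Ferrari"), (15, "BMW"), (16, "Ferrari"),
]

def stats_count_by(rows):
    """One count per voltage bin, similar to `stats count by x_batteryVoltage`."""
    return Counter(voltage for voltage, _ in rows)

def chart_count_over_by(rows):
    """One row per voltage bin and one column per vehicle type."""
    table = defaultdict(Counter)
    for voltage, vehicle in rows:
        table[voltage][vehicle] += 1
    return table

totals = stats_count_by(events)
split = chart_count_over_by(events)
print(totals[14])         # all events in bin 14
print(split[14]["Ford"])  # events in bin 14 that belong to Ford
```

The split table carries the same totals as the first one; it only adds the vehicle type as a second dimension.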
Working with the Boxplot Macro

? How can this query be improved?

> Scale numeric values using
the fit command with the
Explore the Dataset with Box Plots

> Standardized data fields have a mean of 0 and a standard deviation of 1
> The box plots are less stretched and can be analyzed more easily
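
To illustrate why standardized fields are easier to compare, here is a minimal Python sketch of the arithmetic. The sample readings and the function name are assumptions for illustration, and the sketch is not the MLTK implementation. It uses the population standard deviation, which is also what scikit-learn's StandardScaler uses.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1.

    Assumes the values are not all identical (the deviation must be non-zero).
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]

# Hypothetical readings on very different scales.
engine_speed = [900.0, 2500.0, 4100.0, 5700.0]
battery_voltage = [12.8, 13.2, 13.6, 14.0]

scaled_speed = standardize(engine_speed)
scaled_voltage = standardize(battery_voltage)
print(scaled_speed)
print(scaled_voltage)
```

After scaling, both fields occupy the same range, so their box plots can share one axis without the larger field stretching it.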

15 minute break

Challenge 2: Detect Numeric Outliers

Detect Numeric Outliers: Explore the MLTK showcase and adapt it to start a new experiment
with your own dataset

> Explore the Outlier Detection Showcases
> Start your own Outlier Detection Experiment
> Optionally try to compare different outlier detection
Explore the Outlier Detection Showcases

> Switch to the Showcase tab of the MLTK and explore the assistant to detect outliers in
server response time
> We are now going to use statistics to detect the outliers
Explore the Outlier Detection Showcases

Pick an appropriate threshold method (e.g. Standard deviation +/- 3)
View the corresponding SPL query to the assistant’s settings
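
The standard deviation threshold can be sketched in a few lines of Python. This is an illustration of the idea under assumed sample data, not the SPL the assistant generates. The function name and the response time values are hypothetical.

```python
from statistics import mean, stdev

def find_outliers(values, multiplier=3.0):
    """Return values outside mean +/- multiplier * standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    lower = mu - multiplier * sigma
    upper = mu + multiplier * sigma
    return [v for v in values if v < lower or v > upper]

# Hypothetical server response times in milliseconds, with one spike.
response_times = [100.0] * 10 + [102.0] * 10 + [500.0]

print(find_outliers(response_times))
```

Note that the spike itself inflates the standard deviation, so on very small samples a fixed multiplier of 3 can fail to flag anything.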
Detecting Outliers with the Density Function

> Switch to the Experiments tab of the MLTK and create a new experiment
> Instead of an approach based on statistics
we are now going to use the density function to detect outliers
Create Your Own Smart Outlier Experiment

Click here to get to the
next step

Look up the dataset you want to work with

Create Your Own Smart Outlier Experiment

Use these settings to get the result on the


Challenge 3: Use a classification model


SPL for MLTK: The fit and apply Commands

<your search> | fit <algorithm> <fields> into <model name>
<your search> | apply <model name>

> The fit command produces a machine learning model based on the behaviour of a set of
events. It applies the model to the current search results in the search pipeline
> The apply command applies the machine learning model that was learned using the fit
command
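
The split between learning a model and reusing it can be illustrated with a small Python sketch. It assumes a simple nearest-centroid classifier as a stand-in algorithm. The function names, sample rows and labels are hypothetical and do not reflect how the MLTK stores or applies its models.

```python
from statistics import mean

def fit_model(rows, labels):
    """Learn one centroid (per-feature mean) per label and return it as the model."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: [mean(column) for column in zip(*members)]
        for label, members in grouped.items()
    }

def apply_model(model, rows):
    """Predict, for each row, the label whose centroid is closest."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [
        min(model, key=lambda label: squared_distance(row, model[label]))
        for row in rows
    ]

# Hypothetical training data: two numeric features per event.
train_rows = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
train_labels = ["Audi", "Audi", "Ford", "Ford"]

model = fit_model(train_rows, train_labels)               # learn once
predictions = apply_model(model, [[0.9, 1.1], [5.1, 4.9]])  # reuse on new data
print(predictions)
```

The model here is just a dictionary, which is what makes it easy to persist after fitting and to reuse later without retraining.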

SPL for MLTK: Adapting fit and apply

<your search> | fit StandardScaler <fields> into <model name>
<your search> | apply <model name> | `<macro name>`
<your search> | fit SVM "X X X" from "XXX" "XXX" kfold_cv=3

Check out the confusion matrix and classification statistics macros!

> The StandardScaler algorithm uses the scikit-learn StandardScaler algorithm to
standardize data fields
> Splunk’s MLTK allows you to cross-validate your models right from the search queries that
train them. Simply specify the number of cross-validation folds you want by setting the fit
command’s parameter kfold_cv
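
What a fold is can be shown with a short Python sketch. It only illustrates the general idea of k-fold cross-validation, where each row is held out for testing exactly once. How the MLTK shuffles or stratifies rows internally is not covered here, and the function name is hypothetical.

```python
def kfold_indices(n_rows, k):
    """Yield (train, test) index lists for k folds over n_rows rows."""
    indices = list(range(n_rows))
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(kfold_indices(10, 3))
for train, test in folds:
    print(len(train), len(test))
```

With three folds, the model is trained three times, each time on roughly two thirds of the rows and scored on the remaining third.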
Use a Classification Model: Create a classification model and use it to predict vehicle
types from your sensor data

> Explore the Classification Assistant
> Put your Algorithm into Practice
> Optionally find a way to deal with model overfitting
Explore the Classification Assistant

Option 1 – Create New Experiment

Option 2 – Use the showcase

‘Showcase → Predict Fields → Predict Categorical Fields → Predict Vehicle Make and Model’
Explore the Classification Assistant

? Why is SVM doing so badly?


3 Normalising data using pre-processing

Now run the same query again using SVM, use the SS_* fields for predicting and you should see much better results!
Alternatively, you can use the ‘RandomForestClassifier’ algorithm.
Save your Classification Model

Publish your model in the app of your choice
3 Apply your Classification Model

3 Which Car Gets Classified Worst?

? How can you find out where your model is off?
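
One common way to approach this question is a confusion matrix, which counts how often each actual class was predicted as each class. The Python sketch below shows the idea on made-up labels. The function names and data are hypothetical, and it is not the MLTK confusion matrix macro.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) pair."""
    return Counter(zip(actual, predicted))

def recall_by_class(matrix):
    """Share of each actual class that was predicted correctly."""
    totals, correct = Counter(), Counter()
    for (actual_label, predicted_label), count in matrix.items():
        totals[actual_label] += count
        if actual_label == predicted_label:
            correct[actual_label] += count
    return {label: correct[label] / totals[label] for label in totals}

# Hypothetical results for seven events.
actual = ["Audi", "Audi", "BMW", "BMW", "Ford", "Ford", "Ford"]
predicted = ["Audi", "BMW", "BMW", "BMW", "Ford", "Audi", "BMW"]

matrix = confusion_matrix(actual, predicted)
recall = recall_by_class(matrix)
worst = min(recall, key=recall.get)
print(worst, recall[worst])
```

The off-diagonal cells show which classes get mistaken for each other, and the per-class recall points to the class the model handles worst.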

Challenge 4: Use a clustering model

Use a Clustering Model: Create a clustering model and use it to analyze your dataset

> Explore the Clustering Assistant
> Cluster Analysis of the mytrackdata-Dataset
> Optionally try and detect outliers
Explore the Cluster Showcases

> Switch to the Showcase tab of the MLTK and explore the assistant to identify clusters of

The MLTK Comes with Many Different Algorithms

<your search> | fit PCA k=<int> <fields>

> Dimensionality reduction with an algorithm such as PCA can reduce the number of variables
one must deal with
> The k parameter specifies the number of principal components to be extracted from the data

? Why is there a cluster with "clusterId: null" ?


The MLTK Comes with Many Different Algorithms


> We have missing values in

for that we didn't fix/impute in

Let’s impute some values


| inputlookup mytrackdata.csv
| fit Imputer x_engineCoolantTemperature strategy="median"
| rename Imputed_* as *
| apply car_clustering_StandardScaler_0
| apply car_clustering
| table c* y_* SS_* *
| fit PCA k=3 SS_*
| rename y_vehicleType as clusterId, PC_1 as x, PC_2 as y, PC_3 as z
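
What the median strategy does to a field can be sketched in Python. This is a minimal illustration with made-up coolant temperature readings, where None stands for a missing value. The function name is hypothetical and the sketch is not the MLTK Imputer itself.

```python
from statistics import median

def impute_median(values):
    """Replace missing values (None) with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]

# Hypothetical coolant temperature readings with two gaps.
coolant_temperature = [88.0, None, 92.0, 90.0, None, 95.0]

imputed = impute_median(coolant_temperature)
print(imputed)
```

Once every row has a value for the field, each row can be assigned to a cluster, which is why the imputation step comes before applying the clustering model.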

Wrap Up

• Don’t boil the ocean: start small or modify existing showcase examples for
some quick wins.

• Docs are your friend: in case you need help, the documentation is pretty
comprehensive. Also conf.splunk.com has 100+ sessions on ML.

You want to learn more about Splunk’s Machine Learning?

► Check our latest Splunk Blogs around Machine Learning
► Watch videos from Splunk Machine Learning YouTube Channel
► Take the Splunk Education Class for Data Science and Advanced Analytics
► Learn more about Splunk’s Machine Learning Advisory Program

Thank You

Additional Information


► Username: admin
► PW: changeme

Challenge Solution Examples:

We created a dashboard for each challenge with example solutions in the hidden
app “Splunk 4 Ninjas Machine Learning”. Use this app for preparation, debriefing
after the challenges or as assistance for inexperienced attendees.
► http://{your-host}:8000/en-GB/app/s4n_ml/splunk_4_ninjas_ml
► or click button next to “Splunk 4 Ninjas Machine Learning” on top of Home Dashboard
