Splunk Fundamentals


chap1 - what is machine data

buttercup games - international company with tons of machine data from web servers,
sales servers, badge readers, security appliances, voicemail
-splunk takes in a bunch of data and adds structure to unstructured data
-not just app issues: security, user behavior, sales, hardware monitoring
-translates a huge amount of info
-machine data makes up more than 90% of the data accumulated by organizations

chap2 - what is splunk?

index data, search & investigate, add knowledge, monitor and alert, report &
analyze

-indexer: like a factory, takes raw materials (data), determines how to process it,
labels it with a sourcetype
events are stored in the index
-search: find values across multiple data sources, run statistics using the splunk
search language
-knowledge objects: classify, enrich, normalize data
-monitor: can identify issues before they impact users, create alerts for specific
conditions, automatically respond
-reports: dashboards, visualizations

indexers, search heads, forwarders

indexers: store incoming data in indexes as events, organized into directories by age,
so a search only needs to open the directories that match its time range
search head: uses the splunk search language, distributes search requests to the
indexers, then consolidates/enriches the results
-dashboards, reports, visualizations
forwarder: a splunk enterprise instance that consumes data and forwards it to the
indexers
-minimal resources, usually resides on the source machine (such as a web server)

deploying/scaling > single instance or full distributed


-single instance: input/parsing/indexing/searching on one machine
good for proof of concept, personal use, learning, small environments
-distributed: multiple search heads and indexers, which can be clustered for high
availability and to keep data searchable

search requests are processed by the indexers


search strings are sent from the search head
clustering is NOT part of a single instance deployment

chap3 - installing splunk single instance deployment

linux: get software from splunk.com > free splunk > free download > linux >
download deb/tgz/rpm
-can also use wget
-should not be done as the root user
-sudo tar xvzf splunk-<version>.tgz -C /opt (untars it into the /opt directory)
-cd /opt/splunk/bin
./splunk start | stop | restart | help
./splunk start --accept-license

windows: gui or cmd > double-click, accept license > customize|install
-change install location > local system|domain account
OSX: usually for dev/testing > free splunk > Mac OS (tgz/dmg disc image)
-cd /Applications/Splunk/bin; sudo ./splunk start

splunk cloud: hosted by splunk, removes infrastructure requirements, 5GB per day
for 15 days
-30 day free trial: view my instance, accept license

splunk apps and roles > admin changeme


-app: preconfigured environment built to solve a specific use case, defined by a user
w/ the admin role
-roles: determine what a user sees/interacts with
-admin, power, user
admin: can install apps, create knowledge objects
power: can create and share knowledge objects, do searches
user: only sees their own KOs and those shared with them

two default apps: Home app (to launch/manage splunk apps; admins can add apps) and
Search & Reporting (where searching is done); splunkbase contains hundreds more


launch and manage apps from home app: true


default username/pw for newly installed splunk: admin/changeme
Roles define what users can do
Home/Search & Reporting are what ship w/ splunk enterprise

chap 4 - getting data in, types of data input

admin: admin users typically get data in; regular users usually don't, but it's good
to know how
-add data options: upload (gets indexed once), monitor (monitor files/directories,
scripts, windows-specific data such as event logs), forward (receive data from
external forwarders installed on remote machines, forwarded to the indexers)
-forwarded data is used as the main source of data input

upload: not useful in production, but useful for dev + testing

-example: customer survey data from a focus group? upload > next
-sourcetype is used for categorizing data (sourcetype csv detected, can select
manually)
-save as sourcetype, or change name + add description, category, app context
(instrumentation/monitoring/search & reporting: which app it is applied to)
-system is a context that applies to all apps
-host field: the machine the data originates from
-indexes: directories where data is stored

example: a web data index, a main index, a security index

-breaking data into separate indexes returns only relevant events, and allows access
by user role (who sees what data)
-some data requires different retention intervals; retention policies are set per
index
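
a minimal sketch (the index name here is illustrative, not from the course):
restricting a search to one index narrows the results and respects per-index access:

index=security sourcetype=linux_secure failed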

monitor: watch files or ports on an indexer; similar to upload, but you choose a
source to monitor
-files/directories/http events/tcp+udp/scripts
-an apache log = files and directories
-continuously monitor or index once (see events as they happen, or see a snapshot)
-can whitelist/blacklist certain files in the directory
-choose hostname, system, app context
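
monitors can also be added from the CLI; a minimal sketch (assuming a *nix install in
/opt/splunk; the path and sourcetype are illustrative):

cd /opt/splunk/bin
./splunk add monitor /var/log/apache2/access.log -sourcetype access_combined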

forwarder: not in scope, but minimal resources installed on many host machines
quiz chap 4:
splunk uses sourcetypes to categorize data
uploaded files get indexed ONCE
in production, FORWARDER data is the source of data input

chap 5 - basic searching, create knowledge objects, reports, dashboards, etc


-7 unique items: splunk bar, app bar, search bar, time range picker, how to search
panel, what to search panel, search history
-splunk bar: switch between apps, edit account, settings, monitor search jobs, help
-app bar: navigate the app
-search bar: to run searches
-time range picker: pick specific events over a time period
-how to search panel: docs/tutorial
-what to search panel: summary of what is indexed (data summary,
host/source/sourcetype)

-source: file or directory path, port, or script
-host: hostname, ip, or fqdn

search history: view/re-run past searches

limiting by time is best practice; a search becomes a job, and contains Save As, the
search results, and the timeline
-save a search as a Knowledge Object
-save search to Knowledge Objects

patterns tab: are there patterns in the results?

statistics/visualization tabs: populated only if the search uses TRANSFORMING
commands; if not, use pivot/quick reports/search commands to get statistics and
visuals

pause stop share export print; jobs remain active for 10 minutes
-shared search jobs last 7 days, readable to all (who they are shared with)
-export: in raw, csv, xml, json

3 search modes: fast/smart/verbose


-fast: cuts down on field info; field discovery is disabled, only required fields
-verbose: as much field/event data as possible
-smart: toggles behavior based on the search

timeline: click+drag = select a time range; zooming in uses the original search job
-zooming out requires the job to be re-run

events are returned in reverse chronological order; time is normalized (the timestamp
shown uses the timezone of your user account)
-host/source/sourcetype are the default selected fields
-mouseover a field value to add it to the search, exclude it from the search, or
start a new search

wildcards: fail* (matches failed, failure, fails, etc.)


booleans: AND OR NOT - failed NOT password returns all events with failed but NOT
password
-order of operations: NOT > OR > AND; parentheses can be used
-failed NOT (success OR password)
-a backslash escape is used if you need to ACTUALLY search for quotes:
failed \"chrisv4\" (would find "chrisv4")

opening a search from job save does NOT re-execute it

chap 5 quiz:

search is sent to splunk, it becomes a 'search job'


NOT, then OR, then AND (AND is implied between terms, so the others take precedence
over it)
events are returned in REVERSE chronological order (newest first)
shared search jobs remain active for SEVEN days by default (SHARED search jobs)

chap 6: using fields > fields sidebar, fields extracted at search


-selected fields, interesting fields
-selected: host/source/sourcetype by default
-interesting: fields with values in at LEAST 20% of events
a denotes a string field, # denotes a numeric field
-clicking a field shows values/count/%; can add to selected fields; quick reports
vary by value
-clicking a value makes a TRANSFORMING search, returning statistical data
-selected fields persist for subsequent searches
-can use 'all fields', 'more searches'

more effective searches with fields

sourcetype=linux_secure
host!=mail*
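
these combine with search terms and wildcards; a sketch on the same sample data:

sourcetype=linux_secure host!=mail* "failed password"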

chap 6 quiz:

1301 events
if you add a search from search history, the default time range of 24 hours is used,
not the time range of the original search
-nor is the search executed

wildcards CAN be used with field searches


a dest 4 = a string field named dest with 4 values
field values are NOT case sensitive
field names ARE case sensitive

so basically: the results are not case sensitive, but the field names used to search
for those results are

chap 8: SPL fundamentals

search language, built of 5 components

-search terms (what we're looking for), commands (what to do with the results:
charts, stats, formatting), functions (how to chart/evaluate/compute), arguments
(variables applied to the functions), clauses (how results are grouped or renamed,
e.g. by, as)

stats list(product_name) as "Games Sold"


| pipe passes results into the next component
booleans show in orange, commands in blue, arguments in green, functions in purple
-parentheses highlight and match, useful to troubleshoot searches
-ctrl + \ = moves each pipe to a new line

search stores results in memory w/ a time limiter, and each step makes the
'spreadsheet' smaller

-each pipe in a search command narrows the results further
-once data is removed, it is no longer available to subsequent commands
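
a sketch of the shrinking-'spreadsheet' idea (tutorial sourcetype assumed): the first
search works, but the second returns nothing for clientip, because the fields command
already removed that column:

sourcetype=access_combined | stats count by clientip
sourcetype=access_combined | fields status | stats count by clientip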

Fields command: include/exclude specific fields

sourcetype=access_combined | fields status clientip (keeps only these fields)

sourcetype=access_combined | fields - status clientip (removes these from the results)
_raw and _time are always included by default
-using | fields - _raw removes that field
exclusion occurs AFTER field extraction, so it does not improve performance
(inclusion does)
Table command: specified fields are returned in tabulated format
| table JSESSIONID, product_name, price = creates a spreadsheet-like table in the
order you specified

Rename command: rename fields, JSESSIONID for example


- | rename JSESSIONID as "User Session" product_name as "Name of Product"
-the new field names are used further down the pipeline
for example, | fields - JSESSIONID would no longer work because the field is now
"User Session"
-names with spaces must be wrapped in quotes

Dedup - remove duplicate events


sourcetype=history* Address_Description="San Francisco" | dedup First_Name Last_Name
| table Username

Sort - display in ascending or descending order


| sort Vendor product_name (sorts by vendor, then product name, ascending by default)
sort + sale_price for ASCENDING
sort - sale_price Vendor for DESCENDING (shows largest value first)

a space after the +/- makes it affect ALL following fields; remove the space and it
only affects that one field:
sort -sale_price +Vendor

sort -sale_price limit=20 (only shows 20 results)

chap 8 quiz:

excluding fields does NOT benefit performance, because they must be extracted before
they are excluded

for table User IP, quotation marks are missing -> must be table "User IP"
| fields - status is the way to remove the status field, not NOT status...
status is a field here, not a search term

chap 9 - transforming commands - turn event data into data tables for statistics and
visualizations


-top: most common values in a field (sourcetype=vendor_sales | top Vendor) > shows
count + %; limit=0 shows all
--countfield/percentfield (change the column names)
-sourcetype=vendor_sales | top Vendor limit=5 showperc=false countfield="Number of
Sales"
useother=true (groups results outside the limit into an OTHER row; see the sketch
below)
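
combined into one search, a sketch on the same vendor_sales data:

sourcetype=vendor_sales | top Vendor limit=5 useother=true showperc=false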

rare command, shows least common values


-sourcetype=vendor_sales | rare Vendor limit=3

stats command: count, distinct count, sum, average, list

-count: # of events
-dc (distinct_count): unique values for a field
-sum: total of numerical values
-avg: average of numerical values
-list: all values of a given field

sourcetype=vendor_sales | stats count as "Total Sales By Vendors" by product_name

| stats count(action) AS "Action Events" (counts only events where the action field
is present)

sourcetype=vendor_sales | stats distinct_count(product_name) as "Number of games
for sale" by sale_price
sourcetype=vendor_sales | stats sum(price) as "Gross Sales" by product_name
| stats count as "Units Sold" sum(price) as "Gross Sales"

sourcetype=vendor_sales | stats avg(sale_price) as "Avg Selling Price"
(events with missing or invalid values are not included in the average)

sourcetype=asset_list | stats list(Asset) as "Company Assets" by Employee

--this groups all Assets owned per employee
--the stats values function works like list, but returns only unique values

sourcetype=cisco_wsa_squid | stats values(s_hostname) by cs_username

-see all unique sites a user has been to
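
to get the NUMBER of unique sites instead of the list itself, a sketch using dc on
the same fields:

sourcetype=cisco_wsa_squid | stats dc(s_hostname) as "Unique Sites" by cs_username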

chap 9 quiz:
sourcetype=vendor* | stats count AS "Units Sold" (this renames the count of vendor
events to "Units Sold")
most common values = top
avg = average
Addtotals = NOT a stats function
top/rare have TEN results by default

chap 10: reports & dashboards - can save/share searches w/ reports: save as: Report
w/ a title
-yes/no on including the time range picker
-a report shows a 'fresh' set of results; can change the range if yes
-reports tab of the application menu > open in search
-edit menu: description/permissions/schedule/clone/embed/delete
-power user: can allow read/write permissions
-run as: owner or user (user = only data the user has access to)
-accelerated: speeds up qualifying reports by pre-summarizing data
-save as: visualization, text, or both

visualizations: statistical values can be viewed as a chart


-ip fields > top values, pie charts
-charts can be based on numbers, time, location
-map visualizations are interactive: drill-down, can save as report/dashboard panel

dashboards: collection of reports


example: count of products sold by product name
-select a visualization that fits the data, customize w/ format/chart options
-save as dashboard panel > new > column chart as the visualization
vendor sales by product over 7 days, save as dashboard panel to the SALES dashboard
previously created
-keep adding visualizations to the same dashboard

edit: can click and drag panels using edit bars


-can change visualization/format/drilldown
-add panels in edit mode, new, clone

add a time range picker input, then tie each panel to the time range picker; it will
update all panels tied to it
-the dashboards menu is where you can access dashboards

chap 10 quiz:

admin/user/power can ALL create reports


dashboards: a collection of reports
time range picker can be included in a report: true
charts can be: numbers, time, location
if search returns statistical values, can be viewed as a chart

chap 11: pivots and datasets > pivot allows users to design reports w/out writing
searches
-data models: KOs that drive pivots, created by admins/power users
-basically an easy way to build/modify reports
-settings > data models > pivot
-count, tools to filter, visualizations, all time
-can create filters based on a field, using IS/ISNOT/CONTAINS
-can use the sidebar to visualize, can save to add to a report/dashboard
-no data model? instant pivot

instant pivot: all fields, selected fields, or fields with a selected coverage %

datasets: allow users to gain access to small slices of data

-field names are column headers; summarize shows stats about the fields
-explore: visualize w/ pivot
using instant pivot, no predefined data model is used; one is generated for you

-splunkbase > Splunk Datasets Add-on for rapid building of data models

chap 11 quiz:
pivots can be saved as reports/dashboards
data models are KOs that provide data structure for a pivot
data models are made up of datasets
instant pivot is displayed when using a NON-TRANSFORMING search (basically helps
you get there)

chap 12: lookups - adding fields/values not included in the index

-csv files/scripts/geospatial data
-product ids with names, for example; a lookup is categorized as a dataset
-define a LOOKUP TABLE, then define the lookup; it can also be configured to run
automatically

sourcetype=access_combined status=xxx
csv with code,description
100,Continue
200,OK
300,etc

create a lookup table > settings > lookups > add new > choose a dest app (only
avail to that app) > find file > give the file a name on the server
-can move it to another app, or delete it
-verify it is working using | inputlookup filename.csv

now define the lookup > settings > lookups > lookup definitions > add new > dest app
> name > the file uploaded to the server
-time-based: check if this lookup involves time; case sensitivity option
-batch index query improves performance for large lookup files

now that the lookup table is made and the lookup is defined:

- | lookup http_status code as status
OUTPUT can be used to choose the fields returned:
OUTPUT code as "HTTP Code", description as "HTTP Description"
if a field already exists, it will be overwritten unless you use OUTPUTNEW
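
putting it together, a sketch assuming the http_status lookup defined above:

sourcetype=access_combined | lookup http_status code as status OUTPUT description as
"HTTP Description" | table status, "HTTP Description"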

automatic lookup: settings > lookups > automatic lookups > dest app > choose name >
choose the lookup > apply to a sourcetype
-lookup input: code = status
-lookup output: code=Code, description=Description
-now searches automatically get those values, without having to call lookup/OUTPUT

additional lookups: populate lookup w/ search results, external script, geospatial

chap 12 quiz on lookups:


a lookup is characterized as a DATASET
first row in csv lookups represents FIELD names
inputlookup http_status.csv to check that a lookup was added
external data used by a lookup can come from: geospatial data, csv files, scripts
outputnew is used to create new fields rather than overwrite existing

chap 13: scheduled reports + alerts

-scheduled reports: e.g. run on a weekly schedule, sending results via email

create a search > save as report > time range picker > schedule > enable >
frequency
-the time range is relative to the schedule
-schedule priority: default > higher > highest
-window: the report can be delayed as long as it still falls within the window; use
only if you're okay w/ a delay
-actions: send email/run a script/write to a csv lookup

manage scheduled reports > settings > searches+reports+alerts


-shows description, search string, and time range
-disable, clone, embed, delete
-edit permissions: who sees what results

search and reporting options: owner/app/all apps, can also set read/write for the
report
-run as: owner or user, access of user
-a report must be SCHEDULED before embedding will work

splunk alerts: triggered once a saved search completes and criteria are met

-actions: list in interface, log events, send emails, run scripts, webhooks, custom
alert actions
-example: status=5* > save as alert > web500error > private (only you); see the
sketch after this list
-scheduled (how often the search runs)/realtime (search runs continuously, lots of
overhead)
-trigger condition: per result, # of results, # of hosts, # of sources, or custom
conditions
-e.g. trigger if # of results > 1 in 60 minutes
-once (once within the scheduled window) or for each result (every time the
condition is met)
-throttle: can suppress further triggering for a period
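
the underlying search for the web500error example above might look like this (the
sourcetype is an assumption based on the tutorial data):

sourcetype=access_combined status=5*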

log event: sent to an indexer; run a script (e.g. a shell script); send email: very
powerful; webhook (create a ticket via POST to an API)

activity: triggered alerts > view results/edit search


-also alerts menu or settings: searches reports alerts

chap13 quiz:

alerts can run scripts


alerts CAN be shared to all apps
realtime alerts run continuously in background
alert actions are triggered by SAVED SEARCHES
final quiz:

machine data makes up more than 90% of data accumulated by organizations


forwarders are primary way data is supplied for indexing
search requests are processed by the indexers
search strings are sent from the SEARCH HEAD
3 roles: power/user/admin
true: you can launch and manage apps from the home app
sourcetype: tells splunk where to break events, how to handle timestamps and field
value pairs
-sourcetypes are used to categorize the type of data being indexed
events are not always returned in chronological order
AND/NOT/OR = booleans
shared search jobs remain active for 7 days
field values: NOT case sensitive
wildcards CAN be used with field searches
@ is used to round down time
separate indexes: faster searches, multiple retention policies, limit access
exclusion does NOT increase performance
dedup removes results with duplicate field values
Addtotals is NOT a STATS function
a time range picker CAN be included in a report
power/admin can create data models (used for datasets in pivots, they are basically
KOs)
pivots CAN be saved as dashboard panels
a lookup is categorized as a dataset
outputnew is used to not overwrite existing fields in a lookup
alerts can be shared to all apps
admin/changeme is default username/password for a newly installed splunk instance
default apps: home app/search & reporting
forwarder is used as source of data in production envs
time stamp you see in events is based on time zone in your USER account
zooming in does not run a new search on the event timeline
transforming = create statistics and visualizations
field NAMES are CASE SENSITIVE
top or rare = TEN results by default
users CAN create reports (like archer!)
charts: numbers/time/location
| inputlookup is used to view the contents of a csv lookup file
