Splunk Course Notes


Splunk Account : KANIPILLAY/Temenos1@

Splunk for windows login: admin / Temenos1@

SPLUNK – Notes Document

Start - 30/11/2021

Answers.splunk.com – get help there

Intro:

Upload a sample file, e.g. test.csv – it appears the file has to be ‘structured’?

1. Click Add data button

2. Click ‘Upload file from computer’ – assuming we can upload from

various other sources

3. Choose the file

4. Next

5. Rename ‘Host field value’ field to something useful

6. Click Review & then Submit, then ‘Start Searching’

System then comes up with Standard Splunk fields + Interesting fields

and splits by field name, number of occurrences etc


You can build a custom table view by using the command like below
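A hedged sketch of such a search (assuming the upload was indexed with source="test.csv"; field names are illustrative):

source="test.csv" | table _time, host, status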

- You can now also build a visualisation – Click on VISUALISATION

Tab

Use the stats command as per below to define a count on specific fields from your csv file as needed
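For example, a sketch counting by one field from the csv (field name illustrative):

source="test.csv" | stats count by status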

Then click on the Search button

You can then choose a suitable visualisation by clicking on the Splunk

Visualisations menu option


You can then create a DASHBOARD where you save your visualisation or

multiple visuals

Click ‘Save as’, choose new/existing dashboard

Under visualisations, you can also opt for ‘SINGLE VALUE’ which is a

number display. Per below, we’re counting only when

Status=error

We use the stats count command on the status field and then count the table output value
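A sketch of that single-value search, using the status field from the notes above:

source="test.csv" status=error | stats count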

So you can edit the theme on the dashboard etc, save / refresh and so on

Section (2) – Basics of Splunk

SPLUNK COMPONENTS

The software does 3 things mainly:

- Ingests data

- Indexes, parses and stores data

- Runs searches on indexed data


Processing components:

- Forwarder

Forwards data from one splunk component to another

e.g. from a source system to an indexer or index cluster

or from a source system directly to a search head

2 types of forwarders:

UNIVERSAL – most common – simply directs data from a source

to target

HEAVY – can also parse data before forwarding

- Indexer

Indexes data and stores data in indexes

In a distributed environment they can:

o Reside on dedicated machines

o Can be clustered or independent – clustered indexers are also known as peer nodes – can attach load balancing to these

- Search head – users typically interact with this component

Manages search requests from users

o Distributes searches across indexers

o Consolidates results from the indexers

Management Components:

- Monitoring console
- Deployment server

Is a centralised configuration manager for any number of Splunk instances – the managed Splunk instances are called DEPLOYMENT CLIENTS

The deployment server pushes apps and config packages to

DEPLOYMENT CLIENTS – These are called DEPLOYMENT APPS

We configure the deployment server through the FORWARDER

MANAGEMENT INTERFACE

- Licence Master

Centralised licence manager

Clients are called licence slaves

Manages licence pools and stacks

ALL other SPLUNK components look to the licence Master to

validate licencing

- Indexer Cluster Master

Manages indexer clusters 

o Co-ordinates activity within the cluster

o Manages data replication

o Manages buckets (storage) for the cluster

o Handles updates for the indexer cluster

- Search head Cluster Deployer

Manages baselines and apps for search head cluster members

o Splunk scales via this component

o Note – the Cluster deployer is NOT a member of the cluster

Every SPLUNK component is built using SPLUNK Enterprise – it’s only a matter of configuration?
LICENCE MANAGEMENT IN SPLUNK


Licence Types:

- Measured by amount of data ingested per day, NOT the amount of

data stored

- Data volume is measured from midnight to midnight

STANDARD licence:

Can buy a licence, e.g. 5GB of data per day

ENTERPRISE TRIAL

Installs with the product – limited to 500MB per day. Valid for 60 days – converts to the free licence

SALES TRIAL

Set up a POC – so Splunk can size a temporary licence and you can see how SPLUNK works for you

DEV TEST

FREE LICENCE

When the Enterprise trial licence expires – limited features

INDUSTRIAL IOT

FORWARDER

Used when setting up heavy forwarders


LICENCE VIOLATIONS Section

Will send warnings

From version 6.5 – the Enterprise version does not disable search if you exceed your daily data limit

DISTRIBUTED LICENCING:

Most components need access to an Enterprise licence except Forwarder,

which needs a Forwarder licence

Ideally:

- Setup a licence master – then have all of your splunk instances talk

to the master for licencing

A collection of licences whose individual volume licencing amounts

aggregate to serve a single unified amount of indexing volume is called a

STACK
Licence POOLS are created from the licence STACKS. Stacks are made

up of one or more licence files

POOLS are sized for specific purposes

The licence master manages licence pools

Indexers and other splunk Enterprise instances are assigned to a licence

pool

Universal FORWARDER does not require a licence

The HEAVY forwarder requires a forwarder licence

LICENCE GROUPS – sets of licence stacks. A stack can only be a member

of (1) licence group, and only (1) group can be active at a point in time

Types of licence groups:

Enterprise / Sales Trial – allows stacking

Enterprise Trial / Free & Forwarders – does not allow stacking


To amend/update licencing go to Settings/Licence

Splunk Configuration Files

Splunk runs on CONFIG files – WITHIN a CONF file, behaviours and

functions are defined

You can have multiple conf files (even with the same name), which

Splunk evaluates based on precedence

Common conf files:

- Inputs.conf

Can set host values for source, which indexers to use to store events

- Props.conf

Manages the indexing property behaviours

- Transforms.conf

Settings and values that govern data transformation

See admin manual for a full list of conf files

Above we have the Splunk config directory structure

Splunk home directory differs depending on your O/S

/bin – binaries and commands which you can execute from the CLI

/var/lib/splunk contains index files and other important splunky stuff

/etc/system/default – contains the default config files for system /

apps/users

/local – custom config files live here

Regarding conf files, system makes decisions based on the below

precedence
What’s inside a conf file?

Stanza – header in [ ] brackets

Settings are then defined by attribute = value pair
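A minimal illustration of the stanza / attribute = value layout (a hypothetical monitor input in inputs.conf):

[monitor:///var/log/messages]
index = main
sourcetype = syslog
disabled = false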

At run time, config files are evaluated as below

- Merges copies of all the stanzas of the same file

- If files differ or have different attributes

o Uses value with highest precedence, as determined by location

in the directory structure (per above diagram)

Global config files live in /system

BTOOL – what’s this?

It is sometimes difficult (in a very large splunk setup), to understand

which config file is being used

So use BTOOL for this

- Command line tool to help troubleshoot – helps determine which

config file is being used


Example of a btool command – executed from /bin directory
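For instance, a hedged sketch that lists the merged inputs.conf settings and, with --debug, shows which file each setting came from:

./splunk btool inputs list --debug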

INDEXES – how does Splunk manage this

Indexes are repositories of events. Some indexes come built in, OR you can create custom ones

Indexes contain 3 types of data

- Raw data in compressed form

- Indexes that point to the raw data

- Metadata (timestamp, host value etc)


There are 2 types of INDEXES

1. EVENT

Default type which handles any type of data

2. METRICS

Optimised to store/retrieve metrics data

Metric data comprises a Timestamp, a Metric name (dotted, like a DNS name), a Value and Dimensions

How does indexing work?


TYPES OF BUCKETS

A bucket is a place / folder on the filesystem

- HOT bucket – newly indexed data – an index can have one or more of these hot buckets

- WARM – moved from hot without any writing / updates I assume

- COLD – moved from warm – can have many – cold data is data

that is hardly accessed

- FROZEN – moved from cold and archived – the indexer deletes frozen data unless you archive it

- THAWED – restored data

Each bucket has defined directory paths


HOW does splunk check DATA INTEGRITY

Double hash method

- Computes hash on newly indexed data

- Computes another hash when moving to warm directory

- Stores hash files in /rawdata path


Some CLI options around integrity / checking hash files etc
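The integrity feature itself is switched on per index in indexes.conf – a hedged sketch (index name illustrative):

[web_logs]
enableDataIntegrityControl = true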
Indexes.conf OPTIONS

You can set options at the following levels

We will focus on GLOBAL / PER INDEX

GLOBAL:

PER INDEX:
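A hedged per-index sketch (index name, paths and size are illustrative):

[web_logs]
homePath = $SPLUNK_DB/web_logs/db
coldPath = $SPLUNK_DB/web_logs/colddb
thawedPath = $SPLUNK_DB/web_logs/thaweddb
maxTotalDataSizeMB = 500000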
See the Admin manual (indexes.conf) for all options regarding these settings

PER PROVIDER:

Options for external resource providers e.g. HADOOP

All provider Stanzas begin with [provider: <provider name>]

PER PROVIDER FAMILY:

Options for multiple external resources. HADOOP / HIVE / SPARK

Family values take precedence if the same options are specified in PER

PROVIDER

All provider family stanzas begin with [provider-family:<family name>]

PER VIRTUAL INDEX

Common for the HADOOP family

Lets splunk access data stored in external systems


FISH BUCKET

Is a means to keep track of which file/filepart has already been indexed

Contains seek pointers and CRCs (Cyclic Redundancy Checks) only for the indexed files, not the files themselves

(demo notes) –

How to create an index

- Settings/Index/New Index

Can define/set various settings here as per notes above

- Apply a data retention policy

By default splunk keeps data for 6 years

You can update this by editing the indexes.conf file
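For example, retention is driven by frozenTimePeriodInSecs in indexes.conf – the default of 188697600 seconds is roughly 6 years (index name and value illustrative):

[web_logs]
# keep roughly 90 days instead of the ~6 year default
frozenTimePeriodInSecs = 7776000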

Explore buckets in the file system – all buckets are in /defaultdb/*

Note – Buckets are organised and processed by AGE!!!


Check hashes for validation : CLI command as below
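A hedged sketch of that command, run from the /bin directory (index name illustrative):

./splunk check-integrity -index web_logs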

MANAGING USERS / GROUPS & CAPABILITIES

Authentication Mgmt

3 types of authentication

NATIVE, LDAP & SCRIPTED

Integrating SPLUNK & LDAP (Lightweight Directory Access Protocol)

LDAP defines the protocol to authenticate to, access and update objects

in an X.500 style directory

X.500 defines a hierarchical database of objects on the network

Objects can be anything on a network

Each Object then has attributes

Example:
LDAP integration can be done via Splunk Web OR by editing the authentication.conf file

We will need an LDAP strategy comprising the below

LDAP Connection settings

Host – fully qualified name or IP address of the LDAP server or AD

controller

Port – Default is 389. If using SSL, then port 636

Bind DN (Distinguished Name) – recommended to use a service account. Used to bind the LDAP server to Splunk

USER settings

User base DN

User base filter

Username Attrib

Real Name attrib

Email attrib

Group mapping attrib

GROUP settings

Group base DN

Static group search filter


Group name attrib

Static member attrib

DYNAMIC group settings

Dynamic member attrib

Dynamic group search filter
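If you edit authentication.conf rather than using Splunk Web, the strategy maps onto stanzas along these lines (host, DNs and attribute values are illustrative assumptions, e.g. for AD):

[authentication]
authType = LDAP
authSettings = corp_ldap

[corp_ldap]
host = ldap.example.com
port = 389
bindDN = cn=splunk-svc,ou=services,dc=example,dc=com
bindDNpassword = <password>
userBaseDN = ou=people,dc=example,dc=com
userNameAttribute = sAMAccountName
realNameAttribute = cn
emailAttribute = mail
groupBaseDN = ou=groups,dc=example,dc=com
groupNameAttribute = cn
groupMemberAttribute = member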

There are also the Splunk native authentication steps, where you create a user group, add capabilities and then link this to a user

MULTIFACTOR auth is also supported

Duo & RSA can be used as per the below steps


Spend a bit of time understanding LDAP (such as Microsoft AD) and how

to hook up with Splunk

Map LDAP groups to SPLUNK roles


Define the connection order

What is a User role – a definition of capabilities / actions that a user can do. 4 types of roles in Splunk. Roles can be edited etc. using the GUI or by updating the authorize.conf file

ADMIN, POWER, USER, CAN_DELETE
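Roles live in authorize.conf – a hedged custom-role sketch (role name and index illustrative):

[role_readonly_analyst]
importRoles = user
srchIndexesAllowed = main
srchIndexesDefault = main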

How to create a custom role

Access on the Settings/Roles/New role menu –

- Give it a name – can opt to choose other roles to inherit

functionality if you like – after creating – you can edit

ELSE

After creating the role without cloning:


Click the Capabilities tab – add the required capabilities and save

Add Users

Access through Settings/Users/New user

You can then add them to a specific role

GETTING DATA INTO SPLUNK – new section

The pipeline looks like below


Inputs – this is when Splunk receives data from a source – adds on some

metadata like hostname/type etc – It does not actually look at the data

at this stage

- Can Monitor files/directories

- Can upload a file

- Once data is received it is compressed

- Can ingest data from TCP/UDP ports – network inputs

- Can also ingest from SNMP EVENTS

- Can be from Windows inputs

- Can ingest metrics data

- FIFO queues
- Scripted inputs – e.g. from apis

- Modular inputs

- HTTP event collector – web data specifically

BASIC SETTINGS for an input

We can configure inputs in a few ways

- Through an app

- Splunk Web

- CLI e.g. ./splunk add monitor <path>

- By editing inputs.conf – add a stanza for each input

- GDO (Guided Data Onboarding) – is new and acts like a data input wizard

- Parsing phase – Splunk now looks at the data (examines, analyses

and transforms)

- Indexing phase – Splunk writes parsed data to indexes (buckets)

onto disk

MonitorNoHandle is available on Windows hosts.

‘syslog’ is a form of NETWORK input

WMI is not required for monitoring inputs on Windows

SPLUNK FORWARDER TYPES


UNIVERSAL – Collects data from a source and forwards it to a receiver.

Can forward to an indexer or index cluster

It has no licence – does not do anything other than forward data

HOW TO CONFIGURE Universal Forwarder

- Config receiving on Splunk Enterprise instance

- Download and install the UF

- Start the UF

- Config the UF to send data

- Config the UF to collect data from host

To configure receiving

See Settings/Forwarding and Receiving/Receiving/new Receiving port

Default port = 9997

You need to download / install a Universal forwarder from Splunk
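From the CLI the same setup looks roughly like this (IPs, paths and ports illustrative; run each command from the /bin directory of the relevant instance):

On the receiving indexer / search head:
./splunk enable listen 9997

On the universal forwarder:
./splunk add forward-server 192.168.1.149:9997
./splunk add monitor /var/log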

HEAVY – needs licence

Is more featured than the Universal forwarder

DISTRIBUTED SEARCH options

Enables you to scale your deployment – separates search management and presentation from the indexing and search retrieval layers


Two components:

- SEARCH HEADS

- INDEXERS or search peers

Why the need for distributions?

- Enables horizontal scaling – we can put in more search heads

- We can manage access control better

- Data coming from multiple locations – geolocations

DIFFERENT TYPES OF DISTRIBUTION

- ONE OR MORE INDEPENDENT SEARCH HEADS – that can search

across multiple indexers (peers)


- INDIVIDUAL INDEXERS or peers

- A group of search heads clustered together

We have a deployer which sends config data to the Cluster

- Clustered indexers so that they share data as below


Here the data is replicated across the indexer cluster

The CLUSTER MASTER NODE manages data replication and

failover within the indexer cluster

We can also combine a Search head cluster & Indexer Cluster as

below

Scaling Options
By adding components to each tier we can scale horizontally

Adding search heads – creates a Search head cluster

Adding indexers – create an indexer cluster

Adding forwarders creates a load balancing forwarder cluster

How to CONFIGURE A DISTRIBUTED SEARCH GROUP


We Need:

- Machines (virtual/physical or cloud) – Splunk Enterprise must be installed on all

- Network – machines must be able to talk to each other. And all

machines must have a FQDN (Fully qualified domain name)

- Know the CLI commands

Notes:

- Network setup must be impeccable

- Static IP addresses recommended

- Must have a DNS server – or manually configure hosts files

- Firewalls setup

- Min: 1 deployer plus 3 cluster members

General workflow to setup this distribution


So, assuming you have your requirement defined:

1. Configure the deployer


2. Initialise the cluster, by repeating the below for EVERY CLUSTER

MEMBER

3. Configure the CAPTAIN


4. Optionally adding other members to the cluster

The demo – notes on setting up this distribution

Note : we have 2 members in this cluster and 1 x deployer

Note their FQDNs

1. Setup deployer
Locate/edit the server.conf file in Program Files/Splunk/etc/system/local and add the below lines. Save the file when done
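The added lines would look something like this (label and secret are illustrative):

[shclustering]
pass4SymmKey = <your shared secret>
shcluster_label = shcluster1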

2. Setup the members

Use PowerShell – go to Program Files/Splunk/bin

Use following command

./splunk init shcluster-config -auth <splunk login>:<password> -mgmt_uri https://<FQDN of this member>:8089 -replication_port <any free port> -replication_factor <max value of 3 – this is an optional switch> -conf_deploy_fetch_url https://<deployer IP or FQDN>:8089 -secret <password saved in server.conf file> -shcluster_label <as defined in server.conf file>

(8089 is the management port number)

splunkd must then be restarted

3. Sort out the captain per the below syntax
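A hedged sketch of the bootstrap command, run from /bin on the member you want as the first captain (FQDNs illustrative):

./splunk bootstrap shcluster-captain -servers_list "https://member1.example.com:8089,https://member2.example.com:8089" -auth <splunk login>:<password>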


To check the status of your distribution, you can do as below from the /bin directory

./splunk show shcluster-status -auth <splunk login>:<password>

To add a new member to the cluster, do as below:
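A hedged sketch, run from /bin on an existing cluster member (FQDN illustrative):

./splunk add shcluster-member -new_member_uri https://<new member FQDN>:8089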

STAGING DATA

The 3 phases of Splunk indexing process

- Data input from whatever source

Files and directories

Network events e.g. SNMP

Windows sources

Other – like HTTP Event Collector / Queues / Metrics

- Parsing – breaks the data into individual events / lines – adds

metadata

- Indexing – writes parsed data to index buckets


CONFIGURING FORWARDERS

Universal Forwarders – just forward data – no parsing or search

capability

5 steps to configuring Universal forwarder


4 ways to configure

1. Windows GUI during installation via MSI

2. CLI

3. Config files

4. Deployment server

Heavy forwarders – Full Enterprise installation and must apply a licence

- Can parse and route data

- Can index locally


Or

Can edit the outputs.conf file directly
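A minimal outputs.conf sketch (group name and indexer addresses illustrative) – listing two servers also gives the load balancing mentioned below:

[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = 10.0.0.1:9997,10.0.0.2:9997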

Can also setup some advanced configs as per below

Other options with forwarders

Load balancing – forwarder can split data amongst indexers / receivers


Forwarders can route data based on time / volume intervals

INTERMEDIATE FORWARDERS

Are a setup where you configure forwarder A to route data to another

forwarder
MULTIPLE PIPELINE SETS

The forwarder can process multiple events at the same time

You can setup like below
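The number of pipeline sets is controlled in server.conf – a hedged sketch:

[general]
parallelIngestionPipelines = 2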


Demo notes – setting up heavy forwarder

We have (1) the search head with IP …149

We have (2) the forwarder with IP …121

1. We apply the forwarder licence to the forwarder instance

Settings/licencing > Change licence group > choose forwarder

licence and apply the licence

A restart is then required

2. Configure RECEIVING on the search head instance (IP …149)

Settings/Forwarding and Receiving – See Receive data – Add new

- Listen on port – default 9997

- Save

3. Set up forwarding on the forwarder instance (IP …121)

Settings/Forwarding and Receiving – choose Configure forwarding –

Add new

Host = add the DNS or IP for the search head instance plus the port, like this: 192.168.1.149:9997

Save

4. Tell the forwarder which data to forward

Settings/Add data – in this case we chose ‘Monitor’ / choose

something from the available options

Select Next – hostname – check it

Click review

Click submit
You can also tell forwarder to index data locally

Settings/Forwarding and Receiving / Forwarding defaults / select the radio button to store a local copy

DEMO NOTES – setting up Universal forwarder

1. On windows – download the software from Splunk

2. Refer to earlier notes – I think we covered this

Forwarder Management / DEPLOYMENT servers

DEPLOYMENT servers:

- Group components by common characteristics

- Then distribute content based on these groups

When NOT to use a deployment server

- Indexer clusters – you can use DS to update the master node only

- Search head clusters except the deployer node


To refresh deployment server after creating a deployment app do as

below

./splunk reload deploy-server

The deployment process at high level

DEPLOYMENT APPS

Not necessarily traditional Splunk apps


- Can be any content you want to send to the client instance

e.g. apps, configs

- default location = /opt/splunk/etc/deployment-apps

DEPLOYMENT CLIENTS

- Grouping by class / characteristic

- A client can belong to more than one class

CLI command to set up a deployment client – the IP address is that of the deployment server
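A hedged sketch, run from /bin on the client (address illustrative; restart splunkd afterwards):

./splunk set deploy-poll <deployment server IP or FQDN>:8089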


Monitor Inputs – new section

3 options available

MONITOR

- Continuously monitors a specified directory / path remotely or

locally

MONITORNOHANDLE

- Windows only – monitors files on Windows systems as they are written, using a kernel-mode driver

UPLOAD

- You can upload a file for a one time analysis

If splunk is restarted, monitoring continues processing where it left off

Compressed files are decompressed before indexing

If you add data to a compressed file, the entire file is re-indexed

Restrictions:

- CANNOT monitor a file whose path exceeds 1024 characters

- Files with .splunk extension are not monitored

You can configure monitoring through inputs.conf like below
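A hedged monitor stanza (path and metadata illustrative):

[monitor:///var/log/httpd/access.log]
index = web
sourcetype = access_combined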


Can also use the CLI to add monitoring as below
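For example (path, index and sourcetype illustrative):

./splunk add monitor /var/log/httpd/access.log -index web -sourcetype access_combined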

Can also monitor via BATCH as per below


Typically used for large batches of historic data

BATCH will ingest the file, index, then delete the file
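A hedged batch stanza (path illustrative) – move_policy = sinkhole is what tells Splunk to delete the file after indexing:

[batch:///opt/data/archive/*.log]
move_policy = sinkhole
index = main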

HOW TO CONFIGURE LOCAL MONITORING via Splunk Web

Settings / Add data – choose Monitor option

Select the type of file to monitor, e.g. Files & Directories – browse and choose the file / path

For remote monitoring – it can be done via the heavy forwarder using similar commands


NETWORK and scripted Inputs – new section

Here we create network inputs using TCP/UDP protocols

- Can configure splunk to accept network inputs on any port

- Splunk can then consume data arriving on these ports

- TCP is recommended

- SPLUNK CLOUD will only accept data from forwarders with SSL

certificate

How to config a network input

Settings/Data Inputs – choose TCP

Click New Local TCP

- DEFAULT PORT = 514 and leave other defaults as is

- Click Next
- Select source type = syslog

- App context = Search & Reporting

- Host method = ip address

- Review/Submit
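The equivalent inputs.conf stanza would be roughly:

[tcp://514]
connection_host = ip
sourcetype = syslog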

To add scripts in Splunk

Settings/Data Input – choose scripts

You have some default scripts or you can choose ‘New Local script’
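A hedged scripted-input stanza (script path and interval illustrative):

[script://$SPLUNK_HOME/etc/apps/myapp/bin/collect.sh]
interval = 60
sourcetype = my_script_output
index = main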

To handle syslog data in Splunk:

- Config an intermediate forwarder

- Splunk cloud requires an intermediate forwarder

AGENTLESS inputs – new section

- Getting data into Splunk without using a forwarder

- Popular for syslog / SNMP data

Windows Input types can be

- Event logs

- PerfMon

- Registry

- WMI

- Active Directory

Ensure Firewall / Anti virus / AD Authentication isn’t blocking splunk

Set up for Windows – you can use the Splunk App for Windows Infrastructure
HTTP EVENT COLLECTOR (HEC)

- Allows sending of data to splunk over HTTP/S

- Useful for monitoring client side web applications

- Uses token based authentication

Like the below command using curl – notes on the token value follow
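A hedged sketch of the curl call (token value hypothetical; 8088 is the default HEC port):

curl -k https://localhost:8088/services/collector/event -H "Authorization: Splunk <your token value>" -d '{"event": "hello world", "sourcetype": "manual"}'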
Setting up the HEC:

Settings/Data Inputs – Choose HTTP Event Collector

The collector works on tokens

Check Global settings:

Then create a new token – Click ‘New token’

Click Next

Make changes as per next screenshot before clicking ‘Review’ & Submit
You then get a token value

The token becomes your AUTHENTICATION over HTTP like a

username/password

Remember the next settings changes: Select allowed Indexes e.g. main

Quiz note:
All of the above is the answer

FINE TUNING INPUTS – new section


- Notes on what SPLUNK does when it gets the data

Quiz notes:

- Splunk stores data in 64k block sizes

- What is a source type? – is a field that describes the structure of

data in an event

- What is the default host value – it is the DNS name of the machine

- Default char set encoding for Splunk = UTF-8


- Can specify alternate char set encoding on props.conf config file

The below happens during the parsing phase

There are 4 things that happen

How does Splunk determine event boundaries – how does it split data

into separate events


Typically / commonly looks for CR / LF

Can also set custom line breaks by defining in props.conf file
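A hedged props.conf sketch for simple single-line events (sourcetype illustrative):

[my_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)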

TIMESTAMPS – if a timestamp exists in the data, it is converted to UNIX time (no. of seconds elapsed since 1970) and stored in the _time field

- Uses the timezone setting as defined on the splunk instance

How does Splunk look for timestamp data?

4 steps:
Last resort is current system time, at the point of indexing, for each

event
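Timestamp recognition can also be pinned down explicitly in props.conf – a hedged sketch (sourcetype and format illustrative):

[my_sourcetype]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
TZ = UTC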

USING DATA PREVIEW – just lets you view events as they are created

We need to set up limits.conf

This will be in the path

Manipulating raw data – new section


Why the need to transform data?

Regulatory reasons usually mean data is masked

We also may want to route data to specific indexes based on event

content

2 main methods to

- Transform

- Route

- Mask

1. SEDCMD – use inputs.conf / props.conf – easier to use if you are

familiar with Unix


SED – Linux stream editor

- Edits data as it comes in – streaming

- Replace strings using the (s) token

- Substitute chars using the (y) token

- We put the SEDCMD key value pairs into props.conf

- we tell splunk to find data via inputs.conf

The /g switch above – means make the change required globally

Here’s an example
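(a hedged sketch in props.conf, masking a hypothetical account number):

[my_sourcetype]
SEDCMD-mask_acct = s/acct=\d+/acct=########/g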
See regexr.com – useful for helping build regex commands

SEDCMD syntax : s/<regex>/<replacement>/g

2. TRANSFORMATIONS – uses inputs.conf / props.conf /

transforms.conf – Takes longer to setup

Using Transforms to Anonymise data

- Edit inputs.conf – tells Splunk where the data is

- Use transforms.conf – use regular expressions to generally define the masking parameters

- Use props.conf – this references the parameters in transforms.conf

So – it would be something like below
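A hedged sketch (sourcetype, field and regex are illustrative only):

transforms.conf:

[ssn_mask]
REGEX = (.*)ssn=\d{5}(\d{4})
FORMAT = $1ssn=xxxxx$2
DEST_KEY = _raw

props.conf:

[my_sourcetype]
TRANSFORMS-anonymize = ssn_mask

inputs.conf simply points at the file to monitor, as described above.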


Using Transforms to Override Sourcetype or Host

- Occurs at parse time

- Only works on an indexer or heavy forwarder

- Need edits to transforms.conf / props.conf

…and it would look like this:
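A hedged sketch that forces a sourcetype (names and path illustrative):

transforms.conf:

[force_sourcetype]
REGEX = .
FORMAT = sourcetype::my_custom_type
DEST_KEY = MetaData:Sourcetype

props.conf:

[source::/var/log/custom/...]
TRANSFORMS-force = force_sourcetype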


Using Transforms to Route events to specific indexes

- is configured on a heavy forwarder

- need edits to

- props.conf – define routing based on event data

- transforms.conf – specify criteria based on regular expressions

(REGEX)

- outputs.conf – Define target groups


Using Transforms.conf to prevent unwanted data from being indexed

- define a REGEX to match events you want to keep in

transforms.conf

- Send everything else to nullqueue using props.conf


The <spec> stanza can be host / source or sourcetype
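The documented keep-only-matching pattern looks roughly like this (the keep regex is illustrative):

props.conf:

[my_sourcetype]
TRANSFORMS-set = setnull, setparsing

transforms.conf:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = ERROR
DEST_KEY = queue
FORMAT = indexQueue

setnull runs first, so everything goes to the null queue unless the second transform re-routes matching events to the index queue.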
