
SPLUNK INDEXES

An indexer is the Splunk component that stores data, and it stores that data in indexes.

This chapter covers the following topics:


- Understanding Splunk indexes
- Understanding Splunk buckets
- Creating Splunk indexes

Understanding Splunk indexes


An index is a repository for data inside Splunk Enterprise.

There are two types of Splunk indexes:


- Event indexes
- Metrics indexes
Event indexes store any type of text data, and this is the default index type.
Metrics indexes store only metrics data.

We will focus on event indexes.


Splunk allows you to create your own indexes, and it also ships with several built-in event indexes, each
serving a specific purpose:

- main: The default index when an input does not specify one.
- summary: The default summary index for storing summary data.
- _internal: Stores Splunk internal logs from a range of internal sources, such as splunkd
daemon logs, web access logs, scheduler logs, Python logs, and more.
- _audit: Stores Splunk internal audit logs, such as logins, searches, data accesses, and
administrative activities.
- _introspection: Stores Splunk resource consumption and performance data.
- _fishbucket: Stores checkpoint information for file monitoring inputs.
- _telemetry: Stores instrumentation data, if enabled through telemetry.conf.

There is more than one way to create a Splunk index, depending on the type of Splunk
deployment architecture.
Regardless of the approach, the index is ultimately configured in the indexes.conf file on the
indexer.

Let’s go through the important characteristics of indexes:

- Splunk allows administrators to create custom indexes to store and organize specific
types of data based on their needs and requirements.
- Every index has data size and retention time limits. To view these settings, see the
indexes.conf definition of the _internal index in the
$SPLUNK_HOME/etc/system/default directory:

[_internal]
homePath = $SPLUNK_DB/_internaldb/db
coldPath = $SPLUNK_DB/_internaldb/colddb
thawedPath = $SPLUNK_DB/_internaldb/thaweddb
tstatsHomePath = volume:_splunk_summaries/_internaldb/datamodel_summary
maxDataSize = 1000
maxHotSpanSecs = 432000
frozenTimePeriodInSecs = 2592000
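
The limits in this stanza are expressed in raw units (megabytes and seconds). As a minimal sketch, assuming this simple stanza parses as INI with Python's standard configparser module (indexes.conf is not formally INI, so treat this as an approximation), the values can be converted to days. The helper name retention_summary is hypothetical.

```python
import configparser

# The _internal stanza shown above, pasted verbatim as a string.
INTERNAL_STANZA = """
[_internal]
homePath = $SPLUNK_DB/_internaldb/db
coldPath = $SPLUNK_DB/_internaldb/colddb
thawedPath = $SPLUNK_DB/_internaldb/thaweddb
tstatsHomePath = volume:_splunk_summaries/_internaldb/datamodel_summary
maxDataSize = 1000
maxHotSpanSecs = 432000
frozenTimePeriodInSecs = 2592000
"""

SECONDS_PER_DAY = 86_400


def retention_summary(stanza_text: str, index: str) -> dict:
    """Return the size and time limits of one index stanza in readable units."""
    # interpolation=None so configparser leaves "$SPLUNK_DB" untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep camelCase keys such as maxHotSpanSecs
    parser.read_string(stanza_text)
    section = parser[index]
    return {
        "max_bucket_size_mb": int(section["maxDataSize"]),
        "max_hot_span_days": int(section["maxHotSpanSecs"]) / SECONDS_PER_DAY,
        "retention_days": int(section["frozenTimePeriodInSecs"]) / SECONDS_PER_DAY,
    }


print(retention_summary(INTERNAL_STANZA, "_internal"))
# {'max_bucket_size_mb': 1000, 'max_hot_span_days': 5.0, 'retention_days': 30.0}
```

So a hot _internal bucket spans at most 5 days of events, and _internal data is kept for 30 days. The int() calls assume numeric values; maxDataSize can also be set to values such as auto, which this sketch does not handle.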

- Indexes contain the originally received data in compressed form, called raw data. They
also contain index files, called tsidx files, and metadata files.
- Indexes organize data on the filesystem in the form of buckets.
- By default, indexes are stored in $SPLUNK_HOME/var/lib/splunk. This path is referred
to by the $SPLUNK_DB environment variable.
- Once data is written to an index, it is final and cannot be modified.
- An authorized user with the can_delete role can run the delete command to prevent events
from appearing in search results. This command only applies delete markers to the data; it
does not permanently delete event data. To remove data permanently, use the splunk clean
command, and do so with caution.
- How data is segregated into indexes depends on data size, data retention, access policies,
and the types of data being stored.

Understanding buckets

Buckets are an integral part of indexes; they contain raw data and index files. On the filesystem,
they are organized as folders with a specific naming pattern. The Splunk indexer uses these
folders for data storage and for search processing.
To learn a bit more about them, let’s look at the folder structure of the default _introspection
index, which is located inside the $SPLUNK_DB path, in the $SPLUNK_DB/_introspection directory.

Let’s take a look at the indexes.conf file located in the $SPLUNK_HOME/etc/system/default
directory. It contains the _introspection index settings that correlate with this folder structure.

[_introspection]
homePath = $SPLUNK_DB/_introspection/db
coldPath = $SPLUNK_DB/_introspection/colddb
thawedPath = $SPLUNK_DB/_introspection/thaweddb
maxDataSize = 1024
frozenTimePeriodInSecs = 1209600

In this configuration of the _introspection index, the term "buckets" is not used explicitly,
but the settings map to bucket types as follows:
- homePath stores hot and warm buckets
- coldPath stores cold buckets
- thawedPath stores thawed buckets, that is, data restored from frozen archives so it can be searched again
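
To make the mapping concrete, here is a small Python sketch that resolves each path setting to a directory and lists which bucket types live there. The install location /opt/splunk is a hypothetical example, and the helper name bucket_locations is illustrative, not part of Splunk.

```python
from string import Template

# Hypothetical install location; $SPLUNK_DB defaults to $SPLUNK_HOME/var/lib/splunk.
SPLUNK_HOME = "/opt/splunk"
SPLUNK_DB = f"{SPLUNK_HOME}/var/lib/splunk"

# Path settings from the _introspection stanza above.
INTROSPECTION_PATHS = {
    "homePath": "$SPLUNK_DB/_introspection/db",
    "coldPath": "$SPLUNK_DB/_introspection/colddb",
    "thawedPath": "$SPLUNK_DB/_introspection/thaweddb",
}

# Which bucket types live under each path setting.
BUCKETS_BY_SETTING = {
    "homePath": ("hot", "warm"),
    "coldPath": ("cold",),
    "thawedPath": ("thawed",),
}


def bucket_locations(paths: dict, splunk_db: str) -> dict:
    """Map each bucket type to the concrete directory that holds it."""
    locations = {}
    for setting, raw_path in paths.items():
        # Replace the $SPLUNK_DB placeholder with the real directory.
        resolved = Template(raw_path).substitute(SPLUNK_DB=splunk_db)
        for bucket_type in BUCKETS_BY_SETTING[setting]:
            locations[bucket_type] = resolved
    return locations


for bucket_type, directory in bucket_locations(INTROSPECTION_PATHS, SPLUNK_DB).items():
    print(f"{bucket_type:>6}: {directory}")
```

Frozen does not appear in the result because this stanza defines no frozen location; frozen data leaves the index directories entirely.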

Splunk stores data in more than one type of bucket. In the configuration, the bucket directory
names all end in db (db, colddb, and thaweddb). The bucket types are hot, warm, cold, frozen, and
thawed.

Let’s go through the bucket types and how they work:

Hot: Hot buckets contain freshly arrived data. Before it is stored, data is parsed in
the parsing phase and goes through the license meter. Hot buckets can be written to and
read from simultaneously. In the index definition, hot buckets are stored in the
homePath location.
Warm: A hot bucket rolls over to warm in three scenarios: when the indexer is
restarted, when the hot bucket reaches its maximum size (maxDataSize), or when the
time span of its events reaches the maximum (maxHotSpanSecs).
Warm buckets are stored in the same location as hot buckets, homePath, and they
are read-only. Each bucket folder is named using this convention:
db_<latesttime>_<earliesttime>_<uniqueid>.
Example: db_1663851317_1661686760_13 is a warm bucket whose newest event has time
1663851317, whose oldest event has time 1661686760, and whose ID is 13.
The times are in epoch time format.
When performing a search in Splunk, the search process determines which buckets
to open and search for event matches based on the specified search time range
(earliesttime, latesttime).
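
The naming convention is easy to decode programmatically. The following Python sketch (the helper names are hypothetical, not a Splunk API) splits a bucket name into its two epoch times and its ID, and shows the kind of time-range check the search process uses to skip buckets that cannot contain matching events:

```python
from datetime import datetime, timezone
from typing import NamedTuple


class BucketName(NamedTuple):
    latest: int     # epoch time of the newest event in the bucket
    earliest: int   # epoch time of the oldest event in the bucket
    bucket_id: int  # unique ID of the bucket within the index


def parse_bucket_name(name: str) -> BucketName:
    """Split a name such as db_1663851317_1661686760_13 into its parts."""
    prefix, latest, earliest, bucket_id = name.split("_")[:4]
    if prefix != "db":
        raise ValueError(f"not a warm/cold bucket name: {name!r}")
    return BucketName(int(latest), int(earliest), int(bucket_id))


def overlaps(bucket: BucketName, search_earliest: int, search_latest: int) -> bool:
    """True if the bucket's event time range intersects the search time range."""
    return bucket.earliest <= search_latest and bucket.latest >= search_earliest


bucket = parse_bucket_name("db_1663851317_1661686760_13")
print(datetime.fromtimestamp(bucket.earliest, tz=timezone.utc))  # oldest event
print(datetime.fromtimestamp(bucket.latest, tz=timezone.utc))    # newest event
# A search whose time range ends before the bucket's oldest event can skip it.
print(overlaps(bucket, 1_600_000_000, 1_650_000_000))  # False
```

For the example bucket, the two timestamps fall on 2022-08-28 and 2022-09-22 (UTC), so any search whose range ends before the bucket's oldest event never needs to open it.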
Cold: Splunk moves warm buckets into coldPath when they meet certain criteria,
and bucket names remain the same after the move. The oldest warm buckets are
moved first when the homePath location reaches its size limit (homePath.maxDataSizeMB)
or when the number of warm buckets reaches the maxWarmDBCount setting.
The data in cold buckets is readable and searchable.
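
The size- and count-based roll described above can be simulated in a few lines of Python. This is a hypothetical model, not Splunk's actual implementation: it assumes "oldest" means the warm bucket whose newest event is oldest, and it takes the homePath size limit and the maxWarmDBCount value as plain arguments. Apart from the example bucket from the Warm section, the bucket names and sizes are made up for illustration.

```python
from typing import List, Tuple


def buckets_to_roll_to_cold(
    warm: List[Tuple[str, float]],  # (bucket folder name, size in MB)
    max_warm_db_count: int,         # models maxWarmDBCount
    home_path_max_mb: float,        # models homePath.maxDataSizeMB
    hot_mb: float = 0.0,            # hot buckets also occupy homePath
) -> List[str]:
    """Return the warm buckets to move to coldPath, oldest first."""
    # Sort by the newest-event time in db_<latesttime>_<earliesttime>_<uniqueid>.
    remaining = sorted(warm, key=lambda bucket: int(bucket[0].split("_")[1]))
    total_mb = hot_mb + sum(size for _, size in remaining)
    rolled = []
    # Keep rolling the oldest warm bucket until both limits are satisfied.
    while remaining and (len(remaining) > max_warm_db_count or total_mb > home_path_max_mb):
        name, size = remaining.pop(0)
        rolled.append(name)
        total_mb -= size
    return rolled


warm = [
    ("db_1663851317_1661686760_13", 400.0),
    ("db_1661000000_1660000000_12", 500.0),
    ("db_1664000000_1663851318_14", 300.0),
]
print(buckets_to_roll_to_cold(warm, max_warm_db_count=2, home_path_max_mb=10_000))
# ['db_1661000000_1660000000_12']
```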
Frozen: Archiving to frozen is optional and is configured when administrators
choose to retain data from cold buckets. Without a frozen configuration, data in
cold buckets is deleted after it reaches a certain age.
Frozen buckets are not searchable. You can specify either coldToFrozenScript or
coldToFrozenDir to move raw data from a cold bucket to a frozen bucket.
The age of a bucket takes precedence over the size of the index when moving or
deleting data from a cold bucket.
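
Retention by age can be expressed as a simple check. The sketch below assumes a bucket freezes once its newest event is older than frozenTimePeriodInSecs (1209600 seconds, or 14 days, for the _introspection index shown earlier). The function name should_freeze is hypothetical, and the sketch ignores the size-based limit.

```python
import time
from typing import Optional

INTROSPECTION_FROZEN_SECS = 1_209_600  # frozenTimePeriodInSecs for _introspection (14 days)


def should_freeze(bucket_name: str, frozen_time_period_secs: int,
                  now: Optional[float] = None) -> bool:
    """True once the newest event in a db_<latesttime>_<earliesttime>_<uniqueid>
    bucket is older than the retention period."""
    now = time.time() if now is None else now
    newest_event = int(bucket_name.split("_")[1])
    return now - newest_event > frozen_time_period_secs


bucket = "db_1663851317_1661686760_13"
fifteen_days_later = 1663851317 + 15 * 86_400
print(should_freeze(bucket, INTROSPECTION_FROZEN_SECS, now=fifteen_days_later))  # True
```

Keying on the newest event means a bucket is frozen only when everything in it has aged out.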

Thawed: This is also optional, and it works in coordination with frozen
buckets. As we have seen, frozen buckets are not searchable. To make the data in a
frozen bucket searchable again, copy it to thawedPath and rebuild it using the
./splunk rebuild <bucket> command. Data can stay in thawedPath for as long as it
needs to be searched; no retention policies apply, which means Splunk Enterprise
never deletes it. When the data is no longer required for searching, the
administrator can remove it from thawedPath.
In summary, data rolls from hot to warm to cold to frozen, and frozen data can be
brought back as thawed buckets when needed.
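
The whole lifecycle can be summarized as a small state machine. This Python sketch encodes the transitions and searchability described in this section; the enum and function names are hypothetical, and the frozen-to-thawed step models the manual copy-and-rebuild restore rather than an automatic roll.

```python
from enum import Enum


class BucketState(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"
    FROZEN = "frozen"
    THAWED = "thawed"


# Allowed moves, following the descriptions above. FROZEN -> THAWED is a manual
# restore (copy to thawedPath, then rebuild), not an automatic roll.
TRANSITIONS = {
    BucketState.HOT: {BucketState.WARM},
    BucketState.WARM: {BucketState.COLD},
    BucketState.COLD: {BucketState.FROZEN},
    BucketState.FROZEN: {BucketState.THAWED},
    BucketState.THAWED: set(),  # the administrator removes thawed data manually
}

# Frozen is the only stage that cannot be searched.
SEARCHABLE = {BucketState.HOT, BucketState.WARM, BucketState.COLD, BucketState.THAWED}


def move(current: BucketState, target: BucketState) -> BucketState:
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move a {current.value} bucket to {target.value}")
    return target


state = BucketState.HOT
for nxt in (BucketState.WARM, BucketState.COLD, BucketState.FROZEN, BucketState.THAWED):
    state = move(state, nxt)
    print(state.value, "searchable" if state in SEARCHABLE else "not searchable")
```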

Creating Splunk indexes

Creating a custom index is a Splunk administrator’s responsibility. As you have
seen, data is retained in an index until it reaches a certain age or the index reaches a
certain size, so estimating an index’s size before creating it is a crucial step.

To estimate the size of the index, we need answers to the
following questions:
Q: How long does the data need to be retained, in days?
A: For example, 100 days.
Q: How much data volume is expected per day, in GB?
A: For example, 1 GB.

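The formula for index size estimation is as follows:

Index size = retention days x volume per day x 1/2

The factor of 1/2 accounts for the compressed raw data plus the index files, which together
take roughly half the space of the original data. Substituting the example values, the index
size becomes 100 x 1 GB x 1/2 = 50 GB.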
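
The same arithmetic as a small Python function, handy when sizing several indexes at once. The function name is hypothetical, and storage_ratio defaults to the 1/2 factor from the formula above; it is exposed as a parameter because the real ratio depends on how well your data compresses.

```python
def estimate_index_size_gb(retention_days: float, daily_volume_gb: float,
                           storage_ratio: float = 0.5) -> float:
    """Retention days x daily volume x storage ratio.

    storage_ratio = 1/2 models compressed raw data plus index files,
    as in the formula above; adjust it if your data compresses differently.
    """
    return retention_days * daily_volume_gb * storage_ratio


print(estimate_index_size_gb(100, 1))  # 50.0
```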
Let’s take a look at the different ways to create event indexes.
Splunk Web
Log in to Splunk Web as an administrator. Go to the Settings menu in
the top right and click Indexes. A page opens with a New Index
button in the top right. On the same page, you can see that the internal
default indexes are prefixed with _, such as _internal and _audit.
If you wish to see the configuration of one of the default indexes, click
the index name. Otherwise, create a new index by clicking the New Index
button and filling in the Index Name and Max Size of Entire Index fields.
The default Index Data Type is Events; select Metrics explicitly if you want
to create a metrics index. I have set the index name to windows-event-logs
and the size to 500 MB. Then, click the Save button.
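
Before typing a name into the Index Name field, it can help to check it against Splunk's naming rules. As I understand them (verify against the documentation for your version), index names may contain only lowercase letters, digits, underscores, and hyphens, cannot start with an underscore or hyphen, and cannot contain the word kvstore. A hypothetical Python check based on those assumed rules:

```python
import re

# Assumed rules: lowercase letters, digits, "_" and "-" only; must not start
# with "_" or "-" (leading "_" is used by internal indexes such as _audit);
# must not contain "kvstore".
_INDEX_NAME = re.compile(r"[a-z0-9][a-z0-9_-]*")


def is_valid_index_name(name: str) -> bool:
    return _INDEX_NAME.fullmatch(name) is not None and "kvstore" not in name


for candidate in ("windows-event-logs", "_audit", "Windows-Logs", "my_kvstore_idx"):
    print(candidate, is_valid_index_name(candidate))
```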
CLI
The Splunk CLI offers a variety of commands, and adding indexes is one of them.
Log in to the server that hosts your Splunk instance and run the command there.
Syntax: $SPLUNK_HOME/bin/splunk add index <index-name> -<parameter> <value>
Parameters include homePath, coldPath, thawedPath, app, and so on.

Example: $SPLUNK_HOME/bin/splunk add index winds-eventlogs -app search
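
If you script index creation, building the command as an argument list avoids shell-quoting mistakes. The sketch below is a hypothetical helper that only assembles the command shown above from the documented syntax; the /opt/splunk install path is an assumed example, and the command is printed rather than executed.

```python
import shlex
from typing import List


def add_index_command(index_name: str, splunk_home: str = "/opt/splunk",
                      **params: str) -> List[str]:
    """Build argv for: splunk add index <index-name> -<parameter> <value> ..."""
    argv = [f"{splunk_home}/bin/splunk", "add", "index", index_name]
    for parameter, value in params.items():
        argv += [f"-{parameter}", value]  # e.g. -app search, -homePath <path>
    return argv


cmd = add_index_command("winds-eventlogs", app="search")
print(shlex.join(cmd))
# /opt/splunk/bin/splunk add index winds-eventlogs -app search
```

To actually run it, the list can be passed to subprocess.run(cmd, check=True); the CLI may prompt for administrator credentials.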


Indexes.conf
For the windows-event-logs index created from Splunk Web above, the index definition is
written to the indexes.conf file in $SPLUNK_HOME/etc/apps/search/local.
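
For reference, the stanza for such an index looks roughly like the output of the following Python sketch, which generates it with configparser. The exact set of keys Splunk Web writes varies by version, so treat this as an approximation that shows only the path settings and the 500 MB size limit, which I assume maps to maxTotalDataSizeMB. The helper name index_stanza is hypothetical.

```python
import configparser
import io


def index_stanza(name: str, max_total_size_mb: int) -> str:
    """Render a minimal indexes.conf stanza for a custom event index."""
    parser = configparser.ConfigParser(interpolation=None)  # keep $SPLUNK_DB literal
    parser.optionxform = str  # preserve camelCase keys
    parser[name] = {
        "homePath": f"$SPLUNK_DB/{name}/db",            # hot and warm buckets
        "coldPath": f"$SPLUNK_DB/{name}/colddb",        # cold buckets
        "thawedPath": f"$SPLUNK_DB/{name}/thaweddb",    # thawed buckets
        "maxTotalDataSizeMB": str(max_total_size_mb),   # "Max Size of Entire Index"
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


print(index_stanza("windows-event-logs", 500))
```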
