Splunk Indexes
An index is a repository for data. Indexers store incoming data in one or more indexes.
main: The default Splunk index, used when an input source does not specify one.
summary: The default summary index for storing summary data.
_internal: Stores Splunk internal logs from a range of internal sources, such as splunkd
daemon logs, web access logs, scheduler logs, python logs and more.
_audit: Stores Splunk internal audit logs such as logins, searches, data accesses, and
administrative activities.
_introspection: Stores Splunk resource consumption and performance data.
_thefishbucket: Stores checkpoint information for file monitoring inputs.
_telemetry: Stores instrumentation information if enabled through telemetry.conf.
There is more than one way to create a Splunk index; which one applies depends on the type of
Splunk deployment architecture.
Regardless of the approach, the index is ultimately defined in the indexes.conf file on the
indexer.
Splunk allows administrators to create custom indexes to store and organize specific
types of data based on their needs and requirements.
Every index has data size and retention time limitations. To view the configuration
settings, see the indexes.conf definition of _internal index in the
$SPLUNK_HOME/etc/system/default directory.
[_internal]
homePath = $SPLUNK_DB/_internaldb/db
coldPath = $SPLUNK_DB/_internaldb/colddb
thawedPath = $SPLUNK_DB/_internaldb/thaweddb
tstatsHomePath = volume:_splunk_summaries/_internaldb/datamodel_summary
maxDataSize = 1000
maxHotSpanSecs = 432000
frozenTimePeriodInSecs = 2592000
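The time-based settings in this stanza are expressed in seconds, which is hard to read at a glance. As a rough sketch, the values can be converted to days with Python's standard configparser module. The stanza text below is an excerpt based on the listing above, and the helper name secs_to_days is hypothetical.

```python
import configparser

# Excerpt of the _internal stanza (only the settings used in this sketch).
STANZA = """
[_internal]
homePath = $SPLUNK_DB/_internaldb/db
maxDataSize = 1000
maxHotSpanSecs = 432000
frozenTimePeriodInSecs = 2592000
"""

SECONDS_PER_DAY = 24 * 60 * 60


def secs_to_days(seconds: int) -> float:
    """Convert a duration in seconds to days (hypothetical helper)."""
    return seconds / SECONDS_PER_DAY


# interpolation=None keeps values such as $SPLUNK_DB literal;
# optionxform = str preserves the mixed-case setting names.
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str
parser.read_string(STANZA)

internal = parser["_internal"]
hot_span_days = secs_to_days(internal.getint("maxHotSpanSecs"))
retention_days = secs_to_days(internal.getint("frozenTimePeriodInSecs"))

print(f"maxHotSpanSecs: {hot_span_days:g} days")
print(f"frozenTimePeriodInSecs: {retention_days:g} days")
```

With these values, a hot bucket spans at most 5 days and data in the _internal index is retained for 30 days.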
Indexes contain the originally received data in a compressed format, which is called raw
data. They also contain index files, called tsidx files, and metadata files.
Indexes organize data in the form of buckets on the filesystem.
Indexes are by default stored at $SPLUNK_HOME/var/lib/splunk. The path is referred
to with the $SPLUNK_DB environment variable.
Once the data is written to an index, it is final and cannot be modified.
An authorized user who has the can_delete role can issue a delete command to prevent events
from participating in a search. This command only applies delete markers to the data; it
doesn’t permanently delete event data. To remove data permanently, use the Splunk clean
command, and execute it with caution.
The segregation of indexes depends on data size, data retention, access policies, and the
types of data being stored in them.
Understanding buckets
Buckets are an integral part of indexes; they contain raw data and index files. They are organized
in the form of folders on a filesystem with a specific naming pattern. These folders are explicitly
used by the Splunk indexer for data storage and for search processing.
To learn a bit more about them, let’s look at the default _introspection index folder structure:
[_introspection]
homePath = $SPLUNK_DB/_introspection/db
coldPath = $SPLUNK_DB/_introspection/colddb
thawedPath = $SPLUNK_DB/_introspection/thaweddb
maxDataSize = 1024
frozenTimePeriodInSecs = 1209600
In this configuration of the _introspection index, the term “buckets” is not explicitly used,
but we can establish an equivalent mapping to the following settings:
homePath stores hot and warm buckets
coldPath stores cold buckets
thawedPath stores thawed buckets, that is, archived data restored so it can be searched again
Splunk stores data in more than one type of bucket. They are all given the suffix db
inside the configuration. The types of buckets are Hot, Warm, Cold, Frozen and
Thawed.
Hot: Hot buckets contain freshly arrived data. Before it is stored, data is parsed in
the parsing phase and goes through the license meter. Hot buckets are writable and
readable simultaneously. In the index definition, hot buckets are stored in the
homePath location.
Warm: Hot buckets roll over to warm in three scenarios: when the indexer is
restarted, when the hot bucket reaches the maximum size (maxDataSize), or when
the maximum lifespan of the bucket is reached (maxHotSpanSecs).
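The three roll conditions can be summarized in a small predicate. This is only an illustrative sketch of the rule as described here, not how the indexer implements it; the HotBucket class and the should_roll_to_warm function are hypothetical names, and sizes are assumed to be in megabytes.

```python
from dataclasses import dataclass


@dataclass
class HotBucket:
    """Hypothetical snapshot of a hot bucket's state."""
    size_mb: float   # current size of the bucket
    span_secs: int   # time span the bucket has covered so far


def should_roll_to_warm(
    bucket: HotBucket,
    max_data_size_mb: float,
    max_hot_span_secs: int,
    indexer_restarted: bool = False,
) -> bool:
    """Return True if any of the three roll conditions is met."""
    return (
        indexer_restarted
        or bucket.size_mb >= max_data_size_mb
        or bucket.span_secs >= max_hot_span_secs
    )


# Limits taken from the _introspection stanza (maxDataSize = 1024);
# the 432000-second span is borrowed from the _internal stanza.
small_bucket = HotBucket(size_mb=200, span_secs=3600)
print(should_roll_to_warm(small_bucket, 1024, 432000))        # within limits
print(should_roll_to_warm(small_bucket, 1024, 432000, True))  # restart forces a roll
```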
Warm buckets are stored in the same location as hot buckets (homePath), and they
are read-only. The folder is named using this convention:
db_<latesttime>_<earliesttime>_<uniqueid>.
Example: db_1663851317_1661686760_13 is a warm bucket.
The time is in epoch time format.
When performing a search in Splunk, the search process determines which buckets
to open and search for event matches based on the specified search time range
(earliesttime, latesttime).
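To make the naming convention and the time-range check concrete, here is a minimal sketch that parses the example folder name and tests whether it overlaps a search window. In the example name, the larger (later) epoch value comes first, so the sketch treats the first field as the latest event time. The function names are hypothetical, the overlap test is a simplification of what the search process does, and the parser assumes the four-part name shown in the example.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class BucketSpan(NamedTuple):
    latest: int     # epoch seconds of the newest event in the bucket
    earliest: int   # epoch seconds of the oldest event in the bucket
    bucket_id: int  # unique id of the bucket within the index


def parse_bucket_name(name: str) -> BucketSpan:
    """Split a folder name of the form db_<latest>_<earliest>_<id>."""
    prefix, latest, earliest, bucket_id = name.split("_")
    if prefix != "db":
        raise ValueError(f"not a bucket folder name: {name!r}")
    return BucketSpan(int(latest), int(earliest), int(bucket_id))


def overlaps(span: BucketSpan, search_earliest: int, search_latest: int) -> bool:
    """True when the bucket's time span intersects the search time range."""
    return span.earliest <= search_latest and span.latest >= search_earliest


def to_utc(epoch: int) -> str:
    """Render epoch seconds as an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


span = parse_bucket_name("db_1663851317_1661686760_13")
print(to_utc(span.earliest), "to", to_utc(span.latest))

# A search window that ends before the bucket's oldest event can skip it.
print(overlaps(span, 1661000000, 1661600000))
```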
Cold: Splunk moves warm buckets into coldPath when certain limits are reached,
and bucket names remain the same after they move. The oldest warm buckets are
moved first when the homePath location reaches homePath.maxDataSizeMB or when the
number of warm buckets reaches the defined maxWarmDBCount setting.
The data in cold buckets is readable and searchable.
Frozen: Frozen buckets are completely optional and are configured when
administrators choose to retain data from cold buckets. Without a frozen bucket
configuration, data from cold buckets will be deleted after it reaches a certain age.
Frozen buckets are not searchable. You can specify either coldToFrozenScript or
coldToFrozenDir to move raw data from a cold bucket to a frozen bucket.
The age of a bucket takes precedence over the size of the index when moving or
deleting data from a cold bucket.
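The age-based rule can be sketched as a comparison against frozenTimePeriodInSecs. The example assumes a bucket becomes eligible for freezing once its newest event is older than that setting; the function name is_due_to_freeze is hypothetical, and the retention value is taken from the _introspection stanza.

```python
import time
from typing import Optional

# frozenTimePeriodInSecs from the _introspection stanza (14 days).
FROZEN_TIME_PERIOD_SECS = 1209600


def is_due_to_freeze(
    bucket_latest_epoch: int,
    frozen_time_period_secs: int = FROZEN_TIME_PERIOD_SECS,
    now: Optional[float] = None,
) -> bool:
    """True when the bucket's newest event is older than the retention period."""
    current = time.time() if now is None else now
    return (current - bucket_latest_epoch) > frozen_time_period_secs


# Fixed "now" values keep the example deterministic.
newest_event = 1663851317
print(is_due_to_freeze(newest_event, now=newest_event + 10 * 86400))  # 10 days old
print(is_due_to_freeze(newest_event, now=newest_event + 15 * 86400))  # 15 days old
```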
Thawed: This is also optional, and it works in coordination with the Frozen
bucket. We have seen that frozen buckets are not searchable. To make data in
a frozen bucket searchable, the data must be copied to thawedPath
and rebuilt using the ./splunk rebuild <bucket> command. Data can be retained in
thawedPath for as long as it needs to be searched, and no retention policies apply. This
means it will never be deleted by Splunk Enterprise. If the data is no longer
required for searching, the administrator can remove it from thawedPath.
Data gets rolled from hot to warm to cold to frozen to thawed buckets depending
on the situation.
In order to estimate the size of the index, we must have the answers to the
following questions:
Q: How long does the data need to be retained, in days?
A: For example, 100 days
Q: How much data volume per day is expected, in GB?
A: For example, 1 GB