
How to generate 1 TB of data for Splunk performance testing

By: Donald | April 12, 2016


INTRODUCTION
Splunk, a leader in event management, provides insight into your business's machine-generated
log data. Splunk enables you to make sense of your business, make smart decisions, and initiate
corrective actions.
Processing big data is by no means a small feat. The ability to scale Splunk to
accommodate and grow with your business is key to providing reliable and accurate
information. Splunk provides insight into your machine-generated data, but only a few apps
provide insight into how Splunk itself is performing. Performance testing has been an
ongoing effort by Splunk and various hardware and software vendors for some time now.
Most, if not all, of these tests were run with SplunkIT or Bonnie++, tools designed to measure
a single indexer using a small sample data set. If you want to test Splunk's performance in
your own environment, the challenge is where to get a large data set to test with, such as
1 TB of machine-generated data.
I will demonstrate how to create 1 TB of data with embedded rare and dense search terms
using Splunk's Event Generator (EventGen) for Splunk performance testing.

SEARCHES
Embedding search terms into the log data enables a search to scan all three data tiers
(hot, warm, and cold data buckets) of the dataset.
The following are the search terms we will generate, based on a 10,000,000-line file
(a quick arithmetic check follows the list).
· Very Dense Search: 1 out of 100 lines, 100,000 occurrences.
· Dense Search: 1 out of 1,000 lines, 10,000 occurrences.
· Rare Search: 1 out of 1,000,000 lines, 10 occurrences.
· Sparse Search: 1 out of 10,000,000 lines, 1 occurrence.
· Extremely Rare Search: 1 out of 100,000,000 lines.
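As a quick check on the arithmetic behind those targets, the expected count for each term is
simply the sample's line count divided by its interval. A throwaway shell sketch:
total=10000000
for interval in 100 1000 1000000 10000000; do
  echo "1 out of $interval lines -> $((total / interval)) occurrences"
done
The 1-out-of-100,000,000 interval is larger than the sample itself, so that term works out to
less than one occurrence per pass through the sample file.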

SPLUNK'S EVENT GENERATOR

Splunk's Event Generator is a utility that enables you to build real-time events based on
sample files and their configuration definitions. We will use a sample file with embedded
search terms to build 1 TB of data with the Event Generator.
Install the Event Generator (a command-line alternative is sketched after these steps):
· If you do not have a GitHub account, create one at www.github.com.
· Download the Splunk Event Generator from https://github.com/splunk/eventgen
· In Splunk Web, click the "Install app from file" button located in the upper left-hand corner.
· Browse to where the Event Generator zip file is located and select it.
· In a terminal, rename the extracted directory:
mv $SPLUNK_HOME/etc/apps/eventgen-master $SPLUNK_HOME/etc/apps/eventgen
· Restart Splunk to enable the app.
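If you prefer the command line to Splunk Web, a rough equivalent is to download and unpack the
GitHub archive straight into the apps directory. This is only a sketch and assumes GitHub's
standard "download master as zip" archive URL:
# download, unpack, rename, and restart Splunk
cd /tmp
curl -L -o eventgen.zip https://github.com/splunk/eventgen/archive/master.zip
unzip eventgen.zip -d $SPLUNK_HOME/etc/apps/
mv $SPLUNK_HOME/etc/apps/eventgen-master $SPLUNK_HOME/etc/apps/eventgen
$SPLUNK_HOME/bin/splunk restart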
EMBED SEARCH TERM IN SAMPLE FILE
Splunk's Event Generator can create real-time events from most, if not all, sample files. In
the past, I was able to create machine-data logs from Cisco:ASA, Cisco:FWSM, syslog,
McAfee Endpoint Protection, Nessus vulnerability scans, and the many out-of-the-box
samples included in Splunk installations.
For this demonstration, I have chosen syslog data as the sample log to generate 1 TB of data.
1. It is easier to embed the various search terms if your sample data has a defined number
of lines. The file I selected is an old syslog file that is about 12 GB. First, I trimmed the file
to the defined size of 10,000,000 lines.
$ head -n 10000000 syslog.sample.log > new_syslog.sample.log
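Since the occurrence counts above depend on that line count, it is worth confirming the trimmed
file really is 10,000,000 lines:
$ wc -l new_syslog.sample.log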

2. Create the dense search terms - Use the following awk command to find and replace a set
number of occurrences of a pattern; set c to the number of replacements you want. The counter
is decremented on each substitution, so only the first c matching lines are changed, and every
line is printed to the new file.
awk 'c && sub("pattern","replace") {c--}1' c=1 samplefile > newsamplefilewithreplace
Example, embedding the very dense term (100,000 occurrences):
awk 'c && sub("certificate","DENSE100") {c--}1' c=100000 samplefile > newsamplefilewithreplace
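The same command can be rerun once per density tier. For example, a second pass for the
10,000-occurrence dense term might look like the following; the token name DENSE1000, the
search word "password", and the file names are illustrative choices, not part of the original
walkthrough:
awk 'c && sub("password","DENSE1000") {c--}1' c=10000 newsamplefilewithreplace > newsamplefilewithreplace2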
3. Check the number of replacements.
$ grep DENSE100 newsamplefilewithreplace | wc -l

4. Create the rare search terms - Insert the rare search terms throughout the sample file by
choosing a random line in vi and inserting a string that is unique within the data, for
example "$rFv5TgB^yHn". A scripted alternative is sketched below.
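If you would rather not hand-edit a 10,000,000-line file in vi, the same effect can be
scripted. A minimal sketch that appends the marker above to one arbitrarily chosen line
(the line number and file names are placeholders):
awk 'NR==5000000 {$0 = $0 " $rFv5TgB^yHn"} {print}' samplefile > samplefilewithrareterm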

5. When you have added all the search terms, your sample file is ready to go. Next, move
the sample file to:
$SPLUNK_HOME/etc/apps/eventgen/samples
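For example, if the finished sample has been named syslog_sample.log to match the
eventgen.conf stanza shown in the next section:
mv syslog_sample.log $SPLUNK_HOME/etc/apps/eventgen/samples/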
CONFIGURE THE SPLUNK EVENT GENERATOR
The conf file for the Event Generator is named eventgen.conf. There is an eventgen.conf
located in the app's default directory. Do not edit this file; instead, create a new
eventgen.conf in $SPLUNK_HOME/etc/apps/eventgen/local.
Below is a simple configuration to get you started building your Splunk Event Generator
data. Comments are kept on their own lines, since inline comments can be read as part of a
setting's value. To add additional settings, refer to the README directory located in the
root of the eventgen app.
Example: eventgen.conf configuration

# stanza name must match the sample file name in the samples directory
[syslog_sample.log]
# number of seconds between event generation runs
interval = 3
# size of each output log file: 100 GB
fileMaxBytes = 100000000000
# number of files; 10 x 100 GB = 1 TB
fileBackupFiles = 11
# use the entire sample file
count = 0
# output mode set to file
outputMode = file
# output file name
fileName = /opt/splunk/var/lib/splunk/whitebox/sample_syslog_06302015.log
# timestamp regular expression match string
token.0.token = \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
token.0.replacementType = timestamp
# timestamp replacement string
token.0.replacement = %Y-%m-%d %H:%M:%S,%f
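Before kicking off a multi-hour run, it is worth confirming that the token regex above
actually matches the timestamps in your sample file; if it does not, the generated events keep
the sample's original, old timestamps. A quick check with GNU grep (-P enables the \d
character class):
grep -cP '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}' $SPLUNK_HOME/etc/apps/eventgen/samples/syslog_sample.log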

That is all it takes to generate 1 TB of data for Splunk performance testing. In my
environment, generating 125 GB of data took about 4 hours.
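At that rate, a full terabyte takes well over a day, so it helps to check periodically how
much data has been written. Using the output directory from the configuration above:
du -sh /opt/splunk/var/lib/splunk/whitebox/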
CONCLUSION
There has been a lot of buzz within the Splunk community on performance testing and its
many approaches. Performance apps like SplunkIT and Bonnie++ have laid the foundation
for such testing to occur. However, these tools are limited in some ways because they were
designed to measure a single indexer. By creating your own data with embedded rare and dense
search terms, you will be able to measure search performance and narrow down any
bottlenecks within a multi-indexer environment.
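Once the generated files have been indexed, the embedded terms give you ready-made dense and
rare searches to time. A minimal sketch using the Splunk CLI, assuming the data has been
indexed into a hypothetical index named perf_test:
$SPLUNK_HOME/bin/splunk search 'index=perf_test DENSE100 | stats count'
$SPLUNK_HOME/bin/splunk search 'index=perf_test "$rFv5TgB^yHn" | stats count'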
Happy Splunking…!
Tags: Splunk, Operational Intelligence, Performance Testing, Big data
