
Lab Guide: Elasticsearch Engineer I


Lab 1: Welcome to the Elastic Stack
Lab 2: Getting Started with Elasticsearch
Lab 3: Querying Data
Lab 4: Text Analysis and Mappings
Lab 5: The Distributed Model
Lab 6: Troubleshooting Elasticsearch
Lab 7: Improving Search Results
Lab 8: Aggregating Data
Lab 9: Securing Elasticsearch
Lab 10: Best Practices

Lab 1: Welcome to the Elastic Stack


Objective: In this lab, you will see how quickly and easily the Elastic Stack can be
used to read data from a database and to collect data from log files. You will start up
Elasticsearch and Kibana; use Logstash to ingest the blogs from a Postgres
database into Elasticsearch; and use Filebeat to ingest the web access log data
from log files.

1. From within the Strigo UI, click on the "My Lab" icon in the left toolbar (if you
do not have access to the virtual environment, click here to see how to run the
labs on your local machine):

A command prompt will open, and you will be logged in to a Linux machine that
is configured to provide easy SSH access to three servers: server1, server2
and server3. You will deploy an Elasticsearch cluster on these 3 servers
throughout the course. For this first lab, you will perform all of the tasks on
server1.

2. SSH onto server1:

ssh server1

3. View the contents of the elastic user home folder by entering ls -l. You
should see Elasticsearch, Filebeat, Kibana and Logstash tar files, which were
all simply downloaded from our website. (You will see other files in the home
folder as well):

[elastic@server1 ~]$ ls -l
total 361624
drwxr-xr-x 5 elastic elastic      4096 Jul 10 16:54 datasets
-rw-r--r-- 1 elastic elastic  91429350 Jul 10 16:53 elasticsearch-6.3.1.tar.gz
-rw-r--r-- 1 elastic elastic  12479165 Jul 10 16:53 filebeat-6.3.1-linux-x86_64.tar.gz
-rw-r--r-- 1 elastic elastic 121962496 Jul 10 16:53 kibana-6.3.1-linux-x86_64.tar.gz
-rw-r--r-- 1 elastic elastic 143870472 Jul 10 16:53 logstash-6.3.1.tar.gz
-rw-r--r-- 1 elastic elastic    551290 Jul 10 16:54 postgresql-9.1-901-1.jdbc4.jar

4. Extract Elasticsearch using the following command:

tar -zxf elasticsearch-6.3.1.tar.gz

5. Start Elasticsearch:

./elasticsearch-6.3.1/bin/elasticsearch -E network.host=0.0.0.0

6. Wait for Elasticsearch to start up. You should see a message showing that it
started. (The output order may vary a bit.)

[2018-07-10T19:20:34,138][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServ
[2018-07-10T19:20:34,139][INFO ][o.e.n.Node ] [Y-fW5A
[2018-07-10T19:20:34,159][WARN ][o.e.x.s.a.s.m.NativeRoleMappingSt
[2018-07-10T19:20:34,222][INFO ][o.e.g.GatewayService ] [Y-fW5A
[2018-07-10T19:20:34,365][INFO ][o.e.c.m.MetaDataIndexTemplateServ
[2018-07-10T19:20:34,418][INFO ][o.e.c.m.MetaDataIndexTemplateServ
[2018-07-10T19:20:34,438][INFO ][o.e.c.m.MetaDataIndexTemplateServ
[2018-07-10T19:20:34,470][INFO ][o.e.c.m.MetaDataIndexTemplateServ
[2018-07-10T19:20:34,516][INFO ][o.e.c.m.MetaDataIndexTemplateServ
[2018-07-10T19:20:34,551][INFO ][o.e.c.m.MetaDataIndexTemplateServ
[2018-07-10T19:20:34,583][INFO ][o.e.c.m.MetaDataIndexTemplateServ
[2018-07-10T19:20:34,692][INFO ][o.e.c.m.MetaDataIndexTemplateServ
[2018-07-10T19:20:34,738][INFO ][o.e.l.LicenseService ] [Y-fW5A

7. Within Strigo, open a new console by clicking the plus sign "+" next to the tab
of your current console, and you should see a command prompt:

8. SSH onto server1:

ssh server1

9. The first dataset you are going to ingest into Elasticsearch is the blogs dataset. You will use Logstash, which has already been downloaded. Extract Logstash using the following command:

tar -zxf logstash-6.3.1.tar.gz

10. Logstash is going to retrieve the blogs from a SQL database and index them
into your running Elasticsearch instance. This is accomplished using a
Logstash config file that has been provided for you named blogs_sql.conf.
You do not need to modify this file in any way, but here is what it looks like. Notice the "input" reads all blogs from a table named "blogs", and the "output" is an Elasticsearch index named "blogs":

input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://db_server:5432/"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_driver_library => "/home/elastic/postgresql-9.1-901-1.jdbc4.jar"
    jdbc_user => "postgres"
    jdbc_password => "password"
    statement => "SELECT * from blogs"
  }
}

filter {
  mutate {
    remove_field => ["@version", "host", "message", "@timestamp"]
  }
}

output {
  stdout { codec => "dots" }
  elasticsearch {
    index => "blogs"
  }
}

11. To run Logstash with this config file, execute the following command:

./logstash-6.3.1/bin/logstash -f datasets/blogs_sql.conf

Logstash will take some time to start, so make sure to wait before you go to the
next step. Eventually the output will show dots, and each dot represents a
document being indexed into Elasticsearch. When Logstash is finished, you will
see a "Pipeline has terminated" message:

..................................................................
[2018-07-10T19:24:24,074][INFO ][logstash.pipeline ] Pipeli

12. The "blogs" table in the database contains 1594 entries. Let's verify that all
those entries are in Elasticseach. Execute the following command:

curl 'localhost:9200/blogs/_count?pretty'

You should see the following response:

{
"count" : 1594,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
}
}

13. Now that we have all blogs indexed, let's see how the search page works.
Access the blogs search page and play around with it.

Congratulations, the blog data is in Elasticsearch and the webpage is working!


Make sure to familiarize yourself with different searches, filtering by category
and date, and also paging. Over the next chapters you are going to learn how
to write the queries and aggregations used to build this application.

14. Next, let's index the access logs dataset. You will use Filebeat to monitor the
log files and ingest every line. Filebeat is downloaded already on server1, but
you need to extract it using the following command:

tar -zxf filebeat-6.3.1-linux-x86_64.tar.gz


15. Run the following command to start Filebeat using the filebeat.yml
configuration file that has been provided. There will be no output - the
command prompt will just hang:

./filebeat-6.3.1-linux-x86_64/filebeat -c datasets/filebeat.yml

16. While Filebeat is running, open a third Terminal tab in the Strigo UI and SSH
onto server1 again:

ssh server1

17. There are 1,751,476 events that occurred between 2017-04-01 and 2017-09-01 across multiple log files. They will be ingested into 3 indices called "logs_server1", "logs_server2" and "logs_server3". Use the following command to check how many events have been ingested so far:

curl 'localhost:9200/logs_server*/_count?pretty'

The ingestion might take a few minutes, so feel free to go to the next task before all logs are indexed. When all of the log events are indexed, you can kill the Filebeat process using Ctrl+c.
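
You can also watch per-index progress with the _cat indices API, which shows a document count for each index (an optional check, same curl style as above):

curl 'localhost:9200/_cat/indices/logs_server*?v'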

18. Now that you have the blogs and access log data in Elasticsearch, you will use
Kibana to analyze it. First, extract Kibana with the following command:

tar -zxf kibana-6.3.1-linux-x86_64.tar.gz

19. Start Kibana:

./kibana-6.3.1-linux-x86_64/bin/kibana --host=0.0.0.0

You should see the following output:

log [19:34:04.836] [info][license][xpack] Imported license inform
log [19:34:34.398] [info][listening] Server running at http://0.0

20. Now that Kibana is running, click here to connect.

21. You need to tell Kibana which indexes to analyze, which is accomplished by
defining an Index Pattern:
Click on the Management link in the left-hand toolbar of Kibana, then click
on Index Patterns:

Type logs_server* in the Index pattern field:

Click Next step and select @timestamp as the Time Filter field name
field:

Click the Create index pattern button. You should see a page that shows
the 39 fields of the web log documents.

22. Now that we have set up our index pattern, let's visualize the data. You will first
create a table with the top 5 most popular blogs:
Click on Visualize in the left-hand toolbar of Kibana, and then click the
Create a visualization button:

Select Data Table from the Data options:

Click on logs_server*. (This is the index pattern we created, which defines from which indices we will read data.)

Click on Split Rows under Buckets:

Select Terms under Aggregation:

Select originalUrl.keyword under Field:

Click on Apply changes (the button that looks like a "play" button):

Kibana tells you "No results found". That's because Kibana defaults to
showing the last 15 minutes of data. Our data is older than that. In the top
right, use the Time Picker to change Last 15 minutes into Last 2 years:

You should see the 5 most popular URLs on Elastic's blog site.
Save your visualization by clicking on Save in the top-right menu and
name the visualization "Top 5 Blogs data table". Then click the Save
button to save the visualization.
Click the "Visualize" link in the left-hand column of Kibana and you
should see your data table in the list of saved visualizations.

23. Let's build another visualization - a histogram that shows the distribution of the
size of the responses from our website:

Click on Visualize and then on the "+" button to create a new visualization:

Select Vertical Bar under Basic Charts, then select the logs_server*
index pattern.
Under Buckets choose X-Axis
Under Aggregation, choose Histogram.
Select Field: response_size and enter 5000 for the Minimum Interval.
Click Apply changes to view the bar chart:

Save the visualization, giving it the name "Response Size bar chart"

24. After creating visualizations, you can combine them onto a single dashboard:
Click on Dashboard in the left-hand toolbar of Kibana, then click on the
Create a dashboard button:

Click on the Add menu item and add both of the visualizations that you
saved earlier:

Click the Add menu item again to close the "Add Panels" dialog
Feel free to resize the visualizations and move them around:

Save your dashboard by clicking Save in the menu and naming your
dashboard "Web Logs Dashboard". You should now see your saved
dashboard on the "Dashboard" page of Kibana, so you can come back
and view the dashboard at any time.

Summary: You have successfully started and used Elasticsearch, Kibana, Filebeat,
and Logstash. You used Logstash to connect to the PostgreSQL database and read
all records from the blogs table and write them as documents into Elasticsearch.
You used Filebeat to monitor log files and write log events into Elasticsearch. Then,
you used Kibana to read that data and transform it into nice visualizations,
combined onto a single dashboard. This is a little bit of what the Elastic Stack can
do. The remainder of this course will focus on the heart of the Elastic Stack:
Elasticsearch!

End of Lab 1

Lab 2: Getting Started with Elasticsearch

Objective: In this lab, we will use Kibana's Console to interact with Elasticsearch.
We will check basic cluster information and then change a few cluster
configurations. Finally, we will define the structure of a few documents and perform
CRUD operations.

1. Elasticsearch and Kibana are already running from the previous lab. In Kibana,
click on the Dev Tools button in the side navigation pane to open the Console
application:

2. Kibana has several handy keyboard shortcuts that will make your time spent in
Console more efficient. To view the keyboard shortcuts, click on "Help" in the
upper-right corner. For example, pressing Ctrl + Enter (Cmd + Enter on
a Mac) submits a request without having to use your mouse.

TIP: Click on "Help" again to close the Help panel.

3. Notice there is a match_all query already written in the Console. Go ahead and run it by clicking the green "play" button that appears to the right of the command, or using the Ctrl/Cmd + Enter keyboard shortcut. This query hits all documents in all indexes of your cluster, but by default only 10 documents are returned.

4. The match_all query is implicit if no query is defined. Below the default match_all request, enter the following request and see that the results are the same.

GET _search

5. Below the previous _search request, enter the following request to get basic
cluster and node information.

GET /

When you execute this request, you should see a response similar to:

{
  "name": "Y-fW5AH",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "14TRgGKJSA2zI0wsP98DAg",
  "version": {
    "number": "6.3.1",
    "build_flavor": "default",
    "build_type": "tar",
    "build_hash": "eb782d0",
    "build_date": "2018-06-29T21:59:26.107521Z",
    "build_snapshot": false,
    "lucene_version": "7.3.1",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}

What is the version of your Elasticsearch instance?
Answer:
The version of the node is the value of the "version.number" field (6.3.1 above).
What is the name of your node?
Answer:
The name of the node is the value of the "name" field (Y-fW5AH above, but it will be different for your node).
What is the name of your cluster?
Answer:
In the output above, the cluster is named "elasticsearch". Your cluster should be named "elasticsearch" also.
How did these names get assigned to your node and cluster?
Answer:
The cluster name "elasticsearch" is the default, and the node name is randomly generated.
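
For example, both names could have been overridden at startup with the -E flag used earlier (shown only as an illustration; in the next steps you will set them via elasticsearch.yml instead):

./elasticsearch-6.3.1/bin/elasticsearch -E cluster.name=my_cluster -E node.name=node1 -E network.host=0.0.0.0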

6. If you want to see what data you have in an Elasticsearch cluster, you can run
the following command to get a list of all indices:

GET /_cat/indices?v

You should see a response similar to:

health status index        uuid                   pri rep docs.cou
yellow open   logs_server3 RIQi8NtxROCeTtNIgpjF2A   5   1     5848
yellow open   blogs        Kprg3NgVQ_uMLKFtgvu5YA   5   1      159
yellow open   logs_server2 dbVJ3RxTTQagltf3iKp-FA   5   1    58459
yellow open   logs_server1 lSCHs2CURZ-VnkjyxoyEyw   5   1     5820
green  open   .kibana      n59ZjM5KTDiVl7bBXzfROg   1   0

Run the command again, but without the ?v.

What does ?v add to the response?
Answer:
?v is used to include the column header in the response.
How many indices exist in the cluster and what are their names?
Answer:
There are 5 indices: .kibana, blogs, logs_server1, logs_server2 and logs_server3.
How many documents are indexed in the blogs index?
Answer:
There are 1594 documents indexed and the size of the index is approximately 6.1mb.
What about the other values?
Answer:
We are going to discuss them in future chapters!
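
As a side note, the _cat APIs can also return only the columns you ask for via the h parameter (an optional variation, not needed for this lab):

GET _cat/indices?v&h=index,docs.count,store.size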

7. Stop Elasticsearch by hitting Ctrl+c in the terminal tab where you have
Elasticsearch running. You should see the following output, and the command
prompt should also appear:

[2018-02-17T17:42:47,966][INFO ][o.e.n.Node ] [DES8K
[2018-02-17T17:42:48,002][INFO ][o.e.n.Node ] [DES8K
[2018-02-17T17:42:48,002][INFO ][o.e.n.Node ] [DES8K
[2018-02-17T17:42:48,011][INFO ][o.e.n.Node ] [DES8K
[elastic@server1 ~]$

8. The current cluster and node names have been automatically defined. Let's
change both to values that make more sense:
Open the elasticsearch.yml file using a text editor of your choice:
nano, emacs, or vi. If you are not familiar with command-line editors, we
suggest you use nano (click here for tips on using nano). You will find the
file in the elasticsearch-6.3.1/config directory. Notice that every
line in this file is commented out. For example, if you want to use nano:

nano elasticsearch-6.3.1/config/elasticsearch.yml

Set the name of the cluster to "my_cluster":

# Use a descriptive name for your cluster:
#
cluster.name: my_cluster

Save your changes to elasticsearch.yml. To save changes to a file using nano, hit Ctrl+x to exit nano, then press "y" to save your changes. Hit "Enter" when prompted for a file name.
Run the following command to start Elasticsearch with the node name node1:

./elasticsearch-6.3.1/bin/elasticsearch -E node.name=node1 -E network.host=0.0.0.0

9. Now check both cluster and indices information to see what changed:

GET /
GET /_cat/indices?v

The output should be very similar to the previous execution, except that the
cluster and node names have changed. Elasticsearch internally uses cluster
and node IDs - the cluster and node names are there for humans. This means
that you can change cluster and node names at any point without affecting the
existing data, but you still need to restart the cluster (so it's probably not
something you would do very often in your Production environment).

10. Elasticsearch is running using the default heap size, which is 1GB. Let's
change it to 512MB. Open the elasticsearch-
6.3.1/config/jvm.options file using a text editor of your choice. Update
both min (-Xms) and max (-Xmx) heap sizes to 512MB:

# Xms represents the initial size of total heap space


# Xmx represents the maximum size of total heap space

-Xms512m
-Xmx512m

11. Restart the node so your changes can take effect.

12. Imagine a dataset that is row oriented (e.g. a spreadsheet or a traditional database). How would you write a JSON document based on sample entries that look like the following? Think about field structure and empty fields:

id  title                   category     date           author_first_name  author_last_name  author_company
1   Better query execution  Engineering  July 15, 2015  Adrien             Grand             Elastic
2   The Story of Sense                   May 28, 2015   Boaz               Leskes
Answer:
Each entry contains a flattened structure, so author_first_name is not explicitly related to author_last_name and author_company. While in relational databases you can use relations to model this, in document-oriented stores we can preserve structures within a single document. So the first entry could be written as:

{
  "id": "1",
  "title": "Better query execution",
  "category": "Engineering",
  "date":"July 15, 2015",
  "author":{
    "first_name": "Adrien",
    "last_name": "Grand",
    "company": "Elastic"
  }
}

Notice that author is now an object with 3 fields. For the second entry we can
do the same thing to the author structure, but there are also empty fields. In a
document store, empty fields can be just omitted.

{
  "id": "2",
  "title": "The Story of Sense",
  "date":"May 28, 2015",
  "author":{
    "first_name": "Boaz",
    "last_name": "Leskes"
  }
}

Notice we don't have category or author.company fields in this document. Other options would be to index these fields with a null value (similar effect to omitting) or empty strings (completely different effect: the field exists and its value is an empty string).

13. Now that we have defined the documents, let's index them. Notice that the id
field defined inside the documents is just a normal field like any other. The
actual document id is defined in the request URL when you index the
documents. Index both JSON documents into the my_blogs index. Use _doc
for the type and their respective ids.
Answer:

PUT my_blogs/_doc/1
{
  "id": "1",
  "title": "Better query execution",
  "category": "Engineering",
  "date":"July 15, 2015",
  "author":{
    "first_name": "Adrien",
    "last_name": "Grand",
    "company": "Elastic"
  }
}

PUT my_blogs/_doc/2
{
  "id": "2",
  "title": "The Story of Sense",
  "date":"May 28, 2015",
  "author":{
    "first_name": "Boaz",
    "last_name": "Leskes"
  }
}

14. The index operation can also be executed without specifying the id. In such a
case, you should use a POST instead of a PUT and an id will be generated
automatically. Index the following document without an id and check the _id in
the response. Make sure you use POST.

{
"id": "57",
"title": "Phrase Queries: a world without Stopwords",
"date":"March 7, 2016",
"category": "Engineering",
"author":{
"first_name": "Gabriel",
"last_name": "Moskovicz"
}
}

Answer:

POST my_blogs/_doc/
{
"id": "57",
"title": "Phrase Queries: a world without Stopwords",
"date":"March 7, 2016",
"category": "Engineering",
"author":{
"first_name": "Gabriel",
"last_name": "Moskovicz"
}
}

15. Use a GET command to retrieve the document with id of 1 and type _doc from
the my_blogs index.
Answer:

GET my_blogs/_doc/1

16. Delete the document with the id of 2 from the my_blogs index. Verify it was
deleted by trying to GET it again. You should get the following response:

{
"_index": "my_blogs",
"_type": "_doc",
"_id": "2",
"found": false
}

Answer:

DELETE my_blogs/_doc/2
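
To verify the deletion, run the GET request again; it should now return "found": false:

GET my_blogs/_doc/2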

17. Finally, delete the my_blogs index.
Answer:

DELETE my_blogs

Summary: In this lab, you learned how to use Console to interact with Elasticsearch. You also checked basic cluster information and changed a few cluster configurations. Finally, you defined the structure of a few documents and performed CRUD operations.

End of Lab 2

Lab 3: Querying Data


Objective: In this lab, you will write various queries that search the documents in the blogs index using the Search API. You will use queries like match, match_phrase, range and bool.

1. Write and execute a query that matches all the documents in the blogs index.
You should have a total of 1594 hits:

"hits": {
"total": 1594

Answer:

GET blogs/_search
{
"query": {
"match_all": {}
}
}

2. Add the "size" parameter to your previous request and set it to 100. You
should now see 100 blogs in the results.
Answer:

GET blogs/_search
{
"size": 100,
"query": {
"match_all": {}
}
}

3. Write and execute a query that hits all blog posts published in May, 2017. The response should tell you that there are 44 hits in total, but notice you only get back 10 actual hits, because the default value of size for a query is 10:

"hits": {
"total": 44

Answer:
A date range query works well here:

GET blogs/_search
{
"query": {
"range": {
"publish_date": {
"gte": "2017-05-01",
"lte": "2017-05-31"
}
}
}
}

4. Write and execute a match query for blogs that have the term "elastic" in the "title" field. You should get 259 hits:

"hits": {
"total": 259,

Answer:

GET blogs/_search
{
"query": {
"match": {
"title": "elastic"
}
}
}

5. Now run a match query for "elastic stack" in the title field. Why did
the number of hits go up by adding a term to the match query?

"hits": {
"total": 262,

Answer:

GET blogs/_search
{
"query": {
"match": {
"title": "elastic stack"
}
}
}

The search returns blogs with "elastic" or "stack", so there are additional hits
from documents with only "stack" in the title.

6. Your search for "elastic stack" cast a very wide net. The top hits look
good, but the precision is not great for many of the lower-scoring hits,
especially those products that have "elastic" in the name but not "stack".
Change the operator of your previous match query to and, then run it again.
Notice this increases the precision, as there are now only 70 hits:

"hits": {
"total": 70,

Answer:

GET blogs/_search
{
  "query": {
    "match": {
      "title": {
        "query": "elastic stack",
        "operator": "and"
      }
    }
  }
}

7. Write a query for each of the following searches:

Blogs that have the word "search" in their "content" field.
Answer:
There are 432 blogs with the word "search" in their "content" field.

GET blogs/_search
{
"query": {
"match": {
"content": "search"
}
}
}

Blogs that have "search" or "analytics" in their "content" field.
Answer:
There are 476 blogs with the terms "search" or "analytics" in their "content" field.

GET blogs/_search
{
"query": {
"match": {
"content": "search analytics"
}
}
}

Blogs that have "search" and "analytics" in their "content" field.
Answer:
There are 110 blogs with the terms "search" and "analytics" in their "content" field.

GET blogs/_search
{
"query": {
"match": {
"content": {
"query": "search analytics",
"operator": "and"
}
}
}
}

8. Run a match_phrase search for search analytics in the content field that returns the top 3 hits. You should get 6 hits total.
Answer:
There are 6 blogs with the terms search analytics in the content field where the term analytics is one position after the term search.

GET blogs/_search
{
"size": 3,
"query": {
"match_phrase": {
"content": "search analytics"
}
}
}

9. The phrase "search and analytics" is fairly common in the blog content.
Update the previous match_phrase query so that it allows for 1 term (any
word - not just "and") to appear between "search" and "analytics". How
many hits do you see now?
Answer:
There are 41 blogs with the terms "search analytics" in the "content"
field where the term analytics is one or two positions after the term search.

GET blogs/_search
{
"query": {
"match_phrase": {
"content": {
"query": "search analytics",
"slop": 1
}
}
}
}

10. Run a query that answers the question: "Which blogs have performance or
optimizations or improvements in the content field?" You should get the
following hits:

"hits": {
"total": 374,

Answer:

GET blogs/_search
{
"query": {
"match": {
"content": "performance optimizations improvements"
}
}
}

11. Run a query that answers the question: "Which blogs have a content field
that includes at least 2 of the terms performance or optimizations or
improvements?" You should get the following hits this time:

"hits": {
"total": 82,

Answer:

GET blogs/_search
{
  "query": {
    "match": {
      "content": {
        "query" : "performance optimizations improvements",
        "minimum_should_match" : 2
      }
    }
  }
}

12. Let's analyze the hits from the "performance optimizations improvements" search:
What was the maximum _score?
Answer:
The exact max score may vary, but should be around 9.7.
If you were searching for "performance optimizations improvements" to do some fine tuning in your deployment, do you see any issues with some of the results from the group of documents with the highest score?
Answer:
Most of them talk about improvements in a specific release of the Elastic Stack. Those blog posts don't talk about performance tuning a cluster.

13. It looks like releases usually come with performance optimizations and
improvements. Assuming you do not want to upgrade your deployment, update
the previous query so that it must_not contain released or releases or
release in the title field. Your top hits look better now, and notice the
number of total hits dropped down to 47.
Answer:

GET blogs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": {
              "query": "performance optimizations improvements",
              "minimum_should_match": 2
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "title": "release releases released"
          }
        }
      ]
    }
  }
}

14. In the previous query, let's say we are more interested in blogs about Elasticsearch. How could you rank the results so that the documents that mention "Elasticsearch" score higher?
Answer:
There is no single "correct" answer here, but adding a should clause to the previous query to match "elasticsearch" in the "category", "content" or "title" field produces some relevant top hits. Below is an example using the "title" field:

GET blogs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": {
              "query": "performance optimizations improvements",
              "minimum_should_match": 2
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "title": "release releases released"
          }
        }
      ],
      "should": [
        {
          "match": {
            "title": "elasticsearch"
          }
        }
      ]
    }
  }
}

Summary: In this lab, you became familiar with how to write search queries, one of
the most important and frequent tasks that an Elasticsearch Engineer needs to
perform. We still have a lot of search details to discuss in this course, but this lab
gives you an idea of how quickly and easily Elasticsearch can be used to search a
dataset.

End of Lab 3

Lab 4: Text Analysis and Mappings


Objective: In this lab, you will become familiar with how to read an index mapping
and how to define your own mapping. Also, you will practice with the _analyze API
to explore the different ways to analyze text. Finally, you will use your text analysis
and mappings skills to re-create the blogs index with a better configuration.

1. Let's start this lab by analyzing the dynamic mapping behavior. Index the
following sample document into an index named tmp_blogs. Use "_doc" for
the type.

{
  "title": "Elastic Cloud and Meltdown",
  "publish_date": "2018-01-07T23:00:00.000Z",
  "author": "Elastic Engineering",
  "category": "Engineering",
  "content": "Elastic is aware of the Meltdown and Spectre vulnerabilities",
  "url": "/blog/elastic-cloud-and-meltdown",
  "locales": ["de-de","fr-fr"]
}

Answer:

PUT tmp_blogs/_doc/1
{
  "title": "Elastic Cloud and Meltdown",
  "publish_date": "2018-01-07T23:00:00.000Z",
  "author": "Elastic Engineering",
  "category": "Engineering",
  "content": "Elastic is aware of the Meltdown and Spectre vulnerabilities",
  "url": "/blog/elastic-cloud-and-meltdown",
  "locales": ["de-de","fr-fr"]
}

2. View the mapping of tmp_blogs that was automatically created.
Answer:

GET tmp_blogs/_mappings

Elasticsearch should return the following:

{
  "tmp_blogs": {
    "mappings": {
      "_doc": {
        "properties": {
          "author": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "category": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "locales": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "publish_date": {
            "type": "date"
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "url": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

3. What would you change, with regard to data types, in the current mapping of tmp_blogs in order to make it production-ready for our blogs dataset?
Answer:
Let's analyze every field mapped:
The publish_date field is correctly mapped as date.
The author field is a string and is mapped as both text and keyword, which is great as we can search for authors, but also aggregate on them (aggregations are discussed later). Similarly, the title and url fields are mapped correctly, as we may want to search and sort on those fields.
locales is an array of fixed strings and there is no need to search on it, so we could update it to keyword only.
The category field could sometimes be used to search, but in general it is a drop-down menu or a list in the website. That means we can also map it as a keyword-only field.
We will never use the content field to sort or aggregate, only search. So, we can change the mapping to be text only. However, the content can be in a language other than English (e.g. French or German). Maybe we should add multi-fields to also analyze this field in those languages. We will discuss multi-fields a bit later in this lab.

4. Now that we know what the mapping should look like, delete the tmp_blogs index and recreate it with the correct mapping.
Answer:

DELETE tmp_blogs

PUT tmp_blogs
{
  "mappings": {
    "_doc": {
      "properties": {
        "publish_date": {
          "type": "date"
        },
        "author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "category": {
          "type": "keyword"
        },
        "content": {
          "type": "text"
        },
        "locales": {
          "type": "keyword"
        },
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "url": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

5. Index the sample document from step 1:

PUT tmp_blogs/_doc/1
{
  "title": "Elastic Cloud and Meltdown",
  "publish_date": "2018-01-07T23:00:00.000Z",
  "author": "Elastic Engineering",
  "category": "Engineering",
  "content": "Elastic is aware of the Meltdown and Spectre vulnerabilities",
  "url": "/blog/elastic-cloud-and-meltdown",
  "locales": ["de-de","fr-fr"]
}

6. Search for engineering in the category field. Why is the document not
returned? Now search for Engineering. Does it work? Why?
Answer:
The first search for engineering (lowercase "e") is not a match because
category is a keyword field (which means that there is no text analysis) and
the document value has an uppercase "e". Therefore, only the second query
returns a hit.

GET tmp_blogs/_search
{
  "query": {
    "match": {
      "category": "engineering"
    }
  }
}

GET tmp_blogs/_search
{
  "query": {
    "match": {
      "category": "Engineering"
    }
  }
}

7. Now that you are more familiar with mappings, let's take a look into text analysis. Test the following text using the _analyze API. Do not specify an analyzer - just use the default one:

"Introducing beta releases: Elasticsearch and Kibana Docker images"

Answer:

GET _analyze
{
  "text": "Introducing beta releases: Elasticsearch and Kibana Docker images"
}

8. What was the default analyzer used in the previous step?
Answer:
The standard analyzer is the default.
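
You can confirm this by naming the analyzer explicitly; the output should be identical to the previous step:

GET _analyze
{
  "analyzer": "standard",
  "text": "Introducing beta releases: Elasticsearch and Kibana Docker images"
}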

9. Test the text again, but use the whitespace analyzer. What is the difference?
Answer:

GET _analyze
{
  "analyzer": "whitespace",
  "text": "Introducing beta releases: Elasticsearch and Kibana Docker images"
}

The whitespace analyzer does not lowercase terms and does not remove
punctuation.

10. Change the analyzer a few more times, testing the stop, keyword and
english analyzers on the same text and comparing the different outputs.
Which analyzer do you think works the best for the blogs website?
Answer:

GET _analyze
{
  "analyzer": "stop",
  "text": "Introducing beta releases: Elasticsearch and Kibana Docker images"
}

GET _analyze
{
  "analyzer": "keyword",
  "text": "Introducing beta releases: Elasticsearch and Kibana Docker images"
}

GET _analyze
{
  "analyzer": "english",
  "text": "Introducing beta releases: Elasticsearch and Kibana Docker images"
}

It is hard to say which one is the best. Each dataset is different and has its own
details. You must test a lot and see which analyzer gives you the best results.
Often, you use a combination of multiple analyzers in different multi-fields. In
the blogs dataset, both the stop and the english analyzer seem like a good
choice.

11. Let's go one level lower than analyzers. Analyze the following text using the
standard tokenizer and the lowercase and snowball filters. Now, compare
the results with the output of the english analyzer. What is different between
these two techniques?

"text": "This release includes mainly bug fixes."

Answer:

GET _analyze
{
"tokenizer": "standard",
"filter": ["lowercase","snowball"],
"text": "This release includes mainly bug fixes."
}

GET _analyze
{
"analyzer": "english",
"text": "This release includes mainly bug fixes."
}

Notice the english analyzer does not use the snowball stemmer. It uses the
english_stemmer, which stems "mainly" to "mainli" (instead of "main" for the
snowball stemmer). The english analyzer also removes stop words.

12. Using _analyze, configure and test an analyzer that satisfies the following:
uses the standard tokenizer
uses the lowercase token filter
uses the asciifolding token filter
Test your analyzer with the exact text here by copy-and-pasting it:

"text": "Elasticsearch é um motor de buscas distribuído."

Run the command with and without asciifolding to see its effect on special
characters.
Answer:

GET _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "asciifolding"],
  "text": "Elasticsearch é um motor de buscas distribuído."
}
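
For the with/without comparison the step asks for, here is the same request without asciifolding; notice the accented characters are kept in the output tokens:

GET _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "Elasticsearch é um motor de buscas distribuído."
}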

13. In many use cases, the built-in analyzers are not perfect. Imagine we want to search for c++ or IT. Neither the standard nor the english analyzer will help much. Test the following sentence using the analyze API:

"text": "C++ can help it and your IT systems."

Answer:

GET _analyze
{
"analyzer": "english",
"text": "C++ can help it and your IT systems."
}

14. Next, we are going to build our own analyzer! Create an index named
analysis_test that has an analyzer named my_analyzer which satisfies
the following:
allow queries for C++ to match only documents that contain C++ (transform
c++ and C++ into cpp)
allow queries for IT to match only documents that contain IT and not it.
(for example, transform IT into _IT_ before lowercase)
lowercase all text
remove the default stop words
remove the following terms as well: can, we, our, you, your, all
Answer:


PUT analysis_test
{
  "settings": {
    "analysis": {
      "char_filter": {
        "cpp_it": {
          "type": "mapping",
          "mappings": ["c++ => cpp", "C++ => cpp", "IT => _IT_"]
        }
      },
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": ["can", "we", "our", "you", "your", "all"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": ["cpp_it"],
          "filter": ["lowercase", "stop", "my_stop"]
        }
      }
    }
  }
}

GET analysis_test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "C++ can help it and your IT systems."
}

15. Run the following two queries, which search for blogs that contain c++ in the content field and IT in the title field. Notice you get bad results in both cases, which is because of how the default standard analyzer processes c++ and IT.

GET blogs/_search
{
"query": {
"match": {
"content": "c++"
}
}
}

GET blogs/_search
{
"query": {
"match": {
"title": "IT"
}
}
}

16. Let's set up a new blogs index with the same data, but using a better, more appropriate analyzer:
Create a new index named blogs_analyzed that uses your custom
my_analyzer from the previous step
Add a multi-field to both the content and title fields named
my_analyzer. These multi-fields should be of type text and set the
analyzer to my_analyzer.
Answer:

PUT blogs_analyzed
{
  "settings": {
    "analysis": {
      "char_filter": {
        "cpp_it": {
          "type": "mapping",
          "mappings": [
            "c++ => cpp",
            "C++ => cpp",
            "IT => _IT_"
          ]
        }
      },
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": [
            "can",
            "we",
            "our",
            "you",
            "your",
            "all"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "cpp_it"
          ],
          "filter": [
            "lowercase",
            "stop",
            "my_stop"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "category": {
          "type": "keyword"
        },
        "content": {
          "type": "text",
          "fields": {
            "my_analyzer": {
              "type": "text",
              "analyzer": "my_analyzer"
            }
          }
        },
        "publish_date": {
          "type": "date"
        },
        "locales": {
          "type": "keyword"
        },
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "my_analyzer": {
              "type": "text",
              "analyzer": "my_analyzer"
            }
          }
        },
        "url": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

17. Run the following command to index the current blogs into your new blogs_analyzed index:

POST _reindex
{
  "source": {"index": "blogs"},
  "dest": {"index": "blogs_analyzed"}
}

18. Rerun the searches in the new index using the .my_analyzer fields and compare results.
Answer:
You should only get 1 hit for "IT" and 2 hits for "c++" on the .my_analyzer fields:

GET blogs_analyzed/_search
{
  "query": {
    "match": {
      "title.my_analyzer": "IT"
    }
  }
}

GET blogs_analyzed/_search
{
  "query": {
    "match": {
      "content.my_analyzer": "c++"
    }
  }
}

19. OPTIONAL: Our other dataset is a collection of access logs, with a sample document shown below. Put this document into a new index named logs_temp and view the mapping that Elasticsearch creates for you (see the sketch after the sample document). How effective is this default mapping? Notice that in time-series log data we probably don't need a lot of fields that are both text and keyword.

{
  "@timestamp": "2017-07-17T00:57:51.548Z",
  "method": "GET",
  "level": "info",
  "user_agent": "Amazon CloudFront",
  "language": {
    "url": "/blog/introducing-machine-learning-for-the-elastic-stack",
    "code": "ko-kr"
  },
  "response_size": 54149,
  "geoip": {
    "country_name": "Singapore",
    "continent_code": "AS",
    "country_code3": "SG",
    "location": {
      "lon": 103.8,
      "lat": 1.3667
    },
    "country_code2": "SG"
  },
  "host": "server1",
  "status_code": 200,
  "originalUrl": "/kr/blog/introducing-machine-learning-for-the-elastic-stack",
  "runtime_ms": 109,
  "http_version": "1.1"
}

Summary: In this lab, you became familiar with how to read an index mapping and
how to define your own mapping. Also, you viewed the behavior of the various
analyzers using the _analyze API. Finally, you used your text analysis and
mappings skills to re-create the blogs index with a better configuration.

End of Lab 4

Lab 5: The Distributed Model


Objective: In this lab, you will become familiar with the cluster state, node roles and
shard distribution. You will create new indices with a specific number of shards and
replicas. Then, you will add two extra nodes to the cluster and analyze shard
allocation. Finally, you will update the current nodes to become dedicated nodes.

1. Before you make changes to your cluster, let's start by understanding its
current state. Use the Cluster State API to answer the following questions.
What is the cluster name?
How many nodes are there in the cluster?
Which node is the elected master node?
How many indexes are there in the cluster?
How many shards does the index logs_server2 have? And in which
node are they allocated?

You probably know the answers to these questions because your current
cluster only has 1 node and uses a lot of default settings, but these are great
questions to ask about any cluster.
Answer:

GET _cluster/state

The cluster name should be my_cluster. You only have a single node, which is the elected master node. You should have at least 5 indexes in the cluster. logs_server2 has 5 primary shards and 5 replica shards, but only the primaries are allocated, and they are all allocated to the only node in the cluster.
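
The full cluster state output is verbose. A useful variation is to request only the parts you need, for example just the elected master and the node list:

GET _cluster/state/master_node,nodes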

2. Another way to answer some of the previous questions is to use the _cat API, which often returns less information per call, but is typically easier to read. Run each of the following commands:

GET _cat/nodes?v

The nodes command gives you information about all the nodes in the cluster, including their roles. You can see that node1 is a master-eligible, data, and ingest node (mdi) and that node1 is the elected master in the cluster. (Remember to use ?v to see the header.)

GET _cat/indices?v

The indices command gives you information about the indexes in the cluster.
You can see the number of primaries, replicas, documents and deleted
documents. You can also see the size on disk for primaries and in total
(primaries + replicas). Finally, notice that every index has a name and a uuid.
(Index health and status will be discussed later.)

GET _cat/shards?v

The shards command gives you information about the shards in the cluster.
You can see information about primaries and replicas, the number of
documents, size on disk, and to which node a shard has been allocated. If you are interested in a specific index, add the index name after /shards, like the following:

GET _cat/shards/logs_server2?v

3. Create an index named test with 4 primary shards and 2 replicas.
Answer:

PUT test
{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 2
  }
}

4. Use the following to check the shard allocation of your test index. You should
see 4 primary shards and 8 unallocated replica shards.

GET _cat/shards/test?v

5. Let's scale our cluster by adding another node, but first you need to make a few
changes to node1. Stop Elasticsearch on server1.

6. Add the following settings to elasticsearch.yml on server1. Setting minimum_master_nodes to 2 is a critical step in building a 3-node cluster that has multiple master-eligible nodes:

discovery.zen.minimum_master_nodes: 2
node.name: node1
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["server1", "server2", "server3"]

7. Start Elasticsearch on server1 using the following command:

./elasticsearch-6.3.1/bin/elasticsearch

Your node will not completely start up, because there are not enough master-eligible nodes to hold an election. You should see a warning similar to the following:

[WARN ][o.e.d.z.ZenDiscovery ] [node1] not enough master nodes

8. Open a new terminal tab in the Strigo UI and ssh onto server2:

ssh server2

9. Extract Elasticsearch on server2:

tar -zxf elasticsearch-6.3.1.tar.gz

10. Start a node with the following characteristics. Configure the settings in the appropriate config files (elasticsearch.yml and jvm.options):
the name of the node is node2
it joins my_cluster
it discovers the cluster via server1 or server2 or server3
binds and publishes to the site-local address (_site_)
set discovery.zen.minimum_master_nodes to 2
set the min and max heap size to 512m
You should see the following output:

[2018-02-16T10:51:13,092][INFO ][o.e.c.s.ClusterApplierService] [n
[2018-02-16T10:51:13,321][INFO ][o.e.h.n.Netty4HttpServerTransport]
[2018-02-16T10:51:13,323][INFO ][o.e.n.Node ] [node2]

Answer:
The elasticsearch-6.3.1/config/elasticsearch.yml configuration
file should look like:

cluster.name: my_cluster
node.name: node2
network.host: _site_
discovery.zen.ping.unicast.hosts: ["server1", "server2", "server3"]
discovery.zen.minimum_master_nodes: 2

The elasticsearch-6.3.1/config/jvm.options configuration file should contain:

-Xms512m
-Xmx512m

Start Elasticsearch using the following command:

./elasticsearch-6.3.1/bin/elasticsearch

11. Use _cat/nodes to view the nodes in the cluster. You should see two nodes
now:

ip         heap.percent ram.percent cpu load_1m load_5m load_15m n
172.18.0.2           31          98  11    0.01    0.15     0.08 m
172.18.0.3           37          98  11    0.01    0.15     0.08 m

Answer:

GET _cat/nodes?v

12. Use the _cat/shards API to see the updated shard information of the test index. Notice that 4 replicas are now allocated to node2, but there are still 4
replicas not allocated (unassigned). Remember that Elasticsearch never
allocates more than 1 copy of the same shard to the same node.
Answer:

GET _cat/shards/test?v

13. Scale your cluster even more by adding a third node. Open another terminal
window in the Strigo UI and ssh onto server3:

ssh server3

14. Extract Elasticsearch on server3:

tar -zxf elasticsearch-6.3.1.tar.gz

15. Modify elasticsearch.yml on server3 with the following settings:

cluster.name: my_cluster
node.name: node3
network.host: _site_
discovery.zen.ping.unicast.hosts: ["server1", "server2", "server3"]
discovery.zen.minimum_master_nodes: 2

Also, change the heap sizes in jvm.options:

-Xms512m
-Xmx512m

16. Start Elasticsearch on server3. You should see the following output:

[2018-02-18T23:26:37,435][INFO ][o.e.b.BootstrapChecks ] [node3]


[2018-02-18T23:26:40,588][INFO ][o.e.c.s.ClusterApplierService] [n
[2018-02-18T23:26:40,892][INFO ][o.e.h.n.Netty4HttpServerTransport]
[2018-02-18T23:26:40,893][INFO ][o.e.n.Node ] [node3]

17. Use _cat/nodes to verify that you have 3 nodes in your cluster.

18. Use the _cat/shards API to see the updated shard information of the test
index. Also, look into the logs_server2 index and see how the shards have
been distributed.

GET _cat/shards/test?v&s=node,shard

Notice that test replicas are now allocated to node3. Also notice that we
added &s=node,shard to sort by node and shard, so it is easier to see the
allocation.

GET _cat/shards/logs_server2?v&s=node,shard

Notice that node1 only contains primaries, node2 only contains replicas, and
node3 contains both primary and replica shards.

19. Update the test index to have 0 replicas and check the new shard distribution.
Answer:

PUT test/_settings
{
  "settings": {
    "number_of_replicas": 0
  }
}

GET _cat/shards/test?v&s=node,shard

20. Stop the master node (probably node1) and check the terminal output in the
other nodes. Among the logged lines you will see something similar to the
following:

[2018-02-16T17:41:09,545][INFO ][o.e.d.z.ZenDiscovery ] [node2]
[2018-02-16T17:41:09,546][WARN ][o.e.d.z.ZenDiscovery ] [node2]
{node2}{a5FAssHWTRGdPFJ9QHswbQ}{LgVYmizcRj6IQm_rdUX9Ig}{172.18
{node1}{srdV3_WySF2E5D5K1Fa-fg}{B6LcisamRsaE7jbAtfwROA}{172.18
{node3}{agVRPI-WSQ6PBl4g0SZJBQ}{_XqYJaBVRZWOmmil_4VJSQ}{172.18
[2018-02-16T17:41:12,566][INFO ][o.e.c.s.MasterService ] [node2]
[2018-02-16T17:41:12,584][INFO ][o.e.c.s.ClusterApplierService] [n

Notice the other nodes realized that the elected master left the cluster, and then
elected a new master.

21. Use the _cat/nodes API to see the updated cluster information. Why do you get back a 401 error?
Answer:

GET _cat/nodes

{
  "statusCode": 401,
  "error": "Unauthorized"
}

Kibana is configured to talk to node1. As node1 is down, Kibana cannot connect to the cluster anymore. Don't worry about it now, we will get it fixed soon.

22. Next, you will practice deploying dedicated nodes. Let's say that we want
node2 to be a dedicated master-eligible node and node1 and node3 to
be dedicated data nodes.
Stop all of your nodes that are still running
Update the configuration files on the three servers to reflect the above.
You will need to set minimum_master_nodes to 1, because you will only
have 1 master-eligible node.
Answer:
Add the following to the node2 configuration file:

node.master: true
node.data: false
node.ingest: false

Add the following to the node1 and node3 configuration files:

node.master: false
node.data: true
node.ingest: true

Start node2 first
Then start node3
Finally, start node1

23. Verify your nodes are configured the way you want:

GET _cat/nodes

You should see the following:

172.18.0.2 49 96 16 0.24 0.15 0.06 di - node1
172.18.0.4 48 96 11 0.24 0.15 0.06 di - node3
172.18.0.3 32 96 15 0.24 0.15 0.06 m  * node2

You should see node2 as the master (by the asterisk in the master column),
and notice its role is only "m". Similarly, notice the role of node1 and node3 is
"di".

Summary: In this lab, you became familiar with the cluster state, node roles and
shard distribution. You created new indices with a specific number of shards and
replicas. Then, you added two extra nodes to the cluster and analyzed shard
allocation. Finally, you updated the current nodes to become dedicated nodes.

End of Lab 5

Lab 6: Troubleshooting Elasticsearch


Objective: In this lab you will become familiar with some of the errors that
Elasticsearch returns, and how to understand and fix some of them. Then, you will
execute search requests that return partial results and hunt down the cause.

1. Let's start with a very common error. Run the following command in Console.
What is the error returned? And how can you fix it?

PUT test/_doc/
{
  "test": "test"
}

Answer:
The error returned is 405 ("Method not allowed"). The error message from
Elasticsearch is saying that POST is allowed, but not PUT, which is correct as
our document does not have an id. To fix this problem you can either give an id
to the document or use a POST instead of a PUT.
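
For example, either of these requests would succeed (the POST variant lets Elasticsearch generate the id):

POST test/_doc/
{
  "test": "test"
}

PUT test/_doc/1
{
  "test": "test"
}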

2. Run the following command. What is the error returned? And how can you fix
it?

GET test1/_doc/1

Answer:
The error returned is 404 ("Not found"). The error message from Elasticsearch
is clear and states that the index does not exist. You probably want to make
sure you have the correct index name. If you do, you want to investigate why
the index does not exist or was deleted.

3. Run the following command. What is the error returned? And how can you fix
it?

GET blogs/_search
{
  "query": {
    "match": {
      "title": "open source software",
      "minimum_should_match": 2
    }
  }
}

Answer:
The error returned is a 400 ("Bad Request"). There was a parsing exception as
the match query does not support multiple fields. In order to use the
minimum_should_match parameter you need to use the extended form of
the match query, like the following:

GET blogs/_search
{
  "query": {
    "match": {
      "title": {
        "query": "open source software",
        "minimum_should_match": 2
      }
    }
  }
}

4. Run the following command. What is the error returned? And how can you fix
it?

GET blogs/_search
{"query":{"match":{"title":{"query":"open source software,"minimum_should_match":2}}}}

Hide answer:
The error returned is 500 ("Internal Server Error"). The error message from
Elasticsearch is clear and states that there was a JSON parse exception. The
JSON is indeed invalid, which often happens when the request is written on a
single line, or when your application builds the JSON as a string before
executing it. To fix the above query, add a double quote after software.
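For reference, the corrected single-line request would be:

GET blogs/_search
{"query":{"match":{"title":{"query":"open source software","minimum_should_match":2}}}}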

5. NOTE: This step and some of the following steps might have slightly
different results when you execute them, because Elasticsearch does not
always allocate the same shards to the same nodes.

Run the following commands that index two documents and run a query. What
is the error returned from the query? And how can you fix it?

PUT test1/_doc/1
{
  "date": "2017-09-10"
}

PUT test2/_doc/1
{
  "date": "September 10, 2017"
}

GET test*/_search
{
  "query": {
    "match": {
      "date": "September"
    }
  }
}

Hide answer:
No error code is returned in this example; the status returned is 200 ("OK").
However, the search results state that out of 14 shards, only 8 were successful
and that 5 shards failed. WHAT? One shard is missing?! We will look into that
in a bit. For now, by looking at the failures, you can see that there was an
exception in which September could not be parsed as a date. The example is
very simple, but this happens often when your search request contains a
wildcard in the index name and you have fields with the same name but
different types. If the date field in the test1 index should be a "string", make
sure you define the mapping before indexing the first document (or correct the
mapping and reindex the data). If the mappings are correct, feel free to ignore
the failures and work with the partial results.
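As an illustrative sketch (assuming you wanted the date field of test1 to be
treated as text), the mapping could have been defined before indexing the first
document:

PUT test1
{
  "mappings": {
    "_doc": {
      "properties": {
        "date": { "type": "text" }
      }
    }
  }
}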

6. Notice that in the previous exercise, there were a total of 14 shards:
5 from test1
5 from test2
4 from test, which was the index created in the previous chapter
Out of the 14 shards, 5 shards from test1 had an error during query execution
and, therefore, failed. However, there is one shard that is not in the results nor
in the failures. Let's investigate what is wrong.

First, check how the cluster is doing by viewing the cluster health. Is everything
ok? What is the problem?
Hide answer:

GET _cluster/health

The cluster is on red status, which means that at least one primary shard is not
available. Looking at the cluster health API response we can see that there is 1
unassigned shard.

7. Let's keep digging. Figure out which index is red and which shard is
unassigned. Feel free to use the _cluster or _cat APIs.
Hide answer:
Let's figure out which index has the problem. Using the Cluster API, the
command would be:

GET _cluster/health?level=indices

Using the _cat API:

GET _cat/indices

Notice the test index is red. Run the following to check which shard has the
issue:

GET _cluster/health/test?level=shards

Using the _cat API, the equivalent command would be:

GET _cat/shards/test

8. Now that we know which shard has the issue, use the cluster allocation explain
API to understand it. Can you discover what the problem is?
Hide answer:
You can run the short form of the allocation API because there is only one
unassigned shard in the cluster. If you have more unassigned shards in the
cluster, you would need to run the long form.

GET _cluster/allocation/explain

GET _cluster/allocation/explain
{
  "index": "test",
  "shard": 3,
  "primary": true
}

Change the value of "shard" in the request above to the shard that you
identified earlier as being unassigned.

9. Analyzing the results, you can see that no valid shard copy was found during
the cluster recovery (which happened while you were restarting the cluster). In
more detail, the primary shard existed before, but it can no longer be found on
any node in the cluster. The test index had no replicas and you changed
node2 to be a dedicated master-eligible node. So, the primary shard that was
allocated to node2 is now lost.

10. We should get the cluster back into green status! What can you do to achieve
this? Below you will find a few different solutions. Be careful, because after you
execute one of them you might not be able to try the others. Use the following
to test if the problem is fixed and all indices are green again.

GET _cat/indices

Simple solution:
Hide answer:
The data is still stored on node2. We just need to recover it with the following
steps.
Update node2 configuration to have node.data: true
Restart node2
Problem solved, but node2 is not a dedicated master-eligible node as we
wanted

Medium solution:
Hide answer:
After applying the previous solution, we need to make node2 a dedicated
master-eligible node again. But first we need to make sure we have a copy of the
previously unassigned shard on another node. One easy way of doing that is to
dynamically change the number of replicas from 0 to 1 and then change it
back, like the following:
Increase the number of replicas of index test to 1:

PUT test/_settings
{
  "settings": {
    "number_of_replicas": 1
  }
}
Update node2 configuration to have node.data: false
Restart node2
Decrease the number of replicas of index test to 0:

PUT test/_settings
{
  "settings": {
    "number_of_replicas": 0
  }
}

Problem completely solved

Advanced solution:
Hide answer:
The previous solution completely solves the problem, but it could involve a lot
of work as you increase and decrease the number of replicas. This solution
uses an advanced API that you should be very careful when using. It is called
Shard Allocation Filtering and it is discussed in the Elasticsearch Engineer II
course. Assuming the Simple solution above was applied, run the following:
Decommission node2 using its IP address:

PUT _cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "172.18.0.3"
  }
}

Update node2 configuration to have node.data: false
Restart node2 after all shards have been relocated
Problem cleanly solved
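One cleanup step worth adding (our suggestion; it is not part of the original
solution): once node2 is back as a dedicated master-eligible node, clear the
transient exclude setting so it does not affect future allocation decisions:

PUT _cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : null
  }
}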

Very simple solution:
Hide answer:
A simple solution is to delete the test index. This is ok in many scenarios
where the cluster is red because of dummy indices that were created just to
test something, like we did here.

DELETE test

Summary: In this lab you became familiar with some of the errors that Elasticsearch
returns, and how to understand and fix some of them. Then you executed search
requests that returned partial results and hunted down and fixed the cause.

End of Lab 6

Lab 7: Improving Search Results


Objective: In this lab, you will learn how to implement features that users would
expect to find in any good search application, enabling them to explore and
retrieve the documents most relevant to their needs while making efficient use
of system resources.

1. Start by running a query on the blogs index for "open source" in the
content field. Then run a second query for "open source" on the "title"
field. Compare the total hits and top hits returned from both queries. Do the top
hits look relevant? Which query had more hits?
Hide answer:


GET blogs/_search
{
  "query": {
    "match": {
      "content": "open source"
    }
  }
}

GET blogs/_search
{
  "query": {
    "match": {
      "title": "open source"
    }
  }
}

Searching for "open source" in the "title" returns hits that look more
relevant, and it also returns far fewer hits.

2. Combine the hits of the last two searches by writing a multi_match query
that searches both the "content" and "title" fields for "open source".
Does the multi_match deliver more or fewer hits? How did this affect the
relevance?
Hide answer:

GET blogs/_search
{
  "query": {
    "multi_match": {
      "query": "open source",
      "fields": [
        "title",
        "content"
      ]
    }
  }
}

The multi_match delivered many more hits, and the top hits maintained their
relevance.

3. Modify your multi_match query by giving the title field a boost of 2. How
does the score of the top hit compare to the previous query without the boost?
Hide answer:

GET blogs/_search
{
  "query": {
    "multi_match": {
      "query": "open source",
      "fields": [
        "title^2",
        "content"
      ]
    }
  }
}

The score of the top hit was multiplied by 2.

4. Boost affects the score without impacting recall or precision. Remove the boost
and modify your multi_match query to perform a phrase query, which
increases precision (perhaps at the expense of recall). Did the increase in
precision return more or fewer hits?
Hide answer:

GET blogs/_search
{
  "query": {
    "multi_match": {
      "query": "open source",
      "fields": [
        "title",
        "content"
      ],
      "type": "phrase"
    }
  }
}

The increased precision reduced the number of hits.

5. Let's see what happens when a user misspells their query. Run the following
request, which searches for "oven sauce" in the "title" field. Notice
_source is used to filter the JSON response to show only the title field of
the document:

GET blogs/_search
{
  "_source": "title",
  "query": {
    "match": {
      "title": "oven sauce"
    }
  }
}

Were there any hits returned?
Hide answer:
The query does not return any hits.

6. Try increasing the recall (perhaps at the expense of precision) by adding the
fuzziness parameter, permitting a maximum of 2 edits per word. Did the
increased recall return more or fewer hits? How relevant are they?

Hide answer:

GET blogs/_search
{
  "_source": "title",
  "query": {
    "match": {
      "title": {
        "query": "oven sauce",
        "fuzziness": 2
      }
    }
  }
}

There were lots of hits. Many of them do not seem relevant: allowing 2 edits on
a 4-letter word with 2 vowels is not great for precision.

7. Modify your query so that Elasticsearch uses the auto fuzziness level. Were
more or fewer hits returned? How relevant are they?
Hide answer:

GET blogs/_search
{
  "_source": "title",
  "query": {
    "match": {
      "title": {
        "query": "oven sauce",
        "fuzziness": "auto"
      }
    }
  }
}


There are fewer hits, and they seem more relevant than a majority of the hits
from the previous request. With "auto", terms of 3 to 5 characters allow only
one edit, so both "oven" and "sauce" permit a single edit instead of two.

8. Write a match_phrase query that searches for "elastic stack" in the
"content" field. Sort the results so that they are returned from newest to
oldest (based on the "publish_date" field).
Hide answer:

GET blogs/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "elastic stack"
      }
    }
  },
  "sort": [
    {
      "publish_date": {
        "order": "desc"
      }
    }
  ]
}

9. Modify your previous query so that the results are sorted first by author name
ascending, then from newest to oldest.
Hide answer:

GET blogs/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "elastic stack"
      }
    }
  },
  "sort": [
    {
      "author.keyword": {
        "order": "asc"
      }
    },
    {
      "publish_date": {
        "order": "desc"
      }
    }
  ]
}

10. Our web application only shows three blog hits at a time. Modify the previous
query so that it only returns the top 3 hits.
Hide answer:

GET blogs/_search
{
  "size": 3,
  "query": {
    "match_phrase": {
      "content": {
        "query": "elastic stack"
      }
    }
  },
  "sort": [
    {
      "author.keyword": {
        "order": "asc"
      }
    },
    {
      "publish_date": {
        "order": "desc"
      }
    }
  ]
}

11. Suppose a user clicks on page 4 of the search results from your previous
query. Write a query that returns the 3 hits of page 4.
Hide answer:

GET blogs/_search
{
  "from": 9,
  "size": 3,
  "query": {
    "match_phrase": {
      "content": {
        "query": "elastic stack"
      }
    }
  },
  "sort": [
    {
      "author.keyword": {
        "order": "asc"
      }
    },
    {
      "publish_date": {
        "order": "desc"
      }
    }
  ]
}
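As a sanity check on the pagination arithmetic: from = (page - 1) × size =
(4 - 1) × 3 = 9, which is why page 4 starts at offset 9.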


12. Our web application highlights the search terms in the response in both the
"title" and "content" fields. Modify the previous query to implement
highlighting, enclosing the matched search terms with a <mark> HTML tag.

HINT: Your query does not search the "title" field, so you will need to add
"require_field_match": false to the highlight clause.
Hide answer:

GET blogs/_search
{
  "size": 3,
  "query": {
    "match_phrase": {
      "content": {
        "query": "elastic stack"
      }
    }
  },
  "sort": [
    {
      "author.keyword": {
        "order": "asc"
      }
    },
    {
      "publish_date": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    },
    "require_field_match": false,
    "pre_tags": ["<mark>"],
    "post_tags": ["</mark>"]
  }
}

13. Our web application allows users to filter results by category. Modify your
previous query for the use case where a user has selected the
"Engineering" category while searching for "elastic stack".
TIP: Copy-and-paste your previous query in Kibana, delete its entire
"match_phrase" block, then start typing in your "bool" query. Kibana will
do a much better job of auto-completing the "query" section for you if you
start with an empty "bool" first.
Hide answer:

GET blogs/_search
{
  "size": 3,
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "content": "elastic stack"
          }
        }
      ],
      "filter": {
        "match": {
          "category.keyword": "Engineering"
        }
      }
    }
  },
  "sort": [
    {
      "author.keyword": {
        "order": "asc"
      }
    },
    {
      "publish_date": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    },
    "require_field_match": false,
    "pre_tags": ["<mark>"],
    "post_tags": ["</mark>"]
  }
}

Summary: You have implemented most of the search features used in our blog
search web application! In this lab, you became familiar with how to query multiple
fields effectively and how to manage precision and recall to refine or broaden the
query scope. You controlled the presentation of search hits using sort, pagination
and highlighting, as well as filtering results by a category.

End of Lab 7

Lab 8: Aggregating Data



Objective: In this lab, you will become familiar with writing bucket and metrics
aggregations. Write aggregations to answer the questions asked below.

1. How many distinct URL requests were logged in the logs_server* indices?
URL requests are indexed in the "originalUrl" field. You should get around
35,000 as the result.
Hide answer:

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "my_url_value_count": {
      "cardinality": {
        "field": "originalUrl.keyword"
      }
    }
  }
}
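Note that the cardinality aggregation is approximate by design (it is based on
the HyperLogLog++ algorithm), which is why the expected result is stated as
"around 35,000" rather than an exact count.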

2. Add a query that limits the scope of your aggregation in the previous step to
only documents that contain the term "elastic" in the originalUrl field.
You should get around 4,000 as the result.
Hide answer:

GET logs_server*/_search
{
  "size": 0,
  "query": {
    "match": {
      "originalUrl": "elastic"
    }
  },
  "aggs": {
    "my_url_value_count": {
      "cardinality": {
        "field": "originalUrl.keyword"
      }
    }
  }
}

3. Which request is from the northernmost location, based on the
"geoip.location.lat" field?
Hide answer:

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "my_northernmost_request": {
      "max": {
        "field": "geoip.location.lat"
      }
    }
  }
}

4. Using your result from the previous step, write a query that returns the log
event from the northern-most area. You should see an entry from somewhere in
Greenland.
Hide answer:

GET logs_server*/_search
{
  "query": {
    "range": {
      "geoip.location.lat": {
        "gte": 72
      }
    }
  }
}

5. How many distinct values are there for the "status_code" field?
Hide answer:

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "status_code_cardinality": {
      "cardinality": {
        "field": "status_code"
      }
    }
  }
}

6. Based on the previous agg, there are 6 unique values for "status_code".
How many requests are there for each of the 6 values?
Hide answer:

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "status_code_buckets": {
      "terms": {
        "field": "status_code"
      }
    }
  }
}


7. A terms agg is sorted by doc_count by default. Modify your previous search
so that its terms are sorted alphabetically.
Hide answer:

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "status_code_buckets": {
      "terms": {
        "field": "status_code",
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}

8. The following query has 165 hits. Our web app has a UI that allows users to
filter those hits by category, and the UI shows how many hits belong to each
category. Add an aggregation to the following query that returns the number of
hits in each category.

GET blogs/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "open source",
          "fields": [
            "title^2",
            "content"
          ],
          "type": "phrase"
        }
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    },
    "require_field_match": false,
    "pre_tags": ["<mark>"],
    "post_tags": ["</mark>"]
  }
}

Hide answer:

GET blogs/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "open source",
          "fields": [
            "title^2",
            "content"
          ],
          "type": "phrase"
        }
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    },
    "require_field_match": false,
    "pre_tags": ["<mark>"],
    "post_tags": ["</mark>"]
  },
  "aggs": {
    "category_terms": {
      "terms": {
        "field": "category.keyword",
        "size": 10
      }
    }
  }
}

9. How many log requests are there for each week?
Hide answer:

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "logs_by_week": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "week"
      }
    }
  }
}

10. For each week of log requests, how many requests were received from each of
the 6 values of "status_code"?
Hide answer:

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "logs_by_week": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "week"
      },
      "aggs": {
        "status_code_buckets": {
          "terms": {
            "field": "status_code"
          }
        }
      }
    }
  }
}

11. We need to be sure we are delivering a suitable level of accuracy, so enable
show_term_doc_count_error in the terms aggregation on the "status_code"
field. How precise are the buckets?
Hide answer:

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "logs_by_week": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "week"
      },
      "aggs": {
        "status_code_buckets": {
          "terms": {
            "field": "status_code",
            "show_term_doc_count_error": true
          }
        }
      }
    }
  }
}

Our buckets are very precise! doc_count_error_upper_bound == 0 for all
buckets, which makes sense: with only 6 unique status codes, every shard can
return all of its terms, so there is no room for error.

12. What are the top 20 cities that requests are received from? The city is the
"geoip.city_name.keyword" field.
Hide answer:

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "top_cities": {
      "terms": {
        "field": "geoip.city_name.keyword",
        "size": 20
      }
    }
  }
}

13. Is your city aggregation exact?
Hide answer:
Yes. If you set "show_term_doc_count_error" to true, you can see that
the "doc_count_error_upper_bound" is 0 for each of the top 20 cities.

GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "top_cities": {
      "terms": {
        "field": "geoip.city_name.keyword",
        "size": 20,
        "show_term_doc_count_error": true
      }
    }
  }
}

14. OPTIONAL: Congratulations, you have answered a lot of questions using
aggregations in Elasticsearch. Now you will render an aggregation in Kibana:
Click on the Visualize icon on the left hand side bar
Click on + Create a visualization
Click on the Vertical Bar icon
Click on the hyperlinked index pattern logs_server*
In the Buckets control select Bucket type X-Axis
In the Aggregation drop-down select Terms (you may have to scroll down)
In the Field drop-down select status_code
Set Size to 6
Click on the Apply Changes icon and you should see the following bar chart:


Now click on Metrics & Axes
In the Y-Axes control click on LeftAxis-1
Change Scale Type from Linear to Log and click Apply Changes
again to improve visibility of the smaller buckets:

Summary: In this lab, you used aggregations to gain an understanding of the
distribution and content of the web access logs dataset. You partitioned the
documents using structures implied in the metadata, evaluated the size and content
of these partitions, and drilled down deeper inside the partitions.

End of Lab 8

Lab 9: Securing Elasticsearch


Objective: In this lab, you will secure your 3-node Elasticsearch cluster with Elastic
Security. You will then create users and roles for your secure cluster.

1. Elasticsearch comes with a basic license by default, which does not have
security available. So, let's start a trial in the next step.

2. To start a 30-day trial license you can use either the License Management UI or
the Start Trial API. Let's use the UI.
Click on the Management link in the left-hand toolbar of Kibana, then click
on License Management:

Kibana will show you that the basic license is active. Click on Start trial:

After that Kibana will show you a window that lists the features included in
the trial. Click on Start my trial:

Finally, Kibana will show you that the trial license is active:

Note that you could also have started the 30-day trial license through the Start
Trial API:

POST _xpack/license/start_trial?acknowledge=true

3. After starting the trial license you will have security available, but not enabled.
This happens because the trial license comes with
xpack.security.enabled set to false by default. However, to have
security enabled you have to set it to true on each node of your cluster. So,
let's do that in the next steps.

4. Stop Elasticsearch on all nodes, and stop Kibana on server1.

5. Set xpack.security.enabled to true on each Elasticsearch node.


Hide answer:
Add the following line to the configuration file (elasticsearch.yml) of each
Elasticsearch node (node1, node2 and node3):

xpack.security.enabled: true

6. Startup Elasticsearch on each of the 3 servers.

7. Try running the following command from a command prompt:

curl 'server1:9200/_cat/nodes?pretty'

You should get a security error, because you are trying to access a secure
cluster without any credentials:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "missing authentication token for REST request [/_cat/nodes?pretty]",
        "header" : {
          "WWW-Authenticate" : "Basic realm=\"security\" charset=\"UTF-8\""
        }
      }
    ],
    "type" : "security_exception",
    "reason" : "missing authentication token for REST request [/_cat/nodes?pretty]",
    "header" : {
      "WWW-Authenticate" : "Basic realm=\"security\" charset=\"UTF-8\""
    }
  },
  "status" : 401
}

8. You need to create some credentials, which is accomplished by running the
following script:

./elasticsearch-6.3.1/bin/elasticsearch-setup-passwords interactive

You will be prompted for a password for each of four users. Just enter
password for all the passwords to keep things simple:

Initiating the setup of passwords for reserved users elastic,kibana,logstash_system,beats_system.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y

Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [kibana]:
Reenter password for [kibana]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [elastic]

9. Startup Kibana again on server1.

./kibana-6.3.1-linux-x86_64/bin/kibana --host=0.0.0.0

10. Try connecting to your Kibana instance from your web browser. You should see
the following warning that it is not possible to login:

11. Stop your Kibana instance.


12. The issue you are having now is that Kibana is not authenticated with your
Elasticsearch cluster, which can be accomplished by adding the "kibana"
username and the password you just defined in the elasticsearch-setup-
passwords script to the Kibana config file. Add the following settings to
kibana-6.3.1-linux-x86_64/config/kibana.yml:

elasticsearch.username: "kibana"
elasticsearch.password: "password"

13. Startup Kibana again on server1. Refresh Kibana in your web browser, and
you should be able to login this time. Login as the "elastic" user with
password "password":

14. Notice that you have gained a few tools in Kibana - based on the new links
down the left toolbar.

15. Click on the "Management" link in the left toolbar. Notice you have some new
settings to configure now that Elastic Security is enabled:


16. Click on the "Users" link. Notice there are four users defined - these are the
users that you assigned passwords to using the elasticsearch-setup-
passwords script earlier. Their passwords are all currently "password", but
you could change them here in Kibana. Notice the "elastic" user is
assigned the "superuser" role, meaning the "elastic" user can edit all
configurable settings of the cluster.

17. Go back to the "Management" page of Kibana and click on the "Roles" link.
Notice there are a lot of existing roles defined. These are used by Elasticsearch
and the other components of the Elastic Stack. Notice these roles are all
"Reserved" and you cannot edit or delete them.

18. Create a new role named "blogs_readonly" that satisfies the following
criteria:
The user has no cluster privileges
The user only has access to indices that match the pattern blogs*
The index privileges are read, read_cross_cluster,
view_index_metadata, and monitor
Hide answer:
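The original answer for this step is a set of Kibana screenshots, which are not
reproduced here. As an alternative sketch, the same role can be created with
the role management API that ships with security in 6.3 (the body below is
derived directly from the criteria listed above):

POST _xpack/security/role/blogs_readonly
{
  "cluster": [],
  "indices": [
    {
      "names": [ "blogs*" ],
      "privileges": [ "read", "read_cross_cluster", "view_index_metadata", "monitor" ]
    }
  ]
}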

19. Create a new user named "blogs_user" that satisfies the following criteria:
password is "password"
enter Blog User for the name of the user
use your own email address
assign the user to two roles: blogs_readonly and kibana_user
Hide answer:
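Again, the original answer is a set of screenshots. As a sketch, the user
management API achieves the same result (the email address below is a
placeholder; the step asks you to use your own):

POST _xpack/security/user/blogs_user
{
  "password": "password",
  "full_name": "Blog User",
  "email": "your.name@example.com",
  "roles": [ "blogs_readonly", "kibana_user" ]
}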


20. Logout of Kibana, and log back in as the blogs_user. From the Console, try
running the following commands. Notice the search works fine, but the other
commands fail due to security restrictions on blogs_user:

GET blogs*/_search

GET _cat/indices

PUT blogs/_doc/1
{
  "a": "b"
}

DELETE blogs

21. You are going to want to log out as the blogs_user and log back in as the
elastic user.

Summary: In this lab, you secured your 3-node cluster using Elastic Security. You
also saw how to create a user that has limited access to specific indices. Note that
the trial includes other features besides Security, such as Alerting, Reporting,
Graph, and Machine Learning.

End of Lab 9

Lab 10: Best Practices


Objective: In this lab, you will perform some of the best practices discussed in
Chapter 10, like defining aliases, using index templates, bulk operations, scroll
searches, and backing up a cluster.

1. Your current cluster has three indices containing the web access logs. Suppose
we want all current indexing to occur only on logs_server3. Define an alias
named access_logs_write that points to logs_server3.
Hide answer:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs_server3",
        "alias": "access_logs_write"
      }
    }
  ]
}

2. Define an alias named access_logs_read that points to all three
logs_server* indices.
Hide answer:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs_server*",
        "alias": "access_logs_read"
      }
    }
  ]
}

3. Run the following query on your access_logs_read alias to verify it points to
all three indices. You should get 1,751,476 hits:

GET access_logs_read/_search


4. Define an index template named "access_logs_template" that matches
logs_server* and has the same index settings and mappings as your three
current logs_server* indices. Use 10 for the "order" value.
Hide answer:

PUT _template/access_logs_template
{
  "index_patterns": "logs_server*",
  "order": 10,
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "geoip": {
          "properties": {
            "city_name": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            },
            "continent_code": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            },
            "country_code2": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            },
            "country_code3": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            },
            "country_name": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            },
            "location": {
              "properties": {
                "lat": { "type": "float" },
                "lon": { "type": "float" }
              }
            },
            "region_name": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            }
          }
        },
        "host": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "http_version": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "language": {
          "properties": {
            "code": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            },
            "url": {
              "type": "text",
              "fields": {
                "keyword": { "type": "keyword", "ignore_above": 256 }
              }
            }
          }
        },
        "level": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "method": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "originalUrl": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "response_size": { "type": "long" },
        "runtime_ms": { "type": "long" },
        "status_code": { "type": "long" },
        "user_agent": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        }
      }
    }
  }
}

5. Define a new index named logs_server4 and verify the
"access_logs_template" template was applied.
Hide answer:

PUT logs_server4

GET logs_server4


Verify that logs_server4 contains the same mappings and settings defined
in the template.

6. In the same request, remove the access_logs_write alias from
logs_server3 and instead have it point to logs_server4.
Hide answer:

POST _aliases
{
  "actions": [
    {
      "remove": {
        "alias": "access_logs_write",
        "index": "logs_server3"
      }
    },
    {
      "add": {
        "index": "logs_server4",
        "alias": "access_logs_write"
      }
    }
  ]
}
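Because both actions are submitted in a single _aliases request, the swap is
applied atomically: there is no point in time where access_logs_write
resolves to no index at all.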

7. Index the following document using the access_logs_write alias, assigning
the "_id" to 1. Then GET the document in the logs_server4 index to verify
the alias worked successfully:

{
  "@timestamp": "2018-03-21T05:57:19.722Z",
  "originalUrl": "/blog/logstash-jdbc-input-plugin",
  "host": "server2",
  "response_size": 58754,
  "status_code": 200,
  "method": "GET",
  "runtime_ms": 143,
  "geoip": {
    "country_code2": "IN",
    "country_code3": "IN",
    "continent_code": "AS",
    "location": {
      "lon": 77.5833,
      "lat": 12.9833
    },
    "region_name": "Karnataka",
    "city_name": "Bengaluru",
    "country_name": "India"
  },
  "language": {
    "url": "/blog/logstash-jdbc-input-plugin",
    "code": "en-us"
  },
  "user_agent": "Amazon CloudFront",
  "http_version": "1.1",
  "level": "info"
}

Hide answer:

PUT access_logs_write/doc/1
{
  "@timestamp": "2018-03-21T05:57:19.722Z",
  "originalUrl": "/blog/logstash-jdbc-input-plugin",
  "host": "server2",
  "response_size": 58754,
  "status_code": 200,
  "method": "GET",
  "runtime_ms": 143,
  "geoip": {
    "country_code2": "IN",
    "country_code3": "IN",
    "continent_code": "AS",
    "location": {
      "lon": 77.5833,
      "lat": 12.9833
    },
    "region_name": "Karnataka",
    "city_name": "Bengaluru",
    "country_name": "India"
  },
  "language": {
    "url": "/blog/logstash-jdbc-input-plugin",
    "code": "en-us"
  },
  "user_agent": "Amazon CloudFront",
  "http_version": "1.1",
  "level": "info"
}

GET logs_server4/doc/1

8. Run the following command, which deletes an index named my_blogs and
then indexes two documents:

DELETE my_blogs

PUT my_blogs/_doc/1
{
  "id": "1",
  "title": "Better query execution",
  "category": "Engineering",
  "date": "July 15, 2015",
  "author": {
    "first_name": "Adrian",
    "last_name": "Grand",
    "company": "Elastic"
  }
}

PUT my_blogs/_doc/2
{
  "id": "2",
  "title": "The Story of Sense",
  "date": "May 28, 2015",
  "author": {
    "first_name": "Boaz",
    "last_name": "Leskes"
  }
}

9. Perform a bulk operation that executes the following operations on the
my_blogs index from the previous step:
Update document 2: set "category" to "Engineering" and set
"author.company" to "Elastic"
Index a new document with "_id" of 3 that contains the following fields
and values:

"title": "Using Elastic Graph",

"category": "Engineering",
"date": "May 25, 2016",

"author": {

"first_name": "Mark",

"last_name": "Harwood",
"company": "Elastic"

Run a search on my_blogs to verify your two bulk operations were both
successful.
Hide answer:

POST my_blogs/_doc/_bulk
{"update": {"_id": 2}}
{"doc": {"category": "Engineering", "author.company": "Elastic"}}
{"index": {"_id": 3}}
{"title": "Using Elastic Graph", "category": "Engineering", "date": "May 25, 2016", "author": {"first_name": "Mark", "last_name": "Harwood", "company": "Elastic"}}

GET my_blogs/_search
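A reminder about the _bulk format: each action line (update, index, and so on)
is immediately followed by its payload on the next line, and every line,
including the last one, must end with a newline character.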

10. Use _mget to retrieve the documents with the ids 1 and 2 (and type _doc)
from your my_blogs index.
Hide answer:

GET my_blogs/_doc/_mget
{
  "docs": [
    { "_id": 1 },
    { "_id": 2 }
  ]
}

11. Suppose you want to retrieve every blog in the blogs index, but you want to
retrieve the documents in chunks so you do not overwhelm the cluster. Write a
scroll search that retrieves 500 documents from blogs. Sort the documents by
"_doc" to minimize the query's overhead, and set the timeout to 3 minutes.
Hide answer:

GET blogs/_search?scroll=3m
{
  "size": 500,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ]
}


12. Retrieve the next 500 blog documents from your scroll search in the previous
step.
Hide answer:

GET _search/scroll
{
  "scroll": "3m",
  "scroll_id": "put_your_scroll_id_here"
}
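The value for "scroll_id" comes from the "_scroll_id" field in the response of
the previous scroll search; paste it in place of the placeholder above.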

13. Now you are going to take a snapshot of your blogs index. Stop Elasticsearch
on all three of your servers, but leave the terminal tabs open. (This task will be
easier if you are SSH'd onto each of your three servers in a separate tab.)

14. Notice there is a single folder named /shared_folder that each server can
read/write to. Create a new subfolder of /shared_folder/ called my_repo.
(You only need to run this command on one of the servers):

mkdir /shared_folder/my_repo

15. Add the following path.repo property to the elasticsearch.yml files of
node1, node2 and node3:

path.repo: /shared_folder/my_repo

16. Start Elasticsearch on all three nodes.

17. Within the Kibana Console, register the /shared_folder/my_repo directory
as a repository called my_local_repo.
Hide answer:


PUT _snapshot/my_local_repo
{
  "type": "fs",
  "settings": {
    "location": "/shared_folder/my_repo"
  }
}

18. Now that you have a repository configured, your cluster is ready for taking
snapshots. Take a snapshot named blogs_snapshot_1 of the blogs index,
including the cluster state, and save the snapshot in your new repository.
Hide answer:

PUT _snapshot/my_local_repo/blogs_snapshot_1
{
  "indices": "blogs",
  "ignore_unavailable": true,
  "include_global_state": true
}

19. View all of the snapshots in your repository. You should see
blogs_snapshot_1 in the list.
Hide answer:

GET _snapshot/my_local_repo/_all

20. Delete your blogs index. (That's right - delete it!)
Hide answer:

DELETE blogs

21. Restore the blogs index using your snapshot from the previous step. Do not
restore the cluster state (your current cluster state is fine).


Hide answer:

POST _snapshot/my_local_repo/blogs_snapshot_1/_restore
{
  "indices": "blogs",
  "ignore_unavailable": true,
  "include_global_state": false
}

Summary: In this lab, you defined aliases and index templates, did some bulk
operations, performed a scroll, and backed up an index using snapshots. All of
these tasks are critical skills to have when managing a production Elasticsearch
cluster. Well done! If you have completed every step in every lab, then you should
be well on your way to becoming an Elasticsearch expert, and also well on your way
to becoming an Elastic Certified Engineer. View more details on our website at
elastic.co/training/certification.

End of Lab 10

© Elasticsearch BV 2015-2018. All rights reserved. Decompiling, copying,
publishing and/or distribution without written consent of Elasticsearch BV is strictly
prohibited.
