Professional Documents
Culture Documents
CoreDeveloper-5 5 1
CoreDeveloper-5 5 1
Developer
n
io
is
ec
D
&
s
es
in
us
-B
training.elastic.co
5.5.1
Course: Core Elasticsearch Developer
Version 5.5.1
© 2015-2017 Elasticsearch BV. All rights reserved. Decompiling, copying, publishing and/or distribution without written consent of Elasticsearch BV is
strictly prohibited.
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
‒ set JAVA_HOME environment variable
&
s
es
in
us
-B
‒ do not unzip the lab file into a folder with spaces in the name
ie
st
ba
Se
n
io
is
6 The Distributed Model
ec
D
&
s
es
in
us
8 Aggregations
9-
-0
d
ar
er
9 More Aggregations
G
n
ie
st
ba
Se
10 Handling Relationships
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
3 Text Analysis
4 Mappings
n
6 The Distributed Model
io
Introduction to
is
ec
D
&
s
es 7 Working with Search Results
Elasticsearch
in
us
-B
7
8 Aggregations
1
20
p-
Se
9-
More Aggregations
-0
9
d
ar
er
G
n
10 Handling Relationships
ie
st
ba
Se
Topics covered:
• The Story of Elasticsearch
• The Components of Elasticsearch
• Documents
• Indexes
• Indexing Data
n
• Searching Data
io
is
ec
D
&
s
es
• The Bulk API
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
‒ great for full-text search
ec
D
&
s
es
in
‒ challenging to use
G
n
ie
st
ba
Se
n
io
should have”
is
ec
D
&
s
es
in
us
-B
71
20
Shay Banon
p-
Se
9-
Creator of Elasticsearch
-0
d
ar
er
G
n
ie
st
ba
Se
‹#›
The Birth of Elasticsearch
• In 2004, Shay Banon developed a product called Compass
‒ Built on top of Lucene, Shay’s goal was to have search
integrated into Java applications as simply as possible
• The need for scalability became a top priority
• In 2010, Shay completely rewrote Compass with two main
objectives:
n
1. distributed from the ground up in its design
io
is
ec
D
&
s
es
2. easily used by any other programming language
in
us
-B
71
20
• He called it Elasticsearch
p-
Se
9-
-0
d
engine
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 13
distributing without written permission is strictly prohibited
1. Distributed search:
• Elasticsearch is distributed and scales horizontally:
A node is an instance
n
io
Ingest Node
of Elasticsearch Ingest Nodes
is
ec
D
&
s
es
in
A cluster is a
us
-B
collection of
71
Data Nodes
20
HTTP Request
n
io
is
Client
ec
D
REST APIs
&
App s
es
in
us
-B
71
20
p-
Se
Python, Perl, or
G
n
ie
n
• Why the big jump in version number?
io
is
ec
D
&
s
es
‒ The Elastic Stack…
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
Security
Alerting
Kibana
n
Monitoring
io
is
ec
D
&
s
es
Elasticsearch
in
Reporting
us
Graph
9-
-0
d
ar
Beats Logstash
er
G
n
ie
Machine Learning
st
ba
Se
n
io
is
ec
D
&
• Let’s take a look at documents and indexes (indices)… es
in
s
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
• Specifically, a document is a top-level object that is
D
&
s
es
serialized into JSON and stored in Elasticsearch
in
us
-B
71
20
n
io
is
ec
D
&
stocks.csv convert the CSV to JSON
s
es
{
in
us
"volume" : 32203,
-B
71
"high" : 62.83,
20
"stock_symbol" : "PPG",
Se
9-
and values
-0
"low" : 61.46,
d
ar
"close" : 61.86,
er
G
"trade_date" : "2010-05-26T06:00:00.000Z",
n
ie
st
"open" : 62.83
ba
Se
n
io
is
ec
D
&
‒ an index has some type of mechanism to distribute data across a es
s
in
us
cluster
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
‒ logs-2017-01-02
io
is
ec
D
&
es
s
/logs-2017-01-01
‒ logs-2017-01-03
in
us
-B
71
20
‒ and so on /logs-2017-01-02
p-
Se
9-
-0
d
ar
/logs-2017-01-03
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
/my_index1
{
s
"volume": 66605, es
in
"high": 28.86,
us
"stock_symbol": "ALL",
-B
"low": 28.37,
7
"close": 28.82,
1
{
20
"trade_date":
/my_index2
"volume": 46965,
p-
"2009-12-18T07:00:00.000Z",
"high": 31.56,
Se
"stock_symbol": "ALL",
9-
"low": 30.68,
-0
"close": 30.91,
d
ar
{
"trade_date":
er
"volume": 32203,
"2010-01-15T07:00:00.000Z",
G
/my_index3
"high": 62.83,
n
"stock_symbol": "PPG",
ie
st
"low": 61.46,
ba
"close": 61.86,
Se
"trade_date":
"2010-05-26T06:00:00.000Z",
{
"username" : "harrison",
"comment" : "My favorite movie is Star Wars!"
}
{
"username" : "maria",
n
io
is
"comment" : "The North Star is right above my house."
ec
D
}
&
s
es
in
us
-B
{
71
20
"username" : "eniko",
p-
Se
9-
}
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
The name of
G
PUT command
n
ie
n
io
is
ec
D
&
s
es
in
us
{
9-
-0
"username":"harrison",
d
ar
er
}'
st
ba
Se
n
io
is
ec
D
&
s
es
in
curl -XPUT 'http://localhost:9200/my_index/doc/1' -H "Content-
us
-B
{
p-
Se
9-
"username":"harrison",
-0
d
ar
}'
n
ie
st
document is reindexed
n
io
is
ec
D
&
curl -XPOST 'http://localhost:9200/my_index/doc/' -i -H
s
es
in
"Content-Type: application/json" -d '
us
-B
{
71
20
"username" : "maria",
p-
Se
9-
}'
ar
er
G
n
ie
st
ba
Se
{
"_index" : "my_index",
n
io
is
ec
"_type" : "doc",
D
&
"_id" : "AVnWIzyrUxRN673zt1An",
s
es
in
"_version" : 1,
us
-B
"result" : "created",
71
20
"_shards" : {
p-
Se
"total" : 2,
9-
-0
"successful" : 1,
d
ar
"failed" : 0
er
G
n
},
ie
st
ba
"created" : true
Se
n
io
is
ec
D
{
&
s
"_index": "my_index",
es
in
us
"_type": "doc",
-B
7
"_id": "1",
1
20
p-
"_version": 1,
Se
9-
"found": true,
-0
d
"_source": {
ar
er
G
"username": "harrison",
n
ie
st
}
}
n
io
is
ec
D
{
&
s
"username": "harrison",es
in
us
}
1
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
curl -XGET 'http://localhost:9200/my_index/_search' -H
s
es
in
"Content-Type: application/json" -d '
us
-B
{
71
20
"match_all": {}
-0
d
}
ar
er
G
documents
Se
n
{ that “hit” the search criteria
io
is
ec
"_index": "my_index",
D
&
"_type": "doc",
s
es
in
"_id": "AVlBSnbcmM0CN4-9FrtP",
us
-B
"_source": {
p-
Se
"username": "maria",
9-
-0
}
G
n
ie
},
st
ba
...
Se
n
io
is
ec
D
curl -XGET 'http://localhost:9200/my_index/_search' -H
&
s
"Content-Type: application/json" es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
PUT my_index/doc/4/_create
{
Create "username" : "kimchy",
"comment" : "I like search!"
n
}
io
is
ec
D
&
Read es
GET my_index/doc/4
in
s
us
-B
71
POST my_index/doc/4/_update
20
p-
{
Se
9-
"doc" : {
Update
-0
d
ar
}
n
ie
st
}
ba
Se
n
}
io
is
ec
D
&
s
es
in
PUT my_index/doc/6
p-
{
9-
"username" : "maria",
d
ar
indexed.
er
search"
st
ba
}
Se
n
io
is
"_index" : "INDEX_NAME",
ec
D
&
"_type": "TYPE",
s
es
in
"_id" : "ID"
us
-B
},
71
20
{
p-
Se
9-
"_index" : "INDEX_NAME",
-0
d
"_type": "TYPE",
ar
er
G
"_id" : "ID"
n
ie
st
ba
}
Se
]
}'
n
Shells & Cheese 2% Milk"
io
is
{
ec
}
D
&
"_index" : "my_index", },
s
es
in
"_type" : "doc", {
us
-B
"_index": "my_index",
"_id" : 4
71
"_type": "doc",
20
}
p-
"_id": "4",
Se
9-
] "_version": 1,
-0
d
"found": true,
ar
}'
er
G
"_source": {
n
ie
"username" : "kimchy",
st
ba
Se
n
io
• The response is a large JSON structure with the individual
is
ec
D
&
results of each action that was performed es
s
in
us
-B
71
actions
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
application/x-ndjson" -i -d '
ec
D
{"create" : {...}}
&
The line following “create” or “index”
s
es
{DOCUMENT}
in
{"index" : {...}}
71
{DOCUMENT}
20
p-
{"delete" : {...}}
9-
-0
'
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
n
Elasticsearch (using round robin)
io
is
ec
D
&
s
es
invalid JSON or incorrect fix bad request before
in
us
4xx error
-B
exponential backoff)
er
G
n
ie
st
ba
connection
Se
n
io
is
ec
D
• A document can be any JSON-formatted data you want to &
s
es
in
us
-B
n
io
is
ec
D
5. What happens if you attempt to index a new document into &
s
es
in
us
-B
an ID?
er
G
n
ie
st
ba
Se
3 Text Analysis
4 Mappings
n
6 The Distributed Model
io
The Search
is
ec
D
&
s
es 7 Working with Search Results
API
in
us
-B
7
8 Aggregations
1
20
p-
Se
9-
More Aggregations
-0
9
d
ar
er
G
n
10 Handling Relationships
ie
st
ba
Se
Topics covered:
• Introduction to the Search API
• URI Searches
• Request Body Searches
• The match Query
• The match_phrase Query
n
• The range Query
io
is
ec
D
&
s
es
• The bool Query
in
us
-B
71
20
• Source Filtering
p-
Se
9-
-0
d
ar
• Relevance
er
G
n
ie
st
ba
Se
• Boosting Relevance
n
io
is
ec
D
‒ URI searches
&
s
es
in
us
-B
n
io
is
ec
D
&
s
es
in
us
-B
n
io
is
ec
‒ They can be handy for a quick curl query
D
&
s
es
in
us
GET index_to_search/_search
n
{
io
is
_search routes to the
ec
"query": {
D
Search API
&
s
es
in
us
}
Se
9-
-0
}
d
ar
er
G
n
ie
st
/_search everything
n
io
is
ec
D
&
s only type-1 docs in index-1
/index-1/type-1/_search es
in
us
-B
71
20
/index-*/_search
G
n
ie
st
ba
Se
n
io
is
ec
‒ range
D
&
s
es
in
us
‒ bool
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
{
is
ec
D
"query": {
&
s
es
"match": {
in
us
-B
"FIELD": "TEXT"
71
}
20
p-
Se
}
9-
-0
}
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
"query": {
D
&
"match": {
s
es
in
}
71
20
p-
}
Se
9-
}
-0
d
ar
er
G
n
ie
st
ba
Se
n
"comment": "My favorite movie is Star Wars!"
io
is
}
ec
D
},
&
s
{ es
in
us
"_index": "my_index",
-B
7
"_type": "doc",
1
20
"_score": 0.5306678,
9-
"_source": {
d
ar
"username": "eniko",
er
G
}
ba
Se
},
{
n
"_id": "1",
io
"query": {
is
ec
"_score": 0.6219729,
D
"match": {
&
"_source": {
s
es
"comment": { "username":
in
us
"harrison",
-B
"comment": "My
1
20
"operator": "and"
p-
}
9-
}
-0
} },
d
ar
er
{
}
G
n
"_index": "my_index",
ie
st
}
ba
"_id": "3",
different syntax
n
“my favorite movie”
io
is
GET my_index/_search
ec
D
{
&
s
es
in
"query": {
us
-B
"match": {
71
20
"comment": {
p-
Se
"minimum_should_match" : 2
ar
er
G
}
n
ie
st
}
ba
Se
}
} Can be an integer or
a percentage
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 70
distributing without written permission is strictly prohibited
Se
ba
st
ie
n
G
er
ar
d
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
Query
The match_phrase
The match_phrase Query
• The match_phrase query is for searching text when you
want to find terms that are near each other
‒ All the terms in the phrase must be in the document
‒ The position of the terms must be in the same relative order
• A match_phrase query can be quite expensive! Use them
wisely.
n
io
GET my_index/_search
is
ec
{
D
&
s
"query": { es
in
us
"match_phrase": {
-B
71
"FIELD": "PHRASE"
20
p-
Se
}
9-
-0
}
d
ar
er
}
G
n
ie
st
ba
Se
I am looking for
“movie star” in the
comments
GET my_index/_search
{
"query": {
n
"match_phrase": {
io
is
ec
"comment": "movie star"
D
&
s
} es
Two things must happen for “movie
in
us
}
-B
}
1
20
p-
Se
GET my_index/_search
"hits": { {
"total": 1, "query": {
"max_score": 0.53066784, "match": {
"hits": [ "comment": {
n
{ "query": "movie star",
io
is
ec
"_index": "my_index", "operator": "and"
D
&
"_type": "doc", }
s
es
}
in
"_id": "3",
us
}
-B
"_score": 0.53066784,
7
}
1
"_source": {
20
p-
Se
"username": "eniko",
9-
-0
"hits": {
ar
"total": 2,
Harrison Ford."
n
ie
"max_score": 0.6219729,
st
}
ba
"hits": [
Se
}
]
GET my_index/_search
{
"query": {
"match_phrase": {
n
io
is
ec
"comment": {
D
&
"query": "favorite star",
s
es
in
"slop" : 1
us
-B
}
Se
9-
}
-0
d
ar
n
io
"_index": "my_index",
is
"match_phrase": {
ec
D
"_type": "doc",
&
"comment": {
s
es
in
"_id": "3",
"query": "favorite star",
us
"_score": 0.33159825,
-B
"slop" : 1 "_source": {
71
20
} "username": "eniko",
p-
Se
}
-0
}
ar
er
Ford."
G
}
n
ie
}
st
ba
}
Se
]
}
GET index/_search
n
{
io
is
ec
"query": {
D
&
s
"range": { es
in
us
"FIELD": {
-B
71
"gte": LOWER,
20
p-
Se
"lte": UPPER
9-
-0
}
d
ar
er
}
G
n
ie
}
st
ba
}
well
n
io
is
"query": { "hits": {
ec
D
"range": {
&
"total": 12829,
s
es
"open": { "max_score": 1,
in
us
-B
} ]
9-
-0
}
d
}
ar
er
G
}
n
ie
st
}
ba
range
n
io
is
"query": {
ec
D
&
"range": {
s
es
in
"trade_date": {
us
-B
"gte": "2009-08",
71
20
"lt": "2009-09"
p-
Se
9-
}
-0
d
}
ar
er
G
}
n
ie
st
ba
}
Se
GET stocks/_search
GET stocks/_search
n
{
io
is
{
ec
"query": {
D
"query": {
&
s
es
in
"range": {
"range": {
us
"trade_date": {
-B
"trade_date": {
7
"gte": "now-1d"
1
20
"gte": "now-1M"
p-
}
Se
}
9-
-0
}
d
}
ar
}
er
G
}
n
}
ie
st
}
ba
Se
n
io
is
ec
now+1d 2016-12-09T12:56:23
D
d days
&
s
es
in
us
h or H hours
-B
now+1h+30m 2016-12-08T14:26:23
71
20
p-
m minutes
Se
9-
-0
s seconds 2017-01-15||+1M
ar
month
er
G
n
ie
st
ba
Se
n
io
is
ec
D
The query should optionally appear in matching
&
should s
es
in
filter
9-
-0
n
io
],
is
Notice the JSON array
ec
"must_not": [
D
&
syntax. You can specify
s
{} es
in
],
-B
{}
p-
Se
9-
],
-0
d
"filter": [
ar
er
{}
G
]
st
ba
Chapter 7
Se
}
}
}
I am looking for
comments from “harrison”
GET my_index/_search that contain “star”
{
"query" : {
"bool" : {
"must" : [
{
"match" : {
"comment" : "star"
}
},
n
io
is
ec
{
D
&
"match" : {
s
es
in
"username" : "harrison"
us
-B
}
71
20
}
p-
Se
]
9-
-0
}
d
ar
}
ie
st
ba
Se
n
io
is
ec
"comment": "wars"
D
&
}
s
es
in
}
us
-B
}
71
20
}
"My favorite movie star in Hollywood is Harrison Ford."
p-
Se
}
9-
-0
n
io
"match": {
is
ec
D
"comment": "star"
&
s
} es
in
us
},
-B
7
"should": {
1
20
p-
"match":{
Se
9-
"comment":"wars"
-0
d
}
ar
er
G
}
n
ie
st
}
ba
Se
}
}
GET my_index/_search
{
"query": {
"bool": {
"must":
{"match": {"comment": "star"}},
"should": [
{"match": {"comment": "favorite"}},
n
{"match": {"comment": "hollywood"}},
io
is
ec
{"match": {"comment": "famous"}},
D
&
{"match": {"username": "harrison"}} s
es
in
us
],
-B
71
"minimum_should_match": "50%"
20
p-
}
Se
9-
-0
}
d
ar
er
}
G
n
ie
st
ba
Se
"hits": [
{
"_index": "my_index",
"_type": "doc", “harrison” + “favorite” >= 50%
"_id": "1",
"_score": 1.6028022,
"_source": {
"username": "harrison",
"comment": "My favorite movie is Star Wars!"
}
},
n
io
{
is
ec
D
"_index": "my_index",
&
s
"_type": "doc", es
in
"_id": "3",
-B
"_score": 1.3930775,
71
20
"_source": {
p-
Se
"username": "eniko",
9-
-0
}
G
n
}
ie
st
ba
]
Se
n
io
is
}
ec
D
},
&
s
es
"should": {
in
us
-B
"match_phrase": {
71
}
9-
-0
}
Hits will include “movie” or
d
ar
}
er
}
n
ie
st
}
Se
GET my_index/_search
{
"query": {
At least one clause
must match for a hit
n
io
"bool": {
is
ec
"should": [
D
&
s
{"match": {"comment": es
in
"north"}},
us
]
-0
d
}
ar
er
}
G
n
ie
}
st
ba
Se
n
],
io
"_score": 1.1174096
is
ec
"should": [
D
"The North Star is right above my house."
&
{
s
es
"bool": {
in
us
"must_not": [
-B
"_score": 1.1174096
1.1682332
7
{
1
20
"match": {
Se
}
d
ar
er
}
G
"_score": 0.13761075
n
]
ie
st
ba
}
]
}
}}
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 93
distributing without written permission is strictly prohibited
Se
ba
st
ie
n
G
er
ar
d
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
Source Filtering
Requesting Specific Fields
• You can request specific fields using the _source
parameter
‒ separate multiple fields by commas
GET stocks/_search?_source=open,close
{
"query": {
"range": {
"open": {
"gte": 50.00,
n
io
is
ec
"lte": 60.00
D
"hits": [
&
}
s
{ es
in
us
} "_index": "stocks",
-B
7
} "_type": "stock_type",
1
20
p-
} "_id": "AVmAGxB992cAvoluWZvm",
Se
9-
"_score": 1,
-0
d
"_source": {
ar
er
G
"close": 57.1,
n
ie
st
"open": 57.58
ba
Se
}
},
GET stocks/_search
{
"_source": ["open", "close"],
n
io
is
"query": {
ec
D
"range": {
&
s
es
"open": {
in
us
-B
"gte": 50.00,
71
20
"lte": 60.00
p-
Se
}
9-
-0
d
}
ar
er
G
}
n
ie
st
}
ba
GET stocks/_search
{
"_source": {
"includes": ["volume","*date*"],
"excludes": ["trade_date_text"]
n
io
},
is
ec
D
"query": {
&
s
"range": { { es
in
us
"_type": "stock_type",
1
"gte": 50.00,
20
p-
"_id": "AVmAGxB992cAvoluWZvn",
Se
"lte": 60.00
9-
"_score": 1,
-0
}
d
"_source": {
ar
er
}
G
"volume": 48749,
n
ie
} "trade_date": "2010-06-10T06:00:00.000Z"
st
ba
Se
} }
},
n
io
is
ec
contain the term, the less important the term is
D
&
s
es
in
us
longer fields
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
number of occurrences
d
ar
er
G
n
ie
st
ba
Se
n
io
GET messages/_search
is
ec
{
D
We discuss function_score and
&
"query": {
s
"function_score": { es
in
"functions": [ other ways to control scoring in our
us
-B
{
Advanced Elasticsearch Developer
7
"script_score": {
1
20
"script": {
course
p-
Se
"lang" : "painless",
9-
"inline" : """
-0
"""
er
G
}
n
ie
}
st
ba
}
Se
]
}
}
}
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
• If boost < 0
&
s
es
in
us
-B
n
}
io
is
ec
},
D
&
{
s
es
in
"match" : {
us
-B
"comment" : {
71
20
"query": "house",
p-
Se
"boost" : 2
9-
-0
}
d
ar
er
}
G
n
}
ie
}
}
}
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 105
distributing without written permission is strictly prohibited
Se
ba
st
ie
n
G
er
ar
d
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
Chapter Review
Summary
• The match query is a simple but powerful search for fields
that are text, numerical values, or dates
• The slop parameter tells how far apart terms are allowed to
be while still considering the document a match
• The range query is for finding documents with fields that fall
in a given range
• A bool query is a combination of one or more of the
n
io
is
ec
following boolean clauses: must, must_not, should and
D
&
s
es
filter
in
us
-B
71
20
parameter
n
io
is
ec
How many of the “should” clauses need to match for a hit?
D
&
s
es
in
us
-B
GET my_index/_search?_source=username
d
ar
er
G
n
ie
st
ba
Se
3 Text Analysis
4 Mappings
n
6 The Distributed Model
io
Text Analysis
is
ec
D
&
s
es
in
7 Working with Search Results
us
-B
7
8 Aggregations
1
20
p-
Se
9-
More Aggregations
-0
9
d
ar
er
G
n
10 Handling Relationships
ie
st
ba
Se
Topics covered:
• What is Analysis?
• Building an Inverted Index
• Analyzers
• Custom Analyzers
• Character Filters
n
• Tokenizers
io
is
ec
D
&
s
es
• Token Filters
in
us
-B
71
20
• Defining Analyzers
p-
Se
9-
-0
d
ar
{ Full text
"brandName": "Ahold",
n
io
is
ec
"customerRating": 2,
D
&
"price": 14.15,
Exact values s
es
in
"grp_id": "52903",
us
-B
"quantitySold": 805358,
71
20
"upc12": "688267137853",
p-
Se
}
d
ar
er
G
n
ie
st
ba
Se
n
io
"Ahold Companion Dog Food Kibbles & Munchy Morsels"
is
ec
D
&
s
es
in
us
-B
71
20
p-
dog food
Se
this product
n
ie
st
ba
Se
kibble
n
io
is
ec
D
companion food morsel
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
n
fox 1
io
is
1 The quick brown fox jumps over the lazy dog
ec
D
&
jumping 2
s
es
in
us
jumps 1
-B
lazy 1
p-
Se
9-
-0
over 1
d
ar
er
G
quick 1
n
ie
st
ba
spiders 2
Se
the 1
brown 1
dog 1
fast 2
fox 1
n
io
is
1 The quick brown fox jumps over the lazy dog
ec
jumping 2
D
&
s
es
jumps 1
in
us
-B
lazy 1
1
20
p-
Se
9-
over 1
-0
d
ar
er
quick 1
G
n
ie
st
spiders 2
ba
dog 1
fast 2
fox 1
n
io
is
1 The quick brown fox jumps over the lazy dog
ec
jumping 2
D
&
s
es
in
jumps 1
us
-B
lazy 1
20
p-
Se
9-
-0
over 1
d
ar
er
G
quick 1
n
ie
st
ba
spiders 2
Se
dog 1
fast 2
fox 1
n
io
is
1 The quick brown fox jumps over the lazy dog
ec
jump 1,2
D
&
s
es
lazy 1
in
us
-B
over 1
20
p-
Se
quick 1
9-
-0
d
ar
spider 2
er
G
n
ie
st
ba
Se
dog 1
fast 1,2
fox 1
n
io
is
1 The quick brown fox jumps over the lazy dog
ec
jump 1,2
D
&
s
es
lazy 1
in
us
-B
over 1
20
p-
Se
quick 1,2
9-
-0
d
ar
spider 2
er
G
n
ie
st
ba
Se
token postings
brown 1
a Dog dog 1
dog 1
n
io
is
ec
D
fox 1
&
s
es
in
quick
us
jump 1,2
-B
the quick
2,1
71
20
spiders
p-
spider lazy 1
Se
9-
-0
over 1
d
ar
er
G
n
quick 1,2
ie
st
ba
Se
spider 2
n
io
is
ec
my
D
&
s
es
in
favorit
us
-B
7
char
1
token
tokenizer
20
movi
p-
filters filters
Se
9-
-0
star
d
ar
er
G
n
war
ie
st
ba
Se
GET _analyze
{
"analyzer": "simple",
"text": "My favorite movie is Star Wars!"
n
}
io
is
ec
D
&
s
es
in
my
us
-B
71
20
favorite
p-
Se
9-
movie
d
ar
Wars!"
er
analyzer
G
n
ie
is
st
ba
Se
star
wars
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 126
distributing without written permission is strictly prohibited
Pre-built Analyzers
• standard
• simple
• whitespace
• stop
• keyword
n
• pattern
io
is
ec
D
&
s
es
• language
in
us
-B
71
20
p-
Se
9-
-0
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
Lower Case
D
&
s
es
Tokenizer
in
us
-B
71
20
p-
Se
9-
-0
d
ar
n
io
is
ec
D
&
s
es
Stop
Lower Case
in
us
-B
Token
Tokenizer
71
20
Filter
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Disabled by default
n
io
is
Standard Lower Stop
ec
Standard Tokenizer
D
Token
&
Token Case Filter
s
es
in
Filter Filter
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
GET _analyze
{
"analyzer": "english",
"text": "My favorite movie is Star Wars!"
}
n
io
is
ec
D
&
s
es
in
GET _analyze
us
-B
{
71
20
"analyzer": "french",
p-
Se
}
ar
er
G
n
ie
st
ba
Se
GET _analyze
{
"tokenizer": "keyword",
n
io
is
"char_filter": [ "html_strip" ],
ec
D
&
"text": "<strong>Welcome</strong> to the <em>jungle</em>"
s
es
in
}
us
-B
71
20
p-
Se
9-
"<strong>Welcome</strong>
-0
d
ar
to the <em>jungle</em>"
er
G
n
ie
n
io
is
ec
‒ pattern_replace
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
n
io
"text": "I like to code in C++ and Java."
is
ec
D
}
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
“c++”
d
ar
er
G
n
“cpp”
ie
mapping
st
“C++”
ba
Se
n
io
is
], underscore
ec
D
"tokenizer": "standard",
&
s
es
"text": "123-45-6789"
in
us
-B
}
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
123-45-6789 123_45_6789
ba
pattern_replace
Se
n
io
is
ec
‒ uax_url_email: recognizes URLs/email addresses as tokens
D
&
s
es
in
us
n
io
is
ec
D
&
‒ snowball: stems words using a Snowball-generated stemmer es
in
s
us
-B
GET _analyze
{
"tokenizer": "whitespace",
"filter": ["lowercase", "stop" ], []
"text": "To Be Or Not To Be"
}
GET _analyze
to
n
io
{
is
ec
"tokenizer": "whitespace",
D
be
&
s
"filter": [ "stop", "lowercase" ], es
in
us
}
71
20
p-
not
Se
9-
-0
d
to
ar
er
G
n
ie
st
be
ba
Se
n
io
is
‒ ICU Tokenizer
ec
D
&
s
es
in
‒ Normalization
Se
9-
-0
d
ar
er
‒ Folding
G
n
ie
st
ba
Se
‒ Collation
‒ Transform
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 144
distributing without written permission is strictly prohibited
Example of ICU Tokenizer
• You can use the components of the ICU plugin just like any
other character filter, tokenizer, or token filter
• For example, the following Chinese translates to “Star Wars
is my favorite movie”
‒ The “standard” tokenizer incorrectly splits the sentence into
single characters
n
io
is
星球⼤大战 喜欢
ec
D
&
s
es
in
us
POST _analyze 是 的
-B
71
{
20
p-
我 电影
Se
"tokenizer": "icu_tokenizer",
9-
-0
"text": ["星球⼤大战是我最喜欢的电影"]
d
ar
er
} 最
G
n
ie
st
ba
Se
n
"type" : "custom",
io
is
ec
"char_filter" : ["html_strip"],
D
&
s
"tokenizer" : "standard", es
in
us
}
1
20
p-
}
Se
9-
-0
}
d
ar
}
ie
components of your
st
ba
Se
custom analyzer
n
io
is
ec
},
D
&
"analyzer": {
s
es
in
us
"my_custom_analyzer" : {
-B
"char_filter": ["cpp_filter"],
71
20
p-
"tokenizer" : "standard"
Se
9-
}
-0
d
}
er
G
}
ie
st
ba
}
Se
n
"type" : "stop",
io
is
ec
"stopwords" : ["my", "is"]
D
&
}
s
es
in
},
us
-B
"analyzer": {
71
20
"my_custom_analyzer" : {
p-
Se
"tokenizer" : "standard",
d
ar
er
}
ie
st
ba
}
Se
}
}
n
{
io
is
"start_offset": 10,
ec
"analyzer": "my_custom_analyzer",
D
"end_offset": 18,
&
"text" : "C++ is my favorite language." s
es "type": "<ALPHANUM>",
in
us
} "position": 3
-B
7
},
1
20
p-
{
Se
9-
"token": "languag",
-0
d
"start_offset": 19,
ar
er
"end_offset": 27,
G
n
ie
"type": "<ALPHANUM>",
st
ba
Se
"position": 4
}
]
}
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 150
distributing without written permission is strictly prohibited
Se
ba
st
ie
n
G
er
ar
d
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
How do we decide
which analyzer to use?
Step 1: Exact value or full text?
• The first question you need to ask is: “Which text fields are
exact values and which fields are full text?”
‒ Exact values do not need to be analyzed
{
"brandName": "Uncle Ben's",
"customerRating": 3,
"price": 8.57,
n
io
is
ec
"grp_id": "26333",
D
&
s
"quantitySold": 514342, es
in
us
"upc12": "054800085453",
-B
7
}
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
char token
tokenizer
us
?
-B
filters filters
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
n
io
is
ec
D
&
‒ Try several options es
in
s
us
-B
n
io
is
• Use the _analyze API to test an analyzer (or the
ec
D
components individually) &
s
es
in
us
-B
71
n
io
is
5. How does the following “my_logfile_analyzer” process the
ec
D
text “192.168.1.201 - jing https://training.Elastic.co/”? &
s
es
in
us
-B
71
20
PUT my_log_files
p-
{
Se
"settings": {
9-
-0
"analysis": {
d
ar
"analyzer": {
er
"my_logfile_analyzer": {
G
n
"tokenizer": "uax_url_email"
ie
st
}
ba
Se
}
}
}
}
3 Text Analysis
4 Mappings
n
6 The Distributed Model
io
Mappings
is
ec
D
&
s
es
in
7 Working with Search Results
us
-B
7
8 Aggregations
1
20
p-
Se
9-
More Aggregations
-0
9
d
ar
er
G
n
10 Handling Relationships
ie
st
ba
Se
Topics covered:
• What is a Mapping?
• Dynamic Mappings
• Defining Explicit Mappings
• Adding Fields
• Dive Deeper into Mappings
n
• Specifying Analyzers
io
is
ec
D
&
s
es
• Multi-Fields
in
us
-B
71
20
• Index Templates
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
‒ How the field should be indexed and stored by Lucene s
es
in
us
-B
71
n
io
is
ec
D
{ Every document has
&
s
"_index": "my_index", es
a type
in
us
"_type": "my_type",
-B
71
"_id": "2",
20
p-
Se
"_score": 1,
9-
-0
"_source": {
d
ar
er
"username": "maria",
G
n
ie
}
}
n
io
is
"properties": {
ec
D
define your fields and types here
&
s
es
}
in
us
-B
}
71
20
}
p-
Se
}
9-
-0
d
ar
er
G
n
ie
st
ba
Se
{
GET my_index/_mappings "my_index": {
"mappings": {
"my_type": {
"properties": {
"comment": {
“comment” is a “text” "type": "text",
},
"username": {
"type": "keyword"
“username” is a “keyword”
n
io
is
}
ec
D
}
&
s
es
in }
us
}
-B
}
71
20
}
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
command
io
PUT comments
is
ec
D
{
&
s
"mappings": { es
in
us
}
1
20
p-
}
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
‒ boolean
&
s
es
in
us
-B
text
n
io
is
ec
PUT comments/comment_type/1
D
&
s text
{ es
in
us
"user" : "Ricardo",
-B
71
"comment_time" : "2016-11-06T15:31:18-07:00",
9-
-0
"rating" : 5
d
ar
er
} date
G
n
ie
st
long
ba
Se
Sometimes it conveniently
guesses what you want…
PUT comments/comment_type/2
{ "num_of_views": {
n
io
is
"type": "long"
ec
"num_of_views" : 430
D
},
&
}
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
PUT comments/comment_type/3
{ "comment_time": {
n
"type": "date"
io
"comment_time" : "2016-11-06"
is
ec
} }
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
"average_length": {
PUT comments/comment_type/4 "type": "text",
{
n
"fields": {
io
is
"average_length" : "24.8"
ec
"keyword": {
D
&
} "type": "keyword",
s
es
in
"ignore_above": 256
us
-B
}
71
20
}
p-
Se
}
9-
-0
d
ar
er
G
n
ie
st
ba
for “average_length”
n
io
is
ec
4. Create your index, using your explicit mappings
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
PUT temp_comments/comment_type/1
{
n
io
"user" : "Ricardo",
is
ec
"comment" : "Best movie I've ever seen! :)",
D
&
s
"comment_time" : "2016-11-06T15:31:18-07:00", es
in
us
"rating" : 5
-B
71
}
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
GET temp_comments/_mapping
{
"temp_comments": {
"mappings": {
"comment_type": {
"properties": {
"comment": {
"type": "text",
n
"fields": {
io
is
ec
"keyword": {
D
&
s
"type": "keyword", es
in
us
"ignore_above": 256
-B
71
}
20
p-
Se
}
9-
-0
},
d
ar
er
"comment_time": {
G
n
ie
"type": "date"
st
ba
Se
},
"rating": {
"type": "long"
n
io
"fields": { },
is
ec
"user": {
D
"keyword": {
&
s
es "type": "keyword"
"type": "keyword",
in
us
}
-B
"ignore_above": 256
71
20
}
p-
Se
}
9-
-0
}
d
“keyword”
n
ie
st
ba
Se
n
},
io
is
ec
"comment_time": {
D
&
"type": "date"
s
es
in
},
us
-B
"rating": {
71
20
"type": "byte"
p-
Se
},
9-
-0
d
"user": {
ar
er
G
"type": "keyword"
n
ie
}
st
ba
Se
}
}
}
n
io
}
is
ec
D
}
&
s
es
in
us
-B
71
GET comments/_search
20
p-
Se
{
9-
-0
"query": {
d
ar
er
"match": {
G
n
ie
}
}
}
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 179
distributing without written permission is strictly prohibited
Se
ba
st
ie
n
G
er
ar
d
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
mapping?
Can you change a
Can you change a mapping?
• No - not without reindexing your documents
‒ You can add fields and modify a few properties, but…
‒ …all other schema changes require a reindexing
• Why not?
‒ If you switch a field’s data type, all existing documents with that
field already indexed would become unsearchable on that field
n
io
is
• Why can I add fields without reindexing?
ec
D
&
s
es
in
us
documents
20
p-
Se
9-
-0
‒ The index can freely grow, but indexed fields can not change
d
ar
er
G
data types
n
ie
st
ba
Se
n
io
is
ec
D
&
PUT comments
s
es
in
{
us
-B
"mappings": {
71
20
"comment_type": {
p-
Se
9-
"dynamic" : "false",
-0
"properties": {
ar
er
G
}
Se
n
io
PUT comments/_mapping/comment_type
is
ec
D
{
&
s
"properties": { es
in
us
"comment_location" : {
-B
71
"type" : "geo_point"
20
p-
Se
}
9-
-0
}
d
ar
er
}
G
n
ie
st
ba
Se
PUT comments/comment_type/2
{
"user" : "Maria",
"comment" : "Good morning everyone",
"comment_time" : "2017-05-15T07:14:39",
"rating" : 4 “rating” is a single value
n
io
is
in this document
ec
}
D
&
s
es
in
us
-B
71
20
p-
POST comments/comment_type/2/_update
Se
9-
{
-0
d
"doc": {
ar
er
"rating" : [4,3,5]
n
ie
st
}
ba
Se
n
io
is
ec
D
PUT comments/comment_type/5
&
s
es
{
in
us
-B
"comment_time" : "2016-11-06T15:31:18-07:00"
71
20
}
p-
Se
9-
-0
d
ar
er
"comment_time": {
G
n
ie
"type": "date"
st
ba
Se
},
{
"mappings": {
"comment_type": {
"properties": {
"last_viewed" : {
"type": "date",
"format": "dd/MM/YYYY" Use || for multiple formats
},
"comment_time" : {
n
io
"type": "date",
is
ec
D
"format" : "basic_date||epoch_millis"
&
s
es
}
in
us
-B
}
71
20
}
p-
Se
}
9-
-0
formats
st
ba
Se
n
io
is
ec
D
&
{
s
es "error": {
in
us
"root_cause": [{
-B
"type": "mapper_parsing_exception",
71
}
Se
PUT comments/comment_type/5 ],
9-
-0
"type": "mapper_parsing_exception",
{
d
ar
"type": "illegal_field_value_exception",
ie
}
st
ba
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
PUT comments
n
io
{
is
ec
D
"mappings": { disable _all for for
&
s
es
"comment_type": { comment_type
in
us
-B
"_all": {
71
20
"enabled": false
p-
Se
},
9-
-0
"properties": {
d
ar
er
"comment": {
G
n
ie
"type": "text",
st
ba
Se
},
...
PUT comments/_mapping/comment_type
{
n
"_meta" : {
io
is
ec
"comment_type_version" : "2.1"
D
&
s
} es
in
us
}
-B
71
20
p-
Se
9-
-0
JSON here…
er
G
n
ie
st
ba
Se
"properties": {
"my_password" : { “my_password” will not
be indexed
n
"type": "keyword",
io
is
ec
"index": false
D
&
},
s
es
in
"my_rating" : {
us
-B
"type": "integer",
71
20
"null_value": 1
p-
}
9-
}
d
ar
n
io
is
]
ec
D
}
&
s
} es
in
}
us
-B
},
71
"mappings": {
20
p-
"comment_type": {
Se
"properties": {
9-
-0
"comment": {
d
ar
"analyzer": "my_analyzer"
to use “my_analyzer”
n
ie
st
}
ba
Se
}
}
}
}
lik
like
n
io
is
ec
D
ela
&
s
es
in
us
-B
elas
71
20
p-
Se
9-
elast
-0
d
ar
er
G
elasti
n
ie
st
ba
Se
elastic
n
io
...
is
ec
D
"mappings": {
&
"comment_type": { es
in
s
Search terms will use the
us
"comment": {
1
20
p-
"type": "text",
Se
9-
"analyzer": "my_analyzer",
-0
d
"search_analyzer": "standard"
ar
er
G
}
n
ie
st
}
ba
Se
}
}
}
n
"keyword": {
io
is
ec
"type": "keyword",
D
&
s
"ignore_above": 256 es
in
us
},
-B
7
"english_comment" : {
1
20
p-
"type" : "text",
Se
9-
-0
"analyzer": "english"
d
ar
},
er
G
n
"whitespace_comment" : {
ie
st
ba
"type" : "text",
Se
"analyzer": "whitespace"
}
PUT comments/comment_type/1
{ “Hello, world!” gets indexed
"comment" : "Hello, world!" four different ways
}
GET comments/_search
n
io
is
{
ec
D
"query": {
&
s
es
in
"match": {
us
-B
"comment.english_comment": "hello"
71
20
}
p-
Se
}
9-
-0
d
}
ar
er
G
n
ie
st
ba
n
io
is
ec
1. The default settings and mappings are applied
D
&
s
es
in
us
applied
20
p-
Se
9-
-0
4. The setting and mappings from the PUT command of the index
are applied last and override all previous templates
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 201
distributing without written permission is strictly prohibited
Defining a Template
• Use the _template endpoint to add, view and delete
templates
name of the template
PUT _template/my_template_1
{
"template" : "*", Apply this template to
"order" : 1, any new index
"mappings": {
"my_type" : {
"_all": {
n
io
is
"enabled": false
ec
D
&
}
s
es
in
}
us
-B
}
71
20
}
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
PUT _template/my_template_2
{
Apply this template to
"template" : "my_*",
"order" : 5,
any new index that
"mappings": { starts with “my_”
"my_type" : {
"properties": {
"version" : {
n
"type" : "integer"
io
is
ec
}
D
&
s
} es
in
us
}
-B
7
}
1
20
p-
}
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
PUT my_test
GET my_test/_mappings
{
"my_test": {
"mappings": {
"my_type": {
"_all": {
n
io
is
"enabled": false
ec
D
},
&
s
es "properties": {
in
us
"version": {
-B
7
"type": "integer"
1
20
p-
}
Se
9-
}
-0
d
ar
}
er
G
}
n
ie
st
}
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
• Templates are a great place to define your custom
io
is
ec
D
analyzers es
in
&
s
us
-B
n
io
is
including the adding of fields.
ec
D
&
s
es
in
query strings
9-
-0
d
ar
er
G
n
“integer” to “long” because those two types are compatible.
io
is
ec
D
&
s
es
5. In the following mapping, which analyzer would get used on
in
us
-B
"my_field": {
d
ar
er
"type": "text",
G
n
ie
"analyzer": "my_custom_analyzer",
st
ba
"search_analyzer": "english"
Se
3 Text Analysis
4 Mappings
n
6 The Distributed Model
io
More Search
is
ec
D
&
s
es 7 Working with Search Results
Features
in
us
-B
7
8 Aggregations
1
20
p-
Se
9-
More Aggregations
-0
9
d
ar
er
G
n
10 Handling Relationships
ie
st
ba
Se
Topics covered:
• Filters
• Index Aliases
• The term Query
• More Term-level Queries
• The multi_match Query
n
• Fuzziness
io
is
ec
D
&
s
es
• Highlighting
in
us
-B
71
20
• Synonyms
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
"canola and/or sunflower oil",
is
ec
D
"salt"
&
s
], es
in
"brand_name": "Tostitos",
us
-B
"details": {
p-
Se
"total_fat": 6,
9-
"calories_from_fat": 50,
-0
d
"calories": 140,
ar
er
G
"saturated_fat": 1
n
ie
},
st
ba
“Does this
n
“How well does this
io
is
document match
ec
D
document match
&
es
s this query?”
this query?”
in
us
-B
71
20
p-
Se
9-
“Here’s
-0
“Yes”
d
ar
the _score”
er
G
n
ie
GET my_index/_search
{ The _score will be the result
"query": { of this match query only
"bool": {
"must": {
"match": {
"comment": "star"
}
n
io
is
Hits must have “favorite”
ec
},
D
&
"filter": { in the “comment” field
s
es
in
us
"match": {
-B
"comment": "favorite"
71
20
p-
}
Se
9-
}
-0
d
ar
}
ie
st
}
Se
POST _aliases
{
"actions": [
{
n
io
"add": {
is
ec
"index": "INDEX_NAME",
D
&
s
"alias": "ALIAS_NAME" es
in
}
us
-B
}
71
20
]
p-
Se
}
9-
-0
d
ar
er
G
n
ie
st
ba
Se
POST _aliases
{
"actions": [
{
"add": {
"index": "my_index",
"alias": "my_alias"
}
}
]
n
}
io
is
ec
D
&
GET my_alias/_search
s
es
in
{
us
"query": {
71
“my_index”
20
"match_all": {}
p-
Se
}
9-
-0
}
d
ar
er
G
n
ie
st
ba
Se
POST _aliases
{
"actions": [
{
"add": {
"index": "customers",
"alias": "active_customers",
n
io
is
"filter": {
ec
D
“match": {
&
s
es
"customer_status": "active"
in
us
}
-B
71
}
20
p-
}
Se
9-
}
-0
d
]
er
G
n
io
is
ec
D
&
• Term-level queries are queries that do not go through an s
es
in
us
-B
analysis phase
71
20
p-
Se
9-
‒ for searching numerics, dates, and string values that are exact
-0
d
ar
er
G
n
n
io
}
is
ec
D
The nutrition index has 4 hits
&
s
es
in
us
-B
7
GET nutrition/_search
1
20
“Find
p-
{
Se
"query": {
-0
Market.”
d
"term": {
ar
er
G
"brand_name.keyword" : "Market"
n
ie
st
}
ba
Se
}
}
This search returns 0 hits
n
io
is
ingredients”
ec
"ingredients": "syrup"
D
&
}
s
es
in
}
us
-B
],
71
20
"filter": {
p-
Se
"term": {
9-
-0
}
er
G
n
}
ie
st
ba
}
Se
n
io
is
ec
• You do not want the search string analyzed
D
&
term query
s
es
• You want scoring
in
us
-B
71
20
p-
Se
9-
n
"brand_name": "365 Everyday Value",
io
],
is
"item_name": "Feta Cheese"
ec
D
"should": [
&
s
{ "brand_name": "Kraft", es
in
"term": {
us
"brand_name.keyword": {
71
…and so on
20
"value": "Giant"
p-
Se
}
9-
}
-0
d
}
ar
er
G
]
n
ie
}
st
}
Se
n
io
],
is
ec
"filter": {
D
&
s
"terms": { es
in
"brand_name.keyword": [
us
-B
"Market Basket",
71
20
]
d
terms
ar
er
}
G
n
ie
}
st
ba
Se
}
}
}
n
I am looking for cookies
io
GET nutrition/_search
is
ec
{
D
from a brand name that starts
&
s
"query": { es
with “Ste”.
in
us
"bool": {
-B
"must":
71
20
"filter": {
-0
d
"wildcard": {
ar
er
"brand_name.keyword": {
G
n
"value": "Ste*"
st
ba
}
} from “Stewart’s” brand.
}}}}
n
"match": {
io
is
ec
"item_name": "cookie" cookies that have their
D
&
} ingredients listed.”
s
es
in
}
us
-B
],
71
20
"filter": {
p-
Se
"exists": {
9-
-0
"field": "ingredients"
d
ar
er
}
G
n
ie
}
st
ba
}
Se
}
}
n
io
is
"query": "",
ec
D
"fields": []
&
s
} es
in
us
}
-B
7
}
1
20
search
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
"fields": [
is
ec
D
"item_name",
&
s
"ingredients" es
in
us
]
-B
}
1
20
p-
}
Se
9-
…and so on
GET nutrition/_search
{
"query": { The “ingredients” score is 2 times more
"multi_match": { important than “item_name”
"query": "olive oil",
"fields": [
n
io
"item_name",
is
ec
"ingredients^2"
D
&
s
] es
in
us
}
-B
}
71
20
}
p-
Se
GET nutrition/_search
{
"query": {
"multi_match": {
"query": "rice",
"fields": [
n
Search any field that ends in “name”
io
"*name",
is
ec
D
"ingredients"
&
s
] es
in
us
}
-B
7
}
1
20
p-
}
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
"query": {
ec
D
"multi_match": {
&
s
"query": "rice", es
in
us
"fields": [
-B
7
"ingredients"
Se
9-
],
-0
d
ar
"type": "most_fields"
er
G
}
n
ie
st
}
ba
Se
n
‒ It refers to the number of character modifications, known as 'edits',
io
is
ec
D
to make two words match
&
s
es
fuzziness = 2
in
us
-B
fuzziness = 1
71
20
“nervan”
p-
Se
9-
-0
“the beetles” e i
d
ar
er
G
n
ie
st
e a a
ba
Se
115 hits
GET nutrition/_search
{
"query" : {
"match": { 115 hits
GET nutrition/_search
"ingredients" : "sugar" {
} "query" : {
n
io
}
is
"match": {
ec
D
}
&
"ingredients" : {
s
es
"query" : "suger",
in
us
-B
0 hits "fuzziness": 1
71
20
GET nutrition/_search }
p-
Se
{ }
9-
-0
"query" : { }
d
ar
}
er
"match": {
G
n
"ingredients" : "suger"
ie
st
ba
}
Se
}
}
GET nutrition/_search
n
io
{
is
ec
"query": {
D
&
s
"match": { es
in
61 hits, even though “J & Ds”
"brand_name": {
us
"fuzziness": 2
p-
Se
}
9-
-0
}
d
ar
er
}
G
n
ie
}
st
ba
Se
n
GET nutrition/_search
io
is
Only 2 hits
ec
{
"J & Ds"
D
&
"query": { with “auto” s
es
in
"match": {
us
"J. Skinner"
-B
"brand_name": {
71
20
"fuzziness": "auto"
9-
-0
}
d
ar
er
}
G
n
}
ie
st
ba
}
Se
122 hits
GET nutrition/_search
120 hits {
n
io
GET nutrition/_search
is
"query" : {
ec
{
D
"match": {
&
s
"query": { es
in
"ingredients" : {
us
"fuzziness": 2
1
20
}
p-
}
Se
9-
} }
-0
}
d
ar
}
er
G
}
n
ie
st
ba
Se
GET my_index/_search
{
"query" : {
...
n
io
},
is
ec
D
"highlight": {
&
s
"fields": { es
in
us
...
-B
7
}
1
20
p-
}
Se
9-
}
-0
highlighted
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
“highlight” section
Se
"highlight": {
9-
-0
"ingredients": [
d
ar
er
"vegetable <em>oil</em>",
G
n
ie
]
}
n
io
is
"fields": { "fields": {
ec
D
&
"ingredients" : {}, "ingredients" : {},
s
es
"item_name" : {} "item_name" : {}
in
us
-B
} }
71
} }
20
p-
Se
} }
9-
-0
d
ar
er
G
"highlight": { "highlight": {
n
ie
st
} "item_name": [...]
}
“item_name” not here
"highlight": {
"pre_tags" : ["<elasticsearch-hit>"],
"post_tags" : ["</elasticsearch-hit>"],
"fields": {
"ingredients" : {},
"item_name" : {}
} Results highlighted with your
n
io
} custom tags
is
ec
D
&
s
es
in
us
-B
"highlight": {
71
20
"ingredients": [
p-
Se
"grapeseed <elasticsearch-hit>oil</elasticsearch-hit>"
9-
-0
],
d
ar
"item_name": [
er
G
n
"Grapeseed <elasticsearch-hit>Oil</elasticsearch-hit>"
ie
st
ba
]
Se
n
io
“US”
is
ec
D
&
synonym
s
es
in
filter
71
20
p-
“United States of
Se
9-
-0
America”
d
ar
er
G
n
ie
st
ba
Se
Comma-separated list
If any of the terms is
us, usa, united states of america
won’t, will not
encountered, it is replaced
by all the terms
n
io
is
ec
D
&
s
es
in
us
right
er
G
n
ie
st
ba
Se
n
io
is
ec
D
“queen” gets replaced (expanded):
&
s
es
in
us
-B
n
io
is
ec
D
&
s
es
Query time:
in
us
-B
71
20
GET test_index/_search
p-
Se
{
9-
-0
"query": {
d
ar
er
"match": {
Becomes a search
G
n
for “ibm”
ba
}
Se
}
}
n
io
‒ But it only works at query time, so that might answer your
is
ec
D
question
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
}
]
List the synonyms in the
}, mapping (or point to a
"analyzer": {
"my_analyzer": { text file)
"tokenizer": "standard",
n
io
is
"filter": ["lowercase","my_synonyms"]
ec
D
}
&
}
s
es
in
}
us
},
-B
7
"mappings": {
1
20
"doc": {
p-
Se
"properties": {
9-
-0
"comment": {
d
"type": "text",
ar
er
"search_analyzer": "my_analyzer"
ie
}
Se
}
}
}
}
"filter" : {
"my_synonyms" : {
"type" : "synonym_graph",
"synonyms_path" : "/path/to/file.txt"
}
}
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
PUT test_index/doc/1
{
"comment" : "I won't be in the US in July" Because we are using
} synonym_graph, no
expanding or contracting
happens at index time
n
io
is
ec
PUT test_index/doc/2
D
&
s
{ es
in
us
}
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
}
is
ec
"hits": [
D
&
{
s
es
in
"_index": "test_index",
us
-B
"_type": "doc",
71
"_id": "1",
20
p-
"_score": 8.396978,
Se
9-
"_source": {
-0
d
in July"
G
n
ie
}
st
ba
}
Se
n
}
io
is
ec
"hits": [
D
&
{
s
es
in
"_index": "test_index",
us
-B
"_type": "doc",
71
"_id": "2",
20
p-
"_score": 0.7081689,
Se
9-
"_source": {
-0
d
$0.23 a share"
G
n
ie
}
st
ba
}
Se
PUT test_index/doc/3
{
"comment" : "We won't meet the Queen of
England in the United States of America"
}
The only difference is
PUT test_index/doc/4
“queen” and “monarch"
n
io
{
is
ec
"comment" : "We won't meet the Monarch of
D
&
s
England in the United States of America" es
in
us
}
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
"hits": [
{
"_index": "test_index",
GET test_index/_search "_type": "doc",
{ "_id": "3",
"_score": 0.9792457,
"query": { "_source": {
n
io
"match": {
is
"comment": "We won't meet the Queen
ec
D
"comment": "queen" of England in the United States of
&
s
es America"
}
in
}
us
}
-B
},
7
} {
1
20
"_index": "test_index",
p-
Se
"_type": "doc",
9-
-0
"_id": "4",
Both documents are hits, and
d
"_score": 0.9792457,
ar
er
"_source": {
G
America"
}
}
]
n
“monarch”
io
is
}
ec
D
},
&
s
{ es
in
us
"match": {
-B
"comment": {
71
20
"query": "queen",
Search only for “queen”
p-
Se
"analyzer": "standard"
9-
-0
}
d
ar
}
er
G
}
n
ie
st
]
ba
Se
}
}
}
n
io
"comment": "We won't meet the Queen of England
is
ec
in the United States of America"
D
&
}
s
es
in
},
us
-B
{
71
"_index": "test_index",
20
p-
"_type": "doc",
Se
9-
"_id": "4",
-0
d
"_score": 0.9792457,
ar
er
"_source": {
G
n
ie
}
}
]
n
io
is
• Highlighting allows you to highlight search results that match
ec
D
&
the query es
s
in
us
-B
71
text file
9-
-0
d
ar
er
G
3. What is the easiest way to search for the same terms across multiple
fields?
4. What is the difference between a full-text query and a term-level query?
n
io
is
ec
D
&
5. Suppose we are searching StackOverflow. What is the difference using es
in
s
us
GET stackoverflow/_search
Se
9-
{
-0
"query": {
d
ar
er
"multi_match": {
G
n
"query": "java",
ie
st
ba
}
}
}
3 Text Analysis
4 Mappings
n
io
The Distributed
is
ec
D
&
s
es 7 Working with Search Results
Model
in
us
-B
7
8 Aggregations
1
20
p-
Se
9-
More Aggregations
-0
9
d
ar
er
G
n
10 Handling Relationships
ie
st
ba
Se
Topics covered:
• Starting a Node
• Creating an Index
• Starting a Second Node
• Shards: Distribution of an Index
• Distributing Documents
n
• Replication
io
is
ec
D
&
s
es
• Split Brain
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
‒ and persisted in the data folder for future restarts es
in
s
us
-B
71
20
p-
Se
9-
adam_01
-0
d
ar
er
bin/elasticsearch -E node.name=adam_01
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
bin/elasticsearch -E node.name=adam_01 -E cluster.name=training_cluster es
in
us
-B
71
20
training_cluster
p-
Se
9-
-0
d
adam_01
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
training_cluster
D
&
s
es
in
adam_01
us
-B
71
20
clusterState0
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
“I am adam_01, Master
of the Universe!”
n
io
is
ec
training_cluster
D
&
s
es
in
adam_01
us
-B
71
20
clusterState0
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
“OK, so maybe I am
not the master, but I am
n
io
is
ec
training_cluster
eligible.”
D
&
s
es
in
adam_01
us
-B
71
20
clusterState0
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
“I guess I am the only
is
ec
training_cluster
D
one eligible, so I win the
&
s
es
election!”
in
adam_01
us
-B
71
20
clusterState0
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
training_cluster
n
io
is
ec
D
adam_01
&
s
es
in
us
-B
“I am ready for
71
20
some data!”
p-
Se
data allocation
9-
-0
d
ar
er
G
n
ie
st
clusterState0
ba
Se
PUT my_index
{
"settings": {
training_cluster
...
n
io
is
}
ec
D
} adam_01
&
s
es
in
us
-B
7
Client
1
20
p-
data allocation
Se
App
9-
-0
d
ar
er
G
n
ie
st
clusterState0
ba
Se
n
io
training_cluster
is
ec
D
&
s
PUT my_index es adam_01
in
us
-B
7
Client
1
20
p-
Se
App
9-
-0
data allocation
d
ar
er
G
n
ie
st
ba
Se
clusterState0
n
io
training_cluster
is
ec
D
&
s
PUT my_index es adam_01
in
us
-B
7
Client
1
20
p-
Se
App
9-
my_index
-0
d
ar
er
G
data allocation
n
ie
st
ba
Se
200 OK clusterState1
n
io
is
training_cluster
ec
D
princess_10
&
s
es
adam_01
in
us
-B
71
20
p-
Se
data allocation
9-
-0
my_index
d
ar
er
G
n
data allocation
ie
st
ba
Se
clusterState1
n
io
is
training_cluster
ec
D
princess_10 “Hi adam_01.
&
s
es
May I join adam_01
in
us
-B
training_cluster?”
71
20
p-
Se
data allocation
9-
-0
my_index
d
“Sure,
ar
er
G
data allocation
ie
st
ba
training_cluster!”
Se
clusterState1
n
io
is
training_cluster
ec
D
&
s
es
princess_10 adam_01
in
“I am the
us
-B
is the cluster
p-
Se
9-
state”
-0
my_index
data allocation
d
ar
er
G
n
data allocation
ie
st
ba
Se
clusterState2 clusterState2
training_cluster
princess_10 adam_01
n
io
is
ec
D
&
s
es
in
my_index
us
-B
data allocation
71
20
p-
data allocation
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
training_cluster
io
is
ec
D
&
s
princess_10 es
in
adam_01
us
-B
7
1
20
p-
Se
my_in dex
9-
-0
d
ar
er
n
io
is
training_cluster
ec
D
&
s
es
princess_10 adam_01
in
us
-B
71
20
p-
my_index
9-
indexed
er
G
n
training_cluster
clusterState3 contains the
shard allocation table
n
princess_10 adam_01
io
is
ec
D
&
s
es
in
us
-B
assigned to her
er
clusterState3 clusterState3
G
n
ie
st
ba
Se
PUT my_index
n
io
is
{
ec
D
"settings": {
&
s
es
"number_of_shards": 6
in
us
-B
}
71
20
}
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
PUT my_index/doc/101
io
is
training_cluster
ec
{
D
&
"firstname" : "Robert",
s
es
princess_10 adam_01
in
"lastname" : "Smith",
us
-B
"city" : "Lancashire"
71
20
}
p-
Se
9-
-0
d
ar
er
G
n
Client
st
ba
Se
App
clusterState3 clusterState3
201 response
n
io
"firstname" : "Robert",
is
ec
"lastname" : "Smith", training_cluster
D
&
"city" : "Lancashire"
s
es
} adam_01
in
princess_10
us
-B
71
20
p-
Se
9-
Client
-0
d
ar
App
er
G
n
data allocation
ie
data allocation
st
ba
Se
n
io
data allocation
is
ec
D
&
Client s
es
clusterState3
in
us
App 4. delete
-B
71
20
p-
5. index
Se
9-
7. 200 response
-0
6. ok
d
ar
er
G
n
data allocation
ie
st
ba
Se
clusterState3
n
io
data allocation
is
ec
D
&
Client s
es
clusterState3
4. get
in
us
App
-B
71
20
5. delete
p-
Se
9-
8. 200 response
-0
7. ok 6. index
d
ar
er
G
n
data allocation
ie
st
ba
training_cluster
princess_10 adam_01
n
io
is
ec
D
&
clusterState3 clusterState3
s
es
in
us
-B
71
20
p-
PUT my_index/_settings
n
{
io
is
training_cluster
ec
"number_of_replicas": 1
D
&
s
} es
princess_10 adam_01
in
us
-B
71
20
p-
Se
Client
9-
-0
App
d
ar
er
G
n
clusterState4 clusterState4
princess_10 adam_01
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
clusterState5 clusterState5
training_cluster
adam_01
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
data allocation
ie
st
ba
Se
clusterState5
training_cluster
adam_01
n
io
is
ec
D
&
s
es
in
us
promoted to a primary,
20
p-
Se
updated
d
ar
er
G
n
data allocation
ie
st
ba
Se
clusterState6
princess_10 adam_01
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
clusterState7 clusterState7
training_cluster
princess_10 adam_01
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
clusterState8 clusterState8
n
io
is
training_cluster
ec
D
&
s
es
princess_10 adam_01
in
us
-B
71
20
p-
Se
9-
3 2 3 2
-0
d
ar
er
G
n
ie
1
st
0 1 0
ba
Se
n
io
is
training_cluster
ec
D
&
princess_10 s adam_01
es
in
us
2. index to P2
-B
71
20
p-
Se
3. replicate to R2
3 2 3 2
9-
-0
d
ar
er
4. replicate OK
G
n
1
ie
0 1 0
st
ba
Se
5. index OK
n
io
should be written to
is
"_id" : "202",
ec
D
"_version" : 1,
&
s
"result" : "created", es
in
us
"successful" : 1,
Se
9-
"failed" : 0
-0
d
ar
},
“failed” is an array of errors if
er
G
"created" : true
n
}
ba
Se
n
io
is
training_cluster
ec
D
&
s
es
princess_10 adam_01
in
us
-B
71
20
p-
Se
9-
3 2 3 2
-0
d
ar
er
G
n
ie
1
st
0 1 0
ba
Se
n
io
is
ec
training_cluster
D
&
s
es
in
princess_10 adam_01
us
-B
7
Client
1
20
p-
App
Se
9-
-0
3 2 3 2
d
ar
er
G
n
ie
st
ba
1 0 1 0
Se
Client
App
1. delete 7. response
training_cluster
n
io
adam_01
is
princess_10
ec
6. deleted
D
&
s
es
in
5. delete OK
us
-B
3 2 3. delete
7
2
1
3
20
4. delete replica
p-
Se
9-
-0
1 0
d
1
ar
0
er
G
n
ie
st
ba
Se
2. reroute
n
io
is
ec
training_cluster
D
&
s
es
in
us
“Yay - I
-0
“I am the
d
ar
get to be the
er
master!”
G
n
master!”
ie
st
ba
Se
Client
App Inconsistent cluster
states makes it basically
impossible to recover
from a split brain
n
io
is
ec
training_cluster
D
&
s
es
in
us
clusterState4
st
clusterState10 clusterState10
ba
Se
n
io
is
minimum_master_nodes is 2
ec
training_cluster
D
&
s
es
in
us
“I vote for
ar
which makes a
G
adam_01.”
n
ie
quorum.”
st
ba
Se
n
io
index after it has been created
is
ec
D
&
s
es
in
n
4. If number_of_shards for an index is 4, and
io
is
ec
D
number_of_replicas is 2, how many total shards will exist for
&
s
es
in
this index?
us
-B
71
20
p-
3 Text Analysis
4 Mappings
n
io
Working with
is
ec
D
7 Working with Search Results
&
s
es
Search Results
in
us
-B
7
8 Aggregations
1
20
p-
Se
9-
More Aggregations
-0
9
d
ar
er
G
n
10 Handling Relationships
ie
st
ba
Se
Topics covered:
• Segments
• The Anatomy of a Search
• DFS Query-then-fetch
• Sorting Results
• Doc Values and Fielddata
n
• Sorting by Strings
io
is
ec
D
&
s
es
• Pagination
in
us
-B
71
20
• Scroll Searches
p-
Se
9-
-0
d
ar
n
io
is
ec
D
&
s
es
3 9 4 8
in
us
-B
71
20
p-
Se
1 7 2 6
9-
-0
d
ar
er
G
n
ie
st
ba
Se
9 7
my_field_food_name
n
io
apple 7,9
is
ec
banana 7
D
&
carrot 9
s
es
in
dill 7,9
us
-B
71
20
9
p-
Se
my_field_color
9-
-0
d
ar
er
green 7,9
7
G
n
orange 9
ie
st
ba
red 7,9
Se
yellow 7
n
large inverted index and too many small ones
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
7
POST my_index/_refresh
1
20
p-
Se
9-
-0
d
ar
er
G
n
n
io
is
ec
‒ true: refresh the affected primary and replica shards so that the
D
&
s
es
changes are visible immediately
in
us
-B
71
20
occur
-0
d
ar
er
G
n
ie
st
ba
Se
PUT my_index/doc/102/?refresh=wait_for
{
n
io
is
"firstname" : "James",
ec
D
&
"lastname" : "Brown",
s
es
in
"address" : "6011 Downtown Lane",
us
-B
"city" : "Detroit"
71
20
}
p-
Se
segment 1
segment 1
segment 2
segment 2
n
io
is
ec
GET my_index/_search
D
&
{
s
es
“query” : {
in
node 2
us
…
-B
7
}
1
20
}
p-
Se
9-
-0
segment 1
d
ar
er
segment 1
G
coordinating node
n
ie
st
segment 2
ba
Se
segment 2
production_cluster
n
io
is
ec
D
&
s
es
in
node 1 node 4
us
-B
Client
71
20
node 2 node 5
p-
Se
9-
-0
node 3 node 6
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
1. Query phase
&
s
es
in
us
-B
2. Fetch phase
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
node 1
Replica
Coordinating node node 2 3
4. Each selected shard
node 3 executes the query
n
io
node 4
is
ec
D
Replica
&
node 5
s
es
1 Primary
in
us
-B
node 6 2
71
20
p-
Se
9-
-0
sorting value
st
any aggregations
ba
Se
n
1
io
is
Coordinating node
ec
D
&
s
es
Client
in
node 4
us
-B
71
20
p-
Replica
Se
9-
3
-0
d
3. Coordinating node
ar
er
G
n
io
is
ec
D
&
s
{ es
in
us
"took": 11,
-B
"timed_out": false,
7
If “successful” is less
1
20
"_shards": {
p-
"total": 3,
-0
"failed": 0
G
n
ie
},
st
ba
Se
"hits": {
n
io
is
0 1 1 filter
ec
1
D
&
s
es
in
us
-B
7
2
1
20
p-
Se
9-
-0
3
er
{
"_index":
This document might
score unrealistically high
“my_index",
{
"_type":
"_index":
{ “my_type”,
“my_index",
"_index":
"_type":
n
on the shard
io
is
ec
D
&
s
es
in
us
{
"_index":
-B
“my_index",
"_type": {
7
"_index":
1
“my_index",
20
"_type":
p-
{
{ "_index":
G
"_index": “my_index",
“my_index",
{
n
"_index":
ie
{
“my_index",
"_index":
st
“my_index",
ba
Se
node 2 node 3
n
io
is
ec
• Tends to fix the issue of skewed datasets
D
&
s
es
in
us
n
io
}
is
ec
D
&
s
es
in
us
-B
GET my_index/_search?search_type=dfs_query_then_fetch
71
20
{
p-
Se
"query" : {
9-
-0
"match": {
d
ar
er
"comment": "star"
G
n
ie
}
st
ba
}
Se
}
dfs_query_then_fetch
n
io
is
{
ec
D
"query": {
&
s
es
...
in
us
},
7
names to sort by
1
"sort": [
20
p-
Se
{
9-
-0
"FIELD": {
d
ar
"order": "desc"
er
G
}
n
ie
st
ba
}
Se
] “desc” or “asc”
}
n
io
}
is
ec
D
},
&
s
"sort": [ es
in
us
{
-B
7
"details.total_fat": {
1
20
p-
"order": "desc"
Se
9-
}
-0
d
}
ar
er
G
]
n
ie
}
st
ba
Se
n
io
total_fat is the first hit
is
...
ec
D
"brand_name": "ProBar",
&
s
es "item_name": "Bar, Koka Moka",
in
us
"details": {
-B
"total_fat": 18,
71
20
...
p-
Se
},
9-
-0
},
er
"sort": [
n
ie
st
]
sorting },
n
io
"_source": {
is
{
ec
"servings": {
D
"_score" : {
&
"serving_size_unit": "cookies",
s
"order" : "desc" es
in
"servings_per_container": 13.5,
us
}
-B
"serving_size_qty": 3
7
}
1
},
20
p-
Holiday Homestyle",
ar
er
},
G
n
ie
"sort": [
st
ba
2.6099455
are provided in the results ]
GET nutrition/_search
{
"query" : {
"match": {
n
"ingredients": "chocolate"
io
is
ec
}
D
&
},
s
es
in
"sort": [
us
-B
"_doc"
71
20
]
p-
Se
}
9-
-0
d
n
"item_name" : "yogurt"
io
is
ec
}
D
&
},
s
es
in
"sort" : [
us
-B
{
71
20
"brand_name" : {
p-
Se
"order" : "asc"
9-
-0
}
d
ar
er
}
G
n
]
ie
st
ba
}
Se
• You can sort by text, but you need to understand the cost
‒ The issue is that text data types are analyzed
‒ The original value of the text has been changed for indexing
n
io
is
ec
D
We want to sort by this data But this is what our data looks like
&
s
es
in
us
-B
“Chobani” “chobani”
71
20
p-
Se
“del”
9-
“My Favorite”
-0
_analyze
d
ar
“essential”
er
G
“Del Monte”
n
ie
st
ba
“everyday”
Se
“Essential Everyday”
“favorite”
“monte”
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 347
distributing without written permission is strictly prohibited
Se
ba
st
ie
n
G
er
ar
d
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
Fielddata
Doc Values and
Introducing Doc Values and Fielddata
• An inverted index is not ideal for sorting on field values
‒ sorting (and aggregations) require us to uninvert the inverted
index
• Two options in Elasticsearch for building this uninverted
index:
‒ doc values
‒ fielddata
n
io
is
ec
D
&
• doc values only work on exact value fields (so NOT on s
es
in
us
-B
analyzed fields)
71
20
p-
Se
9-
to sort by it
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
faster indexing and avoids avoid using too much
in
us
Pros
-B
n
io
is
ec
D
&
‒ enable fielddata on that field (e.g. significant terms) es
s
in
us
-B
7
GET nutrition/_search
{ “Chobani”
"query": {
"match" : { “Chobani Greek Yogurt”
"item_name" : "yogurt"
}
},
“Crowley”
"sort" : [
n
“Dannon”
io
is
{
ec
D
&
"brand_name.keyword" : {
s
es
"order" : "asc" …and so on
in
us
-B
}
71
}
20
p-
Se
]
9-
-0
}
d
“brand_name.keyword” is an OK
ar
er
G
n
“Dannon”
io
is
},
ec
D
"sort" : [
&
s …
es
{
in
us
"brand_name.keyword" : {
-B
7
“Welch’s”
1
"order" : "asc"
20
p-
}
Se
9-
} “Yoplait”
-0
d
ar
]
er
G
} “del Monte”
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
PUT nutrition2
{
"settings": {
"analysis": {
"normalizer" : {
n
io
"my_normalizer" : {
is
ec
D
"type" : "custom",
&
s
"filter" : ["lowercase"] es
in
us
}
-B
7
}
1
20
p-
}, only lowercases
-0
d
...
ar
er
G
n
ie
st
ba
Se
...
"mappings": {
"doc" : {
"properties": {
"brand_name": {
"type": "text",
n
io
is
"fields": {
ec
D
&
"keyword": {
s
es
"type": "keyword",
in
us
-B
"normalizer": "my_normalizer"
71
}
20
p-
Se
}
9-
-0
},
Assign our “normalizer” to
d
ar
er
G
...
ie
st
ba
Se
n
} “Dannon”
io
is
ec
},
D
&
"sort" : [
s “del Monte”
es
in
{
us
-B
"brand_name.keyword" : {
7
…
1
20
"order" : "asc"
p-
Se
}
9-
“Welch’s”
-0
}
d
ar
er
]
G
“Yoplait”
n
ie
}
st
ba
Se
n
io
"query": {
is
ec
"match": {
D
&
s
"ingredients" : "onion" es
in
us
}
-B
},
71
20
"sort" : [
p-
Se
9-
{
-0
d
"details.calories" : {
ar
er
"order" : "desc"
G
n
ie
}
st
ba
Se
}
]
}
n
io
"query": {
is
ec
"match": {
D
&
s
"ingredients" : "onion" es
in
us
}
-B
},
71
20
"sort" : [
p-
Se
9-
{
-0
d
"details.calories" : {
ar
er
"order" : "desc"
G
n
ie
}
st
ba
Se
}
]
}
node 1
coordinating
{
"_index":
{
“my_index",
1000 docs {
"_index":
“my_index",
"_index":
"_type":
“my_index",
"_type":
1000 docs
If our index has 6 shards
n
io
is
ec
and we want 990-1000,
D
{
&
1000 docs
"_index":
1000 docs
p-
{
Se
"_index":
“my_index",
"_type": “my_type”,
"_id": "1",
{
9-
{
-0
"_index":
node 2 node 3
n
ie
st
ba
Se
GET nutrition/_search
{
"size": 10, Run the initial search…and keep
"query": { track of the “sort” block from the
"match": { last document returned
n
"ingredients" : "onion"
io
is
ec
}
D
&
s
}, es
in
"sort" : [
us
-B
]
d
ar
er
}
G
n
ie
“ties” in sorting
GET nutrition/_search
{
"size": 10,
"query": {
"match": {
"ingredients" : "onion"
}
},
The previous hit was 270 calories
n
io
"search_after" : [
is
ec
and _score of 1.0284368
D
270,
&
s
1.0284368, es
in
us
"doc#AVwBRWkdzWyyiGorJZ_j"
-B
7
],
1
20
p-
"sort" : [
Se
9-
]
ba
Se
n
io
is
ec
documents after the scroll is initiated will not show up in your
D
&
s
es
results
in
us
-B
71
20
GET nutrition/_search?scroll=30s
n
{
io
is
ec
"size": 100, If we are idle for more than 30
D
&
s
"query": { es
seconds, then delete the scroll
in
"match_all": {}
us
-B
},
71
20
"sort": [
p-
Se
] to return
d
ar
er
}
G
n
ie
st
ba
Se
{
"_scroll_id":
"DnF1ZXJ5VGhlbkZldGNoCgAAAAAAAAflFnZVWmtsa2VKUUdTVnJCRkFiZ3RjZ2
cAAAAAAAAH5xZ2VVprbGtlSlFHU1ZyQkZBYmd0Y2dnAAAAAAAAB-
n
io
YWdlVaa2xrZUpRR1NWckJGQWJndGNnZwAAAAAAAAfoFnZVWmtsa2VKUUdTVnJCR
is
ec
D
kFiZ3RjZ2cAAAAAAAAH6RZ2VVprbGtlSlFHU1ZyQkZBYmd0Y2dnAAAAAAAAB--0
&
s
WdlVaa2xrZUpRR1NWckJGQWJndGNnZwAAAAAAAAfuFnZVWmtsa2VKUUdTVnJCRk es
in
us
FiZ3RjZ2c=",
-B
7
"took": 2,
1
20
p-
"timed_out": false,
Se
9-
"_shards": {
-0
d
"total": 10,
ar
er
G
"successful": 10,
n
ie
st
"failed": 0
ba
Se
},
...
GET _search/scroll
{ Always use the prior _scroll_id
"scroll" : "30s",
n
io
is
ec
"scroll_id" :
D
&
"DnF1ZXJ5VGhlbkZldGNoCgAAAAAAAAflFnZVWmtsa2VKUUdTVnJCRkFiZ3RjZ2cAA
s
es
in
AAAAAAH5xZ2VVprbGtlSlFHU1ZyQkZBYmd0Y2dnAAAAAAAAB-
us
-B
YWdlVaa2xrZUpRR1NWckJGQWJndGNnZwAAAAAAAAfoFnZVWmtsa2VKUUdTVnJCRkFi
71
20
Z3RjZ2cAAAAAAAAH6RZ2VVprbGtlSlFHU1ZyQkZBYmd0Y2dnAAAAAAAAB-
p-
Se
oWdlVaa2xrZUpRR1NWckJGQWJndGNnZwAAAAAAAAfrFnZVWmtsa2VKUUdTVnJCRkFi
9-
-0
Z3RjZ2cAAAAAAAAH7BZ2VVprbGtlSlFHU1ZyQkZBYmd0Y2dnAAAAAAAAB-0WdlVaa2
d
ar
xrZUpRR1NWckJGQWJndGNnZwAAAAAAAAfuFnZVWmtsa2VKUUdTVnJCRkFiZ3RjZ2c=
er
G
n
"
ie
st
ba
}
Se
DELETE _search/scroll/DnF1ZXJ5VGhlbkZldGNoCgAAAAAAAAflFnZVWm
tsa2VKUUdTVnJCRkFiZ3RjZ2cAAAAAAAAH5xZ2VVprbGtlSlFHU1ZyQkZBYmd0Y2dnA
AAAAAAABYWdlVaa2xrZUpRR1NWckJGQWJndGNnZwAAAAAAAAfoFnZVWmtsa2VKUUdTV
nJCRkFiZ3RjZ2cAAAAAAAAH6RZ2VVprbGtlSlFHU1ZyQkZBYmd0Y2dnAAAAAAAAB-
oWdlVaa2xrZUpRR1NWckJGQWJndGNnZwAAAAAAAAfrFnZVWmtsa2VKUUdTVnJCRkFiZ
3RjZ2cAAAAAAAAH7BZ2VVprbGtlSlFHU1ZyQkZBYmd0Y2dnAAAAAAAAB-0WdlVaa2xr
n
io
is
ec
ZUpRR1NWckJGQWJndGNnZwAAAAAAAAfuFnZVWmtsa2VKUUdTVnJCRkFiZ3RjZ2c=
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
site like Facebook. A snapshot of the data works best for a large
io
is
ec
D
number of hits that we want to return in small batches.
&
s
es
in
us
-B
• Pagination search:
71
20
p-
Se
9-
n
io
is
ec
D
• Doc values store your original fields’ values on disk
&
s
es
in
us
-B
n
io
is
“keyword”.
ec
D
&
s
es
in
results?
1
20
p-
Se
9-
-0
search?
G
n
ie
st
ba
Se
3 Text Analysis
4 Mappings
n
io
Aggregations
is
ec
7 Working with Search Results
D
&
s
es
in
us
8 Aggregations
-B
71
20
p-
Se
9-
More Aggregations
-0
9
d
ar
er
G
n
10 Handling Relationships
ie
st
ba
Se
Topics covered:
• What are Aggregations?
• Types of Aggregations
• Buckets and Metrics
• Common Metrics Aggregations
• The range Aggregation
n
• The date_range Aggregation
io
is
ec
D
&
s
es
• The terms Aggregation
in
us
-B
71
20
• Nesting Buckets
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
What
n
io
is
is the average ticket price
ec
What movies have “star” in
D
of a movie in the U.S?
&
s
es
the title?
in
us
-B
71
20
p-
Se
That
9-
That
-0
is an aggregation…
d
is a search
ar
er
G
query…
n
ie
st
ba
Se
n
io
is
ec
file?” day last month?”
D
&
s
es
in
“What
us
-B
Search
criteria
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
Aggregations
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
"AGG_TYPE": {
is
ec
D
...
&
s
} es
The name you choose comes
in
us
}
-B
}
1
20
p-
}
Se
9-
-0
d
ar
types
ie
st
ba
Se
What
is the average
number of calories of items
with “olive oil” in the
ingredients?
GET nutrition/_search
{
"query" : {
"match_phrase": {
"ingredients": "olive oil"
n
io
}
is
The “query” defines the
ec
},
D
&
"aggs": { scope of the aggregation
es
in
s
us
"my_avg_calories": {
-B
"avg": {
71
20
"field": "details.calories"
p-
Se
}
9-
-0
}
G
average
n
ie
}
st
ba
Se
"hits": {
"total": 8,
"max_score": 6.0192027,
"hits": [
... The name you gave
]
n
your aggregation
io
is
},
ec
D
&
"aggregations": {
s
es
"my_avg_calories": {
in
us
-B
"value": 196.25
71
20
}
p-
Se
}
9-
-0
The computation
d
ar
er
G
n
ie
st
ba
Se
n
"match_phrase": {
io
"failed": 0
is
ec
"ingredients": "olive oil"
D
},
&
}
s
es "hits": {
in
},
us
"total": 8,
-B
"aggs": {
7
"max_score": 0,
1
20
"my_avg_calories": {
p-
"hits": []
Se
"avg": {
9-
},
-0
"field": "details.calories"
d
"aggregations": {
ar
er
}
G
"my_avg_calories": {
n
ie
}
st
"value": 196.25
ba
}
Se
}
} }
}
n
• Matrix: aggregation that operates on multiple fields and
io
is
ec
D
produces a matrix result es
in
&
s
us
-B
n
"_id": "26588",
io
is
"_version": 1,
ec
"_score": 8.296555,
D
{ {
"_source":
&
"_index":
"brandName": "product",
"Jelly Belly",
s
"_type": "product_type", es
in
"_id": "26589",
us
"_version": 1,
-B
{ { {
"_score": 8.296555, "_index": "product",
{
"_type": "product_type",
"_index": "product",
{
"_type": "product_type",
"_index": "product",
{
"_type": "product_type",
"_source":{ {
7
"_id":"_index":
"26590","product", "_id":"_index":
"26590","product", "_id":"_index":
"26590","product",
1
"_type":1,"product_type",
"_version": "_type":1,"product_type",
"_version": "_type":1,"product_type",
"_version":
"_index":
"Jelly"product",
20
"_version": 1,
-0
"_score": 8.296555,
"_source": {
d
ar
price > $5
G
&&
st
n
io
is
{
ec
{ "_index": "product", {
"_index": "product", { "_index": "product",
D
"_type": "product_type",
{ {
"_type": "product_type", "_id":"_index":
"26590","product", "_type": "product_type",
"_id":"_index":
"26590","product", "_id":"_index":
"26590","product",
&
"_type":1,"product_type",
"_version":
"_type":1,"product_type",
"_version": "_id":8.296555,
"26590", "_type":1,"product_type",
"_version":
"_score":
s
"_id":8.296555,
"_score": "26590", "_version": 1, "_id":8.296555,
"_score": "26590",
es
"_source": {
"_version":
"_source": { 1, "_score": 8.296555, "_version":
"_source": { 1,
"brandName": "Jelly
"_score": 8.296555, "_source": { "_score": 8.296555,
in
"brandName": "Jelly "brandName": "Jelly
"_source": { "brandName": "Jelly "_source": {
us
{
{ "_index": "product",
er
"_index": "product", {
"_type": "product_type",
{
"_type": "product_type", "_id":"_index":
"26590","product",
G
"_id":"_index":
"26590","product", "_type":1,"product_type",
"_version":
"_type":1,"product_type",
n
"_version": "_id":8.296555,
"_score": "26590",
"_id":8.296555,
"26590",
ie
"_score": "_version":
"_source": { 1,
"_version":
"_source": { 1, "_score": 8.296555,
st
price <= $5
rating=4 rating=5
n
io
is
ec
• Some metrics output multiple values:
D
&
s
es
in
us
n
io
is
ec
D
&
s
es
What
in
us
-B
is the average
71
20
{ { {
er
over $10?
"_id":"_index":
"26590","product", "_id":"_index":
"26590","product", "_id":"_index":
"26590","product",
"_type":1,"product_type", "_type":1,"product_type",
n
"_version": "_type":1,"product_type",
"_version": "_version":
ie
"_id":8.296555,
"_score": "26590", "_id":8.296555,
"26590", "_id":8.296555,
"_score": "26590",
That
"_score":
"_version": 1, "_version": 1, "_version": 1,
st
will require
buckets and
metrics
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 395
distributing without written permission is strictly prohibited
Se
ba
st
ie
n
G
er
ar
d
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
Aggregations
Common Metrics
The avg Metric
• The avg metric aggregation computes the average of a
specified numeric field
n
"match": {
io
is
ec
"ingredients": "sugar" "aggregations": {
D
&
} "my_sugar_calorie_average": {
s
es
in
}, "value": 155.44347826086957
us
-B
"aggs" : { }
71
20
"my_sugar_calorie_average" : { }
p-
Se
"avg": {
9-
-0
"field": "details.calories"
d
ar
er
}
G
n
}
ie
st
ba
}
Se
n
"max": { "my_min_calorie_item": {
io
is
ec
"field": "details.calories" "value": 0
D
&
} },
s
es
in
}, "my_max_calorie_item": {
us
-B
"min": { }
p-
Se
"field": "details.calories" }
9-
-0
}
d
ar
er
}
G
n
ie
}
st
}
Se
multiple aggregations in a
single “aggs” clause
n
GET nutrition/_search
io
is
ec
{ "aggregations": {
D
&
"size" : 0, "my_calorie_stats": {
s
es
in
"aggs" : { "count": 499,
us
-B
"my_calorie_stats" : { "min": 0,
71
20
} "sum": 68257
d
ar
er
} }
G
n
} }
ie
st
ba
}
Se
n
brand names are there?
io
is
ec
GET nutrition/_search
D
&
s
{ es
in
"size" : 0,
us
-B
"aggs": {
71
"aggregations": {
20
"my_brand_name_cardinality": {
p-
"my_brand_name_cardinality": {
Se
"cardinality": {
9-
"value": 430
-0
"field": "brand_name.keyword",
d
ar
}
er
"precision_threshold": 100
G
}
n
ie
}
st
ba
}
Se
}
}
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
"to": 100
is
"key": "100.0-200.0",
ec
D
}, "from": 100,
&
s
{ es
in
"to": 200,
us
"from": 100,
-B
"doc_count": 207
7
"to": 200
1
},
20
p-
}, {
Se
9-
{
-0
"key": "200.0-*",
d
"from" : 200
ar
"from": 200,
er
G
} "doc_count": 114
n
ie
st
] }
ba
Se
} ]
} }
} }
}
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 404
distributing without written permission is strictly prohibited
Labeling Ranges
• Add a “key” field to label a range using any string you like
‒ Your “key” field gets returned in the results
"aggregations": {
"aggs" : { "my_calorie_range_buckets": {
"my_calorie_range_buckets" : { "buckets": [
"range": { {
"field": "details.calories", "key": "low_calorie",
"ranges": [ "to": 100,
{ "doc_count": 178
"key" : "low_calorie", },
n
io
"to": 100 {
is
ec
D
}, "key": "medium_calorie",
&
s
{ es "from": 100,
in
us
}, },
Se
9-
{ {
-0
d
} "doc_count": 114
ba
Se
] }
} ]
} }
n
io
{"to": 100},
is
ec
{"from": 100,"to": 200},
D
&
s
{"from" : 200} es
in
us
]
-B
},
71
20
"aggs": {
p-
Se
"my_average_total_fat": {
-0
d
"field": "details.total_fat"
G
n
ie
}
st
ba
Se
}
}
}
n
io
is
"to": 200,
ec
D
"doc_count": 207,
&
s
es
"my_average_total_fat": {
in
us
-B
"value": 4.618357487922705
71
}
20
p-
},
Se
9-
-0
{
d
ar
"key": "200.0-*",
er
G
"from": 200,
n
ie
st
"doc_count": 114,
ba
Se
"my_average_total_fat": {
"value": 12.20438596501685
}
}
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 407
distributing without written permission is strictly prohibited
Se
ba
st
ie
n
G
er
ar
d
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
Aggregation
The date_range
The Date Range Aggregation
• The date_range bucket aggregation is similar to range
except it is specifically used for date fields
‒ syntax looks like:
GET my_index/_search
{
"aggs" : {
"my_date_range_agg" : {
n
"date_range": {
io
is
ec
"field": "FIELD",
D
&
"ranges": [
s
es
in
{
us
-B
}
9-
-0
]
d
ar
er
}
G
n
}
ie
st
ba
}
Se
n
io
"from": "2010-01-01", }
is
ec
D
"to": "2010-01-01||+1w"
&
s
} es
"my_total_volume": {
in
us
]
-B
"value": 161270672
7
},
1
20
}
p-
"aggs": {
Se
9-
"my_total_volume": {
-0
d
"sum": {
ar
er
G
"field": "volume"
n
ie
st
}
ba
Se
}
}
}}}
n
Which product has the lowest
io
is
{
ec
{ "_index": "product", {
"_index": "product", { "_index": "product",
D
rating but sells the best?
"_type": "product_type",
{ {
"_type": "product_type", "_id":"_index":
"26590","product", "_type": "product_type",
"_id":"_index":
"26590","product", "_id":"_index":
"26590","product",
&
"_type":1,"product_type",
"_version":
"_type":1,"product_type",
"_version": "_id":8.296555,
"26590", "_type":1,"product_type",
"_version":
"_score":
s
"_id":8.296555,
"_score": "26590", "_version": 1, "_id":8.296555,
"_score": "26590",
es
"_source": {
"_version":
"_source": { 1, "_score": 8.296555, "_version":
"_source": { 1,
"brandName": "Jelly
"_score": 8.296555, "_source": { "_score": 8.296555,
in
"brandName": "Jelly "brandName": "Jelly
"_source": { "brandName": "Jelly "_source": {
us
{
{ "_index": "product",
er
"_index": "product", {
"_type": "product_type",
"_id":"_index":
"26590","product", "_type":1,"product_type",
"_version":
"_type":1,"product_type",
n
"_version": "_id":8.296555,
"_score": "26590",
"_id":8.296555,
"26590",
ie
"_score": "_version":
"_source": { 1,
"_version": 1,
rating=4 rating=5
GET products/_search
{
"size" : 0,
"aggs" : {
"my_customer_ratings" : {
"terms": {
"field": "customerRating",
"size": 5
n
io
}
is
ec
}
D
&
s
} es
in
us
}
-B
71
20
p-
Se
(defaults to 10)
d
ar
er
G
n
ie
st
ba
Se
n
},
io
is
“2” is the most
ec
{
D
&
common rating s
"key": 1,
es
in
"doc_count": 22100
us
-B
},
71
20
{
p-
Se
"key": 4,
9-
-0
"doc_count": 22009
d
ar
er
},
G
n
ie
{
st
ba
"key": 5,
Se
"doc_count": 21984
}]}}
shard0 shard1
n
io
is
ec
D
a:9 b:9 d:4
&
c:3
s
es
in
us
-B
71
20
p-
Se
9-
-0
n
io
is
ec
GET nutrition/_search
D
"aggregations": {
&
{
s
es
in
"my_brand_names": {
"size" : 0,
us
"doc_count_error_upper_bound": 5,
-B
"my_brand_names" : {
p-
"buckets": [
Se
"terms": {
9-
{
-0
"field": "brand_name.keyword"
d
"key": "Giant",
ar
er
} "doc_count": 6
G
n
}
ie
},
st
ba
}
Se
...
} The value 6 may not be
accurate
n
io
"doc_count_error_upper_bound": 5,
is
ec
D
"sum_other_doc_count": 469,
&
s
"buckets": [ es
in
us
{
-B
7
"key": "Giant",
1
20
p-
"doc_count": 6,
Se
"doc_count_error_upper_bound": 1
-0
},
ar
er
G
be at most 7 {
n
ie
st
"doc_count": 4,
"doc_count_error_upper_bound": 3
},
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
"shard_size": 110 "my_brand_names": {
ec
D
} "doc_count_error_upper_bound": 0,
&
s
} es
"sum_other_doc_count": 474,
in
us
}
-B
"buckets": [
7
}
1
{
20
p-
"key": "Giant",
Se
9-
-0
"doc_count": 6,
d
ar
"doc_count_error_upper_bound": 0
er
},
n
ie
"doc_count": 5,
"doc_count_error_upper_bound": 0
},
n
io
is
"my_calories_buckets": {
ec
D
"range": {
&
s
es
"field": "details.calories",
in
us
-B
"ranges": [
71
"to": 100
buckets: “calories” < 100
d
ar
},
er
{
n
ie
st
ba
"from" : 100
Se
}
]
}}}}}}
n
io
“Giant”, with 2 items below "from": 0,
is
ec
"to": 100,
D
100 calories and 4 items
&
s
"doc_count": 2 es
above 100 calories
in
us
},
-B
{
71
20
"key": "100.0-*",
p-
Se
9-
"from": 100,
-0
d
"doc_count": 4
ar
er
}
G
n
ie
]
st
ba
Se
}
},
n
io
is
builds analytic information over a set of documents
ec
D
&
s
es
in
n
io
is
ec
D
5. How can you increase the accuracy of a terms &
s
es
in
us
-B
aggregation?
71
20
p-
Se
9-
3 Text Analysis
4 Mappings
n
io
More
is
ec
7 Working with Search Results
D
&
s
es
Aggregations
in
us
8 Aggregations
-B
71
20
p-
Se
9 More Aggregations
9-
-0
d
ar
er
G
n
10 Handling Relationships
ie
st
ba
Se
Topics covered:
• Aggregation Scope
• Using post_filter
• The missing Aggregation
• Histograms
• Date Histograms
n
• Percentiles
io
is
ec
D
&
s
es
• Top Hits
in
us
-B
71
20
• Significant Terms
p-
Se
9-
-0
d
ar
• Sorting Buckets
er
G
n
ie
st
ba
Se
n
"stock_symbol.keyword": {
io
is
ec
"value": "EBAY"
D
&
}
s
es
in
}
us
-B
},
7
"aggs": {
p-
"ebay_avg_high": {
9-
-0
"field": "high"
G
n
}
ie
st
ba
}
Se
}
}
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
},
io
is
ec
"all_of_my_items" : {
D
&
"global": {},
s
es
in
"aggs" : {
us
-B
"max_total_fat_from_all": {
71
20
"max": {
p-
"field": "details.total_fat"
9-
-0
} items
d
ar
er
}
G
n
ie
}
st
ba
}
Se
}
}
"aggregations": {
"all_of_my_items": {
"doc_count": 499,
"max_total_fat_from_all": {
"value": 36
}
},
"max_total_fat_from_120": {
n
"value": 14
io
is
ec
}
D
&
s
} es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
365 Everyday Value (1)
io
5. Shredded Mozzarella Cheese
is
ec
D
6. Party Cheese Tray
&
s
es
in
7. Cheese & Grapes Snack
us
-B
7
GET nutrition/_search
{
"query": {
"match": {
"item_name": "cheese"
Retrieve items with
} “cheese” in the name
n
io
is
ec
},
D
&
"aggs": {
s
es
in
"popular_brand_names": {
us
"terms": {
71
"field": "brand_name.keyword",
p-
Se
"size": 5 items
9-
-0
}
d
ar
}
er
G
n
}
ie
st
ba
}
Se
n
io
is
ec
Essential Everyday (2) 2. Cheese, Italian Three Cheese
D
&
s
Giant (2) 3. es
Turkey & Cheese
in
us
-B
GET nutrition/_search
{
"query": {
"match": { Retrieve items with
"item_name": "cheese"
} “cheese” in the name
},
"aggs": {
"popular_brand_names": {
"terms": { Retrieve the top 5 brand
"field": "brand_name.keyword", names that sell “cheese”
n
io
"size": 5 items
is
ec
}
D
&
s
} es
in
us
},
-B
"post_filter": {
71
20
"match": {
p-
"brand_name.keyword": "Wegmans"
-0
} “Wegmans”
d
ar
er
}
G
n
ie
}
st
ba
Se
n
Giant (2)
io
is
ec
D
Wegmans (2)
&
s
es
365 Everyday Value (1)
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
GET my_index/_search
&
s
{ es
in
us
"aggs": {
-B
7
"my_bucket": {
1
20
p-
"missing": {
Se
9-
"field": "FIELD_NAME"
-0
d
}
ar
er
G
}
n
ie
st
}
ba
Se
n
io
"doc_count": 250,
is
}
ec
"my_brands": {
D
}
&
s
"doc_count_error_upper_bound": 2, es
}
in
"sum_other_doc_count": 222,
us
-B
} "buckets": [
71
20
{
p-
Se
"key": "Giant",
9-
-0
"doc_count": 5
d
ar
},
G
n
{
ie
ingredients
st
ba
"key": "Unknown",
Se
"doc_count": 5
},
{
GET nutrition/_search
{
n
"size": 0,
io
is
ec
"aggs": {
D
&
"my_calories_histogram": {
s
es
in
"histogram": { A bucket is created for
us
-B
"field": "details.calories",
every 100 calories
71
20
"interval": 100
p-
Se
}
9-
-0
}
d
ar
er
}
G
n
}
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
24 items have calories "doc_count": 24
io
is
ec
between 300 inclusive },
D
&
{
and 400 exclusive s
es
in
"key": 400,
us
-B
"doc_count": 3
71
20
},
p-
Se
{
9-
-0
"key": 500,
d
ar
"doc_count": 1
er
G
n
},
ie
st
ba
{
Se
"key": 600,
"doc_count": 2
}
]
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 447
distributing without written permission is strictly prohibited
The min_doc_count Parameter
• You can use min_doc_count to raise the minimum count to
throw out smaller buckets
"aggregations": {
"my_calories_histogram": {
"buckets": [
GET nutrition/_search {
{ "key": 0,
"size": 0, "doc_count": 178
"aggs": { },
"my_calories_histogram": { {
n
"histogram": { "key": 100,
io
is
ec
"field": "details.calories", "doc_count": 207
D
&
"interval": 100, },
s
es
{
in
"min_doc_count": 10
us
-B
} "key": 200,
71
"doc_count": 84
20
}
p-
Se
} },
9-
-0
{
}
d
ar
"key": 300,
er
G
"doc_count": 24
n
ie
st
}
ba
Se
n
io
"interval": 100,
is
ec
D
"min_doc_count": 10
&
s
}, es
in
us
"aggs": {
-B
"my_bucket_stats": {
1
20
p-
"stats": {
Se
9-
"field": "details.calories"
-0
d
}
ar
er
G
}
n
ie
}
st
ba
Se
}
}
}
n
},
io
is
ec
{
D
&
"key": 100,
s
es
in
"doc_count": 207,
us
-B
"bucket_stats": {
71
20
"count": 207,
p-
Se
"min": 100,
9-
-0
"max": 195,
d
ar
"avg": 137.8985507246377,
er
G
n
"sum": 28545
ie
st
ba
}
Se
},
n
io
is
ec
GET stocks/_search
D
&
{
s
es
in
"aggs": { must be a “date” field
us
-B
"my_date_histogram": {
71
20
"date_histogram": {
p-
Se
"field": "DATE_FIELD",
9-
-0
"interval": "INTERVAL"
d
ar
er
}
G
n
}
st
ba
}
Se
n
io
is
ec
• Time units are:
D
&
s
es
in
us
n
io
is
"date_histogram": {
ec
D
&
"field": "trade_date",
s
es
"interval": "1w"
in
us
-B
}
71
20
}
p-
Se
}
9-
-0
}
d
ar
er
G
n
ie
st
ba
Se
n
io
is
},
ec
D
{
&
s
es
"key_as_string": "2009-09-03T00:00:00.000Z",
in
us
"key": 1251936000000,
-B
71
"doc_count": 1497
20
p-
},
Se
9-
{
-0
d
ar
"key_as_string": "2009-09-10T00:00:00.000Z",
er
G
"key": 1252540800000,
n
ie
st
"doc_count": 2495
ba
Se
},
n
io
is
"interval": "month" "2009-09-01" = 60.339000129699706
ec
D
}, "2009-10-01" = 60.16849994659424
&
s
"aggs": { es
"2009-11-01" = 60.203529582304114
in
us
"avg": {
1
"2010-01-01" = 60.4199995743601
20
p-
} "2010-03-01" = 65.56347905034605
-0
d
"2010-04-01" = 74.24333336239769
ar
}
er
G
} "2010-05-01" = 77.47699966430665
n
ie
st
} "2010-06-01" = 74.70545473965731
ba
Se
} "2010-07-01" = 70.23285675048828
} "2010-08-01" = 69.4557135445731
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
{
io
"aggregations": {
is
ec
"size": 0,
D
"my_calorie_percentiles": {
&
"aggs": {
s
es "values": {
in
"my_calorie_percentiles": {
us
"1.0": 0,
-B
"percentiles": {
7
"5.0": 10,
1
20
"field": "details.calories"
p-
"25.0": 70,
Se
}
9-
"50.0": 120,
-0
}
d
"75.0": 189.16666666,
ar
er
}
G
"95.0": 310,
n
ie
}
st
"99.0": 400
ba
Se
}
}
} 95% of items have 310 or
fewer calories
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 459
distributing without written permission is strictly prohibited
Percentiles Example
• Quintiles are a commonly-used percentile
‒ 5 evenly-spaced percentages: 20% - 40% - 60% - 80% - 100%
n
io
is
"percentiles": {
ec
"20.0": 60,
D
&
"field": "details.calories", "40.0": 110,
s
es
"percents": [
in
"60.0": 140,
us
-B
40,
20
"100.0": 630
p-
Se
60, }
9-
-0
80, }
d
ar
100 }
er
G
]
n
ie
st
ba
}
}
210 or fewer calories
}
What percentile is
150 calories?
GET nutrition/_search
{
n
"size": 0,
io
is
ec
"aggs": { "aggregations": {
D
&
"my_calorie_percentiles": { "my_calorie_percentiles": {
s
es
in
"percentile_ranks": { "values": {
us
-B
} }
9-
-0
} }
d
ar
er
}
G
n
}
ie
st
n
io
is
"my_top_hits": {
ec
D
&
"top_hits": {
s
es
"size": SIZE,
in
us
-B
"sort": [],
71
20
"from": OFFSET
p-
Se
}
9-
-0
}
d
}
er
G
n
io
is
"buckets": [
ec
"aggs": {
D
{
&
"my_calorie_bucket": {
s
es "key": 140,
in
"terms": {
us
"doc_count": 9
-B
"field": "details.calories"
7
},
1
20
}
p-
{
Se
}
9-
"key": 120,
-0
}
d
ar
} "doc_count": 8
er
G
},
n
ie
st
{
ba
Se
"key": 270,
"doc_count": 7
},
n
of the “calorie_bucket”
io
},
is
ec
"aggs": {
D
&
s
"my_top_hits": { es
in
"top_hits": {
us
-B
"size": 5
71
20
}
p-
Se
}
d
}
G
}
st
ba
Se
"aggregations": {
"my_calorie_bucket": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 57,
"buckets": [
{ “Soda”
"key": 140,
n
io
"doc_count": 9, "Hot Cocoa Mix,
is
ec
D
"my_top_hits": {
Cappuccino Classics
&
s
"hits": { es
Suprema 1.16 Oz"
in
us
"total": 9,
-B
"max_score": 1.341344,
71
20
"Raisin Bran"
p-
"hits": [
Se
9-
top 5 hits
-0
]
ar
er
G
Guanabana"
n
ie
st
ba
Se
n
io
“commonly common” - a lot of customers charge things at
is
ec
D
&
Amazon es
in
s
us
-B
n
"properties" : {
io
is
ec
"ingredients" : {
D
&
"type" : "text",
s
es
in
"fielddata" : true
us
-B
}
71
20
}
p-
Se
ingredients
ar
er
G
n
ie
st
ba
Se
n
},
io
is
ec
"aggregations":{
D
&
"my_top_words" : {
s
es
in
"terms" : {
us
-B
"field" : "ingredients",
71
20
"size":5
p-
Se
}
9-
-0
}
d
ar
er
}
G
n
ie
}
st
ba
}
Se
n
io
The “commonly common”
is
},
ec
D
{ ingredients are “salt”, “oil”,
&
s
"key": "and", es
“corn”, “acid” and the
in
us
"doc_count": 11
-B
}, stopword “and”
71
20
{
p-
Se
"key": "corn",
9-
-0
"doc_count": 8
d
ar
},
er
G
{
n
ie
st
"key": "acid",
ba
Se
"doc_count": 7
}
]}
GET nutrition/_search
{ Let’s look for the
"size":0, “uncommonly common”
"aggs" : { ingredients in relationship
"my_total_fat_histogram":{ to total fat.
"histogram": {
"field": "details.total_fat",
"interval": 5,
"min_doc_count": 5
},
n
io
is
"aggregations":{
ec
D
&
"my_top_words" : {
s
es
"significant_terms" : {
in
us
-B
"field" : "ingredients",
71
20
"size":5
p-
Se
}
9-
-0
}
d
ar
}
er
G
n
}
ie
st
ba
}
Se
n
io
"key": "shortening",
is
ec
"doc_count": 4, “cashews”
D
&
"score": 1.532570239334027,
s
es
"bg_count": 5
in
us
},
-B
7
{
1
20
"key": "apples",
p-
Se
"doc_count": 4,
9-
-0
"score": 1.532570239334027,
d
ar
"bg_count": 5
er
G
},
n
ie
{
st
ba
"key": "cashews",
Se
"doc_count": 3,
"score": 1.460978147762747,
"bg_count": 3
n
io
is
ec
D
‒ _key: Sort by the bucket’s key; works with histogram and
&
s
es
in
date_histogram
us
-B
71
20
p-
Se
9-
-0
d
ar
GET stocks/_search
{
"aggs": {
"my_stock_date_histogram": { The default for date_histogram is
"date_histogram": { _key ascending
"field": "trade_date",
"interval": "month",
"order": {
"_key": "desc"
}
},
n
io
is
"aggs": {
ec
D
"my_avg_close": {
&
s
es
"avg": {
in
us
-B
"field": "close"
71
}
20
p-
}
Se
9-
-0
}
d
ar
}
er
G
}
n
ie
st
}
ba
Se
n
io
},
is
This output will be sorted
ec
"aggs": {
D
&
"my_max_calories": { es
in
s
by the results of the
nested metric
us
"max": {
-B
"field": "details.calories"
71
20
}
p-
Se
9-
}
-0
}
d
ar
er
}
G
n
ie
}
st
ba
Se
n
io
is
ec
D
&
• top_hits is a metrics aggregation that allows you to find the es
in
s
us
n
StackOverflow?”
io
is
ec
D
&
s
es
4.You want to compare the total volume of stocks from the
in
us
-B
you use?
9-
-0
d
ar
er
G
n
ie
st
ba
Se
3 Text Analysis
4 Mappings
n
io
Handling
is
ec
7 Working with Search Results
D
&
s
es
Relationships
in
us
8 Aggregations
-B
71
20
p-
Se
9 More Aggregations
9-
-0
d
ar
er
Handling Relationships
G
10
n
ie
st
ba
Se
Topics covered:
• The Need for Data Modeling
• Denormalization
• The Need for Nested Types
• Nested Types
• Querying a Nested Type
n
• Sorting on a Nested Type
io
is
ec
D
&
s
es
• The Nested Aggregation
in
us
-B
71
20
• Parent/Child Types
p-
Se
9-
-0
d
ar
n
• A flat world has its advantages
io
is
ec
D
&
s
es
‒ Indexing and searching is fast
in
us
-B
71
20
n
io
is
ec
‒ Nested objects
D
&
s
es
in
us
‒ Parent/child relationships
-B
71
20
p-
Se
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
users tweets
n
io
is
ec
D
&
s
{ { es
in
us
"state" : "California" }
-0
d
}
ar
er
G
n
ie
st
ba
Se
tweets
n
io
is
ec
D
&
s
PUT tweets/doc/123 es
in
us
{
-B
7
"time" : "2017-01-24T02:32:27",
Se
PUT tweets/doc/456
9-
"userid" : 1,
-0
{
d
"username" : "harrison"
ar
er
}
n
ie
"time" : "1977-05-25T00:00:00",
st
ba
"userid" : 1,
Se
"username" : "harrison"
}
GET tweets/_search
{
"query": {
"bool": {
"must": [
{"match": {"body": "movie"}},
{"match": {"username": "harrison"}}
]
n
io
is
}
ec
D
}
&
s
es
}
in
us
-B
71
20
"_score": 0.55510485,
p-
Se
9-
"_source": {
-0
d
"time": "2017-01-24T02:32:27",
n
ie
st
ba
"userid": 1,
Se
"username": "harrison"
}
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 491
distributing without written permission is strictly prohibited
Tips for Denormalizing Data
• Denormalize data to optimize for reads
• There is no mechanism to keep the denormalized data
consistent with the original document
‒ avoid denormalizing data that is likely to change frequently
• When denormalizing data that changes, you should ensure
that there is 1 and only 1 authoritative source for the data
‒ If you do denormalize data that might change, it can help to also
n
io
is
ec
denormalize a field that does not change, like an _id
D
&
s
es
in
us
-B
71
20
users tweets
p-
Se
9-
-0
d
ar
{
n
{
ie
“userid” as an
st
"userid" : 1,
ba
"userid" : 1,
Se
PUT companies/doc/1
{
"name" : "Stark Enterprises",
"employees" : [
n
io
is
{"first_name" : "Tony", "last_name" : "Stark"},
ec
D
{"first_name" : "Virginia", "last_name" : "Potts"}
&
s
es
]
in
us
}
-B
71
PUT companies/doc/2
20
p-
{
Se
9-
"employees" : [
er
G
]
ba
Se
GET companies/_search
{
"query": {
"bool": {
"must": [
{"match": {"employees.first_name": "Tony"}},
?
n
io
is
{"match": {"employees.last_name": "Potts"}}
ec
D
]
&
s
es
}
in
us
}
-B
71
}
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
"_type": "doc",
is
ec
D
"_id": "1",
&
"_score": 0.51623213,
s
es
in
"_source": {
us
-B
"employees": [
1
20
p-
{
Se
"first_name": "Tony",
9-
-0
},
not have an employee
er
G
{
n
"first_name": "Virginia",
st
ba
"last_name": "Potts"
Se
}
]
}
n
io
is
ec
D
{
&
s
"name" : "Stark Enterprises", es
in
us
}
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
"companies": {
"mappings": {
n
io
is
"doc": {
ec
“first_name” and “last_name” are
D
"properties": {
&
s inner objects - but they should be
es
"employees": {
in
us
"properties": {
71
"first_name": {
20
p-
"type": "text",
Se
9-
-0
"fields": {
d
ar
"keyword": {
er
G
"type": "keyword",
n
ie
st
"ignore_above": 256
ba
Se
}
}
"mappings": {
n
"doc": {
io
is
ec
"properties": {
D
&
"outer_object": {
s
es
in
"type": "nested",
us
-B
"properties": {
71
20
"inner_field": "TYPE",
p-
Se
...
9-
-0
}
d
ar
er
}
G
n
ie
},
st
ba
Se
n
io
"ignore_above": 256
is
ec
}
D
&
s
} es
in
us
},
-B
"last_name": {
71
20
"type": "text",
p-
Se
9-
"fields": {
-0
d
"keyword": {
ar
er
"type": "keyword",
G
n
ie
"ignore_above": 256
st
ba
Se
}
}
}
{
"name" : "Stark Enterprises"
}
{
"employee.first_name" : "Tony",
n
io
is
"employee.last_name" : "Stark"
ec
D
}
&
s
es
{
in
us
-B
"employee.first_name" : "Virginia",
71
"employee.last_name" : "Potts"
20
p-
}
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
GET companies/_search
{
"query": { Specify the “path” to the
"nested": { nested object
"path": "employees",
"query": {
"bool": {
"must": [
n
io
is
{"match": {"employees.first_name": "Tony"}},
ec
D
&
{"match": {"employees.last_name": "Potts"}}
s
es
]
in
us
-B
}
71
}
20
p-
Se
}
9-
-0
}
d
ar
}
er
G
n
ie
st
ba
Se
GET companies/_search
{
"query": {
"nested": {
n
"Stark Enterprises"
io
is
"path": "employees",
ec
D
"query": {
&
s
es
"match": { "NBC Universal"
in
us
"employees.first_name": "Tony"
-B
71
}
20
p-
}
Se
9-
-0
}
d
ar
}
er
G
}
n
ie
st
ba
Se
GET companies/_search
{
"query": {
"nested": {
n
"path": "employees",
io
is
ec
"query": {
D
&
"match": {
s
es
in
"employees.first_name": "Tony"
us
-B
}
71
20
"inner_hits" : {}
9-
to a “nested” query
-0
}
d
ar
er
}
G
n
ie
}
st
ba
Se
n
io
"max_score": 0.6931472,
is
ec
"hits": [
D
&
{
s
"_nested": { es
in
us
"field": "employees",
-B
"offset": 0
71
},
20
p-
"_score": 0.6931472,
Se
"_source": {
9-
"first_name": "Tony",
d
"last_name": "Stark"
er
G
}
n
ie
}
st
ba
]
Se
}
}
}
}
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
}
io
is
ec
},
D
&
"sort": {
s
es
in
"employees.first_name.keyword" : {
us
-B
"order" : "asc",
71
20
"nested_path": "employees",
p-
Se
"nested_filter" : {
9-
-0
}
G
n
}
ie
st
ba
}
Se
n
"first_name": "Tony",
io
Notice the first names are
is
ec
"last_name": "Stark"
D
sorted “asc”
&
},
s
es
in
{
us
-B
"first_name": "Virginia",
71
20
"last_name": "Potts"
p-
Se
}
9-
-0
]
d
ar
er
},
G
n
ie
"sort": [
st
ba
"Virginia"
Se
]
}
GET companies/_search
n
io
is
{
ec
D
"size": 0,
&
s
es "aggregations": {
"aggs": {
in
us
"my_employees_bucket": {
-B
"my_employees_bucket": {
7
"doc_count": 3
1
"nested": {
20
p-
}
Se
"path": "employees"
9-
}
-0
}
d
ar
}
er
G
}
n
ie
st
ba
}
Se
GET companies/_search
{
"size": 0,
"aggs": {
"my_employees_bucket": {
"nested": {
"path": "employees"
n
},
io
is
ec
"aggs": {
D
&
s
"last_name": { es
in
"terms": {
us
-B
"field" : "employees.last_name.keyword"
71
20
}
p-
Se
}
9-
-0
}
d
ar
er
}
G
n
ie
}
st
ba
}
Se
"aggregations": {
"my_employees_bucket": {
"doc_count": 3,
"last_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Potts",
n
io
is
ec
"doc_count": 2
D
&
},
s
es
in
{
us
-B
"key": "Stark",
71
20
"doc_count": 1
p-
Se
}
9-
-0
]
d
ar
}
er
G
n
}
ie
st
ba
}
Se
n
io
is
ec
D
&
‒ the parent can be updated without reindexing the children s
es
in
us
-B
71
n
io
is
ec
D
‒ The parent’s id is used as the routing value for the child
&
s
es
in
document
us
-B
71
20
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
PUT companies
n
{
io
is
ec
"mappings": {
D
&
s
"company" : {}, es
in
"employee" : {
us
-B
"_parent": {
71
20
"type": "company"
p-
Se
}
9-
-0
}
d
ar
er
}
G
n
ie
}
st
ba
Se
PUT companies/company/c1
{
n
io
is
ec
"name" : "Stark Enterprises"
D
&
}
s
es
in
us
-B
PUT companies/company/c2
71
20
{
p-
Se
}
d
ar
er
G
n
ie
st
ba
Se
PUT companies/employee/emp1?parent=c1
{
"first_name" : "Tony",
"last_name" : "Stark" “Stark Enterprises”
}
n
io
is
ec
PUT companies/employee/emp2?parent=c1
D
&
s
{ es
in
"first_name" : "Virginia",
us
-B
"last_name" : "Potts"
71
20
}
p-
Se
9-
-0
{
G
n
ie
"first_name" : "Tony",
st
ba
"last_name" : "Potts"
Se
n
io
"max_score": 1,
is
ec
D
"hits": [
&
s
es ...
in
us
]
-B
71
20
p-
Se
9-
-0
has_parent
G
n
ie
st
ba
Se
n
io
the parent type
is
ec
D
&
s
es
in
us
GET my_index/parent/_search
-B
71
{
20
p-
"query": {
Se
9-
"has_child": {
-0
d
ar
"type": "TYPE",
er
G
"query": {}
n
ie
}
ba
Se
}
}
I want all
companies who have an
employee named
“Stark”
GET companies/company/_search
{ "hits": [
"query": { {
"has_child": { "_index": "companies",
n
io
"type": "employee", "_type": "company",
is
ec
"query": { "_id": "c1",
D
&
s
"match": { es
in
"_score": 1,
us
} }
p-
Se
}
9-
}
-0
}
d
]
ar
er
}
G
n
ie
st
ba
Se
n
io
is
"has_child": { "inner_hits": {
ec
D
"type": "employee", "employee": {
&
s
es
"query": { "hits": {
in
us
-B
"match": { "total": 1,
71
} "hits": [
Se
9-
-0
}, {
d
ar
"inner_hits" : {} "_source": {
er
G
} "first_name": "Tony",
n
ie
st
} "last_name": "Stark"
ba
Se
} }
n
io
is
ec
D
GET my_index/child/_search
&
s
es
{
in
us
"query": {
-B
71
"has_parent": {
20
p-
"parent_type": "TYPE",
Se
9-
"query": {}
-0
d
ar
}
the parent type
er
G
}
n
ie
st
}
ba
Se
GET companies/employee/_search
{
"query": {
"has_parent": { "_source": {
"parent_type": "company", "first_name": "Tony",
n
"last_name": "Potts"
io
"query": {
is
ec
}
D
"match": {
&
s
"name": "NBC" es
in
us
}
-B
7
}
1
20
p-
}
Se
9-
}
-0
d
}
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
{
s
es
"error": { {
in
us
{ "_type": "employee",
1
20
"routing_missing_exception", "_version": 2,
9-
-0
} "first_name": "Tony",
Se
], "last_name": "Stark"
"type": "routing_missing_exception", }
}
POST companies/employee/emp1/_update?parent=c1
n
{
io
is
ec
"doc" : {
D
&
"first_name" : "Anthony"
s
es
in
}
us
-B
}
7
Yes
Are docs
updated Yes
frequently?
n
io
is
ec
D
Consider
&
No
s
es
in
nested docs
us
-B
71
20
Do
p-
Se
your
9-
No Yes Yes
er
matching objects
n
ie
objects?
st
in these
ba
Se
arrays?
No
Copyright Elasticsearch BV 2015-2017 Copying, publishing and/or 534
distributing without written permission is strictly prohibited
Kibana Considerations
• Kibana currently does not support nested types or
parent/child relationships
‒ Important to consider this limitation if you are using your indices
for dashboards and visualizations in Kibana
• Kibana may implement these relationships in the future
‒ but for now, you will have to decide which is more important:
using nested or parent/child relationships, or being able to
n
io
visualize your data in Kibana
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
the root object AND all other of its nested objects
D
&
s
es
in
us
-B
n
io
is
ec
4. True or False: Child objects are routed to the same shard
D
&
s
es
as its _parent object.
in
us
-B
71
20
objects to be deleted.
d
ar
er
G
n
ie
st
Conclusions
er
ar
d
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
Se
ba
st
ie
n
G
er
ar
d
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
Development vs.
Production Mode
HTTP vs. Transport
• There are two important network communication
mechanisms in Elasticsearch to understand:
‒ HTTP: address and port to bind to for HTTP communication,
which is how the Elasticsearch REST APIs are exposed
‒ transport: used for internal communication between nodes
within the cluster
• The defaults are fine for downloading and playing with
Elasticsearch, but not useful for production systems
n
io
is
ec
D
&
‒ bind to localhost by default es
s
in
us
-B
71
20
p-
9200-9299
er
G
HTTP
n
ie
st
ba
Se
n
io
is
development.” environment.”
ec
D
&
s
es
in
us
my_dev_cluster my_production_cluster
-B
71
20
p-
n
io
is
ec
warnings in the Elasticsearch log
D
&
s
es
in
us
-B
‒ https://www.elastic.co/guide/en/elasticsearch/reference/master/
ar
er
G
n
bootstrap-checks.html
ie
st
ba
Se
• Meetups: http://elasticsearch.meetup.com
• Docs: https://elastic.co/docs
n
io
is
ec
D
&
s
es
• Community: https://elastic.co/community
in
us
-B
71
20
p-
Se
9-
-0
d
ar
n
io
is
ec
D
&
s
es
in
us
-B
17
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
Quiz Answers
-0
9-
Se
p-
20
17
-B
us
in
es
s
&
D
ec
is
io
n
Chapter 1 Quiz Answers
1. True
2. As a noun: an index is what Elasticsearch creates for your
documents. As a verb: we “index” documents into an index
3. True
4. True - much more efficient than separate requests
5. Elasticsearch will define a new index for you
n
io
is
ec
D
6. No. If you want Elasticsearch to generate an ID, you have
&
s
es
in
to use POST
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
the _source is returned
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
search time
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
5. We are searching for “java” in both the body and subject.
ec
D
“most_fields” gives a higher score to documents with &
s
es
in
us
-B
better field
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
4. 12 = 4 primary shards + 8 replicas
D
&
s
es
in
us
-B
5. 2
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
6. “from” and “size”
&
s
es
in
us
-B
required or recommended
G
n
ie
st
ba
Se
n
6. cardinality
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
n
io
is
ec
D
5. False &
s
es
in
us
-B
71
6. True
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se
Version 5.5.1
© 2015-2017 Elasticsearch BV. All rights reserved. Decompiling, copying, publishing and/or distribution without written consent of Elasticsearch BV is
strictly prohibited.
n
io
is
ec
D
&
s
es
in
us
-B
71
20
p-
Se
9-
-0
d
ar
er
G
n
ie
st
ba
Se