
Elasticsearch: Store, Search, and Analyze

By Ketan Bansal
What is Elasticsearch?

● Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack.

● It provides near real-time search and analytics for all types of data (structured, unstructured, numerical, or geospatial)
● It efficiently stores and indexes data in a way that supports fast searches
● You can go far beyond simple data retrieval and aggregate information to discover trends and patterns in your data
What is Elasticsearch?

● Elasticsearch offers the speed and flexibility to handle data in a wide variety of use cases:

* Add a search box to an app or website
* Store and analyze logs, metrics, and security event data
* Use machine learning to automatically model the behaviour of your data in real time, and more
A. Create and Delete an Index (Elasticsearch using Python)
B. Insert and Get Query (Elasticsearch using Python)
C. Search Query (Elasticsearch using Python)
D. Mapping (Elasticsearch using Python)
D.1. Mapping (Elasticsearch using Python)
D.2. Custom-Mapping (Elasticsearch using Python)
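
A minimal sketch of topics A-D, assuming a local node at http://localhost:9200, the elasticsearch-py 8.x client, and a hypothetical "books" index:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# A. Create an index (deleted again at the end)
es.indices.create(index="books")

# B. Insert a document and get it back by id
es.index(index="books", id=1, document={"title": "The Hobbit", "year": 1937})
print(es.get(index="books", id=1)["_source"])

# Refresh so the new document is visible to search (search is near real-time)
es.indices.refresh(index="books")

# C. Search with a full-text match query
results = es.search(index="books", query={"match": {"title": "hobbit"}})
print(results["hits"]["hits"])

# D. Inspect the mapping Elasticsearch inferred for the index
print(es.indices.get_mapping(index="books"))

# A. Delete the index
es.indices.delete(index="books")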
Kibana: Explore, Visualize, and Share

By Your Name
What is Kibana?

● Kibana enables you to interactively explore, visualize, and share insights into your data, and to manage and monitor the Elastic Stack.

● With Kibana, we can:

* Search, observe, and protect the data - from discovering documents to analyzing logs to finding security vulnerabilities
* Analyze your data - search for hidden insights, visualize what we’ve found in charts, maps, and more, and combine them in a dashboard
* Manage, monitor, and secure the Elastic Stack - manage your data, monitor the health of Elasticsearch, and manage access to its features
Add Data

● The best way to add data to the Elastic Stack is to use one of the integrations from the Kibana Dashboard, such as:

1. Add Data with Elastic Solutions - website search crawler, Elastic APM, Endpoint Security

2. Add Data with Programming Languages - add any data to Elasticsearch using any programming language, such as JavaScript, Java, Python, and Ruby

3. Add Sample Data - sample data sets come with sample visualizations, dashboards, and more, letting you explore Kibana before you add your own data

4. Upload a File - if you have a CSV, TSV, or JSON file, you can upload it and optionally import it into Elasticsearch
Kibana Query Language (KQL)

● KQL is a simple syntax for filtering Elasticsearch data using free-text search or field-based search

● It is used only for filtering data; it has no role in sorting or aggregating data
● It can query nested fields and scripted fields, but it does not support regular expressions or searching with fuzzy terms
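
For illustration, a few KQL filters (the field names here are hypothetical, not from the slides):

* Field-based search: response.status:200
* Phrase match on a field: message:"connection refused"
* Range plus boolean operators: bytes >= 1000 and not tags:error
* Free-text search across fields: "quick brown fox"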
Logstash: Collect, Enrich, and Transport

By Your Name
What is Logstash?

● Logstash is an open-source data collection engine with real-time pipeline capabilities

* The Logstash event processing pipeline has 3 stages: inputs → filters → outputs
* Inputs generate events, filters modify them, and outputs ship them elsewhere

● It can dynamically unify data from disparate sources and normalize the data into the destination of our choice
● It cleanses and democratizes all the data for diverse advanced downstream analytics and visualization use cases
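
A minimal sketch of the 3-stage pipeline as a Logstash config file (the file name and the grok pattern are illustrative assumptions):

# logstash.conf: inputs -> filters -> outputs
input {
  stdin { }                        # read raw events from standard input
}
filter {
  grok {
    # parse Apache-style log lines into structured fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  stdout { codec => rubydebug }    # print the enriched events
}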
Natural Language Toolkit (NLTK)

By Your Name
What is NLTK?

● The Natural Language Toolkit (NLTK) is a suite of open-source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing

● A variety of text processing tasks can be performed using NLTK, such as tokenizing, stemming, lemmatization, part-of-speech tagging, etc.
Tokenizing

● By tokenizing, you can easily split up text by word or by sentence

● It converts the whole text into various pieces of smaller text that are still relatively meaningful outside of the main text (converting unstructured data into structured data)

* Tokenizing by Words: tokenizing by word allows you to identify words that come up most often

word_tokenize(your_text) is the function used to tokenize your text into words
Tokenizing

* Tokenizing by Sentence: when we tokenize by sentence, we can analyze how those words are related to one another and see more context

sent_tokenize(your_text) is the function used to tokenize your text into sentences

NOTE: Before using these functions, you first need to import the relevant parts of NLTK
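
A minimal sketch of both tokenizers (the sample text is made up; the 'punkt' models must be downloaded once):

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

nltk.download("punkt")  # one-time download of the tokenizer models

text = "NLTK is a toolkit. It makes text processing easy."
print(word_tokenize(text))  # ['NLTK', 'is', 'a', 'toolkit', '.', 'It', ...]
print(sent_tokenize(text))  # ['NLTK is a toolkit.', 'It makes text processing easy.']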
Stemming

● Stemming is a text processing task in which you reduce words to their root, which is the core part of a word

● For example, “helping” and “helper” share the same root, i.e. “help”

● NLTK has more than one stemmer, but we’ll use the Porter stemmer
Stemming

In the sketch below, “words” is a list of tokenized words
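
A minimal sketch using NLTK’s PorterStemmer:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["helping", "helped", "helps"]          # a list of tokenized words
print([stemmer.stem(word) for word in words])   # ['help', 'help', 'help']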


Tagging Parts of Speech

● Tagging Parts of Speech, or POS tagging, is the task of labelling the words in our text according to their parts of speech

● NLTK uses the word “determiner” to refer to articles (like “a” or “the”)

● nltk.pos_tag() is the function used for tagging; it gives the output as a list of (word, tag) tuples
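
A minimal sketch (the sentence is made up; pos_tag() expects a list of tokens and needs its tagger model downloaded once):

import nltk
from nltk.tokenize import word_tokenize

# one-time download; the 'punkt' models from the tokenizing sketch are also required
nltk.download("averaged_perceptron_tagger")

tokens = word_tokenize("The dog barks")
print(nltk.pos_tag(tokens))  # [('The', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]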
Lemmatizing: Like stemming, lemmatizing reduces words to their core meaning, but it gives you a complete English word that makes sense on its own, instead of just a fragment of a word like “discoveri”
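
A minimal sketch using NLTK’s WordNetLemmatizer (requires the WordNet data):

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")  # one-time download of the WordNet data

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("scarves"))          # 'scarf': a real word, not a fragment
print(lemmatizer.lemmatize("worst", pos="a"))   # 'bad' when treated as an adjective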
Elasticsearch practice:
https://github.com/S19CRXPP0098/Practice/blob/main/Elasticsearch_Practice.ipynb

NLTK practice:
https://github.com/S19CRXPP0098/Practice/blob/main/NLTK_Practice.ipynb
THANK YOU
