How To Optimize SQL Queries Part II

7/13/2020 How to Optimize SQL Queries Part II | by Pawan Jain | Jul, 2020 | Towards Data Science
SQL GUIDE
How to Optimize SQL Queries Part II

This article covers some advanced techniques for optimizing SQL queries
Pawan Jain
Jul 11 · 8 min read
SQL is a declarative language: each query declares what we want the SQL engine to do, but it doesn't say how.
As it turns out, the how (the "plan") is what determines the efficiency of a query, so it's pretty important.
On a lighter note, one of my colleagues said this to me while we were discussing different ways to optimize queries 😀
"The most efficient way to optimize a SQL query is to eliminate it."
Anyway, in this article I share some more suggestions for optimizing SQL statements. I hope they will be helpful to everyone.
https://towardsdatascience.com/how-to-optimize-sql-queries-part-ii-407311784112
Photo by Anthony Shkraba from Pexels
I highly recommend going through my previous article, in which I shared another 16 tricks to optimize SQL queries. You will find the link to it at the end of this story.
01. Get advice from PROCEDURE ANALYSE()
PROCEDURE ANALYSE() lets MySQL analyze your fields and their actual data, and gives you some useful suggestions. Note that it was deprecated in MySQL 5.7.18 and removed in MySQL 8.0, so it is only available on older servers.
For example, if you create an INT field as your primary key but there is not much data, PROCEDURE ANALYSE() will suggest that you change the type of this field to MEDIUMINT.
SELECT … FROM … WHERE … PROCEDURE ANALYSE([max_elements[, max_memory]])
-- Example
SELECT col1, col2 FROM table1 PROCEDURE ANALYSE(10, 2000);
max_elements (default 256) is the maximum number of distinct values that ANALYSE() notices per column
max_memory (default 8192) is the maximum amount of memory that ANALYSE() should allocate per column
02. Always set an ID for each table
We should set an ID as the primary key for each table in the database; the best choice is an INT type with the AUTO_INCREMENT flag.
Counter-example
CREATE TABLE subs (
    email varchar(20) NOT NULL,
    name varchar(20)
);
Positive-example
CREATE TABLE subs (
    id int(5) NOT NULL AUTO_INCREMENT,
    email varchar(20) NOT NULL,
    name varchar(20),
    PRIMARY KEY (id)
);
Reason:
Using a VARCHAR column as the primary key will reduce performance. Besides, in your code, you should use the table ID to construct your data structures.
Moreover, some operations in the MySQL storage engine rely on the primary key, such as clustering and partitioning. In these cases, the performance and definition of the primary key become very important.
03. Use ENUM instead of VARCHAR
If you have a field such as "gender", "country", "ethnicity", "status" or "department", and you know that the values of this field are limited and fixed, then you should use ENUM instead of VARCHAR.
Counter-example
CREATE TABLE Persons (
    PersonID int,
    Status varchar(25)
);
Positive example
CREATE TABLE Persons (
    PersonID int,
    Status enum('Married', 'Single') NOT NULL
);
Reason:
The ENUM type is very fast and compact. Internally it is stored as a small integer (1 or 2 bytes, depending on the number of values), but it is displayed as a string.
This makes it a good fit for fields that back option lists.
04. Optimize your query by caching
Older MySQL servers include a query cache (it was deprecated in MySQL 5.7.20 and removed in MySQL 8.0). Where it is available, it can be one of the most effective ways to improve performance for repeated reads, and it is handled by the MySQL server itself.
When the same query is executed multiple times, its result is placed in a cache, so that subsequent identical queries read the cached result directly without touching the table.
You enable the query cache by editing the MySQL configuration file.
Use nano to edit the file:
sudo nano /etc/mysql/my.cnf
Add the following settings to the end of your file:
/etc/mysql/my.cnf
...
[mysqld]
query_cache_type = 1
query_cache_size = 10M
query_cache_limit = 256K
Note: these steps were tested on Ubuntu 18.04.
05. Usually, a table should have no more than 5 indexes
The reasons are as follows.
More indexes are not always better: while indexes improve the efficiency of queries, they also reduce the efficiency of inserts and updates.
When inserting or updating, the indexes have to be maintained as well, so indexing needs to be carefully considered, depending on the case.
Ideally, a table should have no more than 5 indexes; if there are more, consider removing the unnecessary ones.
06. Make use of Prepared Statements
A prepared statement is a SQL statement template that the server parses once and can then execute repeatedly with different parameter values. We can get many benefits from using prepared statements, in terms of both performance and security.
To use a MySQL prepared statement, you use the following three statements:
* PREPARE – prepare a statement for execution.
* EXECUTE – execute a statement prepared by the PREPARE statement.
* DEALLOCATE PREPARE – release a prepared statement.
Since a prepared statement uses placeholders ( ? ), it helps avoid many variants of SQL injection and hence makes your application more secure.
In terms of performance, when the same query is used multiple times, this can give you a considerable advantage: you define parameters for the prepared statement, and MySQL only parses it once.
How to Use (A minimal example)
1. Prepare
PREPARE item1 FROM
    'SELECT itemcode, itemname
     FROM items
     WHERE itemcode = ?';
-- ic stands for itemcode
SET @ic = 'i012';
2. Execute
EXECUTE item1 USING @ic;
3. Deallocate
DEALLOCATE PREPARE item1;
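From application code, the same placeholder idea usually shows up as a parameterized query. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the items rows are made-up sample data. It shows that a value bound to ? is treated purely as data, so an injection attempt simply matches nothing.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (itemcode TEXT PRIMARY KEY, itemname TEXT)")
conn.executemany(
    "INSERT INTO items (itemcode, itemname) VALUES (?, ?)",
    [("i012", "keyboard"), ("i013", "mouse")],  # made-up sample rows
)

QUERY = "SELECT itemcode, itemname FROM items WHERE itemcode = ?"


def find_item(code):
    """Run the parameterized query; `code` is bound as data, never spliced into the SQL text."""
    return conn.execute(QUERY, (code,)).fetchall()


normal_rows = find_item("i012")
# A classic injection attempt is treated as a literal (non-matching) item code.
injected_rows = find_item("i012' OR '1'='1")

print(normal_rows)
print(injected_rows)
```

Most MySQL client libraries offer an equivalent placeholder mechanism, although the placeholder symbol and the exact API differ from driver to driver.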
07. Use table aliases and prefix each column with its alias, so that the semantics are clearer
Counter-example:
select * from A
inner join B
on A.deptId = B.deptId;
Positive example:
select member.name, department.deptName from A member
inner join B department
on member.deptId = department.deptId;
Reasons:
It makes clear which table each column comes from, which helps while debugging.
08. If the field type is a string, the value must be enclosed in quotation marks
Counter-example:
select * from user where userid = 123;
Positive example:
select * from user where userid = '123';
Reason:
Without the single quotes, this is a comparison between a string and a number, and their types do not match.
MySQL will do an implicit type conversion, converting both to floating-point numbers before comparing them. This adds computational overhead and can prevent the index on the column from being used.
09. When using a joint index, pay attention to the order of the index columns, generally following the left-most matching principle
CREATE TABLE `user` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `userId` int(11) NOT NULL,
    `age` int(11) DEFAULT NULL,
    `name` varchar(255) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `idx_userid_age` (`userId`, `age`) USING BTREE
);
Counter-example:
select * from user where age = 10;
Positive example:
-- Complies with the left-most matching principle
select * from user where userid = 10 and age = 10;
-- Complies with the left-most matching principle
select * from user where userid = 10;
The reasons are as follows:
When we create a joint index such as (k1, k2, k3), it is equivalent to creating three indexes: (k1), (k1, k2), and (k1, k2, k3). This is the leftmost matching principle.
When a query does not satisfy the leftmost principle, the joint index generally cannot be used.
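One way to see the left-most matching principle in action is to ask the database for its query plan. The following is a rough sketch using Python's built-in sqlite3 module instead of MySQL (an assumption made so the example runs anywhere); the table and index mirror the user table and idx_userid_age above. SQLite's planner is not MySQL's, but it applies the same prefix rule to composite B-tree indexes.

```python
import sqlite3

# SQLite stands in for MySQL here; the schema mirrors the `user` table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, userId INTEGER NOT NULL, "
    "age INTEGER, name TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_userid_age ON user (userId, age)")


def plan(sql):
    """Return SQLite's query-plan description for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the readable detail


plan_prefix = plan("SELECT * FROM user WHERE userId = 10 AND age = 10")
plan_leading = plan("SELECT * FROM user WHERE userId = 10")
plan_age_only = plan("SELECT * FROM user WHERE age = 10")

print(plan_prefix)    # mentions idx_userid_age
print(plan_leading)   # mentions idx_userid_age
print(plan_age_only)  # falls back to scanning the table
```

In MySQL itself, prefixing the same queries with EXPLAIN gives the corresponding information in the key column of the output.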
10. Prefer inner join; if left join is used, keep the result of the left table as small as possible
Inner join: when two tables are joined in a query, only the rows that match in both tables are kept.
Left join: when performing a join query on two tables, it returns all the rows of the left table, even if there are no matching records in the right table.
Right join: when performing a join query on two tables, it returns all the rows of the right table, even if there are no matching records in the left table.
Provided the SQL requirements are satisfied, it is recommended to use inner join first. If you need a left join, keep the result of the left table as small as possible.
Counter-example:
select * from
table1 t1 left join table2 t2
on t1.size = t2.size
where t1.id > 2;
Positive example:
select * from
(select * from table1 where id > 2) t1
left join table2 t2
on t1.size = t2.size;
Reasons:
An inner join generally returns fewer rows, so its performance tends to be better.
Similarly, when a left join is used, keep the left table's result as small as possible and place the filter conditions on the left table where you can, so that fewer rows have to be joined and returned.
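Before rewriting a query this way, it is worth confirming that both forms return the same rows. The sketch below does only that: it uses Python's built-in sqlite3 module as a stand-in for MySQL, with made-up rows in table1 and table2, and it checks equivalence rather than measuring speed.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, size INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, size INTEGER);
    INSERT INTO table1 (id, size) VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO table2 (id, size) VALUES (1, 30), (2, 50);
""")

# Counter-example form: join first, then filter the left table's rows.
filter_after_join = conn.execute("""
    SELECT t1.id, t1.size, t2.id
    FROM table1 t1 LEFT JOIN table2 t2 ON t1.size = t2.size
    WHERE t1.id > 2
    ORDER BY t1.id
""").fetchall()

# Positive-example form: shrink the left table first, then join.
filter_before_join = conn.execute("""
    SELECT t1.id, t1.size, t2.id
    FROM (SELECT * FROM table1 WHERE id > 2) t1
    LEFT JOIN table2 t2 ON t1.size = t2.size
    ORDER BY t1.id
""").fetchall()

print(filter_after_join)
print(filter_before_join)
```

Whether the rewritten form is actually faster depends on the database and its optimizer, so it is best confirmed with EXPLAIN on real data.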
11. Temporary Table Optimization
When creating a temporary table, if you insert a large amount of data at one time, you can use select into instead of create table followed by insert, which avoids writing a large number of log records and improves speed. (select into creates a table in SQL Server; the closest MySQL form is create table ... select.)
If the amount of data is not large, to ease the load on the system tables, you should first create table, then insert.
12. Save the IP address as UNSIGNED INT
Many programmers create a VARCHAR(15) field to store an IP address as a string instead of as an integer. If you store it as an integer, only 4 bytes are needed, and you get a fixed-length field.
Counter-example
CREATE TABLE classes (
    id INT AUTO_INCREMENT PRIMARY KEY,
    ipadd VARCHAR(15) NOT NULL
);
Positive-example
CREATE TABLE classes (
    id INT AUTO_INCREMENT PRIMARY KEY,
    ipadd INT UNSIGNED NOT NULL
);
You can also use BINARY(4) in place of UNSIGNED INT
Reason:
We must use UNSIGNED INT, because an IPv4 address uses the entire range of a 32-bit unsigned integer.
Moreover, this brings advantages in queries, especially when you need WHERE conditions such as: IP between ip1 and ip2.
In your queries, you can use INET_ATON() to convert a string IP to an integer, and INET_NTOA() to convert an integer back to a string IP.
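If the conversion has to happen in application code rather than in SQL, the same arithmetic is easy to reproduce. The sketch below uses Python's standard ipaddress module; the helper names ip_to_int and int_to_ip are made up for this illustration and mirror what INET_ATON() and INET_NTOA() do for IPv4 addresses.

```python
import ipaddress


def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its 32-bit unsigned integer value."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(number):
    """Convert a 32-bit unsigned integer back to a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(number))


as_int = ip_to_int("192.168.1.1")
round_trip = int_to_ip(as_int)

# The largest address needs all 32 bits, which is why the column must be UNSIGNED.
largest = ip_to_int("255.255.255.255")

# A range check becomes a plain integer comparison, like BETWEEN in SQL.
in_range = ip_to_int("192.168.0.0") <= as_int <= ip_to_int("192.168.255.255")

print(as_int, round_trip, largest, in_range)
```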
. . .
Conclusion
Your order of preference should be: inner join, then left join
You can greatly optimize your SQL queries with prepared statements and caching
Always set an ID in each table, and use table aliases so that the semantics are clearer
Use ENUM in place of VARCHAR if your column holds a fixed list of options, since ENUM is stored as a small integer, which improves efficiency
Get advice from PROCEDURE ANALYSE() about using the correct data type
Strings must be enclosed in quotation marks to avoid extra overhead
Use UNSIGNED INT or BINARY rather than VARCHAR for storing an IP address
Follow the left-most matching principle
. . .
Yeah, we made it to the end. I hope you enjoyed it, picked up some ideas about optimizing and speeding up your SQL queries, and learned some new tricks 😊
Here is the link to my previous story about optimizing SQL queries:
How to Optimize SQL Queries
This article sorts out some special techniques for optimizing SQL Queries
towardsdatascience.com
Update: I uploaded all these optimization tricks to this GitHub README, along with other cool stuff in the repo.
Thank you for reading. Stay tuned for more!
I send out a monthly newsletter. If you would like to join, please sign up via this link. Love to see you 😊