
7/13/2020 How to Optimize SQL Queries Part II | by Pawan Jain | Jul, 2020 | Towards Data Science


SQL GUIDE

How to Optimize SQL Queries Part II


This article sorts out some advanced techniques for optimizing SQL queries

Pawan Jain
Jul 11 · 8 min read

SQL is a declarative language — each query declares what we want the SQL engine to
do, but it doesn’t say how.

As it turns out, the how (the “plan”) is what determines the efficiency of a query,
so it is pretty important.

On a lighter note, one of my colleagues said this to me while we were discussing
different ways to optimize queries 😀

“The most efficient way to optimize a SQL query is to eliminate it.”

Anyway, in this article I include some more suggestions on optimizing SQL statements.
I hope it will be helpful to everyone.

https://towardsdatascience.com/how-to-optimize-sql-queries-part-ii-407311784112 1/11

Photo by Anthony Shkraba from Pexels

I highly recommend going through my previous article, in which I shared another 16
tricks to optimize SQL queries. You will find the link to it at the end of this story.

01. Get advice from PROCEDURE ANALYSE()


PROCEDURE ANALYSE() lets MySQL analyze your fields and the data they actually hold,
and gives you suggestions for better column types. Note that it is deprecated as of
MySQL 5.7.18 and was removed in MySQL 8.0, so this tip applies to older servers only.

For example, if you create an INT field as your primary key but the table holds little
data, PROCEDURE ANALYSE() may suggest that you change the type of this field
to MEDIUMINT.

SELECT … FROM … WHERE … PROCEDURE ANALYSE([max_elements, [max_memory]])

-- Example
SELECT col1, col2 FROM table1 PROCEDURE ANALYSE(10, 2000);

max_elements (default 256) is the maximum number of distinct values that ANALYSE()
notices per column

max_memory (default 8192) is the maximum amount of memory that ANALYSE() should
allocate per column

02. Always set an ID for each table


We should set an ID as the primary key for each table in the database, ideally an INT
column with the AUTO_INCREMENT flag.

Counter-example

CREATE TABLE subs (
    email varchar(20) NOT NULL,
    name varchar(20)
);

Positive-example

CREATE TABLE subs (
    id int(5) NOT NULL AUTO_INCREMENT,
    email varchar(20) NOT NULL,
    name varchar(20),
    PRIMARY KEY (id)
);

Reason:

Using a VARCHAR column as the primary key tends to reduce performance, because string
keys are larger and slower to compare than integers. Besides, in your code, you should
use the table ID to construct your data structures.

Moreover, some MySQL operations, such as clustering and partitioning, rely on the
primary key. In these cases, the choice and performance of the primary key become
very important.

03. Use ENUM instead of VARCHAR


If you have a field, such as “gender”, “country”, “ethnicity”, “status” or “department”,
and you know that the values of these fields are limited and fixed, then you should use
ENUM instead of VARCHAR.

Counter-example

CREATE TABLE Persons (
    PersonID int,
    Status varchar(25)
);


Positive example

CREATE TABLE Persons (
    PersonID int,
    Status enum('Married', 'Single') NOT NULL
);

Reason:

The ENUM type is very fast and compact. Internally it is stored as a small integer
(1 byte for up to 255 values, 2 bytes beyond that), but it is displayed as a string.

This makes ENUM a good fit for fields that back option lists.
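
To see the integer behind an ENUM value, here is a minimal sketch. The table name persons_demo is an assumption made for illustration; the columns mirror the Persons example above. In MySQL, using an ENUM column in a numeric context (here, adding 0) returns its internal index.

```sql
-- Hypothetical demo table; persons_demo is an assumed name for this sketch.
CREATE TABLE persons_demo (
    PersonID int NOT NULL,
    Status enum('Married', 'Single') NOT NULL
);

INSERT INTO persons_demo (PersonID, Status) VALUES (1, 'Married'), (2, 'Single');

-- Status is displayed as a string; Status + 0 exposes the stored index
-- (members are numbered from 1 in declaration order).
SELECT PersonID, Status, Status + 0 AS status_index
FROM persons_demo;

-- Sorting an ENUM column follows the index order, not alphabetical order.
SELECT PersonID, Status
FROM persons_demo
ORDER BY Status;
```

One consequence worth keeping in mind: adding a new option later requires an ALTER TABLE, so ENUM suits lists that rarely change.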

04. Optimize your query by caching


Older MySQL servers (5.7 and earlier) include a query cache. When it is enabled, it can
be one of the most effective ways to improve performance for read-heavy workloads, and
it is handled entirely by the MySQL database engine. Note that the query cache is
deprecated as of MySQL 5.7.20 and was removed in MySQL 8.0.

When the same query is executed many times, its result is placed in a cache, so that
subsequent identical queries read the cached result without touching the table.

You can enable the query cache by editing the MySQL configuration file.

Use nano to edit the file:

sudo nano /etc/mysql/my.cnf

Add the following information to the end of your file:

/etc/mysql/my.cnf

...
[mysqld]
query_cache_type = 1
query_cache_size = 10M
query_cache_limit = 256K


Note: These steps were tested on Ubuntu 18.04.
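
After restarting the server, you may want to confirm that the settings took effect. The following is a small sketch for MySQL 5.7 and earlier (the query cache variables no longer exist in MySQL 8.0). It reuses the subs table from tip 02 as an assumed example table.

```sql
-- Confirm the settings were picked up (MySQL 5.7 and earlier only).
SHOW VARIABLES LIKE 'query_cache%';

-- Qcache_hits should grow as identical SELECT statements are repeated;
-- a fast-growing Qcache_lowmem_prunes suggests the cache is too small.
SHOW STATUS LIKE 'Qcache%';

-- A single statement can opt out of the cache when its result changes often.
SELECT SQL_NO_CACHE id, email FROM subs WHERE id = 1;
```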

05. Usually, a table should have no more than 5 indexes


The reasons are as follows.

More indexes are not always better: while indexes improve the efficiency of queries,
they also reduce the efficiency of inserts and updates.

Every insert or update must also maintain each affected index, so indexing needs to be
carefully considered, depending on the case.

As a rule of thumb, a table should have no more than 5 indexes. If it has more,
consider removing the ones that are not needed.
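
To find tables that exceed this rule of thumb, one option is to count indexes through information_schema. This is a sketch, not a definitive audit: the threshold of 5 comes from the tip above, and subs and idx_unused are assumed names used only for illustration.

```sql
-- List tables in the current database that have more than 5 indexes.
SELECT TABLE_NAME,
       COUNT(DISTINCT INDEX_NAME) AS index_count
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
GROUP BY TABLE_NAME
HAVING COUNT(DISTINCT INDEX_NAME) > 5
ORDER BY index_count DESC;

-- Inspect the indexes of one table before deciding what to drop.
SHOW INDEX FROM subs;

-- Dropping an index that turned out to be unnecessary would look like this
-- (idx_unused is a hypothetical index name, so the line is left commented out).
-- ALTER TABLE subs DROP INDEX idx_unused;
```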

06. Make use of Prepared Statements


Prepared statements are similar to stored procedures in that the SQL is defined once
and then reused. A prepared statement is a statement template that the server parses
once and can then execute repeatedly with different parameter values. This brings
benefits in both performance and security.

To use a MySQL prepared statement, you use the following three statements:

* PREPARE – prepare a statement for execution.

* EXECUTE – execute a statement prepared by the PREPARE statement.

* DEALLOCATE PREPARE – release a prepared statement.

Since a prepared statement uses placeholders ( ? ), parameter values are never spliced
into the SQL text. This helps avoid many variants of SQL injection and hence makes your
application more secure.

In terms of performance, when the same query is used multiple times, prepared
statements can give you a considerable advantage: you define parameters for the
statement, and MySQL parses it only once.

How to Use (A minimal example)

-- 1. Prepare
PREPARE item1 FROM
'SELECT itemcode, itemname
FROM items
WHERE itemcode = ?';

-- ic stands for itemcode
SET @ic = 'i012';

-- 2. Execute
EXECUTE item1 USING @ic;

-- 3. Deallocate
DEALLOCATE PREPARE item1;

07. Use table aliases and prefix each column with its alias, so that the
semantics are clearer
Counter-example:

select * from A
inner join B
on A.deptId = B.deptId;

Positive example:

select member.name, department.deptName from A member
inner join B department
on member.deptId = department.deptId;

Reasons:

Prefixing columns with aliases makes it clear which table each column comes from, which
is helpful while debugging. It also avoids ambiguous-column errors when both tables
have a column with the same name.

08. If the field type is a string, it must be enclosed in quotation marks


Counter-example:

select * from user where userid = 123;

Positive example:


select * from user where userid = '123';

Reason:

When the single quotes are omitted, the query compares a string column with a number,
so their types do not match.

MySQL then performs an implicit type conversion: it converts both sides to
floating-point numbers and compares them. This adds computational overhead, and it
also prevents MySQL from using an index on the string column for the lookup.
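
You can observe the difference with EXPLAIN. The sketch below uses a hypothetical table, user_str_demo, with an indexed string column; the table and index names are assumptions. With the number literal, the plan typically shows a full scan (type ALL, key NULL), while the quoted literal typically shows a lookup through idx_userid. The exact output depends on the MySQL version and the data.

```sql
-- Hypothetical demo table: userid is a string column with an index.
CREATE TABLE user_str_demo (
    userid varchar(20) NOT NULL,
    name varchar(50),
    KEY idx_userid (userid)
);

INSERT INTO user_str_demo (userid, name)
VALUES ('123', 'a'), ('124', 'b'), ('125', 'c');

-- Number literal: every userid has to be converted before comparing.
-- Look at the "type" and "key" columns of the output.
EXPLAIN SELECT * FROM user_str_demo WHERE userid = 123;

-- String literal: the types match, so idx_userid is a candidate for the lookup.
EXPLAIN SELECT * FROM user_str_demo WHERE userid = '123';
```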

09. When using a joint index, pay attention to the order of the index columns,
generally following the left-most matching principle

CREATE TABLE `user` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `userId` int(11) NOT NULL,
    `age` int(11) DEFAULT NULL,
    `name` varchar(255) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `idx_userid_age` (`userId`, `age`) USING BTREE
);

Counter-example:

select * from user where age = 10;

Positive example:

-- Complies with the left-most matching principle
select * from user where userId = 10 and age = 10;

-- Complies with the left-most matching principle
select * from user where userId = 10;

The reasons are as follows:

When we create a joint index such as (k1, k2, k3), it behaves as if we had created
three indexes: (k1), (k1, k2), and (k1, k2, k3). This is the leftmost matching principle.


If a query does not satisfy the leftmost matching principle, the joint index generally
cannot be used for it.
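
A quick way to check whether a query can use the joint index is to run it through EXPLAIN. This sketch uses a hypothetical copy of the table, named user_idx_demo, so that it does not touch real data. The comments describe the typical outcome, which may vary with the MySQL version and table statistics.

```sql
-- Hypothetical demo table mirroring the joint index (userId, age).
CREATE TABLE user_idx_demo (
    id int NOT NULL AUTO_INCREMENT,
    userId int NOT NULL,
    age int DEFAULT NULL,
    name varchar(255) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_userid_age (userId, age)
);

-- Leading column present: idx_userid_age is usable.
EXPLAIN SELECT * FROM user_idx_demo WHERE userId = 10;
EXPLAIN SELECT * FROM user_idx_demo WHERE userId = 10 AND age = 10;

-- Leading column missing: the index cannot serve a lookup on age alone.
EXPLAIN SELECT * FROM user_idx_demo WHERE age = 10;

-- One possible fix if filtering by age alone is common: a separate index.
-- ALTER TABLE user_idx_demo ADD KEY idx_age (age);
```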

10. Prefer inner join; if a left join is used, keep the result of the left table
as small as possible
Inner join: when two tables are joined, only the rows that have a match in both
tables are kept.

Left join: when two tables are joined, all the rows of the left table are returned,
even if there are no matching records in the right table.

Right join: when two tables are joined, all the rows of the right table are
returned, even if there are no matching records in the left table.

As long as the SQL requirements are satisfied, it is recommended to use inner join
first. If you have to use left join, keep the result of the left table as small as
possible.

Counter-example:

select * from
table1 t1 left join table2 t2
on t1.size = t2.size
where t1.id>2;

Positive example:

select * from
(select * from table1 where id > 2) t1
left join table2 t2
on t1.size = t2.size;

Reasons:

An inner join usually returns fewer rows, so its performance tends to be
better.


Similarly, if a left join is used, keep the result of the left table as small as
possible and place the filter conditions on the left table wherever you can, so that
fewer rows take part in the join.

11. Temporary Table Optimization


When creating a temporary table, if you insert a large amount of data at one time, you
can create and fill the table in a single statement instead of running create table
followed by insert. This avoids a large amount of logging and improves speed. In SQL
Server this is select into; the MySQL equivalent is create table ... select.

If the amount of data is not large, to ease the load on system table resources, you
should first create table, then insert.
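
The two approaches can be sketched in MySQL as follows. The source table table1_demo and the temporary table names are assumptions made for illustration, and the int type of size is a guess, since the article does not define it.

```sql
-- Hypothetical source table standing in for table1.
CREATE TABLE table1_demo (
    id int NOT NULL PRIMARY KEY,
    size int
);

INSERT INTO table1_demo (id, size) VALUES (1, 10), (2, 20), (3, 30), (4, 40);

-- Large data set: create and fill the temporary table in one statement.
CREATE TEMPORARY TABLE tmp_large AS
SELECT id, size
FROM table1_demo
WHERE id > 2;

-- Small data set: define the table first, then insert.
CREATE TEMPORARY TABLE tmp_small (
    id int NOT NULL,
    size int,
    PRIMARY KEY (id)
);

INSERT INTO tmp_small (id, size)
SELECT id, size
FROM table1_demo
WHERE id <= 2;
```

Temporary tables are visible only to the current session and are dropped automatically when it ends.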

12. Save the IP address as UNSIGNED INT


Many programmers create a VARCHAR(15) field to store an IPv4 address as a string
instead of as an integer. If you store it as an integer, only 4 bytes are needed,
and the field has a fixed length.

Counter-example

CREATE TABLE classes (
    id INT AUTO_INCREMENT PRIMARY KEY,
    ipadd VARCHAR(15) NOT NULL
);

Positive-example

CREATE TABLE classes (
    id INT AUTO_INCREMENT PRIMARY KEY,
    ipadd INT UNSIGNED NOT NULL
);

You can also use BINARY(4) in place of UNSIGNED INT

Reason:

We must use UNSIGNED INT, because an IPv4 address uses the entire range of a 32-bit
unsigned integer.


Moreover, this brings advantages in queries, especially when you need WHERE
conditions such as: IP between ip1 and ip2.

In your queries, you can use INET_ATON() to convert a string IP to an integer, and
INET_NTOA() to convert an integer back to a string IP.
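
Putting the two functions together, a round trip might look like the sketch below. The table name classes_demo, the index idx_ipadd, and the sample addresses are assumptions used only for illustration.

```sql
-- Hypothetical demo table; classes_demo is an assumed name for this sketch.
CREATE TABLE classes_demo (
    id INT AUTO_INCREMENT PRIMARY KEY,
    ipadd INT UNSIGNED NOT NULL,
    KEY idx_ipadd (ipadd)
);

-- Convert the dotted string to an integer on the way in.
-- INET_ATON('192.168.1.10') is 3232235786.
INSERT INTO classes_demo (ipadd) VALUES
    (INET_ATON('192.168.1.10')),
    (INET_ATON('10.0.0.7'));

-- Range query over a subnet, converting back to a string for display.
SELECT id, INET_NTOA(ipadd) AS ip
FROM classes_demo
WHERE ipadd BETWEEN INET_ATON('192.168.1.0') AND INET_ATON('192.168.1.255');
```

Because the column is numeric, the BETWEEN condition is a plain integer range that an index on ipadd can serve, which is not the case for dotted strings.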

. . .

Conclusion
Prefer joins in this order: inner join, then left join

You can optimize your SQL queries considerably with prepared statements and caching

Always set an ID in the table, and use table aliases so that the semantics are
clearer

Use enum in place of varchar if your column holds a fixed list of options, since enum
is stored as a small integer, which improves efficiency

Get advice from procedure analyse() about using the correct datatype

Your string values must be enclosed in quotation marks to avoid extra overhead

Make use of unsigned int or binary rather than varchar for storing an IP address

Follow the left-most matching principle

. . .

Yeah, we made it to the end. I hope you enjoyed it, picked up some ideas for optimizing
and speeding up your SQL queries, and learned some new tricks 😊

Here is the link for my previous story about optimizing SQL Queries

How to Optimize SQL Queries


This article sorts out some special techniques for optimizing SQL Queries
towardsdatascience.com


Update: I uploaded all these optimization tricks to this GitHub Readme, along with other
cool stuff in the repo.

Thank you for reading. Don’t hesitate to stay tuned for more!

I send out a monthly newsletter. If you would like to join, please sign up via this
link. I'd love to see you there 😊
