How To Optimize SQL Queries Part II - by Pawan Jain - Jul, 2020 - Towards Data Science
SQL GUIDE
Pawan Jain
Jul 11 · 8 min read
SQL is a declarative language: each query declares what we want the SQL engine to
do, but it doesn’t say how.
As it turns out, the how (the “plan”) is what determines the efficiency of a query,
so it’s pretty important.
https://towardsdatascience.com/how-to-optimize-sql-queries-part-ii-407311784112 1/11
7/13/2020 How to Optimize SQL Queries Part II | by Pawan Jain | Jul, 2020 | Towards Data Science
For example, if you create an INT field as your primary key but the table holds
little data, PROCEDURE ANALYSE() will suggest changing the type of this field
to MEDIUMINT. Note that PROCEDURE ANALYSE() was deprecated in MySQL 5.7.18 and
removed in MySQL 8.0.
-- Example
SELECT col1, col2 FROM table1 PROCEDURE ANALYSE(10, 2000);
Every table in the database should have an ID column as its primary key, ideally
an INT type with the AUTO_INCREMENT flag.
Counter-example
Positive-example
Reason:
Using a VARCHAR column as the primary key reduces performance. Besides, in
your code, you should build your data structures around the table ID.
Moreover, under the MySQL data engine, there are still some operations that need
to use the primary key. In these cases, the performance and setting of the primary
Counter-example
Positive example
Reason:
The ENUM type is very fast and compact. It is stored internally as a small integer
(much like a TINYINT), but it is displayed as a string.
This makes it a good fit for columns that hold a fixed list of options.
When identical queries are executed many times, their results are placed in a
cache, so that subsequent identical queries read the cached results directly
without touching the table. The query cache was deprecated in MySQL 5.7.20 and
removed in MySQL 8.0, so the settings below apply only to older versions.
/etc/mysql/my.cnf
...
[mysqld]
query_cache_type = 1
query_cache_size = 10M
query_cache_limit = 256K
The fewer indexes you have, the better. Indexes improve the efficiency of
queries, but they also reduce the efficiency of inserts and updates.
As a rule of thumb, a table should have no more than 5 indexes; if it has more,
consider removing the unnecessary ones.
Because a prepared statement uses placeholders ( ? ), it helps avoid many variants of
SQL injection and so makes your application more secure.
In terms of performance, prepared statements give a considerable advantage when the
same query is run multiple times: you define parameters for the statement, and MySQL
parses it only once.
1. Prepare
PREPARE item1 FROM
'SELECT itemcode, itemname
FROM items
WHERE itemcode = ?';
2. Execute
SET @pc = 'A001'; -- hypothetical item code; @pc must be set before EXECUTE
EXECUTE item1 USING @pc;
3. Deallocate
DEALLOCATE PREPARE item1;
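The same placeholder idea can be tried from application code. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a MySQL driver; the `items` rows, the `find_item` helper, and the item codes are hypothetical and only there for illustration.

```python
import sqlite3

# Hypothetical in-memory table mirroring the article's `items` example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (itemcode TEXT PRIMARY KEY, itemname TEXT)")
conn.executemany(
    "INSERT INTO items (itemcode, itemname) VALUES (?, ?)",
    [("A001", "keyboard"), ("A002", "mouse")],
)

# The SQL text is fixed; only the bound value changes between calls.
QUERY = "SELECT itemcode, itemname FROM items WHERE itemcode = ?"


def find_item(code):
    """Run the parameterized query; `code` is bound as data, never as SQL text."""
    return conn.execute(QUERY, (code,)).fetchall()


normal_result = find_item("A001")
# A classic injection payload is treated as a plain string and matches no row.
injection_result = find_item("A001' OR '1'='1")
print(normal_result, injection_result)
```

Because the value is passed separately from the SQL text, the quote characters in the second call never become part of the statement, which is the property the placeholders are meant to provide.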
07. Use the alias of the table and prefix the alias on each column, so that the
semantics are clearer
Counter-example:
select * from A
inner join B
on A.deptId = B.deptId;
Positive example:
Reasons:
It makes clear which table each column in the operation belongs to, which
helps when debugging
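As an illustration of the aliasing advice, here is a small sketch that runs an aliased version of the join above. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the column names `name` and `deptName` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical tables A and B sharing a deptId column, as in the counter-example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, deptId INTEGER);
    CREATE TABLE B (deptId INTEGER PRIMARY KEY, deptName TEXT);
    INSERT INTO A VALUES (1, 'Ann', 10), (2, 'Bob', 20);
    INSERT INTO B VALUES (10, 'Sales'), (30, 'Legal');
""")

# Every column is prefixed with its table alias, so its origin is unambiguous.
rows = conn.execute("""
    SELECT a.name, b.deptName
    FROM A AS a
    INNER JOIN B AS b ON a.deptId = b.deptId
""").fetchall()
print(rows)
```

Listing the columns explicitly with their aliases also avoids the duplicate `deptId` column that `select *` would return from both tables.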
Positive example:
Reason:
When the single quotes are omitted, a string column is compared with a number, so
their types do not match. MySQL then performs an implicit type conversion, which
can prevent the index on that column from being used.
09. When using a joint index, pay attention to the order of the index columns,
generally following the left-most matching principle
Counter-example:
Positive example:
Creating a joint index such as (k1, k2, k3) is equivalent to creating three
indexes: (k1), (k1, k2), and (k1, k2, k3). This is the leftmost matching principle.
If a query does not satisfy the leftmost principle, the joint index generally
cannot be used.
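The leftmost principle can be observed by asking a query planner what it would do. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL (where `EXPLAIN` plays the role of `EXPLAIN QUERY PLAN`); the table `t`, the index name `idx_k`, and the `plan` helper are hypothetical, and the exact plan wording varies by SQLite version.

```python
import sqlite3

# Hypothetical table with a joint index on (k1, k2, k3).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, k1 INT, k2 INT, k3 INT, payload TEXT);
    CREATE INDEX idx_k ON t (k1, k2, k3);
""")


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Starts from the leftmost column k1: the joint index is usable.
plan_leftmost = plan("SELECT * FROM t WHERE k1 = 1 AND k2 = 2")
# Skips k1: the planner falls back to scanning the table.
plan_skipped = plan("SELECT * FROM t WHERE k2 = 2 AND k3 = 3")
print(plan_leftmost)
print(plan_skipped)
```

In this sketch the first plan mentions `idx_k` while the second does not, which matches the behaviour described above for queries that omit the leading column.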
10. Prefer inner join; if left join is used, keep the result of the left table
as small as possible
Inner join : When two tables are joined in a query, only the rows that match in
both tables are kept.
Left join : When two tables are joined in a query, all the rows of the left table
are returned, even if there are no matching records in the right table.
Right join : When two tables are joined in a query, all the rows of the right table
are returned, even if there are no matching records in the left table.
Provided the SQL requirements are still met, prefer inner join . If you need
left join , keep the result of the left table as small as possible.
Counter-example:
select * from
table1 t1 left join table2 t2
on t1.size = t2.size
where t1.id>2;
Positive example:
select * from
(select * from table1 where id >2)
t1 left join table2 t2
on t1.size = t2.size;
Reasons:
An inner join generally returns fewer rows, so its performance is
relatively better.
Similarly, if left join is used, keep the result of the left table as small as
possible and place the conditions on the left table wherever you can, so that
fewer rows are returned.
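To check that the two forms of the left join above return the same rows, here is a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the sample data in `table1` and `table2` is hypothetical.

```python
import sqlite3

# Hypothetical data: only table1 rows with id > 2 should survive the filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, size INT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, size INT);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO table2 VALUES (1, 30), (2, 99);
""")

# Counter-example form: join everything, then filter the left table.
filter_after = conn.execute("""
    SELECT t1.id, t1.size, t2.id
    FROM table1 t1 LEFT JOIN table2 t2 ON t1.size = t2.size
    WHERE t1.id > 2
    ORDER BY t1.id
""").fetchall()

# Positive-example form: shrink the left table first, then join.
filter_first = conn.execute("""
    SELECT t1.id, t1.size, t2.id
    FROM (SELECT * FROM table1 WHERE id > 2) t1
    LEFT JOIN table2 t2 ON t1.size = t2.size
    ORDER BY t1.id
""").fetchall()
print(filter_after == filter_first, filter_first)
```

Both queries produce the same result set here; the difference the article is concerned with is how many left-table rows take part in the join, which this sketch does not measure.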
If the amount of data is not large, you should first create table and then
insert , to ease the load on system table resources.
Counter-example
Positive-example
Reason:
We must use INT UNSIGNED , because an IPv4 address uses the entire range of a
32-bit unsigned integer.
Moreover, this gives you an advantage in queries, especially with WHERE
conditions such as: IP between ip1 and ip2 .
In your queries, you can use INET_ATON() to convert a string IP to an integer, and
INET_NTOA() to convert an integer back to a string IP.
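The integer representation and the range query can be sketched outside MySQL as well. The example below assumes Python's standard ipaddress and sqlite3 modules; the helpers `inet_aton` and `inet_ntoa` are hypothetical Python counterparts named after the MySQL functions, and the `visits` table and addresses are made up for illustration.

```python
import ipaddress
import sqlite3


def inet_aton(ip):
    """Convert a dotted IPv4 string to its 32-bit unsigned integer value."""
    return int(ipaddress.IPv4Address(ip))


def inet_ntoa(number):
    """Convert a 32-bit unsigned integer back to a dotted IPv4 string."""
    return str(ipaddress.IPv4Address(number))


# Hypothetical table storing addresses as integers instead of strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (ip INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO visits (ip) VALUES (?)",
    [(inet_aton(ip),) for ip in ("192.168.1.1", "192.168.1.200", "10.0.0.5")],
)

# A subnet lookup becomes a plain numeric BETWEEN comparison.
rows = conn.execute(
    "SELECT ip FROM visits WHERE ip BETWEEN ? AND ? ORDER BY ip",
    (inet_aton("192.168.1.0"), inet_aton("192.168.1.255")),
).fetchall()
in_range = [inet_ntoa(number) for (number,) in rows]
print(inet_aton("192.168.1.1"), in_range)
```

With string storage the same range check would need string parsing or pattern matching, whereas integers compare directly and can use an ordinary index.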
. . .
Conclusion
Your join priority should be, in order: inner join, then left join
Prepared statements and caching can speed up your SQL queries considerably
Always set an ID in the table, and use table aliases so that the semantics are clearer
Use enum in place of varchar if your column holds an option list, since enum is stored as tinyint, which improves efficiency
Get advice from procedure analyse() about using the correct datatype
Use unsigned int or binary rather than varchar for storing an IP address
. . .
Yeah, we made it to the end. I hope you enjoyed it, picked up some ideas for optimizing
and speeding up your SQL queries, and learned some new tricks 😊
Here is the link for my previous story about optimizing SQL Queries
Update: I uploaded all these optimization tricks in this GitHub Readme along with other
cool stuff in the repo
Thank you for reading. Don’t hesitate to stay tuned for more!
I send out a monthly newsletter. If you would like to join, please sign up via this
link. Love to see you 😊