
SQL Tuning Example 3

Case 1: Virtual server, 100 GB RAM, 16 cores, 2 TB database. The application team reported a query timeout: a query that used to complete in 10 to 20 minutes was now running for 30 minutes and timing out, yet it returned within 2 to 5 seconds when run on the SQL Server directly. We checked ServerB: the query was in running status with an ASYNC_NETWORK_IO wait, which means execution had completed but the client had not yet retrieved the result set. The query runs from ServerA using linked servers. It is a complex join: multiple tables in the local database, one table on linked server ServerB, and another on linked server ServerC.

SELECT *
FROM TableA
INNER JOIN TableB ON ...
INNER JOIN ServerB..TreasuryTable ON ...
INNER JOIN ServerC..InvestmentTable ON ...

So it joins across linked servers, and the plan uses a nested loop join: every row in the outer table is compared against every row in the inner table. With around 50K rows on ServerB and 150K rows on ServerC, that is on the order of 50,000 × 150,000 = 7.5 billion iterations, so it is dead slow. Luckily, the query lives inside a stored procedure, not an ad hoc query from the application. We modified the procedure to first load the remote data into temp tables and join locally afterwards. It now completes within 10 seconds: the optimizer switches to a hash join because it knows the number of records in the temp tables.

SELECT * INTO #TempServerB FROM ServerB..TreasuryTable

SELECT * INTO #TempServerC FROM ServerC..InvestmentTable

SELECT *
FROM TableA
INNER JOIN TableB ON ...
INNER JOIN #TempServerB ON ...
INNER JOIN #TempServerC ON ...

Case 2: Virtual server, 128 GB RAM, 16 cores, 10 TB database. A core SMS banking system that delivers SMS to customers, with multiple transaction tables such as TransactionsA, TransactionsB, and so on. New records must be processed, and failed records must be reprocessed. 300 to 500 pods in the container platform each make 10 to 15 concurrent connections to process records. We monitored the SQL workload for a few days; the major bottleneck occurred while reprocessing failed SMS.

INSERT INTO @table (smsid)
SELECT TOP (100) SMSID FROM TransactionsA WHERE failed = 1 AND retry_in_progress = 0

UPDATE TOP (100) TransactionsA SET retry_in_progress = 1

The logic above was used for all the tables, with multiple additional conditions, and each table had 10 to 15 non-clustered indexes. New record insertion was taking at least 200 to 600 milliseconds, and the retry count kept increasing. We created a filtered index on a few columns, limited to rows with a date within the last day. Remember that SQL Server filtered indexes have no time-to-live: rows that age past the cutoff are not removed from the index, so it has to be rebuilt. I convinced the team that there is no point in sending an SMS that is several days old; the customer would even panic on receiving one.
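A minimal sketch of the kind of filtered index described above; the index, table, and column names here are illustrative assumptions, not the actual schema. Note that SQL Server does not allow GETDATE() in a filtered-index predicate, so the date boundary must be a literal, which is exactly why the index has no time-to-live and must be rebuilt with a fresh literal periodically.

```sql
-- Hypothetical filtered index on the retry-candidate rows only.
-- The WHERE clause must use constants: GETDATE() is not permitted here,
-- so the date literal goes stale and the index must be recreated/rebuilt.
CREATE NONCLUSTERED INDEX MyNewIndex
ON dbo.TransactionsA (SentDate, SMSID)
INCLUDE (retry_in_progress)
WHERE failed = 1
  AND retry_in_progress = 0
  AND SentDate > '2024-01-01';  -- literal cutoff; refreshed on each rebuild
```

Because the filter excludes already-processed rows, the index stays small relative to the huge base table, which is what makes the TOP (100) retry scan cheap.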

We modified the logic into a single UPDATE statement. The original code also handled all the tables with multiple IF branches in one stored procedure; for example, it checked whether the record was CreditCard or DebitCard and jumped to that section. That made troubleshooting difficult, because monitoring always showed a single procedure name. So we split it into multiple procedures, one per transactions table. We also removed the initial SELECT by using an OUTPUT clause to capture the updated records into a table variable. SQL Server forced us to use the RECOMPILE option for this statement.

UPDATE TOP (100) T
SET retry_in_progress = 1, ...
OUTPUT deleted.SMSID,
       deleted.AllColumns...
INTO @TempTable
FROM TransactionsA AS T WITH (INDEX (MyNewIndex))
WHERE conditions...
  AND date > DATEADD(DAY, -1, GETDATE())
OPTION (RECOMPILE)

OK, all our issues were solved: no blocking, and everything ran smoothly. But that was a temporary fix to stabilize things immediately. These are all huge tables, and updating the top 100 records still takes a little while. We set retry_in_progress to 1 so that other concurrent processes would not pick up the same records. We then wrote a new procedure for each transactions table that acts as a queue management system using the OFFSET/FETCH feature, with logic for both LIFO and FIFO so the team can choose either. It is a fully dynamic procedure that handles concurrent execution: it knows whether records have already been sent to the application and returns only pending records. It handles pagination inside the procedure and sends the results to the application, so the marker updates are no longer required. With the queue management in place we carefully removed nearly 8 to 9 indexes, and new records are now inserted within 30 to 45 milliseconds.
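The article does not show the queue procedure's internals, so the following is only a sketch of how its OFFSET/FETCH pagination with a caller-selected LIFO/FIFO order might look; all parameter and column names (@SortOrder, @PageNumber, @PageSize, SentDate, Payload) are assumptions.

```sql
-- Hypothetical page-fetch step inside the queue procedure.
-- CASE expressions in ORDER BY let one statement serve both orders:
-- FIFO sorts oldest first, LIFO sorts newest first.
SELECT SMSID, Payload
FROM dbo.TransactionsA
WHERE failed = 1
  AND retry_in_progress = 0
ORDER BY
    CASE WHEN @SortOrder = 'FIFO' THEN SentDate END ASC,
    CASE WHEN @SortOrder = 'LIFO' THEN SentDate END DESC
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```

Paginating inside the procedure means the application never re-reads rows it has already received, which is what let the marker-update step be dropped.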

Case 3: Virtual server, 56 GB RAM, 16 cores, 20 GB database. A few tables; one holds an MD5 hash value, with ID as the primary key referenced by multiple child tables. It is a critical application, so it was vertically scaled with compute resources, yet SQL Server was using only 15 GB of RAM and 15% of CPU. We were asked for general recommendations to improve performance; the server was already over-provisioned with resources going unused, and maintenance plans were in place. We verified the cached plans and found nothing. It is a simple application that retrieves a hash value for a user ID, but application-side traces showed queries taking at least 200 to 2000 milliseconds. They did not use any stored procedures; everything was ad hoc queries. We checked the ad hoc plans and could not find anything either, so we set up Extended Events to capture the workload for one day. We took the captured query and ran it manually with sample input parameters: it was very quick, while the application side was still slow. Then we noticed the N' conversion. The User table ID column was created as VARCHAR, but the input parameter arrived as NVARCHAR. SQL Server therefore converts millions of UserID values from VARCHAR to NVARCHAR before comparing for a match; it cannot convert the NVARCHAR parameter down to VARCHAR, because Unicode to ASCII conversion is not safe. I asked the application team to change the parameter type, but it is a Node.js application and its driver declares strings as Unicode by default. So I decided to change the columns referenced in the WHERE condition from VARCHAR to NVARCHAR. Queries now run within 30 milliseconds. NVARCHAR takes 2 bytes per character, but performance was crucial, so we changed it.
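To make the mismatch concrete, here is a sketch of the pattern described; the table, column, and literal are hypothetical. Data type precedence says NVARCHAR wins, so the conversion lands on the column side and defeats an index seek:

```sql
-- Hypothetical schema: UserID is VARCHAR, but the Node.js driver
-- parameterizes strings as NVARCHAR (the N'...' form below).
-- SQL Server applies CONVERT_IMPLICIT to every UserID value,
-- turning an index seek into a scan of millions of rows:
SELECT HashValue
FROM dbo.Users
WHERE UserID = N'U12345';  -- NVARCHAR parameter vs VARCHAR column: scan

-- The fix applied in this case: align the column type with the parameter.
-- (On a real PK with child-table references, the dependent constraints
-- would have to be dropped and recreated around this change.)
ALTER TABLE dbo.Users ALTER COLUMN UserID NVARCHAR(50) NOT NULL;
```

After the change the comparison is NVARCHAR to NVARCHAR, so no conversion is needed and the seek comes back.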

Case 4: Virtual server, 32 GB RAM, 16 cores, 2 TB database. A CIBIL-score style application database that updates the CIBIL score for all customers every day; the process runs for 1 to 2 hours daily, with complex logic spread over multiple statements and joins. Nobody had reported an issue; we were checking some status, came to know about this batch and the time it took, and started examining the stored procedure. The execution plan was so large we could not scroll it left to right or top to bottom, which made it difficult to pick a starting point for performance tuning. Monitoring the stored procedure execution pointed to one UPDATE statement, so we started testing the script with breakpoints on the UAT server. The highlighted SELECT statement was taking most of the execution time, and running it separately confirmed that. The requirement: for each UserID, they need the BatchID belonging to the minimum date, not simply the minimum BatchID within the group.

UPDATE S
SET S.SCORE = Temp.Score
FROM MyCibilScore S
INNER JOIN Mysecondtable ON conditions
INNER JOIN
(
    SELECT DISTINCT
        MCS.UID,
        FIRST_VALUE(MCS.BatchID) OVER (PARTITION BY MCS.UID ORDER BY MCS.Date, MCS.BatchID) AS BatchID
    FROM dbo.MyCibilScore MCS
) Temp ON ... Conditions

Window functions can be complex; there was no way to tune this one directly, and we already had the required index. We tried creating a columnstore index to check the performance: it cut about 15% off the total execution time, but the rowstore index is referenced by other queries as well. One thing I noticed here: a complex query over a huge amount of data on modest system resources still runs within 2 hours; SQL Server is amazing. We rewrote the query with two GROUP BY clauses, and it was quick, completing within 15 minutes with the rowstore index. We told the application team the query scans the entire table and asked them to add a filtering condition, but it is the business logic, so nothing could be done there.

UPDATE S
SET S.SCORE = Temp.Score
FROM MyCibilScore S
INNER JOIN Mysecondtable ON conditions
INNER JOIN
(
    SELECT
        MCS.UID,
        MIN(MCS.BatchID) AS BatchID
    FROM dbo.MyCibilScore MCS
    INNER JOIN
    (
        SELECT UID, MIN(Date) AS Date
        FROM dbo.MyCibilScore
        GROUP BY UID
    ) MCSMinDate
        ON MCSMinDate.UID = MCS.UID
        AND MCSMinDate.Date = MCS.Date
    GROUP BY MCS.UID
) Temp ON ... Conditions

So: don't create indexes by default; maintaining indexes is not easy. Try to handle the problem in code, and go for an index only when there is no other option.
