
Jay Liu’s Course

Upgrade Co-op Project

SQL Server Database Fundamentals


(Group 2)
Group Members:
Jacob Dong (L)
Yan Wang
Cindy Zhou
Xuefei Yang

Date: 2020-05-06
1. Project Overview

 This project performs the upgrade of several major technologies used to support Jay Liu's course at Victoria Training Center, and this document describes the detailed work items required for the upgrade.

 The document covers the technology enhancements and migrations needed to modernize, optimize, and consolidate the existing version of the course, SQL Server Database Fundamentals. On completion, the enhanced features running on the upgraded design will provide a better designed, documented, and standardized environment, giving the new version of the course a cleaner and more modern infrastructure.
2. Enhancement Summary

The following sections cover all required tasks and technologies, with partial scripts and screenshots to support the solutions.
2.1 Data Size

 After the upgrade and data simulation, we managed to offer datasets at the 10,000,000 (10M) row level, demonstrating the full 360-degree range of data manipulation and processing:

 1,000,000 (1M) records in the [Orders] table

 10,000,000 (10M) records in the [Order Details] table

2.1 Data Size

 In this task we mainly used 'SELECT TOP XX Column FROM table ORDER BY NEWID()' to fetch rows at random from the existing tables, e.g. for 'ProductID', 'CustomerID', and 'EmployeeID'.
 We used 'ABS(CHECKSUM(NEWID()))' to generate random data, such as 'OrderDate'.
 To fetch random data from the 'Products', 'Customers', and 'Employees' tables, we used temp tables to hold the random rows and repeatedly updated them into the target table.
 After enlarging [Order Details] we found some duplicate-key rows. We located and deleted them, then re-ran the random-fetch step for the deleted amount so the table ends up with exactly 10M rows.
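The techniques above can be sketched as follows; this is an illustrative outline rather than the project's actual script (table and column names follow the Northwind-style schema the course uses):

```sql
-- Fetch a random existing ProductID by ordering on NEWID():
SELECT TOP 1 ProductID
INTO #RandomProduct
FROM dbo.Products
ORDER BY NEWID();

-- Generate a random OrderDate with ABS(CHECKSUM(NEWID())):
SELECT DATEADD(DAY, ABS(CHECKSUM(NEWID())) % 365, '1998-01-01') AS OrderDate;

-- Remove duplicate-key rows from [Order Details], keeping one copy per key:
;WITH Dups AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY OrderID, ProductID
                              ORDER BY (SELECT NULL)) AS rn
    FROM dbo.[Order Details]
)
DELETE FROM Dups WHERE rn > 1;
```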
2.1 Data Size

 Growing [Orders] to 1M rows of data took 54 seconds,
 while growing [Order Details] to 10M rows took 2 minutes 32 seconds.
 Part of the scripts is shown below; the full code for this task is in '2.1-Data Increment.sql'.
2.2 Data Age

 The new version shows the most recent 20 years of data, from 1999 to 2019.
 Before changing the data age, we made copies of the updated and enlarged tables: [Orders_1M] and [Order Details_10M].
 Checking the current data showed [OrderDate] ranging from 1928 to 1998.
 To move the existing rows into the most recent 20 years, we updated the [OrderDate] column as below:
--(1928--1948)+71
--(1949--1969)+50
--(1970--1990)+29
--(1991--1999)+8
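The offset plan above translates to a set of ranged updates; a minimal sketch, assuming the copied table [Orders_1M]:

```sql
UPDATE dbo.[Orders_1M] SET OrderDate = DATEADD(YEAR, 71, OrderDate)
WHERE YEAR(OrderDate) BETWEEN 1928 AND 1948;

UPDATE dbo.[Orders_1M] SET OrderDate = DATEADD(YEAR, 50, OrderDate)
WHERE YEAR(OrderDate) BETWEEN 1949 AND 1969;

UPDATE dbo.[Orders_1M] SET OrderDate = DATEADD(YEAR, 29, OrderDate)
WHERE YEAR(OrderDate) BETWEEN 1970 AND 1990;

UPDATE dbo.[Orders_1M] SET OrderDate = DATEADD(YEAR, 8, OrderDate)
WHERE YEAR(OrderDate) BETWEEN 1991 AND 1999;
```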
2.2 Data Age

 We also found that some 'RequiredDate' values were earlier than 'OrderDate', and some 'ShippedDate' values were earlier than 'RequiredDate', which makes no sense.
 We decided to fix this abnormal data. The simple solution is to set 'RequiredDate' to 'OrderDate' plus 48 hours, and 'ShippedDate' to 'RequiredDate' plus 72 hours.
 Part of the scripts is shown below; the full code for this task is in '2.2-Data Age.sql'.
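The date-sanity fix can be sketched like this, assuming [Orders_1M] and that only the inconsistent rows should change:

```sql
-- RequiredDate must not precede OrderDate:
UPDATE dbo.[Orders_1M]
SET RequiredDate = DATEADD(HOUR, 48, OrderDate)
WHERE RequiredDate < OrderDate;

-- ShippedDate must not precede RequiredDate:
UPDATE dbo.[Orders_1M]
SET ShippedDate = DATEADD(HOUR, 72, RequiredDate)
WHERE ShippedDate < RequiredDate;
```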
2.5 More Data Referencing

 Having created the two enlarged tables 'Orders_1M' and 'Order Details_10M', we needed to build a primary key and several foreign keys referencing the master data tables ('Products', 'Customers', 'Employees', and 'Shippers'), so that the two new tables relate to the other tables in the database.
 With these relationships in place, we can create a new database diagram showing the whole picture of the relationships.
 When we tried to create the database diagram, an error stopped us. After researching online, we found it was an authorization issue and managed to solve the problem.
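The key-building step can be sketched as below; the constraint names are illustrative, not the project's actual ones:

```sql
ALTER TABLE dbo.[Orders_1M]
    ADD CONSTRAINT PK_Orders_1M PRIMARY KEY (OrderID);

ALTER TABLE dbo.[Order Details_10M]
    ADD CONSTRAINT PK_OrderDetails_10M PRIMARY KEY (OrderID, ProductID);

-- Foreign keys back to the enlarged orders table and the master data tables:
ALTER TABLE dbo.[Order Details_10M]
    ADD CONSTRAINT FK_OrderDetails10M_Orders1M
    FOREIGN KEY (OrderID) REFERENCES dbo.[Orders_1M](OrderID);

ALTER TABLE dbo.[Order Details_10M]
    ADD CONSTRAINT FK_OrderDetails10M_Products
    FOREIGN KEY (ProductID) REFERENCES dbo.Products(ProductID);

ALTER TABLE dbo.[Orders_1M]
    ADD CONSTRAINT FK_Orders1M_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers(CustomerID);
```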
2.5 More Data Referencing

 The database diagram is shown below.


2.5 More Data Referencing

 Part of the scripts is shown below; the full code for this task is in '2.1-Data Increment.sql'.
2.6 Add Year-Based Reports

 Two new reports are added:

2.6.1 Over-year Total Orders with Comparison

2.6.2 Over-year Total Order Amount


2.6 Add Year-Based Reports

2.6.1 Over-year Total Orders with Comparison


 The report is built on the enlarged table [Orders_1M].
 Looking at the sample report, the LAG() window function and a Common Table Expression (CTE) can simplify this task.
 The syntax of the LAG() function:
LAG(return_value [, offset [, default]])
OVER ( [PARTITION BY partition_expression, ...]
ORDER BY sort_expression [ASC | DESC], ...
)
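A minimal sketch of how the CTE and LAG() combine for the over-year order-count comparison (assuming [Orders_1M]; column aliases are illustrative):

```sql
;WITH YearlyOrders AS (
    SELECT YEAR(OrderDate) AS OrderYear, COUNT(*) AS TotalOrders
    FROM dbo.[Orders_1M]
    GROUP BY YEAR(OrderDate)
)
SELECT OrderYear,
       TotalOrders,
       LAG(TotalOrders, 1, 0) OVER (ORDER BY OrderYear) AS PrevYearOrders,
       TotalOrders - LAG(TotalOrders, 1, 0) OVER (ORDER BY OrderYear) AS Diff
FROM YearlyOrders
ORDER BY OrderYear;
```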
2.6 Add Year-Based Reports
 Part of the scripts is shown below.
2.6 Add Year-Based Reports

2.6.2 Over-year Total Order Amount


 The report is built on the enlarged table [Orders_1M] joined with [Order Details_10M].
 Looking at the sample report, the LAG() window function and a Common Table Expression (CTE) can simplify this task.
 The syntax of the LAG() function:
LAG(return_value [, offset [, default]])
OVER ( [PARTITION BY partition_expression, ...]
ORDER BY sort_expression [ASC | DESC], ...
)
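A similar sketch for the over-year total order amount, joining the two enlarged tables (a hypothetical outline, not the project's exact script):

```sql
;WITH YearlyAmount AS (
    SELECT YEAR(o.OrderDate) AS OrderYear,
           SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS TotalAmount
    FROM dbo.[Orders_1M] o
    JOIN dbo.[Order Details_10M] od ON od.OrderID = o.OrderID
    GROUP BY YEAR(o.OrderDate)
)
SELECT OrderYear,
       TotalAmount,
       LAG(TotalAmount) OVER (ORDER BY OrderYear) AS PrevYearAmount
FROM YearlyAmount
ORDER BY OrderYear;
```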
2.6 Add Year-Based Reports

 Part of the scripts is shown below; the full code for this task is in '2.6-Add Year-Based Reports.sql'.
2.7 Ad-Hoc Queries

 SQL Server offers 3 ad-hoc query mechanisms to read data directly from different types of data sources, such as SQL Server, an Excel worksheet, or an Access database.
 These are OPENQUERY, OPENROWSET, and OPENDATASOURCE.
 The syntax is:
OPENQUERY ( linked_server, 'query' )
OPENROWSET ( 'provider_name', 'provider_string', 'query or table name' )
OPENDATASOURCE ( 'provider_name', 'init_string' )
2.7 Ad-Hoc Queries

Some notes about this task:

 We must build a 'Linked Server' before using OPENQUERY.
 We must turn on Ad Hoc Distributed Queries before using OPENROWSET or OPENDATASOURCE, and remember to turn it off again afterward.
 When referencing a table in an ad-hoc query, the 'dbo' schema cannot be omitted.
 OPENROWSET and OPENDATASOURCE should only be used to reference OLE DB data sources that are accessed infrequently.
 For any data source that will be accessed more than a few times, define a linked server instead.
2.7 Ad-Hoc Queries

Turn on the Ad-Hoc Distributed Queries

Turn off the Ad-Hoc Distributed Queries
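A sketch of toggling the option with sp_configure (the enable step also requires 'show advanced options'):

```sql
-- Turn on:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;

-- ... run the OPENROWSET / OPENDATASOURCE queries here ...

-- Turn off:
EXEC sp_configure 'Ad Hoc Distributed Queries', 0;
RECONFIGURE;
```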


2.7 Ad-Hoc Queries

 Using OPENQUERY against SQL Server
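For illustration, an OPENQUERY call of this shape (the linked server name MyLinkedServer and the query text are placeholders):

```sql
SELECT *
FROM OPENQUERY(MyLinkedServer,
               'SELECT TOP 10 OrderID, OrderDate FROM Northwind.dbo.Orders');
```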


2.7 Ad-Hoc Queries

 Using OPENQUERY against an Excel worksheet


2.7 Ad-Hoc Queries

 Using OPENQUERY against an Access database


2.7 Ad-Hoc Queries

 Using OPENROWSET against SQL Server


2.7 Ad-Hoc Queries

 Using OPENROWSET against an Excel worksheet
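An illustrative OPENROWSET call against an Excel file (the file path and sheet name are placeholders):

```sql
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                'Excel 12.0;Database=C:\Data\Orders.xlsx;HDR=YES',
                'SELECT * FROM [Sheet1$]');
```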


2.7 Ad-Hoc Queries

 Using OPENROWSET against an Access database


2.7 Ad-Hoc Queries

 Using OPENDATASOURCE against SQL Server
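An illustrative OPENDATASOURCE call, used as the server part of a four-part name (the server name RemoteServer is a placeholder):

```sql
SELECT *
FROM OPENDATASOURCE('SQLNCLI',
                    'Data Source=RemoteServer;Integrated Security=SSPI')
     .Northwind.dbo.Orders;
```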


2.7 Ad-Hoc Queries

 Using OPENDATASOURCE against an Excel worksheet


2.7 Ad-Hoc Queries

 Using OPENDATASOURCE against an Access database


2.8 Managing Ad-Hoc Queries

 Create a user stored procedure with parameters to talk to the different data sources (Excel, Access, SQL Server).
 Through the parameters, we pass the connection type, the source type, the file name with its path, the query statement or table we want to query, and the username and password (if needed).
 Part of the scripts is shown below; the full code for this task is in '2.8-Managing Ad-Hoc Queries.sql'.
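A simplified sketch of such a wrapper procedure; the procedure name, parameters, and branching are illustrative, not the project's actual code:

```sql
CREATE PROCEDURE dbo.usp_AdHocQuery
    @SourceType  NVARCHAR(20),    -- 'EXCEL', 'ACCESS', or 'SQLSERVER' (illustrative)
    @FilePath    NVARCHAR(260),   -- file path for Excel/Access sources
    @Query       NVARCHAR(MAX)    -- query text to run against the source
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX);

    IF @SourceType = 'EXCEL'
        SET @sql = N'SELECT * FROM OPENROWSET(''Microsoft.ACE.OLEDB.12.0'',' +
                   N'''Excel 12.0;Database=' + @FilePath + N';HDR=YES'',''' +
                   @Query + N''')';
    ELSE IF @SourceType = 'ACCESS'
        SET @sql = N'SELECT * FROM OPENROWSET(''Microsoft.ACE.OLEDB.12.0'',''' +
                   @FilePath + N''';''admin'';'''',''' + @Query + N''')';
    ELSE
        SET @sql = @Query;   -- plain SQL Server query

    EXEC (@sql);
END;
```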
2.9 Key Performance Indicator
 In this task we first create a stored procedure with 2 parameters, 'usp_KPI', to list the orders sold in any 2 years and each employee's percentage of increase.
 We then create a second stored procedure with 3 parameters, 'usp_KPI_stat', to count how many employees reached a given percentage of increase between the 2 years. In this stored procedure we call the previously created 'usp_KPI'.
 Finally, 'usp_KPI_stat' returns 3 numbers showing how many employees fall into each band relative to the percentage passed as a parameter. For example, given 15%, the result shows how many employees had an increase of 15% or more, between -15% and 15%, and less than -15%.
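A condensed sketch of the first procedure's shape (parameter names and the percentage formula are illustrative):

```sql
CREATE PROCEDURE dbo.usp_KPI
    @Year1 INT,
    @Year2 INT
AS
    -- Orders per employee in each of the two years, plus percentage of increase
    SELECT EmployeeID,
           SUM(CASE WHEN YEAR(OrderDate) = @Year1 THEN 1 ELSE 0 END) AS OrdersYear1,
           SUM(CASE WHEN YEAR(OrderDate) = @Year2 THEN 1 ELSE 0 END) AS OrdersYear2,
           100.0 * (SUM(CASE WHEN YEAR(OrderDate) = @Year2 THEN 1 ELSE 0 END)
                  - SUM(CASE WHEN YEAR(OrderDate) = @Year1 THEN 1 ELSE 0 END))
               / NULLIF(SUM(CASE WHEN YEAR(OrderDate) = @Year1 THEN 1 ELSE 0 END), 0)
               AS PctIncrease
    FROM dbo.[Orders_1M]
    WHERE YEAR(OrderDate) IN (@Year1, @Year2)
    GROUP BY EmployeeID;
```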
2.9 Key Performance Indicator

 Part of the scripts is shown below; the full code for this task is in '2.9-Key Performance Indicator.sql'.
2.10 Parameterized Query

 In this task we practice calling sp_executesql.
 Since we used a lot of EXEC (@string_variable), we decided to replace all the EXEC (@string_variable) calls from the previous task with sp_executesql.
 The syntax is:
 sp_executesql [ @stmt = ] statement [ { , [ @params = ] N'@parameter_name data_type [ OUT | OUTPUT ] [ ,...n ]' } { , [ @param1 = ] 'value1' [ ,...n ] } ]
 There are some tricky points when practicing sp_executesql:
 It is a good alternative for building a dynamic query: this system stored procedure saves having to deal with the extra quotes needed to get the query to build correctly.
 In N'@parameter_name data_type', the N prefix is mandatory, or we get an error.
 If we need to reference a column in the statement, the [] brackets are mandatory, or we get an error.
 When we define a string variable holding the statement for the first parameter, its data type must be 'nvarchar'; 'varchar' can lead to an error.
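The points above can be sketched in one parameterized call; the table, column, and variable names are illustrative:

```sql
-- The statement variable must be nvarchar, the @params definition must carry
-- the N prefix, and the referenced column is wrapped in []:
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT @cnt = COUNT(*) FROM dbo.[Orders_1M] WHERE YEAR([OrderDate]) = @yr';
DECLARE @cnt INT;

EXEC sp_executesql
     @stmt   = @sql,
     @params = N'@yr INT, @cnt INT OUTPUT',
     @yr     = 2019,
     @cnt    = @cnt OUTPUT;

SELECT @cnt AS OrdersIn2019;
```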
2.10 Parameterized Query

 Part of the scripts is shown below; the full code for this task is in '2.10-Parameterized Query.sql'.
Thank you!
