ETL Day3 Assignment(1)

ETL – DAY 3 ASSIGNMENT
Team No: 9
Team members:
2000078100 – A.Durga Sri Ramya
2000078124 – Akhila K
2000078047 – Aftab Sheikh
2000078090 – Akshay Doke
2000078073 – Meghana Gudipudi
2000078272 – Omkar Tambe
2000078254 – Shilpa K
Correlated subquery and uses
A correlated subquery is used for row-by-row processing. Conceptually, the subquery is
evaluated once for each row processed by the outer (parent) statement, which can be a
SELECT, UPDATE, or DELETE statement. A correlated subquery is one way of reading
every row in a table and comparing values in each row against related data.
Syntax
SELECT column1, column2, ...
FROM table1 outer
WHERE column1 operator
    (SELECT column1
     FROM table2
     WHERE expr1 = outer.expr2);
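The syntax above can be tried out with a small sketch. It uses Python's built-in sqlite3 module and a hypothetical employee table (the table, columns, and data are assumptions made for illustration). The query lists employees who earn more than the average salary of their own department.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the pattern.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Asha", "ETL", 50000),
        ("Ravi", "ETL", 70000),
        ("Meena", "QA", 40000),
        ("John", "QA", 45000),
        ("Sara", "QA", 65000),
    ],
)

# The inner query refers to e_outer.dept, so it is (conceptually)
# evaluated once for every row of the outer query.
query = """
SELECT name, dept, salary
FROM employee AS e_outer
WHERE salary > (SELECT AVG(salary)
                FROM employee AS e_inner
                WHERE e_inner.dept = e_outer.dept)
ORDER BY name
"""
above_avg = conn.execute(query).fetchall()
for row in above_avg:
    print(row)
```

The aliases e_outer and e_inner are used here instead of outer because OUTER is a reserved word in several databases.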
ETL (Extraction, Transformation and Loading) Testing
ETL testing is done before data is moved into a production data warehouse system.
It is sometimes also called table balancing or production reconciliation.
It involves verifying the data at the various stages between source and
destination.
Types of ETL testing
Production validation: also called “production reconciliation” or “table balancing,”
validates data in production systems and compares it against source data.
Source to target count testing: verifies that the number of records loaded into the
target database match the expected record count.
Source to target data testing: ensures projected data is added to the target system
without loss or truncation, and that the data values meet expectations after transformation.
Metadata testing: performs data type, length, index, and constraint checks of ETL
application metadata (load statistics, reconciliation totals, data quality metrics).
Performance testing: makes sure that data is loaded into the data warehouse within
expected time frames and that the test server response to multiple users and transactions is
adequate for performance and scalability.
Data transformation testing: checks each row to verify that the data is correctly
transformed according to business rules.
Data quality testing: runs syntax tests (invalid characters, pattern, case order) and
reference tests (number, date, precision, null check) to make sure the ETL application
rejects, accepts default values, and reports invalid data.
Data integration testing: confirms that the data from all sources has loaded to the
target data warehouse correctly and checks threshold values.
Report testing: reviews data in the summary report, verifies that layout and functionality
are as expected, and performs calculations.
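Two of the checks listed above, source to target count testing and source to target data testing, can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module. The tables source_orders and target_orders and their contents are hypothetical, and a real ETL test would run against the actual source and warehouse connections.

```python
import sqlite3

# Hypothetical source and target tables; row 3 was altered during the load.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_orders (id INTEGER, amount REAL);
CREATE TABLE target_orders (id INTEGER, amount REAL);
INSERT INTO source_orders VALUES (1, 10.5), (2, 20.0), (3, 7.25);
INSERT INTO target_orders VALUES (1, 10.5), (2, 20.0), (3, 7.0);
""")

# Count test: the number of loaded records should match the source.
src_count = conn.execute("SELECT COUNT(*) FROM source_orders").fetchone()[0]
tgt_count = conn.execute("SELECT COUNT(*) FROM target_orders").fetchone()[0]
counts_match = src_count == tgt_count

# Data test: rows in the source that are missing or changed in the target.
mismatches = conn.execute("""
SELECT id, amount FROM source_orders
EXCEPT
SELECT id, amount FROM target_orders
""").fetchall()

print("counts match:", counts_match)
print("rows lost or altered:", mismatches)
```

Here the counts agree but the data test still finds a changed value, which is why a count test alone is not enough.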
COALESCE()
The SQL Server COALESCE() function is used to handle NULL values. It evaluates its
arguments in order and returns the first non-NULL value in the list, so a NULL can be
replaced with a user-supplied fallback value when the expression is evaluated. If every
argument evaluates to NULL, the result is NULL.
Syntax:
COALESCE ( exv1, exv2, ..., exvN )
Where –
exv1, exv2, ..., exvN are expression values.
Properties of the Syntax of SQL Server Coalesce function :
All expressions must have the same data type, or data types that can be implicitly
converted to a common type.
It can take multiple expressions.
Example-1 :
SELECT COALESCE(NULL, 'X', 'Y') AS RESULT;
Output :
RESULT
X
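The example can be reproduced with a short sketch. It runs in SQLite through Python's built-in sqlite3 module rather than in SQL Server, and COALESCE behaves the same way for this case. The contact table and its data are hypothetical and show the typical use of falling back through several columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same expression as Example-1: the first non-NULL argument is returned.
result = conn.execute("SELECT COALESCE(NULL, 'X', 'Y') AS RESULT").fetchone()[0]
print(result)

# Hypothetical table: prefer the mobile number, then the landline, then 'N/A'.
conn.execute("CREATE TABLE contact (name TEXT, mobile TEXT, landline TEXT)")
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?)",
    [("Asha", None, "040-1234"), ("Ravi", "98765", None), ("Meena", None, None)],
)
phones = conn.execute(
    "SELECT name, COALESCE(mobile, landline, 'N/A') FROM contact ORDER BY name"
).fetchall()
for row in phones:
    print(row)
```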
Rank() and Dense_Rank():
RANK() and DENSE_RANK() order values and assign each row a number depending on
where it falls in relation to the others.
To rank rows in a result set, SQL offers the RANK() and DENSE_RANK() window
functions. They are used in a SELECT list alongside other columns. RANK() or
DENSE_RANK() is followed by the OVER clause, which takes an ORDER BY clause naming
the column to sort by before ranks are assigned, and optionally a PARTITION BY clause
to rank within groups.
Unlike DENSE_RANK(), RANK() skips positions after tied rankings. The number of positions
skipped depends on how many rows shared the same rank.
For example:
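Rank() :
-- Rank students within each subject by mark, highest first; ties leave gaps.
SELECT subjects, s_name, mark,
       RANK() OVER (PARTITION BY subjects ORDER BY mark DESC) AS 'rank'
FROM result;
Dense_Rank() :
-- Same ranking, but ties do not leave gaps in the numbering.
SELECT subjects, s_name, mark,
       DENSE_RANK() OVER (PARTITION BY subjects ORDER BY mark DESC) AS 'dense_rank'
FROM result;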
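To see the difference between the two functions side by side, the sketch below loads a few hypothetical rows into the result table and runs both rankings in one query. It uses Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or later, which is when window functions were added. The student names and marks are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (subjects TEXT, s_name TEXT, mark INTEGER)")
# Hypothetical data with a tie for first place.
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [
        ("Maths", "Asha", 95),
        ("Maths", "Ravi", 95),
        ("Maths", "Meena", 90),
        ("Maths", "John", 85),
    ],
)

rows = conn.execute("""
SELECT s_name, mark,
       RANK()       OVER (PARTITION BY subjects ORDER BY mark DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY subjects ORDER BY mark DESC) AS dense_rnk
FROM result
ORDER BY mark DESC, s_name
""").fetchall()

for row in rows:
    print(row)
```

With two students tied at 95, RANK() gives the next student rank 3 while DENSE_RANK() gives rank 2.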
NVL Function
The NVL function is defined in Oracle but not in MySQL or SQL Server. MySQL provides
IFNULL and SQL Server provides ISNULL, and both serve the same purpose as NVL. NVL
replaces a NULL with a substitute value (Oracle also treats an empty string as NULL).
Non-NULL values, including zero, are returned unchanged. It can return numeric, string,
or date values.
➢ Converts a null to an actual value.
➢ Data types that can be used are date, character, and number.
➢ Data types must match:
   o NVL(COMM, 0)
   o NVL(hiredate, '01-JAN-97')
   o NVL(job, 'NO JOB YET')
Syntax :
SELECT NVL (Value, Substitute) FROM table;
Example:
SELECT NVL(employee_id,1025) AS id FROM employee;
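The example can be approximated outside Oracle. The sketch below uses Python's built-in sqlite3 module. SQLite has no NVL, so IFNULL is swapped in as a stand-in with the same two-argument behaviour. The employee rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, name TEXT)")
# Hypothetical rows; the second employee has no id yet.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(101, "Asha"), (None, "Ravi")],
)

# IFNULL stands in for Oracle's NVL: a NULL id is replaced by 1025,
# while non-NULL ids are returned unchanged.
ids = conn.execute(
    "SELECT IFNULL(employee_id, 1025) AS id, name FROM employee ORDER BY name"
).fetchall()
for row in ids:
    print(row)
```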
Maximum values and ranges of data types –

Data Type                     Maximum value / range
Integer                       -2147483648 to +2147483647
Byte                          -128 to 127
Float                         3.4E +/- 38 (7 digits)
Double                        1.7E +/- 308 (15 digits)
Varchar                       0 to 65,535
Date                          9999-12-31
Timestamp                     2038-01-19 03:14:07
Timestamp without timezone    4713 BC to 294276 AD
Timestamp with timezone       4713 BC to 294276 AD
Datetime                      '9999-12-31 23:59:59'
Time                          838:59:59
Year                          2155
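Several of these limits follow directly from the number of bits used to store the value. The short sketch below derives them in Python. It assumes that Integer is a 32-bit signed value, Byte an 8-bit signed value, the Varchar limit a 16-bit length, and that the Timestamp row refers to a timestamp stored as a 32-bit signed count of seconds since 1970-01-01 UTC.

```python
from datetime import datetime, timezone

# 32-bit signed integer range
int32_min, int32_max = -2**31, 2**31 - 1

# 8-bit signed byte range
byte_min, byte_max = -2**7, 2**7 - 1

# Largest length that fits in 16 bits
varchar_max = 2**16 - 1

# Last moment representable as a 32-bit signed Unix timestamp (assumption above)
last_ts = datetime.fromtimestamp(int32_max, tz=timezone.utc)

print(int32_min, int32_max)
print(byte_min, byte_max)
print(varchar_max)
print(last_ts.strftime("%Y-%m-%d %H:%M:%S"))
```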
