BI Lab Manual
Dr. D. Y. Patil Educational Federation, Varale
Lab Practice-VI
Final Year Computer Engineering
Semester-VIII
PART-II 410253 : Elective VI : Business Intelligence
Subject Code: 410253(C)
Prepared By:
Mr. Sagar A. Dhanake
(Assistant Professor, Computer Engineering)
Assignment No. 1
Aim : Import legacy data from different sources (such as Excel, SQL Server, Oracle, etc.) and load it into the target system. (You can download a sample database such as AdventureWorks, Northwind, FoodMart, etc.)
Objective : To introduce the concepts and components of Business Intelligence (BI)
Prerequisite :
1. Basics of dataset extensions.
2. Concept of data import.
Theory :
Legacy Data:
Legacy data, according to Business Dictionary, is "information maintained in an old or out-of-date format or computer system that is consequently challenging to access or handle."
Sources of Legacy Data :
Where does legacy data come from? Virtually everywhere. Figure 1 indicates that there are many sources from which you may obtain legacy data. These include existing databases, often relational, although non-relational stores such as hierarchical, network, object, XML, object/relational, and NoSQL databases are also found. Files, such as XML documents or "flat files" like configuration files and comma-delimited text files, are also common sources of legacy data. Software, including legacy applications that have been wrapped (perhaps via CORBA) and legacy services such as web services or CICS transactions, can also provide access to existing information. The point is that gaining access to legacy data often involves far more than simply writing an SQL query against an existing relational database.
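The variety above can be sketched in code. The following is a minimal illustration, not part of the Power BI exercise: it reads the same logical records from a comma-delimited "flat file" and from a relational table. The file contents, table name, and column names are invented for the example; an in-memory SQLite database stands in for the legacy RDB.

```python
import csv
import io
import sqlite3

def extract_from_flat_file(text):
    """Read a comma-delimited legacy export into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_from_relational(conn, table):
    """Read one table of a legacy relational database into a list of dicts."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT * FROM " + table).fetchall()
    return [dict(r) for r in rows]

# Flat-file source (comma-delimited text, a typical legacy export)
flat = "cust_id,name\n101,Asha\n102,Ravi\n"

# Relational source (in-memory SQLite stands in for the legacy RDB)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 101), (2, 102)")

customers = extract_from_flat_file(flat)
orders = extract_from_relational(conn, "orders")
print(customers[0])   # {'cust_id': '101', 'name': 'Asha'}
print(orders[0])      # {'order_id': 1, 'cust_id': 101}
```

Note that the flat file yields only strings while the database preserves column types; reconciling such differences is exactly the kind of work that tools like Power BI hide behind the Get Data dialog.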
Dr. D. Y. Patil Educational Federation’s
Dr. D. Y. PATIL COLLEGE OF ENGINEERING & INNOVATION
Department of Computer Engineering
A.Y. 2022-23
Step 2: Click on Get Data; the following list will be displayed → select Excel.
Step 3: Select the required file and click on Open; the Navigator screen appears.
Click on OK.
Note: If you just want to see a preview, you can click on the table name without selecting its checkbox. Click on Edit to view the table.
Conclusion: In this way, we imported legacy datasets using the Power BI tool.
Assignment No. 2
Aim : Perform the Extraction, Transformation and Loading (ETL) process to construct the database in SQL Server.
Theory:
Extraction Transformation and Loading (ETL): -
ETL, which stands for extract, transform and load, is a data integration process that combines data
from multiple data sources into a single, consistent data store that is loaded into a data warehouse or
other target system.
As the databases grew in popularity in the 1970s, ETL was introduced as a process for integrating
and loading data for computation and analysis, eventually becoming the primary method to process
data for data warehousing projects.
ETL provides the foundation for data analytics and machine learning work streams. Through a series
of business rules, ETL cleanses and organizes data in a way which addresses specific business
intelligence needs, like monthly reporting, but it can also tackle more advanced analytics, which can
improve back-end processes or end user experiences. ETL is often used by an organization to:
• Extract data from legacy systems
• Cleanse the data to improve data quality and establish consistency
• Load data into a target database
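The three bullets above can be sketched end to end. This is a minimal illustration under assumed data, not the SSIS procedure used later in this assignment: the raw rows, the cleansing rule (drop records missing a key), and the target table name `sales` are all invented for the example, and an in-memory SQLite database stands in for the SQL Server target.

```python
import csv
import io
import sqlite3

# Raw extract from a hypothetical legacy system: inconsistent spacing,
# one record missing its key.
RAW = """id, name , amount
1, Alice ,100
2,  Bob ,250
, Carol ,75
"""

def extract(text):
    """Extract: copy raw rows out of the source into a staging list."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cleanse the data for quality and consistency."""
    clean = []
    for row in rows:
        row = {k.strip(): v.strip() for k, v in row.items()}
        if not row["id"]:            # business rule: drop rows without a key
            continue
        row["amount"] = float(row["amount"])
        clean.append(row)
    return clean

def load(rows, conn):
    """Load: write the cleansed rows into the target database."""
    conn.execute("CREATE TABLE sales (id TEXT, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (:id, :name, :amount)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
print(total)   # (2, 350.0)
```

The record for Carol is dropped during the transform step because it has no key, which is the "cleanse to improve data quality" bullet in miniature.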
Extract
During data extraction, raw data is copied or exported from source locations to a staging area. Data management teams can extract data from a variety of data sources, which can be structured or unstructured. Those sources include, but are not limited to, SQL and NoSQL servers, CRM and ERP systems, flat files, email, and web pages.
ETL solutions improve quality by performing data cleansing prior to loading the data into a different repository. A time-consuming batch operation, ETL is recommended more often for creating smaller target data repositories that require less frequent updating, while other data integration methods, including ELT (extract, load, transform), change data capture (CDC), and data virtualization, are used to integrate increasingly larger volumes of frequently changing data or real-time data streams.
What Is the ETL Process?
To put it simply, the ETL process includes extracting and compiling raw data, transforming it to make it intelligible, and loading it into a target system, such as a database or data warehouse, for easy access and analysis. ETL, short for Extract, Transform, Load, is an important component in the data ecosystem of any modern business. The ETL process is what helps break down data silos and makes data easier to access for decision-makers.
Since data coming from multiple sources has different schemas, every dataset must be transformed differently before it can be used for BI and analytics. For instance, if you are compiling data from source systems like SQL Server and Google Analytics, these two sources will need to be treated individually throughout the ETL process. The importance of this process has increased as big data analysis has become a necessary part of every organization.
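The per-source transformation described above often amounts to mapping each source's field names onto one shared target schema. Below is a small sketch of that idea; the field names stand in for a SQL Server export and a Google Analytics export and are hypothetical, not the actual schemas of those systems.

```python
# Each mapping converts one source's schema into the common
# target schema {key, date, revenue}.
def to_common(record, mapping):
    """Rename a source record's fields into the shared target schema."""
    return {target: record[source] for target, source in mapping.items()}

# Hypothetical rows from the two differently shaped sources
sql_row = {"OrderID": 7, "OrderDate": "2022-01-05", "Revenue": 120.0}
ga_row = {"sessionId": "a9", "date": "2022-01-05", "transactionRevenue": 80.0}

# One mapping per source: target field -> source field
sql_map = {"key": "OrderID", "date": "OrderDate", "revenue": "Revenue"}
ga_map = {"key": "sessionId", "date": "date", "revenue": "transactionRevenue"}

combined = [to_common(sql_row, sql_map), to_common(ga_row, ga_map)]
print(combined[0])   # {'key': 7, 'date': '2022-01-05', 'revenue': 120.0}
```

Once both sources share one schema, the rest of the pipeline (aggregation, loading) no longer needs to know where each record came from.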
ETL tools :-
In the past, organizations wrote their own ETL code. There are now many open source and
commercial ETL tools and cloud services to choose from. Typical capabilities of these products
include the following:
• Comprehensive automation and ease of use: Leading ETL tools automate the entire dataflow,
from data sources to the target data warehouse. Many tools recommend rules for extracting,
transforming and loading the data.
• A visual, drag-and-drop interface: This functionality can be used for specifying rules and data
flows.
• Support for complex data management: This includes assistance with complex calculations, data integrations, and string manipulations.
• Security and compliance: The best ETL tools encrypt data both in motion and at rest and are
certified compliant with industry or government regulations, like HIPAA and GDPR.
Step 4: Click on Add → Select the path of the backup files → Select both files and OK → OK → OK.
Step 5: Select File → New → Project → Business Intelligence → Integration Services Project, give an appropriate Project Name → OK.
Step 6: Right Click on Connection Managers in solution explorer and click on New Connection
Manager.
Add SSIS Connection Manager window appears → Select OLEDB Connection Manager and
Click on Add.
Step 8: Select the Server Name (as per your machine) and the database name and click on Test Connection → OK → OK → The connection is added to the Connection Manager.
Step 9: Drag and Drop Data Flow Task in Control Flow Tab.
Step 10: Drag OLE DB Source from Other Sources and drop into Data Flow tab.
Step 11: Double click on OLE DB Source → the OLE DB Source Editor appears → click on New to add a Connection Manager.
Step 12: Drag OLE DB Destination into the Data Flow tab and connect both.
Step 13: Double click on OLE DB Destination → Click on New next to "Name of the table or the view" to create the destination table → OK.
In Mappings, verify the column mappings → OK.
Conclusion: -
In this way, we performed the Extraction, Transformation and Loading (ETL) process to construct the database in SQL Server.
Assignment No. 3
Aim : Create the cube with suitable dimension and fact tables based on ROLAP, MOLAP and HOLAP
model.
Objective : To introduce the concepts and components of the ROLAP, MOLAP and HOLAP models.
Prerequisite:
1. Basics of dataset extensions.
2. Concept of data import.
Software Requirements : SQL Server 2012 Full version
Theory:
What is OLAP?
OLAP was introduced into the business intelligence (BI) space over 20 years ago, at a time when computer hardware and software technology weren't nearly as powerful as they are today. OLAP introduced a way for business users (typically analysts) to easily perform multidimensional analysis of large volumes of business data.
Aggregating, grouping, and joining data are the most difficult types of queries for a relational database
to process. The magic behind OLAP derives from its ability to pre-calculate and pre-aggregate data. Otherwise, end users would be spending most of their time waiting for query results to be returned by the
database. However, it is also what causes OLAP-based solutions to be extremely rigid and IT-intensive.
What is ROLAP?
ROLAP stands for Relational Online Analytical Processing. ROLAP stores data in columns and rows (also known as relational tables) and retrieves the information on demand through user-submitted queries. A ROLAP database can be accessed through complex SQL queries to calculate information. ROLAP can handle large data volumes, but the larger the data, the slower the processing times.
Because queries are made on demand, ROLAP does not require the storage and pre-computation of information. However, the disadvantage of ROLAP implementations is the potential performance constraint and scalability limitation that result from large and inefficient join operations between large tables. Examples of popular ROLAP products include Metacube by Stanford Technology Group, Red Brick Warehouse by Red Brick Systems, and AXSYS Suite by Information Advantage.
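The on-demand join-and-aggregate behaviour of ROLAP can be shown with a miniature star schema. This sketch uses SQLite through Python as a stand-in for any relational engine; the table and column names echo the Sales_DW example used later but the data is invented.

```python
import sqlite3

# Miniature star schema: one fact table, one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE FactSales (ProductKey INTEGER, Amount REAL);
    INSERT INTO DimProduct VALUES (1, 'Bike'), (2, 'Helmet');
    INSERT INTO FactSales VALUES (1, 100), (1, 250), (2, 40);
""")

# ROLAP style: nothing is precomputed; the aggregate is produced
# on demand by a join + GROUP BY at query time.
rows = conn.execute("""
    SELECT p.ProductName, SUM(f.Amount)
    FROM FactSales f
    JOIN DimProduct p ON p.ProductKey = f.ProductKey
    GROUP BY p.ProductName
    ORDER BY p.ProductName
""").fetchall()
print(rows)   # [('Bike', 350.0), ('Helmet', 40.0)]
```

Every query repeats the join and the scan of the fact table, which is exactly why ROLAP slows down as fact tables grow.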
What is MOLAP?
MOLAP stands for Multidimensional Online Analytical Processing. MOLAP uses a multidimensional cube that accesses stored data through various combinations. Data is pre-computed, pre-summarized, and stored (a difference from ROLAP, where queries are served on demand).
A multicube approach has proved successful in MOLAP products. In this approach, a series of dense,
small, precalculated cubes make up a hypercube. Tools that incorporate MOLAP include Oracle Essbase,
IBM Cognos, and Apache Kylin. Its simple interface makes MOLAP easy to use, even for inexperienced
users. Its speedy data retrieval makes it the best for “slicing and dicing” operations. One major
disadvantage of MOLAP is that it is less scalable than ROLAP, as it can handle a limited amount of data.
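The pre-computation trade-off described above can be sketched with a toy two-dimensional cube. This is an illustration of the idea only, not how any MOLAP product stores its cubes; the fact rows and the "ALL" rollup convention are invented for the example.

```python
from collections import defaultdict

# Fact rows: (product, quarter, amount).
facts = [("Bike", "Q1", 100.0), ("Bike", "Q1", 250.0), ("Helmet", "Q2", 40.0)]

# MOLAP trade-off: spend storage and load time precomputing every
# aggregation level, so later reads are simple lookups.
cube = defaultdict(float)
for product, quarter, amount in facts:
    cube[(product, quarter)] += amount   # base cell
    cube[(product, "ALL")] += amount     # rollup over time
    cube[("ALL", quarter)] += amount     # rollup over product
    cube[("ALL", "ALL")] += amount       # grand total

print(cube[("Bike", "ALL")])   # 350.0
print(cube[("ALL", "ALL")])    # 390.0
```

Reads are now dictionary lookups instead of scans, but the cube's size grows with every extra dimension and rollup level, which mirrors MOLAP's scalability limit.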
What is HOLAP?
HOLAP stands for Hybrid Online Analytical Processing. As the name suggests, the HOLAP storage mode combines attributes of both MOLAP and ROLAP. Since HOLAP involves storing part of your data in a ROLAP store and another part in a MOLAP store, developers get the benefits of both.
With this use of the two OLAPs, the data is stored in both multidimensional databases and relational
databases. The decision to access one of the databases depends on which is most appropriate for the
requested processing application or type. This setup allows much more flexibility for handling data. For
theoretical processing, the data is stored in a multidimensional database. For heavy processing, the data
is stored in a relational database.
Microsoft Analysis Services and SAP AG BI Accelerator are products that run off HOLAP.
Step 3: Click on Execute or press F5, selecting the queries one by one → After execution completes, save and close SQL Server Management Studio and reopen it to see Sales_DW in the Databases tab.
Step 4: Go to SQL Server Data Tools → Right click and Run as administrator.
Step 5: Right click on Data Sources in Solution Explorer → New Data Source.
Step 7: Select the Server Name (as per your machine) and the database name and click on Test Connection → OK → OK → The connection is added to the Connection Manager.
Step 8: Click Next in the Data Source Wizard → Select Inherit → Next → Finish → Sales_DW.ds is created under Data Sources in Solution Explorer.
Step 9: In Solution Explorer, right click on Data Source Views → Select New Data Source View → Next → Next.
Step 10: Select FactProductSales (dbo) from Available Objects and move it to Included Objects by clicking on >.
Step 11: Sales_DW.dsv appears under Data Source Views in Solution Explorer.
Step 12: Right click on Cubes → New Cube → Next → Select Use existing tables in Select
Creation Method → Next
Step 13: In Select Measure Group Tables → Select FactProductSales → Click Next
Step 18: Drag and drop Product Name from the table in the Data Source View and add it to the Attributes pane on the left side.
Double click on the Dim Date dimension → Drag and drop fields from the table shown in the Data Source View to Attributes → Drag and drop attributes from the leftmost Attributes pane to the middle Hierarchy pane.
Drag fields in sequence from Attributes to the Hierarchy window (Year, Quarter Name, Month Name, Week of the Month, Full Date UK).
Right click on the project name → Properties → Make the following changes and click on Apply & OK.
Step 22: To process the cube, right click on Sales_DW.cube → Process → Click Run.
Conclusion: In this way, we created a cube with dimension and fact tables based on the ROLAP, MOLAP and HOLAP models.
Assignment No. 4
Aim : Import the data warehouse data in Microsoft Excel and create the Pivot table and Pivot
Chart.
Objective : To introduce the concepts and components of data warehouse data in Microsoft Excel and create the Pivot Table and Pivot Chart.
Prerequisite:
1. Basics of dataset extensions.
2. Concept of data import.
Software Requirements : SQL Server 2012 Full version, Excel
Go to the Data tab → Get Data → Legacy Wizards → From Data Connection Wizard.
Step 4: In Select Database and Table → Select Sales_DW → Check all dimensions and "Import relationships between selected tables" → Next.
Step 5: In Save Data Connection File, browse the path and click on Finish.
Step 7: In Fields, put SalesDateKey in Filters, FullDateUK in Axis and Sum of ProductActualCost in Values.
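What the PivotTable in Step 7 computes can be expressed in a few lines of code: filter on SalesDateKey, group rows by FullDateUK (the axis), and sum ProductActualCost (the values). The sketch below is an illustration only; the field names follow the Sales_DW example, but the sample rows are invented.

```python
# Hypothetical fact rows standing in for the Sales_DW data.
rows = [
    {"SalesDateKey": 20220105, "FullDateUK": "05-01-2022", "ProductActualCost": 110.0},
    {"SalesDateKey": 20220105, "FullDateUK": "05-01-2022", "ProductActualCost": 42.5},
    {"SalesDateKey": 20220106, "FullDateUK": "06-01-2022", "ProductActualCost": 75.0},
]

def pivot(rows, date_key=None):
    """Group by FullDateUK and sum ProductActualCost,
    optionally filtering on SalesDateKey first."""
    out = {}
    for r in rows:
        if date_key is not None and r["SalesDateKey"] != date_key:
            continue
        out[r["FullDateUK"]] = out.get(r["FullDateUK"], 0.0) + r["ProductActualCost"]
    return out

print(pivot(rows))                      # {'05-01-2022': 152.5, '06-01-2022': 75.0}
print(pivot(rows, date_key=20220105))   # {'05-01-2022': 152.5}
```

Excel performs the same group-and-sum internally; the Filters, Axis, and Values boxes correspond to the filter condition, the grouping key, and the aggregated column respectively.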
Step 9: Choose From External Data Source → Choose Connection to select the existing connection with Sales_DW and click on Open → OK.
Conclusion: - In this way, we imported data warehouse data into Microsoft Excel and created the Pivot Table and Pivot Chart.
Assignment No. 5
Aim : Perform the data classification using classification algorithm. Or Perform the data
clustering using clustering algorithm.
Prerequisite:
1. Basics of dataset extensions.
2. Concept of data import.
Software Requirements : R Studio
Theory : A time series is a series of data points in which each data point is associated with a timestamp. A simple example is the price of a stock in the stock market at different points of time on a given day. Another example is the amount of rainfall in a region in different months of the year. The R language provides many functions to create, manipulate and plot time series data. The data for a time series is stored in an R object called a time-series object. It is also an R data object, like a vector or data frame.
Syntax : The basic syntax of the ts() function in time series analysis is:
timeseries.object.name <- ts(data, start, end, frequency)
• data is a vector or matrix containing the values used in the time series.
• start specifies the start time for the first observation in the time series.
• end specifies the end time for the last observation in the time series.
• frequency specifies the number of observations per unit time.
Consider the monthly rainfall details at a place starting from January 2012. We create an R time series object for a period of 12 months and plot it.
Code to run in R :
# Monthly rainfall values (in mm), January to December 2012
rainfall <- c(799, 1174.8, 865.1, 1334.6, 635.4, 918.5, 685.5, 998.6, 784.2, 985, 882.8, 1071)
# Convert the vector to a time series object starting January 2012,
# with 12 observations per year
rainfall.timeseries <- ts(rainfall, start = c(2012, 1), frequency = 12)
print(rainfall.timeseries)
# Save the chart to a file
png(file = "rainfall.png")
plot(rainfall.timeseries)
dev.off()
Output :
When we execute the above code, it produces the following result and chart –
        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2012  799.0 1174.8  865.1 1334.6  635.4  918.5  685.5  998.6  784.2  985.0  882.8 1071.0
Output : plot of the rainfall time series (saved as rainfall.png).
Conclusion: - In this way, we performed data classification using a classification algorithm.