
INDUSTRY INTERNSHIP

SUMMARY REPORT

Cognizant BISQUAD - AIA

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

Submitted by
MOHAK RASTOGI (17SCSE101366)

SCHOOL OF COMPUTING SCIENCE AND ENGINEERING


GREATER NOIDA, UTTAR PRADESH
Winter 2020 – 2021
Offer Letter

1
CERTIFICATE
I hereby certify that the work which is being presented in the internship project report entitled
“Cognizant BISQUAD-AIA Project Report”, in partial fulfillment of the requirements for the
award of the degree of Bachelor of Technology in the School of Computing Science and
Engineering of Galgotias University, Greater Noida, is an authentic record of my own work
carried out in the industry.
To the best of my knowledge, the matter embodied in the project report has not been submitted
to any other University/Institute for the award of any Degree.

Mohak Rastogi (17SCSE10136)


This is to certify that the above statement made by the candidate is correct and true to the best of
my knowledge.

Signature of Internship Coordinator
Dr. N. Partheeban
Professor & IIIC
School of Computing Science & Engineering
Galgotias University, Greater Noida

Signature of Dean (SCSE)
Dr. Munish Sabharwal
Professor & Dean
School of Computing Science & Engineering
Galgotias University, Greater Noida

2
TABLE OF CONTENTS

S.No Particulars Page No

1 Abstract 4

2 Introduction 6

3 Objective Of The Project 7

4 Target Specification 10

5 Functional Partitioning Of Project 11

6 Methodology 12

7 Tools Required 28

8 Result Analysis 29

9 Conclusion 30

10 Technical References 31

3
ABSTRACT

BI informs business priorities, goals, and directions by tracking and publishing predefined key
performance indicators in the form of dashboards, drill-up/drill-down cubes, and reports covering
every aspect of business operations.

As part of the BI process, organizations collect data from internal IT systems and external
sources, prepare it for analysis, run queries against the data and create data visualizations, BI
dashboards and reports to make the analytics results available to business users for operational
decision-making and strategic planning.
The ultimate goal of BI initiatives is to drive better business decisions that enable organizations
to increase revenue, improve operational efficiency and gain competitive advantages over
business rivals. To achieve that goal, BI incorporates a combination of analytics, data
management and reporting tools, plus various methodologies for managing and analyzing data.

4
List of Figures

Figure No. Name Page No.

Fig. 1 Database Components 13

Fig. 2 Data Warehouse Architecture 16

Fig. 3 Informatica Architecture 17

Fig. 4 Characteristics of Big Data 22

Fig. 5 Software Testing 26

5
Introduction
BI (business intelligence) encompasses technologies and analytical processes that examine data
and present actionable information based on, for example, reports, predictive analytics, data and
text mining, and business performance, and that help business leaders make better-informed
decisions. Enterprises use business intelligence to make a wide variety of strategic and
operational business decisions.

A brief introduction of the organization-


Cognizant is an American multinational technology company that provides business consulting,
information technology and outsourcing services. It is headquartered in Teaneck, New Jersey,
United States. Cognizant is part of the NASDAQ-100 and trades under the ticker symbol CTSH. It
was founded as an in-house technology unit of Dun & Bradstreet in 1994 and started serving
external clients in 1996.
Cognizant had a period of fast growth during the 2000s and became a Fortune 500 company in
2011; as of 2020 it was ranked 194th.
Product and services - Cognizant provides information technology, information security,
consulting, ITO and BPO services. These include business & technology consulting, systems
integration, application development & maintenance, IT infrastructure services, Artificial
Intelligence, Digital Engineering, analytics, business intelligence, data warehousing, customer
relationship management, supply chain management, engineering & manufacturing solutions,
enterprise resource planning, research and development outsourcing, and testing solutions.
Cognizant's business is organized into three areas: Digital Business, Digital Operations,
and Digital Systems & Technology.

6
Objective of the work
 should be able to Describe what a database is
 should be able to Implement Structured Query Language (SQL)
 should be able to Implement queries using DDL, DML, DCL
 should be able to Implement queries applying operators, Function, & Clauses concepts
 should be able to Implement queries using SQL joins, Sub queries, clauses
 should be able to Define the Operational System and the Data Warehouse
 should be able to Describe the Data Warehouse and Data Mart.
 should be able to Describe the Operational Data Store
 should be able to Describe the Enterprise Data Warehouse (EDW)
 should be able to Describe the Extract, Transform and Load Process
 should be able to Explain the Extract Transform and Load Process to load Operational
Data Store
 should be able to Explain the Extract Transform and Load Process to Load Data
Warehouse.
 should be able to Explain the Extract Transform and Load Process to Load Data Mart.
 should be able to Explain the Advanced Extract, Transform and Load Practices
 should be able to Describe the Operating System.
 should be able to Explain the File System
 should be able to Demonstrate the Editors
 should be able to Describe the Architecture of the Informatica PowerCenter and Uses.
 should be able to List the Informatica PowerCenter Components and the Objects.
 should be able to Describe the Core Administrative Tasks and Configure the Informatica
Administration Tool.
 should be able to Explain the creation and the configuration of the Repository and the
Integration services.
 should be able to Describe the Client Tool - Repository Manager
 should be able to Demonstrate the creation of the Folders and the access management.
 should be able to Describe the Client Tool – Designer

7
 should be able to List the Informatica Designer Tools
 should be able to Demonstrate the Creation of the Source definition, Target definition and
the mapping.
 should be able to Describe the Client Tool - Workflow Manager
 should be able to List the Workflow Manager Tools
 should be able to Demonstrate the Creation of the Session, Workflow and other Tasks.
 should be able to Demonstrate the Scheduling in Informatica PowerCenter
 should be able to Describe the Client Tool - Workflow Monitor
 should be able to Demonstrate the monitoring of the Workflows and the tasks.
 should be able to Demonstrate the Deployment of the Informatica Objects.
 should be able to Explain the Informatica PowerCenter Performance bottlenecks
 should be able to Demonstrate the Best Practices of the Informatica PowerCenter objects
usage.
 should be able to Explain the Usage of the Power Exchange Connectors for the Cloud
and CDC.
 should be able to Apply Python script for various cloud specific development.
 should be able to Explain about various Python libraries.
 should be able to Describe the basic concepts of BigData, Hadoop, HDFS, MapReduce,
Sqoop, Pig, Hive, Hbase
 should be able to Perform all related operations on the Hbase tables
 should be able to Implement the Datatypes, closures, traits, exception handling,
Collections, generics and various functions
 should be able to Perform operations on RDD’s using transformations and actions
 should be able to Implement the transformation/action based on the needs for different
problem statements
 should be able to Articulate when and where the appropriate memory levels are used.
 should be able to Execute the programs, debug them, assigning different parameters like
memory allocation and executors while executing the spark program on different
execution modes.

8
 should be able to Work on different streaming data using spark applications and execute
the streaming programs by reading data from different sources like Sockets, Kafka,
Flume, File etc.
 should be able to Perform different operations on DStreams and persist results into
HBase
 should be able to Create and perform operations on the Data frames using SparkSQL –
Load data from different sources like JDBC, csv, JSON, XML and plain text.
 should be able to Identify different ETL patterns in Spark, such as the Lambda architecture
 should be able to Define a test strategy and test plan and their importance
 should be able to Design test scenarios (both positive and negative) from the requirements
gathered.
 should be able to Design test cases from test scenarios
 should be able to Define a defect.
 should be able to Explain the defect lifecycle workflow.

9
Target Specifications

Business intelligence is an umbrella term that covers the processes and methods of
collecting, storing, and analyzing data from business operations or activities to optimize
performance. All of these things come together to create a comprehensive view of a business to
help people make better, actionable decisions.
Over the past few years, business intelligence has evolved to include more processes and
activities to help improve performance. These processes include:
 Data mining: Using databases, statistics and machine learning to uncover trends in large
datasets.
 Reporting: Sharing data analysis with stakeholders so they can draw conclusions and make
decisions.
 Performance metrics and benchmarking: Comparing current performance data to
historical data to track performance against goals, typically using customized dashboards.
 Descriptive analytics: Using preliminary data analysis to find out what happened.
 Querying: Asking the data specific questions, with BI pulling the answers from the datasets.
 Statistical analysis: Taking the results from descriptive analytics and further exploring
the data using statistics such as how this trend happened and why.
 Data visualization: Turning data analysis into visual representations such as charts,
graphs, and histograms to more easily consume data.
 Visual analysis: Exploring data through visual storytelling to communicate insights on
the fly and stay in the flow of analysis.
 Data preparation: Compiling multiple data sources, identifying the dimensions and
measurements, and preparing the data for analysis.

10
Functional partitioning of project

1. Database Design
2. Data Warehouse Basics
3. ETL Concepts
4. Data Warehouse Testing
5. Informatica Power Center
6. Python
7. Big Data and Hadoop
8. Big Data - Hbase
9. Scala
10. Spark
11. Testing

11
Methodology

What is Data?

Data can be facts related to any object in consideration. For example, your name, age, height,
weight, etc. are some data related to you. A picture, image, file, pdf, etc. can also be considered
data.

What is a Database?

A database is a systematic collection of data. They support electronic storage and manipulation
of data. Databases make data management easy.

Let us discuss a database example: An online telephone directory uses a database to store data of
people, phone numbers, and other contact details. Your electricity service provider uses a
database to manage billing, client-related issues, handle fault data, etc.

Types of Databases

Here are some popular types of databases:

Distributed databases:
A distributed database combines data from a common central database with information
captured by local computers. In this type of database system, the data is not stored in one place
but is distributed across various sites or organizations.

Relational databases:
This type of database defines database relationships in the form of tables. It is also called a
Relational DBMS (RDBMS), which is the most popular type of DBMS in the market. Examples of
RDBMS systems include MySQL, Oracle, and the Microsoft SQL Server database.

12
Object-oriented databases:
This type of database supports the storage of all data types. The data is stored in the
form of objects. The objects to be held in the database have attributes and methods that define
what to do with the data. PostgreSQL is an example of an object-oriented relational DBMS.

Centralized database:
A centralized database is stored and maintained at a single location, and users from different
backgrounds can access this data. This type of database also stores application procedures that
help users access the data even from a remote location.

Open-source databases:
This kind of database stores information related to operations and is available under an
open-source license. It is mainly used in fields such as marketing, employee relations, and
customer service.

Cloud databases:
A cloud database is a database that is optimized or built for a virtualized environment. A cloud
database has many advantages, such as paying only for the storage capacity and bandwidth that
is actually used. It also offers scalability on demand, along with high availability.

Data warehouses:
The purpose of a data warehouse is to provide a single version of the truth for a company's
decision-making and forecasting. A data warehouse is an information system that contains
historical and cumulative data from single or multiple sources, and it simplifies the reporting and
analysis process of the organization.

Database Components

13
What is a Database Management System (DBMS)?

Database Management System (DBMS) is a collection of programs that enable its users to access
databases, manipulate data, report, and represent data. It also helps to control access to the
database. Database Management Systems are not a new concept; they were first
implemented in the 1960s.

Charles Bachman's Integrated Data Store (IDS) is said to be the first DBMS in history. Over time,
database technologies evolved a lot, while the usage and expected functionalities of databases
increased immensely.

MySQL Database

MySQL is a popular relational database management system that can be used for applications
ranging from small business applications to big business applications.

Some of the key features of MySQL are:

 Open-source – MySQL is released under an open-source license, so it is free to use.
 Implemented language – MySQL is written in C and C++.
 Powerful – MySQL handles a large subset of the functionality of the most powerful
database packages, which makes MySQL a very powerful program.
 SQL data language – MySQL uses SQL, the standard database language commonly used
by most databases, so it is compatible with other databases as well.
 Operating systems – MySQL runs on many operating systems and works with many
languages, including C, C++, PHP, Perl, Java, and so on.
 Large data sets – MySQL works well and very fast even with large data sets.
 Web development – MySQL can also be used in web applications as it works with PHP
and most web development languages.

14
 Supports large databases – MySQL works with large databases. The default file size limit
for a table is 4GB, which can be increased depending on the operating system; tables can
hold 50 million rows or more.
 Multi-layered design – MySQL has a multi-layered server design with independent
modules. It is fully multithreaded using kernel threads, so it can use multiple CPUs if
they are available.
 Client/server environment – MySQL Server works in embedded or client/server systems.

MySQL Commands

1. SELECT — extracts and fetches data from a database.


2. UPDATE — updates data and allows us to edit rows in a table.
3. DELETE — deletes data and removes rows from a table.
4. INSERT INTO — inserts new data and adds a new row to a table.
5. CREATE DATABASE — creates a new database.
6. ALTER DATABASE — modifies a database and changes its characteristics.
7. CREATE TABLE — creates a new table in the database.
8. ALTER TABLE — adds, deletes, or modifies columns in an existing table.
9. DROP TABLE — deletes an existing table in a database.
10. CREATE INDEX — creates an index on existing tables to retrieve the rows quickly.
11. DROP INDEX — deletes an index in a table.
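The following is a brief, hypothetical illustration of several of these commands, run through Python's built-in sqlite3 module for convenience; the table and column names are invented, and the same SQL statements work against MySQL through a driver such as mysql-connector-python.

import sqlite3

# Open an in-memory database for the demonstration
conn = sqlite3.connect(":memory:")

# CREATE TABLE: define a new table
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# INSERT INTO: add new rows
conn.execute("INSERT INTO employees (id, name, salary) VALUES (1, 'Asha', 52000)")
conn.execute("INSERT INTO employees (id, name, salary) VALUES (2, 'Ravi', 61000)")

# UPDATE: edit an existing row
conn.execute("UPDATE employees SET salary = 65000 WHERE id = 2")

# SELECT: fetch data back out
for row in conn.execute("SELECT id, name, salary FROM employees"):
    print(row)

# CREATE INDEX and DROP INDEX: speed up lookups on a column, then remove the index
conn.execute("CREATE INDEX idx_name ON employees (name)")
conn.execute("DROP INDEX idx_name")

# DELETE and DROP TABLE: remove rows, then the whole table
conn.execute("DELETE FROM employees WHERE id = 1")
conn.execute("DROP TABLE employees")
conn.close()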
Data Warehouse Fundamentals

The Data Warehouse is a collection of data in support of management decision processes, which
is:
● Subject oriented
● Integrated
● Time variant
● Non-volatile
A Data Warehouse is a relational database that is designed for query and analysis.

15
It usually contains historical data derived from transaction data and other sources.

Need for Data Warehousing?


Operational data helps the organization to meet the operational and tactical requirements for
data.
The Data Warehouse data helps the organization to meet strategic requirements for information.
The strategic data helps the business with the following needs:
• Understand Business Issues
• Analyze Trends and Relationships
• Analyze Problems
• Discover Business Opportunities
• Plan for the Future
Data Warehouse Architecture

Data Warehouse: Application Areas

● Risk management
● Financial analysis

16
● Marketing programs
● Profit trends
● Procurement analysis
● Inventory analysis
● Statistical analysis
● Claims analysis
● Manufacturing optimization
● Customer relationship management

What is Informatica?
Informatica is a software development company that provides a complete data integration
solution and data management system. It has launched multiple products that mainly focus on
data integration.

Informatica is also widely known for its data integration tool, which is based on the ETL
architecture. It provides data integration software and services for different industries,
businesses, and government organizations, as well as for telecommunication, health care,
insurance, and financial services. It has a unique ability to connect to, process, and fetch data
from different types of heterogeneous sources.

Informatica Architecture

17
The Informatica architecture is a service-oriented architecture (SOA), which is defined as a group
of services that communicate with each other. This communication can be a simple data transfer,
or two or more services coordinating the same activity.

Repository Service: It is responsible for maintaining Informatica metadata and provides access to
the same to other services.
Integration Service: This service helps in the movement of data from sources to the targets.
Reporting Service: This service generates the reports.
Nodes: This is a computing platform to execute the above services.
Informatica Designer: It creates the mappings between source and target.
Workflow Manager: It is used to create workflows or other tasks and their execution.
Workflow Monitor: It is used to monitor the execution of workflows.
Repository Manager: It is used to manage the objects in the repository.

Informatica PowerCenter

Informatica PowerCenter is an enterprise extract, transform, and load (ETL) tool used to move
data from sources into targets. We can build enterprise data warehouses with the help of
Informatica PowerCenter, which is produced by Informatica Corp.

Informatica PowerCenter extracts data from its sources, transforms this data according to
requirements, and loads it into a target data warehouse.
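For illustration only, the same extract-transform-load pattern can be sketched in plain Python, independent of Informatica itself; the file name, field names, and cleansing rules below are assumptions made for this example.

import csv
import sqlite3

def extract(path):
    # Extract: read raw records from a source file (a hypothetical sales.csv)
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: cleanse and reshape the data to match the target's requirements
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer": row["customer"].strip().title(),
            "amount": round(float(row["amount"]), 2),
        })
    return cleaned

def load(rows, conn):
    # Load: write the transformed rows into the target warehouse table
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO fact_sales (customer, amount) VALUES (:customer, :amount)", rows
    )
    conn.commit()

if __name__ == "__main__":
    target = sqlite3.connect("warehouse.db")   # stand-in for a real warehouse
    load(transform(extract("sales.csv")), target)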

The main components of Informatica PowerCenter are its client tools, server, repository, and
repository server. Both the PowerCenter server and repository server make up the ETL layer,
which is used to complete the ETL processing.

Informatica PowerCenter provides the following services:

18
● B2B exchange.
● Data governance.
● Data migration.
● Data warehousing.
● Data synchronization and replication.
● Integration Competency Centers (ICC).
● Master Data Management (MDM).
● Service-oriented architectures (SOA) and many more.

Informatica Transformations

Informatica transformations are repository objects that can create, read, modify, or pass data to
defined target structures such as tables, files, or any other targets.

In Informatica, the purpose of transformation is to modify the source data according to the
requirement of the target system. It also ensures the quality of the data being loaded into the
target.

A Transformation is used to represent a set of rules, which define the data flow and how the data
is loaded into the targets.

Classification of Transformation

Transformations are classified in two ways: the first based on connectivity, and the second based
on the change in the number of rows. First, we will look at transformations based on
connectivity.
1. There are two types of transformations based on connectivity:
● Connected Transformations
● Unconnected Transformations
In Informatica, transformations that are connected to other transformations within a mapping are
called connected transformations.

19
Transformations that are not linked to any other transformation are called unconnected
transformations.

2. There are two types of transformations based on the change in the number of rows:
● Active Transformations
● Passive Transformations
Active Transformations are those that can change the number of rows passed through them. For
example, if a transformation receives 10 rows as input and returns 15 rows as output, it is an
active transformation. An active transformation can also modify the data within each row.
Passive Transformations do not change the number of input rows. In passive transformations,
the number of input and output rows remains the same, and data is modified at row level only.
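The distinction can be illustrated with a small Python analogy (this is not Informatica code, only a sketch of the idea): a filter step can change the number of rows, like an active transformation, while a step that reworks a column leaves the row count unchanged, like a passive transformation.

rows = [
    {"name": "asha", "amount": 120},
    {"name": "ravi", "amount": -15},
    {"name": "meena", "amount": 300},
]

# Active-style step: the number of rows can change (here, 3 rows in, 2 rows out)
filtered = [r for r in rows if r["amount"] > 0]

# Passive-style step: each row is modified, but the row count stays the same
expressed = [{**r, "name": r["name"].upper()} for r in filtered]

print(len(rows), len(filtered), len(expressed))   # 3 2 2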

List of Transformations in Informatica:

● Source Qualifier Transformation


● Aggregator Transformation
● Router Transformation
● Joiner transformation
● Rank Transformation
● Sequence Generator Transformation
● Transaction Control Transformation
● Lookup and Re-usable transformation
● Normalizer Transformation
● Performance Tuning for Transformation
● External Transformation
● Expression Transformation

Mapping in Informatica

20
A mapping is a collection of source and target objects tied together through a set of
transformations. These transformations consist of a set of rules that define the flow of the data
and how the data is loaded into the targets.
Mapping in Informatica includes the following set of objects, such as:

● Source definition: The source definition defines the structure and characteristics of the
source, such as basic data types, type of the data source, and more.
● Transformation: It defines how the source data is changed, and various functions can be
applied during this process.
● Target Definition: The target definition defines where the data will be loaded finally.
● Links: Links connect the source definition with the target tables and the different
transformations, and they show the flow of data between the source and target.
What is Big Data?
Big Data is a collection of data that is huge in volume, yet growing exponentially with time. It is
data of such large size and complexity that none of the traditional data management tools can
store or process it efficiently.
Types Of Big Data
1. Structured - Any data that can be stored, accessed, and processed in a fixed format is
termed 'structured' data.
2. Unstructured - Any data with an unknown form or structure is classified as unstructured
data. In addition to its huge size, unstructured data poses multiple challenges in terms of
processing it to derive value from it.
3. Semi-structured - Semi-structured data can contain both forms of data. Semi-structured
data may appear structured in form, but it is not actually defined with, for example, a table
definition as in a relational DBMS.

Characteristics Of Big Data

21
 Volume- The name Big Data itself is related to a size which is enormous. Size of data
plays a very crucial role in determining value out of data. Hence, 'Volume' is one
characteristic which needs to be considered while dealing with Big Data.
 Variety-Variety refers to heterogeneous sources and the nature of data, both structured
and unstructured. During earlier days, spreadsheets and databases were the only sources
of data considered by most of the applications.
 Velocity-The term 'velocity' refers to the speed of generation of data. How fast the data
is generated and processed to meet the demands, determines real potential in the data.
 Variability-This refers to the inconsistency which can be shown by the data at times, thus
hampering the process of being able to handle and manage the data effectively.

WHAT IS HADOOP?
Hadoop is a high-performance distributed data storage and processing system. Its two major
subsystems are:
● HDFS for storage
● MapReduce for parallel data processing.
It can store any kind of data from any source, inexpensively and at very large scale, and it can do
very sophisticated analysis of that data easily and quickly.
Hadoop automatically detects and recovers from hardware, software and system failures.
Hadoop provides scalable, reliable and fault tolerant services for data storage and analysis at
very low cost.
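The MapReduce model itself can be sketched in a few lines of plain Python: a map phase emits (key, value) pairs and a reduce phase aggregates the values for each key. In a real Hadoop job the same two steps run in parallel across the cluster; this single-process sketch only illustrates the idea.

from collections import defaultdict

documents = ["big data needs big storage", "hadoop stores big data"]

# Map phase: emit a (word, 1) pair for every word in every document
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the intermediate pairs by key (the word)
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: aggregate the values for each key
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)   # e.g. {'big': 3, 'data': 2, ...}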

22
WHAT IS HADOOP USED FOR?
 Searching/ Text mining
 Log processing
 Recommendation systems
 Business Intelligence/Data Warehousing
 Video and Image analysis
 Archiving
 Graph creation and analysis
 Pattern recognition
 Risk assessment
 Sentiment Analysis

WHAT IS HBASE?
HBase is a column-oriented, multi-dimensional, highly available, high-performance, non-
relational, distributed database. It runs on top of HDFS. It is well suited for sparse data sets,
which are common in many big data use cases.
An HBase system comprises a set of tables. Each table contains rows and columns, much like a
traditional database.
Apache HBase scales linearly to handle huge data sets with billions of rows and millions of
columns, and it easily combines data sources that use a wide variety of different structures and
schemas. It provides a fault-tolerant way of storing large quantities of sparse data.
HBase is not a direct replacement for a classic SQL database.
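Basic HBase table operations can be sketched with the third-party happybase Python client, assuming an HBase Thrift server is reachable on localhost and that a table named 'users' with a column family 'info' already exists; all names here are illustrative.

import happybase

# Connect through the HBase Thrift gateway (assumed to be running on localhost)
connection = happybase.Connection("localhost")
table = connection.table("users")

# Put: write a few cells into one row, addressed as column-family:qualifier
table.put(b"row1", {b"info:name": b"Asha", b"info:city": b"Noida"})

# Get: read a single row back
print(table.row(b"row1"))

# Scan: iterate over rows, optionally restricted to certain columns
for key, data in table.scan(columns=[b"info:name"]):
    print(key, data)

# Delete: remove the row
table.delete(b"row1")
connection.close()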

WHAT IS HIVE?
Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data
summarization, query, and analysis. Apache Hive supports analysis of large datasets stored in
Hadoop's HDFS and compatible file systems. All the data types in Hive are classified into four
categories, as follows:
 Column Type
 Literals

23
 Null Values
 Complex Types
Hive provides indexing for acceleration. Hive has built-in user-defined functions (UDFs)
to manipulate dates, strings, and other data-mining tools. Hive supports extending the UDF set to
handle use cases not supported by built-in functions. Hive supports SQL-like queries (HiveQL).
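HiveQL queries can also be submitted programmatically, for example through the third-party PyHive library; the host, port, and table name below are assumptions made for illustration, and a running HiveServer2 instance is required.

from pyhive import hive

# Connect to a HiveServer2 instance (assumed host and port)
conn = hive.connect(host="localhost", port=10000)
cursor = conn.cursor()

# HiveQL looks very much like SQL; 'sales' is a hypothetical Hive table
cursor.execute("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
for region, total in cursor.fetchall():
    print(region, total)

cursor.close()
conn.close()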

PIG-
Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for
expressing data analysis programs, coupled with infrastructure for evaluating these programs.
Pig is made up of two pieces:
● The language used to express data flows, called Pig Latin.
● The execution environment to run Pig Latin programs
Pig Latin is a data flow language that allows users to describe how data from one or more inputs
should be read, processed, and then stored to one or more outputs in parallel.

SQOOP-
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop
and structured datastores such as relational databases.
Sqoop can connect to databases such as Oracle, MySQL, and Teradata. It uses JDBC to connect
to them, so a JDBC driver for each database is required. Sqoop uses MapReduce to import and
export the data, which provides parallel operation as well as fault tolerance.
Sqoop can also import the result set of an arbitrary SQL query. Instead of using the --table,
--columns and --where arguments, you can specify a SQL statement with the --query argument.
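A hypothetical free-form import using --query might look like the following; it is wrapped in Python's subprocess only so that it sits alongside the other examples, the JDBC URL, credentials, and table are invented, and the query must include the WHERE $CONDITIONS token so that Sqoop can split the work across mappers.

import subprocess

# Illustrative sqoop import using a free-form query instead of --table/--columns/--where
subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost/shop",       # hypothetical JDBC URL
    "--username", "etl_user", "--password-file", "/user/etl/.dbpass",
    "--query", "SELECT o.id, o.total FROM orders o WHERE $CONDITIONS",
    "--split-by", "o.id",                          # column used to partition the work
    "--target-dir", "/data/raw/orders",            # HDFS output directory
    "--num-mappers", "4",
], check=True)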

What is Scala?
Scala is a modern multi-paradigm programming language designed to express common
programming patterns in a concise, elegant, and type-safe way. Scala is short for 'Scalable
Language'.
● Scala smoothly integrates the features of object-oriented and functional languages.

24
● Scala Programming = Object-Oriented Programming + Functional Programming. From
the functional programming perspective, each function in Scala is a value, and from the object-
oriented aspect, each value in Scala is an object.
● The Scala programming language can be found in use at some of the best-known tech
companies, such as LinkedIn, Twitter, and Foursquare.

What Is Spark?
Apache Spark is a lightning-fast cluster computing technology, designed for fast
computation. Spark is not a modified version of Hadoop.
It is based on Hadoop MapReduce and extends the MapReduce model to use it efficiently for
more types of computations, including interactive queries and stream processing. The main
feature of Spark is its in-memory cluster computing, which increases the processing speed of an
application.
Spark covers a wide range of workloads, including streaming, with no need for other separate tools.

Why Spark?
The reason is that the Hadoop framework is based on a simple programming model
(MapReduce). Here, the main concern is maintaining speed when processing large datasets, in
terms of both the waiting time between queries and the waiting time to run a program.
● Speed − Spark helps run an application in a Hadoop cluster up to 100 times faster in
memory, and 10 times faster when running on disk. This is made possible by reducing the number
of read/write operations to disk. It stores the intermediate processing data in memory.
● Supports multiple languages − Spark provides built-in APIs in Java, Scala, or Python.
Therefore, you can write applications in different languages.
● Advanced Analytics − It supports SQL queries, Streaming data, Machine learning (ML),
and Graph algorithms.
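A minimal PySpark sketch of these ideas, run in local mode; the data and names are invented, and transformations such as filter and map stay lazy until an action like collect or show triggers execution.

from pyspark.sql import SparkSession

# Start Spark in local mode; on a cluster this would point at YARN or another manager
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
sc = spark.sparkContext

# RDD transformations (lazy) followed by an action (triggers the computation)
numbers = sc.parallelize(range(1, 11))
even_squares = numbers.filter(lambda n: n % 2 == 0).map(lambda n: n * n)
print(even_squares.collect())            # [4, 16, 36, 64, 100]

# DataFrames and Spark SQL over the same engine
df = spark.createDataFrame([(1, "Asha"), (2, "Ravi")], ["id", "name"])
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE id = 2").show()

spark.stop()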

SOFTWARE TESTING-
Software Testing is a method to check whether the actual software product matches expected
requirements and to ensure that the software product is defect free. It involves the execution of
software/system components using manual or automated tools to evaluate one or more properties

25
of interest. The purpose of software testing is to identify errors, gaps or missing requirements in
contrast to actual requirements.

DIFFERENT TYPES OF SOFTWARE TESTING-


● Manual Testing: Manual testing involves testing software manually, i.e., without using
any automated tool or script. In this type, the tester takes on the role of an end user and
tests the software to identify any unexpected behavior or bug. There are different stages of
manual testing, such as unit testing, integration testing, system testing, and user acceptance
testing.
● Automation Testing: Automation testing, also known as test automation, is when the
tester writes scripts and uses other software to test the product. This process involves the
automation of a manual process.

DIFFERENT LEVELS OF SOFTWARE TESTING-


● Unit Testing: A level of the software testing process where individual units/components
of a software/system are tested. The purpose is to validate that each unit of the software performs
as designed (a minimal example is shown after this list).
● Integration Testing: A level of the software testing process where individual units are
combined and tested as a group. The purpose of this level of testing is to expose faults in the
interaction between integrated units.

26
● System Testing: A level of the software testing process where a complete, integrated
system/software is tested. The purpose of this test is to evaluate the system’s compliance with the
specified requirements.
● Acceptance Testing: A level of the software testing process where a system is tested for
acceptability. The purpose of this test is to evaluate the system’s compliance with the business
requirements and assess whether it is acceptable for delivery.
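As a small illustration of the unit testing level described above, the following hypothetical example tests a single function in isolation using Python's built-in unittest module.

import unittest

def apply_discount(price, percent):
    # Unit under test: a hypothetical pricing helper
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_positive_case(self):
        # Positive scenario: a valid discount is applied correctly
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_negative_case(self):
        # Negative scenario: an invalid discount is rejected
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)

if __name__ == "__main__":
    unittest.main()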

27
Tools required

 MYSQL : MySQL Workbench is a unified visual tool for database architects,


developers, and DBAs. MySQL Workbench provides data modeling, SQL development,
and comprehensive administration tools for server configuration, user administration,
backup, and much more. MySQL Workbench is available on Windows, Linux and Mac
OS X.
 Informatica PowerCenter : Informatica PowerCenter is an enterprise extract, transform,
and load (ETL) tool used in building enterprise data warehouses. With its high availability
as well as being fully scalable and high-performing, PowerCenter serves as the foundation
for all major data integration projects and initiatives throughout the enterprise.
 Hadoop : Hadoop is an open-source software framework for storing data and running
applications on clusters of commodity hardware. It provides massive storage for any kind
of data, enormous processing power and the ability to handle virtually limitless
concurrent tasks or jobs.

28
Results Analysis

A successful BI program produces a variety of business benefits in an organization. For


example, BI enables C-suite executives and department managers to monitor business
performance on an ongoing basis so they can act quickly when issues or opportunities arise.
Analyzing customer data helps make marketing, sales and customer service efforts more
effective. Supply chain, manufacturing and distribution bottlenecks can be detected before they
cause financial harm. HR managers are better able to monitor employee productivity, labor costs
and other workforce data.

Overall, the key benefits that businesses can get from BI applications include the ability to:
 speed up and improve decision-making;
 optimize internal business processes;
 increase operational efficiency and productivity;
 spot business problems that need to be addressed;
 identify emerging business and market trends;
 develop stronger business strategies;
 drive higher sales and new revenues; and
 gain a competitive edge over rival companies.

BI initiatives also provide narrower business benefits -- among them, making it easier for project
managers to track the status of business projects and for organizations to gather competitive
intelligence on their rivals. In addition, BI, data management and IT teams themselves benefit
from business intelligence, using it to analyze various aspects of technology and analytics
operations.

29
Conclusions

BI platforms are increasingly being used as front-end interfaces for big data systems that contain
a combination of structured, unstructured and semi-structured data. Modern BI software typically
offers flexible connectivity options, enabling it to connect to a range of
data sources. This, along with the relatively simple user interface (UI) in most BI tools, makes it
a good fit for big data architectures.
Users of BI tools can access Hadoop and Spark systems, NoSQL databases and other big data
platforms, in addition to conventional data warehouses, and get a unified view of the diverse data
stored in them. That enables a broad number of potential users to get involved in analyzing sets
of big data, instead of highly skilled data scientists being the only ones with visibility into the
data.

30
Technical References

1. Udemy courses on ETL and MySQL


2. https://www.tutorialgateway.org/export-data-from-sql-server-to-flat-file-in-informatica/
3. "Cognizant founder Mahadeva to retire". Economic Times. 22 December 2003.
4. Apache Hadoop - https://hadoop.apache.org/
5. Apache Hive - https://hive.apache.org/
6. Scala - https://docs.scala-lang.org/
7. Software Testing - https://www.guru99.com/software-testing-introduction-importance.html

31
