A project report on

DATA VISUALIZATION & DATA ENGINEERING

Submitted in partial fulfillment for the award of the degree of

Bachelor of Technology (B. Tech)

in

Computer Science and Engineering

by

NIMMAGADDA SAI VIJAY (19BCE7342)

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING (SCOPE)

May, 2023
DECLARATION

I hereby declare that the thesis entitled “DATA VISUALIZATION & DATA ENGINEERING”,
submitted by me for the award of the degree of Bachelor of Technology at VIT, is a
record of bonafide work carried out by me under the supervision of Nagendra
Shivakumar Mysore & Natasha Mohite.

I further declare that the work reported in this thesis has not been submitted
and will not be submitted, either in part or in full, for the award of any other degree
or diploma in this institute or any other institute or university.

Place: Amaravati NIMMAGADDA SAI VIJAY

Date: 26-05-2023 Signature of the Candidate


INTERNSHIP COMPLETION CERTIFICATE
ABSTRACT

The aim of this project is to create dashboards and reports that give a client the insights needed to
make decisions. A second aim is to build Robotic Process Automation, that is, automation of specific
workflows, and to carry out Data Engineering, which consists of creating data pipelines, ETL (Extract,
Transform, Load) processes, and cloud computing. Using Microsoft Power Apps, I implemented a leave
tracker in which a person can apply for leave, with the entire process automated.

ACKNOWLEDGEMENT

It is my pleasure to express my deep sense of gratitude to Nagendra Shivakumar Mysore
(Sr. Manager, Training and Learning Head) and Natasha Mohite (Manager) for making me
familiar with the intricacies of Data Analytics, and for the way they supported and
guided me through the training period. Sincere thanks to all my colleagues at “Genpact”
for their support and assistance throughout the project. It has been a wonderful
experience to be surrounded by people who helped me at every step where I might have
fallen.

I would like to express my gratitude to G. Viswanathan, Sankar Vishwanathan,
S. V. Kota Reddy, and S. V. Sudha, SCOPE, for providing me with an environment to
work in and for their inspiration during the tenure of the course.

In a jubilant mood, I express my whole-hearted thanks to Saroj Kumar Panigrahy
(Associate Professor), and to all the teaching staff and members of our university,
for their selfless enthusiasm and timely encouragement, which helped me acquire the
knowledge needed to complete my course of study successfully. I would also like to
thank my parents for their support.

It is indeed a pleasure to thank my friends, who persuaded and encouraged me to take
up and complete this task. Last but not least, I express my gratitude and
appreciation to all those who have helped me, directly or indirectly, toward the
successful completion of this project.

Place: Amaravati NIMMAGADDA SAI VIJAY

Date: 26-05-2023 Name of the student

CONTENTS

CONTENTS PAGE NO

ABSTRACT………………………………………………………………...i
ACKNOWLEDGMENT…………………………………………………..ii
LIST OF FIGURES………………………………………………………..v

CHAPTER 1: INTRODUCTION………………………………………...1-3
1.1 INTRODUCTION OF THE PROJECT………………………………...1
1.2 OBJECTIVES…………………………………………………………..1
1.3 OVERVIEW OF THE ORGANIZATION……………………………..1

CHAPTER 2: MICROSOFT POWER BI……………………………….4-16


2.1 MICROSOFT POWER BI……………………………………………...4
2.2 POWER BI COMPONENTS…………………………………………...5
2.3 WHAT IS POWER BI DESKTOP……………………………………...5
2.4 POWER BI SERVICES…………………………………………………6
2.5 CONNECT TO DATA………………………………………………….7
2.6 TRANSFORM AND CLEAN DATA, CREATE A MODEL………….7
2.7 M- CODE……………………………………………………………….8
2.8 MEASURES IN POWER BI…………………………………………...9
2.9 FILTER IN POWER BI………………………………………………...10
2.10 TASK GIVEN BY GENPACT………………………………………..11
2.11 PUBLISH TO POWER BI..………….…………. ………….………...15

CHAPTER 3: TABLEAU………………………………………………...17-32
3.1 TABLEAU………….………….………….………….………….……..17
3.2 DEVELOPER TOOLS………….………….………….………….……17
3.3 WHAT IS DATA………….………….………….………….…………19
3.4 WHAT IS DATA VISUALIZATION………….………….…………..20
3.5 PARAMETERS………….………….………….………….…………..21
3.6 FORECASTING………….………….………….………….………….23
3.7 JOINS………….………….………….………….………….………….24
3.8 ADVANCED CHARTS IN TABLEAU………….………….………...24
3.9 LOD………….………….………….………….………….……………28
3.10 GENPACT’S GIVEN TASK………….………….………….……….28

CHAPTER 4: POWER PLATFORM…………………………………...33-40


4.1 MICROSOFT POWER PLATFORM………….………….…………...33
4.2 CONNECTING TO DYNAMICS 365 AND MICROSOFT 365……...34
4.3 DATAVERSE………….………….………….………….………….…34
4.4 POWER APP………….………….………….………….………….…..35
4.5 GENPACT’S GIVEN TASK TO A LEAVE TRACKER………….….35
4.6 ER DIAGRAM………….………….………….………….……………38
4.7 FLOW-CHART………….………….………….………….…………...38
4.8 POWER AUTOMATE………….………….………….………….……39

CHAPTER 5: ALTERYX………………………………………………...41-49
5.1 ALTERYX………….………….………….………….………….……..41
5.2 PRODUCTS OF ALTERYX………….………….………….…………41
5.3 WHAT IS ALTERYX DESIGNER………….………….………….…..41
5.4 ALTERYX SERVER………….………….………….………….……...42
5.5 DESIGNER TOOLS LIST………….………….………….……………42
5.6 TASK GIVEN BY GENPACT………….………….………….……….43

CHAPTER 6: CONCLUSION……………………………………………50

CHAPTER 7: REFERENCES……………………………………………51

LIST OF FIGURES

2.1.1: POWER BI WELCOME SCREEN…………………………………………………………4


2.3.1: POWER BI DESKTOP……………………………………………………………………..6
2.3.2: REPORT, DATA, MODEL PANE…..……………………………………………………..6
2.5.1: CONNECT TO DIFFERENT SOURCES OF DATA ……………………...……………...7
2.5.2: CONNECT TO SQL SERVER DATABASE……………..………………………………7
2.6.1: POWER QUERY EDITOR…………………………….………………………………….8
2.6.2 POWER QUERY STEPS.……………………………..…………………………………...8
2.7.1: M CODE…………………………………………………..……………………………….9
2.8.1: MEASURES USING DAX LANGUAGE………………………………………………...10
2.9.1: FILTERS………………………………………………………………………..………….11
2.9.2: ADVANCED FILTER IN POWER BI……………………………………………………11
2.10.1: INTERNATIONAL MATCHES……………………………………….………………...12
2.10.2: QATAR WORLD-CUP INFO……………..………………………………………..……12
2.10.3: ARGENTINA'S PATH TO VICTORY…………………………………………………..13
2.10.4 FIFA 2022 PLAYER PERFORMANCE………………………………………………….13
2.10.5: FOOTBALL LEGENDS…………………………………………………...……………..14
2.10.6: HOST COUNTRY, TEAMS PERFORMANCE…………………………….…………...14
2.10.7: FIFA GOALS AND CLUBS……………………………………………….…………….14
2.10.8: WORLD CUP STATS…………………………………………………….……………...15
2.11.1: PUBLISH SECTION IN POWER BI………………………………….…………………15
2.11.2: PUBLISH DESTINATION………………………………………..……………………..15
2.11.3: PUBLISH STATUS…………………………………………….………………………...16
2.11.4 POWER BI SERVICE………………………………………….………………………....16
3.1.1 TABLEAU PRODUCT SUITE……………………………...……………………………..17
3.3.1: STRUCTURED DATA………………………………….………………………………...19
3.4.1: DATA VISUALIZATION TOOLS………………………………………………….……20
3.4.2: DATA VISUALIZATION AS BAR CHART…………………………………………….20
3.5.1: TOP N PARAMETER…………………………………………………………………….21
3.5.2: MAP CHART……………………………………………………………………………...21
3.5.3: DATE FIELD PARAMETER……………………………………………………………..22
3.5.4: DYNAMIC MEASURE……………………………………………………….………..23
3.5.5: SETS…………………………………………………………………………………….23
3.6.1: FORECASTING IN TABLEAU……………………………………….……………….24
3.7.1: JOINS IN TABLEAU……………………………………………...……………………24
3.8.1: DONUT CHART………………………………………………………………………..25
3.8.2: WATERFALL CHART…………………………………..……………………………..25
3.8.3: BUMP CHART………………………………………….………………………………26
3.8.4: GAUGE CHART……………………………………...………………………………...26
3.8.5: GAUGE CHART FORMULA………………………………………………………….27
3.8.6: BOX PLOT……………………………………………………………..……………….27
3.9.1: LOD IN TABLEAU……………………………………………………………………..28
3.10.1: SCENARIO 1…………………………………………………….…………………….29
3.10.2: SCENARIO 2………………………………………………….……………………….29
3.10.3: SCENARIO 3……………………………………………….………………………….30
3.10.4: SCENARIO 4…………………………………………….…………………………….30
3.10.5 SCENARIO 5…………………………………………….……………………………..31
3.10.6: SCENARIO 6………………………………………….……………………………….31
3.10.7: SCENARIO 7……………………………………….………………………………….32
4.2.1: POWER PLATFORM…………………………………………………………………...34
4.3.1: DATAVERSE……………………………………………………………………………35
4.5.1: HOMESCREEN…………………………..……………………………………………...36
4.5.2: HOMESCREEN’S VIEW REQUEST………………………………..………………….36
4.5.3: REQUEST SCREEN……………………………………………...……………………...37
4.5.4: FORMSCREEN…………………………………………………..………………………37
4.6.1: ENTITY RELATIONSHIP DIAGRAM OF LEAVE TRACKER………………………38
4.7.1: FLOW CHART OF LEAVE TRACKER………………………………………………..38
4.8.1: POWER AUTOMATE FLOW…………………………………………………………...39
4.8.2: COMPLETE AUTOMATION FOR EMAIL……………………………………………40
5.3.1: ALTERYX DESIGNER…………………………………………………………………42
5.6.1: INPUT DATA, BROWSE, FILTER, OUTPUT DATA IN CONTAINER 1……..…….44
5.6.2: RESULTS FOR FIGURE 5.5.1………………………………………………………….44
5.6.3: OUTPUT FOR FILTER TOOL………………………………………………………….45
5.6.4: OUTPUT FOR BROWSE TOOL AND OUTPUT DATA TOOL……………………45
5.6.5: JOINS IN ALTERYX…………………………………………………………..……..45
5.6.6: SUMMARIZE TOOL AND ITS OUTPUT…………………………………………...46
5.6.7: DATETIME TOOL AND ITS OUTPUT……………………………………………...46
5.6.8: SORT TOOL ……………………………………………………………………..……46
5.6.9: SELECT TOOL……………………………………………………………….………..47
5.6.10: RANDOM % SAMPLE TOOL………………………………………..……………...47
5.6.11: FORMULA TOOL……………………………………………………………………48
5.6.12: DATA CLEANSING TOOL………………………………………………………….48
5.6.13: TRANSPOSE TOOL………………………………………………………………….49
5.6.14: MACHINE LEARNING CLASSIFICATION TOOL………………………………..49

Chapter 1

Introduction

1.1 INTRODUCTION:
The aim of this project is to create dashboards and reports that give a client the insights needed
to make decisions. A second aim is to build Robotic Process Automation, that is, automation of
specific workflows, and to carry out Data Engineering, which consists of creating data pipelines,
ETL (Extract, Transform, Load) processes, and cloud computing.
1.2 OBJECTIVE:
The objectives of this project are to build dashboards and reports, build data pipelines, perform
ETL processes, optimize workflows using robotic process automation (for example, generating
automatic emails for specific tasks), build a customized app for the customer's needs, and use
cloud computing on AWS and Azure.
1.3 OVERVIEW OF THE ORGANIZATION:

Genpact (NYSE: G) is a global professional services firm delivering the outcomes that transform
our clients' businesses and shape their futures. We're guided by our real-world experience
redesigning and running thousands of processes for hundreds of global companies. Our clients –
including many in the Global Fortune 500 – partner with us for our unique ability to combine deep
industry and functional expertise, leading talent, and proven methodologies to drive collaborative
innovation that turns insights into action and delivers outcomes at scale. We create lasting
competitive advantages for our clients and their customers, running digitally enabled operations and
applying our Data-Tech-AI services to design, build, and transform their businesses. And we do it
all with purpose. From New York to New Delhi and more than 30 countries in between, our
115,000+ team is passionate in its relentless pursuit of a world that works better for people. Get to
know us at Genpact.com and on LinkedIn, Twitter, YouTube, and Facebook.

Genpact began in 1997 as a business unit within General Electric. Then, in January 2005, we became
an independent company, bringing our process expertise and unique DNA in Lean management to
more companies. We became a publicly traded company in 2007. Since December 31, 2005, we
have expanded from 19,000+ employees and annual revenues of $491.90 million to 115,000+
employees and annual revenues of $4.37 billion as of December 31, 2022.
Genpact's partner ecosystem builds on our industry expertise, our deep industry knowledge, and our
partners' technology solutions. Together, we inspire our clients to innovate, transform operations,
accelerate ROI, and drive top-line growth.
CGRLH Vertical: Genpact Ltd. engages in business process management, outsourcing, shared
services, and information outsourcing. The company operates through the following segments:
Banking, Capital Markets and Insurance (BCMI), Consumer Goods, Retail, Life Sciences, and
Healthcare (CGRLH), and High Tech, Manufacturing, and Services (HMS). The BCMI segment
provides application processing, collections, and customer services, equipment and auto loan
servicing, mortgage origination, and servicing, risk management and compliance services, reporting
and monitoring services, wealth management operations support, end-to-end information
technology services, application development and maintenance, managed services, financial crimes
support, and consulting. The CGRLH segment offers supply chain management, pricing and trade
promotion management, order management, digital commerce, customer experience, and risk
management. The HMS segment involves industry-specific solutions for the Industrial Internet of
Things (IIoT), user experience, order and supply chain management, data engineering, digital
content management, and risk management.
A Few Partners of Genpact:
AppZen:
With AppZen's AI technology and Genpact's compliance-as-a-service solution, clients identify fraud
and maintain compliance.
Blue Prism:
Blue Prism and Genpact develop intelligent automation and RPA solutions for global clients.
Deloitte:
A strategic alliance between two best-in-class service providers delivering end-to-end business
transformation solutions.
E2open:
E2open and Genpact partner to optimize transportation, logistics, global trade management, and
control tower operations.

Chapter 2

Microsoft Power BI

2.1 Microsoft Power BI:


Microsoft Power BI is a business intelligence (BI) platform that provides non-technical business
users with tools for aggregating, analyzing, visualizing, and sharing data. Power BI's user
interface is fairly intuitive for users familiar with Excel, and its deep integration with other
Microsoft products makes it a versatile self-service tool that requires little upfront training.
With Power BI Desktop, you can build advanced queries, data models, and reports that visualize
data, and share your work by publishing to the Power BI service. Power BI Desktop is a free
download.

To get Power BI Desktop, you can use one of two approaches.


• Install as an app from the Microsoft Store.
• Download directly as an executable and install it on your computer.
When you launch Power BI Desktop, a welcome screen appears:

Figure 2.1.1: Power BI Welcome Screen


The following list provides the minimum requirements to run Power BI Desktop:
• Windows 8.1 or Windows Server 2012 R2 or later.
• .NET 4.6.2 or later.
• Microsoft Edge browser (Internet Explorer is no longer supported)
• Memory (RAM): At least 2 GB is available, 4 GB or more is recommended.
• Display: At least 1440x900 or 1600x900 (16:9) is required. Lower resolutions such as
1024x768 or 1280x800 aren't supported because some controls (such as closing the startup
screens) display beyond those resolutions.

• Windows display settings: If you set your display to change the size of text, apps, and other
items to more than 100%, you won't see some dialogs that you must interact with to
continue using Power BI Desktop. If you encounter this issue, check your display settings
in Windows by going to Settings > System > Display, and using the slider to return display
settings to 100%.
• CPU: 1 gigahertz (GHz) 64-bit (x64) processor or better recommended.
• WebView2: Required. If WebView2 wasn't automatically installed with Power BI Desktop, or if
it was uninstalled, it must be installed separately.
2.2 Power BI components:
Microsoft Power BI works by connecting to data sources and presenting the results to users as BI
dashboards. It can connect to a single Excel spreadsheet or bring together cloud-based and
on-premises data warehouses. Data pulled from cloud-based sources, such as Salesforce CRM, is
automatically refreshed.

With applications such as an Excel workbook or Power BI Desktop file connected to online or on-
premises data sources, Power BI users must manually refresh or set up a refresh schedule to ensure
the data in Power BI reports and dashboards use the most current data available.

Power BI consists of a collection of apps and can be used either on a desktop, as a SaaS product,
or on a mobile device. Power BI Desktop is the on-premises version, Power BI Service is the cloud-
based offering and mobile Power BI runs on mobile devices.

The different components of Power BI are meant to let users create and share business insights in
a way that fits with their role.

Included within Power BI are several components that help users create and share data reports.
Those are the following:
• Power Query: a data mashup and transformation tool
• Power Pivot: an in-memory tabular data modeling tool
• Power View: a data visualization tool
• Power Map: a 3D geospatial data visualization tool
• Power Q&A: a natural language question and answering engine.
2.3 What is Power BI Desktop?
Power BI Desktop is a free application you install on the local computer that lets you connect to,
transform, and visualize your data. With Power BI Desktop, you can connect to multiple different
sources of data, and combine them (often called modeling) into a data model. This data model lets
you build visuals, and collections of visuals you can share as reports, with other people inside your
organization. Most users who work on business intelligence projects use Power BI Desktop to
create reports and then use the Power BI service to share their reports with others.

Figure 2.3.1: Power BI Desktop
The most common uses for Power BI Desktop are as follows:
• Connect to data.
• Transform and clean data to create a data model.
• Create visuals, such as charts or graphs that provide visual representations of the data.
• Create reports that are collections of visuals on one or more report pages.
• Share reports with others by using the Power BI service.
There are three views available in Power BI Desktop, which you select on the left side of
the canvas. The views, shown in the order they appear, are as follows:
• Report: You create reports and visuals, where most of your creation time is spent.
• Data: You see the tables, measures, and other data used in the data model associated with
your report, and transform the data for best use in the report's model.
• Model: You see and manage the relationships among tables in your data model.

Figure 2.3.2: Report, Data, Model Pane


2.4 Power BI Services:
Power BI is a collection of software services, apps, and connectors that work together to help you
create, share, and consume business insights in the way that serves you and your business most
effectively. The Microsoft Power BI service (https://app.powerbi.com), sometimes referred to as
Power BI online, is the software as a service (SaaS) part of Power BI. In the Power BI service,
dashboards help you keep a finger on the pulse of your business. Dashboards display tiles, which
you can select to open reports for exploring further. Dashboards and reports connect to datasets
that bring all of the relevant data together in one place.
2.5 Connect to data:
To get started with Power BI Desktop, the first step is to connect to data. There are many different
data sources you can connect to from Power BI Desktop.
To connect to data:
1. From the Home ribbon, select Get Data > More. The Get Data window appears, showing the
many categories to which Power BI Desktop can connect.

Figure 2.5.1: Connect to different sources of data


2. When you select a data type, you're prompted for information, such as the URL and
credentials, necessary for Power BI Desktop to connect to the data source on your behalf.

Figure 2.5.2: Connect to SQL Server Database


2.6 Transform and clean data, create a model:
In Power BI Desktop, you can clean and transform data using the built-in Power Query Editor.
With Power Query Editor, you make changes to your data, such as changing a data type, removing
columns, or combining data from multiple sources. It's like sculpting: you start with a large block
of clay (or data), then shave off pieces or add others as needed, until the shape of the data is how
you want it.

To start Power Query Editor:

• On the Home ribbon, in the Queries section, select Transform data.

Figure 2.6.1: Power Query Editor


Each step you take in transforming data (such as renaming a table, transforming a data type, or
deleting a column) is recorded by Power Query Editor. Every time this query connects to the data
source, those steps are carried out so that the data is always shaped the way you specify.
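
The idea of recorded steps being replayed on every refresh can be sketched in a few lines of Python.
This is only an illustration of the concept, not how Power Query is implemented: the step functions
(rename_column, change_type, remove_column), the column names, and the sample rows are all
hypothetical.

```python
from functools import reduce

# Each "applied step" is modelled as a function that takes a table
# (a list of row dictionaries) and returns a new table.

def rename_column(old, new):
    def step(rows):
        return [{(new if k == old else k): v for k, v in row.items()} for row in rows]
    return step

def change_type(column, cast):
    def step(rows):
        return [{**row, column: cast(row[column])} for row in rows]
    return step

def remove_column(column):
    def step(rows):
        return [{k: v for k, v in row.items() if k != column} for row in rows]
    return step

# The recorded steps, in the order they were applied (hypothetical example).
applied_steps = [
    rename_column("amt", "Amount"),
    change_type("Amount", float),
    remove_column("notes"),
]

def refresh(source_rows):
    """Replay every recorded step against freshly loaded source data."""
    return reduce(lambda rows, step: step(rows), applied_steps, source_rows)

source = [
    {"id": 1, "amt": "10.5", "notes": "x"},
    {"id": 2, "amt": "3", "notes": ""},
]
shaped = refresh(source)
print(shaped)
```

Because the steps are stored rather than their results, calling refresh again on new source rows
produces data with the same shape without any manual rework.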

The following image shows the Power Query Editor window for a query that was shaped and turned
into a model.

Figure 2.6.2: Power Query Steps


2.7 M- Code:
The M stands for data Mash-up, as Power Query is all about connecting to various data
sources and “mashing” them up.

M code is the language behind the scenes of Power Query. When you create a data transformation
in the Power Query Editor UI, Power Query writes the corresponding M code for the query.

M is a functional language, which means it is primarily written with functions that are called to
evaluate and return results. M comes with a very large library of predefined functions, and you
can also create your own.
2.7.1 Where Can You Write Power Query M Code?
If you want to start writing or editing M code, you need to know where you can do this.
There are two places where it is possible: the formula bar and the Advanced Editor.

Figure 2.7.1: M Code

2.8 Measures in Power BI:


In Power BI Desktop, measures are created and displayed in Report View, Data View, or Model
View. Measures you create yourself appear in the Fields list with a calculator icon. You can name
measures whatever you want, and add them to a new or existing visualization just like any other
field.
Measures and Quick Measures are very powerful tools in Power BI. You use DAX formulas to
create a measure that you can then use in the Report or Data view. Examples of measures are
sums, averages, minimum or maximum values, and counts. For more advanced calculations, you
can write your own measures using DAX. The calculated results of measures change in response
to your interactions with the report.
1. Data Analysis Expressions (DAX):
DAX is a formula language that is used throughout Microsoft Power BI for creating
calculated columns, measures, and custom tables. It is a collection of functions,
operators, and constants that can be used in a formula, or expression, to calculate and
return one or more values. You can use DAX to solve a range of calculation and data
analysis problems, which can help you create new information from data that is already
in your model.
Example DAX Code:
Projected Sales = SUM('Sales'[Last Years Sales])*1.06
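
To make the behaviour of this measure concrete, the following Python sketch mirrors the same
arithmetic and shows why the result changes when a filter is applied. It is an analogy only: the
rows, the Region column, and the helper names are made up for illustration and are not part of the
Power BI model.

```python
# Illustrative rows standing in for the 'Sales' table (made-up values).
sales = [
    {"Region": "East", "Last Years Sales": 100.0},
    {"Region": "West", "Last Years Sales": 250.0},
    {"Region": "East", "Last Years Sales": 50.0},
]

def projected_sales(rows, growth=1.06):
    """Mirror of SUM('Sales'[Last Years Sales]) * 1.06 over the visible rows."""
    return sum(row["Last Years Sales"] for row in rows) * growth

def apply_filter(rows, region=None):
    """Stand-in for a slicer: keep all rows, or only those for one region."""
    return [r for r in rows if region is None or r["Region"] == region]

# The same measure gives different results depending on which rows are visible.
total = projected_sales(apply_filter(sales))
east_only = projected_sales(apply_filter(sales, region="East"))
print(round(total, 2), round(east_only, 2))
```

The measure itself is defined once; only the set of rows it is evaluated over changes as the user
interacts with the report.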

Figure 2.8.1: Measures using DAX Language

2.9 Filter in Power BI:


A filter in Power BI narrows the data shown based on selected criteria. That is, you can select
particular fields, or values within fields, and view only the information related to them. For
instance, given a dataset of a store's sales, you can use filters to hide the records that are not
relevant to your analysis.
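
As a rough sketch of the two filtering styles (selecting values from a list versus stating a
condition), the Python below filters a small, invented store-sales table. The function names and
data are hypothetical and only illustrate the logic; in Power BI this is done through the Filters
pane.

```python
# Invented store-sales rows for illustration.
store_sales = [
    {"Product": "Laptop", "Category": "Electronics", "Amount": 1200},
    {"Product": "Desk", "Category": "Furniture", "Amount": 300},
    {"Product": "Phone", "Category": "Electronics", "Amount": 800},
    {"Product": "Lamp", "Category": "Furniture", "Amount": 40},
]

def basic_filter(rows, field, selected_values):
    """Basic filtering: keep rows whose field is one of the selected values."""
    return [r for r in rows if r[field] in selected_values]

def advanced_filter(rows, field, predicate):
    """Advanced filtering: keep rows whose field satisfies a condition."""
    return [r for r in rows if predicate(r[field])]

electronics = basic_filter(store_sales, "Category", {"Electronics"})
large_orders = advanced_filter(store_sales, "Amount", lambda amount: amount >= 500)
print([r["Product"] for r in electronics])
print([r["Product"] for r in large_orders])
```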

Figure 2.9.1: Filters

Figure 2.9.2: Advanced Filter In Power BI

2.10 The task given by Genpact: create a Power BI dashboard for the FIFA World Cup:
We were given FIFA World Cup data as an Excel file, together with a data dictionary that
describes the dataset.
Steps used to make the dashboard:
Step 1: Load the data into Power BI using a connector (here, Excel).
Step 2: Transform the data by removing null values and cleaning it; if additional columns are
required, add them to the table. Then apply the changes and load the data back into Power BI.
Step 3: Build the dashboard as per the client's requirements.
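
Steps 1 and 2 above can be sketched in plain Python as follows. This is a hedged illustration of
the load-and-clean logic, not the Power Query steps actually used: a small inline CSV stands in for
the Excel workbook, and the column names (home_team, away_team, home_goals, away_goals) and the
derived total_goals column are assumptions made for the example.

```python
import csv
import io

# Stand-in for the Excel workbook: a tiny CSV with illustrative columns.
RAW = """home_team,away_team,home_goals,away_goals
Argentina,France,3,3
Qatar,Ecuador,0,2
Brazil,,1,
"""

def load(text):
    """Step 1: load the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Step 2: drop rows with missing values and add a derived column."""
    cleaned = []
    for row in rows:
        if any(value in ("", None) for value in row.values()):
            continue  # skip rows that contain nulls
        home, away = int(row["home_goals"]), int(row["away_goals"])
        cleaned.append({**row, "home_goals": home, "away_goals": away,
                        "total_goals": home + away})
    return cleaned

matches = transform(load(RAW))
for match in matches:
    print(match)
```

The cleaned rows are what the dashboard pages in Step 3 would be built on.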
The dashboard had eight pages in total, as follows:
Page 1: International matches

Figure 2.10.1: International Matches
Page 2: Qatar World Cup 2022 Team Info

Figure 2.10.2: Qatar World-Cup Info


Page 3: Argentina’s Path to Victory

Figure 2.10.3: Argentina's Path to Victory
Page 4: FIFA 2022 Player Performance by Field Position

Figure 2.10.4: FIFA 2022 Player Performance


Page 5: Classification of Football Legends

Figure 2.10.5: Football Legends
Page 6: Host Country, Teams Performance, FIFA Ranking, and Groups

Figure 2.10.6: Host Country, Teams Performance


Page 7: FIFA Goals By Legends And Club

Figure 2.10.7: FIFA Goals And Clubs

Page 8: World Cup Goals by year and Total Goals by the tournament for international matches

Figure 2.10.8: World Cup Stats

2.11 Publish to Power BI:


When you publish a Power BI Desktop file to the Power BI service, you publish the data in the
model to your Power BI workspace. The same is true for any reports you created in Report View.
You’ll see a new dataset with the same name and any reports in your Workspace navigator.

Publishing from Power BI Desktop has the same effect as using Get Data in Power BI to connect
to and upload a Power BI Desktop file.
To publish a Power BI Desktop dataset and reports:
1. In Power BI Desktop, choose File > Publish > Publish to Power BI or select Publish on the
Home ribbon.

Figure 2.11.1: Publish Section in Power BI


2. Sign in to Power BI if you aren't already signed in.
3. Select the destination. You can search your list of available workspaces to find the
workspace into which you want to publish. The search box lets you filter your workspaces.
Select the workspace, and then click the Select button to publish.

Figure 2.11.2: Publish Destination

4. When publishing is complete, you receive a link to your report. Select the link to open the
report on your Power BI site.

Figure 2.11.3: Publish Status


The Power BI service looks like this:

Figure 2.11.4: Power BI Service

Chapter 3

Tableau

3.1 Tableau:
Tableau is a visual analytics platform that is transforming how we use data to solve problems,
empowering people and organizations to make the most of their data.
What is Tableau?
Tableau is a powerful and fast-growing data visualization tool in the Business Intelligence
industry. It helps simplify raw data into an easily understandable format. Tableau helps
create visualizations that can be understood by professionals at any level in an organization. It
also allows non-technical users to create customized dashboards. Data analysis is very fast with
Tableau, and the visualizations are created as dashboards and worksheets.
The best features of Tableau software are:
• Data Blending
• Real-time analysis
• Collaboration of data
A major advantage of Tableau is that it does not require technical or programming skills to
operate. The tool has garnered interest from people in all sectors, such as business, research,
and different industries.
Tableau Product Suite:
The Tableau Product Suite consists of:
• Tableau Desktop
• Tableau Public
• Tableau Online
• Tableau Server
• Tableau Reader

Figure 3.1.1: Tableau Product Suite


For clarity, the Tableau tools used for data analytics can be classified into two categories:
developer tools and sharing tools.
3.2 Developer Tools:
The Tableau tools used for development, such as the creation of dashboards, charts, reports,
and visualizations, fall into this category. The Tableau products in this category are
Tableau Desktop and Tableau Public.
Sharing Tools: As the name suggests, the purpose of these Tableau products is to share the
visualizations, reports, and dashboards that were created using the developer tools. The products
in this category are Tableau Online, Tableau Server, and Tableau Reader.
Tableau Desktop:
Tableau Desktop has a rich feature set and allows you to code and customize reports. Right from
creating the charts, and reports, to blending them all together to form a dashboard, all the necessary
work is created in Tableau Desktop. For live data analysis, Tableau Desktop provides connectivity
to the Data Warehouse, as well as other various types of files. The workbooks and the dashboards
created here can be either shared locally or publicly. Based on the connectivity to the data sources
and publishing option, Tableau Desktop is classified into:
Tableau Desktop Personal: The development features are similar to Tableau Desktop. The
personal version keeps the workbook private, and access is limited. The workbooks cannot be
published online. Therefore, they must be distributed either offline or through Tableau Public.
Tableau Desktop Professional: It is largely similar to Tableau Desktop. The difference is
that work created in Tableau Desktop can be published online or to Tableau Server. Also, the
Professional version has full access to all data types. It is best suited for those who wish to
publish their work to Tableau Server.
Tableau Public:
It is a Tableau version built for cost-conscious users. The word “Public” means that
workbooks cannot be saved locally; instead, they must be saved to Tableau's public cloud, where
anyone can view and access them.
Files saved to the cloud have no privacy, since anyone can download and access them.
This version is best for individuals who want to learn Tableau and for those who want to
share their data with the general public.
Tableau Server:
This software is used specifically to share the workbooks and visualizations created in the
Tableau Desktop application across the organization. To share dashboards on Tableau Server,
you must first publish your work from Tableau Desktop. Once the work has been uploaded to the
server, it is accessible only to licensed users. However, licensed users do not need to have
Tableau Server installed on their machines; they only need login credentials, with which they
can view reports in a web browser. Security is high in Tableau Server, and it is well suited
for quick and effective sharing of data within an organization. The organization's admin
always has full control over the server. The hardware and the software are maintained by the
organization.
Tableau Online:
As the name suggests, Tableau Online is Tableau's online sharing tool. Its functionality is
similar to Tableau Server, but the data is stored on servers hosted in the cloud and maintained
by Tableau. There is no storage limit on the data that can be published in Tableau Online.
Tableau Online creates a direct link to over 40 data sources that are hosted in the cloud, such as
MySQL, Hive, Amazon Aurora, Spark SQL, and many more. To publish, both Tableau Online and
Tableau Server require workbooks created in Tableau Desktop. Data streamed from web
applications, such as Google Analytics and Salesforce.com, is also supported by Tableau Server
and Tableau Online.
Tableau Reader:
Tableau Reader is a free tool that allows you to view workbooks and visualizations created
using Tableau Desktop or Tableau Public. The data can be filtered, but editing and modification
are restricted. Tableau Reader provides no security, as anyone who gets the workbook can
view it using Tableau Reader.
3.3 What is data:
Data refers to distinct pieces of information, usually formatted and stored in a way that suits
a specific purpose. Data can exist in various forms: as numbers or text recorded on paper, as
bits and bytes stored in electronic memory, or as facts living in a person’s mind.
Structured Data-
Structured data is data whose elements are addressable for effective analysis. It has been organized
into a formatted repository, typically a database. It covers all data that can be stored in an
SQL database, in tables with rows and columns.

Figure 3.3.1: Structured Data


Semi-Structured data-
Semi-structured data is information that doesn’t reside in a relational database but has some
organizational properties that make it easier to analyze. With some processing, it can be stored
in a relational database.
Example: JSON Data
{
  "EMPLOYEE": {
    "SALES": {
      "648229": {
        "NAME": "ADARSH TIWARI",
        "DOB": "01-01-1999"
      },
      "648666": {
        "NAME": "DAVID",
        "DOB": "01-10-1990",
        "MISC": "ON A LEAVE"
      }
    }
  }
}
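
The processing needed to move semi-structured data into a relational table can be sketched as follows. This Python example parses the employee records from the example above and flattens them into rows with a fixed set of columns. The function name flatten and the output column names are assumptions made for illustration.

```python
import json

# The employee records from the example above, written as valid JSON text.
raw = """
{
  "EMPLOYEE": {
    "SALES": {
      "648229": {"NAME": "ADARSH TIWARI", "DOB": "01-01-1999"},
      "648666": {"NAME": "DAVID", "DOB": "01-10-1990", "MISC": "ON A LEAVE"}
    }
  }
}
"""

def flatten(doc):
    """Turn the nested document into flat rows with a fixed set of columns."""
    rows = []
    for department, employees in doc["EMPLOYEE"].items():
        for emp_id, fields in employees.items():
            rows.append({
                "id": emp_id,
                "department": department,
                "name": fields["NAME"],
                "dob": fields["DOB"],
                # Optional keys become None (NULL) in the fixed schema.
                "misc": fields.get("MISC"),
            })
    return rows

rows = flatten(json.loads(raw))
for row in rows:
    print(row)
```

The optional MISC key shows why this data is only semi-structured: records share a shape but are not required to carry the same fields.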
Unstructured data-
Unstructured data is data that is not organized in a predefined manner or doesn’t have a predefined
data model, so it is not a good fit for a mainstream relational database.
Examples: Word documents, PDF files, text files, and media logs.
3.4 What is Data Visualization:
Data visualization is the graphical representation of information and data. By using visual elements
like charts, graphs, and maps, data visualization tools provide a way to see and understand trends,
outliers, and patterns in data, which supports good decision-making.

Figure 3.4.1: Data Visualization Tools

Figure 3.4.2: Data Visualization as Bar Chart

3.5 Parameters:
Parameters in Tableau let users build advanced calculations and calculated fields. A parameter
adds a variable that does not exist in the data to the entire workbook, which simplifies
analyzing and visualizing the data.
A parameter in Tableau is a workbook variable, such as a number, date, or string, that
allows users to replace a constant value in a calculation, filter, or reference line.
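
The idea of replacing a constant with a parameter can be sketched outside Tableau. In the Python example below, n plays the role of a Top N parameter. The sales figures and the function name top_n are hypothetical, and this is an analogy rather than how Tableau evaluates parameters internally.

```python
# Hypothetical sales figures per product category.
sales = {"Phones": 330, "Chairs": 328, "Storage": 224, "Tables": 207, "Binders": 203}

def top_n(data, n):
    """Return the n largest entries; n plays the role of the parameter."""
    ranked = sorted(data.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Changing the parameter changes the result without editing the calculation.
print(top_n(sales, 3))
print(top_n(sales, 1))
```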
Tasks given by Genpact:
• Top N Parameters in Tableau
• Date Field Parameters in Tableau
• Dynamic Measures
• Dynamic Dimensions
• Filter
• Sets
• Reference Line
• User input
• Global Filter
• What-if analysis - increase in sales
• Bar/Line chart

Figure 3.5.1: Top N Parameter

Figure 3.5.2: Map Chart

Figure 3.5.3: Date Field Parameter

Figure 3.5.4: Dynamic Measure

Figure 3.5.5: Sets


3.6 Forecasting:
Forecasting in Tableau uses a technique known as exponential smoothing. Forecast algorithms try
to find a regular pattern in measures that can be continued into the future.
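
To illustrate the technique, here is a minimal Python sketch of simple exponential smoothing, the most basic member of this family. Tableau's own models may also account for trend and seasonality, so this is not Tableau's implementation. The sales values and the smoothing factor alpha = 0.5 are assumptions.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed series; each value blends the new observation
    with the previous smoothed level."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical monthly sales values.
sales = [10.0, 12.0, 14.0, 13.0]
levels = exponential_smoothing(sales, alpha=0.5)
forecast = levels[-1]  # flat forecast for the next period
print(levels, forecast)
```

A larger alpha makes the forecast react faster to recent values, while a smaller alpha gives more weight to the older history.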

Figure 3.6.1: Forecasting in Tableau
3.7 Joins:
A join combines tables on a common field. It is primarily used when you have to merge data sets
from the same source.
The types of joins are:
• Inner Join
• Left Join
• Right Join
• Full outer Join
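
The four join types listed above can be sketched in Python as follows. The two tables are hypothetical and are modeled as dictionaries keyed by customer id, so each key appears at most once per table. This is a simplification of real joins, which also handle duplicate keys.

```python
# Hypothetical tables keyed by customer id.
customers = {1: "Asha", 2: "Ben", 3: "Chen"}
orders = {2: 250, 3: 100, 4: 75}

def join(left, right, how):
    """Join two dicts on their keys and return (key, left_value, right_value) rows."""
    if how == "inner":
        keys = left.keys() & right.keys()   # keys present in both tables
    elif how == "left":
        keys = left.keys()                  # every key of the left table
    elif how == "right":
        keys = right.keys()                 # every key of the right table
    elif how == "full":
        keys = left.keys() | right.keys()   # keys present in either table
    else:
        raise ValueError(f"unknown join type: {how}")
    # Missing matches are filled with None, like NULL in a database.
    return [(k, left.get(k), right.get(k)) for k in sorted(keys)]

for how in ("inner", "left", "right", "full"):
    print(how, join(customers, orders, how))
```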

Figure 3.7.1: Joins In Tableau


3.8 Advanced Charts in Tableau:
The task given by Genpact was to implement these charts:

Donut Chart:

Figure 3.8.1: Donut Chart


Waterfall Chart:

Figure 3.8.2: Waterfall Chart


Bump Chart:

Figure 3.8.3: Bump Chart
Gauge Chart: Implementing a gauge chart requires a few additional steps, including the formula
shown in Figure 3.8.5.

Figure 3.8.4: Gauge Chart

Figure 3.8.5: Gauge Chart Formula
Box Plot:

Figure 3.8.6: Box Plot
3.9 LOD:
Level of Detail expressions (also known as LOD expressions) allow you to compute values at the
data source level and the visualization level. However, LOD expressions give you even more
control over the level of granularity you want to compute. They can be performed at a more granular
level (INCLUDE), a less granular level (EXCLUDE), or an entirely independent level (FIXED).
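
The effect of a FIXED expression can be sketched in Python. The example below computes a total per region independently of the customer-level view, comparable in spirit to { FIXED [Region] : SUM([Sales]) }. The order rows and the helper name fixed_sum are hypothetical, and each customer is assumed to belong to a single region.

```python
from collections import defaultdict

# Hypothetical order rows: (region, customer, sales).
orders = [
    ("East", "Asha", 100),
    ("East", "Asha", 50),
    ("East", "Ben", 150),
    ("West", "Chen", 200),
]

def fixed_sum(rows, key_index, value_index):
    """Aggregate at one fixed level, independent of how the view is grouped."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)

# Region-level totals, computed regardless of the dimensions in the view.
region_total = fixed_sum(orders, key_index=0, value_index=2)

# The view is at customer level, yet each customer can still use the region total.
customer_total = fixed_sum(orders, key_index=1, value_index=2)
share = {c: customer_total[c] / region_total[r] for r, c, _ in orders}
print(region_total, share)
```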

Figure 3.9.1: LOD in Tableau


3.10 Task given by Genpact: implement these scenarios and publish them
on Tableau Public:

Figure 3.10.1: Scenario 1

Figure 3.10.2: Scenario 2

Figure 3.10.3: Scenario 3

Figure 3.10.4: Scenario 4

Figure 3.10.5: Scenario 5

Figure 3.10.6: Scenario 6

Figure 3.10.7: Scenario 7

Chapter 4

Power Platform

4.1 Microsoft Power Platform:


The Microsoft Power Platform is a powerful set of applications that allow you to automate
processes, build solutions, analyze data, and create virtual agents. This chapter introduces the
various components of the Power Platform, discusses how you can harness the power of data, and
describes the impact these solutions have when integrated with other Microsoft solutions.
4.1.1 Power Apps:
Power Apps lets you build applications without months of development work. Users can build
no-code applications as well as applications with very little code. Even with minimal code, you
can build a sophisticated, intelligent application that includes your business logic. These
applications, like the other Power Platform solutions, can be used on different devices: an
Android or iOS phone, a laptop, tablet, or desktop, and on the web.
You can build model-driven or canvas apps that work across multiple data sources, and you can
also create highly customized tasks. This gives you an end-to-end experience that starts with
your data and data model, continues through your business processes, and is consumed across
multiple devices.

Security is a big component of Power Apps and the entire Power Platform. Power Apps offers
enterprise-grade security, management, and control, which you can manage through Azure Active
Directory to enable policies such as multifactor authentication. Full audit logs, usage
analytics, and data loss prevention policies are all available through the admin center, so you
can centrally manage your apps across your organization as well as those deployed outside the
organization.

The key aspect to emphasize for all the Power Platform applications is that you can connect
to just about any data and integrate that data across your existing systems to extend your
solution. For example, you can take the data within Dynamics 365, connect it to an app that
you’ve built, and use the information you get from it.
4.1.2 Power Automate:
Power Automate, formerly known as Microsoft Flow, enables process automation that removes
rudimentary manual tasks and eliminates the manual errors that could arise. It is a powerful
workflow automation tool that lets you connect different systems together and move and
transform data between them, so that there is one source of truth across different Microsoft
systems. Power Automate allows you to automate and build business processes across your apps
and the services that you have already deployed. These can vary from simple automation to very
advanced scenarios, such as flows with branches or with different triggers and actions.

One example is using workflows for approval processes, or something as simple as getting
notifications from the different platforms where you work. Power Automate can connect
to those different data sets.

Security, data connectivity, and making data work for you and meaningful for your organization
are some of the strengths of Power Automate. It has a range of functionality: some of it is
natively integrated within the Microsoft Cloud applications, and other parts you can build
yourself, for example through Microsoft Dataverse (formerly known as the Common Data Service)
or as custom workflows or custom applications. Strong data connectivity and seamlessly
integrated built-in platforms give you a fuller, more intelligent, automated experience.
4.1.3 Power Virtual Agents:
Power Virtual Agents lets you build intelligent chatbots that can communicate with users and
handle much of the work that you would otherwise do manually or hire someone else to do.
4.2 Connecting to Dynamics 365 and Microsoft 365:
How do we connect the Power Platform with our existing systems? If you’re utilizing Power BI,
Power Apps, or Power Automate, you can use them as standalone applications, but there is a much
more powerful experience once you integrate and connect them into a common or a more unified
ecosystem.

Figure 4.2.1: Power Platform


4.3 Dataverse:
Dataverse lets you securely store and manage data that are used by business applications. Data
within Dataverse is stored within a set of tables. A table is a set of rows (formerly referred to as
records) and columns (formerly referred to as fields/attributes). Each column in the table is
designed to store a certain type of data, for example, name, age, salary, and so on. Dataverse
includes a base set of standard tables that cover typical scenarios, but you can also create custom
tables specific to your organization and populate them with data by using Power Query. App
makers can then use Power Apps to build rich applications that use this data.

Figure 4.3.1: Dataverse


4.4 Power Apps:
Power Apps offers three types of apps:
• Canvas app
• Model-driven app
• Portal (Power Pages websites)
4.4.1 Canvas App:
A Canvas app is a ‘traditional’ app - the sort of mobile or tablet app that everyone is familiar
with. But where traditional apps require knowledge of a programming language like C# or
JavaScript to write, canvas apps are built by dragging and dropping pre-built controls onto a
blank canvas, and Excel-style expressions are written to specify logic and handle navigation.
4.4.2 Model-Driven App:
With Model-driven apps, you can implement business process flows to ensure consistency
and provide structure to processes by defining stages and actions for users to progress
through.
4.4.3 Power Pages:
Microsoft Power Pages is a secure, enterprise-grade, low-code software as a service (SaaS)
platform for creating, hosting, and administering modern external-facing business websites.
Whether you're a low-code maker or a professional developer, Power Pages enables you to
rapidly design, configure, and publish websites that seamlessly work across web browsers
and devices.
4.5 Task Given by Genpact to make a leave tracker for an employee:
HomeScreen: At the top of the home screen, we can see the user's details, including the user name
and photo, retrieved using the Power Apps formula language. Two buttons are provided: one to view
requests (top right) and another to submit a new leave request. The data is managed in the
Microsoft Azure cloud using SharePoint. When we click on View Request, control passes to
the Details panel.

Figure 4.5.1: HomeScreen

Figure 4.5.2: HomeScreen’s View Request

Request Screen: In this screen, we can see the employee details, i.e. email, name, the dates
between which they are taking leave, and the reason for leave. When we click on a particular row,
control passes to the detailed form that the employee submitted for leave.

Figure 4.5.3: Request Screen


Form Screen: In this screen, a user can submit a leave request, which is stored in SharePoint
in the cloud. Some coding was required for this section.

Figure 4.5.4: FormScreen

4.6 ER Diagram:

Figure 4.6.1: Entity Relationship Diagram of Leave Tracker


4.7 Flow-Chart:

Figure 4.7.1: Flow chart of Leave Tracker
4.8 Power Automate:
Microsoft Power Automate is a very simple drag-and-drop workflow-based automation software
created by Microsoft to automate manual and repetitive tasks. The main aim of creating Microsoft
Power Automate (earlier known as Microsoft Flow) was to allow coders and non-coders to
automate repetitive tasks following a sequential rule-based flow.

The task given by Genpact was to automate an email for the leave tracker:
To do this, we start with Power Automate in the Microsoft Power Platform, go to the Create pane
on the left side, and select Automated cloud flow, which runs when a trigger event occurs.

Figure 4.8.1: Power Automate Flow


Next, we give the flow a name and choose its trigger; here I chose “When an item is created or
modified”. We then click Create, provide the site address and list name, and build the
automation as needed. An auto-generated email is then received whenever the trigger fires.

Figure 4.8.2: Complete Automation for Email

Chapter 5

Alteryx

5.1 Alteryx:
Alteryx is an ETL (extract, transform, load) tool used in data engineering. The Alteryx Analytics
Automation Platform delivers end-to-end automation of data engineering, analytics, reporting,
machine learning, and data science processes that accelerate digital transformation. It enables
enterprises everywhere to democratize data analytics across their organizations for various use
cases.

5.2 Products of Alteryx:


• Alteryx Connect: Changes how organizations discover, govern, and collaborate across data
and analytic assets.
• Alteryx Designer: The leading solution for data prep, blending, and analytics, with
drag-and-drop capabilities that speed up every step of the analytic process.
• Alteryx Promote: Promote gives data scientists the tools they need to develop, deploy, and
manage their models quickly and reliably, without any need for custom deployment code.
• Alteryx Server: Server provides a scalable server-based analytics solution that lets you create,
publish, and share analytic applications, schedule and automate workflow jobs, create,
manage, and share data connections, and control data access.
• Analytics Hub: With Alteryx Analytics Hub, every team can share analytic assets and
automate processes, reports, and insights in a central, secure, governed analytics environment.
• Alteryx Intelligence Suite: Alteryx Intelligence Suite offers machine learning, text mining,
and computer vision capabilities on top of the great features that are already available with
your Designer license.

5.3 What is Alteryx Designer and what can it do?


Alteryx Designer is a self-service data analytics tool that can perform ETL (Extract,
Transform, Load) operations as well as other tasks such as manipulating your data, creating the
calculations that you might typically do in Excel, structuring your reports, or using predictive
tools to forecast. Its drag-and-drop interface makes it intuitive for an ordinary business user:
you do not need to be a master coder or come from an IT background. It allows
you to bring in multiple data sources from different locations, such as files, databases, or APIs.
You can bring in structured or unstructured data and prepare, blend, and analyze it. You can then
push the outputs into reports such as files or charts, write them back to databases or APIs, or
share the process you built through Alteryx Server.
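
The extract, transform, load sequence that Designer performs with drag-and-drop tools can be sketched in plain Python. This is a conceptual illustration and not Alteryx code. The CSV content, the table name sales, and the calculated column is_large are assumptions.

```python
import csv
import io
import sqlite3

# Extract: read rows from a CSV source (an in-memory string stands in for a file).
source = io.StringIO("region,sales\nEast,100\nWest,250\nEast,-5\n")
rows = list(csv.DictReader(source))

# Transform: cast types, filter out invalid rows, and add a calculated column.
clean = [
    (r["region"], int(r["sales"]), int(r["sales"]) >= 200)
    for r in rows
    if int(r["sales"]) >= 0
]

# Load: write the result to a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sales INTEGER, is_large INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)
loaded = conn.execute(
    "SELECT region, sales, is_large FROM sales ORDER BY sales"
).fetchall()
print(loaded)
```

The three stages correspond roughly to an Input Data tool, Filter and Formula tools, and an Output Data tool in a Designer workflow.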

Figure 5.3.1: Alteryx Designer
5.4 Alteryx Server:
Using Alteryx Server in conjunction with Designer allows you to schedule your workflows on
a quarterly, monthly, or daily basis, or on whatever schedule suits your needs best. Server also
makes it easy to organize, share, and collaborate on your workflows.
5.5 Designer Tools List:
Tools in Alteryx Designer are grouped according to their tool categories. A few important
ones are listed below:
5.5.1 In/Out:
● Auto Insights Uploader
● Browse
● Date Time Now
● Directory
● Input Data
● Map Input
● Output Data
● Text Input
5.5.2 Preparation Tool:
● Auto Field
● Create Samples
● Data Cleansing
● Filter
● Formula
● Generate Rows
● Imputation
● Multi-Field Binning
● Multi-Field Formula
● Multi-Row Formula
● Oversample Field
● Random % Sample
● Record ID
● Sample
● Select
● Select Records
● Sort
● Tile
● Unique
5.5.3 Join Tool:
● Append Fields
● Find Replace
● Fuzzy Match
● Join
● Join Multiple
● Make Group
● Union
5.5.4 Parse Tool:
● DateTime
● RegEx
● Text To Columns
● XML Parse
5.5.5 Transform Tool:
● Arrange
● Count Records
● Cross Tab
● Running Total
● Summarize
● Transpose
● Weighted Average
5.6 Task Given by Genpact to perform the operations:

Figure 5.6.1: Input Data, Browse, Filter, Output Data in Container 1

Figure 5.6.2: Results for Figure 5.6.1

Figure 5.6.3: Output For Filter Tool

Figure 5.6.4: Output For Browse Tool and Output Data Tool

Figure 5.6.5: Joins in Alteryx

Figure 5.6.6: Summarize Tool and Its Output

Figure 5.6.7: DateTime Tool and Its Output

Figure 5.6.8: Sort Tool

Figure 5.6.9: Select Tool

Figure 5.6.10: Random % Sample Tool

Figure 5.6.11: Formula Tool

Figure 5.6.12: Data Cleansing Tool

Figure 5.6.13: Transpose Tool

Figure 5.6.14: Machine Learning Classification Tool

Chapter 6

Conclusion & Future Work

This project allowed me to learn about many new technologies that I was not aware of. I got to
know data visualization tools, cloud computing basics, data engineering tools such as Alteryx,
which is used for ETL processes, and automation tools such as Power Automate. I also learned how
to automate a specific workflow and how to create customized Power Apps for customers. I came to
understand how demanding it is to develop a company project and how a real-world project differs
from a college project.
I found the internship experience to be positive, and I am confident that I will be able to use
the skills I learned in my career to develop dashboards using BI tools that give insight into
business problems.

