19bce7342 Ap2022236001121 RV4
by
May, 2023
DECLARATION
I further declare that the work reported in this thesis has not been submitted
and will not be submitted, either in part or in full, for the award of any other degree
or diploma in this institute or any other institute or university.
The aim of this project is to create dashboards and reports that give a client the insights needed to make
decisions. A second aim is to build Robotic Process Automation (RPA), that is, automation of specific
workflows, and to carry out Data Engineering, which consists of creating data pipelines, ETL (Extract,
Transform, Load) processes, and cloud computing. With the help of Power Apps, I implemented a leave
tracker in which a person can apply for leave and the entire process is automated.
ACKNOWLEDGEMENT
CONTENTS
CONTENTS PAGE NO
ABSTRACT………………………………………………………………...i
ACKNOWLEDGMENT…………………………………………………..ii
LIST OF FIGURES………………………………………………………..v
CHAPTER 1: INTRODUCTION………………………………………...1-3
1.1 INTRODUCTION OF THE PROJECT………………………………...1
1.2 OBJECTIVES…………………………………………………………..1
1.3 OVERVIEW OF THE ORGANIZATION……………………………..1
CHAPTER 3: TABLEAU………………………………………………...17-32
3.1 TABLEAU……………………………………………………………...17
3.2 DEVELOPER TOOLS………………………………………………...17
3.3 WHAT IS DATA……………………………………………………....19
3.4 WHAT IS DATA VISUALIZATION………………………………...20
3.5 PARAMETERS………………………………………………………..21
3.6 FORECASTING……………………………………………………….23
3.7 JOINS…………………………………………………………………..24
3.8 ADVANCED CHARTS IN TABLEAU……………………………...24
3.9 LOD…………………………………………………………………….28
3.10 GENPACT’S GIVEN TASK………………………………………...28
CHAPTER 5: ALTERYX………………………………………………...41-49
5.1 ALTERYX……………………………………………………………..41
5.2 PRODUCTS OF ALTERYX………………………………………….41
5.3 WHAT IS ALTERYX DESIGNER…………………………………..41
5.4 ALTERYX SERVER………………………………………………….42
5.5 DESIGNER TOOLS LIST……………………………………………42
5.6 TASK GIVEN BY GENPACT……………………………………….43
CHAPTER 6: CONCLUSION……………………………………………50
CHAPTER 7: REFERENCES……………………………………………51
LIST OF FIGURES
Chapter 1
Introduction
1.1 INTRODUCTION:
The aim of this project is to create dashboards and reports that give a client the insights needed to
make decisions. A second aim is to build Robotic Process Automation (RPA), that is, automation of
specific workflows, and to carry out Data Engineering, which consists of creating data pipelines, ETL
(Extract, Transform, Load) processes, and cloud computing.
1.2 OBJECTIVE:
The objectives of this project are to build dashboards and reports, create data pipelines, perform ETL
processes, optimize workflows using robotic process automation (for example, generating automatic
emails for specific tasks), build a customized app for the customer's needs, and use cloud computing
on AWS and Azure.
1.3 OVERVIEW OF THE ORGANIZATION:
Genpact (NYSE: G) is a global professional services firm delivering the outcomes that transform
our clients' businesses and shape their futures. We're guided by our real-world experience
redesigning and running thousands of processes for hundreds of global companies. Our clients –
including many in the Global Fortune 500 – partner with us for our unique ability to combine deep
industry and functional expertise, leading talent, and proven methodologies to drive collaborative
innovation that turns insights into action and delivers outcomes at scale. We create lasting
competitive advantages for our clients and their customers, running digitally enabled operations and
applying our Data-Tech-AI services to design, build, and transform their businesses. And we do it
all with purpose. From New York to New Delhi and more than 30 countries in between, our
115,000+ team is passionate in its relentless pursuit of a world that works better for people. Get to
know us at Genpact.com and on LinkedIn, Twitter, YouTube, and Facebook.
Genpact began in 1997 as a business unit within General Electric. Then, in January 2005, we became
an independent company, bringing our process expertise and unique DNA in Lean management to
more companies. We became a publicly traded company in 2007. Since December 31, 2005, we
have expanded from 19,000+ employees and annual revenues of $491.90 million to 115,000+
employees and annual revenues of $4.37 billion as of December 31, 2022.
Genpact's partner ecosystem builds on our industry expertise, our deep industry knowledge, and our
partners' technology solutions. Together, we inspire our clients to innovate, transform operations,
accelerate ROI, and drive top-line growth.
CGRLH Vertical: Genpact Ltd. engages in business process management, outsourcing, shared
services, and information outsourcing. The company operates through the following segments:
Banking, Capital Markets and Insurance (BCMI), Consumer Goods, Retail, Life Sciences, and
Healthcare (CGRLH), and High Tech, Manufacturing, and Services (HMS). The BCMI segment
provides application processing, collections, and customer services, equipment and auto loan
servicing, mortgage origination, and servicing, risk management and compliance services, reporting
and monitoring services, wealth management operations support, end-to-end information
technology services, application development and maintenance, managed services, financial crimes
support, and consulting. The CGRLH segment offers supply chain management, pricing and trade
promotion management, order management, digital commerce, customer experience, and risk
management. The HMS segment involves industry-specific solutions for the Industrial Internet of
Things (IIoT), user experience, order and supply chain management, data engineering, digital
content management, and risk management.
A Few Partners of Genpact:
AppZen:
With AppZen's AI technology and Genpact's compliance-as-a-service solution, clients identify fraud
and maintain compliance.
Blue Prism:
Blue Prism and Genpact develop intelligent automation and RPA solutions for global clients.
Deloitte:
A strategic alliance between two best-in-class service providers delivering end-to-end business
transformation solutions.
E2open:
E2open and Genpact partner to optimize transportation, logistics, global trade management, and
control tower operations.
Chapter 2
Microsoft Power BI
• Windows display settings: If you set your display to change the size of text, apps, and other
items to more than 100%, you won't see some dialogs that you must interact with to
continue using Power BI Desktop. If you encounter this issue, check your display settings
in Windows by going to Settings > System > Display, and using the slider to return display
settings to 100%.
• CPU: 1 gigahertz (GHz) 64-bit (x64) processor or better recommended.
• WebView2: If WebView2 wasn't automatically installed with Power BI Desktop, or if it
was uninstalled, it must be installed separately.
2.2 Power BI components:
Microsoft Power BI works by connecting to data sources and providing BI dashboards to users.
It can connect with just an Excel spreadsheet or bring together cloud-based and on-premises data
warehouses. Data pulled from cloud-based sources, such as Salesforce CRM, is automatically
refreshed.
With applications such as an Excel workbook or Power BI Desktop file connected to online or
on-premises data sources, Power BI users must manually refresh or set up a refresh schedule to ensure
that Power BI reports and dashboards use the most current data available.
Power BI consists of a collection of apps and can be used either on a desktop, as a SaaS product,
or on a mobile device. Power BI Desktop is the on-premises version, Power BI Service is the cloud-
based offering and mobile Power BI runs on mobile devices.
The different components of Power BI are meant to let users create and share business insights in
a way that fits with their role.
Included within Power BI are several components that help users create and share data reports.
Those are the following:
• Power Query: a data mashup and transformation tool
• Power Pivot: an in-memory tabular data modeling tool
• Power View: a data visualization tool
• Power Map: a 3D geospatial data visualization tool
• Power Q&A: a natural language question and answering engine.
2.3 What is Power BI Desktop?
Power BI Desktop is a free application you install on the local computer that lets you connect to,
transform, and visualize your data. With Power BI Desktop, you can connect to multiple different
sources of data, and combine them (often called modeling) into a data model. This data model lets
you build visuals, and collections of visuals you can share as reports, with other people inside your
organization. Most users who work on business intelligence projects use Power BI Desktop to
create reports and then use the Power BI service to share their reports with others.
Figure 2.3.1: Power BI Desktop
The most common uses for Power BI Desktop are as follows:
• Connect to data.
• Transform and clean data to create a data model.
• Create visuals, such as charts or graphs that provide visual representations of the data.
• Create reports that are collections of visuals on one or more report pages.
• Share reports with others by using the Power BI service.
There are three views available in Power BI Desktop, which you select on the left side of
the canvas. The views, shown in the order they appear, are as follows:
• Report: You create reports and visuals, where most of your creation time is spent.
• Data: You see the tables, measures, and other data used in the data model associated with
your report, and transform the data for best use in the report's model.
• Model: You see and manage the relationships among tables in your data model.
you can select to open reports for exploring further. Dashboards and reports connect to datasets
that bring all of the relevant data together in one place.
2.5 Connect to data:
To get started with Power BI Desktop, the first step is to connect to data. There are many different
data sources you can connect to from Power BI Desktop.
To connect to data:
1. From the Home ribbon, select Get Data > More. The Get Data window appears, showing the
many categories to which Power BI Desktop can connect.
To start Power Query Editor:
The following image shows the Power Query Editor window for a query that was shaped and turned
into a model.
M code is the language behind the scenes of Power Query. When you create a data transformation
in the Power Query Editor UI, Power Query writes the corresponding M code for the query.
M is a functional language, which means it is primarily written with functions that are called to
evaluate and return results. M comes with a very large library of predefined functions,
and you can also create your own.
2.7.1 Where Can You Write Power Query M Code?
If you want to start writing or editing M code, you’re going to need to know where you can do this.
There are two places where it’s possible, in the formula bar or in the advanced editor.
to your interactions.
1. Data Analysis Expressions (DAX):
It is a programming language that is used throughout Microsoft Power BI for creating
calculated columns, measures, and custom tables. It is a collection of functions,
operators, and constants that can be used in a formula, or expression, to calculate and
return one or more values. You can use DAX to solve a number of calculations and data
analysis problems, which can help you create new information from data that is already
in your model.
Example DAX Code:
Projected Sales = SUM('Sales'[Last Years Sales])*1.06
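For readers less familiar with DAX, the measure above can be mirrored in a few lines of Python. This is only an illustrative sketch: the sample rows and the function name `projected_sales` are hypothetical, and the factor 1.06 is the 6% increase written into the DAX expression.

```python
# Hypothetical rows standing in for the 'Sales' table; only the
# "Last Years Sales" column is used by the measure.
sales = [
    {"Last Years Sales": 1000.0},
    {"Last Years Sales": 2500.0},
    {"Last Years Sales": 1500.0},
]


def projected_sales(rows, growth=1.06):
    """Mirror SUM('Sales'[Last Years Sales]) * 1.06: sum the column, then scale it."""
    return sum(row["Last Years Sales"] for row in rows) * growth


print(round(projected_sales(sales), 2))
```

As in DAX, the aggregation runs over whatever rows are currently in scope, so filtering the list before calling the function changes the result in the same way a report filter would.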
Figure 2.9.1: Filters
2.10 Task given by Genpact: create a Power BI dashboard for the FIFA World Cup:
We were given FIFA World Cup data as an Excel file, together with a data dictionary that
describes the dataset.
Steps used to make a dashboard:
Step 1: Load the data into Power BI using a connector, in this case Excel.
Step 2: Transform the data by removing null values and cleaning it. If additional columns are
required, add them to the table, then apply the changes and load the data back into Power BI.
Step 3: Build the dashboard according to the client's requirements.
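The three steps above were carried out in the Power BI interface. As a rough illustration of what Step 2 does to the data, here is a minimal Python sketch using only the standard library. The column names (`team`, `goals_for`, `goals_against`) and the sample rows are hypothetical and are not taken from the actual FIFA dataset.

```python
import csv
import io

# Hypothetical stand-in for the Excel extract; empty cells play the role of nulls.
RAW = """team,goals_for,goals_against
Team A,15,8
Team B,16,8
Team C,,3
"""


def load_rows(text):
    """Step 1: load the raw data into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Step 2: drop rows with missing values and add a derived column."""
    cleaned = []
    for row in rows:
        if any(value == "" for value in row.values()):
            continue  # skip rows that contain a null (empty) cell
        goals_for = int(row["goals_for"])
        goals_against = int(row["goals_against"])
        cleaned.append({
            "team": row["team"],
            "goals_for": goals_for,
            "goals_against": goals_against,
            "goal_difference": goals_for - goals_against,  # additional column
        })
    return cleaned


rows = clean_rows(load_rows(RAW))
for row in rows:
    print(row)
```

In Power Query the same cleaning is recorded as a sequence of applied steps, so it is repeated automatically whenever the data is refreshed.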
The dashboard had eight pages in total:
Page 1: International matches
Figure 2.10.1: International Matches
Page 2: Qatar World Cup 2022 Team Info
Figure 2.10.3: Argentina's Path to Victory
Page 4: FIFA 2022 Player Performance by Field Position
Figure 2.10.5: Football Legends
Page 6: Host Country, Teams Performance, FIFA Ranking, and Groups
Page 8: World Cup Goals by year and Total Goals by the tournament for international matches
Publishing from Power BI Desktop has the same effect as using Get Data in Power BI to connect
to and upload a Power BI Desktop file.
To publish a Power BI Desktop dataset and reports
1. In Power BI Desktop, choose File > Publish > Publish to Power BI or select Publish on the
Home ribbon.
4. When publishing is complete, you receive a link to your report. Select the link to open the
report on your Power BI site.
Chapter 3
Tableau
3.1 Tableau:
It is a visual analytics platform transforming how we use data to solve problems—empowering
people and organizations to make the most of their data.
What is Tableau?
Tableau is a powerful and fast-growing data visualization tool used in the Business Intelligence
industry. It helps simplify raw data into an easily understandable format. Tableau helps
present data in a form that can be understood by professionals at any level in an organization. It also
allows non-technical users to create customized dashboards. Data analysis is very fast with Tableau,
and the visualizations are created in the form of dashboards and worksheets.
The best features of Tableau software are:
• Data Blending
• Real-time analysis
• Collaboration of data
The great thing about Tableau software is that it doesn’t require any technical or
programming skills to operate. The tool has garnered interest among people from all sectors, such
as business, research, and various industries.
Tableau Product Suite:
The Tableau Product Suite consists of :
• Tableau Desktop
• Tableau Public
• Tableau Online
• Tableau Server
• Tableau Reader
generation, and visualization fall into this category. The Tableau products, under this category, are
the Tableau Desktop and the Tableau Public.
Sharing Tools: As the name suggests, the purpose of these Tableau products is to share the
visualizations, reports, and dashboards that were created using the developer tools. Products that
fall into this category are Tableau Online, Server, and Reader.
Tableau Desktop:
Tableau Desktop has a rich feature set and allows you to code and customize reports. Right from
creating the charts, and reports, to blending them all together to form a dashboard, all the necessary
work is created in Tableau Desktop. For live data analysis, Tableau Desktop provides connectivity
to the Data Warehouse, as well as other various types of files. The workbooks and the dashboards
created here can be either shared locally or publicly. Based on the connectivity to the data sources
and publishing option, Tableau Desktop is classified into:
Tableau Desktop Personal: The development features are similar to Tableau Desktop. The
personal version keeps the workbook private, and access is limited. The workbooks cannot be
published online. Therefore, it should be distributed either Offline or in Tableau Public.
Tableau Desktop Professional: It is very similar to Tableau Desktop. The difference is
that the work created in Tableau Desktop can be published online or to Tableau Server. Also,
in the Professional version, there is full access to all data types. It is best suited for
those who wish to publish their work to Tableau Server.
Tableau Public:
It is a Tableau version specially built for cost-conscious users. The word “Public” means that
the workbooks created cannot be saved locally; instead, they are saved to Tableau’s public
cloud, where they can be viewed and accessed by anyone.
There is no privacy to the files saved to the cloud since anyone can download and access the same.
This version is the best for individuals who want to learn Tableau and for the ones who want to
share their data with the general public.
Tableau Server:
The software is specifically used to share the workbooks and visualizations that are created in the
Tableau Desktop application across the organization. To share dashboards in the Tableau Server,
you must first publish your work in the Tableau Desktop. Once the work has been uploaded to the
server, it will be accessible only to licensed users. However, licensed users do not need to have
Tableau Server installed on their machines. They only require the login credentials
with which they can check reports via a web browser. The security is high in Tableau servers, and
it is much suited for quick and effective sharing of data in an organization. The admin of the
organization will always have full control over the server. The hardware and the software are
maintained by the organization.
Tableau Online:
As the name suggests, it is an online sharing tool of Tableau. Its functionalities are similar to
Tableau Server, but the data is stored on servers hosted in the cloud which are maintained by the
Tableau group. There is no storage limit on the data that can be published in Tableau Online.
Tableau Online creates a direct link to over 40 data sources that are hosted in the cloud, such as
MySQL, Hive, Amazon Aurora, Spark SQL, and many more. To publish, both Tableau Online and
Server require the workbooks created by Tableau Desktop. Data streamed from web
applications, such as Google Analytics and Salesforce.com, is also supported by Tableau Server and
Tableau Online.
Tableau Reader:
Tableau Reader is a free tool that allows you to view the workbooks and visualizations created
using Tableau Desktop or Tableau Public. The data can be filtered but editing and modifications
are restricted. The security level is zero in Tableau Reader as anyone who gets the workbook can
view it using Tableau Reader.
3.3 What is data:
Data refer to distinct pieces of information, usually formatted and stored in a way that is concordant
with a specific purpose. Data can exist in various forms- as numbers or text records on paper, as
bits or bytes stored in electronic memory, or as facts living in a person’s mind.
Structured Data-
Structured data is data whose elements are addressable for effective analysis. It has been organized
into a formatted repository that is typically a database. It covers all data that can be stored in
an SQL database, in tables with rows and columns.
"648666": {
"NAME": "DAVID",
"DOB": "01-10-1990",
"MISC": "ON A LEAVE"
}
}
}
}
Unstructured data-
Unstructured data is data that is not organized in a predefined manner or doesn’t have a predefined
data model, thus it is not a good fit for a mainstream relational database.
Example: Word, PDF, Text, Media logs.
3.4 What is Data Visualization:
Data visualization is the graphical representation of information and data. By using visual elements
like charts, graphs, and maps, data visualization tools provide a way to see and understand trends,
outliers, and patterns in data which results in good decision-making.
3.5 Parameters:
Parameters in Tableau let users build advanced calculations and calculated fields.
A parameter adds a variable that does not exist in the data to the whole workbook, which simplifies
the analysis and visualization of the data.
A parameter in Tableau is a workbook variable, such as a number, date, or string, that
allows users to replace a constant value in a calculation, filter, or reference line.
Tasks given by Genpact:
• Top N Parameters in Tableau
• Date Field Parameters in Tableau
• Dynamic Measures
• Dynamic Dimensions
• Filter
• Sets
• Reference Line
• User input
• Global Filter
• What-if analysis - increase in sales
• Bar/Line chart
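To make the idea of a parameter concrete, the sketch below imitates two items from the list, a Top N parameter and a what-if increase in sales, in plain Python. The category names, sales figures, and function names are hypothetical. In Tableau the same effect is achieved with a parameter control rather than a function argument.

```python
# Hypothetical sales per category.
SALES = {"Chairs": 320.0, "Phones": 540.0, "Binders": 210.0, "Tables": 150.0}


def top_n(sales, n):
    """Return the n best-selling categories; n plays the role of the Top N parameter."""
    ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def what_if(sales, increase_pct):
    """Scale every value by a user-chosen percentage, as in a what-if analysis."""
    factor = 1 + increase_pct / 100
    return {name: value * factor for name, value in sales.items()}


print(top_n(SALES, 2))
print(what_if(SALES, 10)["Phones"])
```

In both functions the argument replaces what would otherwise be a constant hard-coded in the calculation, which is the role a parameter plays in a workbook.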
Figure 3.5.2: Map Chart
Figure 3.5.4: Dynamic Measure
Figure 3.6.1: Forecasting in Tableau
3.7 Joins:
A join is primarily used when you have to merge data sets from the same source.
The types of joins are:
• Inner Join
• Left Join
• Right Join
• Full outer Join
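The difference between the four join types is easiest to see on a small example. The following Python sketch joins two hypothetical tables (`orders` and `returns`, keyed by an order id) and is meant only to illustrate which rows each join type keeps; it is not how Tableau implements joins.

```python
# Hypothetical tables keyed by order id.
orders = {1: "Chairs", 2: "Phones", 3: "Binders"}
returns = {2: "Damaged", 3: "Wrong item", 4: "Late delivery"}


def join(left, right, how="inner"):
    """Join two dicts on their keys; a missing side is filled with None."""
    if how == "inner":
        keys = left.keys() & right.keys()   # keys present in both tables
    elif how == "left":
        keys = left.keys()                  # every key of the left table
    elif how == "right":
        keys = right.keys()                 # every key of the right table
    elif how == "full":
        keys = left.keys() | right.keys()   # keys present in either table
    else:
        raise ValueError(f"unknown join type: {how}")
    return {key: (left.get(key), right.get(key)) for key in sorted(keys)}


for how in ("inner", "left", "right", "full"):
    print(how, join(orders, returns, how))
```

Rows without a match on the other side appear with `None`, which corresponds to the null values that left, right, and full outer joins produce.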
Donut Chart:
Figure 3.8.3: Bump Chart
Gauge Chart: To implement a gauge chart, a few additional elements have to be created.
Figure 3.8.5: Gauge Chart Formula
Box Plot:
Figure 3.8.6: Box Plot
3.9 LOD:
Level of Detail expressions (also known as LOD expressions) allow you to compute values at the
data source level and the visualization level. However, LOD expressions give you even more
control over the level of granularity you want to compute. They can be performed at a more granular
level (INCLUDE), a less granular level (EXCLUDE), or an entirely independent level (FIXED).
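As an illustration of the FIXED case, a Tableau expression of the form `{FIXED [Region] : SUM([Sales])}` computes total sales per region regardless of which dimensions are in the view. The Python sketch below reproduces that behaviour on hypothetical rows; the field names and figures are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows at (region, city) granularity.
ROWS = [
    {"region": "East", "city": "Boston", "sales": 100},
    {"region": "East", "city": "New York", "sales": 300},
    {"region": "West", "city": "Seattle", "sales": 200},
]


def fixed_sum(rows, dimension, measure):
    """Total the measure per value of one dimension, ignoring all other fields."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


region_totals = fixed_sum(ROWS, "region", "sales")

# Attach the region-level total to every city-level row, so each row can be
# compared with a value computed at a coarser level than the one displayed.
with_share = [
    {**row,
     "region_sales": region_totals[row["region"]],
     "share": row["sales"] / region_totals[row["region"]]}
    for row in ROWS
]

for row in with_share:
    print(row)
```

The view here is at city level, while the total is fixed at region level, which is what makes calculations such as a city's share of its region possible.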
Figure 3.10.1: Scenario 1
Figure 3.10.3: Scenario 3
Figure 3.10.5 Scenario 5
Figure 3.10.7: Scenario 7
Chapter 4
Power Platform
Security is a big component within Power Apps and the entire Power Platform. In Power
Apps, there is high enterprise security, management, and control that you can manage
through your Azure Active Directory to enable policies that have multifactor authentication.
You can have full audit logs and use analytics that is there, or your data loss prevention
policies that you can put in place, essentially to manage your data all through the admin
center, providing you that full experience to centrally manage your apps across your
organization, as well as what has been deployed outside the organization.
The one key aspect that I want to emphasize for all the Power Platform applications is the
idea that you can connect just about any data, as well as integrate that data across your
existing systems so that you can extend your solution. You can utilize the data that is within
Dynamics 365 and inherently connect that data to an app that you’ve built and utilize the
information that you get out of there.
4.1.2 Power Automate:
Power Automate, formerly known as Microsoft Flow, enables process automation to get rid
of rudimentary manual tasks and eliminate the manual errors that could arise. Power
Automate is a powerful workflow automation tool that allows you to connect different
systems together and take that data and translate it. There is one source of truth and you can
work throughout different Microsoft systems. Power Automate allows you to automate and
build business processes across your apps and the services that you have already deployed.
These can vary from simple automation to very advanced scenarios like creating branches or
having different trigger responses and trigger actions.
One example could be using workflows for approval processes or something as simple as
getting notifications about different platforms where you work. Power Automate can connect
to those different data sets.
The aspect of security, connecting that data, using these applications as a way of making data
work for you and making it meaningful for your organization, and essentially having a
stronger system are some of the attributes of Power Automate. Power Automate has a range
of functionality. There are some that are natively integrated within the Microsoft Cloud
applications and others you can build in, for example, through Microsoft Dataverse (formerly
known as the Common Data Service), or you can build custom workflows or custom
applications. Strong data connectivity, and built-in platforms that are seamlessly integrated
together to give you a fuller, more intelligent, automated experience is what Power Automate
can do.
4.1.3 Power Virtual Agents:
Power Virtual Agents are intelligent virtual bots that can communicate and do a lot of the
work that you might need to do manually or hire someone else to do, by using a robot online.
4.2 Connecting to Dynamics 365 and Microsoft 365:
How do we connect the Power Platform with our existing systems? If you’re utilizing Power BI,
Power Apps, or Power Automate, you can use them as standalone applications, but there is a much
more powerful experience once you integrate and connect them into a common or a more unified
ecosystem.
tables specific to your organization and populate them with data by using Power Query. App
makers can then use Power Apps to build rich applications that use this data.
Figure 4.5.1: HomeScreen
Request Screen: This screen shows the employee details, i.e. email, name, the dates between which
the employee is taking leave, and the reason for leave. Clicking on any row of details
opens the detailed form that the employee submitted for that leave request.
4.6 ER Diagram:
Figure 4.7.1: Flow chart of Leave Tracker
4.8 Power Automate:
Microsoft Power Automate is a very simple drag-and-drop workflow-based automation software
created by Microsoft to automate manual and repetitive tasks. The main aim of creating Microsoft
Power Automate (earlier known as Microsoft Flow) was to allow coders and non-coders to
automate repetitive tasks following a sequential rule-based flow.
Task given by Genpact: automate an email for the leave tracker.
To do this, open Power Automate in the Microsoft Power Platform, go to the Create pane on
the left side, and select Automated cloud flow so that the flow is triggered by an event.
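An automated cloud flow follows a trigger-then-action pattern: an event starts the flow, and one or more actions run in response. The Python sketch below imitates that pattern for the leave tracker using only the standard library. It composes the email but does not send it, and the class `LeaveRequest`, the function names, and the addresses are hypothetical; the actual flow was built with the Power Automate designer.

```python
from dataclasses import dataclass
from datetime import date
from email.message import EmailMessage


@dataclass
class LeaveRequest:
    """Hypothetical record mirroring the fields shown on the request screen."""
    name: str
    email: str
    start: date
    end: date
    reason: str


def build_notification(request, approver_email):
    """Action step: compose the email that the flow would send to the approver."""
    message = EmailMessage()
    message["From"] = request.email
    message["To"] = approver_email
    message["Subject"] = f"Leave request from {request.name}"
    days = (request.end - request.start).days + 1  # inclusive of both dates
    message.set_content(
        f"{request.name} has requested {days} day(s) of leave "
        f"from {request.start} to {request.end}.\nReason: {request.reason}"
    )
    return message


def on_new_request(request, approver_email="manager@example.com"):
    """Trigger step: runs whenever a new leave request is created."""
    return build_notification(request, approver_email)


request = LeaveRequest("Asha", "asha@example.com",
                       date(2023, 5, 1), date(2023, 5, 3), "Family event")
print(on_new_request(request))
```

Keeping the trigger and the action in separate functions mirrors the way a flow separates its trigger from the steps that follow it.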
Figure 4.8.2: Complete Automation for Email
Chapter 5
Alteryx
5.1 Alteryx:
Alteryx is an ETL (extract, transform, load) tool used in data engineering. The Alteryx Analytics
Automation Platform delivers end-to-end automation of data engineering, analytics, reporting,
machine learning, and data science processes, which accelerates digital transformation. It enables
enterprises to democratize data analytics across their organizations for various use
cases.
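To show what the three ETL stages mean in practice, here is a minimal Python sketch that extracts rows from a CSV string, transforms them, and loads them into an in-memory SQLite table. The data, column names, and table name are hypothetical, and the code is an analogy for the concept rather than a description of how Alteryx works internally.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with stray whitespace and a missing amount.
SOURCE = """order_id,amount,country
1, 120.50 ,india
2, 80.00 ,usa
3,,india
"""


def extract(text):
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: drop incomplete rows, trim whitespace, normalise types and case."""
    result = []
    for record in records:
        amount = record["amount"].strip()
        if not amount:
            continue  # skip rows with no amount
        result.append((int(record["order_id"]), float(amount),
                       record["country"].strip().upper()))
    return result


def load(rows, connection):
    """Load: write the cleaned rows into a target table."""
    connection.execute(
        "CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(SOURCE)), connection)
print(connection.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country").fetchall())
```

In Alteryx Designer each of these stages corresponds to tools placed on the canvas, for example Input Data for extraction, preparation tools for transformation, and Output Data for loading.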
Figure 5.3.1: Alteryx Designer
5.4 Alteryx server:
Using Alteryx Server in conjunction with Designer allows you to schedule your workflows on
a quarterly, monthly, or daily basis, or on whatever schedule suits your needs best. A Server also
makes it easy to organize, share, and collaborate on your workflows.
5.5 Designer Tools List:
Alteryx Designer groups its tools according to tool categories. A few important ones are
listed below:
5.5.1 In/Out:
● Auto Insights Uploader
● Browse
● Date Time Now
● Directory
● Input Data
● Map Input
● Output Data
● Text Input
5.5.2 Preparation Tool:
● Auto Field
● Create Samples
● Data Cleansing
● Filter
● Formula
● Generate Rows
● Imputation
● Multi-Field Binning
● Multi-Field Formula
● Multi-Row Formula
● Oversample Field
● Random % Sample
● Record ID
● Sample
● Select
● Select Records
● Sort
● Tile
● Unique
5.5.3 Join Tool:
● Append Fields
● Find Replace
● Fuzzy Match
● Join
● Join Multiple
● Make Group
● Union
5.5.4 Parse Tool:
● DateTime
● RegEx
● Text To Columns
● XML Parse
5.5.5 Transform Tool:
● Arrange
● Count Records
● Cross Tab
● Running Total
● Summarize
● Transpose
● Weighted Average
5.6 Task Given by Genpact to perform the operations:
Figure 5.6.1: Input Data, Browse, Filter, Output Data in Container 1
Figure 5.6.3: Output For Filter Tool
Figure 5.6.4: Output For Browse Tool and Output Data Tool
Figure 5.6.6: Summarize Tool and Its Output
Figure 5.6.8: Sort Tool
Figure 5.6.10: Random % Sample Tool
Figure 5.6.12: Data Cleansing Tool
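The tools named in the figure captions of this section (Filter, Summarize, Sort, Random % Sample, and Data Cleansing) each perform one simple operation on a stream of records. The Python sketch below gives a rough analogue of each one on hypothetical records. The function names and data are assumptions, and the real tools offer many more options than are shown here.

```python
import random
from collections import defaultdict

# Hypothetical input records.
RECORDS = [
    {"city": " Delhi ", "sales": 250},
    {"city": "Mumbai", "sales": 90},
    {"city": "Delhi", "sales": 150},
    {"city": None, "sales": 40},
]


def cleanse(records):
    """Data Cleansing analogue: replace null text with "" and trim whitespace."""
    return [{**r, "city": (r["city"] or "").strip()} for r in records]


def filter_records(records, predicate):
    """Filter analogue: split the records into a true stream and a false stream."""
    true_rows = [r for r in records if predicate(r)]
    false_rows = [r for r in records if not predicate(r)]
    return true_rows, false_rows


def summarize(records, group_by, field):
    """Summarize analogue: group by one field and sum another."""
    totals = defaultdict(int)
    for r in records:
        totals[r[group_by]] += r[field]
    return dict(totals)


def sort_records(records, field, descending=False):
    """Sort analogue: order the records by one field."""
    return sorted(records, key=lambda r: r[field], reverse=descending)


def random_percent_sample(records, percent, seed=0):
    """Random % Sample analogue: keep each record with the given probability."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < percent / 100]


clean = cleanse(RECORDS)
high, low = filter_records(clean, lambda r: r["sales"] >= 100)
print(summarize(high, "city", "sales"))
print(sort_records(clean, "sales", descending=True))
print(len(random_percent_sample(clean, 50)))
```

Chaining the functions in this way corresponds to connecting tools on the Designer canvas, where the output of one tool becomes the input of the next.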
Chapter 6
Conclusion
This project allowed me to learn many technologies that I was not previously aware of. I learned
about data visualization tools, cloud computing basics, data engineering tools such as Alteryx, which
is used for ETL processes, and automation tools such as Power Automate. I also learned how to automate
a specific workflow and how to create customized Power Apps for customers. I came to understand how
demanding it is to develop a company project and how a real-world project differs from a college
project.
I found the internship experience to be positive, and I am confident that I will be able to use the
skills I learned in my career to develop dashboards with BI tools that give insights into business problems.
Chapter 7
REFERENCES
1. https://learn.microsoft.com/en-us/power-bi/
2. https://help.tableau.com/current/pro/desktop/en-us/default.htm
3. https://community.tableau.com/s/question/0D54T00000C5zUlSAJ/tableau-desktop-documentation
4. https://powerplatform.microsoft.com/en-us/
5. https://community.alteryx.com/?category.id=external
6. https://community.alteryx.com/t5/Alteryx-Academy/ct-p/alteryx-academy?_ga=2.73279379.1808098640.1681840906-1926133801.1681840906