Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

A Comprehensive Guide on Advanced Microsoft Excel for Data

Analysis
BE G I NNE R E XC E L

This article was published as a part of the Data Science Blogathon

Overview

Data analysis provides insights from raw data which is used to support decision-making. Microsoft Excel is
a simple, powerful, and one of the top tools for data analysis. This comprehensive guide will introduce you
to the concepts of data analysis and present practical examples using Microsoft Excel.

Introduction

Data analysis has emerged as an important field because it provides the ability to analyze data which helps
people in making better decisions. Data analysis is the process of collecting, modeling, and analyzing, and
exploring data to find a pattern in it.

Table of Contents

1. What is Data analysis?


2. Data analysis with Microsoft Excel
3. Essential keyboard Shortcuts for Excel Data Analysis
4. Download data set
5. Sort, Filter, Conditional, Formatting, and Charts
. Create a Pivot table for a data set
7. Tables, What-If Analysis and Quick Analysis
. Analysis ToolPak
9. Other tools for data analysis
10. Conclusion
11. End Notes

What is Data analysis?

Data analysis is the process of Data Requirement Gathering (What type of data do you want to use and how
do you plan to analyze it), Data Collection (Collect the data from available sources), Data Cleaning (The
data we collected are not always in ideal form so we have to remove white spaces, duplicate records, null
values, and basic errors. it is mandatory to clean data before uses for analysis), Data Analysis (In this step
we uses data analysis software’s and other tools to interpret the data and get insights on data), Data
Interpretation (We need to interpret them and give the best findings from that) and data visualization
(Display your data using graph and charts so that people can read and understand it easily).

There are two main methods of Data Analysis: qualitative analysis and quantitative analysis. There are
many other techniques available like Statistical analysis, Text analysis, Diagnostic analysis, Predictive
analysis, Prescriptive Analysis, etc.

Data analysis with Microsoft Excel

Microsoft Excel provides powerful features and ways to interpret and analyze data. The data can be
collected from various sources. Then that data can be converted and formatted in several ways according
to user need. It can be analyzed with the relevant Excel commands, statistical functions, and tools – Sort,
Filter, Ranges, Tables, Conditional Formatting, Date functions, Time functions, Subtotals, Quick Analysis,
What-if Analysis, Solvers, Pivot Tables, Charts, and Analysis ToolPak, etc.

Essential keyboard Shortcuts for Excel Data Analysis

1. Alt + A + T or Alt + D+ F +F : To add or remove Filters

2. Ctrl + Pg Up/Pg Dn : Shift between sheets

3. Alt + A + S + S : To sort data

4. Ctrl + Spacebar : Select entire column

5. Shift + Spacebar : Select entire row

6. Ctrl + Shift + Down/Right : Select till the last non-empty cell in column/row

7. Ctrl + 0 : To hide columns

8. Ctrl + Shift + ) or Alt + O + C + U : To Unhide columns

9. Ctrl + 9 : To hide rows

10. Ctrl + Shift + ( : To Unhide rows

11. Alt + W + F + R : To freeze top row

Download data set

Download Supermarket sales data set from Kaggle using the below link:

https://www.kaggle.com/aungpyaeap/supermarket-sales

The growth of supermarkets is increasing and market competition is also high. The given dataset shows
the historical sales of a Supermarket Company which has recorded data for 3 months in 3 different
branches of the Supermarket. We are going to analyze this dataset with Microsoft Excel. Open
supermarket_sales – Sheet1 file in MS Excel. We can also check the Attribute information of this sheet on
the Kaggle site.

This data set is already clean and validated so we don’t have to perform any data cleaning and validation
steps on it. Before analyzing always check that only valid values are in cells, there are no missing or null
values and data is in the proper format.
Sorting in Excel:

We can sort our Excel data based on one or multiple columns. We can sort in ascending or descending
order or sort data in a custom order. We can also sort data by color. We can use Alt + A + S + S shortcut to
directly sort data or select the sort icon from the Data menu as shown below.

In the below example we sort data according to the Date column and the order is ascending. We can also
sort data according to different columns and different orders. For example, if we want to check data rating
wise we can sort data according to the rating column. From this, we can find out the highest and lowest-
rated products and branches ratings.
Filtering in Excel:

We can filter our data and can only display records that we need for analysis. We can use Alt + A + T
shortcut to directly Filter data or select the Filter icon from the Data menu as shown below. We can also
apply a number filter to display numbers in a certain range or greater than, less than, or equal to the
selected limit We can also apply a text filter to display records that begin with a specific character, end
with a specific character, contain or do not contain a specific character, etc. To apply a number filter or text
filter Click the arrow next to the heading row. For example, if we want to display only branch A’s data, then
select only branch A from the filter box.

Conditional Formatting:

Conditional formatting in Excel allows you to identify duplicates in a dataset, highlight cells with a value
greater/less than a number, highlight top/bottom 10 (or 10%), highlight errors/blanks, create Heat Maps,
highlight every other row/column to increase the readability of the data, search and highlight data using
Conditional Formatting. For example, if we want to highlight only members in customer type, first select
the customer type column then Go to Home –> Conditional Formatting –> Highlighting Cell Rules –> Text
that Contains. A dialogue box would open, in the dialogue box enter the value and select the color for
highlighting cells.
Charts in Excel:

A simple Excel chart can be worth a thousand words of description. Concepts that are difficult to
understand with sheets full of numbers can be easily displayed with charts or plots. To create a chart,
Select data first then on the Insert tab in the Charts group click the chart symbol you want. There are
different types of charts like Column Charts, Line Chart, Pie Chart, Bar Chart, Area Chart, Scatter Plot, Data
Series, Axes, Chart Sheet, Trendline, Error Bars, Sparklines, Combination Chart, Gauge Chart, Thermometer
Chart, Gantt Chart, Pareto Chart available in Excel.
Create a Pivot table for a data set:

The Pivot Table feature is a powerful tool that allows us to summarize huge amounts of data. You can use
a PivotTable to analyze numerical data in detail (summarize data as averages, sums, or counts), and answer
unanticipated questions about your data.it is mostly used to understand and find patterns in the data set.

Here, we’ll use it to find out the Gross income of each Branch for General items like Electronic accessories,
Fashion accessories, Food and beverages, Health and beauty, Home and lifestyle, sports, and travel. First,
select table from supermarket_sales – Sheet1, so they are highlighted. Then from the Insert menu select
PivotTable. In the dialogue box that appears, make sure the Table/Range radio button is selected and the
accompanying text box reads ‘supermarket_sales – Sheet1′!$A$1:$Q$1001’.At the bottom of the dialogue
box, make sure the New Worksheet radio button is highlighted, then click OK.

 
Now add Branch in Columns, product line in Rows gross income in values that shows the sum of gross
income and generate table shown below from which we can know the total gross income of different
branch for different products.
We can also add a graph for this table. Just select the type of graph from the Insert menu. We select the
bar chart. From this chart, we can easily find out the highest and lowest gross income for different
branches.

From this pivot table and bar chart, we can gain general insights like food and beverages have high gross
income than other categories. The amount of gross income is very similar in all Branches with a slight
increase in Branch C.
Just like this pivot table, we can create different types of pivot tables based on the information we need
from the data. In the below example select the columns as shown in the below figure. After creating the
pivot table select a line graph. From this Pivot table and graph, we can say that Women generate higher
sales when compared to men.

Tables in Excel:

Excel tables allow you to analyze your data easily and quickly. To create a table from your data click any
single cell in your data and then from the Insert tab in the Tables group, click Table. It automatically
selects your data. Select My table has headers and click on OK.

It directly formatted the table for you and allows you to use many powerful features like sorting, filtering,
calculating the total row at the end of the table by clicking on the checkbox Total Row in the Design tab.
Using this not only gives you the total but you can also calculate Average, Count, Max, Min, Sum, Std
deviation, etc.
What-If Analysis in Excel:

What-If Analysis in Excel allows you to change the different values in cells to see how those changes will
affect the outcome of formulas. You can use its Goal Seek feature when you are not sure what input value
the formula requires to get that result. To use this from the Data tab, in the Forecast group click on What-If
Analysis.

Quick Analysis in Excel:

With the Quick Analysis tool in Excel, you can quickly and easily analyze your data. To use this First select
the data and then Right Click and select Quick Analysis from the menu which opens. You can directly
perform Formatting like adding Data Bars, Color Scales, Icon Sets, highlight cells, etc. we can directly add
pivot tables, insert sparklines, Charts, and also perform calculations like Average, Count, Max, Min, Sum,
etc.
Excel Analysis ToolPak:

The Analysis ToolPak is an add-in that is included with Excel that provides data analysis tools. To add the
Analysis ToolPak add-in follow the below steps:

1. Open File tab in Excel

2. Click on Options

3. In Excel, Options dialog box click on Add-ins

4. Select Analysis ToolPak from Add-ins and click on the Go

5. Select the checkbox for the Analysis ToolPak and click OK.

To access Analysis ToolPak Click on the Data tab and then from the Analysis group click On Data Analysis.

The Analysis ToolPak adds the following functionality to Excel:

· Anova: Single Factor

· Anova: Two-Factor with Replication

· Anova: Two-Factor Without Replication

· Correlation

· Covariance

· Descriptive Statistics analysis


· Exponential Smoothing analysis

· F-Test Two-Sample for Variances

· Fourier Analysis

· Histogram

· Moving Average analysis

· Random Number Generation analysis

· Rank and Percentile analysis

· Regression analysis

· Sampling

· t-Test: Paired Two Sample For Means

· t-Test: Two-Sample Assuming Equal Variances

· t-Test: Two-Sample Assuming Unequal Variances

· z-Test: Two-Sample for Means

The below figure shows the dialog box for Data Analysis.

Correlation:

If we want to find out if there is any correlation between columns Gross Income and Rating. First, select
the correlation option from Data Analysis. Then in the correlation dialog box select both Gross Income and
Rating columns in the Input Range. Then Click on OK. The correlation coefficient has a value between -1
and +1. From this, we can know how strongly two variables are related. Value 1 indicates a perfect positive
correlation. From the output shown below, we can say that there is no correlation between Rating and
gross income.
Descriptive Statistics:

If we want to calculate Descriptive Statistics for any numeric column then first select the Descriptive
Statistics option from Data Analysis. Then in the Descriptive Statistics dialog box select columns (for
example cogs, gross margin percentage, gross income, Rating) in Input Range. Then Click on OK.

Descriptive Statistics directly calculates Mean, Standard Error, Median, Mode, Standard Deviation, Sample
Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, and Count for selected columns. Output
for this example is shown below.

Histogram:

If you want to know the distribution of customer ratings, you can use a histogram for that. First, select the
histogram option from Data Analysis. Then in the histogram dialog box select the rating column in the
Input range and for the Bin range create a range according to your need. We created Bin range from 1 to 10
and select it in Bin range. Click on Chart output and then Ok. It creates a histogram display below. In this
histogram add Trend Line. To add a trend line go to Design –> Add Chart Element –> Trendline. From this,
we can say that customer rating has a normal distribution.

Other tools for data analysis:


There are many data analysis tools available nowadays, but which one is best to analyze our data depends
on what type and how much data do we need to analyze, budget, and time available for analysis.

Here is a list of some of the best data analytics tools available Other than Excel.

1. Python:

Python is an open-source, high-level programming language. For data analysis, it provides a large set of
libraries like Pandas, Numpy, SciPy, and scikit-learn, etc. It also provides many data visualization packages
like Matplotlib, Plotly, and Seaborn, etc. it is the most preferred language for machine learning and data
science.

2. Tableau:

Tableau is a powerful analytics and data visualization platform that allows users to create reports,
interactive dashboards that represent data in the most user-friendly way. It can run on the cloud or on-
premises.

3. Microsoft Power BI:

Microsoft Power BI is a top business intelligence platform it makes it easy for businesses to spot trends,
patterns, and insights across large sets of data. It supports many types of data sources. It also provides
reports, visualizations, and dashboards for data.

4. R:

R is an open-source programming language. It is best at exploratory data analysis, predictive analyses, and
statistical analysis. Such as distributions, clustering classification, and regression analysis. It is used for
building your own data analysis tools.

5. Qlik:

Qlik offers in-memory data processing for fast results and data visualization. It can also run on the cloud or
on-premises. It offers helpful tools support for data exploration and discovery by technical and non-
technical users alike.

6. KNIME:

KNIME stands for the Konstanz Information Miner. It is a free, open-source platform to perform data
integration, processing, visualization, and reporting. It builds advanced machine learning and data mining
solutions with minimal or no programming requirements.

Conclusion:

Here are some general insights about the data that we find from our data analysis.

The number of gross income is very similar in all Branches with a slight superiority in Branch C. Women
use to generate sales with a higher gross income than men. Customer rating has a normal distribution. The
Credit Card is the payment method less used. There is no correlation between Rating and gross income.

Always perform data cleaning and data validation before Data analysis if data are not in proper format
otherwise it leads your data analysis in the wrong direction.

End Notes:

Excel is one of the best and powerful tools for data analysis. If you are new to the data analysis field then it
is one of the best tools to start learning data analysis. In this article, we discussed one of the most basic
and most important techniques for data analysis in Excel. If you have any doubts or feedback, feel free to
share them in the comments section below.

The media shown in this ar ticle is not owned by Analytics Vidhya and are used at the Author’s discretion.
Article Url - https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-advanced-
microsoft-excel-for-data-analysis/

Hinal Patel

You might also like