
Experiment 1 – Basics of Excel

The following is the format you have to maintain in all the experiments

1) Aim: Write the aim of the experiment. For example, the aim of this experiment is
to understand the basic functions of Excel
2) Techniques used: Functions such as SUM and Filter
3) Procedure followed: Write the steps followed in each of the exercises in this
experiment
4) Results: Paste the results one below the other under each exercise
5) Post it in MOODLE

Microsoft Excel is considered the industry standard piece of software in data analysis. There are
two ways of calculation: using 1) Formulas and 2) Functions

Formulas are expressions that operate on the values in a cell or a range of cells. For example,
suppose we have values in column A, in cells A1 to A10.

1) We wish to find the sum of the values from A1 to A10. Then the formula is

=A1+A2+A3+ … +A10

Functions in Excel are predefined formulas. They eliminate the laborious manual entry of
formulas and give commonly used calculations easy-to-remember names. For example, to find the
sum of the weights of the students given in cells A1 through A10, the function is
=SUM(A1:A10). The SUM function adds up all the values from A1 to A10.

To execute a function, select a cell and type the function, for example =SUM(B2:B11)

Qn1: Find the sum of the weight of the students given in cells B2 through B11. Use the file
named: Basics of excel circulated.

Step1: Enable the function using “=”

Step 2: Enter the function SUM

Step 3: Enter the range of values to be added in a parenthesis. B2:B11

Step 4: Press Enter (or click OK if you are using the Insert Function dialog)
2) Similarly, you may add values across a row. The function =SUM(B13:K13) adds the
values lying in row 13, in columns B through K.

Qn2: Find the sum of the weight of the students given in cells B13 through K13. Use the file
named: Basics of excel circulated.

Step1: Enable the function using “=”

Step 2: Enter the function SUM

Step 3: Enter the range of values to be added in a parenthesis. B13:K13

Step 4: Press Enter (or click OK if you are using the Insert Function dialog)

3) Adding more than one row or column


You may often need to add several rows or columns. Rather than inserting them
one by one, highlight the same number of existing rows or columns as you want
to add. Then right-click and select “Insert”.

Qn3: Add 4 rows and 5 columns in the excel you have created in questions 1 and 2. Use the file
named: Basics of excel circulated.
4) Using filters

In large data sets, you may not need to look at or analyse every row at the same
time. In other words, you may be interested in looking at only selected values
based on a criterion. In this case, filters allow you to pare down your data and
see only the rows that are of interest to you. Filters can be added to each
column of the data.

Filters are added by clicking the Data tab and selecting “Filter”. A down-arrow appears
in each column header. Clicking the arrow next to a column header allows you to sort the
data in ascending or descending order, to choose which specific rows you want to show,
or to organise the data based on a specific criterion.
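
The sort-and-filter logic described above can also be sketched outside Excel. The short Python example below is only an illustration: the records and the field names `name` and `gender` are made-up stand-ins, not the contents of the circulated name list.

```python
# Hypothetical records standing in for rows of a name list.
students = [
    {"name": "Ravi", "gender": "male"},
    {"name": "Anita", "gender": "female"},
    {"name": "Kiran", "gender": "male"},
    {"name": "Divya", "gender": "female"},
]

# Sort A to Z on the 'name' field (like choosing ascending order in the filter menu).
by_name = sorted(students, key=lambda row: row["name"])

# Keep only rows whose 'gender' matches (like ticking a single value in the filter list).
males = [row for row in by_name if row["gender"] == "male"]
females = [row for row in by_name if row["gender"] == "female"]

print([row["name"] for row in by_name])
print([row["name"] for row in males])
print([row["name"] for row in females])
```

Note that filtering hides rows without deleting them; in the sketch the original list `students` is likewise left unchanged.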

Qn4. Please organise the data in ascending order using ‘name’. Use the file named: BBA RIT
Name list 2020-21.

Organise it using “male” as the filter and then “female” as the filter

5) Removing duplicate data points or sets


Suppose the data contain duplicate entries, for example the value “male” repeated
many times in the Gender column. To remove the repeats, go to the Data tab and
select “Remove Duplicates” given under Data Tools. A pop-up will appear asking you to
confirm which columns you wish to check. Click OK and Excel keeps the first occurrence
of each value and removes the repeats.
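
As a rough illustration of what removing duplicates does, the Python sketch below keeps the first occurrence of each value and drops later repeats. The list of values is made up for the example.

```python
# Hypothetical values from a 'Gender' column.
genders = ["male", "female", "male", "male", "female"]

# dict.fromkeys keeps one key per distinct value, in order of first appearance.
unique_genders = list(dict.fromkeys(genders))

print(unique_genders)
```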
Qn.5. Remove duplicate values in variable ‘Gender’. Use the file named: BBA RIT Name list
2020-21.

6) Transpose row and columns

If you wish to transpose data arranged in rows and columns (i.e., rows -> columns
and columns -> rows), doing it manually would take a lot of time, since you would have
to copy and paste each individual header. Excel allows us to do this in a simple way
using the transpose option.

The following are the steps:

i) Highlight the column you wish to transpose into rows.


ii) Right-click it and then select “copy”.
iii) Select the cells on the spreadsheet where you want the first row or column to
begin.
iv) Right click on the cell, and then select “Paste Special”.
v) A dialog box will appear, with an option to transpose at the bottom
vi) Check that box and select OK.
vii) Transpose is done
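
To see what the transpose option does to the layout, here is a small Python sketch. The table contents are invented for illustration; only the idea of swapping rows and columns comes from the steps above.

```python
# Hypothetical table: first row holds the headers, each later row is one student.
table = [
    ["Programme", "Register No.", "Name"],
    ["BBA", "101", "Anita"],
    ["BBA", "102", "Ravi"],
]

# zip(*table) groups the first items of every row, then the second items, and so on,
# so each original column becomes a row.
transposed = [list(column) for column in zip(*table)]

for row in transposed:
    print(row)
```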

Screenshot 1
Screenshot 2
Qn. 6. Transpose Programme, Register No. and Name into rows. Use the file named: BBA RIT
Name list 2020-21.

7) Split up text information between columns

You may be interested in splitting up data in one cell into two different cells. Or you
may want to pull out someone’s company name from their email address, or separate
someone’s full name into a first and last name for your email marketing.

Step 1: Select the column that you want to split up


Step 2: Go to the Data tab and select Text to Columns. A dialog box will appear with
additional options
Step 3: Select either “Delimited” or “Fixed width”
Step 4: “Delimited” means you intend to break up the column based on characters such as
commas, spaces or tabs. “Fixed width” means you select the exact position in the
column at which the split should occur
Step 5: Choose “Delimited” and press “Next”. Choose the delimiter: ‘tab’, ‘semi-colon’,
‘comma’ or ‘space’, or enter another character such as the ‘@’ found in an email address.
Step 6: Choose space. Press “Next”. Choose Finish
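
The idea of a delimiter can be illustrated with a few lines of Python. The names and the email address below are invented; the sketch only shows how a space or an ‘@’ decides where the text is cut.

```python
# Hypothetical full names, split at the first space into two parts.
full_names = ["Anita Rao", "Ravi Kumar"]
split_names = [name.split(" ", 1) for name in full_names]

# Hypothetical email address, split at '@' into user and domain.
email = "ravi@example.com"
user, domain = email.split("@")

print(split_names)
print(user, domain)
```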
Qn. 7: Split the names into two columns based on any one of the delimiters, such as commas,
spaces, tabs or @. Use the file named: BBA RIT Name list 2020-21.

8) Using formulas

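To activate a formula, begin with the ‘=’ sign.

We can add, subtract, multiply and divide using the operators +, -, * and / after the
‘=’ sign. Parentheses are used to ensure certain calculations are done first.

If we wish to add the values in B2 through B11 to those in C2 through C11, row by row,
enter =B2+C2 in the first result cell and drag the formula down to row 11. In versions
of Excel that support dynamic arrays, the single formula
=B2:B11+C2:C11
returns all ten sums at once.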
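
The row-by-row arithmetic can be pictured with the Python sketch below. The numbers are made up and the variable names simply echo the column letters; this is an illustration, not the contents of the circulated file.

```python
# Hypothetical values for three rows of columns B and C.
col_b = [50, 62, 71]
col_c = [5, 8, 9]

# Each result pairs the values on the same row, like filling a formula down a column.
col_sum = [b + c for b, c in zip(col_b, col_c)]
col_diff = [b - c for b, c in zip(col_b, col_c)]
col_product = [b * c for b, c in zip(col_b, col_c)]
col_ratio = [b / c for b, c in zip(col_b, col_c)]

# Parentheses change the order of calculation.
without_parentheses = 2 + 3 * 4
with_parentheses = (2 + 3) * 4

print(col_sum, col_diff, col_product)
print(without_parentheses, with_parentheses)
```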

Qn.8: i) Add values in Columns B and C and store in Column G


ii) Subtract values in Columns B and C and store in Column H
iii) Multiply values in columns C and D and store in Column I
iv) Divide values in columns E and F and store in column J

Use the file named: Basics of excel circulated.

9) Using average to find the average of values in columns

Qn. 9: Find the average of the values in columns B, C, D, E, F and G using the excel
function “average”. Use the file named: Basics of excel circulated

10) Conditional formatting

Conditional formatting allows you to change the colour of cell values based on
information within the cell. For example, you may be interested in flagging students who
are above 70 kgs in each section.

Step1: Highlight the group of cells to which you want to apply conditional formatting
Step2: Choose ‘Conditional Formatting’ from the Home menu
Step3: Select your logic from the dropdown menu. You may also create your own rule if
the logic is different from the default set of items in the menu
Step4: A window will pop up prompting you to provide more information on the ‘rule’
Step5: Select OK
Qn10: Select the students who are above 70 kgs in each section. Use the file named:
Basics of excel circulated

11) Logical functions

If we wish to label those who weigh more than 75 kg as obese, use the logical
function IF.

The following is the syntax

IF(logical_test, value_if_true, value_if_false)

Step 1: Enter the logical statement


Step 2: Enter, in double quotes, the value to return if the test is true
Step 3: Enter, in double quotes, the value to return if the test is false
Step 4: Close the bracket

Example: =IF(B2>75, "Obese", "Not Obese")
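
The same test-then-choose logic can be written as a small Python function. The function name `classify` and the sample weights are hypothetical; the threshold of 75 kg and the two labels come from the text.

```python
def classify(weight_kg, threshold=75):
    """Return "Obese" when the weight is strictly above the threshold."""
    return "Obese" if weight_kg > threshold else "Not Obese"


# Hypothetical weights in kg; exactly 75 is not "more than 75".
weights = [68, 75, 82]
labels = [classify(w) for w in weights]

print(labels)
```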

Qn.11: Declare all the students as “obese” if they weigh more than 75kgs and others as
“not obese”. Use the file named: Basics of excel circulated

12) Counting the cell values based on a criterion

Step1: Enter the function COUNTIF


Step2: Enter the range of the values
Step3: Enter the criteria you wish to apply
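
Counting against a criterion amounts to going through the range and tallying the cells that match. A minimal Python sketch, using made-up labels, is given below.

```python
# Hypothetical labels, as might be produced by an IF formula in each row.
labels = ["Obese", "Not Obese", "Obese", "Not Obese", "Not Obese"]

# Tally the cells that satisfy the criterion.
obese_count = sum(1 for label in labels if label == "Obese")
not_obese_count = sum(1 for label in labels if label == "Not Obese")

print(obese_count, not_obese_count)
```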

Qn12: Count the number of obese and non obese students in all the sections. Use the file
named: Basics of excel circulated

13) Maximum and minimum values in a data set

Use the functions MAX and MIN to find the maximum and minimum values in the
dataset. These functions work only with numerical data.

=MAX(B2:F11) and =MIN(B2:F11). Use the file named: Basics of excel circulated


Experiment 2 – Descriptive Statistics

The following is the format you have to maintain in all the experiments

1) Aim: Write the aim of the experiment. For example, the aim of this experiment is to
understand how descriptive statistics is run using Excel
2) Techniques used: Function such as Frequency distribution, measures of central
tendency and measures of dispersion
3) Procedure followed: Write the steps followed in each of the exercises in this
experiment
4) Results: Paste the results one below the other under each exercise.
5) Post the Excel file, with the steps involved in doing each technique
and the output, in MOODLE.

1) Construction of frequency distribution tables

As you learnt in the theory, a frequency distribution table gives you a snapshot of how the
data are spread out in the distribution. It is a summary table that shows the frequency of
each value, that is, the number of times each value is repeated in the data.

The aim is to get the frequency of values in a data set. There are different ways of
constructing the frequency distribution table using Excel. Let us construct one using the
COUNTIFS function. This function is executed as follows in order to get the frequency
table

Step1: Use the data in the second sheet of the Excel file named “Basics of excel
circulated”
Step2: Set the class intervals as seen in cells H5 to H11
Step3: Insert =COUNTIFS($A$5:$A$14,">=45",$A$5:$A$14,"<50"). In each pair of arguments,
the first defines the range of values to be counted and the second gives the condition.
For example, against the class 45 to 50, enter in double quotes >=45 as the lower limit
and <50 as the upper limit. Similarly, enter the specified limits for each of the other
class intervals
Step4: Highlight the frequency distribution table > right click > Choose format > Click
border > Choose the desired border

(Note: The dollar signs are inserted so that the cell range does not change when the
formula is dragged down through the rows. Without them, the row numbers in the
reference would change as we drag down.)
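
The counting done for each class interval can be sketched in Python as follows. The weights and the class limits are invented for illustration; each class includes its lower limit and excludes its upper limit, matching the ">=45" and "<50" conditions used above.

```python
# Hypothetical weights (kg) and class intervals given as (lower, upper) limits.
weights = [45, 47, 52, 58, 61, 49, 55, 63, 66, 50]
intervals = [(45, 50), (50, 55), (55, 60), (60, 65), (65, 70)]

# For each class, count the values with lower <= value < upper.
frequency = {
    f"{low}-{high}": sum(1 for w in weights if low <= w < high)
    for low, high in intervals
}

for label, count in frequency.items():
    print(label, count)
```

Because the upper limit is excluded, a value such as 50 is counted once, in the 50-55 class, and the frequencies add up to the number of observations.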
2) Descriptive Statistics

Step 1: Open the data labeled Fund Returns in the third worksheet named, ‘Experiemnt 2
– Fund returns’
Step2: From the menu choose Data > Data Analysis > Descriptive Statistics > Ok
(If you don’t see Data Analysis under Data, you must first enable the Analysis ToolPak
add-in)
Step3: In the Descriptive Statistics dialog box, click on the box next to Input Range, then select
the data. If you included the fund names when you highlighted the data, make sure you click on
the option next to Labels in First Row. Click the box in front of Summary Statistics. Then click
OK
Step4: The output table is difficult to read. Highlight the data and choose
Home > Format > Column > AutoFit Selection. Note that Excel provides numerous descriptive
statistics. I have put the measures of central tendency (CT) in boldface.

Also report the measures of dispersion
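
For a quick cross-check of the main measures, the Python sketch below uses the standard `statistics` module. The return figures are made up and are not the Fund Returns data; the sketch covers only a few of the statistics that Excel reports.

```python
import statistics

# Hypothetical annual returns (%), standing in for one fund's column.
returns = [4, 8, 6, 5, 3, 8, 9, 5, 8, 4]

# Measures of central tendency.
mean_return = statistics.mean(returns)
median_return = statistics.median(returns)
mode_return = statistics.mode(returns)

# Measures of dispersion (sample versions, dividing by n - 1).
sample_variance = statistics.variance(returns)
sample_std_dev = statistics.stdev(returns)
value_range = max(returns) - min(returns)

print(mean_return, median_return, mode_return)
print(sample_variance, sample_std_dev, value_range)
```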

Experiment 3 – Charts and graphs

Excel offers a wide range of charts and graphs to present data visually. While the
library of charts and graphs is large, it is important to select a chart type that best
fits your objective.
Line chart
To create a line chart, follow these steps using the fund returns data

Step1: Select the range B3:C13 that contains data on returns on metals and income
Step2: On the Insert tab, in the Charts group, click the Line symbol
Step3: Click Line with Markers. Use this chart when you intend to show trends over time
(such as years, months and days) or across categories.
Step4: A new Design menu will appear after the line chart is created.

Click the ‘edit’ key

Step5: In the design menu, choose the item select data


Step6: In the select data source window, click edit key as shown in the screenshot above.
Step7: You will get a screen as shown in the screenshot below
Step8: Enter the input range from A4:A13.
Step9: Click Ok.
Step10: You will get the line graphs with the X-axis defined with the year
Step 11: From the items in the navigation bar as shown in the screen shot below, you can
choose the type of line chart you want

Step12: Modify the titles of the axes by double-clicking inside the boxes
Step13: You may also try changing the chart layouts

Bar chart

Using the steps you followed for the line chart, please construct a bar chart.

Area chart

Using the steps you followed for the line chart, please construct an area chart.

Pie chart

Using the steps you followed for the line chart, please construct a pie chart.
Experiment 4

Hypothesis testing

One sample test

Aim: To test for a significant difference between a sample mean and an assumed (hypothesized) mean
Techniques used: One sample t test. (Please write why you use one sample t test)
Write the null (H0) and Alternate Hypothesis (H1)

Tool used:Excel
Steps involved:

It was reported that Indians on average spent 5.33 hours of screen time before the
pandemic-related lockdown. However, the lockdown mandated work-from-home and
study-from-home, thus increasing smartphone screen time. In order to test this, a survey
was conducted among 30 households and the data are furnished in cells A2 to A31. Using
Excel, find if there is any significant difference between the sample mean and the
hypothesised mean

Step 1: Open the data file named: “Smartphone one sample.xls”

Step 2: Select Formulas > Insert Function > Z.Test. This command returns the p-value associated with a
right-tailed test
Step 3: Supply the following three arguments in the dialog box:

a. Array is the data set. Select the data from A2:A31


b. X is the hypothesized mean (5.33) under the null hypothesis
c. Sigma is the population standard deviation. If the population standard deviation
is known, enter it. Otherwise leave it blank and Excel will use the standard
deviation of the sample.

Step 4: Enter the values in the order given in items a, b and c

Step 5: You will get the p-value of the test (not the z value itself). Interpret the results
the way I have taught in the theory class. The value Excel reports is based on ‘z’, but decide
which statistic, ‘z’ or ‘t’, should be used in this case.

Step 6: Also conclude if the smartphone screen time during the pandemic lockdown period has
increased compared to the pre-lockdown period
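
The calculation behind a one-sample test can be sketched in Python with the standard library. The screen-time values below are invented, not the survey data; only the hypothesised mean of 5.33 hours comes from the problem. The p-value shown uses the standard normal curve with the sample standard deviation, which is an assumption made here for illustration; with a small sample and an unknown population standard deviation, the t distribution would normally be used instead.

```python
import math
import statistics

# Hypothetical screen-time observations (hours per day).
screen_time = [5.8, 6.1, 5.2, 6.4, 5.9, 6.0, 5.5, 6.3]
hypothesised_mean = 5.33  # value under the null hypothesis

n = len(screen_time)
sample_mean = statistics.mean(screen_time)
sample_sd = statistics.stdev(screen_time)  # sample standard deviation (n - 1)

# Test statistic: distance of the sample mean from the hypothesised mean,
# measured in standard errors.
standard_error = sample_sd / math.sqrt(n)
test_statistic = (sample_mean - hypothesised_mean) / standard_error

# Right-tailed p-value from the standard normal distribution.
p_right_tail = 1 - statistics.NormalDist().cdf(test_statistic)

print(sample_mean, test_statistic, p_right_tail)
```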
Experiment 5

Hypothesis testing 2

Two independent sample

Aim: To test for significant difference between two independent sample means.

Technique used: Two independent sample test. Please write why you use independent sample
‘z’ or ‘t’ test

Write the null (H0) and Alternate Hypothesis (H1)

Tool used: Excel

Problem statement: A survey was conducted among 14 males and 16 females on the number of
hours of smartphone screen time spent. The data on males are given in A2:A15 and on
females in B2:B16. Test if the screen time spent is the same or different between males and
females using the p-value approach.

Steps involved

1) Open the “Smartphone independent sample.xls” data


2) Choose data > Data analysis > t-test: Two sample assuming unequal variances > Ok.
3) If the population variances are known, we use the option z-test: Two sample for means.
4) If the population variances are unknown, but assumed equal, we can use the option t-test:
Two sample assuming equal variances
5) In the t-test: Two sample assuming unequal variances dialog box chosen in step 2,
choose Variable 1 Range and select the data for Screen time males.
6) Then, choose Variable 2 Range and select the Screen time female data.
7) Enter a Hypothesised Mean Difference of 0 since the null hypothesis states no or zero
difference between the two means (d0 = 0), check the Labels box if you include
Screen time males and Screen time females as headings, and enter an α value of 0.05
since the test is conducted at the 5% significance level.
8) Choose an output range and click OK
9) You may get a table as given below
10) Determine if the null hypothesis is accepted. Conclude if the male and female screen
time is the same or not.

(In this case, you have been asked to test for differences using the p-value approach.
Since the question is whether the screen time is the same or different, the test is
two-tailed, so examine the value given as P(T<=t) two-tail. If this value is greater than
the significance level (in our problem it is 0.05), then accept the null hypothesis. If it
is lower than 0.05, then reject the null hypothesis.)
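
For reference, the test statistic for two independent samples with unequal variances can be sketched in Python as below. The function name `welch_t` and the data are hypothetical. The sketch returns the t statistic and the approximate degrees of freedom only; the p-value requires the t distribution, which is not in the Python standard library, so it is left to Excel.

```python
import math
import statistics


def welch_t(sample_1, sample_2):
    """Return (t statistic, approximate degrees of freedom) for unequal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    # Variance of each sample mean: sample variance divided by sample size.
    v1 = statistics.variance(sample_1) / n1
    v2 = statistics.variance(sample_2) / n2
    t_stat = (statistics.mean(sample_1) - statistics.mean(sample_2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Hypothetical screen-time data (hours per day).
males = [5.1, 6.3, 4.8, 5.9, 6.7, 5.5]
females = [6.2, 5.8, 6.9, 7.1, 6.4, 5.6]

t_stat, df = welch_t(males, females)
print(t_stat, df)
```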
Experiment 6

Hypothesis testing 3

Paired sample t test

Aim: To test for significant difference between two dependent sample means.

Technique used: Two dependent sample test. Please write why you use dependent sample ‘z’ or
‘t’ test

Write the null (H0) and Alternate Hypothesis (H1)

Tool used: Excel

Problem statement: The nutritionist wants to use the data from the 40 Starbucks cardholders in
order to determine if the posting of caloric information has reduced the average intake of food
calories. Please test using the p-value approach at the 5% significance level.

Steps involved

Step1:

Paired sample test

1) Open the Food Calories.xls data


2) Choose Data > Data Analysis > t-test: Paired two sample for means > Ok
3) In t-test: Paired two sample for means dialog box, choose Variable 1 Range and Select
food caloric intake before the ordinance.
4) Choose Variable 2 Range and select food caloric intake after the ordinance.
5) Enter a Hypothesised Mean Difference of 0 since d0 = 0. We choose the value to be ‘0’
because we hypothesise that the difference between the intake of average food calories
before and after posting the caloric information is zero.
6) Check the Labels box if you include Before and After as headings, and enter an α value of
0.05 since the test is conducted at the 5% significance level.
7) Choose an output range and click OK.
8) Determine if the null hypothesis is accepted. Conclude if the intake before and after
posting the caloric information is the same or not.

(In this case, you have been asked to test for differences using the p-value approach.
Please examine the value given as P(T<=t) one tail. If this value is greater than the
significance level (in our problem it is 0.05), then accept the null hypothesis. If it is
lower than 0.05, then reject the null hypothesis.)
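
A paired test works on the difference within each pair. The Python sketch below shows the calculation of the t statistic on made-up calorie figures (not the Food Calories data); the variable names are illustrative, and the p-value is again left to Excel.

```python
import math
import statistics

# Hypothetical calorie intake for the same five cardholders, before and after.
before = [400, 350, 420, 380, 410]
after = [380, 340, 400, 385, 390]

# One difference per person: a positive value means intake went down.
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
mean_difference = statistics.mean(differences)
sd_difference = statistics.stdev(differences)

# t statistic for a hypothesised mean difference of 0, with n - 1 degrees of freedom.
t_statistic = (mean_difference - 0) / (sd_difference / math.sqrt(n))
degrees_of_freedom = n - 1

print(differences, mean_difference, t_statistic, degrees_of_freedom)
```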
Experiment 7

Hypothesis testing 4

One Way Analysis of Variance (ANOVA)

Aim: To test for significant difference between more than two independent sample means.

Technique used: One way Analysis of Variance. Please write the assumptions behind the use of
One Way ANOVA

Write the null (H0) and Alternate Hypothesis (H1)

Tool used: Excel

Problem statement: An economic association wanted to determine if the salary drawn by those
who graduated in Economics, Medicine and History are the same. The data were collected from
9 economics graduates, 7 medicine graduates and 9 history graduates. Please test using p-value
approach at 5% significance level.

Steps involved

Step1:

1) Open the Salary data


2) Choose Data > Data Analysis > ANOVA: Single Factor > Ok
3) In the dialog box, Choose the three columns that contain the salary data
4) Choose Labels in First Row. This will treat the first row as the labels of each data column.
5) Since we are expected to conduct the test at 0.05 level of significance, retain the default
alpha value as 0.05. If your dialogue box shows a different value, change it to 0.05.
6) Choose an output range and click OK.
7) You will get the output in the same worksheet. If you wish to get the output in a different
worksheet, you choose the second option.

(In this case, you have been asked to test for differences using the p-value approach.
Excel produces the ‘p’ value as one of the outputs. You may notice that you get two
tables: 1) Summary, which contains the descriptive statistics, and 2) ANOVA. The
p-value is shown in the ANOVA table. If the ‘p’ value is less than the significance
level, which in our case is 0.05, then the null hypothesis is rejected. If it is greater
than 0.05, then accept the null hypothesis.)
8) Write the conclusion: does the salary differ across the three streams or not?
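
The F value in the ANOVA table compares the variation between groups with the variation within groups. A minimal Python sketch of that calculation is shown below on invented salary figures (not the Salary data); the group sizes are kept small for readability.

```python
import statistics

# Hypothetical salaries (in thousands) for three groups of graduates.
groups = {
    "Economics": [41, 42, 43],
    "Medicine": [42, 43, 44],
    "History": [43, 44, 45],
}

all_values = [value for values in groups.values() for value in values]
grand_mean = statistics.mean(all_values)

# Between-group sum of squares: how far each group mean lies from the grand mean.
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within-group sum of squares: how far each value lies from its own group mean.
ss_within = sum(
    (value - statistics.mean(values)) ** 2
    for values in groups.values()
    for value in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares.
f_statistic = (ss_between / df_between) / (ss_within / df_within)

print(ss_between, ss_within, df_between, df_within, f_statistic)
```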
Experiment 8

Correlation

Aim: To test for relationship between two variables measured using numeric data

Technique used: Karl Pearson Correlation. (Please write the assumptions behind the use of
Karl Pearson Correlation)

Tool used: Excel

Problem statement: An economics association wanted to determine if there is a relation between
unemployment rate and debt. The data were collected from different states in the US.

Steps involved

Step1:

1) Open the Debt_Payments data


2) Choose Data > Data Analysis > Correlation
3) In the dialog box, Choose the two columns that contain the necessary data
4) Choose Labels in First Row
5) Choose output Range. This suggests the range of cells where you want to get the output.
If you wish to get the output in a separate worksheet, then choose ‘New Worksheet’
under ‘Output Options’.
6) Interpret the results using correlation coefficient
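
The correlation coefficient can be computed by hand as sketched below in Python. The function name `pearson_r` and the figures are hypothetical and are not the Debt_Payments data.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    # Sum of cross-products of deviations, and sums of squared deviations.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical unemployment rates (%) and debt payments for five states.
unemployment = [4.2, 5.1, 6.3, 7.0, 8.4]
debt = [980, 940, 905, 870, 820]

r = pearson_r(unemployment, debt)
print(r)
```

The coefficient always lies between -1 and +1; the sign gives the direction of the relationship and the size gives its strength.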
Experiment 9

Regression

Aim: To test for independent-dependent relationship between two variables measured using
numeric data

Technique used: Simple Regression. (Please write the assumptions behind the use of Simple
Regression)

Tool used: Excel

Problem statement: An economics association wanted to determine if ‘Debt’ in various states is
determined by ‘Income’. The data were collected from different states in the US.

Steps involved

Step1:

1) Open the data labeled Debt_Payments


2) Choose Data > Data Analysis > Regression
3) In the regression dialog box, click on the box next to Input Y Range, then select the debt
data, including its heading. For Input X Range, select the income data, including its
heading. Check Labels, since we are using the Debt and Income as headings
4) If you wish to get the output in the same worksheet, then enter the range of cells where
you wish to get the output as shown in the screenshot below
5) Click OK
6) Determine r²
7) Write the estimated regression equation (Y = a + bX) using the values shown under
‘Coefficients’ in the third table in the output. Constant ‘a’ is the intercept value under
Coefficients and constant ‘b’ is the value produced against ‘Inc’
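
The intercept a, the slope b and r² of a simple regression can be reproduced with the least-squares formulas sketched below in Python. The function name `simple_regression` and the figures are hypothetical and are not the Debt_Payments data.

```python
import statistics


def simple_regression(x, y):
    """Return (a, b, r_squared) for the least-squares line Y = a + bX."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    s_xx = sum((p - mean_x) ** 2 for p in x)
    s_yy = sum((q - mean_y) ** 2 for q in y)
    b = s_xy / s_xx               # slope
    a = mean_y - b * mean_x       # intercept
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return a, b, r_squared


# Hypothetical income (X) and debt (Y) figures for five states.
income = [60, 65, 70, 80, 90]
debt = [800, 850, 880, 990, 1080]

a, b, r_squared = simple_regression(income, debt)
print(a, b, r_squared)
```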
Experiment 10
Chisquare

Aim: To test if two attributes are associated with each other.

Technique used: Chi-square test. (Please write the assumptions behind the use of the
Chi-square test)

Write the null and alternate hypothesis

Tool used: Excel

Problem statement: A company has 10,000 pieces of furniture. About one tenth of them were
distributed over four halls. Find out if the distribution of the furniture across the halls is
the same or different.

Steps involved

Step1:

1) Open the data labeled Furniture_hall. This file contains the observed values
2) Determine the expected value for each observed value using the formula
(Column total * Row total)/Total sample size
3) Use the function =CHISQ.TEST to run the chi-square test. Enter the ‘actual range’
and the ‘expected range’. Note that this function returns the p-value of the test
rather than the chi-square statistic itself
4) Click OK
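
The expected values and the chi-square statistic can be worked out as in the Python sketch below. The observed counts form a made-up 2 x 2 table and are not the Furniture_hall data; the sketch stops at the statistic and does not compute the p-value.

```python
# Hypothetical observed counts: rows are one attribute, columns the other.
observed = [
    [20, 30],
    [30, 20],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: (row total * column total) / total sample size.
expected = [
    [row_total * column_total / grand_total for column_total in column_totals]
    for row_total in row_totals
]

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)

# Degrees of freedom for a table: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, chi_square, degrees_of_freedom)
```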
