
Research Methodology Lab

(Using MS Excel and R)

PRACTICAL FILE
Submitted for partial fulfillment for the award of the Degree of

BACHELOR OF COMMERCE

(B.COM (H)2018 - 2021)

Under the supervision of


CA. (Dr.) Ruchi Kansil

Submitted by
NAME : SHIVAM GUPTA

ENROLLMENT NO. : 11717788818

SCHOOL OF BUSINESS STUDIES

VIVEKANANDA INSTITUTE OF PROFESSIONAL STUDIES


(Affiliated to Guru Gobind Singh Indraprastha University)

TABLE OF CONTENTS
S. No. Topic
Functions in Excel
1. Count Function
a. COUNT
b. COUNTA
c. COUNTBLANK
d. COUNTIF
e. COUNTIFS
2. Sum Function
a. SUM
b. SUMIF
3. Average Function
a. AVERAGE
b. AVERAGEIF
4. Concatenate Function
a. CONCATENATE (with space)
b. CONCATENATE (without space)
5. “&” Function
6. Max Function
a. MAX
b. MAXA
7. Min Function
a. MIN
b. MINA
8. Question:
- SUM
- SUMIF
- SUMIFS
- AVERAGEIF
- AVERAGEIFS
9. VLOOKUP Function
10. VLOOKUP Approximate Value
11. Pivot Table
- Row Labels with Count of sales and Sum of sales
- Row Labels, Column Labels with Sum of sales
- Report Filter (Company- North)
- Report Filter (Date- 1/1/2010)
12. Summarizing and Visualizing Data
- Frequency Distribution
- Relative Frequency Distribution
- Percent Frequency Distribution
- Graphs
13. Histogram
a. Histogram using Graph tab
b. Histogram – Chart output
c. Histogram – Pareto (sorted diagram)
d. Histogram – Cumulative percentage
14. Descriptive Statistics
a. Descriptive Statistics for various scales
15. Correlation
16. Hypothesis Testing
a. One sample t test using dummy (one-tailed)
b. One sample t test using dummy (two-tailed)
c. One sample t test using test average (one-tailed)
d. One sample t test using test average (two-tailed)
e. t test using function (all combinations)
f. Two sample - Independent sample t test
g. Two sample - Paired sample t test
h. One sample z test
i. Two sample z test
j. F test
k. ANOVA – Single Factor
l. ANOVA – Two Factor without replication
m. ANOVA – Two Factor with replication
n. Chi Square Test

Introduction to R
1. Four Panes in R
2. Import of Data Sheet in Excel
3. Descriptive Statistics
4. Correlation
5. Hypothesis Testing
a. One sample t test
b. Two sample - Independent Sample t test
c. Two sample - Paired Sample t test
d. One way ANOVA
e. F test
f. Chi Square Test

RESEARCH METHODOLOGY

Meaning of research:
Research in simple terms refers to search for knowledge. It is a scientific and systematic search
for information on a particular topic or issue. It is also known as the art of scientific
investigation. Several social scientists have defined research in different ways. In the
Encyclopedia of Social Sciences, D. Slesinger and M. Stephenson (1930) defined research as
“the manipulation of things, concepts or symbols for the purpose of generalizing to extend,
correct or verify knowledge, whether that knowledge aids in the construction of theory or in the
practice of an art”. According to Redman and Mory (1923), research is a “systematized effort to
gain new knowledge”. It is an academic activity and therefore the term should be used in a
technical sense. According to Clifford Woody (Kothari, 1988), research comprises “defining and
redefining problems, formulating hypotheses or suggested solutions; collecting, organizing and
evaluating data; making deductions and reaching conclusions; and finally, carefully testing the
conclusions to determine whether they fit the formulated hypotheses”. Thus, research is an
original addition to the available knowledge, which contributes to its further advancement. It is
an attempt to pursue truth through the methods of study, observation, comparison and
experiment. In sum, research is the search for knowledge, using objective and systematic
methods to find a solution to a problem.

Objectives of Research:
The objective of research is to find answers to the questions by applying scientific procedures. In
other words, the main aim of research is to find out the truth which is hidden and has not yet
been discovered. Although every research study has its own specific objectives, the research
objectives may be broadly grouped as follows: 1. To gain familiarity with new insights into a
phenomenon (i.e., formulative research studies); 2. To accurately portray the characteristics of a
particular individual, group, or a situation (i.e., descriptive research studies); 3. To analyse the
frequency with which something occurs (i.e., diagnostic research studies); and 4. To examine the
hypothesis of a causal relationship between two variables (i.e., hypothesis-testing research
studies).

1. COUNT FUNCTIONS

COUNT:
(A) MEANING: The COUNT function counts the number of cells that contain numbers, and
counts numbers within the list of arguments. Use the COUNT function to get the number of
entries in a number field that is in a range or array of numbers.

(B) SYNTAX: =COUNT (value1, value2,…)

EXAMPLE:

COUNTA:
(A) MEANING: The COUNTA function counts cells containing any type of information,
including error values and empty text (""). ... If you do not need to count logical values, text, or
error values (in other words, if you want to count only cells that contain numbers), use the
COUNT function.

(B) SYNTAX: =COUNTA(value1, value2,…)

(C) EXAMPLE:

COUNTBLANK:
(A) MEANING: The Microsoft Excel COUNTBLANK function counts the number of empty
cells in a range. ... It can be used as a worksheet function (WS) in Excel. As a worksheet
function, the COUNTBLANK function can be entered as part of a formula in a cell of a
worksheet.

(B) SYNTAX: =COUNTBLANK(range)

(C) EXAMPLE

COUNTIF:
(A) MEANING: The Microsoft Excel COUNTIF function counts the number of cells in a range
that meet a given criterion. ... It can be used as a worksheet function (WS) in Excel. As a
worksheet function, the COUNTIF function can be entered as part of a formula in a cell of a
worksheet.

(B) SYNTAX: =COUNTIF(range, criteria)

(C) EXAMPLE:

COUNTIFS:
(A) MEANING: The Excel COUNTIFS function returns the count of cells that meet one or more
criteria. COUNTIFS can be used with criteria based on dates, numbers, text, and other
conditions. COUNTIFS supports logical operators (>,<,<>,=).

(B) SYNTAX: =COUNTIFS(range1, criteria1, range2, criteria2…)

(C) EXAMPLE:
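
The counting functions above can be cross-checked in R. The following is a minimal sketch, not the worksheet used in this file: the vectors `name` and `marks` are hypothetical, and `NA` is assumed to stand in for an empty cell.

```r
# Hypothetical worksheet columns; NA stands in for an empty cell.
name  <- c("Dileep", "Raju", NA, "Akshay", "Sanjeev", "Krishna")
marks <- c(42, NA, 76, 3, NA, 13)

count_numbers  <- sum(!is.na(marks))                          # like =COUNT(marks)
count_nonblank <- sum(!is.na(name))                           # like =COUNTA(name)
count_blank    <- sum(is.na(marks))                           # like =COUNTBLANK(marks)
count_if       <- sum(marks > 40, na.rm = TRUE)               # like =COUNTIF(marks, ">40")
count_ifs      <- sum(marks > 10 & marks < 50, na.rm = TRUE)  # like =COUNTIFS(marks, ">10", marks, "<50")

print(c(count_numbers, count_nonblank, count_blank, count_if, count_ifs))
```

Each Excel criterion becomes a logical condition in R, and `sum()` counts the TRUE values.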

2. SUM FUNCTION

SUM:
(A) MEANING: The Microsoft Excel SUM function adds all numbers in a range of cells and
returns the result. The SUM function is a built-in function in Excel that is categorized as a
Math/Trig Function. ... As a worksheet function, the SUM function can be entered as part of a
formula in a cell of a worksheet.

(B) SYNTAX: =SUM(number1, number2,…)


(C) EXAMPLE:

SUMIF:
(A) MEANING: The SUMIF function is a worksheet function that adds all numbers in a range of
cells that meet a single criterion (for example, is equal to 2000). ... It can be used as a worksheet
function (WS) in Excel. As a worksheet function, the SUMIF function can be entered as part of a
formula in a cell of a worksheet.

(B) SYNTAX: =SUMIF(range, criteria, [sum_range])

(C) EXAMPLE:

3. AVERAGE FUNCTION

AVERAGE:
(A)MEANING: The Microsoft Excel AVERAGE function returns the average (arithmetic mean)
of the numbers provided. The AVERAGE function is a built-in function in Excel that is
categorized as a Statistical Function. It can be used as a worksheet function (WS) in Excel.

(B) SYNTAX: =AVERAGE(number1, number2,…)

(C) EXAMPLE:

AVERAGEIF:
(A) MEANING: The Microsoft Excel AVERAGEIF function returns the average (arithmetic
mean) of all numbers in a range of cells that meet a given criterion. The AVERAGEIF function is
a built-in function in Excel that is categorized as a Statistical Function. It can be used as a
worksheet function (WS) in Excel.

(B) SYNTAX: =AVERAGEIF(range, criteria, [average_range])

(C) EXAMPLE:

4. CONCATENATE FUNCTIONS

CONCATENATE (WITH SPACE)


(A) MEANING: The concatenate function is one of Excel's text functions. It is used to join two
or more words or text strings together. For example, sometimes data distributed over multiple
columns in an excel spreadsheet is more efficient to use when combined into one column.

(B) SYNTAX: =CONCATENATE(text1, " ", text2)

(C) EXAMPLE:

CONCATENATE (WITHOUT SPACE)
(A) MEANING: The concatenate function is one of Excel's text functions. It is used to join two or
more words or text strings together. For example, sometimes data distributed over multiple
columns in an excel spreadsheet is more efficient to use when combined into one column.

(B) SYNTAX: =CONCATENATE(text1, [text2], …)

(C) EXAMPLE:
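
For comparison, the two forms of concatenation can be sketched in R with `paste()` and `paste0()`. The variable names `first` and `last` are illustrative assumptions and do not come from the worksheet.

```r
# Illustrative text values (not taken from the worksheet).
first <- "Shivam"
last  <- "Gupta"

with_space    <- paste(first, last)    # like =CONCATENATE(text1, " ", text2)
without_space <- paste0(first, last)   # like =CONCATENATE(text1, text2)

print(with_space)
print(without_space)
```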

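5. “&” FUNCTION

(A) MEANING: The ampersand (&) is Excel's text concatenation operator. It joins two or more
text strings, numbers or cell references into one text value, giving the same result as
CONCATENATE. It should not be confused with the AND function, a built-in Logical Function
that returns TRUE if all conditions are TRUE and FALSE if any of the conditions is FALSE.

(B) SYNTAX: =text1 & text2 (operator); =AND(logical1, [logical2], ...) (function)

(C) EXAMPLE: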

6. MAX FUNCTION

MAX
(A) MEANING: The Microsoft Excel MAX function returns the largest value from the numbers
provided. ... It can be used as a worksheet function (WS) in Excel. As a worksheet function, the
MAX function can be entered as part of a formula in a cell of a worksheet.

(B) SYNTAX: =MAX(number1, number2,…)

(C) EXAMPLE:

MAXA

(A) MEANING: The MAXA function returns the largest numeric value in a range of values. The MAXA
function ignores empty cells, but evaluates the logical values TRUE and FALSE as 1 and 0,
respectively.

(B) SYNTAX: =MAXA(value1, [value2], ...)

(C) EXAMPLE:

7. MIN FUNCTION

MIN
(A) MEANING: The Microsoft Excel MIN function returns the smallest value from the numbers
provided. The MIN function is a built-in function in Excel that is categorized as a Statistical
Function. It can be used as a worksheet function (WS) in Excel.

(B) SYNTAX: =MIN(number1, number2,…)

(C) EXAMPLE:

MINA

(A) MEANING: The MINA function returns the smallest numeric value in a range of values. The
MINA function ignores empty cells, but evaluates the logical values TRUE and FALSE as 1 and
0, respectively.

(B) SYNTAX: =MINA(value1, [value2], ...)

(C) EXAMPLE:

8. QUESTIONS

Given below are the marks, out of 100, scored by some B.Com 2nd year students in various subjects.
You are required to apply the following functions to the data.
Enroll No Name 101 103 105 107 109
101 Dileep 42 29 76 3 13
102 Raju 66 34 48 75 20
103 Krishna 24 87 27 91 12
104 Akshay 76 65 64 61 59
105 Sanjeev 54 21 22 68 21

Syntax: =SUM(cell address:cell address)
Syntax: =SUMIF(range, criteria, [sum_range])
Syntax: =SUMIFS(sum_range, range1, criteria1, [range2], [criteria2], ...)
Syntax: =AVERAGEIF(range, criteria, [average_range])
Syntax: =AVERAGEIFS(avg_rng, range1, criteria1, [range2], [criteria2], ...)

A) Sum

B) SUMIF

C) SUMIFS

D) Average IF

E) Average IFS
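
The five calculations can also be checked in R on the marks table given above. This is only a sketch: the criteria used here (marks above 50 in subject 101, marks above 30 in subject 103) are assumptions chosen for illustration, and the column names `s101` to `s109` are made up for the subject codes.

```r
# Marks table from the question; column names s101..s109 are illustrative.
marks <- data.frame(
  enroll = 101:105,
  name   = c("Dileep", "Raju", "Krishna", "Akshay", "Sanjeev"),
  s101   = c(42, 66, 24, 76, 54),
  s103   = c(29, 34, 87, 65, 21),
  s105   = c(76, 48, 27, 64, 22),
  s107   = c(3, 75, 91, 61, 68),
  s109   = c(13, 20, 12, 59, 21)
)

# Assumed criteria: s101 > 50, and (for the -IFS forms) s103 > 30 as well.
one_cond <- marks$s101 > 50
two_cond <- marks$s101 > 50 & marks$s103 > 30

total_101  <- sum(marks$s101)             # like =SUM(range)
sumif_101  <- sum(marks$s101[one_cond])   # like =SUMIF(range, ">50")
sumifs_101 <- sum(marks$s101[two_cond])   # like =SUMIFS(sum_range, range1, ">50", range2, ">30")
avgif_101  <- mean(marks$s101[one_cond])  # like =AVERAGEIF(range, ">50")
avgifs_101 <- mean(marks$s101[two_cond])  # like =AVERAGEIFS(avg_rng, range1, ">50", range2, ">30")

print(c(total_101, sumif_101, sumifs_101, avgif_101, avgifs_101))
```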

9. V-LOOKUP FUNCTION

(A) MEANING: When the VLOOKUP function is called, Excel searches for a lookup value in
the leftmost column of a section of your spreadsheet called the table array. The function returns
another value in the same row, defined by the column index number.

(B) SYNTAX: =VLOOKUP(lookup_value, table_array, col_index_num, FALSE), where FALSE as the last argument (range_lookup) requests an exact match

(C) EXAMPLE:
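
An exact-match lookup of this kind can be sketched in R with `match()`. The helper `vlookup_exact` is a hypothetical name written for this illustration; it is not a built-in R function. The lookup table reuses the enrolment numbers and names from the marks question.

```r
# Lookup table: first column is the key, as VLOOKUP requires.
students <- data.frame(
  enroll = 101:105,
  name   = c("Dileep", "Raju", "Krishna", "Akshay", "Sanjeev"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking =VLOOKUP(lookup_value, table_array, col_index_num, FALSE).
vlookup_exact <- function(lookup_value, table, col_index) {
  row <- match(lookup_value, table[[1]])  # NA when there is no exact match
  table[[col_index]][row]
}

found   <- vlookup_exact(103, students, 2)
missing <- vlookup_exact(999, students, 2)  # NA, comparable to #N/A in Excel
print(found)
```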

10.V-LOOKUP (APPROXIMATE VALUE)
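
With an approximate match (last argument TRUE), VLOOKUP returns the row whose key is the largest value less than or equal to the lookup value, so the first column must be sorted in ascending order. A minimal R sketch of that behaviour follows. The grade table and the helper name `vlookup_approx` are assumptions made for illustration.

```r
# Hypothetical grade bands, sorted ascending by the lower bound.
grade_table <- data.frame(
  min_marks = c(0, 40, 60, 80),
  grade     = c("D", "C", "B", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking =VLOOKUP(lookup_value, table_array, col_index_num, TRUE).
vlookup_approx <- function(lookup_value, table, col_index) {
  row <- findInterval(lookup_value, table[[1]])  # index of the largest key <= value
  if (row == 0) return(NA_character_)            # value lies below every key
  table[[col_index]][row]
}

grade_76 <- vlookup_approx(76, grade_table, 2)
grade_80 <- vlookup_approx(80, grade_table, 2)
print(c(grade_76, grade_80))
```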

11. PIVOT TABLE
Pivot tables are one of Excel's most powerful features. A pivot table allows you to extract the
significance from a large, detailed data set.

To insert a pivot table, execute the following steps.

1. Click any single cell inside the data set.


2. On the Insert tab, in the Tables group, click PivotTable
3. Click OK.
Below you can find Pivot Table.

Summarizing and Visualizing Data

Frequency
The Microsoft Excel FREQUENCY function returns how often values occur within a set of data. It
returns a vertical array of numbers. The FREQUENCY function is a built-in function in Excel that is
categorized as a Statistical Function. It can be used as a worksheet function (WS) in Excel.

Relative Frequency
Relative frequency is the proportion of the total number of observations that falls in a given class, i.e. the class frequency divided by the total frequency.

Percentage Frequency
A percentage frequency distribution is a display of data that specifies the percentage of observations
that exist for each data point or grouping of data points. It is a particularly useful method of expressing
the relative frequency of survey responses and other data
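
The three distributions are linked by simple arithmetic: relative frequency = frequency / total, and percent frequency = 100 × relative frequency. A short R sketch follows. The `responses` vector is hypothetical data, not the data set used in this file.

```r
# Hypothetical survey responses.
responses <- c("Good", "Poor", "Good", "Average", "Good", "Average", "Poor", "Good")

freq     <- table(responses)   # frequency distribution
rel_freq <- prop.table(freq)   # relative frequency = frequency / total
pct_freq <- 100 * rel_freq     # percent frequency

print(cbind(Frequency = freq, Relative = rel_freq, Percent = pct_freq))
```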

Bar Graph

12.HISTOGRAM

HISTOGRAM – using graph tab


A histogram is a specific use of a column chart where each column represents the frequency of
elements in a certain range. In other words, a histogram graphically displays the number of
elements within the consecutive non-overlapping intervals, or bins.

Histogram- Chart Output

Step

Histogram- Pareto (sorted diagram )

Step
Select the range A1:D22

Insert tab > chart group > histogram symbol > pareto

Note: A Pareto chart combines a column chart and a line graph.

Histogram- Cumulative Percentage

13.Descriptive Statistics

Descriptive Statistics for various scales


Descriptive statistics are one of the fundamental “must knows” for any set of data. They give you a
general idea of trends in your data, including:

- The mean, mode, median and range.
- Variance and standard deviation.
- Skewness.
- Count, maximum and minimum.

The statistics are obtained from the following values of X and Y:

TREATMENT OUTCOME X Y
1 1 10.2 9.9
1 1 9.7
2 1 10.4 10.2
1 2 9.8 9.7
2 1 10.3 10.1
1 2 9.6 9.4
2 1 10.6 10.3
1 2 9.9 9.5
2 2 10.1 10
1 2 10.2

14.Correlation
We usually use correlation coefficient (a value between -1 and 1) to display how strongly two
variables are related to each other. In Excel, we also can use the CORREL function to find the
correlation coefficient between two variables.
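
As a cross-check of what CORREL computes, the Pearson coefficient can be written out by hand and compared with R's `cor()`. The sketch below assumes, purely for illustration, that the two variables are the Economics and Science marks of students A to E from the two-factor ANOVA problem later in this file.

```r
# Economics and Science marks of students A-E (from the two-factor ANOVA table).
economics <- c(42, 53, 49, 53, 43)
science   <- c(69, 54, 58, 64, 64)

# Pearson r = sum of cross-products of deviations / sqrt(product of sums of squares)
dx <- economics - mean(economics)
dy <- science - mean(science)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_builtin <- cor(economics, science)   # like =CORREL(array1, array2)
print(c(r_manual, r_builtin))
```

The value always lies between -1 and 1. For these five pairs it is negative, about -0.69.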

15. HYPOTHESIS TESTING

One Sample t Test using dummy (one-tailed)

Problem Statement:

To determine whether the population mean age is greater than 40 at α = 0.05.

Hypothesis:
H0: µ<=40

H1: µ>40

Age Dummy
18 0
24 0
56
78
67
24
65
89
76
23
45
65
78
55
32
33
44

Steps:
- Go to Data – Data Analysis – t-Test: Two-Sample Assuming Equal Variances

Decision Rule:
1. If t Stat value > t Critical value, reject the null hypothesis.

2. If p value is less than alpha (α) [0.05], reject the null hypothesis.

Here,

- t Stat value = 0.683, which is less than t Critical value = 1.739; therefore the null hypothesis is
not rejected.
- The p value is 0.25, which is greater than 0.05, so the null hypothesis is not rejected.

Inference:
The null hypothesis is not rejected. Therefore, there is not sufficient evidence that the population mean age is greater than 40.
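
As a cross-check, the same hypotheses can be tested in R with a direct one-sample t test on the 17 ages listed above. This is a sketch of an alternative procedure, not a reproduction of the Excel dummy-column output.

```r
# The 17 ages from the table above.
age <- c(18, 24, 56, 78, 67, 24, 65, 89, 76, 23, 45, 65, 78, 55, 32, 33, 44)

# H0: mu <= 40 against H1: mu > 40
result <- t.test(age, mu = 40, alternative = "greater")

t_stat  <- unname(result$statistic)
df      <- unname(result$parameter)
p_value <- result$p.value
print(result)
```

On these ages the direct test gives a t statistic of about 2.04 on 16 degrees of freedom. This differs from the 0.683 reported above because the Excel tool pools the sample with a two-value dummy column of zero variance, which enlarges the standard error. The two procedures are not equivalent, so their conclusions should be compared with care.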

One Sample t Test using test average (one-tailed)

Problem Statement:
Is there sufficient evidence to suggest that the mean time to exhaustion is greater after chocolate milk than
after a carbohydrate replacement drink? Use a 0.05 significance level.

Hypothesis Statement:
H0:µ1-µ2≤0

H1: µ1-µ2>0

Chocolate Carbo
Cyclist Milk Replacement
1 50.46 42.9
2 47.08 50.1
3 57.51 41.67
4 46.6 32.69
5 29.1 46.33
6 57.5 31.63
7 23.87 20.61
8 28.65 14.99
9 35.37 20.11

STEPS:

1. Go to Data- Data Analysis- t-Test paired two sample for means

Decision Rule:

If t Stat is greater than t Critical one-tail, reject H0.


If p < 0.05, reject H0.

INFERENCE:
Therefore, we can suggest that the mean time to exhaustion
after chocolate milk is greater than after the carbohydrate replacement
drink.

One Sample t Test using test average (two-tailed)

Problem Statement:

Given below are the weights of 8 persons, recorded before and after the consumption of a diet, to
test the effectiveness of the diet. You are required to determine whether the diet was
effective or not.

Hypothesis:

H0: µ loss = 0 (the average weight loss was 0)


H1: µ loss ≠ 0 (the average weight loss was different from 0)

BEFORE AFTER
162 168
170 136
184 147
164 159
172 143
176 161
159 143
170 145

STEPS:

Go to Data- Data Analysis- t Test Paired two samples for Means (Two- tail)

Decision Rule:

A. If t Stat is greater than t Critical, reject the null hypothesis.


B. If p value is less than α = 0.05, reject the null hypothesis.

INFERENCE

As the t Stat value (3.70) is greater than t Critical two-tail (2.36), the null hypothesis is rejected.

The two-tail p value (about 0.008) is less than α = 0.05, so the null hypothesis is rejected.

There is enough evidence that the diet was effective.
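
The paired two-tailed test on the BEFORE and AFTER weights can be reproduced in R as a check. This is a minimal sketch using the eight pairs listed above.

```r
# Weights of the 8 persons before and after the diet.
before <- c(162, 170, 184, 164, 172, 176, 159, 170)
after  <- c(168, 136, 147, 159, 143, 161, 143, 145)

# Paired, two-tailed test of H0: mean weight loss = 0
result <- t.test(before, after, paired = TRUE)

t_stat  <- unname(result$statistic)
df      <- unname(result$parameter)
t_crit  <- qt(0.975, df)   # two-tail critical value at alpha = 0.05
p_value <- result$p.value
print(result)
```

The test statistic is the mean of the eight differences divided by its standard error, with 7 degrees of freedom.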

T-TEST(PAIRED TWO SAMPLES)

Z Test

Research problem:
The net annual returns (returns on investment after deducting all relevant fees), in percentage, are
given below.

Can investors do better by buying mutual funds directly from banks or other financial institutions
than by purchasing mutual funds through brokers? Can we conclude at the 5% significance level that
directly purchased mutual funds outperform mutual funds bought through brokers?

Direct Broker
9.33 3.24
6.94 -6.76
16.17 12.8
16.97 11.1
5.94 2.73
12.61 -0.13
3.33 18.22
16.13 -0.8
11.2 -5.75

1.14 2.59
4.68 3.71
3.09 13.15
7.26 11.05
2.05 -3.12
13.07 8.94
0.59 2.74
13.57 4.07
0.35 5.6
2.69 -0.85
18.45 -0.28
4.23 16.4
10.28 6.39
7.1 -1.9
-3.09 9.49
5.6 6.7
5.27 0.19
8.09 12.39
15.05 6.54
13.21 10.92
1.72 -2.15
14.69 4.36
-2.97 -11.07
10.37 9.24
-0.63 -2.67
-0.15 8.97
0.27 1.87
4.59 -1.53
6.38 5.23
-0.24 6.87
10.32 -1.69
10.29 9.43
4.39 8.31
-2.06 -3.99
7.66 -4.44
10.83 8.63
14.48 7.06
4.8 1.57
13.12 -8.44
-6.54 -5.72

-1.06 6.95

HYPOTHESIS
Let the mean return on direct investment be µd and the mean return on broker investment be µb.

H0: µd - µb ≤ 0

H1: µd - µb > 0

IN Z TEST WE NEED TO FIND VARIANCE OF THE TWO VARIABLES.

IN CASE OF DIRECT- 37.48818

IN CASE OF BROKER-43.33928

STEP 1:

AND SELECT THE DATA FOR INPUT IN Z TEST.

OUTPUT
z-Test: Two Sample for Means

                              Variable 1    Variable 2
Mean                          6.6312        3.7232
Known Variance                37.48818      43.33928
Observations                  50            50
Hypothesized Mean Difference  0
Z                             2.287177862
P(Z<=z) one-tail              0.011092722
z Critical one-tail           1.644853627
P(Z<=z) two-tail              0.022185444
z Critical two-tail           1.959963985

DECISION RULE

If z > z critical, reject null.

If p value < alpha, reject null.

Here, z is greater than z critical (2.28 > 1.64), so the null hypothesis is rejected.

Also, the p value (0.011) is less than alpha (0.05), so the null hypothesis is rejected.

INFERENCE
Therefore, we can say that mutual funds purchased directly outperform mutual funds bought
through brokers.
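
The z statistic in the output can be verified by hand from the summary figures, using z = (mean1 - mean2) / sqrt(var1/n1 + var2/n2). The R sketch below uses only the means, known variances and sample sizes printed in the output table. The variable names are illustrative.

```r
# Summary figures from the z-Test output.
mean_direct <- 6.6312
mean_broker <- 3.7232
var_direct  <- 37.48818
var_broker  <- 43.33928
n_direct    <- 50
n_broker    <- 50

# z = difference in means / standard error of the difference
z <- (mean_direct - mean_broker) / sqrt(var_direct / n_direct + var_broker / n_broker)

p_one_tail <- pnorm(z, lower.tail = FALSE)  # P(Z >= z) for H1: mu_d - mu_b > 0
z_critical <- qnorm(0.95)                   # one-tail critical value at alpha = 0.05
print(c(z = z, p = p_one_tail, critical = z_critical))
```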

F test

PROBLEM:
To test the null hypothesis that the variances of two populations are equal.

HYPOTHESIS STATEMENT:
H0: σ1² = σ2²
H1: σ1² ≠ σ2²

STEPS:
1. On the Data tab, in the Analysis group, click Data Analysis.

2. Select F-Test Two-Sample for Variances and click OK.

3.Click in the Variable 1 Range box and select the range A2:A7.

4. Click in the Variable 2 Range box and select the range B2:B6.

5. Click in the Output Range box and select cell E1.

6. Click “OK”.

Note: Be sure that the variance of Variable 1 is higher than the variance of Variable 2. This is the
case, 160 > 21.7. If not, swap your data. As a result, Excel calculates the correct F value, which
is the ratio of Variance 1 to Variance 2 (F = 160 / 21.7 = 7.373).

CONCLUSION:

If F > F Critical one-tail, we reject the null hypothesis. This is the case, 7.373 > 6.256. Therefore,
we reject the null hypothesis. The variances of the two populations are unequal.
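
The F value and its critical value can be checked in R from the two variances alone. The degrees of freedom below (5 and 4) are inferred from the input ranges A2:A7 and B2:B6, which hold 6 and 5 observations. That inference is an assumption of this sketch.

```r
# Sample variances quoted in the note above.
var1 <- 160
var2 <- 21.7

# Degrees of freedom assumed from the ranges A2:A7 (6 values) and B2:B6 (5 values).
df1 <- 6 - 1
df2 <- 5 - 1

f_stat     <- var1 / var2                             # ratio of the larger to the smaller variance
f_critical <- qf(0.95, df1, df2)                      # F Critical one-tail at alpha = 0.05
p_one_tail <- pf(f_stat, df1, df2, lower.tail = FALSE)
print(c(F = f_stat, critical = f_critical, p = p_one_tail))
```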

ANOVA- Single Factor

PROBLEM:

A single factor or one-way ANOVA is used to test the null hypothesis that the means of several
populations are all equal.

Below you can find the salaries of people who have a degree in economics, medicine or history.

HYPOTHESIS:
H0: μ1 = μ2 = μ3
H1: at least one of the means is different.

STEPS:

1. On the Data tab, in the Analysis group, click Data Analysis.


2. Select Anova: Single Factor and click OK.

3. Click in the Input Range box and select the range A2:C10.
4. Click in the Output Range box and select cell E1.
5. Click “OK”.

CONCLUSION:

If F > F crit, we reject the null hypothesis. This is the case, 15.196 > 3.443. Therefore, we reject the null
hypothesis. The means of the three populations are not all equal. At least one of the means is different.
However, the ANOVA does not tell you where the difference lies.

ANOVA- Two Factor without replication

Problem Statement

To test whether or not marks of students differ with respect to student and subject
both.

HYPOTHESIS:

H0

Row wise: There is no significant difference in marks of students.

Column-wise: There is no significant difference in marks for three subjects i.e.


Economics, Science and History.

H1

Row wise: There is significant difference in marks of students.

Column-wise: There is significant difference in marks for three subjects i.e.


Economics, Science and History.

Students Economics Science History


A 42 69 35
B 53 54 40
C 49 58 53
D 53 64 42
E 43 64 50

STEPS:

1. Go to Data- Data Analysis- ANOVA without replication.

DECISION RULE:

1. If F stat is greater than F critical, Reject Null Hypothesis.


2. If p value is less than alpha (5%), reject null hypothesis.

Row –wise

Here, F stat is 0.30 and F critical is 3.83, so Null hypothesis is accepted.

Here, p value is 0.86 which is more than 5%, so null hypothesis is accepted.

Column-wise:

Here, F stat is 8.59 and F critical is 4.45, so Null hypothesis is rejected.

Here, p value is 0.01 which is less than 5%, so Null hypothesis is rejected.

INFERENCE

ROW-WISE

There is not enough evidence that marks differ significantly among students.

COLUMN-WISE

There is enough evidence that marks of three subjects i.e. Economics, Science and
History do differ significantly.
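
The two-factor ANOVA without replication can be reproduced in R from the marks table above. This is a sketch only: the data frame layout and the names `student`, `subject` and `score` are assumptions made for the illustration.

```r
# Marks of students A-E in the three subjects, in long format.
marks <- data.frame(
  student = factor(rep(c("A", "B", "C", "D", "E"), times = 3)),
  subject = factor(rep(c("Economics", "Science", "History"), each = 5)),
  score   = c(42, 53, 49, 53, 43,   # Economics
              69, 54, 58, 64, 64,   # Science
              35, 40, 53, 42, 50)   # History
)

# Two factors, one observation per cell, so no interaction term is fitted.
fit <- aov(score ~ student + subject, data = marks)
anova_table <- summary(fit)[[1]]

f_rows    <- anova_table[["F value"]][1]   # row-wise (students)
f_columns <- anova_table[["F value"]][2]   # column-wise (subjects)
p_columns <- anova_table[["Pr(>F)"]][2]
print(anova_table)
```

The row-wise F value should be compared with F critical for (4, 8) degrees of freedom and the column-wise F value with F critical for (2, 8).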

TWO –FACTOR ANOVA WITH REPLICATION

1. TO ANALYZE WHETHER THERE IS A SIGNIFICANT DIFFERENCE BETWEEN SCHOOL A
AND B.
2. TO ANALYZE WHETHER THERE IS A SIGNIFICANT DIFFERENCE BETWEEN MARKS OF
ECO, SCIENCE AND HISTORY.
3. TO ANALYZE WHETHER THERE IS A SIGNIFICANT DIFFERENCE BETWEEN
SCHOOL A AND B SUBJECT WISE.

HYPOTHESIS
1. H0: THERE IS NO SIGNIFICANT DIFFERENCE BETWEEN SCHOOL A AND B.
2. H0: THERE IS NO SIGNIFICANT DIFFERENCE BETWEEN ANY SUBJECTS.
3. H0: THERE IS NO SIGNIFICANT DIFFERENCE BETWEEN SCHOOL A AND B SUBJECT WISE.

STEP-1

GO TO DATA ANALYSIS.

STEP 2:

OUTPUT
Anova: Two-Factor With Replication

SUMMARY Economics Science History Total


School A
Count 5 5 5 15
Sum 240 309 220 769
Average 48 61.8 44 51.26667
Variance 28 34.2 54.5 95.6381

School B
Count 5 5 5 15
Sum 195 111 173 479
Average 39 22.2 34.6 31.93333
Variance 494 924.2 420.3 579.4952

Total

Count 10 10 10
Sum 435 420 393
Average 43.5 42 39.3
Variance 254.5 861.5556 235.5667

ANOVA
Source of Variation SS df MS F P-value F crit
Sample 2803.333 1 2803.333 8.6027 0.007272 4.259677
Columns 90.6 2 45.3 0.139014 0.870912 3.402826
Interaction 1540.467 2 770.2333 2.363646 0.115611 3.402826
Within 7820.8 24 325.8667

Total 12255.2 29

DECISION RULE
- IF F STAT IS GREATER THAN F CRITICAL, REJECT NULL HYPOTHESIS.
- IF P VALUE IS LESS THAN ALPHA (5%), REJECT NULL HYPOTHESIS.
Here, in case 1 we will reject null as F stat (8.60) is greater than F critical (4.26) and also p value
(0.007) is less than alpha.

In case 2 we will accept null as F stat (0.14) is less than F critical (3.40) and also p value (0.87) is greater than alpha.

In case 3 we will accept null as F stat (2.36) is less than F critical (3.40) and also p value (0.12) is greater than alpha.

INFERENCE
1. There is a significant difference between school A and school B.

2. There is no significant difference in marks of Eco, science and history.

3. There is no significant difference between school A and B subject wise (no significant interaction).

CHI Square Test

Discrete Series
A Co. is concerned about an increase in violent altercations between its employees. The no. of violent
incidents recorded by management during six randomly selected months is given below.

Month Observed
Jan 55
Feb 65
Mar 68
Apr 72
May 78
Jun 82

To determine whether or not crime rate in the company is associated with months

Null Hypothesis: Crime rate is not associated with months


Alternate Hypothesis: Crime rate is associated with months

Degree of freedom= (r-1)(c-1)


Degree of freedom= 5
Table Value=11.07

Rule- If chi value is greater than tab value reject null
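
The chi value for the discrete series can be computed in R as a goodness-of-fit test. The sketch below assumes the expected number of incidents is the same in every month (the total of 420 spread equally, i.e. 70 per month). The file does not state the expected frequencies, so this is an assumption.

```r
# Observed incidents per month.
observed <- c(Jan = 55, Feb = 65, Mar = 68, Apr = 72, May = 78, Jun = 82)

# Goodness-of-fit test; equal expected counts are the default (assumed here).
gof <- chisq.test(observed)

chi_value <- unname(gof$statistic)   # sum of (O - E)^2 / E
df        <- unname(gof$parameter)   # number of categories - 1
tab_value <- qchisq(0.95, df)        # table value at alpha = 0.05
print(c(chi = chi_value, df = df, table = tab_value))
```

Under this assumption the chi value is 466 / 70, about 6.66, which is to be compared with the table value of 11.07 using the rule above.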

CONTINUOUS SERIES

Determine whether brand preference is independent of age group

Age/Brand Brand 1 Brand 2 Brand 3


15-25 65 76 72
26-35 60 40 64
36-45 45 52 50
46-55 55 65 60

Null: There is no association between brand preference and age group


Alternate: There is association between brand preference and age group

Degree of freedom= (r-1)(c-1)


(4-1)(3-1)
Degree of freedom= 6
Tab Value=12.5916

Rule- If chi value is greater than tab value reject null
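
For the age group and brand table, the chi value can be obtained in R with a test of independence. In this sketch the expected count of each cell is row total × column total / grand total. The object names are illustrative.

```r
# Observed counts: rows are age groups, columns are brands.
brand <- matrix(
  c(65, 76, 72,
    60, 40, 64,
    45, 52, 50,
    55, 65, 60),
  nrow = 4, byrow = TRUE,
  dimnames = list(Age = c("15-25", "26-35", "36-45", "46-55"),
                  Brand = c("Brand 1", "Brand 2", "Brand 3"))
)

test <- chisq.test(brand)   # Pearson chi-square test of independence

# Manual check: E = row total * column total / grand total
expected   <- outer(rowSums(brand), colSums(brand)) / sum(brand)
chi_manual <- sum((brand - expected)^2 / expected)

df        <- unname(test$parameter)   # (r - 1)(c - 1)
tab_value <- qchisq(0.95, df)         # table value at alpha = 0.05
print(c(chi = unname(test$statistic), df = df, table = tab_value))
```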

INTRODUCTION TO R

R-STUDIO
RStudio is a free and open-source integrated development environment (IDE) for R, a
programming language for statistical computing and graphics. RStudio was founded by JJ
Allaire, creator of the programming language ColdFusion. Hadley Wickham is the Chief
Scientist at RStudio.

RStudio is available in two editions: RStudio Desktop, where the program is run locally as a
regular desktop application; and RStudio Server, which allows accessing RStudio using a web
browser while it is running on a remote Linux server. Prepackaged distributions of RStudio
Desktop are available for Windows, macOS, and Linux.

RStudio is available in open source and commercial editions and runs on the desktop (Windows,
macOS, and Linux) or in a browser connected to RStudio Server or RStudio Server Pro
(Debian, Ubuntu, Red Hat Linux, CentOS, openSUSE and SLES). RStudio is partly written in the
C++ programming language and uses the Qt framework for its graphical user interface. The
bigger percentage of the code is written in Java. JavaScript is also amongst the languages used.

FOUR PANES IN R:
The R Studio interface consists of four main panes, or windows:

1. TOP LEFT:
Text editor or script window. This is where you can save and edit collections of commands.

2. TOP RIGHT:
Environment and history window. The environment window contains objects (data, values,
functions) R has currently stored in its memory. The history window shows all commands that
were executed in the Console.

3. BOTTOM LEFT:
Console or command window. Here you can type any valid R command after the prompt
followed by Enter and R will execute that command.

4. BOTTOM RIGHT:
Files, plots, packages, help, and viewer pane. Here you can open files, view plots, install and
load packages, read man pages, and view markdown and other documents in the viewer tab.

IMPORT OF DATA SHEET IN EXCEL:

Importing data into R is a necessary step that, at times, can become time intensive. To ease this
task, RStudio includes new features to import data from: csv, xls, xlsx, sav, dta, por, sas and stata
files.

Step 1: Go to File > Import Dataset > From Excel.

Step 2: Go to Browse; select the file to be imported.

Step 3: Select the sheet to be imported from Default.

Step 4: Click OK after selecting the sheet to be imported.

DESCRIPTIVE STATISTICS:
- R provides a wide range of functions for obtaining summary statistics. One method of
obtaining descriptive statistics is to use the sapply( ) function with a specified summary
statistic.

# get means for variables in data frame mydata
# excluding missing values
sapply(mydata, mean, na.rm=TRUE)

- Possible functions used in sapply include mean, sd, var, min, max, median, range, and
quantile.

- There are also numerous R functions designed to provide a range of descriptive statistics at
once. For example:

# mean, median, 25th and 75th quartiles, min, max
summary(mydata)

# Tukey min, lower-hinge, median, upper-hinge, max
fivenum(x)

- Using the Hmisc package:

library(Hmisc)
describe(mydata)
# n, nmiss, unique, mean, 5,10,25,50,75,90,95th percentiles
# 5 lowest and 5 highest scores

- Using the pastecs package.

CORRELATION:

>View(RM_29)
>cor.test(RM_29$`Group 1`,RM_29$Group2)

Pearson's product-moment correlation

data: RM_29$`Group 1` and RM_29$Group2
t = 0.42408, df = 4, p-value = 0.6933
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.7264192 0.8721680
sample estimates:
cor
0.2074277

>Group1=c(42,53,49,53,43,44,45,52,54)

>Group2=c(69,54,58,64,64,55,56,0,0)

>Group3=c(35,40,53,42,50,39,55,39,40)

>combinedgroup=data.frame(cbind(Group1, Group2, Group3))

>summary(combinedgroup)

Group1 Group2 Group3


Min. : 42.00 Min. : 0.00 Min. :35.00
1st Qu.:44.00 1st Qu.:54.00 1st Qu.:39.00
Median :49.00 Median :56.00 Median :40.00
Mean :48.33 Mean :46.67 Mean :43.67
3rd Qu.:53.00 3rd Qu.:64.00 3rd Qu.:50.00
Max. :54.00 Max. :69.00 Max. :55.00

Hypothesis Testing

One sample T-test

Data entry:

>a<-c(5,6,24,16,17,10,23,11,17,3,21,18,18,12,12,17,10,3,7,13,23,9,22,8)

Syntax:

>t.test(a,mu=20)

One Sample t-test

data: a
t = -4.8471, df = 23, p-value = 6.817e-05
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
10.78539 16.29794
sample estimates:
mean of x
13.54167

TWO SAMPLE INDEPENDENT SAMPLE T TEST

Problem statement:
To analyse whether the time spent by full time students in studying statistics is different from the time
spent by part time students.

Using R command

>library(readxl)
>Part_time_raw<- read_excel("C:/Users/sudhir/Downloads/Part time raw.xlsx")
>View(Part_time_raw)
>t.test(Part_time_raw$`Full Time`,Part_time_raw$`Part Time`)

Result:
Welch Two Sample t-test

data: Part_time_raw$`Full Time` and Part_time_raw$`Part Time`


t = 0.33137, df = 31.772, p-value = 0.7425
alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:


-1.169199 1.623366
sample estimates:
mean of x mean of y
3.583333 3.356250

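INFERENCE:
As the p-value (0.7425) is greater than 0.05, the null hypothesis is not rejected. The time spent by full time
students in studying statistics is not significantly different from the time spent by part time
students.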

TWO SAMPLE- PAIRED SAMPLE T TEST

Research Problem:
Is there sufficient evidence to suggest that the mean time to exhaustion is greater after having chocolate milk
than after having carbohydrate replacement drink?

>library(readxl)
>Carbohydrate_data<- read_excel("C:/Users/sudhir/Downloads/Carbohydrate data.xlsx")
>View(Carbohydrate_data)
>t.test(Carbohydrate_data$`Chocolate Milk`,Carbohydrate_data$`Carbohydrate Replacement Drink`)

Result:
Welch Two Sample t-test

data: Carbohydrate_data$`Chocolate Milk` and Carbohydrate_data$`Carbohydrate Replacement Drink`
t = 1.3949, df = 15.999, p-value = 0.1821

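alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-4.356605 21.118827
sample estimates:
mean of x mean of y
41.82889 33.44778

INFERENCE:

As the p-value (0.1821) is greater than 0.05, the null hypothesis is not rejected. This test does not show a
significant difference in mean time to exhaustion between chocolate milk and the carbohydrate
replacement drink. Note that t.test() was called without paired = TRUE, so R ran a Welch two sample
test rather than a paired test.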
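
A paired, one-tailed version of the test can be sketched as follows. It uses the nine pairs of values printed in the Excel section of this file (the imported workbook may differ slightly from those printed values), and the vector names are illustrative.

```r
# Time to exhaustion for the 9 cyclists, as printed in the Excel section.
chocolate_milk <- c(50.46, 47.08, 57.51, 46.6, 29.1, 57.5, 23.87, 28.65, 35.37)
carbo_drink    <- c(42.9, 50.1, 41.67, 32.69, 46.33, 31.63, 20.61, 14.99, 20.11)

# Paired test of H0: mu1 - mu2 <= 0 against H1: mu1 - mu2 > 0
paired <- t.test(chocolate_milk, carbo_drink, paired = TRUE, alternative = "greater")

t_stat    <- unname(paired$statistic)
df        <- unname(paired$parameter)
mean_diff <- unname(paired$estimate)   # mean of the 9 differences
print(paired)
```

With `paired = TRUE` the test works on the nine within-cyclist differences, so the degrees of freedom are 8 rather than the fractional Welch value shown in the output above.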

ONE WAY ANOVA:
>stack(combinedgroup)

values ind
1 42 Group1
2 53 Group1
3 49 Group1
4 53 Group1
5 43 Group1
6 44 Group1
7 45 Group1
8 52 Group1
9 54 Group1
10 69 Group2
11 54 Group2
12 58 Group2
13 64 Group2
14 64 Group2
15 55 Group2
16 56 Group2
17 0 Group2
18 0 Group2
19 35 Group3
20 40 Group3
21 53 Group3
22 42 Group3

23 50 Group3
24 39 Group3
25 55 Group3
26 39 Group3
27 40 Group3

>stackedgroup=stack(combinedgroup)

>anovaresults=aov(values~ind,data=stackedgroup)

>summary(anovaresults)

Df Sum Sq Mean Sq F value Pr(>F)


ind 2 101 50.33 0.189 0.829
Residuals 24 6386 266.08

F test

>var.test(RM_29$`Group 1`,RM_29$Group2,alternative = "two.sided")

data: RM_29$`Group 1` and RM_29$Group2
F = 0.70823, num df = 5, denom df = 5, p-value = 0.7142
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.09910322 5.06127790
sample estimates:
ratio of variances
0.7082294

For a one-tailed test:

>var.test(RM_29$`Group 1`,RM_29$Group2,alternative = "greater")

CHI TEST

>library(readxl)
>RM_1_apr <- read_excel("RM 1_apr.xlsx")
>View(RM_1_apr)
>table(RM_1_apr)

       Observed
Month  55 65 68 72 78 82
Apr     0  0  0  1  0  0
Feb     0  1  0  0  0  0
Jan     1  0  0  0  0  0
June    0  0  0  0  0  1
Mar     0  0  1  0  0  0
May     0  0  0  0  1  0

>chisq.test(table(RM_1_apr$Month,RM_1_apr$Observed))

Pearson's Chi-squared test


data: table(RM_1_apr$Month, RM_1_apr$Observed)
X-squared = 30, df = 25, p-value = 0.2243

