
2022 ECM1001

LA TROBE UNIVERSITY
ECONOMIC AND FINANCE DATA ANALYSIS

Semester 1, 2022

ECM1001 Individual Assignment


Due Date: 11.59pm 15th May

Student ID: 21160943


Student Name: Nguyen Huong Tra
Workshop Time/Day: 5/14/2022
Subject Tutor Name: Anh Do Van/Van Hung Pham

Instructions

1. Fill in your name and student ID above on the cover page.
2. Record your subject tutor’s name and workshop details under your name and ID.
3. This is an individual assignment.
4. This individual assignment carries 30 marks and will represent 30% of your final grade.
5. Please use your unique random sample of 500 observations and answer Questions 1-7 in
Part A to Part C.

Analysing US Stocks Data from NYSE, AMEX, and Nasdaq

Part A: Sampling
Q1. What sampling plan did you employ to select your sample of 500 US stocks? [1 mark]

The sampling plan used was simple random sampling.

Q2. Use your own words and describe a possible way to obtain a stratified random sample from
the available 3922 US stocks (i.e., from all stocks in the “US Stock Data for Individual Assignment”
Excel file). [1 mark]

Stratified sampling is another common sampling method, which separates the entire population into subgroups, or strata, based on similarities (e.g. gender or religion). A random sample is then taken from each stratum. This method is preferred when the population is notably heterogeneous or when higher precision is needed.

Cluster sampling is a sampling method that takes samples from naturally formed subgroups, or clusters, rather than from the whole population. It differs from stratified sampling in that a whole cluster is taken as the sample, whereas in stratified sampling only certain elements of each stratum are sampled. The clusters are already formed before sampling begins (e.g. schools) rather than being separated manually by the sampler. This method is used because it is more feasible and cost-effective.
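Stratified sampling is a common sampling method that separates the whole population into subgroups (strata) based on similarities (e.g. Exchange Name or Stock Price Level). A random sample is then taken from each subgroup. For the 3922 US stocks, one possible way is to group the stocks by Exchange Name and draw a simple random sample from each exchange. This method is preferred when the data are notably heterogeneous or when higher precision is needed.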
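The stratified plan described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stratified_sample` is hypothetical, the per-exchange counts are invented so that they sum to 3922 (the real counts are in the Excel file), and proportional allocation is one possible choice among several.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, n, seed=0):
    """Draw roughly n items, allocating to each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    total = len(items)
    sample = []
    for members in strata.values():
        # Proportional allocation, with at least one item per stratum.
        k = max(1, round(n * len(members) / total))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: the counts are made up and only chosen to sum to 3922.
counts = {"NYSE": 1200, "AMEX": 150, "Nasdaq": 2572}
population = [(f"{ex}-{i}", ex) for ex, c in counts.items() for i in range(c)]

sample = stratified_sample(population, lambda stock: stock[1], n=500)
print(len(sample))
```

Because every exchange contributes its own random draw, small strata such as the American Stock Exchange cannot be missed by chance, which can happen with simple random sampling.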

Part B: Descriptive Statistics

Q3. Analysis on stock exchange proportion:

a. The “Exchange Name” column in your data indicates the stock trading venue for each stock.
In the following table, report proportion of stocks based on stock exchange names in your sample.
[1.5 mark]

Exchange Name               Sample Proportion (in percentage form)
New York Stock Exchange     30.4%
American Stock Exchange     3.2%
Nasdaq Stock Exchange       66.4%

b. Use an appropriate chart (properly labelled) to display information in Q3-a above, and
provide brief interpretation on the chart. [2 mark]

[Figure: bar chart "Proportion of US Stock Data". x-axis: Exchange name (New York Stock Exchange, American Stock Exchange, Nasdaq Stock Exchange); y-axis: Sample Proportion, 0.00% to 70.00%.]

c. Estimate the 95% confidence interval for the proportion of stocks traded on the New York
Stock Exchange in your sample of 500 stocks. Briefly interpret your answer. [1.5 marks]

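$n = 500$ (since $n > 30$); $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025 \Rightarrow z_{\alpha/2} = 1.96$

$\hat{p} \pm z_{0.025} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = 0.304 \pm 1.96 \sqrt{\dfrac{0.304 \times 0.696}{500}} = [0.264, 0.344]$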

At the 95% confidence level, the proportion of all stocks traded on the New York Stock Exchange, estimated from the sample of 500 stocks, lies somewhere between 26.4% and 34.4%.
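The interval can be checked with a short calculation. The sketch below assumes the normal approximation used in the answer; the helper name `proportion_ci` is hypothetical, and the count of 152 NYSE stocks is taken from the cross-classification table in Q4-a.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 152 of the 500 sampled stocks trade on the New York Stock Exchange.
low, high = proportion_ci(152, 500)
print(f"[{low:.3f}, {high:.3f}]")
```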

Q4. The “Exchange Name” column in your data indicates the stock trading venue for each stock;
the “Stock Price Level” column in your data indicates whether the end-of-month stock price is
“High” or “Low”. Investigate the relationship between stock trading venue and stock price level
based on the “Exchange Name” and “Stock Price Level” data:

a. Complete the following cross classification table of “Stock Price Level” by “Exchange Name”.
[3 marks]
                                        Exchange Name
Stock Price Level   New York Stock Exchange   American Stock Exchange   Nasdaq Stock Exchange   Total
High                70                        1                         68                      139
Low                 82                        15                        264                     361
Total               152                       16                        332                     500

b. Derive a relative frequency cross classification table based on row totals (i.e., “Stock Price
Level”) using the table in Q4-a above. [3 marks]

                                        Exchange Name
Stock Price Level   New York Stock Exchange   American Stock Exchange   Nasdaq Stock Exchange   Total
High                0.50                      0.01                      0.49                    1.00
Low                 0.23                      0.04                      0.73                    1.00
Total               0.30                      0.03                      0.67                    1.00
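Row relative frequencies are obtained by dividing each cell by its row total. A minimal sketch, using the counts from Q4-a (the function name `row_relative_frequencies` and the shortened exchange labels are illustrative choices):

```python
# Counts from the cross-classification table in Q4-a.
counts = {
    "High": {"NYSE": 70, "AMEX": 1, "Nasdaq": 68},
    "Low": {"NYSE": 82, "AMEX": 15, "Nasdaq": 264},
}


def row_relative_frequencies(table):
    """Divide every cell by the total of its row."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: count / row_total for col, count in cells.items()}
    return result


row_freq = row_relative_frequencies(counts)
for row, cells in row_freq.items():
    print(row, {col: round(value, 2) for col, value in cells.items()})
```

If the two rows showed similar proportions across the exchanges, that would suggest no relationship; here the High and Low rows differ noticeably.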

c. Use an appropriate graph to display the information contained in the cross-classification
table of row relative frequencies above. Decide whether there is a relationship between stock
trading venue and stock price level. [2 mark]

[Figure: clustered bar chart "Relationship between stock trading venue and stock price level". x-axis: Stock Price Level (high, low); series: New York Stock Exchange, American Stock Exchange, Nasdaq Stock Exchange; y-axis: 0.00% to 80.00%.]

Q5. Investigating the relationship between stock return and trading volume:

a. In the following table, report average stock return and trading volume for each stock
exchange in your sample. [3 marks]

                        New York Stock Exchange   American Stock Exchange   Nasdaq Stock Exchange
Average Stock Return    0.0135                    0.0596                    0.0292
Average Trading Volume (in Units of 60354.31
349792.08 317827.96
100)

b. Estimate the 95% confidence interval of the average return for the New York Stock
Exchange, and interpret your answer. [2 marks]

$n = 500$ (since $n > 30$); $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$, so $z_{\alpha/2} = 1.96$ and $t_{\alpha/2,\,n-1} = t_{0.025,\,499} \approx 1.965$

$\bar{x} = 0.0135$
$s = 0.107$

Formula: $\bar{x} \pm t_{\alpha/2,\,n-1} \dfrac{s}{\sqrt{n}}$

The 95% confidence interval estimate of the average return:
$0.0135 \pm 1.965 \times \dfrac{0.107}{\sqrt{500}} = [0.004, 0.023]$

We are 95% confident that the average return for stocks traded on the New York Stock Exchange is between 0.004 and 0.023.
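The arithmetic can be reproduced with the sketch below. It assumes the values stated in the answer (mean 0.0135, standard deviation 0.107, n = 500) and takes the critical value 1.965 as typed-in input rather than computing it from a statistics library. The helper name `mean_ci` is hypothetical. Since n is a parameter, the same function could be rerun with the NYSE subsample size (152 in Q4-a) and the matching t value if s refers to NYSE stocks only.

```python
import math


def mean_ci(x_bar, s, n, t_crit):
    """Confidence interval for a mean: x_bar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Values as stated in the answer; t_crit is supplied by hand (assumption).
low, high = mean_ci(x_bar=0.0135, s=0.107, n=500, t_crit=1.965)
print(f"[{low:.3f}, {high:.3f}]")
```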

c. Calculate the covariance and correlation coefficient between stock return and trading
volume for stocks traded on the New York Stock Exchange, and interpret your answer. Why is
correlation useful in determining the strength of the link between two variables (i.e., stock return
and trading volume in this case)? [3 marks]

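Covariance

$s_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = 19353.98$

Covariance = 19353.98 -> the association between stock return and trading volume for stocks traded on the New York Stock Exchange is positive.

Correlation Coefficient

$r = \dfrac{s_{xy}}{s_x s_y} = 0.27039312$

Correlation Coefficient = 0.27039312 -> the association between stock return and trading volume for stocks traded on the New York Stock Exchange is positive but weak. Correlation is useful because it is unit-free and always lies between -1 and 1, so it shows the strength of the linear relationship as well as its direction, whereas the size of the covariance depends on the units of the two variables.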
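To illustrate why correlation is easier to interpret than covariance, the sketch below computes both from the formulas above. The return and volume figures are made-up toy values, not the assignment data, and the function names are hypothetical.

```python
import math


def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Correlation: covariance divided by the product of the standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


# Toy data (invented for illustration only).
returns = [0.02, -0.01, 0.03, 0.00, 0.05]
volumes = [1200.0, 800.0, 1500.0, 900.0, 2500.0]

cov = sample_covariance(returns, volumes)
r = sample_correlation(returns, volumes)

# Changing the units of volume rescales the covariance but not the correlation.
volumes_rescaled = [v * 100 for v in volumes]
cov_rescaled = sample_covariance(returns, volumes_rescaled)
r_rescaled = sample_correlation(returns, volumes_rescaled)
print(cov, cov_rescaled, r, r_rescaled)
```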

Q6. Frequency, relative frequency, and cumulative relative frequency

a. Construct a frequency and cumulative relative frequency distribution using the “End-of-
Month Stock Price” data of stocks traded on the New York Stock Exchange (i.e., complete the table
below). [3 marks]

Price Range           Frequency   Relative Frequency   Cumulative Relative Frequency
$0 < Price ≤ $10      23          0.1513               0.1513
$10 < Price ≤ $20     18          0.1184               0.2697
$20 < Price ≤ $30     13          0.0855               0.3553
$30 < Price ≤ $40     14          0.0921               0.4474
$40 < Price ≤ $50     14          0.0921               0.5395
$50 < Price ≤ $60     10          0.0658               0.6053
$60 < Price ≤ $70     9           0.0592               0.6645
$70 < Price ≤ $80     4           0.0263               0.6908
$80 < Price ≤ $90     5           0.0329               0.7237
$90 < Price ≤ $100    4           0.0263               0.7500
$100 < Price          38          0.2500               1.0000
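Relative and cumulative relative frequencies follow directly from the class counts: each count is divided by the total (152 NYSE stocks), and the running total of counts is divided by the same total. A minimal sketch using the frequencies from the table (variable names are illustrative):

```python
from itertools import accumulate

# Class labels and frequencies taken from the table above.
labels = [f"${lo} < Price ≤ ${lo + 10}" for lo in range(0, 100, 10)]
labels.append("$100 < Price")
freq = [23, 18, 13, 14, 14, 10, 9, 4, 5, 4, 38]

total = sum(freq)
rel_freq = [f / total for f in freq]
# Cumulative relative frequency: running count divided by the total.
cum_rel_freq = [c / total for c in accumulate(freq)]

for label, f, r, c in zip(labels, freq, rel_freq, cum_rel_freq):
    print(f"{label:<20} {f:>3} {r:.4f} {c:.4f}")
```

The `cum_rel_freq` values are the heights an ogive would plot against the upper class limits, ending at 1.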

b. Construct a relative frequency histogram and an ogive for the “End-of-Month Stock Price”
data of stocks traded on the New York Stock Exchange, using the frequency distribution table in Q6-
a above. [2 mark]

[Figure: "Ogive". x-axis: price classes 1 to 11; y-axis: 0 to 40.]

Part C: Summary and Discussion

Q7. Based on your answers above in this individual assignment, briefly discuss how to
determine the relationship between two categorical variables (e.g., see Q4), and how to determine
the relationship between two numerical variables (e.g., see Q5).
Note: Use your own words, and talk about the methods in general. No additional calculation is
required. [2 marks]

Nominal (categorical) variables are measured at the nominal level and have no inherent ranking. To determine the relationship between two nominal variables, a contingency (cross-classification) table is commonly used, as in Q4. It reveals relationships in the data that may not be readily apparent by showing how the distribution of one variable changes from one category of the other variable to another. If the row relative frequencies differ noticeably between rows, the two variables are related.

Numerical variables can take any value within a finite or infinite interval. The best way to examine the relationship between two numerical variables is a scatter plot, which shows how one variable changes as the other changes and can display a large body of data at once. The direction and strength of the linear relationship are summarised by the covariance and the correlation coefficient, as in Q5. If the points slope downward from the upper left to the lower right, the variables are negatively correlated; if they slope upward, the variables are positively correlated.
