Getting Started With R
Sebastiano Manzan
Why R?
- If you can have an excellent free product, why should you pay for an excellent expensive product (e.g., Matlab, SAS)?
Outline of the course
Let's get started with R
Let's get started with RStudio
Figure 2: RStudio
How R works
Loading data in R
Base function read.csv()
- You can load a file from RStudio via Tools -> Import Dataset, which gives you the option From Text File or From Web URL
- Otherwise, you can type a few lines of code (table from Wikipedia):
- The commands head(x, n) and tail(x, n) show the first and last n observations
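A minimal self-contained sketch of the second route (the file name and columns are invented here; in the slides the file is the S&P 500 constituent list):

```r
# Write a tiny CSV to a temporary file so the example runs anywhere
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(Ticker = c("MMM", "ABT", "ABBV", "ACN"),
                     Sector = c("Industrials", "Health Care",
                                "Health Care", "Information Technology")),
          tmp, row.names = FALSE)

splist <- read.csv(tmp)  # base reader
head(splist, 2)          # first 2 observations
tail(splist, 2)          # last 2 observations
```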
Data types
- The str() command can be used to evaluate the object structure and the data types:
str(splist)
- Each variable in the data frame splist has a type that can be:
  - numeric (or double): used for decimal values
  - integer: for integer values
  - character: for strings of characters
  - Date: for dates
  - factor: represents a variable (numeric, integer, or character) that categorizes the values into a small (relative to the sample size) set of categories (or levels)
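For instance, on a small made-up data frame, str() reports one type per column:

```r
df <- data.frame(Ticker = c("MMM", "ABT"),                         # character
                 Price  = c(231.2, 45.4),                          # numeric (double)
                 Shares = c(100L, 250L),                           # integer
                 Added  = as.Date(c("1976-08-09", "1964-03-31")),  # Date
                 Sector = factor(c("Industrials", "Health Care")), # factor
                 stringsAsFactors = FALSE)
str(df)
```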
- The read.csv() function has the annoying feature that any string is interpreted as a factor
- This can be switched off by adding the argument stringsAsFactors = FALSE
- The ticker symbol, security name, and address are then all correctly interpreted as chr
- The Date.first.added column is also imported as a string, but we would like to define it as type Date
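A sketch of the fix, again with a temporary file so it runs anywhere (note that since R 4.0 the default is already stringsAsFactors = FALSE; in older versions the argument was needed):

```r
tmp <- tempfile(fileext = ".csv")
writeLines(c("Ticker,Name", "MMM,3M Company", "ABT,Abbott Laboratories"), tmp)

splist <- read.csv(tmp, stringsAsFactors = FALSE)  # strings stay character
class(splist$Ticker)  # "character", not "factor"
```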
- The code below defines the column/variable Date.first.added as a date with the command as.Date()
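A sketch of the conversion (the column name follows the slide; the format string matches dates written as 1976-08-09):

```r
splist <- data.frame(Date.first.added = c("1976-08-09", "1964-03-31"),
                     stringsAsFactors = FALSE)
splist$Date.first.added <- as.Date(splist$Date.first.added, format = "%Y-%m-%d")
class(splist$Date.first.added)  # "Date"
```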
read_csv() from readr package
- In addition to the base read.csv() function, other packages provide functions to read data
- There are two problems with the read.csv() function:
  - type guessing (in particular dates)
  - reading speed
- The function read_csv() from package readr tries to solve both problems (we will talk about speed later):
library(readr)
splist <- read_csv("List_SP500.csv")
str(splist, max.level=1)
- Notice:
  - class tbl (tibble) and tbl_df: a type of data frame specific to this package
  - Date is defined as a Date, which saves us a line of code
- The file GSPC.csv contains daily data for the S&P 500 Index since January 1985, downloaded from Yahoo Finance
- Below is a comparison of read.csv() and read_csv()
Saving data files
- This can be done with the base function write.csv() (see help(write.csv) for the arguments)
- The index object is saved to a file called myfile.csv in the working directory
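A minimal sketch (the values are made up, and the file is written to a temporary directory here so that the example runs anywhere; in the slides it goes to the working directory):

```r
index <- data.frame(Date = as.Date(c("1985-01-02", "1985-01-03")),
                    GSPC.Adjusted = c(165.37, 164.57))  # made-up values
outfile <- file.path(tempdir(), "myfile.csv")
write.csv(index, file = outfile, row.names = FALSE)
read.csv(outfile)  # check the round trip
```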
Plotting the data . . .
- The code below produces a time series plot of the S&P 500 Index:
  - The column index$Date is of class Date and is used as the x-axis
  - The column index$GSPC.Adjusted is used as the y-axis
  - The left plot uses the default settings; the plot on the right has been customized
# LEFT PLOT
plot(index$Date, index$GSPC.Adjusted)
# RIGHT PLOT
plot(index$Date, index$GSPC.Adjusted, type="l", xlab="", ylab="S&P 500 Index", xaxt="n", yaxt="n")
ticks <- seq(index$Date[1], index$Date[nrow(index)], by="year")
axis(1, at=ticks, labels=ticks, cex.axis=0.9, col="orange", col.axis="blue")
axis(2, at=seq(0, 2000, 500), labels=seq(0,2000,500), col.ticks=3,cex.axis=0.75,col.axis="purple")
axis(4, at=seq(0, 2000, 500), labels=seq(0,2000,500), col.ticks=3, cex.axis=0.75,col.axis="purple")
[Figure: two time series plots of the S&P 500 Index, 1985–2017; left with default settings, right customized with yearly orange/blue x-axis ticks and purple y-axis labels]
Time series objects
- A variable that is observed over time is called a time series (e.g., stock prices, real GDP, inflation)
- Several packages provide infrastructure to define an object as a time series object
- I will mostly use the xts package (which is part of the quantmod package for quantitative finance; other options are ts and zoo)
- To define an object as a time series we use the command xts(), which takes two arguments:
  - a data frame
  - a vector of dates (of class Date)
library(xts)
index.xts <- xts(subset(index, select=-Date), order.by=index$Date)
- The xts package provides functions to extract information from a time series object:
[1] "1985-01-02"
[1] "2017-09-01"
Daily periodicity from 1985-01-02 to 2017-09-01
- There are also functions to aggregate the observations from a high frequency (e.g., daily) to a lower frequency (e.g., weekly/monthly/quarterly)
- By default the sub-sampling is performed by taking the first observation of the interval (e.g., Monday of each week, the 1st of the month)
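On a toy xts series (dates and values made up) the extraction and aggregation functions look like this:

```r
library(xts)
dates <- seq(as.Date("2017-01-02"), by = "day", length.out = 10)
x <- xts(101:110, order.by = dates)
start(x)        # first date: 2017-01-02
end(x)          # last date: 2017-01-11
periodicity(x)  # reports daily periodicity and the date range
to.weekly(x)    # aggregate to weekly frequency
```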
- Functions apply.weekly() and apply.monthly() are used when the goal is to apply a function to each week/month in the sample
- In the examples below I apply these functions to subsample the first() and last() day of the week (notice that using first() is equivalent to the to.weekly() function) and to calculate the mean() of the week
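A sketch of the three aggregations on a toy daily series (values made up):

```r
library(xts)
dates <- seq(as.Date("2017-01-02"), by = "day", length.out = 14)  # two weeks
x <- xts(1:14, order.by = dates)
apply.weekly(x, first)  # first observation of each week
apply.weekly(x, last)   # last observation of each week
apply.weekly(x, mean)   # weekly average
```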
Plotting time series data
[Figure: time series plot of Ad(GSPC), the adjusted closing price of the S&P 500, ranging from about 500 to 2500]
Subsetting
- The xts package provides its own syntax to subset the time series object
- Below are some examples:
[Figure: six example plots of different date-range subsets of the S&P 500 series, covering single years, multi-year windows, and month-level windows]
getSymbols() from the quantmod package
- Several packages provide functions to download economic and financial data by specifying only the ticker and time period (and, in some functions, the frequency)
- I will discuss only the getSymbols() function from package quantmod, which will be used in this class
- Features of getSymbols():
  - Sources: Yahoo Finance, Google Finance, OANDA (fx rates), FRED (argument src)
  - Download multiple tickers in one call
  - Select the time period (with arguments from and to)
  - By default the output is an xts object for each ticker specified
  - Yahoo Finance: downloads open, high, low, close, volume, and adjusted close at the daily frequency
  - You can convert to weekly or monthly using to.weekly()/to.monthly() or apply.weekly()/apply.monthly()
getSymbols() with one ticker
- By default, the function creates an xts object with the name of the ticker (dropping the ^ part if you are downloading an index)
- When downloading only one ticker, setting auto.assign=FALSE allows you to assign the output to an object that you name (in the example below, data)
library(quantmod)
getSymbols("^GSPC", src = "yahoo", from = "1990-01-01")
[1] "GSPC"
tail(GSPC, 2)
Multiple tickers
library(quantmod)
getSymbols(c("^GSPC","^DJI"), src="yahoo", from="1990-01-01")
periodicity(GSPC)
periodicity(DJI)
- getSymbols() also allows you to specify an environment (argument env)
- Think of the environment as a folder in the R global environment where the objects will be stored
- Steps:
  - create a new environment with the new.env() command (called myenv below)
  - call the getSymbols() function and set the env= argument to the new environment you created
  - the ls() command below lists the objects in the new environment
ls(myenv)
[1] "AAP"  "ABBV" "ABT"  "ACN"  "ADBE" "AES"  "AMD"  "ATVI" "AYI"  "MMM"
quantmod functionalities
- The object created contains all the information, but for our analysis we might need only some of the columns
- The package provides functions to extract the open price Op(), the closing price Cl(), the highest intra-day price Hi(), the lowest Lo(), the volume Vo(), and the adjusted closing price Ad()
- OpCl() calculates the open-to-close daily return, ClCl() the close-to-close return, and LoHi() the low-to-high difference (also called the intra-day range)
GSPC.Adjusted DJI.Adjusted
1990-01-02 359.69 2810.1
1990-01-03 358.76 2809.7
1990-01-04 355.67 2796.1
Oanda
- Daily exchange rates for a wide range of currency pairs
- Limit of 2000 days per request
- The command oanda.currencies gives you the symbols for 191 currencies
par(mfrow=c(1,2))
plot(USDEUR)
plot(USDJPY)
[Figure: daily USDEUR (left, about 0.81–0.85) and USDJPY (right, about 108–114) exchange rates, Jul 2017–Jan 2018]
FRED
library(quantmod)
macrodata <- getSymbols(c("UNRATE", "CPIAUCSL", "GDPC1"), src="FRED")
macrodata <- merge(UNRATE, CPIAUCSL, GDPC1)
par(mfrow=c(1,3))
plot(UNRATE); plot(CPIAUCSL); plot(GDPC1)
[Figure: time series plots of UNRATE, CPIAUCSL, and GDPC1 from FRED, 1947/48 to 2017]
- Exchange rates are also available in FRED (with no restriction on the time period)
[Figure: DEXUSEU (1999–2015) and DEXJPUS (1971–2015) exchange rates from FRED]
Quandl
library(Quandl)
macrodata <- Quandl(c("FRED/UNRATE", "FRED/CPIAUCSL", "FRED/GDPC1"),
                    start_date="1950-01-02", type="xts")
head(macrodata)
- Quandl has many more datasets, e.g., commodity spot and futures prices
[Figure: four example plots of commodity spot and futures price series from Quandl]
Reading large files
- The physical limit to the file size that can be imported is determined by the RAM of your machine (e.g., 2, 4, or 6 GB)
- Reading large files can be time consuming when using the base functions
- The benchmark for the comparison is obtained from the Center for Research in Security Prices (CRSP) at the University of Chicago. The variables in the dataset are:
  - PERMNO: identification number for each company
  - date: date in format 2015/12/31
  - EXCHCD: exchange code
  - TICKER: company ticker
  - COMNAM: company name
  - CUSIP: another identification number for the security
  - DLRET: delisting return
  - PRC: price
  - RET: return
  - SHROUT: shares outstanding
  - ALTPRC: alternative price
- The observations are all companies listed on the NYSE, NASDAQ, and AMEX from January 1985 until December 2016 at the monthly frequency, for a total of 3,627,236 observations and 16 variables. The size of the file is 328 MB
read_csv() from readr package
- The command Sys.time() reads the current time, which is assigned to start.time
- The time to perform the operation is calculated as the difference between Sys.time() and start.time
library(readr)
start.time <- Sys.time()
crsp <- read_csv("crsp_eco4051_jan2017.csv")
end_csv <- Sys.time() - start.time
- fread() (from the data.table package) is 12 times faster than read.csv() and 1.8 times faster than read_csv()
Create returns
- If we want to transform all the columns of a data frame or xts object, we can simply perform the operation on the object rather than on each variable
# "^GSPC" = S&P 500 Index, "^N225" = Nikkei 225, "^STOXX50E" = EURO STOXX 50
data <- getSymbols(c("^GSPC", "^N225", "^STOXX50E"), from="2000-01-01")
price <- merge(Ad(GSPC), Ad(N225), Ad(STOXX50E))
ret <- 100 * diff(log(price))
tail(ret, 5)
Elegant graphics: ggplot2 package
- The base plotting functions are easy to use and convenient for quick plotting
- However, they lack elegance and it is difficult to produce high-quality graphics
- The package ggplot2 offers an alternative set of functions to make graphs
- ggplot2 does not recognize the time series properties of xts objects, so we have to specify the x-axis
- The time(GSPC) command is used to extract the date associated with each observation
[Figure: ggplot2 line plot of GSPC$Close against time]
- ggplot2 interacts best with data frames
- We can extract a data frame from the xts object by:
  - creating a new variable that represents the Date
  - using coredata() to extract the data from the xts object
[Figure: the same line plot of Close produced from the extracted data frame]
- We can produce the same graph using the ggplot2 grammar, as in the example below
- Notice that we can assign a ggplot to an object (called myplot) that can be used later and altered by changing only specific aspects
[Figure: line plot of Close built with the ggplot2 grammar]
- The plots can be customized with themes, line colors and types, label names, etc.
- par(mfrow=c(2,2)) does not work with ggplot, so we should use the grid.arrange() function from package gridExtra
plot1 <- ggplot(GSPC.df, aes(Date, Adjusted)) + geom_line(color="darkgreen")
plot2 <- plot1 + theme_bw()
plot3 <- plot2 + theme_classic() + labs(x="", y="Index", title="S&P 500")
plot4 <- plot3 + geom_line(color="darkorange") + geom_smooth(method="lm") +
  theme_dark() + labs(subtitle="Period: 1985/2016", caption="Source: Yahoo")
library(gridExtra)
grid.arrange(plot1, plot2, plot3, plot4, ncol=2)
[Figure: 2x2 grid of the four plots: default theme, theme_bw, theme_classic with axis labels, and theme_dark with an orange line, a linear trend, the subtitle, and the caption "Source: Yahoo"]
- A scatter plot of two variables can be easily produced in ggplot2
[Figure: two scatter plots of S&P 500 returns (SPret) against Nikkei returns (NIKret)]
- The strength of ggplot2 is that it makes it easier to produce sophisticated graphics
- For example, if we want the dots in the scatter plot to depend on a variable (e.g., Year), this can be done easily by adding the argument color=Year in the aesthetics
[Figure: the same scatter plots with points colored by Year (1990–2018)]
Boxplot
[Figure: boxplot of SPret by factor(Year), 1990–2018]
Summary statistics
- The function summary() provides a few summary statistics of the distribution of a data object
- Of course, the variable needs to be of a numeric type
summary(GSPC$ret.simple)
Index ret.simple
Min. :1990-01-02 Min. :-9.0350
1st Qu.:1996-12-26 1st Qu.:-0.4399
Median :2004-01-07 Median : 0.0530
Mean :2004-01-07 Mean : 0.0353
3rd Qu.:2011-01-13 3rd Qu.: 0.5541
Max. :2018-01-24 Max. :11.5800
NA's :1
summary(GSPC$ret.log)
Index ret.log
Min. :1990-01-02 Min. :-9.4695
1st Qu.:1996-12-26 1st Qu.:-0.4409
Median :2004-01-07 Median : 0.0530
Mean :2004-01-07 Mean : 0.0292
3rd Qu.:2011-01-13 3rd Qu.: 0.5525
Max. :2018-01-24 Max. :10.9572
NA's :1
- Package fBasics provides the basicStats() function with a more comprehensive set of descriptive statistics than summary()
fBasics::basicStats(GSPC$ret.log)
ret.log
nobs 7072.000000
NAs 1.000000
Minimum -9.469512
Maximum 10.957197
1. Quartile -0.440919
3. Quartile 0.552538
Mean 0.029210
Median 0.052970
Sum 206.545022
SE Mean 0.013172
LCL Mean 0.003390
UCL Mean 0.055031
Variance 1.226761
Stdev 1.107592
Skewness -0.252791
Kurtosis 9.004323
Covariance and correlation between two or more assets
- When we have several variables or assets, the first question that arises in the analysis is whether they co-move
- Dependence is measured using the covariance and the correlation
cov(Ret, use="complete.obs")
        SPret NIKret
SPret  16.960 13.446
NIKret 13.446 38.731
cor(Ret, use="complete.obs")
SPret NIKret
SPret 1.00000 0.52463
NIKret 0.52463 1.00000
Plotting the data distribution
- A very useful tool to explore the distribution of the data is the histogram, which is an estimator of the underlying (population) distribution of the data. It is useful to assess (visually) characteristics of the data such as normality, fat tails, and asymmetry
[Figure: histograms of ret.log produced with the base hist() function (left) and ggplot2 (right)]
- We can overlay a non-parametric estimate on the histogram, which is a smooth line that goes through the histogram bars
# base function
hist(GSPC$ret.log, breaks=50, main="", xlab="Return", ylab="",prob=TRUE)
lines(density(GSPC$ret.log,na.rm=TRUE),col=2,lwd=2)
box()
# ggplot function
ggplot(GSPC, aes(ret.log)) +
geom_histogram(aes(y = ..density..), bins=50, color="black", fill="white") +
geom_density(color="red", size=1.2) +
theme_bw()
[Figure: histograms of the returns with a kernel density estimate overlaid in red, base version (left) and ggplot2 version (right)]
- Or compare the histogram to a parametric distribution (e.g., normal)
# base function
hist(GSPC$ret.log, breaks=50, main="", xlab="Return", ylab="",prob=TRUE)
curve(dnorm(x, mean(GSPC$ret.log, na.rm=T), sd(GSPC$ret.log, na.rm=T)),
from=-10, to=10, add=TRUE, col="red",lwd=2)
box()
# ggplot function
ggplot(GSPC, aes(ret.log)) +
geom_histogram(aes(y = ..density..), bins=50, color="black", fill="white") +
stat_function(fun = dnorm, colour = "red",
args = list(mean(GSPC$ret.log, na.rm=T), sd(GSPC$ret.log, na.rm=T)), size=1.2) +
theme_bw()
[Figure: histograms of the returns with a fitted normal density overlaid in red, base version (left) and ggplot2 version (right)]
Dates and times in R
[1] "2011-07-17"
[1] "2011-07-17"
[1] "2011-07-17"
[1] "2011-07-17"
[1] "2011-07-17"
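The calls that produced the five identical outputs above did not survive the extraction; they were presumably variants of as.Date() with different input formats, for example:

```r
as.Date("2011-07-17")                          # default ISO format
as.Date("2011/07/17")                          # slashes also work
as.Date("17/07/2011", format = "%d/%m/%Y")     # day/month/year needs a format
as.Date("July 17, 2011", format = "%B %d, %Y") # month name (locale-dependent)
as.Date(15172, origin = "1970-01-01")          # days since the epoch
```

Each call returns the same date, "2011-07-17".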
- One operation we might want to do with dates is to calculate the difference between two dates
- This can be done by subtracting two dates or using the difftime() function, which also allows you to specify the unit of time
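A short sketch with two arbitrary dates:

```r
d1 <- as.Date("2011-07-17")
d2 <- as.Date("2017-09-01")
d2 - d1                           # "Time difference of 2238 days"
difftime(d2, d1, units = "days")  # same, explicitly in days
difftime(d2, d1, units = "weeks") # or in weeks
```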
Time
- In addition to the date, we might need to specify the time of the day
- This is useful when dealing with intra-day data, such as the FX data that we discussed earlier and shown below
- To work with time we can use two functions:
  - strptime()
  - as.POSIXlt()
- Both require you to specify the format of the date and time part
- The format of the time is: %H hour (out of 24), %M minute, %S seconds, and %OS fractional seconds
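A sketch of both functions (the timestamp is invented; tz is set so the result does not depend on the machine):

```r
t1 <- strptime("2011-07-17 14:30:05", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
t2 <- as.POSIXlt("17/07/2011 14:30:05.041",
                 format = "%d/%m/%Y %H:%M:%OS", tz = "UTC")
t1$hour  # 14
t2$sec   # 5.041 (fractional seconds via %OS)
```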
lubridate package
library(lubridate)
ymd("20110717")
ymd("2011/07/17")
ymd_hm("20110717 01:00")
ydm_hms("20111707 00:00:00.041")
[1] "2011-07-17"
[1] "2011-07-17"
[1] "2011-07-17 01:00:00 UTC"
[1] "2011-07-17 00:00:00 UTC"
- The package provides functions that make it easy to extract the year, month, day, day of the week/month/year, minute, second, etc.
[1] 2011
[1] 7
[1] 17
[1] Sunday
Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < Friday < Saturday
[1] 0
[1] 0.041
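The calls behind the outputs above are presumably the lubridate extractor functions, e.g.:

```r
library(lubridate)
x <- ymd_hms("2011-07-17 00:00:00.041")
year(x)                # 2011
month(x)               # 7
day(x)                 # 17
wday(x, label = TRUE)  # Sunday
minute(x)              # 0
second(x)              # 0.041
```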
dplyr package
mutate and select
library(dplyr)
library(lubridate)
Date Open High Low Close Volume Adjusted range ret.c2c year month wday
8334 2018-01-23 2835.1 2842.2 2830.6 2839.1 3519650000 2839.1 0.41073 0.217200 2018 1 Tuesday
8335 2018-01-24 2845.4 2853.0 2824.8 2837.5 4014070000 2837.5 0.99195 -0.056013 2018 1 Wednesday
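The code that produced the columns above is not shown in this extract; a sketch consistent with them (on a made-up two-row data frame; defining range as the log high-low spread in percent is an assumption based on the printed values) might be:

```r
library(dplyr)
library(lubridate)
GSPC.df <- data.frame(Date = as.Date(c("2018-01-23", "2018-01-24")),
                      Open = c(2835.1, 2845.4), High  = c(2842.2, 2853.0),
                      Low  = c(2830.6, 2824.8), Close = c(2839.1, 2837.5))
GSPC.df <- GSPC.df %>%
  mutate(range = 100 * log(High / Low),  # intra-day range (assumed definition)
         year  = year(Date),
         month = month(Date),
         wday  = wday(Date, label = TRUE))
select(GSPC.df, Date, Close, range, wday)
```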
filter()
- The filter() function is used to select specific rows of the data frame
Date Open High Low Close Volume Adjusted range ret.c2c year month wday
1 1985-01-08 164.24 164.59 163.91 163.99 92110000 163.99 0.41400 -0.152332 1985 1 Tuesday
2 1985-01-15 170.51 171.82 170.40 170.81 155300000 170.81 0.82988 0.175790 1985 1 Tuesday
3 1985-01-22 175.23 176.63 175.14 175.48 174800000 175.48 0.84715 0.142568 1985 1 Tuesday
4 1985-01-29 177.40 179.19 176.58 179.18 115700000 179.18 1.46727 0.998381 1985 1 Tuesday
5 1985-02-05 180.35 181.53 180.07 180.61 143900000 180.61 0.80753 0.144058 1985 2 Tuesday
6 1985-02-12 180.51 180.75 179.45 180.56 111100000 180.56 0.72182 0.027697 1985 2 Tuesday
7 1985-02-19 181.60 181.61 180.95 181.33 90400000 181.33 0.36408 -0.148791 1985 2 Tuesday
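A self-contained sketch of filter() (the data frame is made up; 1985-01-01 was a Tuesday, which matches the rows above):

```r
library(dplyr)
df <- data.frame(Date = seq(as.Date("1985-01-01"), by = "day", length.out = 30),
                 ret  = round(sin(1:30), 3))  # made-up returns
df$wday <- as.POSIXlt(df$Date)$wday           # 0 = Sunday, ..., 2 = Tuesday
tuesdays <- filter(df, wday == 2)             # keep only the Tuesday rows
nrow(tuesdays)  # 5
```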
group_by()/summarize()
# A tibble: 5 x 4
wday AV.RET MIN.RET MAX.RET
<ord> <dbl> <dbl> <dbl>
1 Monday 0.010565 -22.8997 10.9572
2 Tuesday 0.066759 -5.9108 10.2457
3 Wednesday 0.054635 -9.4695 8.7089
4 Thursday 0.016346 -7.9224 6.6923
5 Friday 0.019671 -7.0082 6.1328
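The tibble above can be reproduced in structure (not in values) with a grouped summary on made-up data:

```r
library(dplyr)
set.seed(1)
df <- data.frame(wday = rep(c("Monday", "Tuesday", "Wednesday"), each = 50),
                 ret  = rnorm(150))  # made-up daily returns
stats <- df %>%
  group_by(wday) %>%
  summarize(AV.RET  = mean(ret),
            MIN.RET = min(ret),
            MAX.RET = max(ret))
stats
```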
- The grouping can also be done on two variables, for example month and year
# A tibble: 397 x 5
# Groups: month [?]
month year AV.RET MIN.RET MAX.RET
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1985 0.393875 -0.54228 2.2566
2 1 1986 0.010744 -2.76472 1.4843
3 1 1987 0.589429 -1.40073 2.3024
4 1 1988 0.198181 -7.00824 3.5231
5 1 1989 0.327143 -0.87157 1.4854
6 1 1990 -0.324090 -2.61989 1.8710
7 1 1991 0.184905 -1.74726 3.6642
8 1 1992 -0.091477 -1.11960 1.4615
9 1 1993 0.035106 -0.87605 0.8903
10 1 1994 0.152304 -0.58097 1.1363
# ... with 387 more rows
Does volatility of the S&P 500 vary over time?
Creating functions in R
2. The task is very complex and you prefer to break it down into smaller tasks that make the code easier to read, interpret, and test.
3. Once you write a function, you can use it again in future analyses
- A function is a set of operations applied to some data; the general structure is:
myfunction <- function(input)
{
  # operations that produce output from input
  return(output)
}
A function to calculate the sample average
mean(GSPC$ret.log, na.rm=T)
[1] 0.02921
- Below is the code that defines a new function called mymean()
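The definition itself did not survive the extraction; a minimal version consistent with the output (note it must skip missing values to match mean(..., na.rm=T)) might be:

```r
mymean <- function(Y)
{
  Y   <- Y[!is.na(Y)]            # drop missing values (like na.rm = TRUE)
  avg <- sum(as.numeric(Y)) / length(Y)
  return(avg)
}
mymean(c(1, 2, NA, 3))  # 2
```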
mean(GSPC$ret.log, na.rm=T)
[1] 0.02921
mymean(GSPC$ret.log)
[1] 0.02921
Loops in R
- Loops are a useful tool when you want to perform the same set of operations on several time series or datasets
- A common loop is the for loop, which has the following syntax:
for (i in 1:N)
{
# write your commands here
}
for (i in 1:3)
{
print(i)
}
[1] 1
[1] 2
[1] 3
mysum() function using a for loop
mysum <- function(Y)
{
  Y <- Y[!is.na(Y)]                  # drop missing values
  N <- length(Y)
  sumY <- 0
  for (i in 1:N)
  {
    sumY = sumY + as.numeric(Y[i])   # current sum is equal to previous sum
  }                                  # plus the i-th value of Y
  return(sumY)                       # as.numeric(): makes sure to transform
}                                    # from other classes to a number
mysum(GSPC$ret.log)
[1] 206.55
sum(GSPC$ret.log, na.rm=T)
[1] 206.55
A simulation exercise
- Simulations are used to evaluate some quantity (e.g., the price of an option or an estimator) based on a large number of samples generated from a certain distribution
- The recipe works as follows:
  1. generate random values from a model
  2. calculate the quantity of interest
  3. repeat 1 and 2 many times
S = 5000 # set the number of simulations
N = 1000 # set the length of the sample
mu = 0 # population mean
sigma = 2 # population standard deviation
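The three-step recipe can be completed with a loop over the S samples (taking the sample mean as the quantity of interest is an assumption here, since the slide shows only the setup):

```r
S     <- 5000   # number of simulations
N     <- 1000   # length of each sample
mu    <- 0      # population mean
sigma <- 2      # population standard deviation

set.seed(123)
means <- numeric(S)                 # storage for the S estimates
for (s in 1:S)
{
  Y        <- rnorm(N, mu, sigma)   # 1. generate random values from the model
  means[s] <- mean(Y)               # 2. calculate the quantity of interest
}                                   # 3. the loop repeats 1 and 2 S times
plot(density(means), main = "Distribution of the sample mean")
```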
[Figure: density of the simulated quantity across the S samples]