
BE368: Finance Research Techniques Using Matlab
Lecture 3: Importing and Analysing Financial Data

Mark Hallam

University of Essex

27 Oct 2022



BE368: Lecture 3
Outline for Today’s Lecture

Importing and Analysing Financial Data


- Importing data: preparing your source data, importing data
  into Matlab
- Table objects: creating and indexing table objects
- Timetable objects: date numbers and strings, creating and
  indexing timetables
- Basic numerical analysis of data: calculating summary
  statistics, moments, min/max, autocorrelation functions, etc.
- Graphical analysis of data: standard plots, labelling figures
  and axes, plotting multiple series

Importing Data
- In this module you will usually be provided with the data you
  need for any questions in Matlab format, but it is important
  to know how you would import it yourself
- Matlab can read and import data from many common file
  formats (Excel, CSV, etc.)
- Excel files usually import easily, but it’s best to make sure you
  don’t have any complex formatting, formulas, etc.
- Just prepare a simple Excel file containing only your data
  values, variable names and dates
- Note: in your Excel file the data need to be formatted in
  column/vertical form, with each column containing the data
  for one variable

Importing Data
- To start importing data, click on the ‘Import Data’ button
  found in the ‘Home’ tab of Matlab’s menu bar
- Note: small details of the interface may differ a little between
  Windows and Mac computers, or between different versions of Matlab
- After clicking on the ‘Import Data’ button and selecting the
  file you want to open, Matlab opens the data import window

Importing Data
- Here you can choose which rows/columns of the data you
  want to import into Matlab by resizing the blue shaded box in
  the spreadsheet area, as you would to select data in Excel
- If needed, you can change the row where variable names are located
- Under ‘Output Type’ you can choose what type of object you
  want Matlab to save the data into - the default choice is
  usually a table object, which we will discuss shortly
- Alternatively, you can import the values for all data series into
  a single matrix variable (select ‘Numeric Matrix’) or each
  variable into a separate vector variable (‘Column vectors’)
- There are also ‘String Array’ and ‘Cell Array’ options, which
  are useful for saving variable names or other text data
- There are also options for choosing what Matlab should do
  about any missing or unreadable values present in the
  imported data (default is to replace them with NaNs)
- Once you have finished changing the import options, click on
  the ‘Import Selection’ button and your variables should
  appear in your workspace
- The variable names assigned by Matlab will be based on the
  filename of the imported file and/or the variable names
  contained in that file, depending on the object type chosen
- You can also import into more than one object type
  (e.g. table and numeric matrix versions of the same data) by
  using the import window multiple times
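
The same import can also be done from the command line rather than through the import window. The following is only a minimal sketch: the two series and their values are made up, and the file is a temporary CSV written by the script itself so that the example runs on its own. It assumes a Matlab version that has readmatrix (R2019a or later).

```matlab
% Hypothetical data, written to a temporary CSV so the example is self-contained
T0 = table([3300; 3310; 3295], [5900; 5950; 5925], ...
    'VariableNames', {'SP500', 'FTSE100'});
fname = [tempname '.csv'];
writetable(T0, fname);

% Import as a table (variable names are read from the header row)
T = readtable(fname);

% Import the same values as a plain numeric matrix
X = readmatrix(fname);

delete(fname);   % remove the temporary file
```

For an Excel file the call has the same form, e.g. readtable with the name of your own .xlsx file.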

Importing Data
- If you have variable names in one row that you want to import
  as a separate object, you can select them, choose ‘String
  Array’ or ‘Cell Array’ and then click ‘Import Selection’
- This will create a string array or cell array containing those
  variable names, which you can then use for creating other
  objects, labelling graphs, etc.
- However, if you import the complete dataset into a table, you
  can always extract the variable names from there later
- Newer versions of Matlab normally recognise dates in
  Excel files correctly and convert them to Matlab format
  dates (discussed later), but this does depend on your dates in
  Excel being formatted in a standard way

Table Objects

- For some time the default import object type in Matlab has
  been a table object, which stores data for multiple
  series/variables in a single object
- Data for each variable is stored column-wise and each
  column/variable has a name stored within the table/timetable
  object that can be used for indexing
- They can optionally also contain extra information, such as
  descriptions of the series and row names, but we will not use
  those options here
- Each column can contain a different data type, e.g. numerical
  values in the first two columns and text values in the third
  column
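
A table like the one described above can also be built by hand with the table function. This is a small sketch with made-up values and a hypothetical text column named Signal; it is here only to show columns of different types sitting in one object.

```matlab
% Made-up values for illustration only
SP500   = [3300; 3310; 3295];
FTSE100 = [5900; 5950; 5925];
Signal  = {'buy'; 'hold'; 'sell'};   % text column (cell array of char vectors)

% Each input becomes one column; variable names are taken from the input names
table1 = table(SP500, FTSE100, Signal);

disp(table1.Properties.VariableNames)  % names stored inside the table
disp(class(table1.SP500))              % numeric column
disp(class(table1.Signal))             % text column
```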

Table Objects
- Opening a table object in the Matlab Variables window will
  show the series with variable names at the top

Structure Objects
- Before we talk about indexing table objects, we first introduce
  structure objects, since these are indexed in a similar way
- Like cell arrays, structures can be used to contain
  objects/information of different types within a single object
  e.g. string arrays together with numerical values
- The main difference is that each ‘field’ within a structure
  object has both a value and a name - opening a structure
  object will show you all of the field names and their values
- This makes them useful for tasks like saving all the options,
  specifications, parameter values, etc. for a model you
  estimated into a single object
- Indeed, Matlab uses structures (or objects that behave very much
  like structures) for exactly this purpose in many of the finance
  and econometrics functions (see e.g. fitlm in lecture 4)

- An empty structure can be created using the struct function
  e.g. struct1 = struct()
- Fields and their values are added using the period or full stop
  e.g. struct1.P = 3 sets the value stored in the field named
  ‘P’ to 3, or struct1.estimationmethod = 'MLE' sets the
  value in field ‘estimationmethod’ to ‘MLE’
- If these fields already exist in the structure, the value stored
  there will be replaced; if not, the fields will be created
- Indexing is also done using the period/full stop and the field
  name, e.g. struct1.P to index the value for field ‘P’ or
  struct1.P = 5 to overwrite the field value
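
The structure commands above can be collected into a short script. The field names follow the slide; fieldnames is a standard Matlab function added here to list the fields.

```matlab
struct1 = struct();                 % empty structure
struct1.P = 3;                      % creates field 'P' with value 3
struct1.estimationmethod = 'MLE';   % creates a field holding text
struct1.P = 5;                      % field already exists, so the value is replaced

p = struct1.P;                      % read (index) a field value
names = fieldnames(struct1);        % cell array of all field names
```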

Indexing Table Objects
- There are several ways to index tables, depending on which
  part of the table contents you need and what format you want
  the output in (vector/matrix or table)
- If you want to directly index the data contained in the table, it
  is generally simplest to use ‘dot’ notation, which is similar to
  the notation for structure objects
- To index all the data for a single variable in the table, we
  simply use [table name].[variable name]
- Using the previous table as an example, table1.SP500 will
  give all the data for the variable named ‘SP500’ as an output
  in the form of a numeric (vector) object
- If we want to index only specific rows, we indicate the rows in
  parentheses at the end of the indexing command e.g.
  table1.SP500(1:100) for the first 100 observations
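
As a quick illustration of dot indexing, here is a sketch on a small stand-in for table1 (four made-up rows, so the row range is 1:2 rather than 1:100).

```matlab
% Small stand-in for table1 with made-up values
table1 = table([3300; 3310; 3295; 3320], [5900; 5950; 5925; 5960], ...
    'VariableNames', {'SP500', 'FTSE100'});

x       = table1.SP500;        % all rows of one variable, as a numeric vector
xFirst2 = table1.SP500(1:2);   % only the first two observations
```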
- The ‘dot’ indexing for tables also has some additional options
- To index data for all variables in the table, we can use [table
  name].Variables e.g. table1.Variables will give all the
  data for all variables as a numeric matrix (again, specific rows
  can be indexed using parentheses)
- Note: This is only possible if all table columns contain
  numerical data - if for example one column contains datetime
  values, Matlab will give an error because all columns in a
  numerical matrix must be the same data type
- We can also index the other properties of the table using
  [table name].Properties.[Property Name] - the most
  useful example is variable names, which are indexed as
  [table name].Properties.VariableNames
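
A short sketch of these two options, again on a small made-up stand-in for table1:

```matlab
% Small stand-in for table1 with made-up values
table1 = table([3300; 3310; 3295], [5900; 5950; 5925], ...
    'VariableNames', {'SP500', 'FTSE100'});

X     = table1.Variables;                 % all data as a 3-by-2 numeric matrix
Xsub  = X(2:3, :);                        % rows 2 and 3 of that matrix
names = table1.Properties.VariableNames;  % 1-by-2 cell array of variable names
```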
- Tables can also be indexed directly by rows and columns in
  parentheses without the dot, but this gives output as a table
  object instead of a numerical vector/matrix variable as before
- For example, we can use table1(:,'SP500') and
  table1(1:100,'SP500') to extract the same contents of the
  table as the earlier examples, but as table objects
- We can also use column/variable numbers instead of names -
  SP500 is variable 7, so table1(:,7) is the same as
  table1(:,'SP500')
- You can also specify multiple variables/columns using the
  same indexing notation we saw last week for numeric matrix
  objects e.g. table1(:,2:5) for variables 2 to 5
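
The sketch below repeats the parentheses indexing on a small made-up table. Note that in this stand-in SP500 is variable 1, not variable 7 as in the lecture dataset.

```matlab
% Small stand-in for table1 with made-up values
table1 = table([3300; 3310; 3295], [5900; 5950; 5925], [1.30; 1.29; 1.31], ...
    'VariableNames', {'SP500', 'FTSE100', 'GBPUSD'});

A = table1(:, 'SP500');     % one variable, all rows, returned as a table
B = table1(1:2, 'SP500');   % first two rows of the same variable
C = table1(:, 1);           % same as A, using the column number
D = table1(:, 2:3);         % variables 2 to 3
```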

Converting Table Objects
- Complete tables, or parts of tables you have indexed using
  parentheses, can be converted to standard numerical
  matrices/arrays using the ‘table2array’ function
- This is useful because most Matlab functions will not directly
  accept a table object as an input, so you need to first convert
  the values to a standard array
- Examples would be table2array(table1) for the whole of
  ‘table1’, or table2array(table1(100:end, 5:7)) for rows
  100 onwards for variables 5 to 7
- Note: table2array works only for purely numerical variables
  in a table, not non-numerical variables like strings
- We can also use the ‘table2timetable’ function to convert
  table objects to timetable objects, which we will discuss now
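
The table2array conversion can be sketched as follows, using a small made-up table in place of table1 (so the row and column ranges are smaller than in the slide).

```matlab
% Small stand-in for table1 with made-up, purely numerical values
table1 = table([3300; 3310; 3295], [5900; 5950; 5925], ...
    'VariableNames', {'SP500', 'FTSE100'});

X     = table2array(table1);               % whole table as a 3-by-2 matrix
Xpart = table2array(table1(2:end, 1:2));   % rows 2 onwards, variables 1 to 2

m = mean(X);   % the array can now be passed to ordinary numeric functions
```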
Timetable Objects
- Timetable objects behave very similarly to table objects, but
  are designed specifically for storing time series data
- The main difference is that they must contain date (and
  optionally time of day) information for each row/observation
- This date information is very useful when organising or
  working with datasets, because it allows us to index
  rows/observations by their dates instead of numbers
- For example, with simple commands we can index the value(s)
  for 31 August 2020, or for all of September 1990, or from 1
  January 2015 to 31 December 2016
- Timetables also give us other useful features for organising
  and editing data not possible with other objects

Date and Time Information in Matlab

- First we need to discuss the formats that Matlab can use to
  store date information - in recent versions of Matlab the
  simplest option is to use datetime arrays
- Datetime arrays can either be created automatically when
  importing appropriately formatted date values from another
  file (e.g. Excel), or manually using the ‘datetime’ function
- To save time we will only discuss the first of these methods,
  but see the Matlab help page for the ‘datetime’ function for
  lots of information on creating them manually

Datetime Arrays
- When importing data containing dates formatted in a
  standard way, Matlab will usually automatically detect them
  as dates and set their type to ‘Datetime’ instead of the
  standard ‘Number’ format in the Import box

Datetime Arrays
- As we saw before, importing the data into Matlab as a table
  object gives datetime values in the first column and numerical
  values in the remaining columns
- Dates are displayed as text (i.e. strings) in the form
  DD-MMM-YYYY, often referred to as ‘date string’ format

Creating Timetable Objects from Tables

- Note that this is still currently a table object, even though it
  contains dates formatted as datetime values and data
- We can directly convert a table to a timetable using the
  ‘table2timetable’ function e.g. table2timetable(T)
  converts table ‘T’ to a timetable using the first datetime
  variable in ‘T’ as the dates and the other variables as the data
- To save time we focus only on the ‘table2timetable’ method
  above, but timetables can also be created manually using the
  ‘timetable’ function by providing all the individual components
  separately (datetimes, variable values, variable names)
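
Both routes can be sketched on made-up data. The dates are built with the datetime function from year, month and day numbers (rather than imported), and the variable names Date, SP500 and FTSE100 are only illustrative.

```matlab
% Made-up dates and values for illustration
Date    = datetime(2020, 10, [1; 2; 5]);   % 1, 2 and 5 October 2020
SP500   = [3380; 3348; 3408];
FTSE100 = [5879; 5902; 5943];

% Route 1: build a table, then convert it (first datetime variable becomes the row times)
T   = table(Date, SP500, FTSE100);
tt1 = table2timetable(T);

% Route 2: build the timetable directly from its components
tt2 = timetable(Date, SP500, FTSE100);
```

In both cases the dates are no longer an ordinary variable: the timetable has two data variables and uses the dates as its row times.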

Indexing Timetable Objects
- Everything we discussed previously about indexing tables also
  works for timetables (‘dot’ indexing, indexing with
  parentheses, etc.), but we now have additional indexing
  methods for selecting observations by date

Indexing a single specific date:

- To index the observation(s) for a specific date we can directly
  replace the row index number in the previous table
  indexing with the date string for the date we want
- For example, using parentheses indexing to index all
  observations on 1 October 2020 in the timetable ‘tt1’ we use
  tt1('01-Oct-2020',:), or for only the variable named
  ‘SP500’ we use tt1('01-Oct-2020','SP500')
- Note: the date string must be used as the row index of the
  timetable itself - tt1.SP500 returns a plain numeric vector,
  which cannot be indexed by date
Indexing multiple dates:
- To index the observations for more than one date we can give
  a cell array containing multiple date strings as row indexes
- For all observations on 1 October 2020 and 8 October 2020
  we can use (note the braces to create the cell array):
  tt1({'01-Oct-2020','08-Oct-2020'},:)
- A more general way to do this is to index the timetable using
  a vector of datetime values
- We first create a datetime vector containing the dates we
  want using the ‘datetime’ function:
  dt1 = datetime({'01-Oct-2020', '08-Oct-2020'})
- We then use the datetime vector ‘dt1’ we created to index the
  rows of the timetable: tt1(dt1,:)
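
The single-date and multiple-date indexing can be tried on a small made-up timetable. This sketch assumes an English-language locale, so that the month abbreviation ‘Oct’ in the date strings is recognised; the datetime vector dt1 is built from numbers to avoid that dependence.

```matlab
% Made-up timetable with four dates in October 2020
Date = datetime(2020, 10, [1; 2; 5; 8]);
tt1  = timetable(Date, [3380; 3348; 3408; 3446], 'VariableNames', {'SP500'});

% Single date, given as a date string in the row position
one = tt1('01-Oct-2020', :);

% Several dates, given as a cell array of date strings
twoCell = tt1({'01-Oct-2020', '08-Oct-2020'}, :);

% Several dates, given as a datetime vector
dt1   = datetime(2020, 10, [1; 8]);
twoDt = tt1(dt1, :);

x = one.SP500;   % numeric value for 1 October 2020
```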

Indexing a range of dates:
- Finally, to index all the observations in a specific range of
  dates we can again use datetime values - for example, suppose
  we want to index all observations in January 2020
- We create datetime values for the start and end of the period
  we want: dt_start = datetime('01-Jan-2020') and
  dt_end = datetime('31-Jan-2020')
- We then use ‘dt_start’ and ‘dt_end’ to index the range of rows:
  tt1(dt_start:dt_end,:) using the usual ‘colon’ notation
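
An alternative to the colon notation, not covered in the slides, is Matlab’s timerange function, which selects every row whose time falls inside an interval. The sketch below uses made-up data running from late January into early February 2020; the 'closed' option includes both end dates.

```matlab
% Made-up daily timetable from 29 January to 2 February 2020
Date = (datetime(2020, 1, 29):datetime(2020, 2, 2))';
tt1  = timetable(Date, (1:5)', 'VariableNames', {'SP500'});

dt_start = datetime(2020, 1, 1);
dt_end   = datetime(2020, 1, 31);

% All rows dated within January 2020 (both end points included)
jan = tt1(timerange(dt_start, dt_end, 'closed'), :);
```

Here jan contains the three January rows only; the two February rows are left out.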

Converting Timetables to Numerical Matrix Variables
- The main advantage of timetable objects is that they make it
  much simpler to organise and index a dataset than standard
  numerical matrix objects, due to the variable names and dates
- There are also many useful timetable-specific functions to
  perform tasks like changing sampling frequency (e.g. convert
  from daily to monthly), fill or remove missing values, etc.
- However, many Matlab functions will not directly accept a
  timetable object as an input and require a numerical array
- Therefore, a good general strategy is to work with timetable
  objects initially when importing and organising your data
- Then we convert the timetable (or part of it) into a numerical
  matrix before performing more complex calculations - the
  ‘table2array’ function also works for timetables
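
The strategy above can be sketched as follows: organise the data as a timetable, then pull out a numeric matrix. The data are made up, and retime with the 'lastvalue' method is just one possible way to go from daily to monthly observations. The numeric matrix is extracted here with the .Variables dot indexing from earlier.

```matlab
% Made-up daily timetable spanning the end of January and start of February 2020
Date = (datetime(2020, 1, 30):datetime(2020, 2, 2))';
tt1  = timetable(Date, [1; 2; 3; 4], [10; 20; 30; 40], ...
    'VariableNames', {'SP500', 'FTSE100'});

% Change sampling frequency: keep the last observation in each calendar month
ttM = retime(tt1, 'monthly', 'lastvalue');

% Convert to plain numeric matrices for further calculations
X  = tt1.Variables;   % 4-by-2 matrix of daily values
XM = ttM.Variables;   % 2-by-2 matrix of monthly values
```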
Basic Data Analysis

- Matlab has functions for all standard descriptive statistics and
  common exploratory data analysis tools
- For example, the functions ‘mean’, ‘median’, ‘mode’, ‘std’ (for
  standard deviation), ‘var’ (for variance), ‘skewness’, ‘kurtosis’,
  ‘max’, ‘min’ and ‘range’ all do what you would expect when
  provided with a numerical matrix as an input
- Note that if the input is a matrix, by default Matlab will
  compute the specified statistic for each column of the matrix
- So if ‘X’ is a (T × K) matrix of data with T observations for
  K series then mean(X) will give a (1 × K) vector of sample
  mean values for the K variables in ‘X’
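
A small worked example of the column-wise behaviour, using a made-up (3 × 2) matrix and only functions from base Matlab:

```matlab
X = [1 2; 3 4; 5 6];   % T = 3 observations on K = 2 series (made-up numbers)

m   = mean(X);     % [3 4]  sample mean of each column
med = median(X);   % [3 4]
s   = std(X);      % [2 2]  sample standard deviation (divides by T-1)
v   = var(X);      % [4 4]
mx  = max(X);      % [5 6]
mn  = min(X);      % [1 2]
```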

Basic Data Analysis
- If for some reason you wish to calculate the mean, variance,
  etc. along each row instead, then you can do this too
- Either the second or third input for the function (see the
  relevant function help page) will be ‘dim’ - setting this to 2
  will compute the values along each row e.g. mean(X,2)
- Alternatively, you could transpose the input matrix e.g.
  mean(X') if you prefer
- To measure the strength of linear dependence between the
  columns of the matrix we can compute the sample covariance
  or correlation matrix using the ‘cov’ and ‘corr’ functions
- The ‘autocorr’ function will calculate the sample
  autocorrelation coefficients of a single series in a vector
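
The sketch below illustrates these calculations on made-up numbers. Since ‘corr’ and ‘autocorr’ are toolbox functions (Statistics and Machine Learning Toolbox and Econometrics Toolbox respectively), it uses the base-Matlab ‘corrcoef’ for the correlation matrix and computes the lag-1 sample autocorrelation by hand with the usual estimator; this is an illustration, not a replacement for ‘autocorr’.

```matlab
X = [1 2; 3 5; 5 4; 7 8];   % made-up data: 4 observations on 2 series

rowMeans = mean(X, 2);      % mean along each row (dim = 2)
rowStd   = std(X, 0, 2);    % for std, dim is the third input
C = cov(X);                 % 2-by-2 sample covariance matrix
R = corrcoef(X);            % 2-by-2 sample correlation matrix (base Matlab)

% Lag-1 sample autocorrelation of the first series, computed by hand
y    = X(:, 1);
yd   = y - mean(y);                              % demeaned series
rho1 = sum(yd(2:end) .* yd(1:end-1)) / sum(yd.^2);
```

For the first column here the demeaned values are -3, -1, 1 and 3, so rho1 = 5/20 = 0.25.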

Basic Graphical Analysis
- Matlab’s tools for creating graphs and other figures are very
  powerful and flexible, but slightly more difficult to learn than
  the tools in some other programs
- To create a new empty figure object we use the command
  figure - this will bring the new empty figure to the top of
  the Matlab windows
- To actually plot one or more series in this empty figure as line
  plots we use the plot command
- If ‘X’ is a matrix of data, then plot(X) will produce a line
  plot, with each column of ‘X’ as a separate line
- Other plot types have their own commands e.g. ‘scatter’ for a
  scatter plot, ‘histogram’ for a histogram, etc. (search ‘2-D
  and 3-D Plots’ in Matlab help for a full list)
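
A minimal sketch of these commands, using two simulated random-walk series in place of real data:

```matlab
X = cumsum(randn(100, 2));   % two simulated series (not real data)

figure;                      % open a new empty figure
h = plot(X);                 % one line per column of X; h holds the line handles

figure;
histogram(diff(X(:, 1)));    % histogram of the changes in the first series
```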
Plotting Matrix Variables
- The majority of these plotting commands will only produce
  the contents of the plot area itself i.e. the line plots, scatter
  plot, etc.
- We need to manually add axis labels, titles, legends, etc. - we
  will look at some examples of this in the lab class and the
  later lectures
- More generally, Matlab provides a huge number of options for
  adjusting the appearance of figures e.g. line and marker
  colours and styles, font types and sizes, etc.
- Some of these can be difficult to use initially, but give
  complete control over the appearance of figures that is not
  possible with other software
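
As a first taste of adding labels by hand, here is a short sketch on simulated data. The title, axis labels and legend entries are placeholders to replace with your own.

```matlab
X = cumsum(randn(100, 2));   % two simulated series (not real data)

figure;
plot(X(:, 1), 'b-', 'LineWidth', 1.5);   % solid blue line, slightly thicker
hold on;                                 % keep the first line when adding the second
plot(X(:, 2), 'r--');                    % dashed red line
hold off;

title('Simulated series');
xlabel('Observation');
ylabel('Level');
legend({'Series 1', 'Series 2'}, 'Location', 'best');
```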
