
VECTORS

rand/randn/randi
The uniformly distributed random integers generated by randi lie
between one and the specified maximum value by default.
You can specify the interval over which the random integers are
distributed using
x = randi([minVal maxVal],m,n)
This code will generate random integers between minVal and maxVal.
With a single maximum value, the integers lie between 1 and maxVal:
x = randi(maxVal,m,n)

The uniformly distributed random numbers generated by rand lie
between zero and one by default.
The rand function doesn't allow you to specify the interval over
which the numbers are distributed.
If you want uniformly distributed numbers between values a and b,
you can compute the following:
x = a + (b-a)*rand(m,n)

Similarly, normally distributed random numbers generated by the
randn function will have a mean of zero and a standard deviation of
one by default.
If you want normally distributed numbers with mean M and standard
deviation sd, you can compute the following:
x = M + sd*randn(m,n)
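
The three generators above can be tried together. This is a minimal sketch; the seed, sizes, and the values of a, b, M, and sd are illustrative assumptions, not values from these notes.

```matlab
rng(0)                          % fix the seed so the run is repeatable (illustrative)
k = randi([5 10],3,4);          % 3-by-4 matrix of integers between 5 and 10
a = 2; b = 6;
u = a + (b-a)*rand(1000,1);     % uniform numbers between a and b
M = 50; sd = 4;
g = M + sd*randn(1000,1);       % normal numbers with mean M and standard deviation sd
fprintf("u range: [%.2f, %.2f]\n",min(u),max(u))
fprintf("g mean %.2f, std %.2f\n",mean(g),std(g))
```

With 1000 samples the printed mean and standard deviation of g should land close to, but not exactly on, 50 and 4.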
reshape
You can use the reshape function to reshape an m-by-n matrix into a
p-by-q matrix, provided p*q equals m*n (data is taken column-wise).
B = reshape(A,p,q)
For convenience you can leave one of the dimensions blank using []
so that dimension is calculated automatically:
B = reshape(A,[],q)
You can reshape any matrix into a column vector using the colon
operator:
b = A(:) reshapes all data in A into a one-column vector.
To extract the data row-wise rather than column-wise, use the
transpose operator ' :
reshape(A',1,[])
This code extracts all values in A row-wise into a one-row vector.
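
A small worked case makes the column-wise order visible. The matrix A here is an illustrative assumption.

```matlab
A = [1 2 3; 4 5 6];          % 2-by-3 example matrix
B = reshape(A,3,2)           % filled column-wise: [1 5; 4 3; 2 6]
C = reshape(A,[],2);         % same result; the row count is inferred
colVec = A(:)                % [1; 4; 2; 5; 3; 6]
rowWise = reshape(A',1,[])   % [1 2 3 4 5 6]
```

Note that B is not the transpose of A: reshape only re-reads the same column-major sequence of values into a new shape.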
ELEMENT EXTRACTION FROM A MATRIX
To extract elements from a matrix:
Msmall = M(2:3,[2 4])

This extracts the elements in rows 2 and 3 and columns 2 and 4.

To extract the corner elements:
Mcorners = M([1 end],[1 end])

MATRIX MULTIPLICATION
A*B is the matrix product; it requires the number of columns of A to
equal the number of rows of B. A.*B is element-wise multiplication
and in general gives a different result.
We can use .* (or ./) when we want to operate between each element
of A and the corresponding element of B. A and B must have
compatible sizes (two inputs have compatible sizes if, for every
dimension, the dimension sizes of the inputs are either the same or
one of them is 1). MATLAB implicitly expands arrays with compatible
sizes to be the same size during the execution of the element-wise
operation or function.
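
The difference between the two products, and implicit expansion, can be checked on small inputs. The matrices below are illustrative assumptions.

```matlab
A = [1 2; 3 4];
B = [5 6; 7 8];
P = A*B          % matrix product:        [19 22; 43 50]
E = A.*B         % element-wise product:  [5 12; 21 32]
col = [1; 2; 3]; % 3-by-1
row = [10 20];   % 1-by-2
X = col.*row     % compatible sizes expand to 3-by-2: [10 20; 20 40; 30 60]
```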
isnan/max/min/mean/nnz/sort/diff/cumsum
You can identify and count missing values in a vector. To identify
missing values in a vector, use the isnan function.
isnan(v)
The output is a logical vector where missing values are marked with
a 1 for true.
You can count the number of missing values with the nnz (number non-
zero) function.
nnz(vLogical)

You can ignore missing values in the mean calculation by using the
"omitnan" flag.
mean(v,"omitnan")

You can use the max function to calculate the largest element of a
vector and the index where it occurs by requesting two outputs.
[m,idx] = max(v);

You can sort data from largest to smallest using the "descend"
option for the sort function.
[xSorted,idx] = sort(v,"descend")
The sort function orders arrays from smallest to largest by default.
You can achieve the same result by passing "ascend" as the second
input to sort.
If there are missing values:
You can specify whether you want the missing values at the top or
bottom with the MissingPlacement option.

[desc,idx] = sort(usageDiff,"descend",...
"MissingPlacement","last")

The min and max functions also calculate the minimum and maximum of
the columns of a matrix. They omit missing values by default, so
you only need to provide the matrix.

Many statistical functions accept an optional dimensional argument
that specifies whether the operation should be applied to the
columns independently (the default) or to the rows.
To get the mean of each row, specify 2 as the second input.
mean(M,2)
mean(M,2,"omitnan")

Some statistical functions, like min and max, are comparison
functions: their second input is reserved for a second array to
compare against. You can specify the dimension as the third input.
Skip the second input with square brackets, [].
min(A,[],dim)

You can also get the mean, minimum, and maximum of all of the
elements in an array by specifying the dimension as "all".
minUse = min(usage,[],'all')
maxUse = max(usage,[],'all')
avgUse = mean(usage,'all','omitnan')

To calculate the difference between elements of a vector, use the
diff function.
diff(v)
The diff function also works on matrices column-wise, meaning it
calculates the difference vector of each column. The output is a
matrix with one fewer row.

For the max and min functions, you can request a second output that
gives the row index of the maximum (or minimum) of each column.

The cumsum function calculates the sum of an element with all the
previous elements of a vector. If the input is a matrix, the sums
are calculated by column, and the output is a matrix of the same
size.
cumsum(M,"omitnan")
To operate along rows instead, pass the dimension as the third input
to diff:
d = diff(M,[],2)
bar(sum(d,'omitnan'))
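
The functions in this section can be exercised on a short vector and a small matrix with missing values. The data below is an illustrative assumption.

```matlab
v = [3 NaN 7 1];
nMissing = nnz(isnan(v))          % 1
avg = mean(v,"omitnan")           % (3+7+1)/3
[m,idx] = max(v)                  % m = 7, idx = 3 (NaN is ignored)
[s,order] = sort(v,"descend","MissingPlacement","last")  % s = [7 3 1 NaN]
M = [1 4; 2 NaN; 4 9];
colMeans = mean(M,"omitnan")      % one value per column: [7/3 6.5]
rowMeans = mean(M,2,"omitnan")    % one value per row:    [2.5; 2; 6.5]
d = diff(M(:,1))                  % [1; 2]
c = cumsum(M(:,1))                % [1; 3; 7]
```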

PLOT
stairs/stem/area/scatter/bar/plot/surf/mesh/contour/histogram
Stairs plot:
stairs(xdata,ydata)

Stem plot:
stem(xdata,ydata)
Area plot:
area(xdata,ydata)

Scatter plot: scatter(Data1,Data2)

To change the marker size, pass it as the third input:
scatter(Data1,Data2,30) sets a marker size of 30.
The color can be added as the fourth input:
scatter(Data1,Data2,30,'r')

The scatter documentation provides other optional inputs to
customize the plot. For example, this command creates the scatter
plot with filled markers colored by Year.
scatter(Australia,Canada,30,Year,"filled")
colorbar
colorbar adds a color bar showing how the marker colors correspond
to the values in Year (Year supplies the color data, so it has to be
a numeric array).
You can use the text function to add text to a plot:
text(x,y,txt)
where x and y are numeric arrays and txt is a string array (the same
length as x and y). As with all visualization functions, you can set
visual properties such as color with extra name-value input pairs:
text(x(k),y(k),txt,"Color",'c')

Bars in a bar graph often represent quantities of something best
described as text.
You can use a string array as input to the xticklabels function to
modify the x-axis tick labels.
xticklabels(["Label 1" "Label 2" ...]).
You can set the x-axis tick values with the xticks function.
xticks([1 12])
xticklabels(["First Month" "Last Month"])

You can change the angle of the text using the xtickangle function.
xtickangle(15)

ylabel(txt)
If we want to create a multiline annotation, the string array has to
be a column vector,
e.g. ylabel(["Area";"\pi r^2"])

histogram(data)
The histogram function can bin on integer values. Use the BinMethod
property name and specify the value as "integers".
histogram(data,"BinMethod","integers")
Classic plot:
To specify marker size and face color:
plot(x,y,'o',MarkerSize=10,MarkerFaceColor='g')
Other properties: LineWidth, MarkerEdgeColor
To plot a line with a specific color, set the "Color" property to an
RGB vector, e.g. plot(x,y,Color=[0.5 0.6 0])
The same properties can be passed as name-value pairs in quotes:
plot(x,y,'o-',"Color",[.6 .2 .8],"MarkerEdgeColor",'c')

To get the current axis limits:
xyLims = axis
If you want to set exactly the range of the data as limits:
axis tight
Other axis options: axis equal, axis square, axis normal

surf(x,y,Z)
To specify x and y coordinates, you can pass them in as vectors.
The number of elements of x must match the number of columns of Z.
The number of elements of y must match the number of rows of Z.

LOGICAL COUNTING
any/nnz/all
any(logical vector)
You can use any to find if any of the elements in a logical array
are true

You can use the function all to find if all the elements in a
logical vector are true. It operates by column if it is a matrix
all(logical vector)
To operate row-wise: all(v, 2)

You can use the function nnz to count the Number of Non-Zero values
in an array, or the number of true values in an array.
nnz(v)
It always returns a scalar, regardless of whether the input is a
vector or a matrix.

LOGICAL INDEXING
find
You can use the function find to find the indices of the true values
in a logical vector.
v = [false true true false];
idx = find(v)
idx =
[2 3]

You can obtain the locations of just the first or last n true values
in a logical vector by providing optional extra arguments to find.
idx = find(v,n,dirn)
where dirn is either "first" or "last".
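
A short sketch ties the logical counting functions and find together. The vector v and the threshold 6 are illustrative assumptions.

```matlab
v = [5 12 7 20 3];
mask = v > 6;                   % logical vector [0 1 1 1 0]
anyBig = any(mask)              % true: at least one element exceeds 6
allBig = all(mask)              % false: not every element exceeds 6
nBig = nnz(mask)                % 3
idx = find(mask)                % [2 3 4]
lastTwo = find(mask,2,"last")   % [3 4]
bigVals = v(mask)               % logical indexing: [12 7 20]
```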

TABLES OF DATA
readtable/summary/table/array2table
Use the readtable function to import the data from a file into a
table in MATLAB.
data = readtable("dataFile.txt")
To ensure text is imported as a string, set the TextType property to
string.
data = readtable("dataFile.txt","TextType","string")
You can use the summary function to display a summary of each
variable.
summary(data)

You can organize your workspace variables into a table with the
table function. The following code creates a table, data, with
variables a, b, and c.
data = table(a,b,c)
data = table(a,b,c,'VariableNames',{'Var1','Var2','Var3'})
It also works with name=value syntax:
data = table(a,b,VariableNames=["Var1","Var2"])
Variable names listed in [] must be strings; otherwise use a cell
array of character vectors:
data = table(a,b,VariableNames={'Var1','Var2'})
IMPORTANT: the form that is safest across MATLAB releases passes the
parameter name as a character vector (i.e. 'VariableNames') followed
by the variable names in a cell array of character vectors
(i.e. {'var1','var2'}).

You can use the array2table function to convert a matrix to a table.
The following code creates a table named data from a matrix, A.
data = array2table(A)
To create custom variable names in the table, follow the variable
input with the property VariableNames and a string array of text.
The following code creates a table named data with custom variable
names, X and Y.
data = array2table(A,"VariableNames",["X" "Y"])
IMPORTANT: array2table accepts the variable names either as a cell
array of character vectors (i.e. {'var1','var2'}) or as a string
array. The VariableNames parameter name can be written in single or
double quotation marks; there is no difference.
A syntax that works with both array2table and table is:
array2table(B,'VariableNames',{'Var1' 'Var2'})

table(a,b,c,'VariableNames',{'Var1' 'Var2' 'Var3'})
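
Both constructors can be tried on small inputs. This is a sketch only; the variables a, b, c and the names Id, Score, and Label are illustrative assumptions.

```matlab
a = [1; 2; 3];
b = [10; 20; 30];
c = ["x"; "y"; "z"];
% Build a table from workspace variables with custom names
T1 = table(a,b,c,'VariableNames',{'Id','Score','Label'});
% Build a table from a numeric matrix with the same naming syntax
T2 = array2table([a b],'VariableNames',{'Id','Score'});
disp(T1)
names = T1.Properties.VariableNames   % cell array of the three names
```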

You can use dot notation to extract data for use in calculations or
plotting.
tw = EPL.HW + EPL.AW
You can also use dot notation to create new variables in a table
(here TW is an illustrative name):
EPL.TW = EPL.HW + EPL.AW

SORTING TABLE
sortrows
You can sort a table on a specific variable using the sortrows
function.
tSort = sortrows(tableName,"SortingVariable")

By default, the rows are sorted on that variable from smallest to
largest.
You can use the "descend" option to sort in descending order.
tSort = sortrows(tableName,"SortingVariable","descend")
To sort by more than one variable, supply them in order to the
sortrows function as a string array.
tSort = sortrows(tableName,["var1" "var2"],"descend")
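
The effect of the second sorting variable shows up when the first one has ties. The table below is made-up illustrative data, not the EPL data used elsewhere in these notes.

```matlab
Team = ["A"; "B"; "C"; "D"];
Wins = [10; 12; 10; 7];
Goals = [30; 25; 41; 18];
T = table(Team,Wins,Goals);
byWins = sortrows(T,"Wins","descend");            % ties in Wins keep their original order
byBoth = sortrows(T,["Wins" "Goals"],"descend");  % ties in Wins are broken by Goals
disp(byBoth)
```

In byBoth, team C comes before team A because both have 10 wins and C has more goals.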
EXTRACTING PORTIONS OF A TABLE
removevars/movevars
You can create a subset of the original table using regular array
indexing with parentheses.
You can extract rows and columns in whatever order you like. For
instance, this will have the first three rows of M but the second
and third rows will be swapped
M([1 3 2],:)
When indexing into a table, it's often easier to remember a variable
name as opposed to figuring out the specific column number.
So, as an alternative to numeric indexing, you can index using the
variable name in double quotes.
hmWins = EPL(:,"HomeWins");
You can select multiple variables by name using a string vector of
variable names as input.
wins = EPL(:,["HomeWins" "AwayWins"]);
Remember that you can remove elements from a numeric array by
assigning them to the empty array.
x = [4 7 1 8 2 9];
x(2:3) = []
x =
4 8 2 9

You can remove variables from a table in the same way.
T(:,["VarName1" "VarName2"]) = []
Alternatively, you can use the table-specific function removevars to
remove variables from a table.
T = removevars(T,["VarName1" "VarName2"])
With numeric arrays, you can use indexing to reorder elements.
x = [6 -7 9 8];
x = x([1 3 2 4])
x =
6 9 -7 8
You can reorder variables in a table in the same way. For example,
to swap the first and second variables of T, you can use the command
T = T(:,[2 1 3:end])

You can use the movevars function to move a variable (or variables)
to the position "Before" or "After" another variable.
T = movevars(T,vars2move,"After",posVar)
You can reference vars2move and posVar using numeric or named
indexing.
Try using movevars to move the team names to the end of the table.
EPL = movevars(EPL,"Team","After","AwayLosses")

Use the writetable function to create a file from a table.
writetable(tableName,"fileName")

You can specify a delimiter, such as tab, when creating the file.
writetable(data,"fileName.txt","Delimiter","\t")

IMPORTANT: Indexing into a table with parentheses, tablename(),
returns a table. To get the underlying data as an array (for
example a numeric matrix), use curly braces, tablename{}. Dot
notation (.) also returns the contents of a single variable rather
than a table, like {}.
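
The three kinds of indexing can be compared by checking the class of each result. The table T and its variable names X and Y are illustrative assumptions.

```matlab
T = table([1;2;3],[4;5;6],'VariableNames',{'X','Y'});
sub = T(1:2,"X");            % parentheses: the result is still a table
vals = T{1:2,["X" "Y"]};     % curly braces: 2-by-2 numeric matrix
col = T.Y;                   % dot notation: 3-by-1 numeric vector
disp(class(sub))             % table
disp(class(vals))            % double
disp(class(col))             % double
```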

COMBINING TABLE DATA


You can horizontally concatenate two tables using square brackets,
[], if they have the same number of observations but do not share
any variable names.
newTable = [table1 table2]

You can use the join function to merge two tables using any
variables with the same names as key variables.
newTable = join(table1,table2)
The join function matches up rows using variables with the same name
as key variables. All the variables from the first table are
retained, and the corresponding values of the non-key variables from
the second table are added to them.
The order of the inputs matters!
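
A minimal sketch of join with one shared key variable. The team letters, win counts, and city names are made-up illustrative data.

```matlab
Team = ["A"; "B"; "C"];
Wins = [10; 12; 7];
left = table(Team,Wins);
Team = ["C"; "A"; "B"];          % same keys, different row order
City = ["Leeds"; "York"; "Bath"];
right = table(Team,City);
merged = join(left,right)        % Team is the key; rows follow the order of left
```

The result keeps the row order of the first table and looks up the matching City for each Team in the second one.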

Other join functions: innerjoin | outerjoin

You can access metadata from a table using the Properties property.
tableName.Properties

You can also access specific table properties using dot notation.
tableName.Properties.propName

Whenever a column heading is not a valid MATLAB variable name, it is
replaced by a valid name and the original column header is saved in
the VariableDescriptions property of the table. VariableDescriptions
contains a cell array of text, one cell for each column header.

CELL ARRAYS
Index into a cell array using parentheses, (), to return a cell
array subset.
The following code will return the first row of regions.
countries = regions(1,:)
countries =
1×2 cell array
{'South America'} {'Europe'}

Use curly braces {} to extract the contents of a cell.
The following syntax, using parentheses, returns a 1-by-1 cell
array.
regions(1,1)
ans =
cell
{'South America'}

In contrast, curly braces will return a 1-by-13 character array.
regions{1,1}
ans =
'South America'

To change e.g. a variable name of a table:
tablename.Properties.VariableNames{index} = 'newName'
Use a character vector in single quotes here rather than a string in
double quotes.

DATES AND TIMES


datetime/days/years/hours/minutes
To represent a specific point in a timeline use datetime:
datetime(year,month,day)
If you want to specify the time, use the optional fourth through
sixth input arguments in the datetime function. They represent the
hour, minute, and second in that order.
t = datetime(2014,5,25,8,11,0)
t =
25-May-2014 08:11:00
Note that there are either three or six input values for datetime.

To create vectors and matrices of type datetime you can specify
array inputs to the datetime function. Notice that providing a row
vector as input produces a row vector, and a column vector as input
produces a column vector.
d = datetime((1929:5:1939)',10,29)
d =
29-Oct-1929
29-Oct-1934
29-Oct-1939

Create a 12-by-1 vector d4 of type datetime containing the first day
of each month in the year 1990.
d4 = datetime(1990,1:12,1)'

Create a 1-by-4 row vector named d3 of type datetime representing
the dates January 1st-4th, 1975.
d3 = datetime(1975,1,1:4)

To represent an interval
duration
If you subtract one datetime value from another you get a duration,
displayed by default in hours, minutes, and seconds (hh:mm:ss).
To express it in days:
days(durationValue)
If the input to days is a duration, it returns the number of days as
a numeric value. If the input is numeric, it returns a duration of
that many days.
You can create a duration by using the functions days, hours,
minutes, etc.
There are functions years and days, but what happens if you try
months?
months(seasonEnd - seasonStart)
Error using duration/months
Durations cannot represent calendar months.

Why does it produce an error? What would months mean as a duration?
Is it 30 days? 31 days? What about 28 or 29 days?
Duration values are constant, they do not change. However, it can be
useful to measure elapsed time in terms of something that does
change, like months. In this context, you could use a calendar
duration.
calendarDuration
You can create a calendar duration from two datetime values using
the between function.
t = datetime(2016,6,23)
b = datetime(2014,5,25)
between(b,t)
ans =
2y 29d

Just like durations, you can create a calendar duration from a
numeric input. For example, you can create a value of one month
using the calmonths function.
Create a calendar duration named om containing one calendar month.
om = calmonths(1)

You can add a calendarDuration variable to a datetime variable, or
subtract it from one.
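
The duration and calendarDuration ideas can be put side by side. This sketch reuses the two dates from the between example; pairing them with the names seasonStart and seasonEnd is an assumption made for illustration.

```matlab
seasonStart = datetime(2014,5,25);
seasonEnd = datetime(2016,6,23);
elapsed = seasonEnd - seasonStart;        % duration (fixed length of time)
nDays = days(elapsed)                     % duration in, number out: 760
oneWeek = days(7);                        % number in, duration out
cal = between(seasonStart,seasonEnd)      % calendarDuration: 2y 29d
nextMonth = seasonStart + calmonths(1)    % 25-Jun-2014
```

Adding calmonths(1) moves to the same day of the next month, whatever the length of the month in between.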

CATEGORICAL ARRAYS
categorical/categories/mergecats/summary
You can convert a string array to a categorical using the
categorical function.
catArray = categorical(stringArray);

You can see the categories represented in a categorical array using
the categories function on the categorical array.
cats = categories(categoricalarray)

You can use == to compare a categorical array with a category name,
the same way you would with string arrays, and pass the logical
result to other logical functions.

You can view the category names with the number of elements in each
category using the summary function.
summary(players.Team)

The mergecats function allows you to merge multiple categories into
one, and even give it a new name.
x = categorical([ 2 1 2 3 ])
x =
2 1 2 3
y = mergecats(x,["1" "3"],"C")
y =
2 C 2 C

By default, the possible values and the names of the categories will
be determined automatically from the data. You can specify these as
additional inputs to categorical, where the second input indicates
the unique category values in the original array, and the third
input indicates the names that correspond to these categories.
v = [ 10 5 0 0 ];
levels = ["beg" "mid" "last"];
categorical(v,[0 5 10],levels)
ans =
last mid beg beg

If your categories have an inherent ordering (for example "low",
"medium", and "high"), you can specify this with the optional
"Ordinal" property:

v = [ 10 5 0 0 ];
levels = ["beg" "mid" "last"];
c = categorical(v, [0 5 10],levels,"Ordinal",true)
c > "mid"
ans =
1 0 0 0

PREPROCESSING DATA
Data normalization: normalize
One of the most common ways to normalize data is to shift it so that
its mean is zero and scale it so that its standard deviation is one.
The result is called the z-score of the data.

To normalize data using z-scores, you can use the normalize
function.

xNorm = normalize(x)

For a vector x it can also be written:
a = x-mean(x,'omitnan')
s = std(x,'omitnan')
xNorm = a/s
If you don't want to change the scale of the data, you can just
center the data on zero by including the argument "center" set to
"mean".
xCenter = normalize(X,"center","mean")

You can normalize data by the first value in each column by
including the "scale" argument and setting it to "first".

xScale = normalize(X,"scale","first")
or
xScale = normalize(X, "scale",X(1,:))

You can also stretch or compress data so that its maximum and
minimum values are within a specified range [a,b].
xRange = normalize(X,"range",[a b])
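
The normalize options can be checked against the manual formulas. The vector x is an illustrative assumption.

```matlab
x = [2 4 4 4 5 5 7 9];
z = normalize(x);                          % z-score
zManual = (x - mean(x))/std(x);            % same computation written out
centered = normalize(x,"center","mean");   % zero mean, original scale
ranged = normalize(x,"range",[0 1]);       % rescaled so min is 0 and max is 1
maxDiff = max(abs(z - zManual))            % expected to be at rounding level
```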

WORKING WITH MISSING DATA


Comparing with NaN using == doesn't work, because NaN is not equal to
anything, including itself.
X = [1 2 NaN 0 NaN]
X == NaN   ->   ans = [0 0 0 0 0]
Instead you can use isnan to test whether the elements of a set of
numeric data are NaN or not. This outputs a logical array.
isnan(X)   ->   ans = [0 0 1 0 1]
The isnan function is used to identify missing values in numeric
data types, where missing values are denoted as NaN values. The
ismissing function is more general and identifies missing values in
other data types as well. The result is again a logical array.
ismissing(x)
By default, ismissing finds only NaN values in a numeric array. You
can provide an optional list of values to be considered “missing”:
ismissing(data,[val1 val2 val3 ...])
rmmissing removes rows with missing data by default, but you can
also remove any columns with missing data.
rmmissing(x)
rmmissing(x,2)   ->   removes columns having missing data
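
A short sketch of detecting and removing missing values. The vector X is the one used above; the matrix M is an illustrative assumption.

```matlab
X = [1 2 NaN 0 NaN];
eqTest = (X == NaN)          % all false: NaN never compares equal
nanMask = isnan(X)           % [0 0 1 0 1]
cleanVec = rmmissing(X)      % [1 2 0]
M = [1 2; NaN 4; 5 6];
cleanRows = rmmissing(M)     % drops row 2:    [1 2; 5 6]
cleanCols = rmmissing(M,2)   % drops column 1: [2; 4; 6]
```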

INTERPOLATION

To interpolate missing data (NaN) use fillmissing(x,method), where
method can be "linear", "spline", "nearest", or "pchip".
When fillmissing is applied to a matrix, it acts on the columns
independently.
By default, the fillmissing function assumes the observations are
equally spaced when performing the interpolation.
You can specify the spacing of the observations by providing a
vector that represents the sampling locations:
yinterp = fillmissing(y,"method","SamplePoints",x)

To replace a number (e.g. 0) with NaN:
standardizeMissing(x,0)
You can then fill the resulting NaN values with fillmissing.
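
The effect of the sample points on linear interpolation is easy to see on a small vector. The data and sampling locations below are illustrative assumptions.

```matlab
y = [1 NaN 3 NaN 7];
evenFill = fillmissing(y,"linear")                      % assumes equal spacing: [1 2 3 5 7]
x = [0 1 2 5 6];                                        % uneven sampling locations
unevenFill = fillmissing(y,"linear","SamplePoints",x)   % [1 2 3 6 7]
raw = [4 0 6 0 8];                                      % here 0 marks a missing reading
withNaN = standardizeMissing(raw,0)                     % [4 NaN 6 NaN 8]
repaired = fillmissing(withNaN,"linear")                % [4 5 6 7 8]
```

The fourth value changes from 5 to 6 because, with the sample points given, it lies three quarters of the way between its neighbors rather than halfway.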

SMOOTHING DATA
You can smooth the data using the smoothdata function as follows:
smoothedData = smoothdata(y,"movmean",3,"SamplePoints",x);
Here 3 is the window length. When the data is evenly sampled there is
no need to supply the sample points.
The function works column-wise.
You can specify the number of points backward and forward of the
current point in the window:
y = smoothdata(x,method,[kb kf])
e.g. Create a vector named lead4med that contains the leading
4-point median of y (i.e., the median of the current point and the
three next points).
lead4med = smoothdata(y,"movmedian",[0 3])
smoothdata supports several smoothing methods ("movmean",
"movmedian", "gaussian", "lowess", "sgolay", etc.).
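
A minimal sketch comparing a centered moving mean with the leading 4-point median. The vector y, with one outlier at position 7, is an illustrative assumption.

```matlab
y = [1 2 3 4 5 6 20 8 9 10];
centered = smoothdata(y,"movmean",3);         % mean of previous, current, and next point
lead4med = smoothdata(y,"movmedian",[0 3]);   % median of current point and next three
disp([y; centered; lead4med])
```

The moving median is much less affected by the outlier than the moving mean, which is a common reason for choosing it.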

Plotting with Different Scales

The command yyaxis left creates new axes with independent vertical
axis scales. The axis on the left is currently active.
yyaxis left

Plots are created in the active axes.
plot(t,y1)

The command yyaxis right changes the currently active axis to the
axis on the right. Plots are now created in this axis which uses a
different scale to the axis on the left.
yyaxis right
plot(t,y2)

Issuing the command yyaxis left a second time does not modify the
axis on the left but makes it active again, allowing you to make
modifications to the axes without replotting.
yyaxis left
ylabel("y_1")
ylim([0 20])

Similarly, yyaxis right makes the axis on the right active again.
yyaxis right
ylabel("y_2")
ylim([0 600])
xlabel("x")
LINEAR CORRELATION
To see if there is a linear correlation between two or more
variables use corrcoef.
corrcoef(v1,v2)
In this case the function returns a 2x2 matrix with values in the
range [-1 1], where 1 means perfect correlation (as on the
diagonal), -1 perfect anticorrelation, and 0 no linear correlation.
A scatter plot can help visualize the correlation between the
variables. To look at more than two variables, you can create a
scatter plot for each pair with the plotmatrix function.
plotmatrix(matrix)
The plots on the diagonal are histograms of the individual
columns.
You can use corrcoef on a matrix by passing it as input; each column
is treated as a variable. For p columns the result is a p-by-p
matrix.
corrcoef([a b c])

By default, corrcoef returns a NaN if one of the data points is
missing. Specifying the "Rows" option to be "complete" ensures only
sets of data without missing values are included in the calculation.
corrcoef(A,B,"Rows","complete")
You can require that the rows be pairwise complete to include more
data. Try changing the "Rows" option to "pairwise".
corrcoef(allSectors,"Rows","pairwise")
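
The size of the output and the effect of the "Rows" option can be checked on constructed data. The vectors below are illustrative assumptions chosen so the correlations are exactly 1 and -1.

```matlab
a = (1:5)';
b = 2*a + 1;                  % perfectly correlated with a
c = -a;                       % perfectly anticorrelated with a
R = corrcoef([a b c])         % 3 columns in, 3-by-3 matrix out
a2 = a;
a2(3) = NaN;                  % introduce one missing value
Rnan = corrcoef(a2,b);                    % off-diagonal entries are NaN
Rok = corrcoef(a2,b,"Rows","complete");   % row 3 is left out of the calculation
```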

The polyfit function finds the coefficients of the best fit n-th
degree polynomial of yData in terms of xData.
cf = polyfit(xData,yData,n)

The polyval function evaluates a polynomial (given by coefficients
cf) at the points xEval.
yEval = polyval(cf,xEval)

You can avoid the numerical precision limitations by centering and
scaling the x data when using polyfit and polyval. To do this, ask
for a third output from polyfit:
[c,~,sc] = polyfit(x,y,deg)

yFit = polyval(c,xFit,[],sc)
where sc is a two-element vector holding the mean and standard
deviation of x, which polyfit and polyval use to center and scale
the data.
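
A minimal sketch of the centered and scaled fit. The x values (years) and the exact linear relationship are illustrative assumptions, chosen because large x values are where scaling helps.

```matlab
x = 2000:2010;
y = 3*x + 5;                      % exact line, so the fit error should be tiny
[c,~,sc] = polyfit(x,y,1);        % c applies to the scaled variable (x - sc(1))/sc(2)
yFit = polyval(c,x,[],sc);        % pass sc so polyval applies the same scaling
maxErr = max(abs(yFit - y))
disp(sc')                         % mean and standard deviation of x
```

Because c refers to the scaled variable, it should not be used in polyval without also passing sc.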

PROGRAMMING CONSTRUCTS
warning(text) to display a warning
disp(text) to display info

When it is important to make sure the user saw your message, you can
create dialog boxes that display messages, errors, or warnings. The
corresponding functions are msgbox, errordlg, and warndlg. Each of
them accepts two inputs: the message and a title for the dialog box.

msgbox("Analysis complete","Complete!")
errordlg("Data not found","Error!")
warndlg("Missing values.","Warning!")
IF
if condition
code
else
code
end

or
if condition
code
elseif condition1
code
elseif condition2
code
else
code
end

Switch-case
switch x
case 1
code
case 2
code
otherwise
code
end

FOR
for idx=1:100
code
end
WHILE
while condition
code
end
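
The templates above can be filled in with a small concrete case. The values, thresholds, and labels below are illustrative assumptions.

```matlab
vals = [3 8 15];
for idx = 1:numel(vals)
    v = vals(idx);
    % if / elseif / else: the first true condition wins
    if v < 5
        label = "low";
    elseif v < 10
        label = "medium";
    else
        label = "high";
    end
    % switch: compare one value against several cases
    switch label
        case "low"
            disp(v + " is low")
        otherwise
            disp(v + " is not low")
    end
end
% while: add 1, 2, 3, ... until the running total reaches 10
n = 0;
total = 0;
while total < 10
    n = n + 1;
    total = total + n;
end
```

After the loops finish, label holds the result for the last value and n counts how many terms the while loop needed.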

FUNCTIONS
You can define local functions at the end of a script. Your script
can call any local function defined within it. However, as the name
implies, these functions are local to that script – they are not
accessible from outside the script where they are defined. You
cannot call a local function from the Command Window or another
script or function.
A function file can also contain its own local functions, if
desired. As always, these local functions are accessible only
within the file. That is, they can be called only by the primary
function and by other local functions in the same file. The primary
function is always the first function in the file and its name
matches the file name.
The syntax is always:
function [outputs] = functionName(inputs)
code
end

Variable priority
When MATLAB encounters a name, it first checks for a variable in the
current workspace, then for a local function, then for a file in the
current folder, and finally for a file elsewhere on the search path.
MATLAB uses the search path to know where to find functions.
To add folders to the search path:
On the Home tab, in the Environment section, click Set Path.
Add a single folder or a set of folders using the buttons in the
Set Path dialog. You can also add a folder from the command line
with the addpath function.
which name -all
shows everything that name refers to in MATLAB.
