IE 332 - Homework #2

Due: Sept 28th, 11:59pm EST

Read Carefully. Important!


As outlined in the course syllabus, this homework contributes to your project grade. The maximum attainable
mark on this homework is 150. As also outlined in the syllabus, there is a zero-tolerance policy for any
form of academic misconduct. Each group submits one assignment.

By electronically submitting this assignment, all members of the group acknowledge these statements and accept
any repercussions for violating ANY Purdue Academic Misconduct policy. You must upload your
homework on time for it to be graded. No late assignments will be accepted. Only the last uploaded version
of your assignment will be graded.

NOTE: You should aim to submit no later than 30 minutes before the deadline, as there could be last-minute
network traffic that would cause your assignment to be late, resulting in a grade of zero.

You must use the provided LaTeX template on Brightspace to create the PDF of your assignment. No exceptions, under
penalty of an assignment grade of zero.

IE 332 Homework #2 Due: Sept 30 2022

1. For this question you will minimize the risk of a portfolio selection problem, given an expected return and other
constraints, using simulated annealing. First, a number of functions will be created.

Markowitz Mean-Variance Model. The return on investment for stock $i$ is usually defined as $r_i = \frac{S_{i1}}{S_{i0}} - 1$,
where $S_{i0}$ and $S_{i1}$ are the stock prices at the beginning and the end of the investment period being considered.
For a portfolio consisting of $n$ stocks, the total portfolio return is defined as

$$r_p = \sum_{i=1}^{n} r_i w_i \tag{1}$$

where $w_i$ is the proportion of the budget allocated to stock $i$, also called the portfolio weight on stock $i$.
The Markowitz Mean-Variance Model treats the return of every stock in a portfolio as a random variable (with
an expected return and a variance) and aims to find the optimal portfolio weights $w_i^*$ on every stock that achieve a
predetermined portfolio return level $\mu_p$ with minimal portfolio return variance. The following is the mathematical
definition of a Markowitz Mean-Variance Model:

• Let $w = [w_1, \cdots, w_n]^T$, where $w_i$ is the weight of stock $i$ in the portfolio.
• Let $r = [r_1, \cdots, r_n]^T$, where each $r_i$ is a random variable representing the return on stock $i$.
• Let $m = [\mu_1, \cdots, \mu_n]^T$, where $\mu_i$ is the expectation of $r_i$.
• Let $\Sigma$ be the covariance matrix of $r$.
• The portfolio return expectation is $m^T w$, and the portfolio return variance is $w^T \Sigma w$.

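Then a portfolio investment in stocks $i = 1, \cdots, n$ is optimized through the following Markowitz Mean-Variance
Model (Equation 5 means that there is no short-selling, a restriction that is not always required):

\begin{align}
\text{Minimize} \quad & \tfrac{1}{2}\, w^T \Sigma w \tag{2} \\
\text{Subject to} \quad & m^T w \ge \mu_p && \text{Return Constraint} \tag{3} \\
& e^T w = 1 && \text{Budget Constraint} \tag{4} \\
& w_i \ge 0 \quad \forall i && \text{Floor Constraint} \tag{5}
\end{align}

where $e$ is the $n$-vector of ones.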
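Illustration (not part of the required solution): the short R sketch below evaluates the objective in Equation 2 and checks the constraints in Equations 3 to 5 for a made-up three-stock portfolio. All names (`m_toy`, `Sigma_toy`, `w_toy`, `mu_p_toy`) and all numbers are hypothetical and chosen only for illustration; they are not the homework data.

```r
# Toy data (hypothetical, for illustration only)
m_toy     <- c(0.010, 0.004, 0.007)            # expected weekly returns
Sigma_toy <- matrix(c(0.0400, 0.0060, 0.0020,
                      0.0060, 0.0100, 0.0010,
                      0.0020, 0.0010, 0.0225),
                    nrow = 3, byrow = TRUE)    # covariance matrix of returns
w_toy     <- c(0.5, 0.3, 0.2)                  # candidate portfolio weights
mu_p_toy  <- 0.006                             # required return level

port_ret <- drop(crossprod(m_toy, w_toy))             # m^T w
port_var <- drop(t(w_toy) %*% Sigma_toy %*% w_toy)    # w^T Sigma w
obj_val  <- 0.5 * port_var                            # Equation 2

feasible <- c(ret    = port_ret >= mu_p_toy,                # Equation 3
              budget = isTRUE(all.equal(sum(w_toy), 1)),    # Equation 4
              floor  = all(w_toy >= 0))                     # Equation 5

print(c(port_ret = port_ret, port_var = port_var, obj_val = obj_val))
print(feasible)
```

The budget check uses `all.equal` rather than `==` because a sum of floating-point weights is rarely exactly 1.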

(a) (8 points) The following line of code, which uses the getSymbols function from the quantmod package (that you
will need to install) in R, can be used to retrieve the price of the stock with a given symbol over a specific
date range. The returned xts object will have column names as illustrated in Table 1.
priceInfo = getSymbols(Symbol, from=sdate, to=edate, auto.assign=FALSE, periodicity='weekly')

Symbol.Open Symbol.High Symbol.Low Symbol.Close Symbol.Volume Symbol.Adjusted

Table 1: The returned xts object from the getSymbols function.

Create a portfolio of stocks with the following 20 ticker symbols: AVY, WHR, MTD, ECL, BXP, HSY,
DUK, CTVA, ZTS, DAL, MLM, ALGN, NEM, PARA, CEG, EQR, COST, AIZ, SEE, PKG. Then
calculate the weekly returns from 2022-01-01 to 2022-09-05 based on the “Symbol.Adjusted” column
from Table 1, and estimate the expected return vector m (name it rd_wmean) and the covariance matrix
Σ (name it rdCov), both based on these weekly returns. Aside from defining variables for the
start and end dates and a vector of ticker symbols, your solution should take no more than 8 lines of R
code. HINTS:
• Check for missing values when calculating the expectation vector and the covariance matrix.
• Read this posting for a hint on how to store the price and return info of multiple stocks:
https://stackoverflow.com/questions/28069655/storing-the-xts-object-returned-by-getsymbols/28534246#28534246
• The Ad(x) function extracts the “Symbol.Adjusted” column data.
• The ROC function calculates the return from price data automatically.
• Use lapply and sapply! do.call may also be a potentially useful function if merging data frames.
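Illustration (not part of the required solution): the base-R sketch below shows what "return from price data" and "check for missing values" mean on a tiny made-up price table. The tickers `AAA` and `BBB` and all prices are hypothetical. It uses the discrete return $r_t = S_t / S_{t-1} - 1$ from the definition above; note that TTR's `ROC` defaults to `type = "continuous"` (log returns), so `type = "discrete"` would be needed to reproduce this definition exactly.

```r
# Hypothetical weekly adjusted prices for two made-up tickers (not real data).
# The leading NA mimics a stock with no price in the first week.
prices <- cbind(AAA = c(100, 102, 101, 105, 104),
                BBB = c(NA,   50,  51,  52,  53))

# Discrete return r_t = S_t / S_{t-1} - 1, computed for every column at once
rets <- prices[-1, , drop = FALSE] / prices[-nrow(prices), , drop = FALSE] - 1

# Missing values propagate into the returns, so they must be handled explicitly
mean_ret <- colMeans(rets, na.rm = TRUE)
cov_ret  <- cov(rets, use = "pairwise.complete.obs")

print(rets)
print(mean_ret)
print(cov_ret)
```

Without `na.rm = TRUE` and a `use` argument, a single missing price would turn the corresponding mean and covariance entries into `NA`.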
(b) (4 points) Calculate the weekly mean return of the S&P 500 Index (with symbol ^GSPC). You will use it as
the predetermined benchmark return level µp (Equation 3) (name it sp500_wmean). No more than 3 lines
of R code. HINT: use the getSymbols and ROC functions.
(c) (0 points) One can create a function called calculate_objVal_onlyVar to calculate the objective value
indicated in Equation 2 (think carefully about how many parameters this function should have!)
using no more than 2 lines of R code (not including the function definition). HINT: use matrix multiplication,
not for loops! (Nothing to submit for this question; it is helpful for answering the questions below.)
(d) (0 points) Likewise, it is possible to create a function called create_initial_meetfloor that generates a guess
to solve the problem by randomly choosing wi ≥ 0 ∀i = 1, . . . , n, where n is the number of stocks one can
choose from. No more than 2 lines of R code (not including the function definition) would be needed to
do so. (Nothing to submit for this question; it is helpful for answering the questions below.)
(e) (0 points) It is also straightforward to write a function called neighbor_rnorm that creates a “neighbor”
of an existing candidate solution x (that is, a new solution obtained from an existing one by making small
tweaks to it). This would be done by adding a small random value to each element of x using the rnorm
function. The parameters of the function should be x and the standard deviation one wants to allow,
with a default of 0.01. No more than 2 lines of R code would be necessary. (Nothing to submit for this
question; it is helpful for answering the questions below.)
(f) (0 points) Using the previous functions, you could then apply the SA algorithm using the code below,
appropriately replacing the code after the identified MISSING lines. BUT you would find that it can
result in infeasible solutions by violating (at least) the floor constraint, which is obviously not good!
(Nothing to submit for this question; it is helpful for answering the questions below.)
# Simulated Annealing.
# - Returns two vectors:
#   (1) the optimal solution
#   (2) the trajectory of objective value

mySAfun_simple <- function(nVar, temperature=3000, maxit=1000, cooling=0.95) {

  # nVar: number of stocks
  # temperature: initial temperature
  # maxit: maximum number of iterations to execute for
  # cooling: rate of cooling

  # MISSING LINE 1: generate an initial solution
  c_sol <-
  # MISSING LINE 2: evaluate initial solution
  c_obj <-
  best <- c_sol       # track the best solution found so far
  best_obj <- c_obj   # track the best objective found so far

  # to keep best objective values through the algorithm
  obj_vals <- best_obj
  cnt <- 0

  # run Simulated Annealing
  for(i in 1:maxit) {
    # MISSING LINE 3: generate a neighbor solution
    neigh <-
    # MISSING LINE 4: calculate the objective value of the new solution
    neigh_obj <-
    if ( neigh_obj <= best_obj ) {
      # MISSING LINE 5-8: keep neigh if it is the new global best solution
      c_sol <-
      c_obj <-
      best <-
      best_obj <-
    } else if ( runif(1) <= exp(-(neigh_obj-c_obj)/temperature) ) {
      # MISSING LINE 9-10: otherwise, probabilistically accept
      c_sol <-
      c_obj <-
      cnt <- cnt + 1
    }
    temperature <- temperature * cooling   # update cooling
    obj_vals <- c(obj_vals, best_obj)      # maintain list of best found so far
  }

  return(list(best=best, values=obj_vals))
}
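
Illustration (not part of the required solution): the sketch below tabulates the acceptance probability `exp(-(neigh_obj - c_obj)/temperature)` from the code above as the temperature cools with the default settings (`temperature = 3000`, `cooling = 0.95`). The helper name `accept_prob`, the chosen iteration counts, and the objective worsening `delta` are assumptions made only for this example.

```r
# Probability of accepting a neighbor that is worse by `delta`
# (same expression as in mySAfun_simple; helper name is hypothetical)
accept_prob <- function(delta, temperature) exp(-delta / temperature)

temperature0 <- 3000                  # default initial temperature
cooling      <- 0.95                  # default cooling rate
iters        <- c(0, 100, 200, 400)   # number of cooling updates applied
temps        <- temperature0 * cooling^iters

delta <- 1e-3   # hypothetical worsening of the objective value

print(data.frame(iteration   = iters,
                 temperature = temps,
                 p_accept    = accept_prob(delta, temps)))
```

The table makes visible that, for a fixed worsening, the algorithm accepts worse neighbors almost always while the temperature is high and almost never once it has cooled, so how the magnitude of the objective compares with the temperature matters.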

(g) (8 points) To ensure that solutions do not violate the budget constraint, rewrite the functions
create_initial_meetfloor and neighbor_rnorm as the functions create_initial_meet2 and neighbor_rnorm_meetbudget.
This will require no more than 3 lines and 2 lines of R code, respectively.
(h) (8 points) It may still be possible that a constraint is violated. To try to avoid this, one strategy is to
add a value to the objective value that makes it worse if a constraint is violated; we call this penalizing
a candidate solution. Rewrite the objective function calculate_objVal_onlyVar as calculate_objVal_penalty
to penalize solutions that violate the floor and return constraints. No more than 7 lines of R code
(aside from the function definition, which should include the penalty values as parameters with defaults
as stated below). Use the following penalty values:
p_ret = max(diag(rdCov))/1e-4 # penalty for the return constraint
p_floor = 1e3 # penalty for the floor constraint
and the following function definition:
# Calculate the objVal of a candidate solution which does not allow infeasibility.
calculate_objVal_penalty <- function(x, covMat=rdCov, retVec=rd_wmean, bRet=sp500_wmean,
                                     p_ret=max(diag(rdCov))/1e-4, p_floor=1e3) {
  # x: a candidate solution
  # covMat: Covariance matrix
  # retVec: expected return vector
  # bRet: benchmark expected return

  YOUR CODE HERE!

}

(i) (10 points) Using the mySAfun_simple function provided in part (f), fill in the MISSING lines of code
using the functions create_initial_meet2, neighbor_rnorm_meetbudget, and calculate_objVal_penalty, and implement
it in R.
(j) (12 points) Run your SA algorithm 30 times and create a single plot that shows the trajectory of each
trial. The trajectory of the best result should be in red; the others should all be gray. Be sure to properly
label the plot and include a legend! In the title of the plot, include the median, minimum, and maximum
final objective values of the 30 runs. See the example plot below, noting that yours will appear different
due to the stochastic nature of simulated annealing.

2. In your quest to find good algorithms to perform stock selection, you also consider a greedy framework. In
this problem two greedy algorithms will be created, each with different sets of possible choices.
(a) (10 points) Create pseudocode for parts (b) and (c) below. Choose one of these and write a loop invariant
that proves your greedy algorithm will perform correctly (not that the result will be optimal).
(b) (10 points) The first approach will consider the stock selection problem as a variant of knapsack (as
discussed in lecture), where B > 0 is an integer-valued budget and S is a set of stocks (use those from
Question 1). For each s ∈ S you can purchase more than one share, so you will determine exactly
how many copies of s to select to accomplish two tasks: (1) maximize the value of the stocks selected while
using up as much of your budget as possible, and (2) accomplish this with the fewest stocks possible (i.e.,
not the fewest tickers, but the fewest shares purchased). The stock prices are assumed to be those of the current day.
Test your greedy algorithm on the two cases below; assume prices are integers. Does either one lend itself
toward the optimality of your greedy solution? Explain.
Share Values 1: [1, 5, 10, 25, 100], Share Values 2: [1, 3, 6, 12, 21, 24, 30, 60, 240]
(c) (10 points) Use the same algorithm as in (b), but this time with a different objective function. Specifically,
select stocks s ∈ S in order to maximize the value-to-risk ratio (defined as the current value divided by the
estimated probability that the stock will decrease in price over the course of the next week), while preserving
the constraints outlined in part (b). The stock prices are assumed to be those of the current day.
Does this new version of the problem, given the following set of parameters, provide an optimal
outcome? Explain.
Share Values: [1, 5, 10, 25, 100], Risk Values: [1, 2.5, 5, 12.5, 50]
(d) (10 points) Implement your algorithm from (b) or (c) in R, and test it using the data from Question 1.
Use the date of Sept 1 (+ your team number), 2020, so team 1 will use data from Sept 2, 2020, and so
on. If using algorithm (c), you can simplify the probability that a stock will decrease by using pnorm(current
price, mean, std). Generate a random budget between $30,000 and $40,000 using the runif function,
and be sure to round stock prices to the nearest integer. Report the budget, the final solution (amount of
each stock chosen), the cost to purchase all of them, and the remaining budget. Also include
the value of the stock 7 days after purchase, and compute the total gain or loss (plus the remainder of
the budget not used to purchase stocks).
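Illustration (not part of the required solution): the sketch below shows what `pnorm` returns, using made-up numbers. `pnorm(q, mean, sd)` is the normal cumulative distribution function, i.e. $P(X \le q)$ for $X \sim N(\text{mean}, \text{sd}^2)$; in R the third argument is named `sd`, so "std" in the text above is read here as the standard deviation. The variable names and values are hypothetical.

```r
# Hypothetical historical statistics for one stock (not real data)
hist_mean <- 100   # assumed mean of past prices
hist_sd   <- 5     # assumed standard deviation of past prices

# P(X <= q) under a Normal(hist_mean, hist_sd) model of the price
p_at_mean   <- pnorm(100, mean = hist_mean, sd = hist_sd)  # current price at the mean
p_two_sd_up <- pnorm(110, mean = hist_mean, sd = hist_sd)  # current price 2 sd above the mean

print(c(p_at_mean = p_at_mean, p_two_sd_up = p_two_sd_up))
```

Under this simplification, a current price far above its historical mean yields a probability close to 1, and a price at the mean yields exactly 0.5.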
(e) (10 points) Analyze your algorithm in (c) using Big-O notation to determine the worst-case runtime AND
space complexities.

3. Use the data from Question 1, and let c(t) be the market closing price of a given company’s stock
on day t over a certain number of days (for the project you would extend this to weeks). You would like to look
back in time to determine the best pair of dates on which to have bought and sold a stock in
order to maximize the profit on the investment. For example, for t = 4 and stock prices c(1) = 6,
c(2) = 2, c(3) = 7, and c(4) = 3, the pair of days on which one should have bought and then sold the stock
to get the maximum profit would have been (buy = 2, sell = 3), since in that case the profit would have been the
highest: c(3) − c(2) = 5. Assume you cannot buy and sell on the same day.
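
Illustration (not part of the required solution): the worked example above can be checked numerically by tabulating the profit of every feasible (buy, sell) pair. The variable names below are hypothetical, and tabulating the whole matrix with `outer` is only a convenient check for four prices, not a suggested algorithm for the questions that follow.

```r
# Closing prices from the worked example: c(1)=6, c(2)=2, c(3)=7, c(4)=3
close_px <- c(6, 2, 7, 3)

# profit[b, s] = c(s) - c(b); rows are buy days, columns are sell days
profit <- outer(close_px, close_px, function(buy, sell) sell - buy)

# Only pairs with sell day > buy day are feasible; mask everything else
profit[!upper.tri(profit)] <- NA

best <- which(profit == max(profit, na.rm = TRUE), arr.ind = TRUE)
print(profit)
print(best)   # row = buy day, col = sell day
```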

(a) (9 points) A brute-force algorithm would find the pair of dates among every feasible (buy, sell) date
combination for which the profit c(s) − c(b) is the highest. Determine the worst-case running time complexity
of the brute-force algorithm implied by the pseudocode given in italics. HINT: first translate the italics
into a more traditional pseudocode, then analyze it.
(b) (8 points) Provide a loop invariant for your brute-force algorithm to show it yields an optimal result.
(c) (10 points) Provide pseudocode for a divide-and-conquer algorithm that, given a list of arbitrary c(t)
closing prices for a particular stock over a period of t > 2 days, will return the optimal pair of dates on
which to buy and sell the stock.
(d) (8 points) Determine the worst-case runtime complexity of the divide-and-conquer algorithm.
(e) (15 points) Implement your divide-and-conquer and brute-force algorithms in R for a stock of your
choosing from the 20 ticker symbols given. Run each algorithm for t = 10, 20, 50, 100, and 365 days.
Record the time each takes (you can add an if test to stop computation of the brute-force algorithm if it
takes longer than 5 minutes). Use one of these approaches to record the “wall clock” (actual) runtime:
https://www.r-bloggers.com/2017/05/5-ways-to-measure-running-time-of-r-code/.
Plot the results comparing the two approaches using t as the x-axis and the runtime as the y-axis. Ensure
proper labels, legend, and coloring of the plot.
