Unit 4 DA
Financial Analytics
Financial analytics is the creation of ad hoc analysis to answer specific
business questions and forecast possible future financial scenarios. The
goal of financial analytics is to shape business strategy through reliable,
factual insight rather than intuition.
• By offering detailed views of companies’ financial data, financial
analytics provides the tools for firms to gain deep knowledge of key
trends and take action to improve their performance.
The importance of financial analytics
• Financial analytics can help companies determine the risks they face, how to enhance and extend the business processes that make them run more effectively, and whether their investments are focused on the right areas.
Top 3 predictive analytics models in finance
• In the finance context, these are the three most widely used predictive
models:
• Classification model:
• The classification model is among the most straightforward predictive analytics models; it produces a binary output. In the banking context, classification models are often used to guide decisions based on a broad assessment of the subject. For example, a classification model can predict whether the shares of a certain company will go up or down.
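As a sketch of the idea, the toy classifier below turns recent returns into the kind of binary up/down output a classification model produces. The momentum rule and the return figures are hypothetical, not a trained model:

```python
import numpy as np

def classify_up_down(returns, window=3):
    """Toy binary classifier: predict 'up' if the mean of the last
    `window` daily returns is positive, else 'down'.
    (Illustrative only; real models use logistic regression, trees, etc.)"""
    recent = returns[-window:]
    return "up" if np.mean(recent) > 0 else "down"

# Hypothetical daily returns for a share (most recent value last)
daily_returns = [0.002, -0.001, 0.004, 0.003, 0.001]
print(classify_up_down(daily_returns))  # recent momentum is positive -> "up"
```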
• Outliers model:
• The outliers model is used to detect significant deviations in a dataset, making it one of the most widely used models for fraud detection. For example, if a customer’s credit card is used to buy an overly expensive watch in a city the customer doesn’t live in, the outliers model will flag the transaction as potentially fraudulent on the grounds that it is unusual behavior.
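A minimal sketch of the idea, using a simple z-score rule on hypothetical transaction amounts (real fraud systems use far richer features and models):

```python
import numpy as np

def flag_outliers(amounts, threshold=3.0):
    """Flag transactions whose amount deviates from the mean by more
    than `threshold` standard deviations (a simple z-score rule)."""
    amounts = np.asarray(amounts, dtype=float)
    z = (amounts - amounts.mean()) / amounts.std()
    return np.abs(z) > threshold

# Hypothetical card transactions: typical spending plus one expensive watch
txns = [42.0, 18.5, 55.0, 23.0, 31.0, 47.5, 9800.0]
print(flag_outliers(txns, threshold=2.0))  # only the last purchase is flagged
```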
• Time series model:
• The time series model tracks a certain variable over a specific time period in order to predict its value in a future time frame. For example, in finance, the time series model is often used to predict how a given financial variable (such as a security’s price or the inflation rate) will change over time.
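As an illustration, the sketch below forecasts the next value of a series as the mean of its most recent observations. The prices are hypothetical, and real time series models (e.g., ARIMA) are considerably richer:

```python
import numpy as np

def moving_average_forecast(series, window=3):
    """Naive time-series forecast: predict the next value as the mean of
    the last `window` observations (an illustrative stand-in for richer
    models such as ARIMA)."""
    return float(np.mean(series[-window:]))

# Hypothetical monthly closing prices for a security
prices = [100.0, 102.0, 101.0, 104.0, 106.0]
print(moving_average_forecast(prices))  # mean of the last 3 observations
```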
Customer Profitability Analytics
• Customer Profitability Analysis (CPA for short) is a management accounting and credit underwriting method that allows businesses and lenders to determine the profitability of each customer, or each segment of customers, by attributing profits and costs to each customer separately. CPA can be applied at the individual customer level (more time-consuming, but providing a better understanding of the business situation) or at the level of customer aggregates/groups (e.g., grouped by number of transactions, revenue, average transaction size, time since starting business with the customer, distribution channels, etc.).
• CPA is a “retrospective” method: it analyses the past activity of different customers in order to calculate the profitability of each one. Notably, research suggests that credit score does not necessarily impact a lender’s profitability.
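The attribution idea can be sketched in a few lines: assign each customer its revenue and cost-to-serve, then rank by the resulting profit. All names and figures below are hypothetical:

```python
# Minimal sketch of Customer Profitability Analysis: attribute revenues
# and costs to each customer, then rank customers by profit.
customers = {
    "Acme Ltd":  {"revenue": 12_000, "cost_to_serve": 7_500},
    "Beta GmbH": {"revenue": 5_000,  "cost_to_serve": 5_600},
    "Ceres Inc": {"revenue": 8_200,  "cost_to_serve": 4_100},
}

# Profit per customer = attributed revenue minus attributed costs
profit = {name: d["revenue"] - d["cost_to_serve"] for name, d in customers.items()}

# Most profitable customers first; Beta GmbH turns out to be loss-making
for name, p in sorted(profit.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {p:+,}")
```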
Organizational Profitability Analytics
• PACE’s (Profitability Analytics Center of Excellence) Profitability Analytics (PA) Framework is a process designed to produce high-quality internal decision-support information for decision makers throughout the organization – from the C-suite to the shop floor and direct customer contact points. It is built on a holistic view of revenue management, managerial costing, and investment management within an organization.
• The PA Framework goes well beyond traditional financial accounting,
reporting, and analysis. It incorporates modern revenue management
techniques, modern managerial costing focused on internal decision
support, and new views of investments as both tangible and intangible.
Profitability Analytics Framework
• The Profitability Analytics Framework is composed of three primary
elements:
• 1. Strategy Formulation – where an organization establishes its plan
for identifying and addressing its market(s) and for mobilizing its
resources to meet the demands created by that plan.
2. Strategy Validation – where causal models are developed that
directly enable the evaluation of strategy. These models employ the
principle of causality to quantify, in operational and monetary terms,
the revenue and cost impacts of an organization’s strategy, and then
track the execution and performance of that strategy.
3. Strategy Execution – involves decision making that employs the
outputs of the causal models to provide an organization’s decision
makers with the accurate and relevant information they need to make
economically sound decisions as they execute and adapt tactics to meet
strategic goals.
Cash flow analytics
• How do you calculate whether a property will be successful or not? One of the most important metrics real estate investors use to decide whether an investment is sound is cash flow.
• What is cash flow?
• Cash flow is a description of the way money flows through a rental
property, similar to the way water flows over a waterfall. — Roofstock
• Put simply, cash flow is monthly income minus monthly expenses. The remainder is the amount you profit each month.
Why do we want POSITIVE cash flow?
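The income-minus-expenses calculation can be sketched directly; all figures below are hypothetical:

```python
# Minimal monthly cash-flow check for a rental property:
# cash flow = monthly income - monthly expenses.
monthly_income = 2_000            # rent collected
monthly_expenses = {
    "mortgage": 1_100,
    "insurance": 120,
    "property_tax": 250,
    "maintenance": 150,
}

cash_flow = monthly_income - sum(monthly_expenses.values())
print(cash_flow)  # positive cash flow -> the property profits each month
```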
The period of the SMA can be any natural number (N > 0), but the most useful SMA values fall between 20 and 200, depending on the timeframe being used. A lower SMA period will respond faster to changes in the underlying price.
It is important to note that, because our number of daily closing prices is finite, there will be instances where the SMA cannot be calculated (i.e., there are not enough prior closing prices to compute the appropriate SMA). We will have to take this into account when writing the code.
Residual:
In order to turn the calculated SMA values into data that is interpretable, we will calculate the difference between the underlying price and the SMA value at each point where both are defined.
• Then we can sum the differences and divide the sum by the number of values we found. This gives us a raw value that can be compared with those of other, similar companies.
The Code
At the highest level, we need to:
• Decide whom we will be analyzing and when we want to analyze them.
• Establish a data source to retrieve adjusted closing prices for each day within the chosen period.
• Plot the underlying price vs. the SMA for visualization.
• Calculate the average of the residuals as explained previously.
In Python
• Let’s begin by importing the necessary packages:
• datetime will allow our program to read our specified time period.
• pandas_datareader.data (imported as web) helps us extract data from the web.
• All three variations of matplotlib will allow us to graph and visualize the data that we are interpreting.
• functools will let us reduce our array of residual differences to one total sum.
import datetime
import pandas_datareader.data as web
import matplotlib.pyplot as plt
from matplotlib import style
import matplotlib as mpl
import functools
Next, we can initialize and set the time periods we want to analyze. I chose a two-year period for simpler data visualization:
# Initialize and set the time periods that we want to analyze
start = datetime.datetime(2017, 1, 1)
end = datetime.datetime(2019, 11, 2)
Next, we should pick the companies that we want to compare. I chose
Apple, Microsoft, and Netflix. These companies have relatively similar
underlying prices at the publication date of this article:
# make a list of companies that should be tested
# Apple: AAPL, Microsoft: MSFT, Netflix: NFLX
stock_tickers = ['AAPL', 'MSFT', 'NFLX']
Now we will enter our main loop, which plots the stock charts along with the Simple Moving Averages. I chose a 20-day period for my SMA solely for data visualization:
# Iterate over each company in the list and produce:
# 1. Graph of moving average vs. stock price
# 2. Average of the residuals for each day the moving average can be calculated
for i in range(len(stock_tickers)):
    # Read our data from Yahoo Finance
    df = web.DataReader(stock_tickers[i], 'yahoo', start, end)
    df.tail()
    # Create a list of the closing prices for each day
    close_price = df['Adj Close']
    # Calculate the SMA for each ticker with a 20-day period
    moving_avg = close_price.rolling(window=20).mean()
Now we must plot the values that we have calculated (this is still contained within the original ‘for’ loop). I used plt.show() to display my graphs. If you are using a notebook environment such as Jupyter or Google Colab, you can use %matplotlib inline to display graphs.
    # Adjust the size of the matplotlib figure
    mpl.rc('figure', figsize=(8, 7))
    # Adjust the style of matplotlib
    style.use('ggplot')
    # Plot the moving average vs. the closing price
    close_price.plot(label=stock_tickers[i])
    moving_avg.plot(label='moving_avg')
    plt.legend()
    plt.show()
• From matplotlib we receive these graphs. The blue line represents the Simple Moving Average with a 20-day period.
• Now, to make use of this SMA data, we will calculate the average of the residuals (this is still contained within the original for loop). It is important to note that, because the number of days we can collect closing prices for is finite and bounded, some values will come out as ‘NaN’, which means Not a Number. These values are removed from the list with a list comprehension; the remaining values are then combined with functools.reduce and a small anonymous function introduced by the lambda keyword, comparable to lambda expressions in Java or arrow functions in JavaScript.
Finally, sum the residuals (the moving average vs. closing price differences):
    differences = []
    # Iterate over each list and calculate the differences
    for j in range(len(close_price)):
        x = abs(moving_avg[j] - close_price[j])
        differences.append(x)
    # Eliminate the values that are of type 'nan'
    cleanedList = [x for x in differences if str(x) != 'nan']
    # Use reduce and a lambda function to combine all elements of the list
    combined_value = functools.reduce(lambda a, b: a + b, cleanedList)
    # Calculate the average of the residuals
    average = combined_value / len(cleanedList)
    print("The residual average of " + stock_tickers[i] + " is: " + str(average))
The final output yields:
The residual average of AAPL is: 5.875529758261265
The residual average of MSFT is: 2.0096710682601384
The residual average of NFLX is: 12.446982918608967
Financial Modeling in Python
• Financial modeling in Python refers to building a financial model using the high-level Python programming language, with its rich collection of built-in data types.
• The language can be used to modify and analyze Excel spreadsheets and to automate certain repetitive tasks.
• Since financial models use spreadsheets extensively, Python has become one of the most popular programming languages in finance.
PPF Package for Python
• The PPF package or library refers to the Python package that comprises a family of
sub-packages. In other words, it is a mixture of various supporting extension
modules that facilitate the implementation of Python programming. Please find
below the summary of the various PPF sub-packages:
• com: Used for trade, market, and pricing functionality.
• core: Used in the representation of types and functions of financial quantities.
• date_time: Used in the manipulation and calculation of dates and times.
• market: Represents the types and functions of standard curves and surfaces in financial programming (e.g., volatility surfaces, discount-factor curves).
• math: Used for general mathematical algorithms.
• model: Used for coding various numerical pricing models.
• pricer: Types and functions used for valuing financial structures.
• test: Used for the test suite.
• utility: Used for tasks that are general in nature (e.g., algorithms for searching and sorting).
Mathematical Tools for Python
• Some of the major mathematical tools available in Python are as
follows:
• N(.): A function in the ppf.math.special_functions module that helps approximate the standard normal cumulative distribution function, which is used in the Black–Scholes option pricing model.
• Interpolation: The process used to estimate the values of a function y(x) for arguments between several known data points (x0, y0), (x1, y1), . . . , (xn, yn). The ppf.utility.bound module is used in its implementation. Some of the variants of interpolation are:
• Linear interpolation
• Loglinear interpolation
• Linear on zero interpolation
• Cubic spline interpolation
Root Finding: Used to find a root, with or without derivative information, using the ppf.math.root_finding module. Some of the variants of root finding are:
Bisection method
Newton-Raphson method
• Linear Algebra: The linear algebra functions are mostly covered by the NumPy package and are implemented in the ppf.math.linear_algebra module. Some of the variants of linear algebra are:
Matrix Multiplication
Matrix Inversion
Matrix Pseudo-Inverse
Solving Linear Systems
• Solving Tridiagonal Systems
• Generalized Linear Least Squares: Fits a set of data points to a linear combination of some basis functions. The algorithms for this function are implemented in the generalized least squares module under ppf.math.
Quadratic and Cubic Roots: These functions are used to find the real roots of a quadratic or cubic equation. The ppf.math.quadratic_roots module is used to find the real roots of a quadratic equation, while the ppf.math.cubic_roots module implements the cubic roots algorithm.
Integration: This tool is used to calculate the expected value of a function
with random variables. It is primarily used in the calculation of financial
payoffs. Some of the variants of integration are:
Piecewise Constant Polynomial Fitting
Piecewise Polynomial Integration
• Semi-analytic Conditional Expectations
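For readers without the ppf library, a few of the tools above can be sketched with the standard library and NumPy: N(.) via math.erf, root finding by bisection, and solving a linear system with numpy.linalg. This is a sketch of the underlying techniques, not the ppf implementations:

```python
import math
import numpy as np

# N(.): the standard normal CDF used in Black-Scholes, written here
# with math.erf instead of a dedicated special-functions module.
def N(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Root finding by bisection: solve f(x) = 0 on [lo, hi], assuming f
# changes sign on the interval.
def bisect(f, lo, hi, tol=1e-10):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:   # root lies in the left half
            hi = mid
        else:                     # root lies in the right half
            lo = mid
    return 0.5 * (lo + hi)

# Linear algebra with NumPy: solve the system A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

print(N(0.0))                                  # 0.5 by symmetry
print(bisect(lambda v: v * v - 2.0, 0.0, 2.0)) # approximates sqrt(2)
print(x)                                       # solution of A x = b
```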
Portfolio Analysis using Python
Installing the required libraries
Open the terminal and activate the conda environment, then install the following packages:
pip install matplotlib
pip install seaborn
pip install nsepy
Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from datetime import date
from nsepy import get_history as gh
plt.style.use('fivethirtyeight')  # setting matplotlib style
Defining Parameters
stocksymbols = ['TATAMOTORS', 'DABUR', 'ICICIBANK', 'WIPRO', 'BPCL', 'IRCTC', 'INFY', 'RELIANCE']
startdate = date(2019, 10, 14)
end_date = date.today()
print(end_date)
print(f"You have {len(stocksymbols)} assets in your portfolio")
• Here, we’ve created a list of stocks for which we want to fetch data and
analyze that data. We’ve defined the starting date, i.e., the date from
which we want to fetch the data, and the end date as well, i.e., today.
Fetching Data
• Now, we’ll be iterating over the list of stocks to fetch data one by one for every single stock and
combine it towards the end to have it in one data frame.
df = pd.DataFrame()
for i in range(len(stocksymbols)):
    data = gh(symbol=stocksymbols[i], start=startdate, end=end_date)[['Symbol', 'Close']]
    data.rename(columns={'Close': data['Symbol'][0]}, inplace=True)
    data.drop(['Symbol'], axis=1, inplace=True)
    if i == 0:
        df = data
    if i != 0:
        df = df.join(data)
df
• We’ve fetched the data for two columns only, the Symbol and Close
Price. While fetching the data, we renamed the Close Price Column
with the Symbol/Ticker and then dropped the Symbol Column.
• # Output
Now, with this dataset, we'll do a great deal of portfolio analysis.
Analysis
• Plotting the Close Price history:
fig, ax = plt.subplots(figsize=(15, 8))
for i in df.columns.values:
    ax.plot(df[i], label=i)
ax.set_title("Portfolio Close Price History")
ax.set_xlabel('Date', fontsize=18)
ax.set_ylabel('Close Price INR (₨)', fontsize=18)
ax.legend(df.columns.values, loc='upper left')
plt.show()
• Output
Correlation Matrix
• A Coefficient of correlation is a statistical measure of the relationship
between two variables. It varies from -1 to 1, with 1 or -1 indicating
perfect correlation. A correlation value close to 0 indicates no
association between the variables. A correlation matrix is a table
showing correlation coefficients between variables. Each cell in the
table shows the correlation between two variables.
• The correlation matrix will tell us the strength of the relationship
between the stocks in our portfolio, which essentially can be used for
effective diversification.
• Code to determine correlation matrix:
correlation_matrix = df.corr(method='pearson')
correlation_matrix
• Output:
• Plotting the Correlation Matrix:
fig1 = plt.figure()
sb.heatmap(correlation_matrix, xticklabels=correlation_matrix.columns,
           yticklabels=correlation_matrix.columns, cmap='YlGnBu',
           annot=True, linewidth=0.5)
print('Correlation between Stocks in your portfolio')
plt.show()
With this matrix, we can see that Wipro and Infosys are heavily correlated, which is logical, as both companies belong to the same industry. It can also be seen that BPCL and IRCTC are negatively correlated. Hence, it is wise to have them in our portfolio for efficient diversification: if, for some reason, BPCL moves in one particular direction, let’s say down, there is less chance of IRCTC moving in the same direction.
Risk & Return
Daily Simple Returns:
• To ascertain daily simple returns, we’ll write this code:
daily_simple_return = df.pct_change(1)
daily_simple_return.dropna(inplace=True)
daily_simple_return
Output
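To connect the daily simple returns to this section’s theme of risk and return, the sketch below computes the mean daily return and a commonly used annualized volatility (scaling by the square root of 252 trading days). The price series here is synthetic; in the walkthrough above, daily_simple_return comes from df.pct_change(1):

```python
import numpy as np
import pandas as pd

# Synthetic close prices; in the walkthrough above these come from nsepy.
prices = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.5])
daily_simple_return = prices.pct_change(1).dropna()

mean_daily = daily_simple_return.mean()             # average daily simple return
ann_vol = daily_simple_return.std() * np.sqrt(252)  # annualized volatility
print(f"mean daily return: {mean_daily:.4%}")
print(f"annualized volatility: {ann_vol:.2%}")
```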