Big Data Architectures For Investment Banking - Session I
Ignacio Sales
15.07.2017
AGENDA
Senior Software Architect with over 20 years of experience designing solutions on the Java
Enterprise and .NET platforms. For the last five years, specializing in Big Data solutions for the
Financial Services industry.
Current:
Big Data Senior Architect at GFT IT Consulting (Valencia)
Course Objectives
GFT Group is a business change and technology consultancy trusted by the world’s leading
financial services institutions to solve their most critical challenges.
Specifically, GFT defines answers to the constant of regulatory change, whilst innovating to
meet the demands of the digital revolution.
GFT Group brings together advisory, creative and technology capabilities with innovation culture
and specialist knowledge of the finance sector, to transform the client’s businesses.
Utilising the CODE_n innovation platform, GFT is able to provide international startups, technology
pioneers and established companies access to a global network, which enables them to tap into the
disruptive trends in financial services markets and harness them for their out of the box thinking.
GFT At a Glance
STAFF: Approximately 4,000 employees
GLOBAL DELIVERY MODEL
[Chart figures as extracted, labels not recoverable: 3,248; 1,096; 1,300; 1,337; 1,386; 2,111]
STRONG INTERNATIONAL PRESENCE: 12 countries
Germany, Brazil, Canada, Costa Rica, Italy, Mexico, Poland, Spain, Switzerland, UK, USA
INNOVATION ECOSYSTEM: 1,000 digital pioneers, with startups, established corporations, media and politicians
Capital Markets, Retail Banking, Insurance, Private Wealth
Focused on both functional and technical aspects, covering the end-to-end data lifecycle: data
sourcing, transformation, quality assurance, analytics, persistence, visualisation, BI reporting and
governance.
Main efforts are dedicated to research & development of the key technologies in the following
areas: Big Data, Data Analytics, Social Mining, Stream Processing and Blockchain.
We then focus on providing the following services both internally and externally: asset creation,
training, project and pre-sales support, and communication.
The practice is complemented by two Knowledge Communities: Data Management KC and Business
Intelligence KC.
Data Practice
CERTIFICATIONS: 25 certifications on Hadoop Development & Administration, Elasticsearch, SAP-HANA, MongoDB, Spark and HBase
POCS: 10 PoCs on Anti-Money Laundering/KYC, Hadoop based ETL, HPC, Blockchain, Real-time Anomaly Detection
TRAINING: Over 300 people trained
PROJECTS: Supported over [figure missing in source]
2.- INVESTMENT BANKING AT 10,000 FEET
Banks at a Glance
Numerous types of banks exist, each of which fulfils a specific set of services to its
customers:
Retail banks
• deal directly with individuals and small to medium size enterprises, offering
payment services, savings products, mortgages, credit cards, and insurance
Savings banks
• offer savings products to a wide range of the public. Savings banks are now
included within the retail bank sector
Private banking
• advise and manage the assets for high net worth individuals
Commercial banks
• deal mostly with deposits, loans, and financing for large corporations. Often
commercial banks also sell more complex banking products to their clients (via
the investment bank)
Investment banks
• act as an intermediary between an issuer of securities and the investing public:
they trade financial instruments and foreign exchange, make markets (bring buyer to seller),
underwrite stock and bond issues, and advise corporations
on capital markets activities (see functions of investment bank in next slides)
Banking business: raising capital and executing M&A transactions for corporate
clients; raising capital for government clients
Arranges financing for corporations and governments
Debt (bonds)
Equity (stock)
Convertibles (Convertibles are securities, usually bonds or preferred shares, that can be converted
into common stock)
Advises on mergers and acquisitions (M&A) transactions
A classic way to organise the trade related activities within an Investment Bank
is by whether an area interacts with clients or performs supporting tasks in
the background:
Front Office: Any business unit which directly interacts with the customers and
counterparties such as
Relationship Management
Sales
Traders
Middle Office: These units perform supporting and controlling activities for the Front
Office and may occasionally interact with the counterparties or other involved parties
(Stock Exchanges, Brokers, Custodians, etc.) during the trade processing such as
Trade Processing, Reconciliation and Clearing
Profit & Loss (P&L) calculation and verification
Market and Credit Risk calculation and monitoring
Limit and Position Controlling
Financial Control
Collateral Management
Back Office: the back office units are usually responsible for performing all Trade
Settlement, Accounting and Reporting activities:
Trade Settlement (payments and deliveries, incl. physical delivery of e.g. gold bullion)
Record Maintenance (account booking and adjustments in General Ledger [GL])
P&L booking
Taxation
Regulatory and MI reporting
Asset Classes
Financial Instruments can be classified in different ways. The criterion most often used is Asset
Class.
Asset Class: Financial instruments or securities that can be grouped by the same characteristics
and behaviour in market places and which are usually regulated in the same way:
Equities: stocks
Fixed Income: bonds
Commodities: precious metals, crude oil
Foreign Exchange (FX)
Cash Equivalents: Money Market loans and deposits
Real Estate
Derivatives: Futures, Options, etc
Other criteria describe how or where these instruments are traded or certain
characteristics of the instruments.
Market Place describes whether the instruments are traded in a regulated market environment (e.g. a
Stock or Commodities Exchange) or Over-the-Counter (OTC)
Term / Maturity is especially used for fixed income products to describe the length of the loan or
deposit:
Short Term – Up to one year
Medium Term – between one year and five years
Long Term – five years or longer
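The maturity bands above can be sketched as a small classifier. This is only an illustration: the function names are hypothetical, and the boundary handling is an assumption (exactly one year counts as Short Term, exactly five years as Long Term, following the wording "up to one year" and "five years or longer").

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole calendar years (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return d.replace(year=d.year + years, day=28)


def maturity_bucket(start: date, maturity: date) -> str:
    """Classify a loan or deposit by the length of its term."""
    if maturity < start:
        raise ValueError("maturity date precedes start date")
    if maturity <= add_years(start, 1):
        return "Short Term"      # up to one year (assumed inclusive)
    if maturity < add_years(start, 5):
        return "Medium Term"     # between one year and five years
    return "Long Term"           # five years or longer
```

Comparing calendar dates rather than dividing a day count by 365 avoids misclassifying terms that end exactly on a one-year or five-year anniversary.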
[Figure: functional model of an investment bank. Box labels as best recoverable from the extraction (some pairings are uncertain): Client On-Boarding; Trade Capture; Trade Lifecycle Event Management; Market Data Management; Reference Data Management; Transactional Data Matching; Settlements; Asset Services; New Product Approval; Valuation Control; Valuation & Analytics; Trading Risk & P&L; Product Control; Business Control; Collateral Management; Tax Processing; Funding & FX Control; Regulatory Capital; Accounting Control; Treasury; Accounting; Financial Reporting; Risk Reporting; Transactional Regulatory Reporting; Regulatory & Policy Reporting; Client Valuations; Financial Management & Analysis; Risk Management & Analysis. The layout and grouping of the boxes are not recoverable.]
Counterparty
All financial transactions have two participating parties.
The counterparty is the second party which participates in a
financial transaction.
Every buyer of an asset must be paired up with a seller that is
willing to sell and vice versa.
Counterparty reference data will include:
• counterparty name
• counterparty type (inter / intra company)
Organisation
Hierarchies exist which enable the mapping of financial data to
the organisation, e.g. to a cost centre, to a particular office
location, etc. Organisations may have multiple organisational
hierarchies by which they want to manage information.
Business hierarchy (Book, Desk, Business Unit, Region..)
Financial hierarchy (GL Account, Cost Centre…)
Trade Data
Product
A product is a high-level grouping of financial
instruments / securities: FX Spot, Equity Derivatives,
Corporate Bonds.
Product reference data will include:
• product codes
• product name, e.g. Bond 3 Year
• product category, e.g. Bonds
Instrument
An instrument is a unique security which can, for
example, be traded on an exchange. AAPL is the
unique ticker for Apple Computer stock equity that
can be traded on the NASDAQ exchange.
Instrument reference data will include:
• instrument number
• instrument type
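As a rough illustration of how the reference data described above might hang together, the sketch below models counterparty, product and instrument records and a trade that points at them. The class names, the extra trade fields (book, quantity, price) and all identifiers such as "BND3Y" and "INS-0001" are hypothetical and not taken from any real system.

```python
from dataclasses import dataclass
from enum import Enum


class CounterpartyType(Enum):
    INTER_COMPANY = "inter"
    INTRA_COMPANY = "intra"


@dataclass(frozen=True)
class Counterparty:
    name: str
    type: CounterpartyType


@dataclass(frozen=True)
class Product:
    code: str       # product code
    name: str       # e.g. "Bond 3 Year"
    category: str   # e.g. "Bonds"


@dataclass(frozen=True)
class Instrument:
    number: str     # instrument number
    type: str       # instrument type
    product: Product


@dataclass(frozen=True)
class Trade:
    trade_id: str
    instrument: Instrument
    counterparty: Counterparty
    book: str        # lowest level of the business hierarchy (assumed)
    quantity: int    # positive = buy, negative = sell (assumed convention)
    price: float


def notional(trade: Trade) -> float:
    """Absolute traded value, ignoring direction."""
    return abs(trade.quantity) * trade.price


# Hypothetical sample data.
bond_3y = Product(code="BND3Y", name="Bond 3 Year", category="Bonds")
instrument = Instrument(number="INS-0001", type="Corporate Bond", product=bond_3y)
cpty = Counterparty(name="Example Bank AG", type=CounterpartyType.INTER_COMPANY)
trade = Trade("T-1", instrument, cpty, book="BOOK-A", quantity=-200, price=101.5)
```

Keeping reference data in separate immutable records means a trade only carries references, so a correction to a product or counterparty is made in one place.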
Regulatory Environment
Regulatory Policies
MIFID
The Markets in Financial Instruments Directive (MIFID) is a European Union law that provides
harmonised regulation for investment services across the 30 member states of the European
Economic Area.
• Authorisation, regulation and passporting: once a financial institution is accepted in one EU state it can operate
in others
• Client categorisation: firms must categorise clients as "eligible counterparties", professional clients or retail
clients
SOX
The Sarbanes-Oxley Act 2002 (SOX) was introduced under US Federal Law to prevent
recurrences of major accounting scandals such as that caused by Enron.
• Includes rules around public company accounting, auditor independence, corporate responsibility and analysts’
conflicts of interest.
• Companies must report annually on the operational effectiveness of the internal controls relating to financial
reporting. The company’s auditors must also attest to and report on the board’s assertions.
Dodd-Frank
Dodd-Frank was introduced in 2010 and brought in financial regulation in the United States in
response to the financial crisis of 2007-2010.
• Consolidation of regulatory agencies and evaluation of systemic risk
• Increased transparency of derivatives trading
• Volcker Rule – prohibits proprietary trading by depository banks
[Timeline figure: Structural Measures; MiFID II uncertainty. Date ranges as extracted: Jan 14 – Jan 19; May 14 – Jan 18]
• The Volcker Rule, a specification of the U.S. Dodd-Frank
Wall Street Reform Act
• A large UK-based investment bank needed to produce
regulatory reports on its proprietary trading
• Technological platform which allows both high volume and
rapid processing

• GFT implemented a new system using the big data platform
Hadoop
• MapReduce was used for data importing, transformation,
and calculation
• Sqoop was used to implement the interface to relational
databases
• QlikView was implemented as the main reporting tool

• Able to calculate several metrics like Inventory Aging,
CFTR and ITR of positions in order to determine extent of
proprietary trading
• Management of huge volumes of data – fine granularity
trade data for key businesses
• Rapid calculation of key figures for Volcker reporting
• Scalable to meet future needs of the bank

• Receives data from over 50 trade and position sources
• Processes 750 million trade events/day
• Total 174 TB of historic data
• Cluster: 22 nodes (4x4 cores/each + 98GB RAM)
[Figure: Volcker reporting architecture. Source systems (x50) deliver CSV files to normalizers, which produce normalized data. Separate metric calculations (CFTR, ITR, Cov. Funds, RENT-D) each produce metric data. An exporter feeds the QlikView server and dashboard. Orchestration and scheduling span the whole pipeline.]
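The normalise-then-calculate flow described in this case study can be sketched in a few lines of plain Python. This is not the MapReduce implementation the project used. The source names, column mappings and the `NormalizedTrade` record are hypothetical, and the metric is a simplified, count-based reading of the Customer-Facing Trade Ratio (customer-facing trades divided by non-customer-facing trades per desk); the regulatory definition carries more detail.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedTrade:
    source: str
    desk: str
    notional: float
    customer_facing: bool


# Hypothetical column names for two of the source systems.
COLUMN_MAPS = {
    "sys_a": {"desk": "Desk", "notional": "Notional", "customer": "IsCustomer"},
    "sys_b": {"desk": "trading_desk", "notional": "amt", "customer": "cust_flag"},
}


def normalize(source: str, csv_text: str) -> list:
    """Map one source system's CSV layout onto the common record."""
    cols = COLUMN_MAPS[source]
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        NormalizedTrade(
            source=source,
            desk=row[cols["desk"]],
            notional=float(row[cols["notional"]]),
            customer_facing=row[cols["customer"]].strip().upper() in {"Y", "TRUE", "1"},
        )
        for row in rows
    ]


def customer_facing_trade_ratio(trades) -> dict:
    """Count-based ratio per desk; None when a desk has no non-customer trades."""
    counts = {}
    for t in trades:
        facing, other = counts.get(t.desk, (0, 0))
        counts[t.desk] = (facing + 1, other) if t.customer_facing else (facing, other + 1)
    return {desk: (f / o if o else None) for desk, (f, o) in counts.items()}


# Hypothetical sample feeds.
feed_a = "Desk,Notional,IsCustomer\nRates,100,Y\nRates,50,N\nFX,10,Y\n"
feed_b = "trading_desk,amt,cust_flag\nRates,25,true\nFX,5,1\n"
all_trades = normalize("sys_a", feed_a) + normalize("sys_b", feed_b)
ratios = customer_facing_trade_ratio(all_trades)
```

Normalising every feed into one record type first means each metric calculation can be written once, independently of how many source systems exist.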
Financial Accounting
Tax Processing
• The bank must calculate and pay appropriate taxes and comply with the local tax laws.
Funding & FX Control
• The bank needs to maintain a healthy balance of assets and liabilities at all times, and be aware of areas of
weakness.
Regulatory Capital
• The bank must ensure that it maintains the minimum levels of capital required by the regulatory bodies.
Accounting Control
• Ensuring that accounts are prepared according to correct accounting practices and the numbers produced
can be supported.
GFT Group 03.09.2015 34
3.- THE TRADE LIFECYCLE
Balance Maintenance
• Accounting SubLedger is a critical finance system which
produces credit / debit postings for the accounting
process of a large investment bank
• To future-proof the system, the data loading, conversion,
aggregation and balance generation needed to be
updated, reducing processing times and improving
efficiency

• GFT designed and implemented a new architecture using
the big data platform Hadoop
• Configurable aggregation logic was implemented in
MapReduce jobs
• Workflow coordination was done with Oozie, and Sqoop was used to
extract data from existing Oracle databases
• Hive was used to query intermediate data (for testing purposes)

• The new architecture puts in place a totally scalable
platform which will allow continued growth of data volumes
• Improved efficiency by migrating existing logic from PL/SQL
to Hadoop
• Having managed the original subledger platform over the
last 10 years, GFT was able to understand well how to
successfully apply the new technologies to the evolving
needs of the bank

• 6 different business lines
• 700 million balances/day
• 65 million postings/day
• Cluster: 30 nodes (2x10 cores/each + 128GB RAM)
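[Figure: SubLedger processing pipeline. Labels as extracted (order and grouping uncertain): JMS; Import Postings; Data Preparation; Calculate Balances; Control Balances; Summarized; XML Generation; Data Export; File Delivery; SFTP; Posting Backup; HDFS; Unix; HouseKeeping; Archiving & Purging; Scheduling.]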
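The "configurable aggregation logic" in this case study can be pictured as a map step that emits a key and a signed amount for each posting, followed by a reduce step that sums per key. The sketch below does this in plain Python rather than Hadoop MapReduce. The posting field names, the D/C side codes and the zero-sum control check are assumptions for illustration, not the project's actual rules.

```python
from collections import defaultdict
from decimal import Decimal


def map_posting(posting: dict, key_fields: tuple):
    """Map step: emit (aggregation key, signed amount) for one posting."""
    key = tuple(posting[f] for f in key_fields)
    amount = Decimal(posting["amount"])
    # Assumed convention: debits are positive, credits negative.
    signed = amount if posting["side"] == "D" else -amount
    return key, signed


def reduce_balances(pairs) -> dict:
    """Reduce step: sum the signed amounts for each key."""
    balances = defaultdict(Decimal)
    for key, signed in pairs:
        balances[key] += signed
    return dict(balances)


def calculate_balances(postings, key_fields: tuple) -> dict:
    """Aggregate postings into balances at the configured level."""
    return reduce_balances(map_posting(p, key_fields) for p in postings)


def control_balances(balances: dict) -> bool:
    """Assumed control: debits and credits net to zero overall."""
    return sum(balances.values(), Decimal("0")) == 0


# Hypothetical sample postings.
postings = [
    {"account": "1000", "business_line": "FX", "side": "D", "amount": "100.00"},
    {"account": "2000", "business_line": "FX", "side": "C", "amount": "100.00"},
    {"account": "1000", "business_line": "FX", "side": "D", "amount": "25.50"},
    {"account": "2000", "business_line": "FX", "side": "C", "amount": "25.50"},
]
by_account = calculate_balances(postings, ("account", "business_line"))
by_line = calculate_balances(postings, ("business_line",))
```

Passing the key fields as a parameter is what makes the aggregation configurable: the same map and reduce functions produce balances per account, per business line, or at any other level. `Decimal` is used instead of floats so that monetary sums stay exact.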
Reporting
The reports, both Financial and Risk, are built using inputs from the
earlier stages of the functional model.
Report definitions should be standard and agreed across the
business, locations and legal entities where possible.
The data used in the Risk reports should be consistent and aligned to
the Financial reports, and vice versa. This means that close
communication between the two areas is extremely important, and
only one data source should be used (if possible)
Risk Reporting
Risk exposure is a given within the world of investment banking, in particular market, credit and liquidity risk.
Risk is calculated and reported by portfolio and aggregated to division and bank wide. Risks reported include:
Market Risk: The risk of loss of earnings or capital, resulting from a change in the value of financial instruments.
Credit Risk: The risk that a counterparty will not fulfil its obligations.
Liquidity Risk: The risk that a sale is unable to be made.
Operational Risk: The risk that a loss will be incurred due to inadequate or failed processes, people, systems.
Legal Risk: The risk that a counterparty is not legally able to enter into a contract or that legislation might change during life of trade.
Reports are prepared to detect both trends and areas of particularly high risk. Reports are directed at various levels:
• Traders: market and credit risk of individual trades, and aggregated to trader portfolio; limit reporting
• Desk managers: portfolio risk and desk aggregated risk; limit reporting
• Senior managers: market and credit risk at division level, legal risk, operational risk
• Top managers: overall bank risk (e.g. global VaR), compliance risk reporting, legal risk, and operational risk
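The aggregation from portfolio to desk, division and bank level, together with limit reporting, can be sketched as a roll-up along the business hierarchy. The hierarchy levels, field names and limit values below are hypothetical. The sketch also assumes an additive exposure measure: figures such as VaR are generally not additive across portfolios, so they cannot be rolled up by simple summation like this.

```python
from collections import defaultdict

# Assumed hierarchy levels, lowest first.
HIERARCHY = ("book", "desk", "division")


def roll_up(exposures, hierarchy=HIERARCHY) -> dict:
    """Sum an additive exposure figure at every hierarchy level and bank-wide."""
    totals = defaultdict(float)
    for row in exposures:
        for level in hierarchy:
            totals[(level, row[level])] += row["exposure"]
        totals[("bank", "ALL")] += row["exposure"]
    return dict(totals)


def limit_breaches(totals: dict, limits: dict) -> list:
    """Return the hierarchy nodes whose aggregated exposure exceeds their limit."""
    return sorted(node for node, limit in limits.items()
                  if totals.get(node, 0.0) > limit)


# Hypothetical sample data.
exposures = [
    {"book": "B1", "desk": "Rates", "division": "Markets", "exposure": 10.0},
    {"book": "B2", "desk": "Rates", "division": "Markets", "exposure": 20.0},
    {"book": "B3", "desk": "FX", "division": "Markets", "exposure": 5.0},
]
totals = roll_up(exposures)
breaches = limit_breaches(totals, {("desk", "Rates"): 25.0, ("desk", "FX"): 10.0})
```

Each reporting audience then reads the same totals at its own level: book and desk figures for traders and desk managers, division and bank figures for senior and top management.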
• The IFRS9 Impairment project is the bank’s response to a
regulatory requirement for Jan’18. The bank has to provide
the authorities with mandatory risk exposure reports to
anticipate the impact of expected impairment losses given default
• Impairment calculation requires huge amounts of input data,
and billions of daily calculations. A new proposed
architecture based on Big Data technologies must provide
scalability and reliability

• Spark DataFrames were used for the ETL, calculations
and transformations
• Storage was on HDFS using Avro format and structured
with Hive tables, allowing for ad-hoc interrogation with Hive
or Impala engines
• Impala was the entry point for QlikView to create the reports

• Meeting regulatory requirements
• A scalable and high-performance solution for the
computation of calculations and transformations using
Spark
• GFT is now rolling this solution out to different divisions of
the bank

• Still under development, volume is not significant
• Smoke test:
  • Input data: 60M default predictions
  • Processed in about 20min
• Shared cluster: 15 nodes (48 cores/each and 500GB RAM)
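To give a feel for the kind of calculation involved, the sketch below applies the commonly used simplification of expected credit loss, probability of default × loss given default × exposure at default, and sums it per portfolio. It is plain Python rather than Spark DataFrames, and the record layout and portfolio names are hypothetical. The bank's actual impairment model is not described in the slides and is likely far more detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    portfolio: str
    pd: float    # probability of default, between 0 and 1
    lgd: float   # loss given default, as a fraction of exposure
    ead: float   # exposure at default, in currency units


def expected_loss(e: Exposure) -> float:
    """Simplified expected credit loss: PD x LGD x EAD."""
    if not (0.0 <= e.pd <= 1.0 and 0.0 <= e.lgd <= 1.0):
        raise ValueError("pd and lgd must lie in [0, 1]")
    return e.pd * e.lgd * e.ead


def expected_loss_by_portfolio(exposures) -> dict:
    """Sum the expected loss of every exposure within each portfolio."""
    totals = {}
    for e in exposures:
        totals[e.portfolio] = totals.get(e.portfolio, 0.0) + expected_loss(e)
    return totals


# Hypothetical sample data.
sample = [
    Exposure("Mortgages", pd=0.5, lgd=0.5, ead=1000.0),
    Exposure("Mortgages", pd=0.25, lgd=1.0, ead=400.0),
    Exposure("Corporate", pd=0.125, lgd=0.5, ead=800.0),
]
losses = expected_loss_by_portfolio(sample)
```

The per-record calculation is independent of every other record, which is why this workload parallelises well: in a DataFrame engine the same logic becomes a column expression followed by a group-by sum.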
• Programme Management
Core Platform
Volume:
Certainly not at Google or Facebook scale
Quite a few systems on the limits of relational technologies
Oracle Exadata / Teradata, etc
100s of TB are not uncommon
Variety:
Many types of structured data – rapidly changing
Some use cases do require analysing unstructured data
Trader / Communications surveillance
Velocity:
Algorithmic Trading (Big Data before the name was coined!)
Many processes are batch-driven
Architectures evolving from batch to real / near-real time processing
Holy grail – Straight through processing – Settle at T+0
Globalization of Production chain - Provide the right tools for each function
Development (Western Europe)
Testing (Eastern Europe)
Production Support (Asia)
Do’s
Start “small”
Process and/or store data sets on the multi-Terabyte range
No need for huge Clusters - 10 to 20 nodes is perfectly acceptable
Beware the “Small Print of BigData”: Get help from the experts
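A quick back-of-the-envelope calculation helps when judging whether a cluster of this size is plausible. The sketch below uses the figures quoted for the Volcker reporting case study earlier in this deck (750 million trade events/day, 22 nodes). The function names and the idea of a shorter batch window are assumptions for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(events_per_day: float) -> float:
    """Average event rate if the load were spread evenly over a day."""
    return events_per_day / SECONDS_PER_DAY


def per_node_rate(events_per_day: float, nodes: int,
                  batch_window_hours: float = 24.0) -> float:
    """Events per second each node must handle within the batch window."""
    window_seconds = batch_window_hours * 3600
    return events_per_day / window_seconds / nodes


# Figures from the Volcker reporting case study.
daily_events = 750_000_000
cluster_nodes = 22
avg_rate = average_rate_per_second(daily_events)
node_rate_full_day = per_node_rate(daily_events, cluster_nodes)
# Hypothetical 4-hour overnight batch window.
node_rate_4h = per_node_rate(daily_events, cluster_nodes, batch_window_hours=4.0)
```

Spread over a full day this comes to roughly 8,700 events per second across the cluster, or roughly 395 per node. Squeezing the same volume into a shorter batch window raises the per-node rate proportionally, which is usually the number that drives sizing.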
Don’ts
Start a project based on purely technical considerations
Business Value must be the main driver
Infrastructure
Architecture
CPU count based licensing models don’t play well with Hadoop
Design
Implementation
Hands On Exercise
ETL In Investment Banking
5.- HANDS-ON EXERCISE
Exercise Walkthrough
Q&A
Books:
Too Big to Fail – Andrew Ross Sorkin - https://www.amazon.com/Too-Big-Fail-Washington-System/dp/0143118242/ref=pd_sim_14_9?ie=UTF8&dpID=61Sy1mRL4lL&dpSrc=sims&preST=_AC_UL160_SR104%2C160_&psc=1&refRID=6B96A138QK9KYNGD25GM
The Big Short – Michael Lewis - https://www.amazon.com/Big-Short-Inside-Doomsday-Machine/dp/0393338827
Liar’s Poker – Michael Lewis - https://www.amazon.com/Liars-Poker-Norton-Paperback-Michael/dp/039333869X/ref=pd_bxgy_14_img_2?ie=UTF8&psc=1&refRID=FFVAKW546G3WBRPBVDT4
Flash Boys – Michael Lewis - https://www.amazon.com/Flash-Boys-Wall-Street-Revolt/dp/0393351599/ref=pd_bxgy_14_img_3?ie=UTF8&psc=1&refRID=BFWXWFWAGFQKEKSDNM4Z
Barbarians at the Gate – Bryan Burrough - https://www.amazon.com/Barbarians-Gate-Fall-RJR-Nabisco/dp/0061655554/ref=pd_sim_14_8?ie=UTF8&dpID=517uhxQLpdL&dpSrc=sims&preST=_AC_UL160_SR107%2C160_&psc=1&refRID=D1WFZD628ZNM2YG7EAJJ
Movies:
Inside Job (2010) http://www.imdb.com/title/tt1645089/
The Big Short (2015) http://www.imdb.com/title/tt1596363/
Margin Call (2011) http://www.imdb.com/title/tt1615147/
Too Big to Fail (2011) http://www.imdb.com/title/tt1742683/
Rogue Trader (1999) http://www.imdb.com/title/tt0131566/
Barbarians at the Gate (1993) http://www.imdb.com/title/tt0106356/