
Federated Learning for AI Analytics

IoT has changed AI for good
Edge, cloud computing and central data centres are all essential to compete in
today’s high-tech world and the Internet of Things (IoT). Living on the edge or in
the cloud might be precarious at first, but the advantages outweigh the risks by
assisting in everything from traffic enforcement to medical research. None of this
can be efficiently accomplished without robust artificial intelligence (AI) and an
intuitive learning model.

Centralised learning models are risky. Traditional AI approaches present
performance and security issues for many businesses. Data stored on a central
server can lead to user privacy violations and the unauthorised release of data.
This is a huge liability when corporate governance must comply with regulations
such as Sarbanes-Oxley, HIPAA and GDPR. It can also be catastrophic. If
an adversary compromises an algorithm in a medical or military AI ecosystem,
the results could lead to a loss of life. Lost data can also mean lost revenue to
companies.

The remote, hybridised nature of today’s high-tech world mandates secure, smart,
efficient and accurate predictive analytics through AI. Machine learning (ML),
inherent to AI, is an essential part of the process. ML processes high volumes of
data with automation and predictive analysis that no human could accomplish.
It must build robust statistical models accurately from massive data sets,
turning big data into smart data to improve outcomes by identifying problems
and patterns. When done correctly, machine learning allows IoT devices to work
reliably both individually and collectively.

AI/ML helps with smart applications like:
• Face recognition
• Speech recognition
• Handwriting transcription
• Medical diagnosis
• Autonomous driving
• Digital assistants
• Banking
• Stock trades
• Gaming

AI Analytics IoT has changed AI for good 2


Centralised learning is no longer sustainable
Centralised learning has been the traditional norm in AI modelling. With
the distributed nature of technology and devices today, it is quickly
losing favour – because it is inefficient and not secure. It cannot be
scaled effectively, nor can it safely process the massive volumes of data
from different sources that federated learning can.

To be trusted and effective, edge, cloud and data centre computing
must have AI modelling and analytics that protect data privacy,
security and accuracy. All algorithms used to process data must have
embedded security protocols to keep data safe and to keep data owners
from being compromised. This is where distributed federated learning excels.

Federated learning was first introduced by Google[1] researchers as the
future for AI analytics. In 2019, Google released TensorFlow Federated
(TFF),[2] an open-source, user-friendly implementation of federated learning.
Today’s business is hybrid, with remote and on-premises computing
demanded by users in many business sectors, which means AI must be
collaborative. Federated learning makes edge computing a viable way
for businesses to stay ahead of the curve and the competition. It does so
securely, without exposing user data.

A report by the Berryville Institute of Machine Learning[3] identifies 70
risks associated with machine learning systems. The data sets that train a
machine learning system account for 60 percent of these risks, while
source code accounts for the other 40 percent. Federated learning
offers access to far more data points while preserving data integrity and
privacy. It brings machine learning models down to the user-data level.



The differences between centralised and federated learning
Federated learning is a decentralised, distributed learning system that
relies on remote execution. Instead of being centralised on a server,
it distributes copies of a machine learning algorithm to the various sites or
devices (nodes) where the data is stored. In contrast, the centralised
paradigm delivers machine learning solutions through cloud-based APIs,
with software deployed on the remote servers of AI providers. Centralised
learning identifies a problem, prepares the data, trains the machine
learning algorithm on a centralised server, and then sends the trained
model to the client system, exposing the API. In federated learning,
all training iterations are performed on local devices. It is device centric,
so it does not compromise or expose the original data. Each device returns
only its computation or analytics to the central server, which then updates
the main algorithm of the learning model.
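
As a rough illustration of the workflow described above, the following minimal sketch trains a tiny linear model on three nodes and lets a central server combine the results with a weighted average (the approach commonly called federated averaging). The node data, the model `y = w*x + b`, the learning rate and every function name are assumptions made for illustration. They are not taken from any product or framework mentioned in this document.

```python
import random


def local_update(weights, data, lr=0.1, epochs=20):
    """Train y = w*x + b on one node's private data, starting from the global weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_average(updates, sizes):
    """Server step: average the node models, weighted by how much data each node holds."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def make_node_data(rng, n, lo, hi):
    """Hypothetical private data set: noise-free samples of y = 2x + 1."""
    return [(x, 2.0 * x + 1.0) for x in (rng.uniform(lo, hi) for _ in range(n))]


rng = random.Random(0)
# Three nodes with different amounts and ranges of data. The raw samples
# never leave these lists; only the trained (w, b) pairs are shared.
nodes = [
    make_node_data(rng, 20, 0.0, 1.0),
    make_node_data(rng, 30, 0.5, 1.5),
    make_node_data(rng, 50, -1.0, 0.0),
]

global_model = (0.0, 0.0)
for _ in range(30):  # federated rounds
    updates = [local_update(global_model, data) for data in nodes]
    global_model = federated_average(updates, [len(data) for data in nodes])

print(global_model)
```

The server in this sketch only ever sees model parameters. A production system would add protections such as encryption or a trusted execution environment around those updates.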

The privacy protections that can be layered onto federated learning, such as
robust encryption, will be especially important in the coming years, with more
distributed cloud applications. Gartner researchers predict in their Top
Strategic Technology Trends for 2021[4] that half of large organisations will
implement privacy-enhancing technologies (PET) for processing data in untrusted
environments by 2025. PET rollouts will be prioritised in areas where
there is data monetisation, fraud analytics and transfers of highly
sensitive data.



Federated learning architecture

[Figure: Hospital A, Hospital B and Hospital C, each with private and secure data
and a local AI model, connected by flows labelled 1 and 2.]

KEY: 1 Local model sharing
     2 Global model sharing updates


Federated workflow
Instead of data moving to a central place,
machine learning models move to the data for training,
then recombine to create a global model.



Federated learning offers:

Retention of sovereignty
Data remains with the owner without impacting the training of algorithms on the
data.

Data that can be leveraged without being shared
When data privacy must be preserved, the federated model effectively utilises
available data anonymously. Model training can be distributed among data owners
and results aggregated without compromising privacy.

Federated learning that is secured through PET
PETs such as homomorphic encryption (HE) help ensure security. This is especially
important in the medical and financial sectors that must comply with regulatory
requirements. HE lets operations such as searches and analytics run on encrypted
data without exposing the operation itself or the resulting data. It never reveals
personal data to servers.

Flexible topology
Model sharing or aggregation among the nodes can be done later or combined.

Collaboration
Data sets from many sources across a wide geography can be shared. Complex patterns
can be accurately identified across large and diverse groups of data sets with
better results.

A study indexed by NCBI[5] found that federated learning used by 10 medical
institutions resulted in models achieving 99 percent of the model quality achieved
with centralised data. The study also evaluated generalisability on data from
institutions outside the federation.
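
To make the homomorphic encryption idea above more concrete, here is a toy sketch of the Paillier scheme, which is additively homomorphic: multiplying two ciphertexts produces a ciphertext of the sum of the plaintexts. It is an illustration only. The primes are far too small to be secure, the hospital counts are invented, and nothing here reflects how any vendor named in this document implements HE. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p=104729, q=1299709):
    """Toy key generation from two small, well-known primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (((u - 1) // n) * mu) % n


public_n, private_key = keygen()

# Hypothetical per-site counts that each owner encrypts before sharing.
site_counts = [120, 75, 305]
ciphertexts = [encrypt(public_n, count) for count in site_counts]

# The aggregator combines ciphertexts without ever seeing a plaintext count.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(public_n, encrypted_total, c)

total = decrypt(public_n, private_key, encrypted_total)
print(total)
```

Only the holder of the private key can read the total, and the individual contributions stay encrypted throughout.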



The next wave of AI: Performance and security
Although AI is widely embraced, it is often not widely understood.
A 2019 McKinsey report, Confronting the Risks of Artificial
Intelligence,[6] says few business and government leaders have honed
their knowledge on the full scope of risks. Many businesses do not fully
understand how data is fed into AI systems to operate algorithmic
models or how humans interact with machines. That can be costly.
A report by Intel on data security[7] says US companies pay an average
of USD 8.64 million per data breach, so it is essential to maintain
the integrity of data to stay in compliance with regulatory and legal
standards.

“US companies pay an average of USD 8.64 million per data breach, including
the cost of higher customer turnover and lost business due to downtime.”



The case for federated learning
The distributed, decentralised approach of federated learning can
benefit a variety of business sectors and applications. Here are
two growth areas where this distributed approach can assist:

Financial services – According to cybersecurityguide.org,[8] the global
financial services market totalled USD 22 trillion in 2019. This makes it
a very lucrative area for cybercriminals. Much of the market’s growth was seen
in non-cash payments through internet and mobile devices, with users
demanding immediate payment schemes for real-time payments. As
mobile device access has increased, so has the industry’s attack surface,
with new vulnerabilities. Securing financial data is mission critical and
non-negotiable. Cybercriminals are very sophisticated in their attacks,
so it is imperative that the industry can build robust security models from
billions of transaction patterns across multiple institutions worldwide.

Centralised AI analytics platforms are vulnerable to data breaches that expose
customer information, to privacy violations and to money laundering.
As more mobile devices are used, more vulnerabilities are exposed.
According to the IBM Security Cost of a Data Breach report,[9] the
average cost per breach within financial services was USD 5.86 million
in 2019. From 2009 to 2019, American Express and SunTrust Bank
were each breached five times, while Capital One and Discover were each hit
four times. Those incursions resulted in pressure from both customers and
regulatory agencies to heighten cybersecurity preparedness, protection
of customer data and predictive analytics.



Healthcare – A Frost & Sullivan report predicts the AI market
in healthcare will increase 40 percent, to USD 6.6 billion in 2021. AI
applications in healthcare may save up to USD 150 billion annually
by 2026, with the potential ability to reduce the cost of medical
treatment by 50 percent.[10] Lower costs are not the only benefit.
According to a study indexed by the National Institutes of Health,[11] machine
learning models trained with federated learning can improve mortality
prediction for patients with COVID-19 while avoiding the aggregation of
clinical data across multiple facilities. Federated learning
showed promise in developing robust predictive models from COVID-19
electronic health records (EHRs) without compromising privacy.

Research, clinical trials and diagnostics are critical to healthcare.
Collaboration between research facilities is essential to identify
new treatment modalities. In clinical settings, patient data and
archives comprise thousands of records and images. Due to
privacy requirements, data sources are siloed and cannot be
used without patient permission. This restricts data access,
limiting medical professionals in diagnosis, treatment and patient
outcomes. Digital medical data, which reached 44 zettabytes in
2020, is expected to double every following year. This requires
efficient, scalable, secure AI. Federated learning helps circumvent
this problem by keeping patient data confidential, and it assists in
collaborative learning for medical studies and predictive analysis at
remote locations or facilities.

AI medical test models currently used include:
• Emotional intelligence indicators to detect subtle cues in a person’s
  mood and feelings
• Help in tuberculosis detection
• Treatment of PTSD
• AI chatbots
• Virtual assistants for patients and clinicians
• Insurance verification
• Smart robots explaining lab reports
• Clinical documentation



Federated learning architecture is more secure with Intel® SGX
SGX stands for Software Guard Extensions. These are security
instruction codes that run on Intel® CPUs. Intel® SGX is a hardware-
based trusted execution environment (TEE) that helps protect
against code and data snooping and code and data modification
by malware on the system. It minimises the trusted computing
base to reduce the surface from which an attack can be launched.
Advantages and results include:

• Protection against software attacks
• Prevention of attacks against memory content
• Option for hardware-based attestation

Utilising a federated learning architecture secured by technologies
such as Intel SGX preserves owner data and enables training of
algorithms on data with a flexible topology. Online availability is
continuous since training is done offline with results returned later.
Federated learning applications are becoming the most widely
used and accepted privacy preservation technique in industry and
medical AI applications.[12]

SGX is not required for federated learning, but it makes federated learning
easier and more secure. Intel SGX adds another layer of defence by reducing
the attack surface.

• This protects code and data from attack by malicious software.
• It protects against privilege escalation while data is being processed
  so developers can create trusted execution environments.
• The risk of side-channel attacks by hackers is also reduced. SGX helps
  isolate code and data from outside incursions.
• Intel’s founding role in the Confidential Computing Consortium helps to
  quickly identify and mitigate areas where attacks can occur. Security
  vulnerabilities are regularly addressed through updates.



Advantages of Intel® SGX

01 Protection against software attacks
Incorporation of Intel® SGX[13] helps protect against
software attacks even if OS/driver/BIOS/VMM/SMM are
compromised. This increases protections for secrets even
when an attacker has full control of the platform.

02 Helps prevent attacks against memory content
Intel SGX helps prevent attacks, including memory bus
snooping, memory tampering and cold boot attacks
against memory contents in RAM. This can reduce the risk
that data in memory is tampered with or stolen.

03 Option for hardware-based attestation
Intel SGX offers an opportunity for hardware-based
attestation capabilities to measure and verify valid code
and data signatures. These mechanisms increase the
confidence level across the participants in the AML/CFT
system about the integrity of the model and data.

Learn more about the Consilient/Intel project.
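
The attestation idea above, measuring code and verifying it against an expected value, can be sketched conceptually with an ordinary hash. This is not the Intel SGX attestation API, and real hardware-based attestation relies on measurements signed by the processor. The function names and the sample code bytes below are hypothetical and exist only to show the measure-then-verify pattern.

```python
import hashlib
import hmac


def measure(code: bytes) -> str:
    """Return a SHA-256 measurement (hex digest) of the code that is about to run."""
    return hashlib.sha256(code).hexdigest()


def verify_measurement(code: bytes, expected: str) -> bool:
    """Compare the measurement with the approved value in constant time."""
    return hmac.compare_digest(measure(code), expected)


# Hypothetical training routine that all participants reviewed and approved.
approved_code = b"def train(model, data): return model.fit(data)"
expected_measurement = measure(approved_code)

# A participant checks what a remote party claims to be running.
untampered_ok = verify_measurement(approved_code, expected_measurement)
tampered_ok = verify_measurement(approved_code + b"  # exfiltrate", expected_measurement)

print(untampered_ok, tampered_ok)
```

In this sketch a participant would refuse to share model updates with any party whose measurement does not match the approved value.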



Additional AI security tips
To stay compliant and competitive, every organisation should employ these essential data security strategies:

Data encryption ‒ Use algorithms to encode data in an unreadable format that
requires an authorised key for decryption. Remember that cryptographic
processing is vulnerable to side-channel attacks. Use the latest technologies
to speed encryption and boost security without impacting performance.

User authentication and authorisation ‒ The most-secure user authentication
includes biometrics, built-in two-factor authentication, and secure enclave
technology built into the processor itself.

Hardware-based security ‒ Protect data at every layer of the IT infrastructure,
not just the software. Intel® hardware-enabled security capabilities include
protections built right into the silicon, creating trusted infrastructure that
helps to secure hardware, firmware, operating systems, applications, networks
and the cloud.

Data backup ‒ Create an exact copy of data and store it in a secure location
where it can only be accessed by authorised administrators. Protect the backup
and maintain a documented backup policy.



Intel medical use case
A recent U.S. Food and Drug Administration artificial intelligence
and machine learning discussion paper[14] reports that AI and machine
learning–based technologies “have the potential to transform
healthcare by deriving new and important insights from the vast
amount of data generated during the delivery of healthcare.”
Officials also said AI offers the benefits of earlier disease detection,
more-accurate diagnosis, identification of new observations or
patterns on human physiology, and development of personalised
diagnostics and therapeutics. FDA officials said one of the greatest
benefits of AI and machine learning is the ability to learn from real-world
use and experience and improve performance: “This real-world
feedback and performance adaptation makes these technologies
uniquely situated among software as a medical device (SaMD).
Our vision is that with appropriately tailored regulatory oversight, AI
machine learning–based SaMD will deliver safe and effective software
functionality that improves the quality of care that patients receive.”

Medical devices with embedded AI capabilities are already being
certified at the University of California San Francisco’s (UCSF) Center
for Digital Health Innovation (CDHI), with the help of Intel.[15] UCSF is
using Intel® Software Guard Extensions (Intel® SGX) featured in the
Intel® Xeon® E processor family and Fortanix Confidential Computing
Enclave Manager to streamline the process. Intel SGX helps protect
the privacy of patient data in the BeeKeeperAI project.



Intel® SGX helps UCSF
Intel® SGX enables the AI platform to create a trusted computing
environment that offers hardware-based memory encryption
to isolate specific application code and data in memory. This means the
BeeKeeperAI project can use these private regions of memory, called
enclaves (or TEEs), to increase the security of application code and
data (to run signed applications in enclaves).

The platform provides a zero-trust environment designed to protect
both the intellectual property of an algorithm and the privacy of
healthcare data, while CDHI’s proprietary BeeKeeperAI provides
the workflows to enable more-efficient data access, transformation and
orchestration.

Obtaining regulatory approval for clinical AI algorithms requires diverse and detailed clinical data with which
to develop, validate and optimise unbiased algorithm models. These algorithms should be able to
perform consistently across wide-ranging patient populations, socioeconomic groups and geographic locations.
They also need to be equipment agnostic. Because of these complicated parameters and limited data access,
few research groups or healthcare organisations have access to the high-quality data needed to accomplish this.
Federated learning expands their reach.



Intel use case – Healthcare medical imaging
Intel and the University of Pennsylvania (UPenn) are training artificial
intelligence models to facilitate the early detection of brain tumours
while still maintaining privacy.[16] The Perelman School of Medicine at Penn
has partnered with Intel Labs to co-develop the technology based on
federated learning. The alliance enables a federation of 29 international
healthcare and research institutions to train AI models that identify
brain tumours by training algorithms across multiple devices without
compromising data samples. The model is trained on the largest brain tumour
data set to date without sensitive data leaving individual collaborators. The
project is funded by a three-year, USD 1.2 million grant awarded to the Center
for Biomedical Image Computing and Analytics (CBICA) at UPenn.

Research and healthcare institutions from the US, the UK, Germany, the
Netherlands, Switzerland and India are participating in the study, which
uses a distributed learning approach to enable them to collaborate on
deep learning projects without sharing patient data. Penn Medicine and
Intel Labs were the first to publish a paper on federated learning in the
medical imaging domain. They demonstrated that federated learning
could train a model to over 99 percent of the accuracy of a model trained
in the traditional, non-private method. The new work at Penn will leverage
Intel® software and hardware to implement federated learning that
provides additional privacy protection for both the model and the data.

“AI shows great promise for the early detection of brain tumours, but it will
require more data than any single medical centre holds to reach its full
potential.”
Jason Martin, principal engineer at Intel Labs



Intel use case – Financial sector
A twenty-first century solution to combat money laundering

Money laundering is a nagging problem for financial institutions, with over 95 percent of anti-money
laundering (AML) alerts being false positives. Illicit actors profit by laundering trillions of dollars annually,
despite massive efforts to track and stop financial crime. The problem is very complicated:

• Financial institutions have information-sharing constraints because they work within an existing
  AML/CFT financial governance system that operates as islands. This means they work in isolation to
  identify and report a suspicious customer or transaction to a financial intelligence unit. This type of
  system does not encourage information sharing, collective learning or dynamic feedback among enterprises.

• Concerns over data privacy on a global basis have further compounded these barriers, with no way to
  facilitate interbank information sharing.

• Regulatory pressures instituted by the Currency and Foreign Transactions Reporting Act of 1970 (the
  Bank Secrecy Act) and expanded regulations in Title III of the USA Patriot Act demand that financial
  institutions understand and manage their crime risk.

• Compliance costs to financial institutions are over a hundred times greater than recovered criminal
  funds, and banks, taxpayers and depositors are penalised more than criminals who concoct successful
  laundering schemes.



Federated learning fights financial fraud
Consilient’s federated machine learning technology, backed by Intel® SGX, is fighting financial fraud and
money laundering[13] by moving beyond traditional rules-based transaction
monitoring to real-time sharing and collective learning. Intel launched the pilot project with Consilient in 2020
to redesign how financial institutions and authorities discover and prevent financial crime by lowering false
positive rates while still protecting sensitive customer information. The new model for the AML/CFT system
provides a more effective and efficient means of stopping financial crime. The federated learning approach
has some big advantages:

• It goes beyond traditional rules-based monitoring to an approach that facilitates information sharing
among various authorities and institutions.

• It enables collective learning on complex threats.

• It distributes and shares risk.

• It simultaneously safeguards customer privacy and data.



Federated learning architecture shares insight on financial crime
The model below uses a federated learning architecture with DOZER technology from Consilient and Intel® SGX technology to
share insights into financial crime risks in a utility-like fashion. At scale, this new model helps to securely and effectively discover
systemically relevant financial crime risk across institutions and borders. It can also reduce the burden of false positives and
dependence on rules-based models and protect privacy and security by moving the analytics and not the data.

[Figure: Bank 1, Bank 2, Bank 3, Bank 4 and Bank 5, each running Dozer/Intel® SGX,
together with an Algo factory. Labels show successive algorithm versions:
Algo 1.1, Algo 1.2 = Algo 1.1 + ∆, Algo 1.3 = Algo 1.2 + ∆, Algo 1.4 = Algo 1.3 + ∆.]
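
The version labels in the figure (Algo 1.2 = Algo 1.1 + ∆, and so on) suggest a model that travels from bank to bank, with each bank contributing an update computed on its own records. The sketch below illustrates that pattern with a small logistic-regression scorer. It is an assumption-laden illustration, not Consilient's DOZER technology: the bank names, features, labels, learning rate and function names are all invented.

```python
import math


def predict(weights, features):
    """Probability that a transaction is suspicious under a logistic model."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def local_delta(weights, records, lr=0.5):
    """Computed inside one bank: an averaged gradient step on its private records."""
    grad = [0.0] * len(weights)
    for features, label in records:
        err = predict(weights, features) - label
        for i, x in enumerate(features):
            grad[i] += err * x
    return [-lr * g / len(records) for g in grad]


def apply_delta(weights, delta):
    """Next algorithm version = current version + delta."""
    return [w + d for w, d in zip(weights, delta)]


# Hypothetical records per bank: ([bias, normalised amount, cross-border flag], label).
banks = {
    "bank_1": [([1.0, 0.1, 0.0], 0), ([1.0, 0.9, 1.0], 1)],
    "bank_2": [([1.0, 0.2, 0.0], 0), ([1.0, 0.8, 1.0], 1)],
    "bank_3": [([1.0, 0.3, 0.0], 0), ([1.0, 0.7, 1.0], 1)],
}

algo = [0.0, 0.0, 0.0]  # starting version of the shared algorithm
for _ in range(200):  # repeated passes around the federation
    for name, records in banks.items():
        # Only the delta leaves the bank; the records stay where they are.
        algo = apply_delta(algo, local_delta(algo, records))

suspicious_score = predict(algo, [1.0, 0.9, 1.0])
benign_score = predict(algo, [1.0, 0.1, 0.0])
print(suspicious_score, benign_score)
```

In a deployment along the lines the document describes, each `local_delta` computation would run inside a protected environment at the bank, so that neither the records nor the algorithm are exposed in the clear.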



Federated learning heralds a new era for AI analytics
AI analytics has forever changed with federated learning at the helm.
This decentralised, distributed system relies on remote execution as it
distributes copies of machine learning algorithms to sites or devices
while preserving data.

• Training iterations are done locally while computation results are
sent to the central server, which updates the main algorithm.

• This maintains data integrity at the source.

Federated learning with Intel® SGX offers a level of security and
flexibility in AI analytics that centralised learning does not provide.
Key advantages include retention of sovereignty, flexible topology,
collaboration, data privacy retention and security through homomorphic
encryption. As our world changes and technology advances, federated
learning will be the logical choice.

Many companies and organisations have been hesitant to jump on the
AI bandwagon, waiting for more “capable” technologies. The problem
is that when it comes to AI, time is not on your side. Those who delay will fall
behind. It is essential to be an early adopter to remain competitive.

What are you waiting for? Good things come to those who are first in line.
Find more information at intel.com/ai.



AI everywhere computing requires smarter solutions
Visit intel.com/ai to learn why scalable AI applications start
with Intel-optimised AI software.
References
1. https://research.googleblog.com/2017/04/federated-learning-collaborative.html

2. TensorFlow Federated.

3. https://berryvilleiml.com/docs/ara.pdf

4. https://www.gartner.com/en/newsroom/press-releases/2020-10-19-gartner-identifies-the-top-strategic-technology-trends-for-2021

5. https://pubmed.ncbi.nlm.nih.gov/32724046/

6. https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/confronting-the-risks-of-artificial-intelligence

7. https://www.intel.com/content/www/us/en/analytics/data-security.html

8. https://cybersecurityguide.org/industries/financial/

9. https://www.ibm.com/account/reg/us-en/signup?formid=urx-42215

10. https://software.intel.com/content/www/us/en/develop/articles/artificial-intelligence-and-healthcare-data.html?wapkw=AI%20risks

11. Vaid A., Jaladanki S.K., Xu J., Teng S., Kumar A., Lee S., Somani S., Paranjpe I., De Freitas J.K., Wanyan T., Johnson K.W., Bicak M., Klang E., Kwon Y.J., Costa A., Zhao S., Miotto R., Charney A.W., Böttinger E.,
Fayad Z.A., Nadkarni G.N., Wang F., Glicksberg B.S. “Federated Learning of Electronic Health Records Improves Mortality Prediction in Patients Hospitalized with COVID-19.” medRxiv [Preprint]. 2020
Aug 14:2020.08.11.20172809. doi: 10.1101/2020.08.11.20172809. Update in: JMIR Med Inform. 2020 Dec 14; PMID: 32817979; PMCID: PMC7430624.

12. https://arxiv.org/abs/1610.05492

13. https://www.intel.com/content/www/us/en/financial-services-it/federated-learning-solution.html

14. https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf

15. https://www.intel.com/content/www/us/en/newsroom/news/ucsf-propel-medical-device-innovations.html#gs.25v2g8

16. https://newsroom.intel.com/news/intel-works-university-pennsylvania-using-privacy-preserving-ai-identify-brain-tumors/#gs.5rq2yc

© Intel Corporation. Intel, the Intel logo and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
UK/07/2021/PDF/JH/CMD
