1FL 2024
Lecture 1
Data Fusion course - Team
• Anna Wilbik – professor in data fusion and intelligent interaction
• Marcin Pietrasik – post-doc in data fusion
• Afsana Khan – PhD student in federated learning
Anna Wilbik
Who are you?
(https://www.flickr.com/photos/saulalbert/37545736336)
Data fusion
• A process dealing with the association, correlation, and combination of data and
information from single and multiple sources to achieve refined position and identity
estimates, and complete and timely assessments of situations and threats as well as
their significance
• Data fusion is a formal framework in which are expressed means and tools for the
alliance of data originating from different sources. It aims at obtaining information of
greater quality; the exact definition of ‘greater quality’ will depend upon the
application
• Data Fusion is the analysis of several data sets such that different data sets can
interact and inform each other
• DF is a framework, equipped with an ensemble of tools, for the joint analysis of data from
multiple sources (modalities); it makes it possible to obtain information/knowledge that is
not recoverable from any individual source.
Taxonomy of data fusion
Federated learning as model fusion
https://doi.org/10.1016/B978-0-444-63984-4.00001-6
https://towardsdatascience.com/introduction-to-ibm-federated-learning-a-collaborative-approach-to-train-ml-models-on-private-data-2b4221c3839
Schedule (1)
Date Topic
06.02.2024 08:30-10:30 Lecture: Introduction. Federated learning (1)
08.02.2024 08:30-10:30 Lecture: Federated learning (2)
20.02.2024 08:30-10:30 Lab: Federated learning
21.02.2024 11:00-13:00 Lecture: High level fusion
22.02.2024 11:00-13:00 Guest Lecture: Industry perspective
28.02.2024 11:00-13:00 Lab: High level fusion
29.02.2024 11:00-13:00 Lecture: Mid-level fusion (1)
06.03.2024 11:00-13:00 Lecture: Mid-level fusion (2)
07.03.2024 11:00-13:00 Lab: Mid-level fusion
12.03.2024 09:00-10:30* Q&A
Schedule (2)
Date Topic
13.03.2024 11:00-13:00 Lecture: Low level fusion
14.03.2024 11:00-13:00 Lab: Low level fusion
20.03.2024 11:00-13:00 Lecture: Outcome Economy
21.03.2024 11:00-13:00 Lab: Outcome Economy
26.03.2024 09:00-10:30* Exam preparation (Q&A)
27.03.2024 11:00-13:00 Assignment presentations
28.03.2024 13:00 Assignment deadline
Materials
• Handouts from the lectures
• Scientific articles – please check Canvas
Grading
• Assignment (A) (0-10) 30% weight
• Written exam (B) (0-10) 70% weight
• Bonus (C) (0.25)
Final grade = MAX(1,MIN(10,ROUND(0.3*A+0.7*B+C)))
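For illustration, the grading formula as a (hypothetical) Python helper; note that Python's built-in `round` breaks exact .5 ties to even, unlike spreadsheet-style half-up rounding:

```python
def final_grade(a: float, b: float, c: float = 0.0) -> int:
    """Final grade = MAX(1, MIN(10, ROUND(0.3*A + 0.7*B + C)))."""
    raw = 0.3 * a + 0.7 * b + c
    # Clamp to the 1-10 grade scale after rounding.
    # Caveat: Python's round() uses banker's rounding at exact .5 ties.
    return max(1, min(10, round(raw)))
```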
Assignment:
• in groups – apply different data fusion techniques to a provided data set
• presence at the assignment presentations is compulsory (penalty for an absent group member)
Resit:
• Written exam – regular resit in the ongoing academic year
• Assignment – special repair project for groups with an initial score between 4 and 6; the score after repair is 6
Communication
• Lectures
• Instructions
• Canvas forum
• Please don’t send us emails about the course content – use Canvas!
(https://universalscribbler.wordpress.com/2012/07/16/work-dealing-with-email-overflow/)
Federated learning
Taxonomy of data fusion
Federated learning as model fusion
https://doi.org/10.1016/B978-0-444-63984-4.00001-6
https://towardsdatascience.com/introduction-to-ibm-federated-learning-a-collaborative-approach-to-train-ml-models-on-private-data-2b4221c3839
From centralized to decentralized data
Possible options
Li et al., A survey on federated learning systems: vision, hype and reality for data privacy
and protection, 2019.
Key differences with distributed learning
Data distribution
• In distributed learning, data is centrally stored (e.g., in a data
center)
- The main goal is just to train faster
- We control how data is distributed across workers: usually, it is
distributed uniformly at random across workers
• In FL, data is naturally distributed and generated locally
- Data is not independent and identically distributed (non-i.i.d.), and it is
imbalanced
FL – area under development
[Chart: Web of Science record count of federated learning publications per year, 2016–2024, showing steep growth into the thousands]
Gboard: next-word prediction
• Federated RNN (compared to prior n-gram model):
• Better next-word prediction accuracy: +24%
• More useful prediction strip: +10% more clicks
https://www.technologyreview.com/2019/12/11/131629/apple-ai-personalizes-siri-federated-learning/
https://blogs.nvidia.com/blog/2020/04/15/federated-learning-mammogram-assessment/
Taxonomy of Federated Learning
Federated learning systems
Li et al., A survey on federated learning systems: vision, hype and reality for data privacy and protection,
arXiv preprint arXiv:1907.09693, 2019.
Data partitioning
[Diagram: in horizontal FL, parties A and B hold the same features and labels for different samples; in vertical FL, parties A and B hold different features of the same samples, with the labels at one party]
Horizontal FL
Vertical FL
Hybrid FL
Communication architecture
Server-orchestrated FL Fully decentralized FL
Scale of federation
Cross-silo federated learning:
• Training a model on siloed data. Clients are different organizations (e.g. medical or financial) or geo-distributed datacenters.
• All clients are almost always available.
• Typically 2–100 clients.
• Relatively few failures.
• Partition is fixed, horizontal or vertical.
Cross-device federated learning:
• The clients are a very large number of mobile or IoT devices.
• Only a fraction of clients are available at any one time, often with diurnal or other variations.
• Massively parallel, up to 10^10 clients.
• Highly unreliable – 5% or more of the clients participating in a round of computation are expected to fail or drop out.
• Fixed horizontal partition.
Motivation of federation
• Incentive
- Obtaining a better model
- Compensation for sharing data
- …
• Regulation
The Lifecycle of a Model in Federated Learning
1. Problem identification
2. Client instrumentation
3. Simulation prototyping (optional)
4. Federated model training
5. (Federated) model evaluation
6. Deployment
Horizontal federated learning
[Diagram: parties A and B hold different samples with the same features and labels]
How does it work?
1. Client selection
2. Broadcast
3. Client computation
4. Aggregation
5. Model update
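The five steps can be sketched in a small simulation; the linear model, squared loss, and selection fraction below are illustrative assumptions, not course code:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(theta, X, y, lr=0.1, epochs=5):
    """Client computation: a few local gradient steps on squared loss."""
    theta = theta.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta

def fedavg_round(theta, clients, fraction=0.5):
    """One round: select clients, broadcast theta, aggregate weighted by n_k."""
    m = max(1, int(fraction * len(clients)))
    selected = rng.choice(len(clients), size=m, replace=False)  # 1. selection
    updates, sizes = [], []
    for k in selected:
        X, y = clients[k]
        updates.append(local_update(theta, X, y))  # 2-3. broadcast + local work
        sizes.append(len(y))
    weights = np.array(sizes) / sum(sizes)
    return sum(w * u for w, u in zip(weights, updates))  # 4-5. aggregate/update
```

Repeating `fedavg_round` drives the global model toward the coefficients shared by all clients when their data is generated by the same model.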
[Animated diagram illustrating the five steps above]
Federated learning - objective
Goal:
$$\min_{\theta} \sum_{k=1}^{K} p_k \sum_{i=1}^{n_k} \mathcal{L}\left(f\big(x_k^{(i)}, \theta\big),\, y_k^{(i)}\right)$$
where:
$p_k$ – weight of party $k$
$\mathcal{L}(\cdot)$ – loss function
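The objective can be checked numerically; this is a hedged sketch assuming a linear model and a squared per-sample loss (both illustrative choices, not fixed by the definition):

```python
import numpy as np

def federated_objective(theta, parties, weights):
    """sum_k p_k * sum_i L(f(x_k^(i), theta), y_k^(i)), with squared loss L
    and the illustrative linear model f(x, theta) = x @ theta."""
    total = 0.0
    for p_k, (X, y) in zip(weights, parties):
        residuals = X @ theta - y          # f(x, theta) - y per sample
        total += p_k * np.sum(residuals ** 2)
    return total
```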
Gradient descent – recap
Gradient Descent
Derivative
slope of the tangent line
Partial derivative – multivariate functions
Gradient vector
The vector whose coordinates are the partial derivatives of the function.
Gradient Descent Algorithm
• Idea
- Start somewhere
- Take steps based on the gradient
vector of the current position till
convergence
• Convergence
- Change between two steps < ε
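The idea above, sketched for the illustrative function $f(x, y) = x^2 + y^2$, whose gradient is $(2x, 2y)$ and whose minimum is at the origin:

```python
import numpy as np

def gradient_descent(grad, start, lr=0.1, eps=1e-8, max_iter=10_000):
    """Start somewhere, step along the negative gradient until convergence."""
    point = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        step = lr * grad(point)
        point = point - step
        if np.linalg.norm(step) < eps:  # convergence: change between steps < eps
            break
    return point

# gradient of x^2 + y^2 is (2x, 2y)
minimum = gradient_descent(lambda p: 2 * p, start=[3.0, -4.0])
```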
Stochastic Gradient Descent (SGD)
• At each step, instead of computing the gradient over all training samples, randomly pick a small subset (mini-batch) of training samples $(x_k, y_k)$:
$$w_{t+1} \leftarrow w_t - \eta \nabla f(w_t;\, x_k, y_k)$$
[McMahan et al.]
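A minimal sketch of the update above, assuming a linear model with squared loss (an illustrative choice; the rule itself is generic):

```python
import numpy as np

def sgd(X, y, lr=0.05, batch_size=8, steps=2000, seed=0):
    """SGD: at each step, take the gradient on a random mini-batch only."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch_size, replace=False)  # mini-batch
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / batch_size
        w -= lr * grad  # w_{t+1} = w_t - eta * grad f(w_t; x_k, y_k)
    return w
```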
Linear regression
$\boldsymbol{x} = (x_1, x_2, \ldots, x_m)$ – data sample
$y = f(\boldsymbol{\theta}, \boldsymbol{x}) = \theta_0 + \sum_{j=1}^{m} \theta_j x_j$,
where $\boldsymbol{\theta} = (\theta_0, \theta_1, \ldots, \theta_m)$ – regression coefficients
$D = \{(\boldsymbol{x}^{(l)}, y^{(l)})\},\ l = 1 \ldots n$ – training sample (distributed among $K$ clients)
Minimize loss:
$$\mathcal{L}(\boldsymbol{\theta}) = \frac{1}{2n} \sum_{l=1}^{n} \left( f\big(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}\big) - y^{(l)} \right)^2$$
https://doi.org/10.1016/j.ins.2020.12.007
Linear regression
With gradient descent $\boldsymbol{\theta}^{i+1} = \boldsymbol{\theta}^{i} - \alpha \nabla_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta})$, where
$\boldsymbol{\theta}^{0}$ – random initialization,
$\alpha$ – learning rate,
$\nabla_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta})$ – gradient of $\mathcal{L}(\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$.
For the $k$-th client:
$$\theta_j^{i+1} = \theta_j^{i} - \frac{\alpha}{n_k} \sum_{l=1}^{n_k} \left( f\big(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}\big) - y^{(l)} \right) x_j^{(l)}$$
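The per-client gradient step can be sketched in vector form (a hedged illustration; the bias-column trick for handling $\theta_0$ is an implementation choice, not from the slide):

```python
import numpy as np

def local_gd_step(theta, X, y, alpha=0.1):
    """One gradient step of linear regression on client k's n_k samples:
    theta_j <- theta_j - (alpha / n_k) * sum_l (f(theta, x^(l)) - y^(l)) * x_j^(l)."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # bias column so theta_0 is included
    residuals = Xb @ theta - y                 # f(theta, x^(l)) - y^(l)
    grad = Xb.T @ residuals / len(y)           # averaged over the n_k samples
    return theta - alpha * grad
```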
Ridge regression
Linear regression with $\ell_2$-norm regularization
Loss:
$$\mathcal{L}(\boldsymbol{\theta}) = \frac{1}{2n} \sum_{l=1}^{n} \left( f\big(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}\big) - y^{(l)} \right)^2 + \lambda \|\boldsymbol{\theta}\|^2$$
Hence for the $k$-th client:
$$\theta_0^{i+1} = (1 - 2\lambda\alpha)\, \theta_0^{i} - \frac{\alpha}{n_k} \sum_{l=1}^{n_k} \left( f\big(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}\big) - y^{(l)} \right)$$
$$\theta_j^{i+1} = (1 - 2\lambda\alpha)\, \theta_j^{i} - \frac{\alpha}{n_k} \sum_{l=1}^{n_k} \left( f\big(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}\big) - y^{(l)} \right) x_j^{(l)}$$
https://doi.org/10.1016/j.ins.2020.12.007
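A hedged sketch of the ridge step: the same gradient as plain linear regression, with the old coefficients first shrunk by $(1 - 2\lambda\alpha)$. Iterating it converges to the regularized solution:

```python
import numpy as np

def local_ridge_step(theta, X, y, alpha=0.1, lam=0.01):
    """Ridge update: shrink old theta by (1 - 2*lambda*alpha), then step
    along the plain squared-loss gradient (bias column handles theta_0)."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    grad = Xb.T @ (Xb @ theta - y) / len(y)
    return (1 - 2 * lam * alpha) * theta - alpha * grad
```

At the fixed point the shrinkage term balances the data gradient, which is exactly the stationarity condition of the $\ell_2$-regularized loss.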
Logistic regression
$$y = \frac{1}{1 + e^{-f(\boldsymbol{\theta}, \boldsymbol{x})}}$$
The loss function can be approximated by a second-order Taylor series, with $\ell_2$-norm regularization (labels $y^{(l)} \in \{-1, +1\}$):
$$\mathcal{L}(\boldsymbol{\theta}) = \frac{1}{n} \sum_{l=1}^{n} \left( \log 2 - \frac{1}{2}\, y^{(l)} f\big(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}\big) + \frac{1}{8}\, f\big(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}\big)^{2} \right) + \lambda \|\boldsymbol{\theta}\|^2$$
https://doi.org/10.1016/j.ins.2020.12.007
Logistic regression
Hence for the $k$-th client:
$$\theta_0^{i+1} = (1 - 2\lambda\alpha)\, \theta_0^{i} - \frac{\alpha}{n_k} \sum_{l=1}^{n_k} \left( \frac{1}{4} f\big(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}\big) - \frac{1}{2}\, y^{(l)} \right)$$
$$\theta_j^{i+1} = (1 - 2\lambda\alpha)\, \theta_j^{i} - \frac{\alpha}{n_k} \sum_{l=1}^{n_k} \left( \frac{1}{4} f\big(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}\big) - \frac{1}{2}\, y^{(l)} \right) x_j^{(l)}$$
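A hedged sketch of this client step, assuming labels in $\{-1, +1\}$ as the Taylor-approximated loss requires: the gradient of $\log 2 - yf/2 + f^2/8$ with respect to $\boldsymbol{\theta}$ is $(f/4 - y/2)\,\boldsymbol{x}$, so the step mirrors the ridge update with that gradient:

```python
import numpy as np

def local_logistic_step(theta, X, y, alpha=0.1, lam=0.01):
    """Client step for the Taylor-approximated logistic loss, y in {-1, +1}."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    f = Xb @ theta                          # f(theta, x^(l)) for each sample
    grad = Xb.T @ (f / 4 - y / 2) / len(y)  # gradient of the approximated loss
    return (1 - 2 * lam * alpha) * theta - alpha * grad
```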
Some challenges
Expensive communication
Can reduce communication in federated optimization by:
• Limiting number of devices involved in communication
• Reducing number of communication rounds
• Reducing size of messages sent over network
FedAvg
At each communication round:
• (i) run SGD locally, then
• (ii) average the model updates
Reduces communication by:
• (i) performing local updating,
• (ii) communicating with a subset of devices
Common approaches to reducing message size:
• Dimensionality reduction (low-rank, sparsity)
- Directly learn model updates that have reduced dimension/size
• Compression
- Take regular (full dimension) updates and then compress them
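As a concrete (illustrative) compression example, top-k sparsification transmits only the k largest-magnitude entries of a full-dimension update; this sketch shows one common scheme, not necessarily the one used in any particular system:

```python
import numpy as np

def compress_topk(update, k):
    """Keep only the k largest-magnitude entries; send (indices, values)."""
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

def decompress_topk(idx, values, size):
    """Server side: scatter the received values back into a full vector."""
    full = np.zeros(size)
    full[idx] = values
    return full
```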
Heterogeneity
Heterogeneous (i.e., non-identically distributed) data and systems
can bias optimization procedures
Non-IID Data in Federated Learning
Types of non-IID data:
• Feature distribution skew (covariate shift)
• Label distribution skew (prior probability shift)
• Same label, different features (concept shift)
• Same features, different label (concept shift)
• Quantity skew or unbalancedness
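Label distribution skew is often simulated with a Dirichlet-based partitioner (a common technique, not from the slides); smaller beta concentrates each label on fewer clients:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, beta=0.5, seed=0):
    """Split sample indices across clients with label shares ~ Dirichlet(beta)."""
    rng = np.random.default_rng(seed)
    parts = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        props = rng.dirichlet([beta] * n_clients)          # label-c share per client
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for part, chunk in zip(parts, np.split(idx, cuts)):
            part.extend(chunk.tolist())
    return parts
```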
Drift in FedAvg
(arXiv:1910.06378)
Possible solutions