Guest Lecture
Transfer Learning
Jin Sun
Oct. 2020
Today - Transfer Learning
● Overview
● Fine-tuning
● Domain Transfer / Adaptation
● Multi-Task Learning
● Lifelong Learning
Overview
Standard Machine Learning Assumption
[Diagram: the same data is used to train and to test a model on a single task.]
Fine-tuning
[Diagram: train on Data 1 for Task 1, then fine-tune the model on Data 2 for Task 2 and test.]
Domain Adaptation
[Diagram: train on Data 1, test on Data 2, for the same task.]
Multi-task Learning
[Diagram: one dataset is used to train and test multiple tasks at the same time.]
Lifelong Learning
[Diagram: the same model is trained and tested on Data/Task 1, then trained and tested on Data/Task 2.]
Why Transfer Learning Is Important
Neural networks are generally hard to train
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings
of the IEEE conference on computer vision and pattern recognition. 2016.
Why Transfer Learning Is Important
Not enough data
http://seansoleyman.com/effect-of-dataset-size-on-image-classification-accuracy/
Why Transfer Learning Is Important
Multiple tasks serve as implicit regularization:
the other task's loss function acts as a regularization term for the current task.
[Diagram: a shared network ('Net') with a separate output branch per task, e.g., Task 2.]
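In practice this often amounts to a weighted sum of the task losses. A minimal sketch, where loss_task1, loss_task2, and the trade-off weight lam are hypothetical names:

loss = loss_task1 + lam * loss_task2  # the other task's loss regularizes the shared weights
loss.backward()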
Can a non-pretrained network be useful?
Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. "Deep image prior." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
Today - Transfer Learning
● Overview
● Fine-tuning
● Domain Transfer / Adaptation
● Multi-Task Learning
● Lifelong Learning
Fine-tuning
[Diagram: Data 1 + Task 1 -> train -> pretrained model; Data 2 + Task 2 -> fine-tune -> final model -> test.]
Fine-tuning Steps
1. Get a good, recent pre-trained neural network.
2. Choose which layer to take features from: the second-to-last layer is the most
common choice, but you might need something different!
Understand Model Structure in PyTorch
Each model is an instance of a class.
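For example, printing a torchvision model shows its module hierarchy (a minimal sketch):

import torchvision.models as models

resnet = models.resnet18(pretrained=True)
print(resnet)                        # prints the full module hierarchy
print(list(resnet.children())[-1])   # the last child module (the final fc layer)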
Understand Model Structure in PyTorch
PyTorch is very flexible: you can 'borrow' part of a pre-trained model's layers and
build a new, more complex model.
import torch.nn as nn
import torchvision.models as models

class MyNet(nn.Module):
    def __init__(self, K):
        super().__init__()
        resnet = models.resnet18(pretrained=True)
        # keep the first K top-level layers of the pre-trained model
        self.backbone = nn.Sequential(*list(resnet.children())[:K])

    def forward(self, x):
        return self.backbone(x)

# Or truncate a pre-trained model directly:
resnet = models.resnet18(pretrained=True)
K = 9  # K controls how many layers to keep (resnet18 has 10 top-level children)
model = nn.Sequential(*list(resnet.children())[:K])
Then you can use the new model like any other sequential model.
For example, it can now be indexed: model[0] returns the first kept layer.
Which Layers To Keep
Depends on what information is important for the new task
New categories:
E.g., Beverages
Which Layers To Keep
Depends on what information is important for the new task
Different tasks:
E.g., Dense predictions
Which Layers To Keep
Depends on what information is important for the new task
??
Visualization of VGG Learned Filters
https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
Freeze or Not to Freeze
To implement in PyTorch:
# turn off gradients so the frozen weights are not updated
for param in model.parameters():
    param.requires_grad = False
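A common pattern is to freeze a pre-trained backbone and train only the new layers. A minimal sketch, where model.backbone is a hypothetical submodule name:

import torch.optim as optim

for param in model.backbone.parameters():  # hypothetical backbone submodule
    param.requires_grad = False

# give the optimizer only the parameters that still require gradients
optimizer = optim.SGD((p for p in model.parameters() if p.requires_grad),
                      lr=1e-3, momentum=0.9)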
Freeze or Not to Freeze (Softer Version)
Different Learning Rates for Different Layers
To implement in PyTorch:
optim.SGD([
    {'params': model.base.parameters()},                  # uses the default lr (1e-6)
    {'params': model.task_net.parameters(), 'lr': 1e-4},  # larger lr for the new layers
], lr=1e-6, momentum=0.9)
The Limitation of Fine-tuning
A pre-trained model captures the statistics of the dataset that it was trained
on.
In vision: ImageNet.
He, Kaiming, Ross Girshick, and Piotr Dollár. "Rethinking ImageNet pre-training." Proceedings of the IEEE International Conference on Computer Vision. 2019.
Today - Transfer Learning
● Overview
● Fine-tuning
● Domain Transfer / Adaptation
● Multi-Task Learning
● Lifelong Learning
Domain Transfer / Adaptation
[Diagram: train on Data 1, test on Data 2.]
Domain Shift Problem
Slides from “What you saw is not what you get”: Domain adaptation for deep learning, Kate Saenko
Domain Transfer / Adaptation
Slides from “What you saw is not what you get”: Domain adaptation for deep learning, Kate Saenko
Domain Transfer / Adaptation
Torralba, Antonio, and Alexei A. Efros. "Unbiased look at dataset bias." CVPR 2011: 1521-1528.
Dataset Bias
Torralba, Antonio, and Alexei A. Efros. "Unbiased look at dataset bias." CVPR 2011: 1521-1528.
Domain Shift Not Just in Computer Vision
● Email spam detection
● Speech recognition
● Recommendation systems
Domain Alignment
B. Gong, Y. Shi, F. Sha, and K. Grauman. Geodesic flow kernel for unsupervised domain adaptation. In CVPR, 2012
Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In ICML, 2015.
Adversarial Domain Alignment
Tzeng, Eric, et al. "Adversarial discriminative domain adaptation." Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition. 2017.
Slides from “What you saw is not what you get”: Domain adaptation for deep learning, Kate Saenko
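The idea: a domain discriminator learns to tell source features from target features, while the encoder learns to fool it so the two feature distributions align. A simplified, ADDA-style sketch with a single shared encoder; encoder, discriminator, source_loader, and target_loader are hypothetical names assumed to be defined elsewhere:

import torch
import torch.nn.functional as F

d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
e_opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

for (xs, _), (xt, _) in zip(source_loader, target_loader):
    # 1) Train the discriminator: source features -> 1, target features -> 0
    fs, ft = encoder(xs).detach(), encoder(xt).detach()
    d_loss = F.binary_cross_entropy_with_logits(discriminator(fs), torch.ones(len(xs), 1)) + \
             F.binary_cross_entropy_with_logits(discriminator(ft), torch.zeros(len(xt), 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Train the encoder so target features look like source to the discriminator
    e_loss = F.binary_cross_entropy_with_logits(discriminator(encoder(xt)), torch.ones(len(xt), 1))
    e_opt.zero_grad(); e_loss.backward(); e_opt.step()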
Another Perspective of Domain Adaptation
[Diagram: Data 1 (large, fully labeled, low cost) is used to train; Data 2 (small, partially or not labeled, high cost) is used to fine-tune and test the same task.]
UnrealCV
https://unrealcv.org/
Today - Transfer Learning
● Overview
● Fine-tuning
● Domain Transfer / Adaptation
● Multi-Task Learning
● Lifelong Learning
Multi-Task Learning
[Diagram: one dataset is used to train and test multiple tasks simultaneously.]
Many tasks are related to each other. For example: object detection and
segmentation; depth estimation and surface normal estimation.
It makes sense to share some network structures so each task doesn’t need to
learn everything from scratch independently.
https://www.researchgate.net/figure/a-Learning-process-of-multi-task-learning-Several-target-tasks-share-part-of-hidden_fig4_327435148
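A minimal sketch of this pattern: a shared backbone with one head per task (the layer sizes here are hypothetical):

import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        # layers shared by all tasks
        self.shared = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
        # task-specific heads
        self.head1 = nn.Linear(64, 10)  # e.g., a 10-way classification task
        self.head2 = nn.Linear(64, 1)   # e.g., a regression task

    def forward(self, x):
        h = self.shared(x)
        return self.head1(h), self.head2(h)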
Dynamics of Multi-Task Learning
● Features might be easier to learn from Task B than Task A.
● The model is forced not to overfit to any single task.
● Learning a feature space that also works for another task improves
generalization to unseen data on the original task.
● Implicit data augmentation: we get more information from our data.
Cross-stitch Networks
Misra, I., Shrivastava, A., Gupta, A., & Hebert, M. (2016). Cross-stitch Networks for Multi-task Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Multi-Task Learning Using Uncertainty
Kendall, A., Gal, Y., & Cipolla, R. (2017). Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics.
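The paper learns a weight for each task's loss from its homoscedastic uncertainty: roughly L = sum_i exp(-s_i) * L_i + s_i, where s_i is a learned log-variance. A simplified two-task sketch (remember to pass log_vars to the optimizer along with the model parameters):

import torch
import torch.nn as nn

log_vars = nn.Parameter(torch.zeros(2))  # one learned log-variance per task

def combined_loss(loss1, loss2):
    # high-uncertainty tasks are down-weighted; the + log_vars.sum() term
    # prevents the trivial solution of infinite uncertainty
    w = torch.exp(-log_vars)
    return w[0] * loss1 + w[1] * loss2 + log_vars.sum()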
Taskonomy
http://taskonomy.stanford.edu/
Taskonomy - Motivation
Question:
There are many seemingly independent tasks in computer vision. How does one
know whether they are actually related?
Goal:
Measure how well features learned for one task transfer to every other task.
Result:
A computed taxonomy of visual tasks that shows which tasks to transfer from.
Taskonomy - Task Transfer Modeling
Taskonomy - Affinity Matrix
Taskonomy Generation
Given the affinity matrix, build a graph where tasks are nodes and transfers
are edges.
Find a subgraph that maximizes performance at minimum cost (the amount of
supervision used) -> Boolean Integer Programming.
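A toy illustration of the selection problem, solved by brute force rather than the paper's actual BIP solver; the task names and affinity values below are made up:

from itertools import combinations

# affinity[(target, source)] = how well features transfer from source to target
affinity = {('normals', 'depth'): 0.9, ('normals', 'edges'): 0.6,
            ('reshading', 'depth'): 0.8, ('reshading', 'edges'): 0.7}
targets = ['normals', 'reshading']
sources = ['depth', 'edges']
budget = 1  # how many source tasks we can afford to supervise

best = max((set(c) for c in combinations(sources, budget)),
           key=lambda chosen: sum(max(affinity[(t, s)] for s in chosen)
                                  for t in targets))
print(best)  # {'depth'}: the single best source under this budget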
Taskonomy - Main Result
Live version
Taskonomy - Summary
Many vision tasks are more closely related to each other than originally believed.
The number of labeled samples needed to solve N tasks can be greatly reduced
by transferring features from one task to another.
Open Questions:
Taskonomy - Website and API
http://taskonomy.stanford.edu/api/
https://github.com/StanfordVL/taskonomy/tree/master/taskbank
https://github.com/StanfordVL/taskonomy/tree/master/data
Today - Transfer Learning
● Overview
● Fine-tuning
● Domain Transfer / Adaptation
● Multi-Task Learning
● Lifelong Learning
Lifelong Learning
[Diagram: the same model is trained and tested on Data/Task 1, then trained and tested on Data/Task 2.]
Learn from new data with a single model that has only limited access to the old training data.
Key Ideas:
Rebuffi, Sylvestre-Alvise, et al. "iCaRL: Incremental classifier and representation learning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Wu, Zuxuan, et al. "ACE: Adapting to changing environments for semantic segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2019.
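One recurring key idea (used by iCaRL, for example) is rehearsal: keep a small memory of old examples and mix them into training on the new task. A minimal sketch; update is a hypothetical function that takes one gradient step on a batch:

import random

memory, n_seen = [], 0   # small buffer of (x, y) examples from earlier tasks
MEM_SIZE = 2000          # hypothetical memory budget

def train_on_new_task(model, new_data):
    global n_seen
    for example in new_data:
        batch = [example]
        if memory:  # rehearse a few old examples alongside the new one
            batch += random.sample(memory, min(8, len(memory)))
        update(model, batch)  # hypothetical: one gradient step on the batch
        # reservoir sampling keeps a bounded, uniform sample of all data seen
        n_seen += 1
        if len(memory) < MEM_SIZE:
            memory.append(example)
        else:
            j = random.randrange(n_seen)
            if j < MEM_SIZE:
                memory[j] = example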
Summary - Transfer Learning
● Overview
● Fine-tuning
● Domain Transfer / Adaptation
● Multi-Task Learning
● Lifelong Learning