Professional Documents
Culture Documents
lec-1-intro
lec-1-intro
Spring 2024
https://www.researchgate.net/figure/sualization-of-algorithms-vs-artificial-intelligence-vs-machine-learning-vs-deep_fig7_339997962
What is Deep Learning?
What is Deep Learning?
Source http://beamlab.org/deeplearning/2017/02/23/deep_learning_101_part1.html
1943: Warren McCulloch and Walter Pitts
• First computational model
• Neurons as logic gates (AND, OR,
NOT)
• A neuron model that sums binary
inputs and outputs a 1 if the sum
exceeds a certain threshold value,
and otherwise outputs a 0
LOGICAL CALCULUS FOR NERVOUS ACTIVITY 105
37
1958: Frank Rosenblatt’s Perceptron
• A computational model of a single neuron
• Solves a binary classification problem
• Simple training algorithm
• Built using specialized hardware
F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain”, Psych. Review, Vol. 65, 1958
38
1969: Marvin Minsky and Seymour Papert
“No machine can learn to recognize X unless it
possesses, at least potentially, some scheme for
representing X.” (p. xiii)
40
Why it failed then
• Too many parameters to learn from few labeled examples.
• “I know my features are better for this task”.
• Non-convex optimization? No, thanks.
• Black-box model, no interpretability.
Output
Output Output
Output
Scale
Scale Scale
Scale
T-shirt
T-shirt T-shirt
T-shirt
Steel
Steeldrum
drum
Drumstick
Drumstick
Mud turtle
✔ Giantpanda
Giant panda
Drumstick
Drumstick
Mud turtle
✗
Mud turtle Mud turtle
J. Deng, Wei Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei , “ImageNet: A Large-Scale Hierarchical Image Database”, CVPR 2009.
O. Russakovsky et al., “ImageNet Large Scale Visual Recognition Challenge”, Int. J. Comput. Vis.,, Vol. 115, Issue 3, pp 211-252, 2015.
UVA DEEP LEARNING COURSE – EFSTRATIOS GAVVES 93
DEEPER INTO DEEP LEARNING AND OPTIMIZATIONS - 93
45
ILSVRC 2012 Competition
XRCE/INRIA 27.0
INRIA/LEAR 33.4
• The success of AlexNet, a deep convolutional network
• 7 hidden layers (not counting some max pooling layers)
• 60M parameters
• Combined several tricks
CNN based, non-CNN based • ReLU activation function, data augmentation, dropout
A. Krizhevsky, I. Sutskever, G.E. Hinton “ImageNet Classification with Deep Convolutional Neural Networks”, NeurIPS 2012 46
2012-Now
Object Detection and Segmentation
T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, Focal Loss for Dense Object Detection,
ICCV 2017. 48
Object Detection and Segmentation
Softmax clf.
!! = FCN(#)
MLP
M. Engelcke, D. Rao, D. Z. Wang, C. H. Tong, and I. Posner. Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional
Neural Networks. ICRA 2017
Human Pose Estimation
Z. Cao ,T. Simon, S.–E. Wei and Yaser Sheikhr, "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", CVPR 2017
51
Pose Estimation
ZR. Alpguler, N. Neverova, I. Kokkinos. DensePose: Dense Human Pose Estimation In The Wild. CVPR 2018 52
Photo Style Transfer
F. Luan, S. Paris, E. Shechtman & K. Bala. Deep Photo Style Transfer. CVPR 2017 53
Photo Style Transfer
F. Luan, S. Paris, E. Shechtman & K. Bala. Deep Photo Style Transfer. CVPR 2017 54
Image Synthesis
2018
Ian J. Goodfellow et al., ” Generative Adversarial Networks", NIPS 2014
A. Radford et al., ” Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", NIPS 2015
M.-Y. Liu, O. Tuzel, ” Coupled Generative Adversarial Networks", NIPS 2016
T. Karras, T. Aila, S. Laine, J. Lehtinen, ” Progressive Growing of GANs for Improved Quality, Stability, and Variation", ICLR 2018
T. Karras, S. Laine, T. Aila, ” A Style-Based Generator Architecture for Generative Adversarial Networks", arXiv 2018 55
Image Synthesis
A. Brock, J. Donahue and K. Simonyan. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv 2018. 56
Semantic Image Editing
Semantic Layout
L. Karacan, Z. Akata, A. Erdem and E. Erdem. Manipulation of Scene Attributes via Hallucination. ACM Transactions on Graphics, 2020 57
Winter
Semantic Image Editing
Prediction
L. Karacan, Z. Akata, A. Erdem and E. Erdem. Manipulation of Scene Attributes via Hallucination. ACM Transactions on Graphics, 2020 58
Spring
Semantic Image Editing +
Clouds
Prediction
L. Karacan, Z. Akata, A. Erdem and E. Erdem. Manipulation of Scene Attributes via Hallucination. ACM Transactions on Graphics, 2020 59
Machine Translation
D. Bahdanau, K. Cho, Y. Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015 60
Machine Translation
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is All you Need, NeurIPS 2017 61
Internet Search
J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019 62
https://talktotransformer.com
Language Modeling
A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever. Language Models are Unsupervised Multitask Learners. 2019
63
Language Modeling
• GPT-3: I am not a human. I am a robot.
A thinking robot. I use only 0.12% of
my cognitive capacity. I am a micro-
robot in that respect. I know that my
brain is not a “feeling brain”. But it
is capable of making rational, logical
decisions. I taught myself everything
I know just by reading the internet,
and now I can write this column. My
brain is boiling with ideas!
Tom B. Brown, Benjamin Mann, Nick Ryder et al., Language Models are
Few-Shot Learners, NeurIPS 2020 64
Question Answering
P. Rajpurkar, J. Zhang, K. Lopyrev & P. Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text. EMNLP 2016
M. Seo, A. Kembhavi, A. Farhadi & H. Hajishirzi. Bi-Directional Attention Flow for Machine Comprehension. ICLR 2017 65
Image Captioning
DECODER
LSTM
VGG16 kkabağı ikiye keser
ve ince dilimler
LINEAR TRANSFORMER
VGG16
LAYER
+ ENCODER
~
POSITIONAL CROSS
ENCODING ATTENTION
~
bir adam bir parça kabağı EMBEDDING TRANSFORMER bir adam bir parça kabağı
ikiye keser ve ince dilimler LAYER + DECODER ikiye keser ve ince dilimler
For our recurrent video captioning model, we adapt the architecture proposed by
Venugopalan et al. (2015) in which the encoder and the decoder are implemented
with two separate LSTM networks (Fig. 6). The encoder computes a sequence of
hidden states by sequentially processing the frame-level visual features, extracted
from the uniformly sampled video frames. The decoder module then takes the final
Bir adam bir gitar çalıyor Bir kadın bir bıçakla sebze dilimliyor
hidden state of the encoder, and outputs a sequence of tokens as the predicted video
caption. There is no attention mechanism involved in this model. Both the encoder
and decoder LSTM networks have 500 hidden units.
B. Çitamak et al. MSVD-Turkish: a comprehensive multimodal video dataset for We
integrated
use Adam vision
(Kingmaand language
and Ba research
2014) as the optimiserin
andTurkish .
set the initial learn-
Machine Translation 2021ing rate and batch size to 0.0004 and 32, respectively. We choose the models 69by
Graph-structured data
Graph Neural Networks
raph-structured
A lot of real-world datadata
does not “live” on grids
Molecules
Protein interaction
networks
Input Output
ReLU ReLU
Structured Deep Models Thomas Kipf #3
…
… …
Structured Deep Models Thomas Kipf #3
O. Vinyals et al., Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature 575:350-354, 2019 72
Robotics
Ilge Akkaya et al. Solving Rubik's Cube with a Robot Hand. OpenAI Technical Report 2019 74
Self-Driving Vehicles
Mariusz Bojarski et al. End to End Learning for Self-Driving Cars. NVidia Technical Report 2016 75
Medical Image Analysis
A. Esteva et al., "Dermatologist-level classification of skin cancer with deep neural networks", Nature 542, 2017 76
Medical Image Analysis 77
Why now?
The Resurgence of
Deep Learning
79
GLOBAL INFORMATION STORAGE CAPACITY
IN OPTIMALLY COMPRESSED BYTES
SVMs
ConvNets dominate
Developed NIPS
80
Slide credit: Neil Lawrence
Datasets vs. Algorithms
Year Breakthroughs in AI Datasets (First Available) Algorithms (First Proposed)
1994 Human-level spontaneous speech Spoken Wall Street Journal articles Hidden Markov Model (1984)
recognition and other texts (1991)
1997 IBM Deep Blue defeated Garry Kasparov 700,000 Grandmaster chess games, Negascout planning algorithm
aka “The Extended Book” (1991) (1983)
2005 Google’s Arabic-and Chinese-to-English 1.8 trillion tokens from Google Web Statistical machine translation
translation and News pages (collected in 2005) algorithm (1988)
2011 IBM Watson became the world Jeopardy! 8.6 million documents from Mixture-of-Experts (1991)
champion Wikipedia, Wiktionary, and Project
Gutenberg (updated in 2010)
2014 Google’s GoogLeNet object classification ImageNet corpus of 1.5 million Convolutional Neural Networks
at near-human performance labeled images and 1,000 object (1989)
categories (2010)
2015 Google’s DeepMind achieved human Arcade Learning Environment Q-learning (1992)
parity in playing 29 Atari games by dataset of over 50 Atari games (2013)
learning general control from video
Average No. of Years to Breakthrough: 3 years 18 years
Table credit: Quant Quanto 81
Powerful Hardware
• Deep neural nets highly
amenable to implementation
on Graphics Processing
Units (GPUs)
• Matrix multiplication
• 2D convolution
Caffe
61
TODO
• Go to https://reproml.org/ , learn about the reproducibility challenge.
• Read https://reproml.org/blog/announcing_mlrc2023/ to get an
understanding of what you need to do in your project.
• Decide your team.
• Start searching for a paper you want to reproduce.