
[Figure 4 omitted: training loss (y-axis) versus layer number (x-axis) of each early exit, with one curve per architecture (Embedding, Norm, MLP, Layer).]
Figure 4: Training losses of all early exits at the end of EE-Tuning for various early-exit architectures.

[Figure 5 omitted: ROUGE-L, F1, and EM scores on CNN/DailyMail, XSUM, NarrativeQA, and MMLU plotted against inference speedup, with one curve per early-exit architecture.]
Figure 5: Downstream performance of our 13B models with various early-exit architectures. Points closer to the top-right corner represent better performance (i.e. higher speedup and scores). Markers on each curve correspond to discrete values of the confidence threshold that we use for this experiment. Speedup increases from left to right as the threshold decreases, taking values in {1.0, 0.9, 0.8, 0.6, 0.4, 0.2}.

3.2 Architectures of early-exit layers


Observation: Regarding early-exit architectures,
(1) an early-exit layer with more trainable parameters or located at a deeper position attains a lower
training loss;
(2) satisfactory inference speedup can be achieved together with comparable or even boosted scores, and
MLP strikes the best overall balance between speed and quality.

This experiment compares the early-exit architectures introduced in Section 2.1. For layer normalization therein, we follow Llama 2 [49] and use RMSNorm [55]. We add 8 early exits to the 40-layer 13B Llama 2-Chat model, spaced evenly at the 1/8, 2/8, . . . , 8/8 depths of the Transformer backbone and initialized by the copy method. The last early-exit layer, placed side by side with the original final-exit layer, is added only for experimental purposes rather than for practical use.
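
For concreteness, the sketch below shows one way the four early-exit head variants could be assembled in PyTorch. The exact compositions (the MLP's hidden width, the number of attention heads, and the use of nn.TransformerEncoderLayer as a stand-in for a decoder-style layer) are illustrative assumptions rather than the precise architectures defined in Section 2.1.

import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Minimal RMSNorm, following Llama 2 / [55]."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


def build_early_exit_head(arch: str, hidden: int, vocab: int) -> nn.Module:
    """Map hidden states at an intermediate backbone layer to next-token logits."""
    out_embed = nn.Linear(hidden, vocab, bias=False)  # output embedding (LM head)
    if arch == "Embedding":  # output embedding only; no normalization module
        return out_embed
    if arch == "Norm":       # RMSNorm followed by the output embedding
        return nn.Sequential(RMSNorm(hidden), out_embed)
    if arch == "MLP":        # small MLP before the exit (width is an assumption)
        mlp = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.SiLU(), nn.Linear(4 * hidden, hidden))
        return nn.Sequential(RMSNorm(hidden), mlp, RMSNorm(hidden), out_embed)
    if arch == "Layer":      # a full Transformer layer before the exit (encoder layer as a stand-in)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, dim_feedforward=4 * hidden, batch_first=True)
        return nn.Sequential(layer, RMSNorm(hidden), out_embed)
    raise ValueError(f"unknown early-exit architecture: {arch}")

Under any such composition, the variants differ mainly in trainable parameter count (Layer > MLP > Norm > Embedding), which is the property relevant to the training-loss comparison in Figure 4.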

Training losses. Figure 4 shows the training loss of each early exit at the end of EE-Tuning. We have observed that tuning Embedding early-exit layers, which contain no normalization module, is prone to divergence, or to high training losses when it does manage to converge, whereas tuning exits of the Norm architecture effectively avoids this problem. Hence, we no longer consider Embedding in our remaining experiments. Another observation is that, in terms of early-exit training losses at the end of tuning, one has Norm > MLP > Layer. This confirms that having more trainable parameters in the early-exit layers leads to lower training losses. Finally, for each architecture, the training losses of early exits at deeper layers are consistently lower than those at earlier layers. This is unsurprising, since hidden states at deeper layers of a neural network are expected, on average, to contain richer and more usable information about the input, which facilitates better prediction.

Inference quality and speedup. Figure 5 illustrates the downstream performance of early-exit models learned by EE-Tuning with the aforementioned configurations. For each model, we activate only three early exits, at the 1/4, 2/4, and 3/4 depths, which is easy to do thanks to the plug-and-play feature of our implementation. On the tasks that we consider, the MLP model achieves the best overall performance in terms of the trade-off between scores and inference speedup.
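
To make the role of the confidence threshold in Figure 5 concrete, the following is a minimal sketch of confidence-based early exiting for greedy decoding. The max-softmax confidence measure and the helper names (hidden_per_exit, exit_heads) are assumptions for illustration, not the exact rule used in our implementation.

import torch


@torch.no_grad()
def next_token_with_early_exit(hidden_per_exit, exit_heads, threshold: float) -> torch.Tensor:
    """Greedy next-token prediction with confidence-based early exiting.

    hidden_per_exit: hidden states of the current token at each activated exit
        (e.g. the 1/4, 2/4, 3/4 depths) plus the final exit, each of shape [hidden].
    exit_heads: the matching early-exit / final-exit heads producing vocabulary logits.
    threshold: exit as soon as the max softmax probability reaches this value;
        a threshold of 1.0 effectively disables early exiting.
    """
    for states, head in zip(hidden_per_exit[:-1], exit_heads[:-1]):
        probs = torch.softmax(head(states), dim=-1)
        confidence, token = probs.max(dim=-1)
        if confidence.item() >= threshold:
            return token  # confident enough: skip the remaining backbone layers
    # Not confident at any early exit: fall back to the final exit.
    return exit_heads[-1](hidden_per_exit[-1]).argmax(dim=-1)

Lowering the threshold lets early exits fire more often, increasing speedup at the possible cost of score changes, which matches the left-to-right trend along each curve in Figure 5.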
