B All Experimental Results
This section contains tables of experimental results for various models and model sizes, on all
benchmarks, comparing standard prompting with chain-of-thought prompting.
For the arithmetic reasoning benchmarks, some chains of thought (along with the equations they
produced) were otherwise correct, except that the model performed an arithmetic operation
incorrectly; a similar observation was made in Cobbe et al. (2021). Hence, we can further add a
Python program as an external calculator (using the Python eval function) applied to all the
equations in the generated chain of thought. When there are multiple equations in a chain of
thought, we propagate the external calculator's results from one equation to the following
equations via string matching. As shown in Table 1, adding a calculator significantly boosts the
performance of chain-of-thought prompting on most tasks.
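A minimal sketch of such a post-hoc calculator is given below, assuming equations appear in the
generated text as "<expression> = <result>" spans. The regex, the apply_external_calculator
function, and the propagation by plain string substitution are illustrative assumptions, not the
exact implementation behind the numbers in Table 1.

```python
import re

# Matches spans like "16 - 3 = 12": an arithmetic expression, "=", a number.
# Illustrative assumption; the paper does not specify its extraction rule.
EQUATION = re.compile(r"([\d.+\-*/() ]+?)\s*=\s*(-?\d+(?:\.\d+)?)")

def apply_external_calculator(chain_of_thought: str) -> str:
    corrections = {}  # model-written result -> recomputed result

    def fix(match: re.Match) -> str:
        lhs, model_result = match.group(1), match.group(2)
        # Propagate earlier corrections into this equation's left-hand side
        # via (naive) string matching, as described above.
        for old, new in corrections.items():
            lhs = lhs.replace(old, new)
        try:
            value = eval(lhs)  # arithmetic expressions only; trusted input
        except Exception:
            return match.group(0)  # leave unparseable spans untouched
        recomputed = f"{value:g}"
        if recomputed != model_result:
            corrections[model_result] = recomputed
        return f"{lhs} = {recomputed}"

    return EQUATION.sub(fix, chain_of_thought)

# The model miscomputes 16 - 3; the calculator fixes it and carries the
# corrected 13 forward into the next equation.
print(apply_external_calculator(
    "There are 16 - 3 = 12 eggs left. She makes 12 * 2 = 26 dollars."))
# There are 16 - 3 = 13 eggs left. She makes 13 * 2 = 26 dollars.
```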
Table 1: Chain-of-thought prompting outperforms standard prompting for various large language
models on five arithmetic reasoning benchmarks. All metrics are accuracy (%). Ext. calc.: post-hoc
external calculator for arithmetic computations only. Prior best numbers are from the following.
a: Cobbe et al. (2021), b & e: Pi et al. (2022), c: Lan et al. (2021), d: Piękos et al. (2021).
Model        Prompting            GSM8K        SVAMP        ASDiv        AQuA         MAWPS
Prior best   N/A (finetuning)     55^a         57.4^b       75.3^c       37.9^d       88.4^e
UL2 20B      Standard             4.1          10.1         16.0         20.5         16.6
             Chain of thought     4.4 (+0.3)   12.5 (+2.4)  16.9 (+0.9)  23.6 (+3.1)  19.1 (+2.5)
             + ext. calc          6.9          28.3         34.3         23.6         42.7