And I think I was fortunate to realize fairly early on, as I was looking into what had been done in AI, that learning is something that's both really important for intelligence and something that we had no idea how to do at all. And so when my family moved to Canada, I remember the first thing I did was to go to the Toronto Public Library and try to find a book on machine learning.

That's great. How old were you?

I was 16.

16, yeah.

And then when I went to the University of Toronto and sought out machine learning professors, I found Geoff Hinton, and I discovered neural networks. Neural networks felt like the right thing, because it's a very different way of writing code. Normally you write code and you can kind of think it through and understand it. Whereas with a neural network, you write an equation, a complicated equation, inside a loop, and then you run the loop, and good luck figuring out what it does precisely. And that connects to neural nets not being interpretable. But you could also argue that the difficulty of understanding what neural networks do is not a bug but a feature. We want to build intelligence, and intelligence is not simple to understand. We can't explain how we do the cognitive functions that we do: how we see, how we hear, how we understand language. So if computers can produce objects that are similarly difficult to understand (not impossible, but similarly difficult), it means we're on the right track. And so all those things helped me converge on neural networks fairly early on.
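To make the "equation inside a loop" picture concrete, here is a minimal sketch in PyTorch; the network, data, and hyperparameters are hypothetical placeholders, not anything from the conversation:

```python
import torch

# The "complicated equation": a tiny two-layer neural network.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Hypothetical data: 100 examples with 10 features each.
x = torch.randn(100, 10)
y = torch.randn(100, 1)

# The loop: evaluate the equation, measure the error, nudge the weights.
for step in range(1000):
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The loop produces working weights, but saying precisely what those
# weights "mean" is the interpretability difficulty described above.
```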

What year was it when you remember initially getting excited about neural networks and being pretty convicted? Like, early 2000s?

I started working with Geoff Hinton in 2003.

Yeah, so quite a while ago now, long before, obviously, the craze kind of started around 2012. And so there was a good... I think this is a common theme whenever you look at anybody who works in any sort of field that becomes very big: there's a long stretch of, you know, wandering in the desert, maybe, is one way to put it.

Yeah, I mean, definitely lots of perseverance is required, because you don't know how long you'll have to stay in the desert. You just have to endure.

Yeah, and that's very helpful. And did you expect, I mean, obviously today neural networks do some pretty incredible things, did you expect back in 2003, or the early 2000s, that in your lifetime you would see the things that we're seeing now with AI and machine learning?

I was hoping, but I did not expect it.

Back then the field of AI was on the wrong track. It was in a mindset of rejection of neural networks, and the reason for that is that neural networks are difficult to reason about mathematically, while other stuff you can prove theorems about. And there's something very seductive and dangerous about proving theorems about things, because it's a way to showcase your skill, but it's not necessarily aligned with what makes the most progress in the field. But I think that neural networks are as successful as they are precisely because they're difficult to reason about mathematically. And so anyway, my earlier hope was simply to convince the field that they should work on neural networks rather than the other stuff that they were doing. But then, when computers started to get fast, my level of excitement about their potential increased as well.

Yeah. And so what are your aspirations today? What is the thing, in your lifetime, that you... I mean, I think it's obvious from the OpenAI mission, but yeah.

I mean, exactly right. So now the hopes are much larger. Now I think we can really try to build not only really powerful and useful AI, but actually AGI: make it useful, make it beneficial, make it so that it will be used to solve a large number of problems and create lots of amazing applications. That's what I'd like; that's what I hope to see happen.

Yep. And then, you know, obviously along the way you had been doing a lot of this research, doing a lot of groundbreaking work at Google, and then you left and started OpenAI with Sam Altman and Greg Brockman and a bunch of others. What were your goals with starting OpenAI at the outset? What was the initial conception, the initial vision, and what did you hope to accomplish by starting a new lab?

There were multiple motivations on my end for starting OpenAI. The first motivation was that I felt that the way to make the most progress in AI was by merging science and engineering into a single whole, into one unit, to make it so that there is no distinction, or as little distinction as possible, between science and engineering: so that all the science is infused with engineering discipline and careful execution, and all the engineering is infused with the scientific ideas. And the reason for that is that the field is becoming mature, so it is hard to just do small-scale tinkering without a lot of engineering skill and effort to really make something work. So that was one motivation: I really wanted to have a company that would be operating on this principle.

Another motivation was that I came to see AI technology in a more sober way. I used to think that AI would just be this endless good, and now I see it in a more complex way, where I think there will be a lot of truly incredible, inconceivable applications that will improve our lives in dramatic ways, but I also think that there will be challenges. I think there will be lots of problems posed by the misapplication of AI and by its peculiar properties, which may be difficult for people to understand. And I wanted a company that would be operating with this awareness in mind, and that would be trying to address those challenges as best as possible: not only working on advancing the technology, but also working on making it safe, and also working on the policy side of things as much as is rational and reasonable, to make the whole be as useful and as beneficial as possible.

Totally, and I think it's something we agree on. One thing that is very obvious to me is that which countries have access to AI technology, and the ways in which they use it, are going to define how the world plays out over the course of the next few decades. I think that's the path we're on as a world.

That's right, among many other things.
Right, right. And, you know, this thing that you mentioned around bringing together the science and engineering, I think it's quite profound, for a few reasons. First of all, I think a lot of the best, most incredible, innovative things oftentimes happen from blurring the lines between disciplines. Apple is one of the best examples: from the very beginning they were always saying, hey, we're blending hardware and software, and that's our special sauce, and obviously it's produced some incredible things. And I think a lot of other research labs operate in a very "scientists tell the engineers what to do" mindset, which is counterproductive, because you really need to understand both very well to understand what the limits of the technology are.

Yeah, that's right. And on that point, you may even say: isn't it obvious that the science and the engineering should be together? On some level it is, but it just so happens that historically it hasn't been this way. There's a certain kind of taste; empirically it has been the case in the past, less so now, that people who gravitate to research would have a certain taste that would also make them less drawn to engineering, and vice versa. And I think now, because people are also seeing this reality on the ground, that to do any kind of good science you need good engineering, you have more and more people who are strong on both of these axes.

Totally. And, switching gears a little bit to the GPT models, this is a great illustration, right? Because the GPT models are impossible without incredible engineering, and yet they still required novel research, novel science, to accomplish. They've obviously been some of the biggest breakthroughs in the field of AI as of late, and they've blown open many people's imaginations about what AI can accomplish, or at least increased people's confidence that AI can accomplish incredible things. I'm kind of curious: originally at OpenAI, when you'd been working on these language models for some time, what were the original research inspirations behind them, and what were the original things that led you all to say, hey, this is something that's worth working on, worth scaling up, worth continuing to double down on?

There have been multiple lines of thinking that led us to converge on language models. There has been an idea that we believed in relatively early on: that you can somehow link understanding to prediction, and specifically to prediction of whatever data you give to the model. The idea is... well, let's work out an example. Before diving into the example, I'll start with the conclusion first. The conclusion is that if you can make really good guesses as to what's going to come next (you can't make them perfectly, that's impossible), then you need to have a meaningful degree of understanding.

Take the example of a book. Suppose that you read a book, and it's a mystery novel, and in the last chapter all the pieces are coming together, and there is a critical sentence, and you start to read. You read the first word and the second word, and now you say: okay, the identity of some person is going to be revealed, and your mind is honing in on, it's either this person or that person, you don't know which one it is. Now, maybe someone who read the book and thought about it very carefully says: you know, I think it's probably this person, maybe that one, but probably this one. What this example goes to show is that really good prediction is connected to understanding.

And this kind of thinking led us to experiment with all kinds of approaches: hey, can we predict things really well? Can we predict the next word? Can we predict the next pixel? And study their properties. Through this line of work, before the GPTs, before the Transformers were invented, we did some work with something that we call the sentiment neuron, which is a neural net that was trying to predict the next character in reviews of Amazon products. It was a small neural net, because it was maybe four years ago, but it did prove the principle that if you predict the next character well enough, you will eventually start to discover the semantic properties of the text. And then with the GPTs we took it further. We said: okay, well, we have the Transformer, it's a better architecture, so we have a stronger effect. And then later there was the realization that if you make it larger it will be better, so let's make it larger, and it will be better.
larger and it will be better yeah I mean there's there's a lot of uh there's a

lot of great nuggets and what you just mentioned right I think first is the

Elegance of this concept which is like hey if you get really good at predicting

the next whatever ever get really good at prediction you get that obligates you

to be good at all these other things if if you're really good at that and it's

it's you know I think it's probably like under underrated how um that required

some some degree of vision because it's like early on you know you try to get

really good at predicting things and you know you got the sentiment NE on which

is cool but it's like that's like a it's like a a blip relative to what we

obviously have seen with the the large language models and so that I think

significant and I think the other significant piece is um where you just

mentioned which is kind of um scaling it up right and uh I think you know you

guys had had uh released this paper about this kind of like a scaling laws

of what you have found as you scaled up um compute data model size sort of in

concert with one another but I'm I'm kind of curious like what's the um

obviously there's there's some intuition was just like hey scaling things up um

is good and you see you see great behaviors what's kind of your intuition

behind um sort of if you think from now over the next few the next few years or

even the next few decades like what what is scaling up mean why is it likely to

continue resulting in in great results and and um what do you think the limits

are if any I think two two stat two statements are true at the same time on

On the one hand, it does look like our models are quite large. Can we keep scaling them up even further? Can we keep finding more data for the scale-up? And I want to spend a little bit of time on the data question, because I think it's not obvious at all. Traditionally, because the field of machine learning has been fundamentally academic, fundamentally concerned with discovering new methods and less with the development of very big and powerful systems, the mindset has been: someone creates a fixed benchmark, a data set of a certain shape, of certain characteristics, and then different people can compare their methods on this data set. But what that does is force everyone to work with a fixed data set.

The thing that the GPTs have shown in particular is that scaling requires that you increase the compute and the data in tandem, at the same time, and if you do this, then you keep getting better and better results. In some domains, like language, there is quite a bit of data available; in other, maybe more specialized, subdomains, the amount of data is a lot smaller. That could be the case, for example, if you want to have an automated lawyer. I think your big language model will know quite a bit about language, and it will be able to converse very intelligently about many topics, but it may perhaps not be as good at being a lawyer as we'd like. It will be quite formidable, but will it be good enough? That's unknown, because the amount of data there is smaller. But any time data is abundant, it's possible to apply the magic deep learning formula and produce these increasingly good and increasingly more powerful models.

And then, in terms of the limits of scaling: I think one thing that's notable about the history of deep learning over the past 10 years is that every year people said, okay, we had a good run, but now we've hit the limits. And that happened year after year after year. So I think that we absolutely may hit the limits at some point, but I also think that it would be unwise to bet against deep learning.
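For reference, the scaling-laws paper alluded to above (Kaplan et al., 2020) fit test loss with simple power laws in model size, data, and compute; the sketch below uses the paper's reported fit for the model-size law, with the constants best treated as illustrative:

```python
# Power-law fit from the scaling-laws work: predicted test loss as a
# function of parameter count N, when data and compute are not the
# bottleneck. Constants are the published fits for language modeling.
N_C = 8.8e13      # fitted constant, in parameters
ALPHA_N = 0.076   # fitted exponent

def loss_from_params(n_params: float) -> float:
    """L(N) = (N_c / N) ** alpha_N."""
    return (N_C / n_params) ** ALPHA_N

# Each 10x increase in model size yields a small but steady improvement:
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e}  predicted loss = {loss_from_params(n):.3f}")
```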

Yeah. There are a number of things I want to dig in on here, because they're all pretty interesting. One is this mental model, which you certainly have, and which I think is quite good: hey, Moore's law is this incredible accelerant for everything that we do, and the more that there's a Moore's law for everything, a Moore's law for the different inputs that go into the machine learning life cycle, we're just going to push all these things to the max and we're going to see incredible performance. And I think that's significant, because, as you mentioned about this data point: if we get more efficient at compute, which is something that's happening; if we get more efficient at producing data, or finding data, or generating data; and obviously there's more efficiency coming out of the algorithms; all these things are just going to keep enabling us to do the next incredible thing, and the next incredible thing, and the next incredible thing. So first, I guess, we've talked about this a little bit before, so I know you agree with that, but are there any flaws to that logic? What would you be worried about in terms of how everything will scale up over the next few years?

I mean, I think over the next few years I don't have too much concern about continued progress. I think that we will have faster computers, we will find more data, and we'll train better models; I don't see particular risk there. I think moving forward we will need to start being more creative: okay, so what do you do when you don't have a lot of data? Can you somehow intelligently use the same compute to compensate for that lack of data? And I think those are the questions that we, and the field, will need to grapple with to continue our progress.

Yeah. And I think this point about data is the other thing I wanted to touch on, because this is something we focus on at Scale. The large language models, thankfully, can leverage the internet, the fact that all this data has existed and been accumulating for a while, and so you can show some pretty incredible things. In all-new domains, you need efficient ways to generate lots of data. And I think there's this whole question of: how do you make it so that each ounce of human effort that goes into generating some data produces as much data as possible? Something that we're passionate about, and that I think we've talked a little bit about, is: how do you get a Moore's law for data? How do you get more and more efficiency out of human effort in producing data? That might require novel new paradigms, but it's something that I think is required. In this lawyer example that you mentioned: we have a pretty finite set of lawyers; how do we get those lawyers to produce enough data so you can create some great legal AI?

The choices that we have are either to improve our methods, so that we can do more with the same data or do the same with less data, or, like you say, to somehow increase the efficiency of the teachers. And I think both will be needed to make the most progress.

Yeah. Well, you know, I really think Moore's law is instructive here: to get these chips performing better, people try all sorts of random things, and the end output is that you have chips with more transistors. And if we think about it as, do we have models that perform better with certain amounts of data, or certain amounts of teaching, how do we make that go up?

Yeah, I mean, I'm sure that there will be ways to do that. For example, if you ask the human teachers to help you only in the hardest cases, I think that will allow you to move faster.
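A minimal sketch of that "help only in the hardest cases" idea, in the style of uncertainty-based active learning; the model, the unlabeled pool, and the labeling step are hypothetical stand-ins:

```python
import torch

def pick_hardest(model, unlabeled_pool, k=100):
    """Rank unlabeled examples by the model's predictive entropy and
    return the indices of the k it is least sure about, so that human
    teachers label only those."""
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled_pool), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
    return entropy.topk(min(k, len(entropy))).indices

# Usage sketch (the labeling function is hypothetical):
#   hard_idx = pick_hardest(model, pool)
#   new_labels = ask_human_teachers(pool[hard_idx])
#   ...retrain on the newly labeled examples and repeat.
```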

I want to switch gears to one of the offshoots of the large language model efforts, which is particularly exciting, especially to me as an engineer, and probably to most people who spend a lot of time coding: Codex, which demonstrated some pretty incredible capabilities of going from natural language to code, and of being able to interact with a program in a very novel new way. I'm kind of curious: what excites you about this effort? What do you think are reasonable expectations for what Codex, and Codex-like systems, will enable in the next few years? What about far beyond that? And ultimately, why are you guys so excited about it?

For some context, Codex is pretty much a large GPT neural network trained on code: instead of training to predict the next word in text, it's trained to predict the next word, the next token, I guess, in code. And the thing that's cool about it is that it works at all. I don't think it's self-evident to most people that it would be possible to train a neural net in such a way that if you just give it some representation of text that describes what you want, the neural network will just process this text and produce code, and this code will be correct, and it will run.
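To make that interaction concrete, here is the kind of prompt-and-completion being described: the prompt is a natural-language docstring, and everything after it would be generated by the model. This example is illustrative, not actual Codex output:

```python
# Prompt given to the model: a signature and a description of the intent.
def median(values: list[float]) -> float:
    """Return the median of a non-empty list of numbers."""
# A completion of the kind such a model produces:
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3.0, 1.0, 2.0]))  # 2.0
```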

And it's exciting for a variety of reasons. First of all, it is useful, and it is new. Code has been a domain that hasn't really been touched by AI too much, even though it's obviously very important, and it touches on aspects where today's AI, deep learning, has been perceived as weak: reasoning, carefully laying out plans, not being fuzzy. And it turns out that, in fact, these models can do quite a good job here. One distinction between Codex and language models is that the code models can, in effect, control the computer: it's like they have the computer as an actuator, and that greatly expands what they can do; it makes them much more useful. And of course we want to make them better still; I think they can improve in lots of different ways. These are just the preliminary code models. I expect them to be quite useful to programmers, especially in areas where you need to know random APIs.

Because these neural networks, and this is a small digression, the GPT neural networks don't learn quite like people. A person will often have somewhat narrow knowledge in great depth, while these neural networks want to know everything that exists, and they really try to do that. So their knowledge is encyclopedic. It's not as deep; it's pretty deep, but not as deep as a person's. And so, in the way they work today, these neural networks complement people with their breadth. You might say: I want to do something with a library I don't really know. It could be some existing library, or maybe the neural network has read all the code of all my colleagues and knows what they've written. So I want to use some library I don't know how to use, and the network will have a pretty good guess of how to use it. You'd still need to make sure that what it said is correct, because such is its level of performance today: you cannot trust it blindly, especially if the code is important. In some domains, where it's easy to undo anything that it writes, I think you can trust its code just fine; but if you actually want to have real code, you want to check it.

But I expect that those models will continue to improve. I expect that the code neural networks will keep getting better, and I think the nature of the programming profession will change in response to these models. In a sense, it's a natural continuation of how, in the software engineering world, we've been using higher- and higher-level programming languages: first people wrote assembly, then they had Fortran, then they had C, now we have Python, and now we have all these amazing Python libraries as a layer on top of that. Now we can be a little bit more imprecise, a little bit more ambitious, and the model, the neural network, will do a lot of the work for us. And I should say, I expect something similar to happen across the board in lots of other white-collar professions as well.

You know, if you think about the economic impact of AI, there's been an inversion. I think there's been a lot of thinking that maybe simple robotics tasks would be the first ones to be hit by automation, but instead we find that the creative tasks, counterintuitively, seem to be affected quite a bit. If you look at the generative neural networks and the way images are generated now, you can find on Twitter all kinds of stunning images being generated. Generating cool text is happening as well, but the images are getting most of the attention. And then with things like code, and a lot of writing tasks, the white-collar tasks, they are also being affected by these AIs. I do expect that society will change as progress continues, and I think that it is important for economists, and people who think about these questions, to pay careful attention to these trends, so that as the technology continues to improve there are good ideas in place to, in effect, be ready for it.

Yeah, there are, again, a number of really interesting nuggets in there. I think one is that one of the big ideas behind Codex, and Codex-like models, is that you can go from human language to machine language, and, as you mention, all of a sudden the machine is an actuator. I think many of us, when we think about AI, think about the Star Trek computer: you can just ask the computer and it'll do things. This is a key enabling step, because if all of a sudden you can go from how humans speak to things that a machine can understand, then you've bridged this key translation step. So I think that's super interesting.

Another thing, this inversion that you just mentioned, is super interesting, because one of my beliefs is: hey, the reason that some things have become much easier than others is all a product of the availability of data. There are some areas where there just exists lots and lots of digital data that you can suck up into the algorithms, and they can do quite well. And then for things like robotic tasks, or setting a table, all these things that we've had a lot of trouble building machines to do, you're fundamentally limited by the amount of data you have: first, just by the amount of data that's been collected so far, but also because you can only have so much stuff happening in the real world to collect that data from. I'm curious how you think about that. Or do you think it's actually something intrinsic to the creative tasks that is somehow more suited to current neural networks?

I think it's both.

It is unquestionably true that, well, let me take a step backwards. At the base of all AI progress that has happened, at least in all of deep learning and arguably more, is the ability of neural networks to generalize. Generalization is a technical term which means that you understand something correctly, or take the right action, in a situation that's unlike any situation you've seen in the past, in your experience. A system generalizes better if, from the same data, it can do the right thing, or understand the situation correctly, in a broader set of situations.

To make an analogy: suppose you have a student at a university studying for an exam. That student might say, this is a very important exam for me; let me memorize this; let me make sure I can solve every single exercise in the textbook. Such a student will be very well prepared and could achieve a very high grade on the exam. Now consider a different student, who might say: you know what, I don't need to know how to solve all the exercises in the textbook, as long as I've got the fundamentals right; I read the first 20 pages and I feel I've got the fundamentals. If that second student also achieves a high grade on the exam, that second student did something harder than the first student; that second student exhibited a greater degree of generalization. Even though the questions were the same, the situation was less familiar for the second student than for the first.

Our neural networks are a lot like the first student. They have an incredible ability to generalize, for a computer, but we could do more. And because their generalization is not yet perfect, definitely not yet at a human level, we need to compensate by training on very large amounts of data. That's where the data comes in: the better you generalize, the less data you need, or equivalently, the further you can go with the same data. So maybe once we figure out how to make neural networks generalize a lot better, then in all those small domains where we don't have a lot of data, it actually won't matter; the neural network will say, it's okay, I know what to do well enough even with this limited amount of data. But today we need a lot of data.

Now, when it comes to the creative applications in particular, there is some way in which they are especially well suited for neural networks, and that's because generative models play a very central role in machine learning, and the nature of what generative models generate is somehow analogous to the artistic process. It's not perfect, it doesn't capture everything, and there are certain kinds of art which our models cannot do yet. But I think this second connection, the generative aspect of art and the ability of generative models to generate new, plausible data, is another reason why we've seen so much progress in generative art.

Yeah, it's a really interesting thing, because it's almost a shade of what you mentioned at the very beginning: part of the reason that maybe we shied away from neural networks at the start was that they're so hard to explain, and that aspect, that we can't prove theorems about them, that they do things we can't quite explain, maybe is what naturally allows them to be better suited for creative pursuits, which we also can't explain very well.

Yeah, I think that's definitely possible as well.

So, some of the other recent advancements from OpenAI were CLIP and DALL-E, both super interesting examples of being able to go between modalities, from text to images. I would love to understand from you: what do you think is the significance of what is shown by CLIP and DALL-E? Where do you think that research goes over time, and what excites you about it?

Yeah, so for context: CLIP and DALL-E are neural networks that learn to associate text with images. DALL-E associates text with images in the generative direction, while CLIP associates text with images in the direction of perception, going from an image to text rather than from text to an image. And both of them are cool because they are simple. It's the same old recipe: you just say, hey, let's take a neural net that we understand really well, train it on a big collection of text-and-image pairs, and see what happens. And what happens is something very good.
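A minimal sketch of that recipe in the contrastive style CLIP uses: encode each modality, then train so that matching text-image pairs score higher than mismatched ones. The encoders below are stand-in linear layers over hypothetical pre-extracted features, not the real architectures:

```python
import torch
import torch.nn.functional as F

# Stand-in encoders; CLIP uses a Transformer for text and a vision model
# for images, but any pair of encoders fits the recipe.
image_enc = torch.nn.Linear(2048, 512)
text_enc = torch.nn.Linear(768, 512)
params = list(image_enc.parameters()) + list(text_enc.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

def contrastive_step(image_feats, text_feats, temperature=0.07):
    """One training step on a batch of matching (image, text) pairs."""
    img = F.normalize(image_enc(image_feats), dim=-1)
    txt = F.normalize(text_enc(text_feats), dim=-1)
    logits = img @ txt.T / temperature       # pairwise similarities
    labels = torch.arange(len(img))          # i-th image matches i-th text
    loss = (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Hypothetical usage with random features for a batch of 32 pairs:
print(contrastive_step(torch.randn(32, 2048), torch.randn(32, 768)))
```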

The real motivation with CLIP and DALL-E was just to dip our toes into ways of combining the two modalities, because of one of the things we'll want in the future: I think it's fairly likely that we wouldn't want our future AIs to be text-only AIs. We could, but it seems like a missed opportunity. I feel like so much stuff is going on in the visual world, and if it's not difficult to have a neural net really understand the visual world, why not? And then also, hopefully, by connecting the textual world to the visual world, they will understand text better. The understanding of text that they learn by also being trained on images may become a little bit closer to ours. You could make an argument that maybe there is a distinction between what people learn and what our artificial neural networks learn, because people see, they walk around, they do all those different things, whereas our neural networks, the text ones, only train on text. So maybe that means that something is missing, and maybe if you bring the training data to be more similar to that of people, then maybe we'll learn something more similar to what people learn as well. So those were some of the motivations to study these models, and it was also fun to see that they worked quite well. And especially now, most recently, CLIP has been enjoying quite some degree of popularity, and people have figured out how to invert it to generate high-resolution images and have a lot of fun with it. Actually, for me, that's the most emotionally satisfying application of the past maybe few months.

Yeah, I think that's an interesting point that you mentioned. There's this concept of embodied AI: if you have an AI that will actually go and experience things like humans do, maybe you get interesting behaviors, and the more we can go in that direction, with stuff like multimodal learning, the more interesting it gets. You know, another thing I wanted to touch on: I think you mentioned something quite profound, which is that they're very simple; the use of the algorithm is very simple, and in this case producing the data sets, getting the data right, is, from my perspective, what really enabled a lot of the incredible results. I don't know how you think about that, and how you think it defines future similar areas of research.

I'd say it's definitely a true statement that the field of deep learning, especially the academic branch, not so much the applied branch, has underestimated the importance of data, because of the mental framework of: the data is given to you in the form of a benchmark, and your goal is to create a better method that does better than the other existing methods. And the reason it was important for that framework to have a fixed data set is so that you could compare which method is better. I think that really did lead to a blind spot, where very many researchers were working very hard on the pretty difficult problem of improving the model more and more, while leaving on the table the very large improvements that are possible by simply saying: hey, let's get much more data. I think now, at this point, people appreciate the importance of data a lot more, and I think that at this point it's quite probable that domains with a lot of data will experience a lot of progress.
Do you think, just conceptually, that over the next few years more of the advancements, more of the cool things that we'll see in AI, will come from innovating more on the data side, or innovating more on the algorithm side?

I prefer not to make that distinction. I mean, it's useful for some things, and maybe, let me roll with that distinction: I think both will be important. I believe very firmly that huge progress is possible from algorithmic, from methodological, improvements. We are nowhere near being as efficient as we can be with our compute. We have a lot of compute, and we know how to make use of it in some way, which is already a huge achievement compared to before. Here is a historical analogy: you may remember that 10 years ago or so, the only way to productively use huge amounts of compute was through embarrassingly parallel computations like MapReduce. That was literally the only idea anyone had; there weren't any interesting ways in which you could use huge amounts of compute. Now, with deep learning, we have one such way. The compute needs to be a little bit more interconnected, but it is possible to have a large amount of compute and do something useful with it. But I don't think that we have figured out the best formula for making use of this compute. I believe that there will be better formulas, and we'll be able to go much further with the same amount of compute.

That said, I'm also very confident that a lot of progress will happen from data. I'm a big believer in data, and I think there can be so many different things: you could find new sources of data, you can filter it in all kinds of ways, and maybe apply some machine learning to improve it. I think there are lots of opportunities there. And I expect the combination of all of these; when they come together, they feed off each other, and I think that will lead to the most progress.

Yeah. And one question, to go back to this question of compute. You somewhat answered it, which is: hey, we're going to have significantly more efficient algorithms. But if you take this concept of scaling to the limit that we mentioned before, which is, hey, if you scale everything to the limit you'll get great performance, at some point you're building supercomputers that are just far too big, or way too expensive, or whatnot, to be practically feasible. Do you think as a field we get around that by getting way better at using our compute, or do you think there is some fundamental limit of compute that we need to think about when we think about scaling laws?

There probably does exist an ultimate way of using the compute; I don't think we've found that way yet. I think we can improve the efficiency of our methods, the usefulness they derive from our compute, the extent to which they generalize. I think there are lots of opportunities that we haven't explored yet. I also agree with you that there will be physical limits and economic limits to the size of computers that one could build, and I think that progress will consist of pushing on all these axes. Now, one other thing I want to mention is that there is a huge amount of incentive to find these better methods. Think about what happens if you can find a method that somehow allows you to train the same neural net with half the compute: it's huge; it's like you've doubled the size of your computer. So the amount of research there will only keep on increasing, and I believe that it will lead to success. It will take some time, perhaps, but I'm sure we'll find really powerful ways of training our neural nets and setting them up in far more efficient, far more powerful ways than what we have right now. And then, of course, you want to give those better ways all the compute they deserve and all the data they deserve.

Totally. Well, one interesting concept, just kind of related to this, that I'm curious to get your thoughts on, one thing that we've talked a little bit about before: one of the concepts of neural networks, embedded in the name, is that, hey, you take this very simple model of a neuron, and that simple model of a neuron then allows you to perform these brain-like algorithms. And the reality is that neurons are actually very weird. There are a lot of behaviors that we don't even fully understand mathematically; we only have weak empirical models to understand them. So what do you think is the likelihood that our current model of neurons, these simple ReLU-style functions, is the path to producing something that will resemble a brain, or things that resemble neurons? And what do you think the chances are that we're on this interesting but slightly wrong path with how we're designing these networks?

My view is that it is extremely unlikely that there is anything wrong with the current neurons. I think that they might not be the best neurons, perhaps, but even if we didn't change them, we'd be able to go as far as we need to go. Now, there is still an important caveat here, which is: how do you know how many neurons you need to reach human-level intelligence? You can say maybe you can look at the size of the brain, but it may be that each biological neuron is like a small supercomputer which is built out of a million artificial neurons, so maybe you would need a million times more artificial neurons in your artificial neural net to be able to match the brain. That's a possibility. I don't think it will happen; I don't think it will be that bad. But I would say, worst case, that would be the meaning of it: in other words, you'd need a lot more artificial neurons for each biological neuron for it to be possible for them to simulate biological neurons.

Yeah, it's one of these interesting questions: how much do we try to emulate biology? Or are we implicitly emulating biology by having these small neurons that then create these super-neurons that behave strangely?

Yeah, I wouldn't even say that we are trying to emulate biology, but we are trying to be appropriately inspired by it, in the right way. You know, emulating biology precisely, I think, would be challenging and unwise, but using it for ballpark estimates, I think, can be quite productive.
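For reference, the "simple model of a neuron" in question is just a weighted sum passed through a nonlinearity such as a ReLU; a one-function sketch with illustrative numbers:

```python
import numpy as np

def artificial_neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """The simple neuron model discussed above: ReLU(w . x + b).
    Real biological neurons have far richer dynamics than this."""
    return max(0.0, float(w @ x + b))

# Example: three inputs with fixed, illustrative weights.
print(artificial_neuron(np.array([1.0, 2.0, 3.0]),
                        np.array([0.5, -0.25, 0.1]), b=0.05))
```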

Yeah. One interesting thing, and it's so funny, there's just thing after thing after thing that OpenAI has worked on recently that's been interesting, but the instruct series of models I think was interesting because it demonstrates a potentially interesting paradigm for how humans and models will collaborate in the future. Why did you guys work on the instruct series? Why is it interesting, and what excites you about it?

Yeah, so the instruct models are really important models. I should explain what they are and what the thinking there was.

After we trained GPT-3, we started to experiment with it and try to understand what it can do, and we found that it can do a lot of different things, and it has a real degree of language understanding, but it's very, very not humanlike: it absolutely doesn't do what you ask it to, even if it can. And so one of the problems that we've been thinking about a lot is alignment: if you have a very powerful AI system, how do you make it so that it will fulfill your intent faithfully and correctly? And the more powerful the AI system is, the greater its generalization and reasoning ability and creativity, the more important the alignment of the system becomes. Now, GPT-3 is a useful system. It's not profoundly smart, but it is already interesting, and so we can ask the simpler question of how to align GPT-3: how to build a version of GPT-3 such that it will try, to the best of its abilities, as faithfully as possible, to do what you ask it to do. And that led to the creation of the instruct models. It's basically a version of GPT where you just say: hey, do X; please do Y; I want Z; and it will do it. It's super convenient, and people who use this model love it, and I think it's a great example where the more aligned model is also the more useful one.

You know, thinking about GPT-3 and large language models, I'd be remiss not to talk about some of the challenges that are associated with them. In particular, GPT-3, and the future GPTs, are trained on huge amounts of data, and there's a lot of engineering involved in, okay, how do you engineer this data to work super well? What do you think are some of the challenges, especially as we try to figure out ways to use more and more data in these machine learning systems? How do you deal with the fact that in any sea of large data there are going to be weird biases, or weird qualities, that might be tough to sift through and manage?

Yeah, there are lots of facets to this question. This is something that we've been thinking about at OpenAI for a long time. Even before training GPT-3, we anticipated those issues would come up, and there are challenges. I can mention some of the strategies that we've pursued to address them, and some of the ideas that we have, that we are working on, to address those issues even further. So it is indeed the case that GPT-3 in particular, and models like it, learn from the internet, and they learn the full range of data that's expressed on the internet. To a first approximation, we knew that this would be an issue, and one of the advantages of releasing the model through an API is that it makes it possible to deal with such challenges around misuse, or around us noticing that the model is producing undesirable outputs. Now, of course, there's a very difficult challenge of defining what that means.
yeah I mean to your point it's like um there there's

certainly it is a fact of life that these large data sets are going to

contain um are going to contain some sort of chaos noise things that like you

know we may not maybe uh from a moral perspective you may not want put in

front in front of a model but the I think what you're going towards is like

uh just from a pragmatic engineering perspective it's going to be impossible

to be precious about the data we put into the algorithm and so let's

acknowledge that that's going to happen and then be precious about defining the

performance of the algorithm post training on it and making sure that


you're able to uh sort of mold the algorithm into the exact to perform in a

way that you would like so I think that is I think that is the more productive

approach long term however I think it is possible to be precious about the data

as well because you can take the same models and filter the data or classify

it and decide what data you want to train on and I expect that as people

train these models we will and as we train these models we in fact we are

experimenting with these different approaches and find the most practical

and efficient way of having the model be as reasonably behaved as possible yeah
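A minimal sketch of that model-based filtering idea: score raw documents with a learned quality or safety classifier and keep only those above a threshold. The classifier and the threshold here are hypothetical stand-ins:

```python
from typing import Callable, Iterable

def filter_corpus(docs: Iterable[str],
                  score_doc: Callable[[str], float],
                  threshold: float = 0.9) -> list[str]:
    """Keep only the documents the classifier scores as acceptable.
    score_doc stands in for any learned quality/safety model that
    returns a probability in [0, 1]."""
    return [d for d in docs if score_doc(d) >= threshold]

# Hypothetical usage with a trivial stand-in scorer:
docs = ["a helpful article", "spammy junk text", "a careful explanation"]
kept = filter_corpus(docs, lambda d: 0.0 if "junk" in d else 0.95)
print(kept)  # the junk document is dropped before training
```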

Yeah, and the results in fine-tuning the algorithms are pretty exciting, because it means there are more degrees of freedom for getting algorithms to perform and behave in the ways that you want.

That's right. And this is a property of models that are better, which is counterintuitive: the weaker your language model is, the harder it is to control; whereas the bigger and better it is, the faster it is to fine-tune and the more responsive it is to prompts which specify one kind of behavior versus another. So, in a sense, I expect that at least this flavor of the problem we just discussed will become easier as the models become more powerful and more accurate.

Yeah. So we've woven through a bunch of very interesting topics; I want to take a chance to zoom out. We started this talk by talking about how, originally, when you started working on neural networks, the optimistic version was, hey, the field is going to pay attention to neural networks; and obviously now we believe in something more that resembles AGI as the optimistic version of what the field can accomplish. In the zoomed-in view, the past few years have been this incredible period of new breakthroughs, of very new, interesting things as a result of AI. When we zoom out to a longer time horizon, what are the advancements in AI that you think are, I shouldn't say on the horizon, but just around the corner, and the ones that you think are going to have very meaningful implications for how the world will operate?

I think that, of the advances that are around the corner, simple business as usual, the kind of mundane progress that we've seen over the past few years, will continue. I expect our language models, our vision models, our image generation, code, text-to-speech, and speech-to-text will all improve across the board, and I think they'll all be impactful. And, you know, I would say that with these generative models in particular it is a little harder to reason about what kind of applications become possible once you have a better code model or a better language model, because it's not just better at one thing: it develops qualitatively new capabilities, and it unlocks qualitatively new applications, and I think there are going to be just a lot of them. I think that deep learning will continue to grow and to expand, and more and more there will be a lot more deep learning data centers, and you'll have lots of interesting neural networks trained on all kinds of tasks. Medicine and biology, by the way, I think will be quite exciting. I read that right now the field of biology is undergoing a revolution in terms of its ability to get data. I'm not an expert, but I think what I'm saying is at least not false. So I think training neural networks there will be quite amazing, and I think it will be interesting to see what kind of breakthroughs in medicine it will lead to. And I should mention AlphaFold, which I think is also an example there. So I think the progress is going to be stunning.
stunning to to kind of close I mean we have an incredible uh AI Community

that's sort of that's with us today and is uh is probably very excited to figure

out how they can how they can ensure that AI uh sort of has a positive future

that we have an positive AI future what do you think is are the things that sort

of everyone in the audience can can take away from this conversation and work on

that will help ensure that you know we have a positive future with AI so I
So I think... I mean, there are many things which are worth thinking about. I'd say the biggest one is probably to keep in mind that AI is a very powerful technology and that it can have all kinds of applications. Work on applications that are exciting and that are solving real problems, the kind of applications that improve people's lives; work on those as much as possible. And also work on methods that try to address the problems that exist with the technology, to the extent they do: that would mean some of the questions about bias and desirable outputs, other questions around alignment, and questions that we haven't even discussed in this conversation. So I'd say those two things: work on useful applications, and also, whenever possible, work on reducing the harms, the real harms, and work on alignment.

Awesome. Well, thank you so much.

And I would be remiss not to thank OpenAI and the organization for all the incredible contributions to the field of AI over the past many years. Thank you so much again for sitting down with us.

Thank you for the conversation. I really enjoyed it.
