And I think I was fortunate to realize fairly early on, as I was looking into what had been done in AI, that learning is something that's both really important for intelligence and something that we had no idea how to do at all. And so when my family moved to Canada, I remember the first thing I did was to go to the Toronto Public Library and try to find a book on machine learning.

That's great. How old were you?

I was 16.

16, yeah.

And then when I went to the University of Toronto and sought out machine learning professors, I found Geoff Hinton, and I discovered neural networks. Neural networks felt like the right thing, because it's a very different way of writing code. Normally you write code and you can kind of think it through and understand it. Whereas with a neural network, you write an equation, a complicated equation, inside a loop, and then you run the loop, and good luck figuring out what it does precisely. And that connects to neural nets not being interpretable. But you could also argue that the difficulty of understanding what neural networks do is not a bug but a feature. We want to build intelligence, and intelligence is not simple to understand. We can't explain how we do the cognitive functions that we do: how we see, how we hear, how we understand language. So if computers can produce objects that are similarly difficult to understand (not impossible, but similarly difficult), it means we're on the right track. And so all those things helped me converge on neural networks fairly early on.
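To make the "equation inside a loop" picture concrete, here is a minimal sketch in PyTorch; the network, data, and hyperparameters are hypothetical placeholders, not anything from the conversation:

```python
import torch

# The "complicated equation": a tiny two-layer neural network.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Hypothetical data: 100 examples with 10 features each.
x = torch.randn(100, 10)
y = torch.randn(100, 1)

# The loop: evaluate the equation, measure the error, nudge the weights.
for step in range(1000):
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The loop produces working weights, but saying precisely what those
# weights "mean" is the interpretability difficulty described above.
```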

What year was it when you remember initially getting excited about neural networks and being pretty convicted? Like, early 2000s?

I started working with Geoff Hinton in 2003.

Yeah, so quite a while ago now, long before, obviously, the craze kind of started around 2012. And so there was a good... I think this is a common theme whenever you look at anybody who works in any sort of field that becomes very big: there's a long stretch of, you know, wandering in the desert, maybe, is one way to put it.

Yeah, I mean, definitely lots of perseverance is required, because you don't know how long you'll have to stay in the desert. You just have to endure.

Yeah, and that's very helpful. And did you expect, I mean, obviously today neural networks do some pretty incredible things, did you expect back in 2003, or the early 2000s, that in your lifetime you would see the things that we're seeing now with AI and machine learning?

I was hoping, but I did not expect it.

Back then the field of AI was on the wrong track. It was in a mindset of rejection of neural networks, and the reason for that is that neural networks are difficult to reason about mathematically, while other stuff you can prove theorems about. And there's something very seductive and dangerous about proving theorems about things, because it's a way to showcase your skill, but it's not necessarily aligned with what makes the most progress in the field. But I think that neural networks are as successful as they are precisely because they're difficult to reason about mathematically. And so anyway, my earlier hope was simply to convince the field that they should work on neural networks rather than the other stuff that they were doing. But then, when computers started to get fast, my level of excitement about their potential increased as well.

Yeah. And so what are your aspirations today? What is the thing, in your lifetime, that you... I mean, I think it's obvious from the OpenAI mission, but yeah.

I mean, exactly right. So now the hopes are much larger. Now I think we can really try to build not only really powerful and useful AI, but actually AGI: make it useful, make it beneficial, make it so that it will be used to solve a large number of problems and create lots of amazing applications. That's what I'd like; that's what I hope to see happen.

Yep. And then, you know, obviously along the way you had been doing a lot of this research, doing a lot of groundbreaking work at Google, and then you left and started OpenAI with Sam Altman and Greg Brockman and a bunch of others. What were your goals with starting OpenAI at the outset? What was the initial conception, the initial vision, and what did you hope to accomplish by starting a new lab?

There were multiple motivations on my end for starting OpenAI. The first motivation was that I felt that the way to make the most progress in AI was by merging science and engineering into a single whole, into one unit, to make it so that there is no distinction, or as little distinction as possible, between science and engineering: so that all the science is infused with engineering discipline and careful execution, and all the engineering is infused with the scientific ideas. And the reason for that is that the field is becoming mature, so it is hard to just do small-scale tinkering without a lot of engineering skill and effort to really make something work. So that was one motivation: I really wanted to have a company that would be operating on this principle.

Another motivation was that I came to see AI technology in a more sober way. I used to think that AI would just be this endless good, and now I see it in a more complex way, where I think there will be a lot of truly incredible, inconceivable applications that will improve our lives in dramatic ways, but I also think that there will be challenges. I think there will be lots of problems posed by the misapplication of AI and by its peculiar properties, which may be difficult for people to understand. And I wanted a company that would be operating with this awareness in mind, and that would be trying to address those challenges as best as possible: not only working on advancing the technology, but also working on making it safe, and also working on the policy side of things as much as is rational and reasonable, to make the whole be as useful and as beneficial as possible.

Totally, and I think it's something we agree on. One thing that is very obvious to me is that which countries have access to AI technology, and the ways in which they use it, are going to define how the world plays out over the course of the next few decades. I think that's the path we're on as a world.

That's right, among many other things.
Right, right. And, you know, this thing that you mentioned around bringing together the science and engineering, I think it's quite profound, for a few reasons. First of all, I think a lot of the best, most incredible, innovative things oftentimes happen from blurring the lines between disciplines. Apple is one of the best examples: from the very beginning they were always saying, hey, we're blending hardware and software, and that's our special sauce, and obviously it's produced some incredible things. And I think a lot of other research labs operate in a very "scientists tell the engineers what to do" mindset, which is counterproductive, because you really need to understand both very well to understand what the limits of the technology are.

Yeah, that's right. And on that point, you may even say: isn't it obvious that the science and the engineering should be together? On some level it is, but it just so happens that historically it hasn't been this way. There's a certain kind of taste; empirically it has been the case in the past, less so now, that people who gravitate to research would have a certain taste that would also make them less drawn to engineering, and vice versa. And I think now, because people are also seeing this reality on the ground, that to do any kind of good science you need good engineering, you have more and more people who are strong on both of these axes.

Totally. And, switching gears a little bit to the GPT models, this is a great illustration, right? Because the GPT models are impossible without incredible engineering, and yet they still required novel research, novel science, to accomplish. They've obviously been some of the biggest breakthroughs in the field of AI as of late, and they've blown open many people's imaginations about what AI can accomplish, or at least increased people's confidence that AI can accomplish incredible things. I'm kind of curious: originally at OpenAI, when you'd been working on these language models for some time, what were the original research inspirations behind them, and what were the original things that led you all to say, hey, this is something that's worth working on, worth scaling up, worth continuing to double down on?

There have been multiple lines of thinking that led us to converge on language models. There has been an idea that we believed in relatively early on: that you can somehow link understanding to prediction, and specifically to prediction of whatever data you give to the model. The idea is... well, let's work out an example. Before diving into the example, I'll start with the conclusion first. The conclusion is that if you can make really good guesses as to what's going to come next (you can't make them perfectly, that's impossible), then you need to have a meaningful degree of understanding.

Take the example of a book. Suppose that you read a book, and it's a mystery novel, and in the last chapter all the pieces are coming together, and there is a critical sentence, and you start to read. You read the first word and the second word, and now you say: okay, the identity of some person is going to be revealed, and your mind is honing in on, it's either this person or that person, you don't know which one it is. Now, maybe someone who read the book and thought about it very carefully says: you know, I think it's probably this person, maybe that one, but probably this one. What this example goes to show is that really good prediction is connected to understanding.

And this kind of thinking led us to experiment with all kinds of approaches: hey, can we predict things really well? Can we predict the next word? Can we predict the next pixel? And study their properties. Through this line of work, before the GPTs, before the Transformers were invented, we did some work with something that we call the sentiment neuron, which is a neural net that was trying to predict the next character in reviews of Amazon products. It was a small neural net, because it was maybe four years ago, but it did prove the principle that if you predict the next character well enough, you will eventually start to discover the semantic properties of the text. And then with the GPTs we took it further. We said: okay, well, we have the Transformer, it's a better architecture, so we have a stronger effect. And then later there was the realization that if you make it larger it will be better, so let's make it larger, and it will be better.
larger and it will be better yeah I mean there's there's a lot of uh there's a

lot of great nuggets and what you just mentioned right I think first is the

Elegance of this concept which is like hey if you get really good at predicting

the next whatever ever get really good at prediction you get that obligates you

to be good at all these other things if if you're really good at that and it's

it's you know I think it's probably like under underrated how um that required

some some degree of vision because it's like early on you know you try to get

really good at predicting things and you know you got the sentiment NE on which

is cool but it's like that's like a it's like a a blip relative to what we

obviously have seen with the the large language models and so that I think

significant and I think the other significant piece is um where you just

mentioned which is kind of um scaling it up right and uh I think you know you

guys had had uh released this paper about this kind of like a scaling laws

of what you have found as you scaled up um compute data model size sort of in

concert with one another but I'm I'm kind of curious like what's the um

obviously there's there's some intuition was just like hey scaling things up um

is good and you see you see great behaviors what's kind of your intuition

behind um sort of if you think from now over the next few the next few years or

even the next few decades like what what is scaling up mean why is it likely to

continue resulting in in great results and and um what do you think the limits

are if any I think two two stat two statements are true at the same time on

On the one hand, it does look like our models are quite large. Can we keep scaling them up even further? Can we keep finding more data for the scale-up? And I want to spend a little bit of time on the data question, because I think it's not obvious at all. Traditionally, because the field of machine learning has been fundamentally academic, fundamentally concerned with discovering new methods and less with the development of very big and powerful systems, the mindset has been: someone creates a fixed benchmark, a data set of a certain shape, of certain characteristics, and then different people can compare their methods on this data set. But what that does is force everyone to work with a fixed data set.

The thing that the GPTs have shown in particular is that scaling requires that you increase the compute and the data in tandem, at the same time, and if you do this, then you keep getting better and better results. In some domains, like language, there is quite a bit of data available; in other, maybe more specialized, subdomains, the amount of data is a lot smaller. That could be the case, for example, if you want to have an automated lawyer. I think your big language model will know quite a bit about language, and it will be able to converse very intelligently about many topics, but it may perhaps not be as good at being a lawyer as we'd like. It will be quite formidable, but will it be good enough? That's unknown, because the amount of data there is smaller. But any time data is abundant, it's possible to apply the magic deep learning formula and produce these increasingly good and increasingly more powerful models.

And then, in terms of the limits of scaling: I think one thing that's notable about the history of deep learning over the past 10 years is that every year people said, okay, we had a good run, but now we've hit the limits. And that happened year after year after year. So I think that we absolutely may hit the limits at some point, but I also think that it would be unwise to bet against deep learning.
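For reference, the scaling-laws paper alluded to above (Kaplan et al., 2020) fit test loss with simple power laws in model size, data, and compute; the sketch below uses the paper's reported fit for the model-size law, with the constants best treated as illustrative:

```python
# Power-law fit from the scaling-laws work: predicted test loss as a
# function of parameter count N, when data and compute are not the
# bottleneck. Constants are the published fits for language modeling.
N_C = 8.8e13      # fitted constant, in parameters
ALPHA_N = 0.076   # fitted exponent

def loss_from_params(n_params: float) -> float:
    """L(N) = (N_c / N) ** alpha_N."""
    return (N_C / n_params) ** ALPHA_N

# Each 10x increase in model size yields a small but steady improvement:
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e}  predicted loss = {loss_from_params(n):.3f}")
```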

Yeah. There are a number of things I want to dig in on here, because they're all pretty interesting. One is this mental model, which you certainly have, and which I think is quite good: hey, Moore's law is this incredible accelerant for everything that we do, and the more that there's a Moore's law for everything, a Moore's law for the different inputs that go into the machine learning life cycle, we're just going to push all these things to the max and we're going to see incredible performance. And I think that's significant, because, as you mentioned about this data point: if we get more efficient at compute, which is something that's happening; if we get more efficient at producing data, or finding data, or generating data; and obviously there's more efficiency coming out of the algorithms; all these things are just going to keep enabling us to do the next incredible thing, and the next incredible thing, and the next incredible thing. So first, I guess, we've talked about this a little bit before, so I know you agree with that, but are there any flaws to that logic? What would you be worried about in terms of how everything will scale up over the next few years?

I mean, I think over the next few years I don't have too much concern about continued progress. I think that we will have faster computers, we will find more data, and we'll train better models; I don't see particular risk there. I think moving forward we will need to start being more creative: okay, so what do you do when you don't have a lot of data? Can you somehow intelligently use the same compute to compensate for that lack of data? And I think those are the questions that we, and the field, will need to grapple with to continue our progress.

Yeah. And I think this point about data is the other thing I wanted to touch on, because this is something we focus on at Scale. The large language models, thankfully, can leverage the internet, the fact that all this data has existed and been accumulating for a while, and so you can show some pretty incredible things. In all-new domains, you need efficient ways to generate lots of data. And I think there's this whole question of: how do you make it so that each ounce of human effort that goes into generating some data produces as much data as possible? Something that we're passionate about, and that I think we've talked a little bit about, is: how do you get a Moore's law for data? How do you get more and more efficiency out of human effort in producing data? That might require novel new paradigms, but it's something that I think is required. In this lawyer example that you mentioned: we have a pretty finite set of lawyers; how do we get those lawyers to produce enough data so you can create some great legal AI?

The choices that we have are either to improve our methods, so that we can do more with the same data or do the same with less data, or, like you say, to somehow increase the efficiency of the teachers. And I think both will be needed to make the most progress.

Yeah. Well, you know, I really think Moore's law is instructive here: to get these chips performing better, people try all sorts of random things, and the end output is that you have chips with more transistors. And if we think about it as, do we have models that perform better with certain amounts of data, or certain amounts of teaching, how do we make that go up?

Yeah, I mean, I'm sure that there will be ways to do that. For example, if you ask the human teachers to help you only in the hardest cases, I think that will allow you to move faster.
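A minimal sketch of that "help only in the hardest cases" idea, in the style of uncertainty-based active learning; the model, the unlabeled pool, and the labeling step are hypothetical stand-ins:

```python
import torch

def pick_hardest(model, unlabeled_pool, k=100):
    """Rank unlabeled examples by the model's predictive entropy and
    return the indices of the k it is least sure about, so that human
    teachers label only those."""
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled_pool), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
    return entropy.topk(min(k, len(entropy))).indices

# Usage sketch (the labeling function is hypothetical):
#   hard_idx = pick_hardest(model, pool)
#   new_labels = ask_human_teachers(pool[hard_idx])
#   ...retrain on the newly labeled examples and repeat.
```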

I want to switch gears to one of the offshoots of the large language model efforts, which is particularly exciting, especially to me as an engineer, and probably to most people who spend a lot of time coding: Codex, which demonstrated some pretty incredible capabilities of going from natural language to code, and of being able to interact with a program in a very novel new way. I'm kind of curious: what excites you about this effort? What do you think are reasonable expectations for what Codex, and Codex-like systems, will enable in the next few years? What about far beyond that? And ultimately, why are you guys so excited about it?

For some context, Codex is pretty much a large GPT neural network trained on code: instead of training to predict the next word in text, it's trained to predict the next word, the next token, I guess, in code. And the thing that's cool about it is that it works at all. I don't think it's self-evident to most people that it would be possible to train a neural net in such a way that if you just give it some representation of text that describes what you want, the neural network will just process this text and produce code, and this code will be correct, and it will run.
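To make that interaction concrete, here is the kind of prompt-and-completion being described: the prompt is a natural-language docstring, and everything after it would be generated by the model. This example is illustrative, not actual Codex output:

```python
# Prompt given to the model: a signature and a description of the intent.
def median(values: list[float]) -> float:
    """Return the median of a non-empty list of numbers."""
# A completion of the kind such a model produces:
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3.0, 1.0, 2.0]))  # 2.0
```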

And it's exciting for a variety of reasons. First of all, it is useful, and it is new. Code has been a domain that hasn't really been touched by AI too much, even though it's obviously very important, and it touches on aspects where today's AI, deep learning, has been perceived as weak: reasoning, carefully laying out plans, not being fuzzy. And it turns out that, in fact, these models can do quite a good job here. One distinction between Codex and language models is that the code models can, in effect, control the computer: it's like they have the computer as an actuator, and that greatly expands what they can do; it makes them much more useful. And of course we want to make them better still; I think they can improve in lots of different ways. These are just the preliminary code models. I expect them to be quite useful to programmers, especially in areas where you need to know random APIs.

Because these neural networks, and this is a small digression, the GPT neural networks don't learn quite like people. A person will often have somewhat narrow knowledge in great depth, while these neural networks want to know everything that exists, and they really try to do that. So their knowledge is encyclopedic. It's not as deep; it's pretty deep, but not as deep as a person's. And so, in the way they work today, these neural networks complement people with their breadth. You might say: I want to do something with a library I don't really know. It could be some existing library, or maybe the neural network has read all the code of all my colleagues and knows what they've written. So I want to use some library I don't know how to use, and the network will have a pretty good guess of how to use it. You'd still need to make sure that what it said is correct, because such is its level of performance today: you cannot trust it blindly, especially if the code is important. In some domains, where it's easy to undo anything that it writes, I think you can trust its code just fine; but if you actually want to have real code, you want to check it.

But I expect that those models will continue to improve. I expect that the code neural networks will keep getting better, and I think the nature of the programming profession will change in response to these models. In a sense, it's a natural continuation of how, in the software engineering world, we've been using higher- and higher-level programming languages: first people wrote assembly, then they had Fortran, then they had C, now we have Python, and now we have all these amazing Python libraries as a layer on top of that. Now we can be a little bit more imprecise, a little bit more ambitious, and the model, the neural network, will do a lot of the work for us. And I should say, I expect something similar to happen across the board in lots of other white-collar professions as well.

You know, if you think about the economic impact of AI, there's been an inversion. I think there's been a lot of thinking that maybe simple robotics tasks would be the first ones to be hit by automation, but instead we find that the creative tasks, counterintuitively, seem to be affected quite a bit. If you look at the generative neural networks and the way images are generated now, you can find on Twitter all kinds of stunning images being generated. Generating cool text is happening as well, but the images are getting most of the attention. And then with things like code, and a lot of writing tasks, the white-collar tasks, they are also being affected by these AIs. I do expect that society will change as progress continues, and I think that it is important for economists, and people who think about these questions, to pay careful attention to these trends, so that as the technology continues to improve there are good ideas in place to, in effect, be ready for it.

Yeah, there are, again, a number of really interesting nuggets in there. I think one is that one of the big ideas behind Codex, and Codex-like models, is that you can go from human language to machine language, and, as you mention, all of a sudden the machine is an actuator. I think many of us, when we think about AI, think about the Star Trek computer: you can just ask the computer and it'll do things. This is a key enabling step, because if all of a sudden you can go from how humans speak to things that a machine can understand, then you've bridged this key translation step. So I think that's super interesting.

Another thing, this inversion that you just mentioned, is super interesting, because one of my beliefs is: hey, the reason that some things have become much easier than others is all a product of the availability of data. There are some areas where there just exists lots and lots of digital data that you can suck up into the algorithms, and they can do quite well. And then for things like robotic tasks, or setting a table, all these things that we've had a lot of trouble building machines to do, you're fundamentally limited by the amount of data you have: first, just by the amount of data that's been collected so far, but also because you can only have so much stuff happening in the real world to collect that data from. I'm curious how you think about that. Or do you think it's actually something intrinsic to the creative tasks that is somehow more suited to current neural networks?

I think it's both.

It is unquestionably true that, well, let me take a step backwards. At the base of all AI progress that has happened, at least in all of deep learning and arguably more, is the ability of neural networks to generalize. Generalization is a technical term which means that you understand something correctly, or take the right action, in a situation that's unlike any situation you've seen in the past, in your experience. A system generalizes better if, from the same data, it can do the right thing, or understand the situation correctly, in a broader set of situations.

To make an analogy: suppose you have a student at a university studying for an exam. That student might say, this is a very important exam for me; let me memorize this; let me make sure I can solve every single exercise in the textbook. Such a student will be very well prepared and could achieve a very high grade on the exam. Now consider a different student, who might say: you know what, I don't need to know how to solve all the exercises in the textbook, as long as I've got the fundamentals right; I read the first 20 pages and I feel I've got the fundamentals. If that second student also achieves a high grade on the exam, that second student did something harder than the first student; that second student exhibited a greater degree of generalization. Even though the questions were the same, the situation was less familiar for the second student than for the first.

Our neural networks are a lot like the first student. They have an incredible ability to generalize, for a computer, but we could do more. And because their generalization is not yet perfect, definitely not yet at a human level, we need to compensate by training on very large amounts of data. That's where the data comes in: the better you generalize, the less data you need, or equivalently, the further you can go with the same data. So maybe once we figure out how to make neural networks generalize a lot better, then in all those small domains where we don't have a lot of data, it actually won't matter; the neural network will say, it's okay, I know what to do well enough even with this limited amount of data. But today we need a lot of data.

Now, when it comes to the creative applications in particular, there is some way in which they are especially well suited for neural networks, and that's because generative models play a very central role in machine learning, and the nature of what generative models generate is somehow analogous to the artistic process. It's not perfect, it doesn't capture everything, and there are certain kinds of art which our models cannot do yet. But I think this second connection, the generative aspect of art and the ability of generative models to generate new, plausible data, is another reason why we've seen so much progress in generative art.

Yeah, it's a really interesting thing, because it's almost a shade of what you mentioned at the very beginning: part of the reason that maybe we shied away from neural networks at the start was that they're so hard to explain, and that aspect, that we can't prove theorems about them, that they do things we can't quite explain, maybe is what naturally allows them to be better suited for creative pursuits, which we also can't explain very well.

Yeah, I think that's definitely possible as well.

So, some of the other recent advancements from OpenAI were CLIP and DALL-E, both super interesting examples of being able to go between modalities, from text to images. I would love to understand from you: what do you think is the significance of what is shown by CLIP and DALL-E? Where do you think that research goes over time, and what excites you about it?

Yeah, so for context: CLIP and DALL-E are neural networks that learn to associate text with images. DALL-E associates text with images in the generative direction, while CLIP associates text with images in the direction of perception, going from an image to text rather than from text to an image. And both of them are cool because they are simple. It's the same old recipe: you just say, hey, let's take a neural net that we understand really well, train it on a big collection of text-and-image pairs, and see what happens. And what happens is something very good.
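A minimal sketch of that recipe in the contrastive style CLIP uses: encode each modality, then train so that matching text-image pairs score higher than mismatched ones. The encoders below are stand-in linear layers over hypothetical pre-extracted features, not the real architectures:

```python
import torch
import torch.nn.functional as F

# Stand-in encoders; CLIP uses a Transformer for text and a vision model
# for images, but any pair of encoders fits the recipe.
image_enc = torch.nn.Linear(2048, 512)
text_enc = torch.nn.Linear(768, 512)
params = list(image_enc.parameters()) + list(text_enc.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

def contrastive_step(image_feats, text_feats, temperature=0.07):
    """One training step on a batch of matching (image, text) pairs."""
    img = F.normalize(image_enc(image_feats), dim=-1)
    txt = F.normalize(text_enc(text_feats), dim=-1)
    logits = img @ txt.T / temperature       # pairwise similarities
    labels = torch.arange(len(img))          # i-th image matches i-th text
    loss = (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Hypothetical usage with random features for a batch of 32 pairs:
print(contrastive_step(torch.randn(32, 2048), torch.randn(32, 768)))
```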

The real motivation with CLIP and DALL-E was just to dip our toes into ways of combining the two modalities, because of one of the things we'll want in the future: I think it's fairly likely that we wouldn't want our future AIs to be text-only AIs. We could, but it seems like a missed opportunity. I feel like so much stuff is going on in the visual world, and if it's not difficult to have a neural net really understand the visual world, why not? And then also, hopefully, by connecting the textual world to the visual world, they will understand text better. The understanding of text that they learn by also being trained on images may become a little bit closer to ours. You could make an argument that maybe there is a distinction between what people learn and what our artificial neural networks learn, because people see, they walk around, they do all those different things, whereas our neural networks, the text ones, only train on text. So maybe that means that something is missing, and maybe if you bring the training data to be more similar to that of people, then maybe we'll learn something more similar to what people learn as well. So those were some of the motivations to study these models, and it was also fun to see that they worked quite well. And especially now, most recently, CLIP has been enjoying quite some degree of popularity, and people have figured out how to invert it to generate high-resolution images and have a lot of fun with it. Actually, for me, that's the most emotionally satisfying application of the past maybe few months.

Yeah, I think that's an interesting point that you mentioned. There's this concept of embodied AI: if you have an AI that will actually go and experience things like humans do, maybe you get interesting behaviors, and the more we can go in that direction, with stuff like multimodal learning, the more interesting it gets. You know, another thing I wanted to touch on: I think you mentioned something quite profound, which is that they're very simple; the use of the algorithm is very simple, and in this case producing the data sets, getting the data right, is, from my perspective, what really enabled a lot of the incredible results. I don't know how you think about that, and how you think it defines future similar areas of research.

I'd say it's definitely a true statement that the field of deep learning, especially the academic branch, not so much the applied branch, has underestimated the importance of data, because of the mental framework of: the data is given to you in the form of a benchmark, and your goal is to create a better method that does better than the other existing methods. And the reason it was important for that framework to have a fixed data set is so that you could compare which method is better. I think that really did lead to a blind spot, where very many researchers were working very hard on the pretty difficult problem of improving the model more and more, while leaving on the table the very large improvements that are possible by simply saying: hey, let's get much more data. I think now, at this point, people appreciate the importance of data a lot more, and I think that at this point it's quite probable that domains with a lot of data will experience a lot of progress.
Do you think, just conceptually, that over the next few years more of the advancements, more of the cool things that we'll see in AI, will come from innovating more on the data side, or innovating more on the algorithm side?

I prefer not to make that distinction. I mean, it's useful for some things, and maybe, let me roll with that distinction: I think both will be important. I believe very firmly that huge progress is possible from algorithmic, from methodological, improvements. We are nowhere near being as efficient as we can be with our compute. We have a lot of compute, and we know how to make use of it in some way, which is already a huge achievement compared to before. Here is a historical analogy: you may remember that 10 years ago or so, the only way to productively use huge amounts of compute was through embarrassingly parallel computations like MapReduce. That was literally the only idea anyone had; there weren't any interesting ways in which you could use huge amounts of compute. Now, with deep learning, we have one such way. The compute needs to be a little bit more interconnected, but it is possible to have a large amount of compute and do something useful with it. But I don't think that we have figured out the best formula for making use of this compute. I believe that there will be better formulas, and we'll be able to go much further with the same amount of compute.

That said, I'm also very confident that a lot of progress will happen from data. I'm a big believer in data, and I think there can be so many different things: you could find new sources of data, you can filter it in all kinds of ways, and maybe apply some machine learning to improve it. I think there are lots of opportunities there. And I expect the combination of all of these; when they come together, they feed off each other, and I think that will lead to the most progress.

Yeah. And one question, to go back to this question of compute. You somewhat answered it, which is: hey, we're going to have significantly more efficient algorithms. But if you take this concept of scaling to the limit that we mentioned before, which is, hey, if you scale everything to the limit you'll get great performance, at some point you're building supercomputers that are just far too big, or way too expensive, or whatnot, to be practically feasible. Do you think as a field we get around that by getting way better at using our compute, or do you think there is some fundamental limit of compute that we need to think about when we think about scaling laws?

There probably does exist an ultimate way of using the compute; I don't think we've found that way yet. I think we can improve the efficiency of our methods, the usefulness they derive from our compute, the extent to which they generalize. I think there are lots of opportunities that we haven't explored yet. I also agree with you that there will be physical limits and economic limits to the size of computers that one could build, and I think that progress will consist of pushing on all these axes. Now, one other thing I want to mention is that there is a huge amount of incentive to find these better methods. Think about what happens if you can find a method that somehow allows you to train the same neural net with half the compute: it's huge; it's like you've doubled the size of your computer. So the amount of research there will only keep on increasing, and I believe that it will lead to success. It will take some time, perhaps, but I'm sure we'll find really powerful ways of training our neural nets and setting them up in far more efficient, far more powerful ways than what we have right now. And then, of course, you want to give those better ways all the compute they deserve and all the data they deserve.

Totally. Well, one interesting concept, just kind of related to this, that I'm curious to get your thoughts on, one thing that we've talked a little bit about before: one of the concepts of neural networks, embedded in the name, is that, hey, you take this very simple model of a neuron, and that simple model of a neuron then allows you to perform these brain-like algorithms. And the reality is that neurons are actually very weird. There are a lot of behaviors that we don't even fully understand mathematically; we only have weak empirical models to understand them. So what do you think is the likelihood that our current model of neurons, these simple ReLU-style functions, is the path to producing something that will resemble a brain, or things that resemble neurons? And what do you think the chances are that we're on this interesting but slightly wrong path with how we're designing these networks?

My view is that it is extremely unlikely that there is anything wrong with the current neurons. I think that they might not be the best neurons, perhaps, but even if we didn't change them, we'd be able to go as far as we need to go. Now, there is still an important caveat here, which is: how do you know how many neurons you need to reach human-level intelligence? You can say maybe you can look at the size of the brain, but it may be that each biological neuron is like a small supercomputer which is built out of a million artificial neurons, so maybe you would need a million times more artificial neurons in your artificial neural net to be able to match the brain. That's a possibility. I don't think it will happen; I don't think it will be that bad. But I would say, worst case, that would be the meaning of it: in other words, you'd need a lot more artificial neurons for each biological neuron for it to be possible for them to simulate biological neurons.

Yeah, it's one of these interesting questions: how much do we try to emulate biology? Or are we implicitly emulating biology by having these small neurons that then create these super-neurons that behave strangely?

Yeah, I wouldn't even say that we are trying to emulate biology, but we are trying to be appropriately inspired by it, in the right way. You know, emulating biology precisely, I think, would be challenging and unwise, but using it for ballpark estimates, I think, can be quite productive.
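For reference, the "simple model of a neuron" in question is just a weighted sum passed through a nonlinearity such as a ReLU; a one-function sketch with illustrative numbers:

```python
import numpy as np

def artificial_neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """The simple neuron model discussed above: ReLU(w . x + b).
    Real biological neurons have far richer dynamics than this."""
    return max(0.0, float(w @ x + b))

# Example: three inputs with fixed, illustrative weights.
print(artificial_neuron(np.array([1.0, 2.0, 3.0]),
                        np.array([0.5, -0.25, 0.1]), b=0.05))
```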

Yeah. One interesting thing, and it's so funny, there's just thing after thing after thing that OpenAI has worked on recently that's been interesting, but the instruct series of models I think was interesting because it demonstrates a potentially interesting paradigm for how humans and models will collaborate in the future. Why did you guys work on the instruct series? Why is it interesting, and what excites you about it?

Yeah, so the instruct models are really important models. I should explain what they are and what the thinking there was.

After we trained GPT-3, we started to experiment with it and try to understand what it can do, and we found that it can do a lot of different things, and it has a real degree of language understanding, but it's very, very not humanlike: it absolutely doesn't do what you ask it to, even if it can. And so one of the problems that we've been thinking about a lot is alignment: if you have a very powerful AI system, how do you make it so that it will fulfill your intent faithfully and correctly? And the more powerful the AI system is, the greater its generalization and reasoning ability and creativity, the more important the alignment of the system becomes. Now, GPT-3 is a useful system. It's not profoundly smart, but it is already interesting, and so we can ask the simpler question of how to align GPT-3: how to build a version of GPT-3 such that it will try, to the best of its abilities, as faithfully as possible, to do what you ask it to do. And that led to the creation of the instruct models. It's basically a version of GPT where you just say: hey, do X; please do Y; I want Z; and it will do it. It's super convenient, and people who use this model love it, and I think it's a great example where the more aligned model is also the more useful one.

You know, thinking about GPT-3 and large language models, I'd be remiss not to talk about some of the challenges that are associated with them. In particular, GPT-3, and the future GPTs, are trained on huge amounts of data, and there's a lot of engineering involved in, okay, how do you engineer this data to work super well? What do you think are some of the challenges, especially as we try to figure out ways to use more and more data in these machine learning systems? How do you deal with the fact that in any sea of large data there are going to be weird biases, or weird qualities, that might be tough to sift through and manage?

Yeah, there are lots of facets to this question. This is something that we've been thinking about at OpenAI for a long time. Even before training GPT-3, we anticipated those issues would come up, and there are challenges. I can mention some of the strategies that we've pursued to address them, and some of the ideas that we have, that we are working on, to address those issues even further. So it is indeed the case that GPT-3 in particular, and models like it, learn from the internet, and they learn the full range of data that's expressed on the internet. To a first approximation, we knew that this would be an issue, and one of the advantages of releasing the model through an API is that it makes it possible to deal with such challenges around misuse, or around us noticing that the model is producing undesirable outputs. Now, of course, there's a very difficult challenge of defining what that means.
yeah I mean to your point it's like um there there's

certainly it is a fact of life that these large data sets are going to

contain um are going to contain some sort of chaos noise things that like you

know we may not maybe uh from a moral perspective you may not want put in

front in front of a model but the I think what you're going towards is like

uh just from a pragmatic engineering perspective it's going to be impossible

to be precious about the data we put into the algorithm and so let's

acknowledge that that's going to happen and then be precious about defining the

performance of the algorithm post training on it and making sure that


you're able to uh sort of mold the algorithm into the exact to perform in a

way that you would like so I think that is I think that is the more productive

approach long term however I think it is possible to be precious about the data

as well because you can take the same models and filter the data or classify

it and decide what data you want to train on and I expect that as people

train these models we will and as we train these models we in fact we are

experimenting with these different approaches and find the most practical

and efficient way of having the model be as reasonably behaved as possible yeah
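A minimal sketch of that model-based filtering idea: score raw documents with a learned quality or safety classifier and keep only those above a threshold. The classifier and the threshold here are hypothetical stand-ins:

```python
from typing import Callable, Iterable

def filter_corpus(docs: Iterable[str],
                  score_doc: Callable[[str], float],
                  threshold: float = 0.9) -> list[str]:
    """Keep only the documents the classifier scores as acceptable.
    score_doc stands in for any learned quality/safety model that
    returns a probability in [0, 1]."""
    return [d for d in docs if score_doc(d) >= threshold]

# Hypothetical usage with a trivial stand-in scorer:
docs = ["a helpful article", "spammy junk text", "a careful explanation"]
kept = filter_corpus(docs, lambda d: 0.0 if "junk" in d else 0.95)
print(kept)  # the junk document is dropped before training
```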

Yeah, and the results in fine-tuning the algorithms are pretty exciting, because it means there are more degrees of freedom for getting algorithms to perform and behave in the ways that you want.

That's right. And this is a property of models that are better, which is counterintuitive: the weaker your language model is, the harder it is to control; whereas the bigger and better it is, the faster it is to fine-tune and the more responsive it is to prompts which specify one kind of behavior versus another. So, in a sense, I expect that at least this flavor of the problem we just discussed will become easier as the models become more powerful and more accurate.

Yeah. So we've woven through a bunch of very interesting topics; I want to take a chance to zoom out. We started this talk by talking about how, originally, when you started working on neural networks, the optimistic version was, hey, the field is going to pay attention to neural networks; and obviously now we believe in something more that resembles AGI as the optimistic version of what the field can accomplish. In the zoomed-in view, the past few years have been this incredible period of new breakthroughs, of very new, interesting things as a result of AI. When we zoom out to a longer time horizon, what are the advancements in AI that you think are, I shouldn't say on the horizon, but just around the corner, and the ones that you think are going to have very meaningful implications for how the world will operate?

I think that, of the advances that are around the corner, simple business as usual, the kind of mundane progress that we've seen over the past few years, will continue. I expect our language models, our vision models, our image generation, code, text-to-speech, and speech-to-text will all improve across the board, and I think they'll all be impactful. And, you know, I would say that with these generative models in particular it is a little harder to reason about what kind of applications become possible once you have a better code model or a better language model, because it's not just better at one thing: it develops qualitatively new capabilities, and it unlocks qualitatively new applications, and I think there are going to be just a lot of them. I think that deep learning will continue to grow and to expand, and more and more there will be a lot more deep learning data centers, and you'll have lots of interesting neural networks trained on all kinds of tasks. Medicine and biology, by the way, I think will be quite exciting. I read that right now the field of biology is undergoing a revolution in terms of its ability to get data. I'm not an expert, but I think what I'm saying is at least not false. So I think training neural networks there will be quite amazing, and I think it will be interesting to see what kind of breakthroughs in medicine it will lead to. And I should mention AlphaFold, which I think is also an example there. So I think the progress is going to be stunning.
stunning to to kind of close I mean we have an incredible uh AI Community

that's sort of that's with us today and is uh is probably very excited to figure

out how they can how they can ensure that AI uh sort of has a positive future

that we have an positive AI future what do you think is are the things that sort

of everyone in the audience can can take away from this conversation and work on

that will help ensure that you know we have a positive future with AI so I
So I think... I mean, there are many things which are worth thinking about. I'd say the biggest one is probably to keep in mind that AI is a very powerful technology and that it can have all kinds of applications. Work on applications that are exciting and that are solving real problems, the kind of applications that improve people's lives; work on those as much as possible. And also work on methods that try to address the problems that exist with the technology, to the extent they do: that would mean some of the questions about bias and desirable outputs, other questions around alignment, and questions that we haven't even discussed in this conversation. So I'd say those two things: work on useful applications, and also, whenever possible, work on reducing the harms, the real harms, and work on alignment.

Awesome. Well, thank you so much.

And I would be remiss not to thank OpenAI and the organization for all the incredible contributions to the field of AI over the past many years. Thank you so much again for sitting down with us.

Thank you for the conversation. I really enjoyed it.
