AI-Driven Development Is Here: Should You Worry?

Department: SE4AI

Editor: Tim Menzies, timm@ieee.org

AI-driven Development Is Here: Should You Worry?
arXiv:2204.07560v1 [cs.SE] 15 Apr 2022

Neil A. Ernst
University of Victoria
Gabriele Bavota
Università della Svizzera italiana

From the Editor: The benefits, as well as the drawbacks, of new AI technology need to be carefully scrutinized. For example, in this article, researchers review the promise, and potential pitfalls, of AI tools that help programmers co-write their code.

This “SE for AI” column publishes commentaries on the growing field of SE for AI. Submissions are welcomed and encouraged (1,000–2,400 words, each figure and table counts as 250 words, try to use fewer than 12 references, and keep the discussion practitioner focused). Please submit your ideas to me at timm@ieee.org.
Tim Menzies

AI-driven Development Environments (AIDEs) integrate the power of modern AI into IDEs like Visual Studio Code and JetBrains IntelliJ. By leveraging massive language models and the plethora of openly available source code, AIDEs promise to automate many of the obvious, routine tasks in programming. At the same time, AIDEs come with new challenges to think about, such as bias, legal compliance, security vulnerabilities, and their impact on learning to program.

Time was, programmers were laboriously entering machine instructions on punch cards, to be painstakingly offered up to the machine for processing. Nowadays, of course, we type code into editors, often hosted online, and get near-instant feedback on the compilation/test outcomes of our latest change. Let’s call this the Integrated Development Environment (IDE) Revolution.

Another revolution is now underway. Call it the “AI-driven Development Environment” (AIDE) revolution, one where a programmer can leverage the hive-mind of the many, many programmers that came before them. Large language models are now available to make suggestions based on what hundreds of previous programmers typed in.
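
To make the idea of suggesting code from what earlier programmers typed more concrete, here is a deliberately tiny sketch. It is not how Codex or Copilot work: it swaps in a simple bigram frequency table for a large neural language model. The corpus lines and the names `CORPUS`, `train_bigrams`, and `suggest` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus standing in for "what previous programmers typed".
CORPUS = [
    "import numpy as np",
    "import pandas as pd",
    "import numpy as np",
    "for row in reader :",
    "for line in lines :",
]


def train_bigrams(lines):
    """Count, for every token, which tokens followed it in the corpus."""
    followers = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for current, nxt in zip(tokens, tokens[1:]):
            followers[current][nxt] += 1
    return followers


def suggest(followers, token, k=3):
    """Return up to k of the most frequent tokens seen right after `token`."""
    counts = followers.get(token, Counter())
    return [tok for tok, _ in counts.most_common(k)]


model = train_bigrams(CORPUS)
print(suggest(model, "import"))  # most frequent follower of "import" comes first
print(suggest(model, "as"))
```

Even this toy model ranks `numpy` ahead of `pandas` after `import`, simply because it appeared more often in the corpus. Real AIDEs condition on far more context than a single preceding token, but the underlying intuition, that frequent patterns in earlier code drive the suggestion, is similar.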

Software Published by the IEEE Computer Society © 2019 IEEE


Department Head

Not only is the syntax of programming “natural”, but in fact a lot of programming follows repetitive patterns. If you have used a form of autocomplete in your IDE, you have seen a very primitive version of this idea (autocomplete seems to have been popularized with Microsoft’s Visual-* tools in the 90s). Autocomplete at its simplest looks for all valid completions, such as the possible library calls following “List.”. But things have progressed very rapidly in this space. Think about preparing some data analysis code:

    # import a file named "observations.csv"
    # and print the number of rows and columns
    # then use the data to create
    # a plot of the observations
    import csv
    import matplotlib.pyplot as plt
    import numpy as np

    data = []
    with open('observations.csv') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            data.append(float(row[0]))
    print("Number of rows: ", len(data))
    print("Number of columns: ", len(data[0]))

    plt.plot(data)
    plt.show()

Spoiler alert: we didn’t write a single line of that code! Instead, we typed the four lines of comments into Visual Studio Code, a lightweight IDE, and had GitHub’s Copilot extension “write” the actual Python.

Copilot, and its underlying AI engine, OpenAI’s Codex, leverage the power of large language models trained on billions of words of online text. Codex further tunes those language models with publicly available GitHub repositories. In essence, since so many repositories contain similar lines to the ones we wrote above, the language model can associate those lines with the comments we wrote, and something that might seem like magic can appear.

This did not happen overnight, of course. Codex is the product of:
• AI research into deep neural networks and language models;
• Software repository mining for extracting data from repositories;
• Investigations into the linguistic nature of source code [Hindle et al., 2012];
• Massive amounts of compute and data storage.

Codex and Copilot are not alone: several other AIDEs exist, including TabNine (tabnine.com) and Kite (kite.com). Intelligent fill in spreadsheets is a close companion, as is program repair such as Facebook’s SapFix [Marginean et al., 2019].

Codex [Chen et al., 2021] is built on OpenAI’s GPT-3 language model with 12B parameters, and fine-tuned on 159 GB of data from public GitHub repositories. The nature of a language model is that it builds deep associations between words in a high-dimensional space, learning complex patterns that produce code. Such models do not require abstract representations of code like ASTs, working on the token level directly. With suitable unit test coverage, generating 100 possible completions produced a solve rate (pass@100, the share of problems for which at least one of the 100 completions passes the tests) of over 70% on a benchmark of common completion tasks.

Codex promises to dramatically change the way we write and build software systems. Like all AI technology, it has great promise, but also comes with several new challenges for research and for practice.

Promises of AI-driven DEs
Automate the mundane. Much of software development is routine. Developers get a bug report, track down the bug, and file a patch; they wire library code together to leverage APIs; they need to display database records on a web page and handle any updates. Much of software development is also staggeringly complex and creative, too! Software is, as Grady Booch once wrote, “the invisible thread ... on which we weave the fabric of computing.” A key developer task then is to carefully distinguish those tasks which are complex, and those which are obvious or complicated [Snowden et al., 2020]. AIDEs can remove the accidental complexity from what are obvious tasks, just like the code shown earlier. An AIDE like Copilot is already capable of automating these routine tasks, and other technologies, such as the automated program repair work of Facebook’s SapFix tool [Marginean et al., 2019], are tackling similar routine tasks.

Automate API interactions. Much of programming today is about framework and API-driven development: connecting to a third party
service, processing the result, and sending the result back to the user. Just as often we work within an existing architectural framework, for example for web or mobile applications, and our programs are closely coupled with those library calls. Many of these library calls are routine and repetitive for each new variant of an app.

Teach Programming. As programming languages and APIs proliferate, learning new approaches and syntax becomes more challenging. Stack Overflow is invaluable for specific answers to common, and not so common, programming problems. For example, how does one configure a particular plotting library such as R’s ggplot to change the background colour? But Stack Overflow, while incredibly helpful (we certainly could not program without it), requires one to leave the IDE to ask questions or perform a search. AIDEs promise the knowledge potential of Stack Overflow while avoiding continuous context switches between the IDE and the browser. And this will be useful for novices and experts alike. No more ‘yak shaving’ (doing a series of trivial tasks which distract you from the original, and important, goal; compare with bikeshedding) trying to figure out the correct series of syntax calls for a given problem.

Possible Challenges with AIDE
Like any software development, AIDE will come with a host of challenges to be overcome, challenges in traditional software concerns such as defects and security vulnerabilities, but in new areas as well.

Copyright and Licensing. Codex is trained on (54 million) public GitHub repositories, and the creators of these GitHub repositories agreed to Codex-like usage of their code. However, the Codex context of use was something most of us probably did not anticipate. Does Codex have the right to all the code it was trained on? The language model that produces the output is a series of weights, so in theory the code it produces is an amalgamation of the inputs. According to a recent study [Ziegler, 2021], Codex rarely quotes code verbatim from the training set, and when this happens it is usually code largely reused across open source projects. Does Codex-created code violate copyright? Is it fair use? We don’t have an answer to this question, and open source licenses might need to be revisited to explicitly regulate the usage of code for training commercial code generation tools. Also, Codex output is currently the intellectual work product of the person who activated Codex, but this is because Codex is a beta, and these terms might change.

Learning to Program. The nature of learning programming will change dramatically with AIDEs. Whether these assistants will speed up or slow down the learning process is currently an open question. On the one hand, novice programmers can benefit from AIDEs by receiving recommendations useful to deal with tasks they struggle with. On the other hand, there is a risk of accepting the received recommendations without fully understanding them. On top of this, AIDEs also pose challenges for instructors: Codex is already so good it might surpass first year university students in introduction to programming (CS1) courses. Some initial results in our testing show it is relatively simple to get Codex to generate reasonable (passing) solutions. CS1 programming assignments must change to handle those students who can merely pass the entire assignment spec to Codex for a solution.

Dataset Quality. Plenty of code freely available online has flaws. For example, much of it features student submissions, one-off explorations, or other low quality work [Kalliamvakou et al., 2014]. Like any trained model, Codex and other AIDEs are only as good as the training data. And although the Codex team has done extensive work filtering low-quality inputs, there remains code that has bugs, that has technical debt, or that uses outdated APIs. Subtle security holes can easily persist even in high-quality, high-volume repositories (consider the OpenSSL Heartbleed incident), and recent work showed how deep learning models can learn vulnerable code and inject it during autocompletion [Schuster et al., 2020]. AIDE demands that humans inspect its outputs carefully, but if we use it to create code for a problem we don’t fully understand, we won’t be able to understand its outputs either.

More worrisome is that the language model reflects the biases that we humans have. For example, asking Copilot to generate a list of names produces a list of predominantly English/American names (Fig. 1). Plotting suggestions generate graphs that fail to accommodate

May/June 2019

people with color-blindness. This is of course as much a challenge for us as it is for AIDE.

Figure 1. Copilot portends a new generation of AI-based productivity. The benefits, as well as the drawbacks, of this new approach need to be carefully scrutinized. For example, asking Copilot to generate a list of names produces the gray text, which is a list of predominantly English/American names (indicating an interesting, if not perhaps troubling, bias in its training data).

Sociotechnical Questions. The IDE revolution produced a now well known paradigm in computer programming, with continuous integration workflows dominant. But AIDE will possibly change that as more and more of the work is routinized and automated. More programmer time will be available for complex problem solving. But that means our current knowledge of how humans and machines interact will change. When Facebook rolled out their automated bug repair approach, one of the biggest challenges was not the technical problem, but rather integrating the repair bot with the humans who worked with it [Harman and O'Hearn, 2018].

Context and Complexity. Mechanization, such as in steel-making or automotive manufacturing, has greatly improved productivity at the expense of those humans doing the routine. AIDEs will likely be no different. To what degree will an AIDE be able to carefully contextualize the solution for a specific problem? Where is the line between the routine and simple, and the complex and contextual? Will AI eventually design and write complex software solutions? Currently, being very clear with AIDE is essential for it to understand the context; but developing and communicating a clear understanding of the problem is one of the essentially complex problems in software engineering.

What’s Next
The AIDE revolution has just begun, leaving open questions on what to expect in future.

Language Models, Data, and Computational Power. The rapid progress in the capabilities of language models is difficult to quantify. A simple proxy for it is the increasing number of parameters in the language models presented by OpenAI in the last few years. In 2018 GPT-1 had 117M parameters. One year later GPT-2 pushed the boundaries to 1.5B, and in 2020 GPT-3 reached 175B parameters. Rumors place the next release (GPT-4) at an astonishing 100T parameters (more than 500 × GPT-3) [Romero, 2021]. Similarly, the amount of training data available for code-related tasks is increasing every day, as are the computational capabilities of GPUs. Put together, these advances are expected to substantially improve the support AIDEs can provide to developers. To what extent will these improvements be affordable (in money and climate terms), and accessible (for those with no data centres)?

Improving the Quality of Training Data. As previously discussed, one of the main challenges when dealing with data-driven assistants lies in the quality of the training data. Manually checking all training instances is just not an option, but can AI help AI? In other words, can we teach AI what a high-quality training instance is? Whatever the underlying technology turns out to be, defining techniques to automatically filter out noisy and flawed training instances is a cornerstone for AIDEs, and a focus for GitHub’s next iteration of Copilot.

Code is Not (Just) Text. Language models have been proposed in the context of natural language processing, in which they are fed with a stream of tokens representing the text to process. However, code is not just text and there is active research investigating what the best representation is when feeding code as input to language

models. For example, structural information can be extracted from the code’s Abstract Syntax Tree (AST) and used to boost the model’s performance.

Consumer-related Customization. While Copilot is able to provide code recommendations that are tailored for the specific coding task at hand, no customization is performed when it comes to the developer receiving such recommendations. However, two developers having different technical backgrounds, coding histories, and skills may benefit from different recommendations. For example, more expert developers working on real-time software are likely to appreciate multi-threading solutions to a given task, while newcomers may be confused by their usage. Customizing the recommendations based on the target developer can substantially increase the usefulness of AIDEs.

AIDE Learning Rate. A last point worth discussing is the learning rate we can expect from AIDEs, namely the pace at which they’ll be able to improve their capabilities. All the above-discussed points can contribute to that, but it’s unclear when the AI will be able to pass a programmer’s Turing Test, for example submitting pull requests that reviewers cannot distinguish from a human’s submission.

An important truism in software development is Fred Brooks’s maxim “there is no silver bullet”, derived from his insight into essential (inherent) complexity versus accidental (self-imposed) complexity. Like any new approach to our challenging field, AIDE is unlikely to become a panacea for software development. But it does seem to portend an important shift in how we develop software, and just might remove some of the accidental complexity in our projects.

REFERENCES
M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. de Oliveira Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert-Voss, W. H. Guss, A. Nichol, A. Paino, N. Tezak, J. Tang, I. Babuschkin, S. Balaji, S. Jain, W. Saunders, C. Hesse, A. N. Carr, J. Leike, J. Achiam, V. Misra, E. Morikawa, A. Radford, M. Knight, M. Brundage, M. Murati, K. Mayer, P. Welinder, B. McGrew, D. Amodei, S. McCandlish, I. Sutskever, and W. Zaremba. Evaluating large language models trained on code, 2021.
M. Harman and P. O'Hearn. From start-ups to scale-ups: Opportunities and open problems for static and dynamic program analysis. IEEE, Sept. 2018. doi: 10.1109/scam.2018.00009. URL https://doi.org/10.1109/scam.2018.00009.
A. Hindle, E. T. Barr, Z. Su, M. Gabel, and P. Devanbu. On the naturalness of software. In Proceedings of the 34th International Conference on Software Engineering, pages 837–847. IEEE Press, 2012. ISBN 9781467310673.
E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian. The promises and perils of mining GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories, pages 92–101, 2014.
A. Marginean, J. Bader, S. Chandra, M. Harman, Y. Jia, K. Mao, A. Mols, and A. Scott. SapFix: Automated end-to-end repair at scale. IEEE, May 2019. doi: 10.1109/icse-seip.2019.00039.
A. Romero. GPT-4 will have 100 trillion parameters - 500x the size of GPT-3. https://bit.ly/3uLakBC, 2021. Accessed: 2021-11-02.
R. Schuster, C. Song, E. Tromer, and V. Shmatikov. You autocomplete me: Poisoning vulnerabilities in neural code completion. Technical Report arXiv:2007.02220, 2020.
D. Snowden, Z. Goh, R. Greenberg, and B. Bertsch. Cynefin: Weaving Sense-Making into the Fabric of Our World. Cognitive Edge, 2020.
A. Ziegler. GitHub Copilot: Parrot or crow? A first look at rote learning in GitHub Copilot suggestions. https://docs.github.com/en/github/copilot/research-recitation, 2021. Accessed: 2021-11-02.

Neil A Ernst is with University of Victoria, Canada. He conducts research into software design and natural language understanding in software. Contact him at nernst@uvic.ca.


Gabriele Bavota is with Università della Svizzera italiana, Switzerland. He conducts research into mining software repositories and recommender systems for software developers. Contact him at gbavota@usi.ch.
