Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

History and Evolution of GPU Architecture

A Paper Survey
Chris McClanahan
Georgia Tech
College of Computing
chris.mcclanahan@gatech.edu
ABSTRACT
The graphics processing unit (GPU) is a specialized and highly
parallel microprocessor designed to offload and accelerate 2D or 3D
rendering from the central processing unit (CPU). GPUs can e
found in a !ide range of systems" from des#tops and laptops to
moile phones and super computers $3%.
This paper pro&ides a summary of the history and e&olution of GPU
hard!are architecture. The information in this paper" !hile eing
slightly '()D)* iased" is presented as a set of milestones noting
ma+or architectural shifts and trends in GPU hard!are.
The e&olution of GPU hard!are architecture has gone from a
specific single core" fi,ed function hard!are pipeline
implementation made solely for graphics" to a set of highly parallel
and programmale cores for more general purpose computation.
The trend in GPU technology has no dout een to #eep adding
more programmaility and parallelism to a GPU core architecture
that is e&er e&ol&ing to!ards a general purpose more CPU-li#e core.
.uture GPU generations !ill loo# more and more li#e !ide-&ector
general purpose CPUs" and e&entually oth !ill e seamlessly
comined as one.
Keywords
/ard!are" Graphics" Graphics Pipeline" 0ulti-core
Permission to ma#e digital or hard copies of all or part of this !or# for
personal or classroom use is granted !ithout fee pro&ided that copies are
not made or distriuted for pro1t or commercial ad&antage and that copies
ear this notice and the full citation on the 1rst page. To copy other!ise" to
repulish" to post on ser&ers or to redistriute to lists" re2uires prior speci1c
permission and3or a fee.
Copyright 2454 Chris 0cClanahan ...67.44
1. INTRODUCTION
* graphics processing unit (GPU) is a dedicated parallel
processor optimized for accelerating graphical computations. The
GPU is designed specifically to perform the many floating-point
calculations essential to 3D graphics rendering. 0odern GPU
processors are massi&ely parallel" and are fully programmale. The
parallel floating point computing po!er found in a modern GPU is
orders of magnitude higher than a CPU $2%.
GPUs can e found in a !ide range of systems" from des#tops
and laptops to moile phones and super computers. 8ith their
parallel structure" GPUs implement a numer of 2D and 3D graphics
primiti&es processing in hard!are" ma#ing them much faster than a
general purpose CPU at these operations $3%.
Illustration 1: The traditional fied!fun"tion #ra$hi"s
$i$eline
5
The original GPUs !ere modeled after the concept of a
graphics pipeline. The graphics pipeline is a conceptual model of
stages that graphics data is sent through" and is usually implemented
&ia a comination of hard!are (GPU cores) and CPU soft!are
(9penG:" Direct;). The graphics pipeline design approach is fairly
consistent among the ma+or GPU manufacturers li#e '()D)*" *T)"
etc." and helped accelerate GPU technology adoption $5%.
The pipeline simply transforms coordinates from 3D space
(specified y programmer) into 2D pi,el space on the screen. )t is
asically an <assemly line< of &arious stages of operations to apply
to triangles3pi,els" and can e generalized out to 2 stages= Geometry
and >endering.
?arly GPUs implemented only the rendering stage in hard!are"
re2uiring the CPU to generate triangles for the GPU to operate on.
*s GPU technology progressed" more and more stages of the
pipeline !ere implemented in hard!are on the GPU" freeing up
more CPU cycles $5%.
%. 1&'()s
:eading up to the early 5@A4Bs" the <GPUs< of the time !ere
really +ust integrated frame uffers. They !ere oards of TT: logic
chips that relied on the CPU" and could only dra! !ire-frame
shapes to raster displays $C%. The term DGPUE !ould not e
introduced until 5@@@ y '()D)*" ut for consistency" it !ill e
used throughout this paper $A%.
The )F0 Professional Graphics Controller (PG*) !as one of
the &ery first 2D33D &ideo cards for the PC $2%. The PG* used an
on-oard )ntel A4AA microprocessor to ta#e o&er processing all
&ideo related tas#s" freeing up the CPU for &ideo processing (such
as dra!ing and coloring filled polygons). Though it !as released in
5@AC" 54 years efore hard!are 2D33D acceleration !as
standardized" the G67744 price tag and lac# of compatiility !ith
many programs and non-)F0 systems at the time" made it unale to
achie&e mass-mar#et success. The PG*Bs separate on-oard
processor mar#ed an important step in GPU e&olution to further the
paradigm of using a separate processor for graphics computations
$5%.
Fy 5@AH" more features !ere eing added to early GPUs" such
as Ihaded Iolids" (erte, lighting" >asterization of filled polygons"
and Pi,el depth uffer" and color lending. There !as still much
reliance on sharing computation !ith the CPU $C%.
)n the late 5@A4Bs" Iilicon Graphics )nc. (IG)) emerged as a
high performance computer graphics hard!are and soft!are
company. 8ith the introduction of 9penG: in 5@A@" IG) created
and released the graphics industryBs most !idely used and
supported" platform independent" 2D33D application programming
interface (*P)). 9penG: support has also ecome an intricate part
of the design of modern graphics hard!are. IG) also pioneered the
concept of the graphics pipeline early on $5%.
Illustration %: *arly trends in $i$eline $arallelis+
,. 1&&()s
,.1 -eneration (
GPU hard!are and the graphics pipeline started to really ta#e
shape in 5@@3" !hen IG) released its >eality?ngine oard for
graphics processing $7%. There !ere distinct oards and logic chips
for &arious later stages of the graphics pipeline" ut still relied on
the CPU for the first half. Data had a fi,ed flo! through each stage.
Fy the mid 5@@4Bs IG) cards !ere mainly found on
!or#stations" !hile 3D graphics hard!are ma#ers 3D.; ((oodoo)"
'()D)* (T'T)" *T) (>age)" and 0atro, started pro&iding
consumer 3D graphics oards. * comination of <cheap< hard!are
!ith games such as Jua#e and Doom really dro&e the gaming
industry and GPU adoption $K%.
?&en !ith deep pipelines though" early GPUs still only output
one pi,el per cloc# cycle" meaning CPUs could still send more
triangles to the GPU than it could handle. This lead to adding more
pipelines in parallel to the GPU (and ultimately more cores)" so
multiple pi,els could e processed in parallel each cloc# cycle $5%.
,.% -eneration I
The 3df, (oodoo (5@@K) !as considered one of the first true
3D game cards $H%. )t only offered 3D acceleration" so you still
needed a 2D accelerator. )t operated o&er the PC) us" had 5 million
transistors" C 0F of KC-it D>*0" and the core cloc# operated at
74 0/z. The CPU still did &erte, transformations" !hile the
(oodoo pro&ided te,ture mapping" z-uffering" and rasterization.
2
,., -eneration II
)n 5@@@" the first cards to implement the entire pipeline (no! !ith
transform and lighting calculations) in GPU hard!are !ere
released. 8ith the introduction of '()D)*Bs Ge.orce27K and *T)Bs
>adeon H744" the first true GPUs !ere a&ailale at the consumer
le&el $H%.
Illustration ,: .i#ration of fun"tionality to -/U hardware
Up until 5@@@" the term <GPU< didnBt actually e,ist" and
'()D)* coined the term during itBs launch of the Ge.orce 27K $A%.
The Ge.orce 27K had 23 million transistors" 32 0F of 52A-it
D>*0" he core cloc# operated at 5240/z" and had four KC-it
pipelines for rendering $@%.
This generation of cards !ere the first to use the ne!
*ccelerated Graphics Port (*GP) instead of the PC) us" and
offered ne! graphics features in hard!are" such as multi-te,turing"
ump maps" light maps" and hard!are geometry transform and
lighting $H" A%.
The first pipelines no! in hard!are !ere #no!n as a <fi,ed
function< pipeline" ecause once the programmer sent graphics data
into the GPUBs pipeline" the data could not e modified $5%.
8ith these cards" the GPU hard!are and computer gaming
mar#et really started to ta#e off $5%. 8hile much faster" the main
prolem !ith the fi,ed function pipeline model !as the infle,iility
of graphical effects. Iince the feature sets defined y 9penG: and
Direct; *P)s !ere implemented in hard!are" as ne!er features
!ere added to graphics *P)s" the fi,ed function hard!are could not
ta#e ad&antage of the ne! standards $5%.
0. %((()s
0.1 -eneration III
The ne,t step in the e&olution of GPU hard!are !as the
introduction of the programmale pipeline on the GPU. )n 2445"
'()D)* released the Ge.orce 3 !hich ga&e programmers the
aility to program parts of the pre&iously non-programmale
pipeline $5%. )nstead of sending all the graphics description data to
the GPU and ha&e it simply flo! through the fi,ed pipeline" the
programmer can no! send this data along !ith &erte, <programs<
(called shaders) that operate on the data !hile in the pipeline $5%.

These shader programs !ere small <#ernels<" !ritten in
assemly-li#e shader languages. .or the first time" there !as limited
amount of programmaility in the &erte, processing stage of the
pipeline. 9ther popular cards at this time !ere *T) >adeon A744"
and 0icrosoftBs ;-Fo, $54%.
Illustration 0: -e1or"e , Ar"hite"ture
The Ge.orce 3 had 7H million transistors" KC 0F of 52A-it
DD> D>*0" and the core cloc# operated at 5240/z $@%.
0.% -eneration I2
9ne year later" in 2442" the first fully programmale graphics
cards hit the mar#et= '()D)* Ge.orce .;" *T) >adeon @H44.
These cards allo!ed for per-pi,el operations !ith programmale
&erte, and pi,el(fragment) shaders" and allo!ed for limited user-
specified mapping of input-output data operations $H" 54%. Ieparate"
dedicated hard!are !ere allocated for pi,el shader and &erte,
shader processing.
3
The first Ge.orce .; had A4 million transistors" 52A 0F of
52A-it DD> D>*0" and the core cloc# operated at C440/z. $@%.
)n 2443" the first !a&e of GPU computing started to come
aout !ith the introduction of Direct; @" y ta#ing ad&antage of the
programmaility no! in the GPU hard!are" ut for non graphics
$A%. .ull floating point support and ad&anced te,ture processing
started to appear in cards.
0., -eneration I2
Fy this time" the rate of GPU hard!are technology !as
accelerating at a rate much faster than 0ooreBs :a! $H%. )n 244C" the
Ge.orce K and >adeon ;A44 !ere released" and !ere some of the
first cards to use the PC)-e,press us. .or soft!are" early high le&el
GPU languages such as Froo# and Ih started to appear" that offered
true conditionals and loops and dynamic flo! control in shader
programs. 9n the hard!are side" higher precision (KC-it doule
support)" multiple rendering uffers" increased GPU memory and
te,ture accesses !ere eing introduced $H" 54%.
Illustration 3: -e1or"e 4 Ar"hite"ture
The first Ge.orce K had 5CK million transistors" 27K 0F of
27K-it GDD>3 D>*0" and the core cloc# operated at 7440/z
$@%.
8hen &ie!ed as a graphics pipeline" the a GPU contains a
programmale &erte, engine" a programmale fragment engine" a
te,ture engine" and a depth-compare3lending-data !rite engine.
8hen &ie!ed alternati&ely as a processor Dfor non-graphics
applications" a GPU can e seen as a large amount of programmale
floating-point horsepo!er and memory and!idth that can e
e,ploited for compute-intensi&e applications completely unrelated
to computer graphicsE $55%.
* trend to!ards GPU programmaility !as starting to form.
0.0 -eneration 2I
The introduction of '()D)*Bs Ge.orce A series GPU in 244K
mar#ed the ne,t step in GPU e&olution= e,posing the GPU as
massi&ely parallel processors $C%.
Illustration 4: Unified hardware shader desi#n
The GA4 (Ge.orce AA44) architecture !as the first to ha&e
<unified<" programmale shaders - in other !ords" a fully
programmale unified processor called a Itreaming 0ultiprocessor"
or I0" that handled &erte," pi,el" and geometry computation. *lso
introduced !as a ne! geometry shader" adding more
programmaility !hen comined !ith the &erte, shader and pi,el
shaders $54%. 8ith a no! unified shader hard!are design" the
traditional graphics pipeline model is no! purely a soft!are
astraction.
C
Illustration 5: -e1or"e ' Ar"hite"ture
The first Ge.orce A had KA5 million transistors" HKA 0F of
3AC-it GDD>3 D>*0" and the core cloc# operated at K440/z
$@%.
To harness all this <general purpose< GPU po!er !as the ne!
programming language CUD* y '()D)* for '()D)* only. 'ot
much later" *T) Itream for *T) cards and Direct; 54 for either card
(0icrosoft 8indo!s only though) !ere introduced $52%.
0.3 -eneration 2II
The trend to!ards more CPU-li#e" programmale GPU cores
continues !ith the introduction of '()D)*Bs .ermi architecture.
*nnounced in late 244@" ut released in early 2454" the .ermi GPU
!as the first GPU designed for GPGPU computing" ringing
features such as= true /8 cache hierarchy " ?CC " unified memory
address space " concurrent #ernel e,ecution" etter doule precision
performance" and dual !arp schedulers $53%. The GT;CA4 .ermi
had a total of CA4 CUD* cores at launch (57 streaming
multiprocessors L 32 CUD* cores each) $53%.
Illustration ': 1er+i Ar"hite"ture
The first .ermi cards had 3 illion transistors" 5.7 GF of 3AC-
it GDD>7 D>*0" and the core cloc# operated at H440/z $53%.
Illustration &: Transistor "ount trends for -/U "ores
7
3. %(1( and Beyond
>ecently" in late 2454" '()D)* refreshed their .ermi-ased
gaming card" the GT;7A4" adding one more I0 (ringing the total
CUD* core count to 752) and offering a slightly higher memory
and!idth.
The GPU hard!are e&olution thus far has gone from an
e,tremely specific" single core" fi,ed function hard!are pipeline
implementation +ust for graphics rendering" to a set of highly
parallel and programmale cores for more general purpose
computation. 'o!" the architecture of many-core GPUs are starting
to loo# more and more li#e multi-core" general purpose CPUs $5C%.
)n that respect" .ermi can essentially e thought of as a 5K-core
CPU !ith 32-!ay hyper-threading per core" !ith a !ide &ector
!idth.
*0D recently announced their .usion line of CPUMGPUs on
the same die (dued *PU" accelerated processing unit)" to e
released early 2455 $5C%. *0DBs *PUs are designed so that a
standard ,AK processor for scalar !or#loads and a D;55 GPU for
&ector !or#loads are rought together on the same die.
)ntelBs :arraee processor rings many smaller ,AK cores on a
single die" augmented y a !ide &ector unit $57%. )ntelBs
IandyFridge CPU also has an integrated GPU !hich oth cores
share the :3 cache $5K%.
)f merging CPUMGPU is the trend" then !hat sort of fi,ed-
function hard!are on the GPU (if any) should remainN Ihould there
e&en e a distinction to the programmer et!een CPU &s GPUN
8ill future programming languages astract the GPU into +ust
another set of multi-core processors for applicationsN
D*ll processors aspire to e general purposeE Tim (an /oo#"
Oeynote" Graphics /ard!are 2445 $5%.
4. CONC6USION
The e&olution of GPU hard!are architecture has gone from a
specific single core" fi,ed function hard!are pipeline
implementation made solely for graphics" to a set of highly parallel
and programmale cores for more general purpose computation. The
trend in GPU technology has no dout een to #eep adding more
programmaility and parallelism to a GPU core architecture that is
e&er e&ol&ing to!ards a general purpose more CPU-li#e core.
.uture GPU generations !ill loo# more and more li#e !ide-
&ector general purpose CPUs" and e&entually oth !ill e
seamlessly comined as one.
5. R*1*R*NC*S
$5% Crow, Thomas. Evolution of the Graphical
Processing Unit. 2004 Paper.
http :citeseer!.ist.psu.e"uview"oc"ownloa"#
"oi$%0.%.%.%42.&'()rep$rep%)t*pe$p"f
$2% +i,ipe"ia: -i"eo Car". .etrieve" /ovem0er 20%0.
http :en.wi,ipe"ia.orgwi,i-i"eo1car"
$3% +i,ipe"ia: Graphics Processing Unit. .etrieve"
/ovem0er 20%0.
http : en .wi,ipe"ia .org wi,i Graphics 1 processing 1 unit
$C% 2uc,, 3an. The Evolution of GPUs for General
Purpose Computing. GTC 20%0.
https :"ocs.google.comviewer#url$http:www .
nvi"ia .com content GTC 4
20%0 p"fs 22561 GTC 20%0. p"f
$7% 7ue0,e, 8avi". GPU 9rchitecture: 3mplications )
Tren"s. :3GG.9P; 200(.
http : s 0(. i"av .uc"avis .e"u lue0,e 4nvi"ia 4gpu 4
architecture .p"f
$K% <ran,en, =annic,. 3ntro"uction to Graphics
;ar"ware an" GPUs. 200( 7ecture.
https :"ocs.google.comviewer#url$http:
research .e"m .uhasselt .0e > *franc,en aac aac .ppt
K
Illustration 1(: A.D 1usion Ar"hite"ture
$H% -en,at, :uresh. Un"erstan"ing the graphics
pipeline. 2006 7ecture.
www .cis .upenn .e"u > suven,at 500 lectures 2 7ecture 2
.ppt
$A% G?""e,, 8omini,. GPU Computing with CU89 4
Part %: GPU Computing Evolution. 200@ 7ecture.
http :www.mathemati,.uni4"ortmun"."e
> goe""e,e gpgpu cu"a 4200@%4 ;istor* .p"f
$@% <rom -oo"oo to Ge<orce: The 9wesome ;istor*
of &8 Graphics. .etrieve" /ovem0er 20%0.
http : www .ma!imumpc .com print '&&(
$54% ;*esoon, Aim. 8esign Game Consoles 0@:
lec1graphics.p"f. 200@ 7ecture.
https : "ocs .google .com viewer #
url $ http : www .cc .gatech .e"u > h*esoon spr 0@ lec 1 gra
phics .p"f
$55% GPU Gems 2. Chapters 2@4&0. 2006 2oo,.
http : http ."ownloa" .nvi"ia .com "eveloper GPU 1 Gems
12
$52% ;ouston, Bi,e. GPU 9rchitecture. 20%0 7ecture.
https : graphics .stanfor" .e"u wi,is cs 44( s 4
%0 <rontPage #
action $ 9ttach<ile ) "o $ get ) target $ C: 44( s 4%040&. p"f
$53% <ermi +hitepaper. 20%0 /-3839.
http : www .nvi"ia .com content P8< fermi 1 white 1 paper
s /-3839 1 <ermi 1 Compute 1 9rchitecture 1 +hitepaper .p
"f
$5C% 9B8 <usion whitepaper. 20%0 9B8.
http :sites.am".comus8ocuments4(42&21fusion1w
hitepaper1+E2.p"f
$57% :eiler, 7arr*. 7arra0ee: a man*4core !('
architecture for visual computing.
http:portal.acm.org.www.li0rar*.gatech.e"u:204(ft1g
atewa*.cfm#
i"$%&'0'%5)t*pe$p"f)coll$portal)"l$9CB)C<38$%5
%@&(@4)C<TCAE/$(%%466@0
$5K% +i,ipe"ia: :an"* 2ri"ge. .etrieve" 8ecem0er
20%0.
http:en.wi,ipe"ia.orgwi,i:an"*12ri"ge1Dmicroarchit
ectureE
H

You might also like