Design of Graphics Processing Framework On FPGA
Abstract— High performance graphics processing is a challenging task in the embedded system domain. The Field Programmable Gate Array (FPGA) attracts great interest as a platform for building a Graphics Processing Unit (GPU) framework due to its feasibility and configurable resources. This paper provides design perspectives of a GPU framework and its building blocks to realize GPU functionality on FPGA to draw base primitives. In this proposal, triple video display buffers have been designed and implemented for buffering entire video frames in a cyclic manner. The GPU framework IP core for the proposed approach has been designed and developed using Xilinx tools and has been implemented and tested using a Xilinx SP605 FPGA evaluation board. The IP core is commanded from a host system. A hardware description language (VHDL) has been used for the design of the GPU framework. This paper provides design and implementation details of the basic building blocks required to achieve GPU functionalities on FPGA.

Keywords— GPU, Video Frame Buffers, FPGA, DDR3 Memory Controller, VHDL

I. INTRODUCTION

A GPU is a unit that processes the graphical information required by display systems, such as primitives, images and videos. GPUs are very similar to CPUs, but the cores in a GPU are optimized to handle image and video information. The computation involves complex math on the given input parameters to display graphics information.

A Graphics Processing Unit (GPU) is designed to perform highly mathematical computation on multiple graphical input data in parallel. The input data are graphics primitives, such as polygon structures, which are of floating point type. The processing in a GPU mostly consists of applying trigonometric functions to the vertices of the input graphics primitives.

The computation inside a GPU is highly parallel and complex. Real-time graphics processing of motion pictures requires highly accurate and precise algorithms to produce smooth transitions in the motion pictures.

GPU functionality involves mathematical calculation beyond the basic mathematics that a CPU performs well. A GPU provides much higher computational capability than a CPU. Thus there is a huge difference in computation capabilities between CPU and GPU in processing graphical parameters to generate 2D/3D effects on the projected displays. The GPU design is highly optimized to perform this complex graphics processing on 3D polygons to generate motion pictures on screen.

In past generations, graphics processing received little importance in computer systems. CPUs were used to handle both the system’s internal processing and graphics processing. Technology now mandates dedicated graphics processing, and the CPU has been offloaded from graphics processing to boost overall system performance.

The GPU has evolved to offload graphics processing from the CPU, avoiding the loss of system speed caused by complex graphics processing. Its instruction set is optimized to handle graphical data and processing, so the CPU and GPU differ entirely in instruction set architecture. This helps accelerate video processing.

Video/image processing such as motion pictures, video editing, gaming, animation, or anything else that deals with the graphics portion of system processing is handled by the GPU. It is integrated with Video RAM, which buffers the computed output of the GPU and transfers it to the screen. The CPU and GPU work together to make graphics processing much simpler. The GPU acts as a co-processor on the motherboard to accelerate the graphics processing capabilities of the CPU.

The GPU was initially assigned to render polygons, rectangles, lines, etc., and to populate character pixels on the screen. Later, its functionality was extended to operations such as movement of objects, sequences of graphics operations and filling of objects with colors, and a 2D accelerator was implemented in the GPU using a fixed function graphics pipeline. Later still, a 3D hardware accelerator was implemented with a dynamic function graphics pipeline to meet the demands of the graphics animation and gaming console markets.

An advanced GPU comprises many cores to perform compute-intensive functions, whereas a CPU has multiple cores to perform sequential operations; together they form a heterogeneous system. The GPU processes many vertices in parallel using the same program, i.e., SIMD (Single Instruction Multiple Data). In turn, it processes the given primitives at the same time to create a scene on the screen.

II. BACKGROUND

A. Graphics Pipeline

A number of studies have attempted to understand the underlying GPU architecture [1] in order to develop a GPU design framework on FPGA. A typical graphics pipeline [4] of a GPU is represented in Fig. 1. The GPU functions lie between the input commands and the frame memory buffer. The efficiency of any GPU depends on the algorithms developed for the following functions [2]:

Ramanathan S G: Aerospace Electronics and Systems Division, CSIR National Aerospace Laboratories, Bangalore, India (ram_ald@nal.res.in)
Pradeep Kumar B: Aerospace Electronics and Systems Division, CSIR National Aerospace Laboratories, Bangalore, India (pradeepkumar@nal.res.in)
C M Ananda: Aerospace Electronics and Systems Division, CSIR National Aerospace Laboratories, Bangalore, India (ananda_cm@nal.res.in)
978-1-5090-0774-5/16/$31.00 © 2016 IEEE
IEEE International Conference On Recent Trends In Electronics Information Communication Technology, May 20-21, 2016, India
[Fig. 1: Graphics Pipeline Blocks. Recoverable labels: Rasterization Process, Clipping, Texturing, Fragmentation, Frame Buffers, Display Screen]

Advanced graphics processors are designed to have ‘n’ graphics pipelines as cores in parallel to handle multiple data at the same time. When processing a video frame, multiple objects and primitives with different coordinates are handled in parallel. An efficient graphics processing unit processes all such inputs in parallel and generates minimum of [...]

[Fig. 3: Frame Structure of Host’s Instruction to FPGA. Recoverable field label: Coordinates]

B. Rasterization Logic

The rasterization works on the primitive’s vertices available in the instruction packet structure. The rendering/rasterization logic computes the DDR3 memory address. In turn, the memory addresses are mapped [8] to the projected screen location. In this design, the screen resolution used is XGA (1024 x 768). If the provided coordinates fall outside the projected screen resolution, the design logic ensures that all those regions are clipped. The mapping between screen coordinates and absolute screen location in the X and Y directions is represented in Fig. 4:

(-1, +1) -> (0, 0)       (0, +1) -> (511, 0)       (+1, +1) -> (1023, 0)
                                                   (+1, 0)  -> (1023, 383)
(-1, -1) -> (0, 767)     (0, -1) -> (511, 767)     (+1, -1) -> (1023, 767)

Fig. 4: Coordinate and Absolute Location in Screen

The formulae used to compute the absolute screen location from the given screen coordinates at the design resolution are given in equations (1) and (2), which relate the absolute locations to the screen coordinates.

[Equations (1) and (2): not recoverable]
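The correspondence listed in Fig. 4 can be reproduced with a simple linear scaling. The Python sketch below is an assumption inferred from the Fig. 4 values, not a transcription of equations (1) and (2), and the actual design is written in VHDL. The function name `to_absolute` and the choice to return `None` for out-of-range coordinates (one simple reading of the clipping rule) are hypothetical.

```python
import math

WIDTH, HEIGHT = 1024, 768  # XGA resolution used in the design


def to_absolute(x, y):
    """Map screen coordinates in [-1, +1] to an absolute (column, row) pixel.

    Assumed mapping, inferred from the corner and centre values in Fig. 4;
    it is not taken from equations (1) and (2).
    """
    if not (-1.0 <= x <= 1.0 and -1.0 <= y <= 1.0):
        return None  # outside the projected screen: treated as clipped
    x_abs = math.floor((x + 1.0) * (WIDTH - 1) / 2.0)
    y_abs = math.floor((1.0 - y) * (HEIGHT - 1) / 2.0)  # y = +1 is the top row
    return x_abs, y_abs
```

Under this assumption, `to_absolute(0, 0)` gives `(511, 383)` and `to_absolute(1, -1)` gives `(1023, 767)`, matching the values in Fig. 4.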
[Fig. 2: GPU Blocks on FPGA. Recoverable block labels: HOST Application on CPU, Rendering/Rasterization Logic, DDR3 Memory Controller, Flush Buffer Logic, Read Latest Display Buffer, Video Display Controller, Graphics Station (XGA Resolution); Video Frame Buffers in DDR3 Memory: Operational Buffer, Display Buffer#1, Display Buffer#2, Display Buffer#3]

The absolute screen location has to be offset in case the required primitive’s size is more than two, so that the primitive appears centered on the given screen coordinates. The formulae for offsetting the computed absolute screen location are given in equations (3) and (4), which relate the offset absolute locations to the given dot size to draw.

[Equations (3) and (4): not recoverable]

The rasterization process provides the address and color data to draw into the operational memory buffer. The same color is used to illuminate pixels on the screen.

[Figure residue: RGB]

[...] followed by command in a regular sequence, as depicted in the Fig. 8 flowchart.

[Fig. 8 flowchart. Recoverable labels: START, Memory Calibration Status (Not Completed / Completed), Read Finish Status (Finished), STOP]
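The offsetting step of the rasterization logic and the resulting memory addresses can be illustrated with a short Python sketch. It rests on two assumptions that are not stated in the recovered text: the dot is shifted up and left by half its size (integer division), and the frame buffer is addressed linearly as row * 1024 + column. These assumptions were chosen because they reproduce the row starting addresses tabulated in the paper for dot sizes 2 and 5. The function name `dot_row_addresses` is hypothetical.

```python
WIDTH = 1024  # pixels per row at XGA resolution


def dot_row_addresses(x_abs, y_abs, dot_size):
    """Return (x_off, y_off, address) for each row of a square dot.

    Assumptions: the dot is moved up and left by dot_size // 2 pixels, and
    the frame buffer is linear with address = row * WIDTH + column.
    """
    half = dot_size // 2
    x_off = x_abs - half
    rows = []
    for i in range(dot_size):
        y_off = y_abs - half + i
        rows.append((x_off, y_off, y_off * WIDTH + x_off))
    return rows
```

For the screen centre (511, 383), `dot_row_addresses(511, 383, 2)` returns the rows (510, 382, 391678) and (510, 383, 392702), and a dot size of 5 starts at (509, 381, 390653).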
[Flowchart residue. Recoverable labels: START, Initialize all Counters/Register, Memory Calibration Status (Not Completed / Completed), Controller’s Data FIFO (Not Empty / Empty)]

[Block diagram residue. Recoverable labels: Rendering Logic, Port A, Operational Buffer, Memory Region, Flush Buffer Logic, On Flush Trigger, Port B, Port C, Read Latest Display Buffer, On Vsync_Trigger, Port D, Display Buffer1, Display Buffer2, Display Buffer3]
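The cyclic use of the three display buffers named in the diagram can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the authors' VHDL design: it assumes that a flush trigger copies the whole operational buffer into the next display buffer in round-robin order, and that a Vsync trigger reads the most recently completed display buffer. The class and method names are hypothetical.

```python
class TripleDisplayBuffer:
    """Toy model of one operational buffer plus three display buffers."""

    def __init__(self, frame_size):
        self.operational = [0] * frame_size
        self.display = [[0] * frame_size for _ in range(3)]
        self.next_write = 0  # display buffer that the next flush will fill
        self.latest = None   # most recently completed display buffer

    def draw(self, address, color):
        """Rendering logic writes one pixel into the operational buffer."""
        self.operational[address] = color

    def flush(self):
        """On a flush trigger, copy the frame into the next display buffer."""
        self.display[self.next_write][:] = self.operational
        self.latest = self.next_write
        self.next_write = (self.next_write + 1) % 3  # assumed cyclic order

    def read_latest(self):
        """On a Vsync trigger, return the latest completed frame, or None."""
        if self.latest is None:
            return None
        return self.display[self.latest]
```

With three display buffers, the reader in this model always has a completed frame available while the next one is being filled. Whether the operational buffer is cleared after a flush is not stated in the recovered text, so the model leaves it unchanged.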
TABLE II. COMPUTED NUMERIC FOR DOT SIZE = 2

Dot size    X location    Y location    Row starting address
2           510           382           391678
2           510           383           392702

[Fig. 11: DOT Size = 2 in Display Unit. A 1024 * 768 display with the addresses 391678 and 392702 marked]

B. Case 2: DOT Size = 5

The computed absolute coordinates are (511, 383). For the given dot size of 5, the starting addresses of five rows are computed and shown in Table III.

TABLE III. COMPUTED NUMERIC FOR DOT SIZE = 5

Dot size    X location    Y location    Row starting address
5           509           381           390653
5           509           382           391677
5           509           383           392701
5           509           384           393725
5           509           385           394749

V. EXPERIMENTAL SETUP AND RESULTS

The GPU framework design blocks have been designed, integrated and implemented on a Spartan-6 XC6SLX45T FPGA [10] and evaluated on the SP605 evaluation board [13]. The design uses Xilinx IP blocks such as the FIFO, memory controller and clock generator in the Xilinx ISE 14.7 tool. The evaluation board provides a differential clock of 200 MHz to the FPGA. The clock generator IP blocks derive the clocks required for internal block operation.

[Figure: Display driven by FPGA]

VI. SUMMARY/CONCLUSIONS

The design of basic GPU framework blocks on FPGA has been presented in this paper, where rendering and frame buffering on the input side and video graphics controller design on the output side were addressed. At present, the dot primitive is addressed as part of the rendering logic on the input side. This design prototype shall become a baseline framework on which to build the rest of the GPU functionalities. Implementation of the triple-display buffer in this platform shall improve the overall efficiency of the GPU system. VHDL has been used for the overall design and implementation of this prototype system. In future, the same platform shall be used for addressing anti-aliasing, blending, overlay of additional information, multi-layering concepts, etc.

ACKNOWLEDGMENT

This work was supported in part by the Council for Scientific & Industrial Research (CSIR), National Aerospace Laboratories (NAL). The authors gratefully acknowledge Mr. Shyam Chetty, Director, CSIR-NAL, Bangalore for continuous support and motivation.

REFERENCES

[1] J. George Cherian Panappally, Dhanesh M.S, “Design of Graphics Processing Unit for Image Processing,” IEEE’14, Dec. 2014.
[2] Ahmed Al Maashri, Guangyu Sun, Xiangyu Dong, Vijay Narayanan and Yuan Xie, “3D GPU Architecture using Cache Stacking: Performance, Cost, Power and Thermal analysis,” IEEE’09.
[3] Virginie Fresse, Dominique Houzet and Christophe Gravier, “GPU architecture evaluation for multispectral and hyperspectral image analysis,” IEEE’10.
[4] Matt Pharr, Randima Fernando, “GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation,” Addison-Wesley Professional, 2005.
[5] Shujjat Khan, Donald Bailey and Gourab Sen Gupta, “Simulation of Triple Buffer Scheme,” IEEE’09.
[6] Van-Huan Tran, Xuan-Tu Tran, “An Efficient Architecture Design for VGA Monitor Controller,” IEEE’11.
[7] Guohui Wang, Yong Guan, “Designing of VGA Character String Display Module Base on FPGA,” IEEE’09.
[8] Jong Won Park, “An Efficient Buffer Memory System for Subarray Access,” IEEE’01.
[9] VESA and Industry Standards and Guidelines for Computer Display Monitor Timing, VESA Standard DMT v1.0, r11, 2007.
[10] Xilinx, “Spartan-6 FPGA Family: Complete Data Sheet,” www.xilinx.com.
[11] Xilinx, Using Memory Controller in Spartan-6 FPGAs, Four-Port Memory Controller Core v3.92. www.xilinx.com.
[12] Xilinx, Using Memory Controller in Spartan-6 FPGAs, Memory Controller User Guide UG388 v2.3. www.xilinx.com.
[13] Xilinx, Using SP605 Hardware User Guide, UG526 v1.8. www.xilinx.com.