Computing Architectures For Virtual Reality: Electrical and Computer Engineering Dept

[Figure: system architecture, highlighting the computer (rendering pipeline)]
The VR Engine
Definition:
A key component of the VR system, which reads its input devices, accesses task-dependent databases, updates the state of the virtual world, and feeds the results to the output displays.
It is an abstraction: it can mean one computer, several co-located cores in one computer, several co-located computers, or many remote computers collaborating in a distributed simulation.
The real-time characteristic of VR requires a VR engine that is powerful enough to assure:
• fast graphics and haptics refresh rates (30 fps for graphics and hundreds of Hz for haptics);
• low latencies (<100 ms to avoid simulation sickness).
At the core of such an architecture is the rendering pipeline. Within the scope of this course, rendering is extended to include haptics.
The Graphics Rendering Pipeline

The process of creating a 2-D scene from a 3-D model is
called “rendering.” The rendering pipeline has three
functional stages. The speed of the pipeline is that of its
slowest stage.
Application → Geometry → Rasterizer
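Because the three stages work concurrently, the slowest one sets the frame rate of the whole pipeline. The short Python sketch below illustrates that rule only; the stage rates are made-up numbers, and `pipeline_fps` and `bottleneck` are hypothetical helper names, not part of any graphics API.

```python
def pipeline_fps(stage_rates):
    """Return the throughput (frames/s) of a pipeline whose stages run in parallel.

    stage_rates maps a stage name to the frames/s that stage could sustain alone.
    """
    if not stage_rates:
        raise ValueError("at least one stage is required")
    return min(stage_rates.values())


def bottleneck(stage_rates):
    """Return the name of the slowest stage, which limits the whole pipeline."""
    return min(stage_rates, key=stage_rates.get)


# Illustrative (assumed) numbers only, not measurements.
rates = {"application": 60.0, "geometry": 45.0, "rasterizer": 90.0}
print(pipeline_fps(rates), bottleneck(rates))  # 45.0 geometry
```

Speeding up any stage other than the bottleneck leaves the output rate unchanged, which is why the later slides test for the limiting stage first.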

The Graphics Rendering Pipeline
• Old rendering pipelines were implemented in software (slow).
• Modern pipeline architectures use parallelism and buffers. The application stage is implemented in software, while the other stages are hardware-accelerated.
• Modern pipelines also do anti-aliasing for points, lines, or the whole scene.
[Figure: aliased polygons (jagged edges) vs. anti-aliased polygons]
How is anti-aliasing done? Each pixel is subdivided (sub-sampled) into n regions, and each sub-pixel has its own color.
In the example, the anti-aliased pixel is given a shade of green-blue (5/16 blue + 11/16 green). Without sub-sampling the pixel would have been entirely green, the color at the center of the pixel (from the Wildcat manual).
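The 5/16 blue + 11/16 green result can be reproduced with a few lines of Python. This is a minimal sketch that assumes every sub-sample carries equal weight (a plain average); actual hardware may weight sub-samples differently, and `supersample` is a hypothetical helper name.

```python
def supersample(subpixel_colors):
    """Average the RGB colors of all sub-pixels to get the final pixel color."""
    n = len(subpixel_colors)
    return tuple(sum(c[i] for c in subpixel_colors) / n for i in range(3))


BLUE = (0.0, 0.0, 1.0)
GREEN = (0.0, 1.0, 0.0)

# 16 sub-samples: 5 fall on the blue polygon and 11 on the green one.
samples = [BLUE] * 5 + [GREEN] * 11
pixel = supersample(samples)
print(pixel)  # (0.0, 0.6875, 0.3125)
```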
More samples produce better anti-aliasing.
[Figure: 8 sub-samples/pixel vs. 16 sub-samples/pixel, from the Wildcat “SuperScene” manual]
http://62.189.42.82/product/technology/superscene_antialiasing.htm
[Figure: ideal vs. real pipeline output (fps) vs. scene complexity, showing the influence of pipeline bottlenecks; HP 9000 workstation]
The Rendering Pipeline

The application stage
• Is done entirely in software by the CPU;
• It reads input devices (such as gloves or a mouse);
• It changes the coordinates of the virtual camera;
• It performs collision detection and collision response (based on object properties) for haptics;
• One form of collision response is force feedback.
Application stage optimization…
• Reduce model complexity (models with fewer polygons mean less to feed down the pipe);
[Figure: low-resolution model (~600 polygons) vs. higher-resolution model (134,754 polygons)]
• Reduce floating-point precision (single precision instead of double precision);
• Minimize the number of divisions;
• Since everything is done by the CPU, a multi-core architecture is recommended to increase speed.

Rendering pipeline
The geometry stage
• Is done in hardware;
• Consists first of model and view transforms (to be discussed in Chapter 5);
• Next the scene is shaded based on light models;
• Finally the scene is projected, clipped, and mapped to the screen coordinates.

The lighting sub-stage
• It calculates the surface color based on:
– the type and number of simulated light sources;
– the lighting model;
– the reflective surface properties;
– atmospheric effects such as fog or smoke.
• Lighting results in object shading, which makes the scene more realistic.

Computing architectures
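$$I_\lambda = I_{a\lambda} K_a O_{d\lambda} + f_{att} I_{p\lambda} \left[ K_d O_{d\lambda} \cos\theta + K_s O_{s\lambda} \cos^n\alpha \right]$$
where:
$I_\lambda$ is the intensity of light of wavelength $\lambda$;
$I_{a\lambda}$ is the intensity of ambient light;
$K_a$ is the surface ambient reflection coefficient;
$O_{d\lambda}$ is the object diffuse color;
$f_{att}$ is the atmospheric attenuation factor;
$I_{p\lambda}$ is the intensity of the point light source of wavelength $\lambda$;
$K_d$ is the diffuse reflection coefficient;
$K_s$ is the specular reflection coefficient;
$O_{s\lambda}$ is the specular color;
$\theta$ is the angle between the surface normal and the direction to the light source;
$\alpha$ is the angle between the reflected ray and the direction to the viewer;
$n$ is the specular reflection exponent.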
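The illumination equation can be evaluated directly for one color channel. The Python sketch below is only an illustration: the parameter values are arbitrary, `phong_intensity` is a hypothetical function name, and clamping negative cosines to zero is a common convention assumed here rather than something stated on the slide.

```python
import math


def phong_intensity(i_a, k_a, o_d, f_att, i_p, k_d, k_s, o_s, theta, alpha, n):
    """Evaluate the illumination equation for one wavelength (color channel).

    Angles are in radians. Negative cosines are clamped to zero so that
    surfaces facing away from the light or the viewer get no contribution
    (an assumed convention).
    """
    ambient = i_a * k_a * o_d
    diffuse = k_d * o_d * max(math.cos(theta), 0.0)
    specular = k_s * o_s * max(math.cos(alpha), 0.0) ** n
    return ambient + f_att * i_p * (diffuse + specular)


# Light hitting the surface head-on, viewer exactly along the reflected ray.
value = phong_intensity(i_a=0.2, k_a=0.5, o_d=1.0, f_att=1.0, i_p=1.0,
                        k_d=0.6, k_s=0.3, o_s=1.0, theta=0.0, alpha=0.0, n=10)
print(round(value, 3))  # 1.0
```

With both angles at zero the result is the ambient term (0.1) plus the full diffuse (0.6) and specular (0.3) terms; as the angles grow, only the ambient term remains.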
The lighting sub-stage optimization…
• Fewer lights in the scene require less computation;
• The simpler the shading model, the fewer the computations (and the less realism). In order of increasing cost:
– Wire-frame models;
– Flat shaded models;
– Gouraud shaded;
– Phong shaded.
The lighting models
• Wire-frame is the simplest: it only shows the visible polygon edges;
• The flat shaded model assigns the same color to all pixels on a polygon (or side) of the object;
• Gouraud (smooth) shading interpolates colors inside the polygons based on the colors computed at the vertices;
• Phong shading interpolates the vertex normals before calculating the light intensity based on the model described; it is the most realistic of these shading models.
Wire-frame model
Flat shading model
Gouraud shading model

The rendering speed vs. surface polygon type
• The way surfaces are described influences rendering speed.
• If surfaces are described by triangle meshes, rendering will be faster than for the same object described by independent quadrangles or higher-order polygons. This is because the graphics board architecture may be optimized to render triangles.
• Example: the rendering speed of the SGI Reality Engine.
[Figure: SGI Onyx 2 with Infinite Reality]

The Rasterizer Stage
• Performs operations in hardware for speed;
• Converts the 2-D vertex information from the geometry stage (x, y, z, color, texture) into pixel information on the screen;
• The pixel color information is stored in the color buffer;
• The pixel z-value is stored in the Z-buffer (which has the same size as the color buffer);
• Assures that only the primitives that are visible from the point of view of the camera are displayed.
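The visibility test performed with the Z-buffer can be sketched in software. The Python example below is a simplified model, not how the hardware is implemented: it assumes that a smaller z means closer to the camera, and `make_buffers` and `write_fragment` are hypothetical names.

```python
def make_buffers(width, height, background=(0, 0, 0)):
    """Create a color buffer and a Z-buffer of the same size."""
    color = [[background] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    return color, depth


def write_fragment(color, depth, x, y, z, rgb):
    """Store a fragment only if it is closer to the camera than what is there.

    Smaller z means closer (an assumed convention). Returns True if written.
    """
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = rgb
        return True
    return False


color, depth = make_buffers(4, 4)
write_fragment(color, depth, 1, 2, z=5.0, rgb=(255, 0, 0))  # far red fragment
write_fragment(color, depth, 1, 2, z=2.0, rgb=(0, 255, 0))  # nearer green wins
write_fragment(color, depth, 1, 2, z=9.0, rgb=(0, 0, 255))  # hidden blue rejected
print(color[2][1], depth[2][1])  # (0, 255, 0) 2.0
```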

The Rasterizer Stage - continued
• The scene is rendered in the back buffer;
• It is then swapped with the front buffer, which stores the image currently being displayed;
• This process eliminates flicker and is called “double buffering”;
• All the buffers on the system are grouped into the frame buffer.
Testing for pipeline bottlenecks
• If the CPU operates at 100%, then the pipeline is “CPU-limited” (bottleneck in the application stage);
• If the performance increases when all light sources are removed, then the pipeline is “transform-limited” (bottleneck in the geometry stage);
• If the performance increases when the resolution of the display window or its size is reduced, then the pipeline is “fill-limited” (bottleneck in the rasterizer stage).
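The three tests can be written as a small decision procedure. The Python sketch below is illustrative only: the 5% improvement threshold, the 99% CPU-load cutoff, the "balanced" label, and the function name `classify_bottleneck` are all assumptions, and the frame rates in the example are invented.

```python
def classify_bottleneck(cpu_load, fps_base, fps_no_lights, fps_small_window,
                        gain_threshold=1.05):
    """Apply the three tests in order and name the limiting stage.

    cpu_load is a fraction in [0, 1]; the fps arguments are measured frame
    rates. gain_threshold (assumed, 5%) is how much faster a variant must
    run before it counts as a real improvement.
    """
    if cpu_load >= 0.99:
        return "CPU-limited"
    if fps_no_lights > fps_base * gain_threshold:
        return "transform-limited"
    if fps_small_window > fps_base * gain_threshold:
        return "fill-limited"
    return "balanced"


print(classify_bottleneck(0.60, 30.0, 45.0, 31.0))  # transform-limited
print(classify_bottleneck(0.60, 30.0, 30.5, 50.0))  # fill-limited
```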
Transform-limited (reduce level of detail)
Fill-limited (increase realism)
The Pipeline Balancing
[Figure: single buffering (75%) (75%) (100%) vs. double buffering, balanced pipeline (90%) (95%) (100%)]
The Haptics Rendering Pipeline

The process of computing the forces and mechanical textures associated with haptic feedback. It is done in software and in hardware, and it has three stages too.
PC graphics architecture – PC is King!
• Went from the 66 MHz Intel 486 in 1994 to the 3.6 GHz Pentium IV today;
• Newer PC CPUs are dual (or quad) core, which improves performance by 50%;
• Went from 7,000 Gouraud-shaded polygons/sec (Spea Fire board) in 1994 to 27 million Gouraud-shaded polygons/sec (Fire GL 2, which used to be in our lab);
• Today PCs are used for single or multiple users, and for single or tiled displays;
• Intensely competitive industry.

PC bus architecture – just as important
• Went from the 33 MHz “Peripheral Component Interconnect” (PCI) bus to the 264 MHz “Accelerated Graphics Port” (AGP4x) bus, and doubled again with AGP8x;
• Larger throughput and lower latency, since the address bus lines are decoupled from the data lines. AGP uses “sideband” lines.
[Figure: PC bus architecture with the Intel 820/850 chipset and the graphics accelerator (memory + processors)]
Bus transfer rates:
PCI: ~133 MBps (33 MHz x 32 bit)
AGP 8x: ~2 GBps, unidirectional (533 MHz x 32 bit)
PCI Express: ~4 GBps, bidirectional
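The quoted PCI and AGP rates follow from multiplying the bus clock by the bus width. The Python sketch below redoes that arithmetic, assuming one 32-bit transfer per effective clock tick; `bus_rate_mbytes` is a hypothetical helper name. The quoted ~133 MBps corresponds to a clock of 33.33 MHz rather than exactly 33 MHz.

```python
def bus_rate_mbytes(clock_mhz, width_bits):
    """Peak transfer rate in MB/s: one transfer of width_bits per clock tick."""
    return clock_mhz * width_bits / 8


pci = bus_rate_mbytes(33, 32)     # quoted as ~133 MBps
agp8x = bus_rate_mbytes(533, 32)  # quoted as ~2 GBps
print(pci, agp8x, round(agp8x / pci, 1))  # 132.0 2132.0 16.2
```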
yesterday’s PC system architecture

PC system architecture for the VR Teaching Lab
Product                  Port      % of budget
PC 1.7 GHz (Fire GL2)    NA        48%
Polhemus Fastrak         Com 1     37%
5DT glove                Com 2     10%
Stereo glasses           FireGL2   3%
FF joystick              USB       2%
Java/Java3D              NA        0%
VRML                     NA        0%
[Figure: Fire GL 2 board with stereo glasses connector, passive coolers, and AGP bus connector]
Fire GL 2 architecture
Fire GL 2 features:
• 27 million Gouraud-shaded, non-textured polygons/sec;
• Fill rate is 410 Mpixels/sec;
• Supports up to 16 light sources;
• Has a 300 MHz D/A converter.
[Figure: Fire GL X3 256 board with stereo glasses connector, passive coolers, DVI-I video output, and AGP bus connector]
Fire GL X3-256 architecture
• 24-bit pixel processing, 12 pixel pipes
• dual 10-bit DAC and dual DVI-I connections
• does not have Genlock
• anti-aliased points and lines
• quad-buffered stereo 3D support (2 front and 2 back buffers)
[Figure: NVIDIA Quadro FX 4000 board with 500 MHz DDR memory and graphics processing unit (GPU)]
NVIDIA Quadro FX 4000 architecture
• dual DVI-I connections
• 32-bit pixel processing, 16 pixel pipes
• has Genlock
• anti-aliased points and lines
• quad-buffered stereo 3D support
[Figure: FireGL X3-256 vs. NVIDIA Quadro vs. 3DLabs]
CPU Evolution to Multi-Core
• Places several processors on a single chip.
• Communication between cores is faster than between separate processors.
• Each core has its own resources (L1 and L2 caches), unlike multiple threads on a single core.
• It is more energy efficient and results in higher performance.
Multi-core details
AMD64 x2 Architecture
Guts of Native Quad Core (Next Gen)
Graphics Benchmarks
• Benchmarks are established by an independent organization;
• They allow comparison of graphics card performance based on standardized application cases;
• They can be application-specific, like SPECapc (Application Performance Characterization),
• or general-purpose for OpenGL-based systems, like SPECviewperf.
Accelerator boards viewperf 10.0 benchmark
SPECviewperf™ is a portable OpenGL performance benchmark program written in C. SPECviewperf reports performance in frames per second.
There are 8 tests done at a frame resolution of 1280x1024:
• 3ds Max: modeling, simulation and rendering
• CATIA (DX): CAD design application
• EnSight (DRV): a 3D visualization package
• Maya: an animation application
• ProEngineer: CAD software
• TCVIS: radiosity application for large data sets
• Solidworks: 3D CAD design
• Unigraphics: digital electronic and mechanical engineering design
Accelerator boards viewperf comparison
Updated regularly at www.spec.org
SPECviewperf™ uses a weighted geometric mean formula to determine scores:
$$\text{Geometric mean (fps)} = (\text{test}_1)^{\text{weight}_1} \times (\text{test}_2)^{\text{weight}_2} \times \dots \times (\text{test}_N)^{\text{weight}_N}$$
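A weighted geometric mean of this form can be computed as follows. This Python sketch assumes the weights sum to 1; the two test results are hypothetical values chosen so the answer is easy to check, and `weighted_geometric_mean` is not a SPEC-provided function.

```python
import math


def weighted_geometric_mean(fps, weights):
    """Return prod(fps_i ** w_i); the weights are expected to sum to 1."""
    if len(fps) != len(weights):
        raise ValueError("fps and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    # Summing logs is numerically safer than multiplying many powers.
    return math.exp(sum(w * math.log(f) for f, w in zip(fps, weights)))


# Two hypothetical tests with equal weights: sqrt(16 * 64) = 32 fps.
score = weighted_geometric_mean([16.0, 64.0], [0.5, 0.5])
print(round(score, 6))  # 32.0
```

Unlike an arithmetic mean, a geometric mean keeps one very fast test from dominating the composite score.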
Boards Viewperf 10.0 comparison
Introduction to Water cooling
www.tomshardware.com
State of Watercooling
• Consumers have been water cooling their systems since as early as 2006.
• While niche, the field is popular enough for parts to be mass-produced.
• Parts are usually bought from specialty websites.
• It is becoming more popular as more people construct their own systems.
• Self-contained, pre-made water cooling systems are sold at retail (Corsair H70 and Domino ALC).
Waterblock
• Transfers heat from the CPU/GPU to the water.
• Usually made of copper, which has better heat transfer characteristics.
• Contact surfaces are machined to mm precision to maximize contact with the CPU/GPU.
• Sometimes nickel-plated to protect against corrosion.
[Figures: EK Supreme High Flow – Full Copper; GTX 580 with MSI Waterblock]
Radiator
• Stores heat to be conducted out of the loop.
• Comes in multiple sizes, usually in increments of 120 mm.
• Usually has copper cores surrounded by large fin assemblies.
• Designed to work with certain types of fans, either performance or quiet fans.
[Figures: HW Labs Black Ice Xtreme GTX 360; XSPC RX 240]
Reservoir and Pump
• The pump moves water through the loop.
• It can be fitted with aftermarket “tops” that offer more fitting options and a higher flow rate.
• The reservoir provides an easy way to measure the water level and to fill and empty the loop.
• The reservoir can be replaced by a T-line.
[Figures: Swiftech MCP 655 Pump; EK Spin Bay Reservoir]
Coolant, Tubing, Kill coils and Fans
• The coolant of choice is distilled water. Dyes in colored coolants often separate from the liquid and collect in waterblocks and radiators.
• Tubing is laboratory grade, meant to bend at sharp angles. It is often coated with anti-bacterial compounds.
• Kill coils are anti-microbial strips of pure silver.
• Fans are conventional 120 mm fans, ranging from 30 to 260 cubic feet per minute, with an accompanying increase in noise levels (14 dBA to 60 dBA).
• Push-pull configuration: the radiator is sandwiched between fans facing the same direction, giving a 3-4 °C decrease in temperature.
Results
Cooler                   Clock speed   Idle temperature
Stock                    2.66 GHz      34 °C
TRUE* (air)              2.66 GHz      31 °C
TRUE* (air)              3.8 GHz       42 °C
EK HF Supreme (water)    2.66 GHz      28 °C
EK HF Supreme (water)    4.5 GHz       38 °C
*Thermalright Ultra-120 eXtreme

Problems
• Mixing metals (aluminum and copper) in the loop will
cause galvanic corrosion
• Fungal growth in the loop
• Leaks
• Expensive ($400+ for a high-quality loop)
• Space
The Nintendo Wii
• First to introduce a form of player interaction based on accelerometer and IR tracking.
• Contains solid-state accelerometers and gyroscopes.
• Senses tilting and rotation up and down, left and right, and along the main axis (as with a screwdriver).
• Senses acceleration up/down, left/right, and toward or away from the screen.
• Dramatically improved interface for video games.
• Innovative controller that integrates vibration feedback.
• Uses Bluetooth technology with a 30-foot range.
• Can send a signal up to 15 feet away. Up to 4 Wii Remotes can be connected at once.
Wii Remote
• It uses two batteries, and has to be worn with a wrist strap.
• It is wireless, which is unencumbering for the patient.
• It needs to be turned on and set to a “neutral position”; it adapts to the patient's range of motion (ROM).
Wii Remote - Nunchuk
• It plugs into the Wii Remote extension port.
• It allows bi-manual interaction, but is wired to the remote (which reduces arm movement range) and poses safety hazards.
[Figure: new wireless Nunchuk, showing the analog control stick and the wireless transmitter]
Wii Zapper
• Wii Remote and Nunchuk connected in one “gun-like” frame.
• For shooting games, it makes aiming more accurate but slower (use of the shoulder instead of the wrist).
[Figure: Wii Zapper, showing the Wii Remote, the Nunchuk, and the plastic frame]
The Wii Motion Plus
• Has a pass-through expansion port, allowing other expansions such as the Nunchuk or Classic Controller to be used simultaneously with the device.
• Wii Motion Plus was released in 2009 with the Wii Sports Resort games.
• It incorporates a pair of resonating gyroscopes which measure rotation. The sensor is an InvenSense IDG-1004 integrated dual-axis gyroscope (www.ivensense.com).
PS3 Specs
• PS3 CPU: Cell Processor
– Developed by IBM
– PowerPC-based core @ 3.2 GHz
– 1 VMX vector unit per core
– 512 KB L2 cache
– 7 x SPE @ 3.2 GHz
– 7 x 128 128-bit SIMD GPRs
– 7 x 256 KB SRAM for the SPEs
– 1 of 8 SPEs reserved for redundancy
– Total floating-point performance: 218 GFLOPS
Cell Processor Architecture
The PowerPC core present in the system is a general-purpose 64-bit PowerPC processor that handles the Cell BE's general-purpose workload (such as the operating system) and manages special-purpose workloads for the SPEs.
PlayStation 3 use of the multi-core processor
(IEEE Spectrum 2006)

Screenshot -Gran Turismo
Madden Nextgen Demo

PlayStation Move
PlayStation Eye camera,
charging bay, gyroscope,
accelerometer, Bluetooth
transmitter, vibration motor, and
MEMS compass.
The X-Box 360
• Aims at a balance between hardware, software and service.
• Has a flexible design, abandoning the nVidia-only deal of the original xBox.
• Uses a multi-core design, like having three PowerPC CPUs running at 3.2 GHz.
• Each of the three cores can process two threads at a time (like 6 conventional processors).
• Each core has a SIMD unit, which exploits the data parallelism of real-time graphics.
The X-Box 360
• The GPU has a Unified Shader Architecture, meaning one unit that does both the geometry and rasterization stages (vs. separate vertex and pixel shaders).
• The Arbiter retrieves commands from the Reservation Stations and delivers them to the appropriate Processing Engine.
• The xBox 360 has several Arbiters and 48 ALUs.
The X-Box 360
• The GPU has 10 MB of embedded DRAM for use as a frame buffer.
• Resolution up to 1920x1080 with full-screen anti-aliasing.
• The GPU has the memory controller, connecting to the 3 cores at 22 GB/sec.
• Renders 500 million triangles/sec, with a fill rate of 16 Gsamples/sec.
Kinect Background
• Kinect uses three optical components to capture objects in 3D space:
– Infrared light emitter
– Depth sensor
– RGB camera (640x480)
• Array of four microphones for input
– Designed to find the location of a voice
– 16 kHz sampling rate
• USB connector with additional power supply
• Sensor range limit of 1.2-3.9 m
Kinect Background (cont.)
• Angular field of view of 57° horizontally and 43° vertically
– motorized to pivot up and down
• Play area of 6 square meters
• Video output of the sensors at a 30 Hz frame rate
• Cost: $150
Benefits
• Gesture recognition
• User awareness
– facial recognition
• Voice recognition
• Independent of lighting conditions
• Multiple users
Limitations
• Blind spots
– sitting down issues
– only in front of the user
• 30 Hz frame rate
– may not be fast enough for some applications
• Space limitation
– passive user cap
• Movement
– useful only for users who have a full range of motion
Feature                  Xbox 360                                   PlayStation 3          Nintendo Wii
Processor                3.2 GHz PowerPC with 3 dual-thread cores   3.2 GHz with 8 cores   729 MHz IBM with 5 execution units
GPU                      ATI 500 MHz                                NVIDIA 550 MHz         ATI 243 MHz
Video memory bandwidth   21.6 GBps                                  22.4 GBps              3.9 GBps
HDTV output              Yes                                        Yes                    No
Hard drive               20 GB                                      20-60 GB               None (512 MB flash included)
Ethernet                 100 Mbps                                   1 Gbps                 None
http://www.winsupersite.com/showcase/xbox360_vs_ps3.asp
Distributed VR architectures
• Single-user systems:
– multiple side-by-side displays;
– multiple LAN-networked computers;
• Multi-user systems:
– client-server systems;
– peer-to-peer systems;
– hybrid systems.
Single-user, multiple displays
(3DLabs Inc.)
Side-by-side displays.
• Used in VR workstations (desktop), or in large-volume displays (CAVE or the “Wall”);
• One solution is to use one PC with a graphics accelerator for every projector;
• This results in a “rack mounted” architecture, such as the MetaVR “Channel Surfer” used in flight simulators, or the Princeton Display Wall.
Side-by-side displays.
• Another (cheaper) solution is to use only one PC with several graphics accelerator cards (one for every monitor). Windows 2000 allows this option, while Windows NT allowed only one accelerator per system;
• The accelerators need to be installed on a PCI bus.
Genlock..
• If the outputs of two or more graphics pipes are used to drive monitors placed side-by-side, then the display channels need to be synchronized pixel-by-pixel;
• Moreover, the edges have to be blended by creating a region of overlap.
(Courtesy of Quantum3D Inc.)
Problems with non-synchronized displays...
• CRTs that are side-by-side induce fields in each other, resulting in electron beam distortion and flicker; they need to be shielded;
• Image artifacts reduce simulation realism, increase latencies, and induce “simulation sickness.”
Problems with non-synchronized CRT displays...
Synchronization of displays:
• Software synchronized: the system commands that frame processing start at the same time on the different rendering pipes;
– does not work if one pipe is overloaded, because one image finishes first.
[Figure: a synchronization command sent to two rendering pipes, each with a buffer driving a CRT]
• Frame buffer synchronized: the system commands that frame buffer swapping start at the same time on the different rendering pipes;
– does not work because swapping depends on the electron gun refresh; one buffer will swap up to 1/72 sec before the other.
[Figure: a synchronization command sent to the buffers of two rendering pipes, each driving a CRT]
• Video synchronized: the system commands that the CRT vertical beams start at the same time; one CRT becomes the “master”;
– does not work if the horizontal beam is not synchronized too (one line too many or too few).
[Figure: a synchronization command between the master CRT and the slave CRT]
• The best method is to have software + buffer + video synchronization of the two (or more) rendering pipes.
[Figure: master and slave displays, each driven by an application stage and a GPU (geometry + rasterizer) with its buffer, linked by synchronization commands]
Video synchronized displays (three PCs)
[Figure: Wildcat 4210 boards with “release” and “done” signals (Digital Video Interface, video out)]
Genlock
• Used to synchronize the output of a graphics card and the connected displays to an external synchronization source.
• Example: used to synchronize cameras with CRT displays so that scan lines are not visible.
• Available as an option card for NVIDIA Quadro FX graphics cards, to add multiple-display synchronization.
Frame Synchronization
 Frame synchronization is the process of synchronizing
display pixel scanning to a synchronization source. When
several systems are connected, a sync signal is fed from a
master system to the other systems in the network, and
the displays are synchronized with each other.
 For proper display synchronization you need:
– Frame Lock Synchronization
– Swap Synchronization
Frame Lock Synchronization
 Uses hardware to synchronize the frames on each display.
 Very critical for stereo viewing
 Synchronizing application buffer swaps across multiple
systems.
Stereo Image Unsynchronized

How is the refresh rate controlled?
• DDC/CI (Display Data Channel / Command Interface)
– Transmits data about the monitor specifications to the graphics hardware and allows the hardware to switch monitor settings.
• Vertical blanking signal
– Controls the blanking interval: the time difference between the end of one line and the beginning of the next, and between the end of one image and the beginning of the next image.
– Controls the monitor refresh rate.
• Graphics hardware is still designed with CRT characteristics in mind.
• Until the hardware architecture is redesigned for modern displays (OLEDs and LEDs), it will still work the way CRTs did when it comes to synchronization.
http://www.nvidia.com/page/quadrofx_gsync.html
http://www.techpubs.sgi.com
Graphics and Haptics Pipeline Synchronization:
• Has to be done at the application stage to allow decoupling of the rendering stages (which have vastly different output rates).
[Figure: dual-processor Pentium II host computer running the graphics pipe and the haptics pipe, connected to the haptic interface through the haptic interface controller (embedded Pentium)]
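One way to picture the decoupling is a single application loop that updates haptics on every tick and redraws graphics only when a new frame is due. The Python sketch below is a simplified, single-threaded simulation under assumed rates (30 Hz graphics, 1000 Hz haptics); it is not the architecture of any particular system, and `simulate` and `shared_state` are hypothetical names.

```python
def simulate(duration_s=1.0, graphics_hz=30, haptics_hz=1000):
    """Step a fixed-rate haptics loop and redraw only when a frame is due.

    Time is counted in integer haptic ticks to avoid floating-point drift.
    Returns (haptic_updates, graphics_frames).
    """
    total_ticks = int(duration_s * haptics_hz)
    shared_state = {"position": 0.0}  # written by haptics, read by graphics
    haptic_updates = 0
    graphics_frames = 0
    for tick in range(total_ticks):
        # Haptics runs on every tick (placeholder physics update).
        shared_state["position"] += 0.001
        haptic_updates += 1
        # Graphics draws whenever a new frame boundary has been crossed.
        if (tick * graphics_hz) // haptics_hz >= graphics_frames:
            _snapshot = shared_state["position"]  # what the frame would draw
            graphics_frames += 1
    return haptic_updates, graphics_frames


print(simulate())  # (1000, 30)
```

Both loops read and write the same world state, but neither waits for the other, so the slow graphics rate does not hold back force feedback.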
Physics Processing Unit (PPU)
• The first PPU was made by Ageia Inc., and is called “PhysX”;
• PhysX is available as an add-on card (see above);
• Helps the CPU do computations related to material properties (elasticity, friction, density);
• Better fog effects and more realistic clothing simulation;
• Better fluid dynamics simulation and collision effects;
• Cost: $160.
Co-located Rendering Pipelines (older design)
• Another, cheaper, solution is to use a single multi-pipe graphics accelerator;
• One output channel for every monitor, with separate geometry and rasterizer chips.
[Figure: Wildcat II 5110]
Co-located Rendering Pipelines (newer design)
Wildcat4 7210 features:
• 38 million Gouraud-shaded, Z-buffered triangles/sec;
• 400 Megapixel/sec texture fill rate;
• 32 light sources in hardware;
• Independent dual display support, 2 GPUs;
• 1529x856 frame-sequential stereo @ 120 Hz.
Nvidia GTX 590:
• has two GPUs;
• can drive three side-by-side monitors (DVI ports);
• has a mini-display port;
• PCI-express 2.0 motherboard.
PC Clusters
• multiple LAN-networked computers;
• used for multiple-PC video output;
• used for multiple-computer collaboration (when computing power is insufficient on a single machine); this is the older approach.
32 rendering PCs
Chromium networking
architecture
Frame refresh rate
comparison
Princeton display wall using eight LCD rear projectors
Princeton display wall: eight 4-way Pentium-Pro SMPs with

E&S graphics accelerators. They drive 8 Proxima 9200
LCD projectors. (1998)
Clusters of PlayStations
200 PS3 used by hackers to crack SSL

VRX Rack - Ciara Technologies
256 Xeon processors and 1.T terabytes of DDR memory
Best price/performance ratio; Linux and Windows OS
Ciara VRX
Google Server Farms
 Massive computations done on “server farms”
 Google owns about 1 Million servers around the world
 Massive installation requiring massive energy (including
for cooling the servers)
 That is why server farms are located close to water
sources, and cheaper energy sources.
Cloud Computing in the 21st century
• Instead of installing all application software locally and worrying about backups, updates, slow graphics, etc., the only thing the user would need on the local machine is a web browser.
• Everything (graphics rendering, databases, security and encoding) is done on servers, so much less powerful and cheaper client computers will be needed.
the AMD Fusion Render Cloud

Drawbacks of cloud computing
• More expensive data storage: a cloud server will have at least twice the storage capacity of the corresponding individual machines it replaces.
• The communication is not instantaneous. It is affected by network quality, the time of day (which affects network congestion), the Internet communication protocol used, and of course the amount of data transmitted.
• Due to network delays, the simulation response is not instantaneous, which may affect interaction and immersion in the virtual world, depending on the user and application.
Supercomputers
• A supercomputer is a computer which aims to optimize processing power beyond the scope of commercial PCs.
• Modern supercomputers are made by forming “computer clusters”.
• Supercomputers are used for calculation-intensive tasks such as nuclear detonation simulations, quantum physics, and weather forecasting.
• The fastest supercomputer was the “Jaguar” until the “Tianhe-I” edged in front of it in 2010.
The RoadRunner by IBM: the first supercomputer to break the petaflop (10^15 floating-point operations per second) barrier, on May 28th 2008.
No. 3: The Jaguar by Cray. The fastest supercomputer of 2009 and 2010 until succeeded by the Tianhe-I.
No. 2: The "Tianhe-I" (天河一号, meaning River in Sky) by NUDT. 14,336 central processing units and several thousand nVidia GPUs.
In Nov 2011 the fastest supercomputer became the K computer (Japan):
• it is more than 96,000 times faster than your PC (Intel Core i7);
• it comprises 864 computer racks and 88,000 CPUs;
• it remains surprisingly energy efficient.
In June 2012 the fastest supercomputer became the IBM Sequoia computer (US):
• it is 1.55 times faster than K, and does 16 petaflops;
• it has 96 racks with 1.5 million CPUs, compared to K's 864 racks and 88,000 CPUs;
• it is the most energy efficient;
• http://postbulletin.com/news/stories/display.php?id=1500158
Cheap Supercomputers
• Having the most powerful computer as a PC would be nice, but it’s not feasible, so we try to make powerful computers at an affordable cost.
• Form a computer cluster from multiple “off the shelf” devices.
• A famous example of this was the US Air Force combining 1,760 PS3s (plus 168 separate graphical processing units and 84 coordinating servers) to form a supercomputer.
• Known as the Condor Cluster, it cost only 5-10% as much as an equally powerful system.
• It also uses 10% of the power of a comparable system.
The “Condor Cluster” by the US Air Force, capable of 500 teraflops.
Multi-User distributed remote system architecture:
• multiple modem-networked computers;
• multiple LAN-networked computers;
• multiple WAN-networked computers;
• what is the network topology, and what is its influence on the number of users?
IBM RoadRunner
• Cost: $133 million
• Power: 2.35 MW
• Speed: 1.042 petaflops
• Efficiency: 444.94 megaflops/W
• Cost per hour at $0.11/kWh: $258
• Cost per year: $2.2 million
Cray Jaguar
• Power: 7.6 MW
• Speed: 2.33 petaflops (peak)
• Efficiency: 306 megaflops/W
• Cost per year: $7.2 million
Tianhe-1A
• Power: 4.04 MW
• Efficiency: 640 megaflops/W
• Cost per year: $3.8 million
K
• Cost: unavailable
• Power: 9.89 MW
• Efficiency: 1062.6 megaflops/W
• Cost per year: $9.5 million
Condor Cluster (little information available)
• Cost: $2 million
• Power: about 10% of that of a comparable system
• Speed: 500 teraflops
• Cost per hour at $0.11/kWh: ~$25
• Cost per year: ~$200,000
Comparison
[Figure: power efficiency (MegaFlops/W, axis from 0 to 1200) of IBM RoadRunner, Cray Jaguar, Tianhe-1A, and K]
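The running-cost and efficiency figures quoted for these machines can be checked with simple arithmetic. The Python sketch below assumes a constant power draw all year and the $0.11/kWh price used on the slides; `energy_cost` and `megaflops_per_watt` are hypothetical helper names, and small differences from the slide values come from rounding.

```python
HOURS_PER_YEAR = 24 * 365


def energy_cost(power_mw, price_per_kwh=0.11):
    """Return (cost per hour, cost per year) in dollars for a constant load."""
    per_hour = power_mw * 1000 * price_per_kwh  # MW -> kW, then kW * $/kWh
    return per_hour, per_hour * HOURS_PER_YEAR


def megaflops_per_watt(petaflops, power_mw):
    """Convert a speed in petaflops and a power in MW to megaflops per watt."""
    return (petaflops * 1e9) / (power_mw * 1e6)  # 1 petaflops = 1e9 megaflops


per_hour, per_year = energy_cost(2.35)  # IBM RoadRunner, 2.35 MW
print(f"{per_hour:.1f} $/hour, {per_year / 1e6:.2f} million $/year")
print(f"{megaflops_per_watt(2.33, 7.6):.1f} megaflops/W")  # Cray Jaguar
```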
Server-mediated communication:
• Unicast mode;
• The server is the bottleneck on the allowable number of clients.
[Figure: one server connected to Client 1, Client 2, … Client n]
(adapted from “Networked Virtual Environments”, Singhal and Zyda, 1999)
Server-mediated communication (multiple servers):
• Allows more clients to be networked over LANs.
[Figure: Server 1, Server 2 and Server 3 interconnected over LANs, each serving its own clients (Client 1,1 … Client 1,n; Client 2,1 … Client 2,n; Client 3,1 … Client 3,n)]

Peer-to-peer communication:
• Allows more clients to be networked over LANs;
• Can use broadcast or multicast;
• Reduces network traffic, BUT is more vulnerable to viruses and does not work well over WAN.
[Figure: User 1 … User n on a LAN exchanging multicast packets, each through an area of interest management module (AOIM 1 … AOIM n)]
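A rough packet count shows why the server becomes a bottleneck and why multicast reduces traffic. The Python sketch below uses a deliberately simplified model, which is an assumption and not taken from the cited books: every client sends one update per round, each update must reach all other clients, and one multicast packet counts once no matter how many peers receive it. `packets_per_round` and the mode names are hypothetical.

```python
def packets_per_round(n_clients, mode):
    """Packets put on the network when every client sends one update.

    Simplified model (an assumption): each update must reach all other
    clients, and a multicast packet counts once regardless of receivers.
    """
    if n_clients < 2:
        raise ValueError("need at least two clients")
    if mode == "server_unicast":
        # Each client sends 1 packet to the server, which relays n - 1 copies.
        return n_clients * (1 + (n_clients - 1))
    if mode == "peer_unicast":
        return n_clients * (n_clients - 1)
    if mode == "peer_multicast":
        return n_clients
    raise ValueError(f"unknown mode: {mode}")


for mode in ("server_unicast", "peer_unicast", "peer_multicast"):
    print(mode, packets_per_round(10, mode))
# server_unicast 100
# peer_unicast 90
# peer_multicast 10
```

In this model the unicast counts grow with the square of the number of clients, while the multicast count grows only linearly.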

Hybrid network using multiple servers communicating through multicast: allows deployment over WAN, where no broadcasting is allowed.
[Figure: Server 1 … Server n, each behind a proxy, exchange unicast packets over the WAN; each server exchanges multicast packets with its own users over a LAN (User 1,1 … User 1,n; User n,1 … User n,n)]
For very large DVEs, current WANs do not support multicasting.
(adapted from “Avatars in Networked Virtual Environments”, Capin, Pandzic, Magnenat-Thalmann and Thalmann, 1999)
Example of a distributed Virtual Environment: Cybertennis (connection between Geneva and Lausanne in Switzerland)
