Supercomputing On Graphics Cards: Marcus Bannerman

Supercomputing on Graphics Cards
An Introduction to OpenCL and the C++ Bindings

Marcus Bannerman
marcus.bannerman@cbi.uni-erlangen.de

Outline

1. What is OpenCL?
   - Why was OpenCL Created?
   - The Architecture of OpenCL
   - GPU Power
   - Current Implementations
   - History
   - Resources
2. Why Use OpenCL?
   - An Example
3. OpenCL Hello World
   - Header
   - OpenCL Initialisation
   - Memory Initialisation
   - Running the Kernel
   - Output
Why was OpenCL Created?

Programmable shaders allowed graphics cards to be utilised for calculations other than rendering, but the cards had to be tricked into performing these computations. Vendors began developing SDKs to make shader programming easier, but each vendor had its own standard. Other devices (DSPs, the IBM Cell processor, etc.) are also computationally powerful but lack a standard interface through which to access them. Apple wanted to access these resources in its own hardware (e.g., the iPhone) and decided that a standard interface would be a good thing.
The Architecture of OpenCL

OpenCL is:
- A platform that allows a host program to discover OpenCL-enabled devices (CPU, GPU, DSP, etc.).
- A runtime that allows the host program to manipulate contexts once they are created.
- A JIT compiler that creates executables from OpenCL kernels so they can be run on the OpenCL devices.

The kernel language is:
- A subset of ISO C99 (restricted pointer operations, no unified namespace).
- Extended with support for parallelism, determining thread identity and synchronisation.
- Equipped with many built-in math functions.
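To make "thread identity" concrete, the sketch below emulates on the CPU how a one-dimensional NDRange is split into work-groups. It is plain C++, not OpenCL: `emulateNDRange` and `WorkItemIds` are hypothetical names, and the sequential loops only mimic what a device would run in parallel. Each work-item's global id is its group id times the work-group size plus its local id, which is the relation between the OpenCL built-ins get_global_id, get_group_id and get_local_id when no global offset is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of the ids a single work-item would see.
struct WorkItemIds {
  std::size_t global;  // what get_global_id(0) would return
  std::size_t local;   // what get_local_id(0) would return
  std::size_t group;   // what get_group_id(0) would return
};

// Sequentially "launch" a 1D NDRange of globalSize work-items, split into
// work-groups of localSize, calling kernel(ids) once per work-item.
template <typename Kernel>
void emulateNDRange(std::size_t globalSize, std::size_t localSize, Kernel kernel) {
  // OpenCL 1.0 requires the global size to be a multiple of the local size.
  assert(localSize > 0 && globalSize % localSize == 0);
  for (std::size_t group = 0; group < globalSize / localSize; ++group)
    for (std::size_t local = 0; local < localSize; ++local)
      kernel(WorkItemIds{group * localSize + local, local, group});
}
```

For example, squaring an array `in` into `out` becomes `emulateNDRange(n, 64, [&](WorkItemIds id) { out[id.global] = in[id.global] * in[id.global]; });`, where the lambda plays the role of the kernel body.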
Moore's Law

[Figure: transistor count (logarithmic scale) against year, 1970 to 2010, for CPUs and GPUs, with the NVidia Fermi and AMD HD5800 marked.]
Figure: The evolution of processors, following the revised Moore's law of doubling performance every 18 months and transistor count every 2 years.
NVidia's Fermi architecture is expected to achieve close to a teraFLOP at double precision. Of the top 500 supercomputers, positions 33 to 500 achieve between 98 and 17 teraFLOPs. AMD's Cypress architecture achieves 150 GB/s of memory bandwidth, whereas Intel's Core i7-965 is benchmarked at 24 GB/s.
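The numbers above are easier to compare with a little arithmetic. A minimal sketch using only the figures quoted on this slide: the ratio of the two memory bandwidths, and the growth factor implied by doubling every fixed number of months. `growthFactor` and `bandwidthRatio` are illustrative helpers, not from any library.

```cpp
#include <cassert>
#include <cmath>

// Growth factor after `years` if a quantity doubles every `doublingMonths` months.
double growthFactor(double years, double doublingMonths) {
  assert(doublingMonths > 0.0);
  return std::pow(2.0, years * 12.0 / doublingMonths);
}

// How many times larger bandwidth a is than bandwidth b (same units, e.g. GB/s).
double bandwidthRatio(double aGBps, double bGBps) {
  assert(bGBps > 0.0);
  return aGBps / bGBps;
}
```

`bandwidthRatio(150.0, 24.0)` gives 6.25, so Cypress has roughly six times the memory bandwidth of the Core i7-965. Doubling performance every 18 months compounds to `growthFactor(6.0, 18.0)` = 16 over six years, while doubling transistor count every 24 months gives `growthFactor(6.0, 24.0)` = 8 over the same period.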
- June 2008: Apple submits an initial proposal for OpenCL to the Khronos Group (the standards committee for OpenGL).
- December 2008: The OpenCL 1.0 specification is standardised and released.
  NVidia announces it will support OpenCL alongside its existing CUDA architecture. AMD is replacing its Close to Metal offering with an OpenCL implementation.
- August 2009: AMD releases its first OpenCL development tools, supporting CPUs in OpenCL.
- August 2009: Apple releases Snow Leopard, which has full CPU+GPU OpenCL support.
- September 2009: NVidia releases its GPU OpenCL drivers and SDK.
- October 2009: AMD releases the latest version of its SDK, including OpenCL GPU support.
- Khronos Group: OpenCL C specification and quick reference card. http://www.khronos.org/opencl
- MacResearch.org: an excellent webcast series on the basics of OpenCL. http://www.macresearch.org
- AMD's OpenCL implementation (AMD are the creators of the C++ bindings). http://ati.amd.com/technology/streamcomputing/opencl.html
- NVidia's OpenCL implementation, with best practice guides. http://www.nvidia.com/object/cuda_opencl.html
An Example
Boundary value of the electrostatic potential
Video taken from the MacResearch.org OpenCL tutorial series, Episode 1.
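The method used in the video is not described here. As an assumption, one common way to compute an electrostatic potential from fixed boundary values is Jacobi relaxation of Laplace's equation, sketched below in serial C++. Every interior grid point is updated independently from the previous iterate, which is exactly the kind of data-parallel work an OpenCL kernel could take over with one work-item per point. `Grid`, `jacobiStep` and `solveLaplace` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major n x n grid of potential values; the outer ring holds the fixed boundary values.
using Grid = std::vector<double>;

// One Jacobi sweep for Laplace's equation: each interior point becomes the average
// of its four neighbours in the previous iterate. Returns the largest change.
double jacobiStep(const Grid& oldPhi, Grid& newPhi, std::size_t n) {
  assert(oldPhi.size() == n * n && newPhi.size() == n * n);
  double maxChange = 0.0;
  for (std::size_t i = 1; i + 1 < n; ++i)
    for (std::size_t j = 1; j + 1 < n; ++j) {
      const double v = 0.25 * (oldPhi[(i - 1) * n + j] + oldPhi[(i + 1) * n + j] +
                               oldPhi[i * n + j - 1] + oldPhi[i * n + j + 1]);
      maxChange = std::fmax(maxChange, std::fabs(v - oldPhi[i * n + j]));
      newPhi[i * n + j] = v;
    }
  return maxChange;
}

// Iterate until the largest update falls below tol, or maxIters sweeps are done.
Grid solveLaplace(Grid phi, std::size_t n, double tol, int maxIters) {
  Grid next = phi;  // copies the boundary values, which are never overwritten
  for (int it = 0; it < maxIters; ++it) {
    const double change = jacobiStep(phi, next, n);
    phi.swap(next);
    if (change < tol) break;
  }
  return phi;
}
```

Two buffers are used so that every point in a sweep reads only old values; this is what makes each update independent and therefore safe to run in parallel.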
OpenCL Hello World

Graphics cards have no traditional console output, so a true "Hello, world!" program would be useless. Aims of this example:
- Demonstrate the initialisation steps required when using OpenCL from C++.
- Provide an example OpenCL kernel.
- Show that although the initialisation is lengthy, it is straightforward.

OpenCL Hello World, a.k.a. the hard way to square the elements of an array: a simple example program that performs the operation

$$\mathrm{Output}_i = \mathrm{Input}_i^2$$
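For comparison, here is the same operation as an ordinary serial C++ loop; `squareArraySerial` is an illustrative name. The OpenCL kernel in this example is essentially the body of this loop: instead of iterating over i, the device launches one work-item per element, and each work-item obtains its own i from get_global_id(0).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial CPU version of Output_i = Input_i^2.
std::vector<float> squareArraySerial(const std::vector<float>& input) {
  std::vector<float> output(input.size());
  for (std::size_t i = 0; i < input.size(); ++i)
    output[i] = input[i] * input[i];  // this line is what each work-item does
  return output;
}
```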
Hello World: Header and kernel

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>  // std::back_inserter
#include <cstdlib>   // rand
#include <cstring>   // strlen

// The OpenCL C++ bindings, with exceptions
#define __CL_ENABLE_EXCEPTIONS
#include "cl.hpp"

const size_t problemSize = 1024;

// The compute kernel we will run
const char* kernelSrc =
  "__kernel void squareArray(__global float* input, __global float* output)\n"
  "{\n"
  "  output[get_global_id(0)] = input[get_global_id(0)] * input[get_global_id(0)];\n"
  "}\n";

int main()
{
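The kernel above is an ordinary C string, and cl::Program::Sources holds sources as (pointer, length) pairs. A hedged alternative, assuming a C++11 compiler (newer than the bindings this talk targets), is a raw string literal, which keeps the kernel readable without per-line quotes or \n escapes. So that it compiles without an OpenCL SDK, the snippet only builds the (pointer, length) pair in plain C++; `SourcePair`, `kernelSrcRaw` and `gatherSources` are hypothetical stand-ins, not the bindings' own names.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// C++11 raw string literal: the kernel text appears exactly as written.
const char* kernelSrcRaw = R"CLC(
__kernel void squareArray(__global float* input, __global float* output)
{
  output[get_global_id(0)] = input[get_global_id(0)] * input[get_global_id(0)];
}
)CLC";

// Stand-in for the (pointer, length) pairs held by cl::Program::Sources.
using SourcePair = std::pair<const char*, std::size_t>;

std::vector<SourcePair> gatherSources() {
  std::vector<SourcePair> sources;
  sources.push_back(std::make_pair(kernelSrcRaw, std::strlen(kernelSrcRaw)));
  return sources;
}
```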
Hello World: OpenCL Initialisation

  try {
    /* OpenCL Initialisation */
    // Open a context to run the OpenCL kernel in
    cl::Context context(CL_DEVICE_TYPE_GPU);

    // Gather all the kernel sources for the OpenCL program
    cl::Program::Sources source;
    source.push_back(std::make_pair(kernelSrc, strlen(kernelSrc)));

    // Make an OpenCL program
    cl::Program program(context, source);

    // Get all the available devices in the context
    std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();

    // Build the kernel sources for all devices in the context
    try {
      program.build(devices);
    } catch (cl::Error& err) {
      std::cerr << "Building failed, " << err.what() << "(" << err.err() << ")"
                << "\nRetrieving build log\n"
                << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[0])
                << "\n";
      return 1;
    }

    // Get the squareArray kernel to use in calculations
    cl::Kernel kernel(program, "squareArray");

    // Make a queue to put jobs on the first compute device
    cl::CommandQueue cmdQ(context, devices[0]);
Hello World: Memory initialisation

    // Create a vector of random input values
    std::vector<cl_float> input;
    std::generate_n(std::back_inserter(input), problemSize, rand);

    // Start copying this data to the graphics card
    cl::Buffer inputBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                           sizeof(cl_float) * input.size(), &input[0]);

    // Make a buffer to hold the output of the kernel
    cl::Buffer outputBuffer(context, CL_MEM_WRITE_ONLY,
                            sizeof(cl_float) * input.size());
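The buffer sizes above are byte counts: element size times element count. The sketch below repeats the input generation and size calculation in plain C++ without touching a device; `bufferBytes` and `makeRandomInput` are hypothetical helpers, and `float` stands in for cl_float, which OpenCL defines as a 32-bit IEEE 754 float. Note that rand returns an int, so the inputs are whole numbers between 0 and RAND_MAX converted to float (values above 2^24 are rounded in the conversion).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iterator>
#include <vector>

// Bytes needed for a device buffer holding every element of v.
template <typename T>
std::size_t bufferBytes(const std::vector<T>& v) {
  return sizeof(T) * v.size();
}

// Same input generation as the slide: n calls to rand(), each converted to float.
std::vector<float> makeRandomInput(std::size_t n) {
  std::vector<float> input;
  input.reserve(n);
  std::generate_n(std::back_inserter(input), n, std::rand);
  return input;
}
```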
Hello World: Running the kernel

    // Running on the graphics card

    // Set the two arguments of the squareArray kernel
    kernel.setArg(0, inputBuffer);
    kernel.setArg(1, outputBuffer);

    // Get a functor which will run the kernel on every input item in blocks of 64 threads
    cl::KernelFunctor func = kernel.bind(cmdQ, cl::NDRange(input.size()), cl::NDRange(64));

    // Run the kernel and wait for it to finish
    func().wait();
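With a global range of 1024 work-items and work-groups of 64, the device runs 1024 / 64 = 16 work-groups. OpenCL 1.0 requires the global size to be evenly divisible by the local size, so a problem size that is not a multiple of 64 is commonly padded up. The helpers below are hypothetical and only do the arithmetic; a padded launch would also need the kernel to receive the true element count and skip indices beyond it, which the squareArray kernel in this example does not do.

```cpp
#include <cassert>
#include <cstddef>

// Smallest multiple of localSize that is >= n (the padded global size).
std::size_t roundUpGlobalSize(std::size_t n, std::size_t localSize) {
  assert(localSize > 0);
  return ((n + localSize - 1) / localSize) * localSize;
}

// Number of work-groups launched for a given global and local size.
std::size_t workGroupCount(std::size_t globalSize, std::size_t localSize) {
  assert(localSize > 0 && globalSize % localSize == 0);
  return globalSize / localSize;
}
```

For 1000 elements, `roundUpGlobalSize(1000, 64)` gives 1024, so 24 extra work-items would be launched and would need a guard such as `if (get_global_id(0) < n)` in an adapted kernel taking a hypothetical extra argument `n`.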
Hello World: Gathering the output

    // Make a buffer to hold the outputted data

Checking the outputted data
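A hedged sketch of how the results could be gathered and checked. With the C++ bindings the output would typically be copied into a host vector with `cmdQ.enqueueReadBuffer(outputBuffer, CL_TRUE, 0, sizeof(cl_float) * output.size(), &output[0])`, a blocking read, after which it can be compared on the CPU. The comparison below is plain C++; `countMismatches` is a hypothetical helper. OpenCL requires single-precision multiplication to be correctly rounded, so a tolerance of zero should pass for this kernel, but the relative tolerance parameter makes the same check usable for kernels built on math functions with looser error bounds.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Count elements where output[i] differs from input[i]^2 by more than
// relTol times |input[i]^2|. relTol = 0 demands an exact match.
std::size_t countMismatches(const std::vector<float>& input,
                            const std::vector<float>& output, float relTol) {
  assert(input.size() == output.size());
  std::size_t bad = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    const float expected = input[i] * input[i];
    if (std::fabs(output[i] - expected) > relTol * std::fabs(expected)) ++bad;
  }
  return bad;
}
```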
