Supercomputing On Graphics Cards: Marcus Bannerman
marcus.bannerman@cbi.uni-erlangen.de
M. Bannerman
Outline
1. What is OpenCL?
   - Why was OpenCL Created?
   - The Architecture of OpenCL
   - GPU Power
   - Current Implementations
   - History
   - Resources
2. Why Use OpenCL?
   - An Example
3. OpenCL Hello World
   - Header
   - OpenCL Initialisation
   - Memory Initialisation
   - Running the Kernel
   - Output
OpenCL is:
- A platform that allows a host program to discover OpenCL-enabled devices (CPU, GPU, DSP, etc.).
- A runtime that allows the host program to manipulate contexts once they are created.
- A JIT compiler that creates executables from OpenCL kernels so they can be run on the OpenCL devices.

The kernel language is:
- A subset of ISO C99 (restricted pointer operations, no unified namespace).
- Extended for parallelism, determining thread identity and synchronisation.
- Supplied with many built-in math functions.
Moore's Law
[Plot: transistor count (logarithmic axis) against year, 1970 to 2010, with separate series for CPU and GPU.]
Figure: The evolution of processors, following the revised Moore's law of doubling performance every 18 months and transistor count every 2 years.
- NVidia's Fermi architecture is expected to achieve close to one teraFLOP at double precision.
- Of the top 500 supercomputers, positions 33 to 500 range from 98 down to 17 teraFLOPs.
- AMD's Cypress architecture achieves 150 GB/s memory bandwidth.
- Intel's Core i7-965 is benchmarked at 24 GB/s memory bandwidth.
- June 2008: Apple submits an initial proposal for OpenCL to the Khronos Group (the standards committee for OpenGL).
- December 2008: The specification for OpenCL 1.0 is standardised and released.
  - NVidia announces that it will support OpenCL alongside its existing CUDA architecture.
  - AMD is replacing its Close to Metal offering with an OpenCL implementation.
- August 2009: AMD releases its first OpenCL development tools, supporting CPUs in OpenCL.
- August 2009: Apple releases Snow Leopard, which has full CPU+GPU OpenCL support.
- September 2009: NVidia releases its GPU OpenCL drivers and SDK.
- October 2009: AMD releases the latest version of its SDK, including OpenCL GPU support.
- Khronos Group: the OpenCL C specification and quick reference card.
  http://www.khronos.org/opencl
- MacResearch.org: an excellent webcast series on the basics of OpenCL.
  http://www.macresearch.org
- AMD's OpenCL implementation, from the creators of the C++ bindings.
  http://ati.amd.com/technology/streamcomputing/opencl.html
- NVidia's OpenCL implementation, with best practice guides.
  http://www.nvidia.com/object/cuda_opencl.html
An Example
Video taken from the MacResearch.org OpenCL tutorial series, Episode 1.
OpenCL Hello World, a.k.a. the hard way to square the elements of an array.
A simple example program that performs the following operation:

$\mathrm{Output}_i = \mathrm{Input}_i^2$
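For comparison, the same operation can be written as a serial host-side reference in plain C++. This is only a sketch for checking results, not the kernel used in the talk: the names `squareElement` and `squareArrayReference` are hypothetical, and `float` is assumed to match `cl_float`. Each call to `squareElement` plays the role of one work-item, with `gid` standing in for the thread identity the kernel language would provide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the body of a squareArray kernel: one call
// handles the single element selected by the work-item index `gid`.
void squareElement(const float* input, float* output, std::size_t gid) {
    assert(input != nullptr && output != nullptr);
    output[gid] = input[gid] * input[gid];
}

// Serial emulation of launching one work-item per array element.
std::vector<float> squareArrayReference(const std::vector<float>& input) {
    std::vector<float> output(input.size());
    for (std::size_t gid = 0; gid < input.size(); ++gid) {
        squareElement(input.data(), output.data(), gid);
    }
    return output;
}
```

Comparing the values read back from the device against `squareArrayReference(input)` is one simple way to validate a run.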
    // Gather all the kernel sources for the OpenCL program
    cl::Program::Sources source;
    source.push_back(std::make_pair(kernelSrc, strlen(kernelSrc)));

    // Make an OpenCL program
    cl::Program program(context, source);

    // Get all the available devices in the context
    std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();

    // Build the kernel sources for all devices in the context
    try {
        program.build(devices);
    } catch (cl::Error& err) {
        std::cerr << "Building failed, " << err.what() << "(" << err.err() << ")"
                  << "\nRetrieving build log\n"
                  << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[0])
                  << "\n";
        return 1;
    }

    // Get the squareArray kernel to use in calculations
    cl::Kernel kernel(program, "squareArray");

    // Make a queue to put jobs on the first compute device
    cl::CommandQueue cmdQ(context, devices[0]);
    // Create a vector of random input values
    std::vector<cl_float> input;
    std::generate_n(std::back_inserter(input), problemSize, rand);

    // Start copying this data to the graphics card
    cl::Buffer inputBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                           sizeof(cl_float) * input.size(), &input[0]);

    // Make a buffer to hold the output of the kernel
    cl::Buffer outputBuffer(context, CL_MEM_WRITE_ONLY,
                            sizeof(cl_float) * input.size());
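The buffer sizes above are given in bytes (element size times element count), and `rand` yields integers up to `RAND_MAX`, which are stored as floats. If bounded inputs are preferred, a variation might look like the sketch below. The helpers `makeUnitInput` and `bufferBytes` are hypothetical and not part of the original program, and `float` is assumed to match `cl_float`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: n pseudo-random values scaled into [0, 1].
std::vector<float> makeUnitInput(std::size_t n) {
    std::vector<float> input;
    input.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float value = static_cast<float>(std::rand()) /
                            static_cast<float>(RAND_MAX);
        assert(value >= 0.0f && value <= 1.0f);
        input.push_back(value);
    }
    return input;
}

// Size in bytes to request for a device buffer holding `values`.
std::size_t bufferBytes(const std::vector<float>& values) {
    return sizeof(float) * values.size();
}
```

With these helpers, the size argument of both buffers would be `bufferBytes(input)`.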
    // Running on the graphics card

    // Set the two arguments of the squareArray kernel
    kernel.setArg(0, inputBuffer);
    kernel.setArg(1, outputBuffer);

    // Get a functor which will run the kernel on every input item
    // in blocks of 64 threads
    cl::KernelFunctor func = kernel.bind(cmdQ, cl::NDRange(input.size()),
                                         cl::NDRange(64));

    // Run the kernel and wait for it to finish
    func().wait();

    // Make a buffer to hold the output data
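One detail of the launch is worth spelling out: in OpenCL 1.x the global work size must be evenly divisible by the work-group size, so `input.size()` needs to be a multiple of 64 for the call above to succeed. A minimal sketch of a rounding helper follows; `roundUpToMultiple` is a hypothetical name and not part of the original program.

```cpp
#include <cassert>
#include <cstddef>

// Smallest multiple of localSize that is greater than or equal to globalSize.
std::size_t roundUpToMultiple(std::size_t globalSize, std::size_t localSize) {
    assert(localSize > 0);
    const std::size_t remainder = globalSize % localSize;
    return remainder == 0 ? globalSize : globalSize + (localSize - remainder);
}
```

For example, a problem size of 1000 would be launched over 1024 work-items. If the global size is padded this way, either the buffers must be padded to match or the kernel must skip indices beyond the real array length; the simplest option is to choose `problemSize` as a multiple of 64 in the first place.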