
Guide to Setting Up a CUDA Project

Ngo Quoc Vinh


Kyoto Japan 2008

1. Required hardware and software

Definition:
CUDA stands for Compute Unified Device Architecture, a software and hardware
architecture for developing computations on the GPU. In a multitasking system,
using the GPU for computation (CUDA programming) and for graphics can happen
at the same time.

Required software:
CUDA SDK version 2.0 can be used on 32-bit or 64-bit Windows XP.
On Windows you need Microsoft Visual C++ 2005 to write a CUDA project.

Required hardware:

To write a CUDA program, besides the supporting software you also need
hardware on which the program can actually run (not just in emulation).
The hardware devices that NVIDIA supports for CUDA programming are listed at:
http://www.nvidia.com/object/cuda_learn_products.html (this guide uses a
single graphics card; if you have more, you can use more than one graphics
card).

References:
English-language material on CUDA is available at:
http://forums.nvidia.com/index.php?showtopic=36286
Japanese-language material on CUDA is available at:
http://www.nvidia.co.jp/object/cuda_home_jp.html

2. Installing the CUDA driver, CUDA Toolkit, and CUDA SDK

For a CUDA program to run on Windows XP, you need the supporting libraries.
These libraries are included in the SDK provided by NVIDIA.

Downloading the CUDA driver

Download the driver that corresponds to your card's serial number from
http://www.nvidia.com/object/cuda_get.html#windows. If you read Japanese, you
can download it from:
http://www.nvidia.co.jp/object/cuda_get_jp.html#windows
On this page, choose NVIDIA driver for Microsoft Windows XP with CUDA
support (174.55). If your OS is 32-bit Windows, choose x86 under
Architecture to download (Figure 1).

Figure 1
1. Next, the NVIDIA Driver Download page appears. Click the "click here" text
to download (Figure 2).

Figure 2
The File Download dialog appears. Click Save (Figure 3).

Figure 3

2. The Save As dialog asks where you want to save the driver file. Choose the
destination path and click Save (Figure 4). Wait a while for the download to
complete.

Figure 4

Installing the CUDA driver

After the download finishes, double-click the downloaded *.exe file (in this
example the file is 169.21_forceware_winxp_32bit_english_whql, for a
GeForce 8800 GT, operating system Windows XP, language English (US)).
Next, select "I accept the terms in the license agreement" and click Next.
The installer asks where you want to install. I recommend keeping the default,
c:\NVIDIA\Win2k\169.21\English (Figure 5).

Figure 5
Click Next so the installer loads the files it needs.
Click Next again so the installation runs automatically (Figure 6).

Figure 6
Wait a moment. When the installation finishes, click Finish to restart the
computer.

Downloading the SDK and Toolkit

After installing the driver for the card, you need to install the tools that
support CUDA programming.
Download the two files NVIDIA_CUDA_Toolkit_1.0.exe and
NVIDIA_CUDA_SDK_1.0.exe from
http://www.nvidia.com/object/cuda_get.html#windows, depending on whether your
OS is 32-bit or 64-bit (for 32-bit choose x86 as the Architecture; for 64-bit
choose x86-64) (Figure 4.1). Both files are downloaded in exactly the same way.
After you click the Architecture type, a new page appears. Click "click here"
to download the file, then follow steps 1, 2, and 3 of section 4.1.1.

Installing the CUDA Toolkit

NVIDIA_CUDA_Toolkit_1.0.exe contains the tools and libraries that support CUDA
programming, along with the programming guides.
How to install the Toolkit file:
After downloading NVIDIA_CUDA_Toolkit_1.0.exe (or newer), double-click it to
install it on your system. This starts the InstallShield Wizard.
Click Next to begin. Then select "I accept the terms of the license agreement"
and click Next (Figure 4.10). The installer now asks where you want to install
(I recommend keeping the default, C:\CUDA).

Figure 7

Click Next to continue to the next step. After going through the steps of the
InstallShield Wizard, click Install to install the software. Wait a few
minutes; when the installation finishes, click Finish to end the installation.
Installing the SDK
NVIDIA_CUDA_SDK_1.0.exe is NVIDIA's SDK. Once installed, it contains sample
projects. These projects are very valuable for self-study.
1. After downloading NVIDIA_CUDA_SDK_1.0.exe (or newer), double-click it to
install it on your system. This starts the InstallShield Wizard.
2. Click Next to begin. Then select "I accept the terms of the license
agreement" and click Next. This is the same as the corresponding step of the
Toolkit installation.
3. The installer asks for some information. Enter your name in the Name text
box, your company or organization in the Organization text box, and your email
address in Email (Optional) (Figure 8).

Figure 8
4. Click Next to continue. The installer asks where you want to install
(I recommend keeping the default, C:\Program Files\NVIDIA Corporation\NVIDIA
CUDA SDK). Click Next to continue to the next step.
5. After going through the steps of the InstallShield Wizard, click Install.
Wait a few minutes; when the installation finishes, click Finish to end the
installation.
Most of the CUDA samples that NVIDIA provides run on Visual C++, so you need
Microsoft Visual C++. You can use Microsoft Visual Studio C++ Express, which is
free.
After the installation is complete, you can open NVIDIA's deviceQuery sample in
C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release
and run it. If it succeeds, the program displays the configuration of your GPU
card and the message TEST
3. Installing the Visual Profiler
The Visual Profiler is provided by NVIDIA to analyze and evaluate a CUDA
program.
Download the Visual Profiler from:
http://www.nvidia.com/object/cuda_get.html#windows.
On this page, find the text Cuda Visual Profiler in the Cuda for Windows table
(Figure 4.9).
Downloading it works the same way as steps 7, 8, and 9 of section 4.1.1.
Then extract the file CudaVisualProfiler_0.2_beta_windows.zip.
Extraction creates a CudaVisualProfiler folder containing two folders, bin and
Projects.
The Projects folder holds the information about a CUDA project after it has
been analyzed.

The bin folder contains the *.dll files and cudaprof.exe, which is the Cuda
Visual Profiler program.
Run the Cuda Visual Profiler by double-clicking cudaprof.exe (Figure 9).

Figure 9
The program runs without being installed.
4. Syntax highlighting for CUDA files (*.cu)
A CUDA source file has the extension *.cu. If you open it in Microsoft Visual
C++, it is displayed as plain text (hard to read, because variables and
keywords are all black). To make programs easier to read, NVIDIA provides a
file that plugs into Microsoft Visual C++ so that a *.cu file is displayed like
a *.cpp file.
1. Go to C:\Program Files\NVIDIA CUDA
SDK\doc\syntax_highlighting\visual_studio_8 and copy the file
usertype.dat to C:\Program Files\Microsoft Visual Studio 8\Common7\IDE.
2. Next, open the menu Tools->Options. In the Options dialog, go to Text
Editor->File Extension (Figure 4.15).
On the right side of the dialog, type cu (the extension of CUDA programs) in
the Extension: text box.
3. Next, in the Editor: list box, choose Microsoft Visual C++ (the environment
in which CUDA files are used).
Then click Apply and then OK. Restart Microsoft Visual Studio to finish
(Figure 10).


Figure 10
You have now set up highlighting for *.cu files, which makes programs clearer
and easier to read.
5. Setting up a CUDA project in Microsoft Visual C++ 2005
The previous sections covered the installations related to a CUDA project.
Section 4.4 showed how to run a sample program that ships with the NVIDIA SDK;
you can run any of the samples that the NVIDIA SDK provides.
This section shows how to create a CUDA project yourself.
For simplicity, we will create a console project.


Open Microsoft Visual C++ and go to File->New->Project. In the New Project
dialog, go to Visual C++->Win32 and choose Win32 Console Application.
You can name the project CudaStep1 and the solution CudaProgram. Then click
OK->Next->Finish. At this point you have a console project, but not yet a CUDA
project (Figure 11).

Figure 11


In the Solution Explorer window, right-click Header Files->Add->New Item. The
Add New Item - CudaStep1 dialog appears. Go to Visual C++->Code, choose
Header File (.h), name it CudaHeader.h, and click Add (Figure 12).
This file will hold the configuration of the CUDA program and the prototypes
of the CUDA functions that you will write.

Figure 12


Next, create a file to hold the CUDA source code. This file has the extension
.cu. As in the previous step, go to the Solution Explorer window and
right-click Header Files->Add->New Item; the Add New Item - CudaStep1 dialog
appears. Go to Visual C++->Utility, choose Text File (.txt), name it
CudaFunction.cu, and click Add.
You have now created a CUDA project, but it does not do anything yet because
no code has been written. To demonstrate how such a program works, we need a
small program that processes a matrix of 32 elements. The CUDA function
increases the value of each element by 1.

Open CudaStep1.cpp and initialize a matrix to use as sample data for the
computation.

You can copy the following code:
// CudaStep1.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include <stdio.h>
#include "CudaHeader.h"
#include <iostream>
using std::cout;
using std::cin;
// prototype of the display function, which prints the matrix to the screen
void display(float *matrix, int col, int row);
// main function
int _tmain(int argc, _TCHAR* argv[])
{
    // create the array and assign the initial values
    float matrix[32];
    for (int i = 0; i < 32; i++) {
        matrix[i] = 9;
    }
    // display the matrix before processing
    cout << "before call cuda function \n";
    display(matrix, 8, 4);
    // call the computation function of the CUDA program
    CudaProcessing(matrix);
    // display the matrix after the computation
    cout << "after call cuda function \n";
    display(matrix, 8, 4);
    // keep the console open so the data can be inspected
    int wait;
    cin >> wait;
    return 0;
}
//************************************************************************************//
// display a matrix stored row by row (col columns, row rows)
void display(float *matrix, int col, int row)
{
    printf("\n");
    for (int i = 0; i < row; i++) {
        printf("\nRow %2d:\n", i);
        for (int j = 0; j < col; j++) {
            printf(" ");
            printf("%.1f", matrix[i * col + j]);
        }
    }
    printf("\n\n");
}

Open CudaHeader.h and type the following code into it:

/* This program processes a matrix of 32 elements, so it needs 32 threads.
   One block holds 16 threads,
   so 2 blocks are needed to get 32 threads. */
#define XTHREADS 16
#define YTHREADS 1
#define ZTHREADS 1
#define XBLOCKS 2
#define YBLOCKS 1
#define MATRIXSIZE 32
// prototype of CudaProcessing, the host function that calls the kernel
extern "C" void CudaProcessing(float *hostData);

Open CudaFunction.cu to write the computation functions. This file contains
two functions. CudaProcessing transfers the data between Host and Device and
calls the kernel function. The other, CudaProcessingKernel, performs the
computation.
#include "CudaHeader.h"
#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_runtime_api.h>
#include <cutil.h>
// prototype of the kernel function
__global__ void CudaProcessingKernel(float *data);
/* This function copies the data from Host to Device, calls the kernel, and
   then copies the computed data back to Host. */
extern "C" void CudaProcessing(float *hostData)
{
    // allocate memory on the Device for the data received from Host
    float *deviceData;
    int size = sizeof(float) * MATRIXSIZE;
    cudaMalloc((void**)&deviceData, size);
    // copy the data from Host memory to Device memory for the computation
    cudaMemcpy(deviceData, hostData, size, cudaMemcpyHostToDevice);
    // number of threads per block
    dim3 dimBlock(XTHREADS, YTHREADS);
    // number of blocks per grid
    dim3 dimGrid(XBLOCKS, YBLOCKS);
    // launch the kernel
    CudaProcessingKernel<<<dimGrid, dimBlock>>>(deviceData);
    // after the computation, copy the data back to Host memory
    cudaMemcpy(hostData, deviceData, size, cudaMemcpyDeviceToHost);
    // free the temporary memory on the Device
    cudaFree(deviceData);
}

// kernel function
__global__ void CudaProcessingKernel(float *data)
{
    // index of the block within the grid
    int bx = blockIdx.x;
    // index of the thread within the block
    int tx = threadIdx.x;
    // index of the thread within the grid
    int tid = bx * XTHREADS + tx;
    // process the data
    data[tid] = data[tid] + 1;
    // synchronize the threads of the block (not strictly needed here,
    // because the threads do not share any data)
    __syncthreads();
}

Compiling the program: you can build this program for Win32 or Win64, in
release, debug, emurelease, or emudebug mode, depending on your machine's
configuration and the mode you want to build. However, the build reports an
error because the compiler cannot compile CudaFunction.cu.
You need to download the build rule file cuda_build_rule.zip from
http://forums.nvidia.com/index.php?showtopic=30273. Downloading it works the
same way as steps 8 and 9 of section 4.1.1. In Solution Explorer, right-click
the CudaStep1 project and choose Custom Build Rules. In the Visual C++ Build
Rule Files dialog, click Find Existing, select the cuda file (the CUDA build
rules file extracted from cuda_build_rule.zip), and click Open (Figure 13).

Figure 13

Back in the Visual C++ Build Rule Files dialog, check CUDA to tell the
compiler to use this build rule for CUDA files (*.cu).
Because the program must link against libraries, go to Solution Explorer,
right-click the CudaStep1 project, and choose Properties. The CudaStep1
Property Pages dialog appears. Go to Configuration
Properties->C/C++->General and, in the right-hand pane, enter the following
path in Additional Include Directories so that the headers of the CUDA program
can be found (Figure 14):
$(CUDA_INC_PATH);./;../../common/inc;"C:/Program Files/NVIDIA Corporation/NVIDIA CUDA SDK/common/inc"

Figure 14


Go to Configuration Properties->Linker->General and, in the right-hand pane,
enter the paths that contain the CUDA library files in Additional Library
Directories:
$(CUDA_INC_PATH);./;../../common/lib;"C:/Program Files/NVIDIA Corporation/NVIDIA CUDA SDK/common/lib";"C:/CUDA/lib"
Go to Configuration Properties->Linker->Input and, in the right-hand pane,
enter the names of the libraries the program needs in Additional Dependencies.
In this case we use two libraries, cudart.lib and cutil32.lib (Figure 15).

Figure 15
Now build and run the program; the result appears on the console screen.
Code explanation:
CudaStep1.cpp contains the main() function and the display() function. main()
creates a matrix and sets every element to 9, then displays the unprocessed
matrix on the screen.
Next, the main program calls the computation function CudaProcessing() and
passes it the matrix for the device to process.
When the computation is done, main() displays the result on the screen.
CudaFunction.cu contains two functions.

CudaProcessing() copies the data from Host memory to Device memory and then
calls the kernel; when the computation finishes, the data is returned to Host.
CudaProcessingKernel() performs the computation. The index of each matrix
element corresponds to the index of a thread within the grid, which is
determined by the tid index.
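The mapping between tid and the matrix elements can be checked on the CPU
without a GPU. The following is a minimal sketch in plain C++, not CUDA code:
two nested loops stand in for the <<<grid, block>>> launch, and the helper
names blocksFor, emulatedKernel, and emulatedLaunch are hypothetical names
introduced only for this illustration. XTHREADS and MATRIXSIZE repeat the
values from CudaHeader.h. The bounds check on tid is an addition that the
original kernel does not need, because 2 blocks of 16 threads cover the 32
elements exactly.

```cpp
#include <cassert>
#include <vector>

// Values repeated from CudaHeader.h in this guide.
const int XTHREADS = 16;    // threads per block
const int MATRIXSIZE = 32;  // number of elements to process

// Hypothetical helper: blocks needed to cover `count` elements
// (ceiling division, so a partial last block is still counted).
int blocksFor(int count, int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (count + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU stand-in for the kernel body: one call plays the role of one thread.
void emulatedKernel(float *data, int count, int bx, int tx) {
    int tid = bx * XTHREADS + tx;  // same formula as CudaProcessingKernel
    if (tid < count) {             // guard for a partially filled last block
        data[tid] = data[tid] + 1;
    }
}

// CPU stand-in for the launch: every (block, thread) pair runs once.
void emulatedLaunch(std::vector<float> &data) {
    int count = static_cast<int>(data.size());
    int blocks = blocksFor(count, XTHREADS);
    for (int bx = 0; bx < blocks; ++bx) {
        for (int tx = 0; tx < XTHREADS; ++tx) {
            emulatedKernel(data.data(), count, bx, tx);
        }
    }
}
```

Calling emulatedLaunch on a 32-element vector filled with 9 touches every
element exactly once, which is the property the tid formula is meant to
guarantee. Computing the block count from the element count, instead of
hard-coding XBLOCKS, is one possible way to keep the configuration valid when
MATRIXSIZE changes.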
6. How a CUDA program works
We use CUDA because we want a program to run faster through parallel
processing, so it is best to eliminate whatever slows a program down.
A CUDA program follows the SIMD (single instruction, multiple data) model, so
the main factors that affect its speed are non-uniform memory accesses and
memory contention while reading and storing data. These force the memory
accesses to be handled in a safe but serialized way, which turns a parallel
SIMD program into a sequential one.
The size of the data type matters a great deal for uniform (coalesced) data
access: each element must be 4, 8, or 16 bytes in size.
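The size rule can be checked at compile time on the host side. The sketch
below is plain C++ and only tests element sizes; the struct names Pair,
Triple, and PaddedTriple and the function hasCoalescingFriendlySize are
hypothetical, introduced for this illustration. It assumes a platform where
float is 4 bytes and the structs have no extra padding, which holds for the
common compilers but is still an assumption.

```cpp
#include <cstddef>

// Hypothetical element types, defined here only for illustration.
struct Pair         { float x, y; };          // 2 * 4 = 8 bytes
struct Triple       { float x, y, z; };       // 3 * 4 = 12 bytes
struct PaddedTriple { float x, y, z, pad; };  // padded to 16 bytes

// True when a size matches the 4-, 8-, or 16-byte rule stated in the text.
constexpr bool hasCoalescingFriendlySize(std::size_t bytes) {
    return bytes == 4 || bytes == 8 || bytes == 16;
}

// Compile-time checks: the build stops if an assumption does not hold.
static_assert(hasCoalescingFriendlySize(sizeof(float)),
              "float is expected to be 4 bytes");
static_assert(hasCoalescingFriendlySize(sizeof(Pair)),
              "Pair is expected to be 8 bytes");
static_assert(!hasCoalescingFriendlySize(sizeof(Triple)),
              "Triple is 12 bytes and breaks the rule");
static_assert(hasCoalescingFriendlySize(sizeof(PaddedTriple)),
              "PaddedTriple is expected to be 16 bytes");
```

A 12-byte element such as Triple falls outside the rule, and adding one unused
float is a simple way to bring it to 16 bytes at the cost of some memory. In
device code such a struct would typically also carry an alignment specifier
such as __align__(16); that part is not shown here.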
In addition, when the number of computation instructions is large, copy the
data from global memory into shared memory to limit frequent global memory
accesses, which slow the program down (accessing global memory takes far
longer than accessing shared memory).
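The benefit of staging can be illustrated by counting accesses. The sketch
below is a CPU-only model in plain C++, not CUDA code, and it measures nothing
about real hardware: CountedMemory is a hypothetical wrapper that stands in for
global memory and counts every read and write, and a local std::vector stands
in for shared memory. All names are assumptions made for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for global memory that counts every access.
struct CountedMemory {
    std::vector<float> cells;
    long accesses;
    explicit CountedMemory(int n) : cells(n, 0.0f), accesses(0) {}
    float read(int i) { ++accesses; return cells[i]; }
    void write(int i, float v) { ++accesses; cells[i] = v; }
};

// Direct version: each of the `steps` updates reads and writes global memory.
void updateDirect(CountedMemory &global, int steps) {
    assert(steps >= 0);
    int n = static_cast<int>(global.cells.size());
    for (int i = 0; i < n; ++i) {
        for (int s = 0; s < steps; ++s) {
            global.write(i, global.read(i) + 1.0f);
        }
    }
}

// Staged version: copy once into a local buffer (the stand-in for shared
// memory), do all the updates there, then write the result back once.
void updateStaged(CountedMemory &global, int steps) {
    assert(steps >= 0);
    int n = static_cast<int>(global.cells.size());
    std::vector<float> staged(n);
    for (int i = 0; i < n; ++i) staged[i] = global.read(i);
    for (int i = 0; i < n; ++i) {
        for (int s = 0; s < steps; ++s) staged[i] += 1.0f;
    }
    for (int i = 0; i < n; ++i) global.write(i, staged[i]);
}
```

With 32 elements and 10 update steps, the direct version performs 2 * 32 * 10
counted accesses and the staged version 2 * 32, while both produce the same
values. In a real kernel the staging buffer would be a __shared__ array and
the copy would be followed by __syncthreads().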
The usual pattern of a CUDA program uses two functions: one for data access,
and the other, usually called the kernel function, for processing the data.
// function used for data access
// `type`, DATA_COUNT, and RESULT_COUNT are placeholders to fill in
void DataFunction(type *hostData)
{
    // allocate memory on the device for the data coming from the host
    type *deviceData;
    int size = sizeof(type) * DATA_COUNT;    // DATA_COUNT: number of elements in deviceData
    cudaMalloc((void**)&deviceData, size);
    // copy the data from Host memory to Device memory for the computation
    cudaMemcpy(deviceData, hostData, size, cudaMemcpyHostToDevice);
    // allocate memory to hold the data after the computation
    type *resultData;
    int resultSize = sizeof(type) * RESULT_COUNT;    // RESULT_COUNT: number of elements in resultData
    cudaMalloc((void**)&resultData, resultSize);
    // number of threads per block
    dim3 dimBlock(XTHREADS, YTHREADS);
    // number of blocks per grid
    dim3 dimGrid(XBLOCKS, YBLOCKS);
    // launch the kernel
    CudaProcessingKernel<<<dimGrid, dimBlock>>>(deviceData, resultData);
    // after the computation, copy the data back to Host memory
    cudaMemcpy(hostData, resultData, resultSize, cudaMemcpyDeviceToHost);
    // free the temporary memory on the Device
    cudaFree(deviceData);
    cudaFree(resultData);
}
// function used for processing the data
__global__ void CudaProcessingKernel(type *data, type *result)
{
    // index of the block within the grid
    int bx = blockIdx.x;
    int by = blockIdx.y;
    // index of the thread within the block
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    // copy the data from global memory into shared memory
    // (the arrays are sized to one block here; adjust to your data)
    __shared__ type sharedData[XTHREADS * YTHREADS];
    __shared__ type sharedResult[XTHREADS * YTHREADS];
    // synchronize to make sure the data has been copied to shared memory
    __syncthreads();
    // process the data according to the thread indices
    // synchronize the threads to make sure the computation has finished
    __syncthreads();
}

To understand how a CUDA program works, we first need to agree on a few terms.
Host: the tasks and the hardware and software structures handled by the CPU.
Device: the tasks and the hardware and software structures handled by the GPU.

Figure 16
It works as follows:
1) The data to be processed always starts in Host memory, so the first step is
to transfer it from Host memory to Device memory.
2) The Device then calls its own functions to process the data.
3) When the computation is done, the data is returned to Host memory.

7. Evaluating a CUDA program with the Cuda Visual Profiler

Section 4.3 described how to install the Visual Profiler. We now use it to
examine a CUDA project.
Start the Visual Profiler by double-clicking cudaprof.exe.
1) Create a new project from the menu File->New or from the toolbar. The New
Project dialog appears; enter a name and the path where the project will be
saved (here we use the name CudaStep1Test) (Figure 17).

Figure 17
2) Click OK. The Session settings dialog appears. Under Launch, choose the
executable of the CUDA project that you built successfully (here
CudaStep1.exe), then click Start to run it (Figure 18).

Figure 18

Note: the Visual Profiler displays an error message because CudaStep1.exe does
not terminate. To fix this, remove the statement cin>>wait; from
CudaStep1.cpp and rebuild the program. You can then use the Visual Profiler to
examine CudaStep1.exe (Figure 19).

Figure 19
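The final cin >> wait; in CudaStep1.cpp is what keeps the program from
terminating under the profiler. As an alternative to deleting the line, the
pause can be made optional. The sketch below is one possible approach, not
part of the original program: the flag name --pause and the helpers
shouldPause and pauseIfRequested are hypothetical, and the code assumes
char-based command-line arguments, whereas the _tmain entry point generated by
Visual C++ may use wide characters.

```cpp
#include <cassert>
#include <cstring>
#include <iostream>

// Hypothetical flag name chosen for this sketch.
const char *const kPauseFlag = "--pause";

// Returns true when the flag appears among the command-line arguments
// (argv[0], the program name, is skipped).
bool shouldPause(int argc, const char *const argv[]) {
    assert(argc >= 0);
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], kPauseFlag) == 0) {
            return true;
        }
    }
    return false;
}

// Waits for input only when asked to, so a profiled run exits on its own.
void pauseIfRequested(int argc, const char *const argv[]) {
    if (shouldPause(argc, argv)) {
        int wait;
        std::cin >> wait;
    }
}
```

With this in place, running CudaStep1.exe --pause from a console keeps the
window open as before, while the profiler can launch the program without the
flag and let it finish.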
If it succeeds, the program displays a table with the parameters needed to
evaluate a CUDA project. The data obtained from the analysis is saved in an
Excel file.
