
I implement the reduction with four kernels in total: total_1, total_2, total_3 and total_4.

While total_1 and total_2 use shared memory, total_3 and total_4 operate on global memory.
Each of these functions is summarized below.

Padding: Before I send my host input array to my kernel, I pad it with zeroes so that its
size is an exact multiple of the block dimension. I do this for all the kernels.
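A minimal host-side sketch of this padding step, assuming a `float` input held in a `std::vector` (the element type and the helper name `padToBlockMultiple` are hypothetical, not taken from driver.cu). Because zero is the identity for addition, the extra elements leave the sum unchanged:

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a copy of `input` extended with zeroes so that
// its length is a multiple of `blockSize`. Zero is the identity for addition,
// so the padding does not change the sum.
std::vector<float> padToBlockMultiple(const std::vector<float>& input,
                                      std::size_t blockSize) {
    assert(blockSize > 0);
    // Round the length up to the next multiple of blockSize.
    std::size_t padded = (input.size() + blockSize - 1) / blockSize * blockSize;
    std::vector<float> out(input);
    out.resize(padded, 0.0f);  // appended elements are 0.0f
    return out;
}
```

For example, a 1000-element input with a block size of 256 is padded to 1024 elements, so every block receives exactly one element per thread.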

Function Descriptions:

total_1 (Sequential Addressing using Shared Memory): Each block sums its elements using
sequential addressing. A shared-memory array is allocated for every block, and each thread
initializes its entry at the start of the kernel. The block's result is computed in its 0th
element, and the results of all blocks are finally added in the main function.

total_2 (Interleaved Addressing using Shared Memory): Each block sums its elements using
interleaved addressing. A shared-memory array is allocated for every block, and each thread
initializes its entry at the start of the kernel. The block's result is computed in its 0th
element, and the results of all blocks are finally added in the main function.

total_3 (Sequential Addressing using Global Memory): Each block sums its elements using
sequential addressing, operating directly on global memory instead of shared memory. The
block's result is computed in its 0th element, and the results of all blocks are finally
added in the main function.

total_4 (Interleaved Addressing using Global Memory): Each block sums its elements using
interleaved addressing, operating directly on global memory instead of shared memory. The
block's result is computed in its 0th element, and the results of all blocks are finally
added in the main function.
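The four variants can be sketched as follows. This is an illustrative reconstruction, not the author's kernels: the names `seqShared`, `interShared`, `seqGlobal`, `interGlobal` and `reduceOnDevice` are hypothetical, the block size is assumed to be a power of two, and the input is assumed to be padded as described above. Sequential addressing halves the stride each step so the active threads stay contiguous; interleaved addressing doubles the stride and uses only threads whose index is a multiple of twice the stride. In every variant element 0 of each block ends up holding the block's partial sum, and the host adds these partial sums.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Sketch in the spirit of total_1: sequential addressing in shared memory.
// The stride halves each step, so the active threads are always 0..stride-1.
__global__ void seqShared(const float* in, float* blockSums) {
    extern __shared__ float s[];
    unsigned tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];  // each thread loads one element
    __syncthreads();
    for (unsigned stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = s[0];  // block result is in element 0
}

// Sketch in the spirit of total_2: interleaved addressing in shared memory.
// The stride doubles each step; only threads at multiples of 2*stride work.
__global__ void interShared(const float* in, float* blockSums) {
    extern __shared__ float s[];
    unsigned tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();
    for (unsigned stride = 1; stride < blockDim.x; stride *= 2) {
        if (tid % (2 * stride) == 0) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = s[0];
}

// Sketch in the spirit of total_3: sequential addressing, reducing each
// block's slice in place in global memory (the input is overwritten).
__global__ void seqGlobal(float* data, float* blockSums) {
    unsigned tid = threadIdx.x;
    float* slice = data + blockIdx.x * blockDim.x;
    for (unsigned stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) slice[tid] += slice[tid + stride];
        __syncthreads();  // also makes the block's global writes visible
    }
    if (tid == 0) blockSums[blockIdx.x] = slice[0];
}

// Sketch in the spirit of total_4: interleaved addressing in global memory.
__global__ void interGlobal(float* data, float* blockSums) {
    unsigned tid = threadIdx.x;
    float* slice = data + blockIdx.x * blockDim.x;
    for (unsigned stride = 1; stride < blockDim.x; stride *= 2) {
        if (tid % (2 * stride) == 0) slice[tid] += slice[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = slice[0];
}

// Hypothetical host wrapper: runs one variant (1-4) on an already padded
// input, copies the per-block sums back and adds them on the host.
float reduceOnDevice(const std::vector<float>& h, unsigned blockSize, int variant) {
    assert(blockSize > 0 && h.size() % blockSize == 0);
    unsigned blocks = static_cast<unsigned>(h.size() / blockSize);
    std::size_t bytes = h.size() * sizeof(float);
    float* dData = nullptr;
    float* dSums = nullptr;
    cudaMalloc(&dData, bytes);
    cudaMalloc(&dSums, blocks * sizeof(float));
    cudaMemcpy(dData, h.data(), bytes, cudaMemcpyHostToDevice);
    std::size_t shmem = blockSize * sizeof(float);  // one float per thread
    switch (variant) {
        case 1: seqShared<<<blocks, blockSize, shmem>>>(dData, dSums); break;
        case 2: interShared<<<blocks, blockSize, shmem>>>(dData, dSums); break;
        case 3: seqGlobal<<<blocks, blockSize>>>(dData, dSums); break;
        default: interGlobal<<<blocks, blockSize>>>(dData, dSums); break;
    }
    std::vector<float> sums(blocks);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(sums.data(), dSums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dSums);
    float total = 0.0f;
    for (float x : sums) total += x;  // final addition of the block results
    return total;
}
```

Because `reduceOnDevice` copies the host data to the device on every call, the global-memory variants, which overwrite their input, can be rerun safely. CUDA error checking is omitted for brevity.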

Average Runtimes:

Dataset    total_1    total_2    total_3    total_4
Dataset 0  0.0665024  0.08832    0.088912   0.0679264
Dataset 1  0.1025056  0.1121408  0.1037472  0.0877376
Dataset 2  0.1048416  0.0924896  0.094928   0.1089248
Dataset 3  0.097296   0.090336   0.1323584  0.1061632
Dataset 4  0.1146208  0.1167808  0.1025984  0.1065056
Dataset 5  0.102272   0.0886848  0.10704    0.117056
Dataset 6  0.1087072  0.1168128  0.1076704  0.1284864
Dataset 7  0.1314368  0.1143584  0.1339328  0.115504
Dataset 8  0.1564992  0.1742336  0.1031648  0.2341344
Dataset 9  0.1845984  0.1601152  0.2680192  0.2333856
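Averages like these can be collected in several ways. One common approach, sketched here as an assumption rather than the method used for this table, times each launch with a pair of CUDA events and averages over several runs; `cudaEventElapsedTime` reports milliseconds. The kernel `addOne` and the helper `averageKernelMs` are hypothetical placeholders; any of the reduction kernels could be timed the same way.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Placeholder kernel to time (hypothetical; substitute any of the reduction
// kernels). Adds 1.0f to each of the first n elements.
__global__ void addOne(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Launches addOne `runs` times, timing each launch with a pair of CUDA
// events, and returns the mean time per launch in milliseconds
// (cudaEventElapsedTime reports milliseconds).
float averageKernelMs(float* dData, int n, int blockSize, int runs) {
    assert(runs > 0 && blockSize > 0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    int blocks = (n + blockSize - 1) / blockSize;
    float totalMs = 0.0f;
    for (int r = 0; r < runs; ++r) {
        cudaEventRecord(start);
        addOne<<<blocks, blockSize>>>(dData, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);  // block the host until the kernel is done
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        totalMs += ms;
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return totalMs / runs;
}
```

This measures only the kernel launches, not the host-to-device copies; running one untimed launch first is a common way to keep one-time start-up costs out of the average.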

Learning:
I learned the following:
1. What it feels like to write a CUDA program for the first time.
2. How shared memory and global memory are used in CUDA kernels.
3. How reduction algorithms work.
4. The various kinds of reduction algorithms.
5. How runtimes vary across memory types and reduction algorithms.

Instruction to Run:
To run a kernel, you'll have to uncomment it; the kernels are uncommented and run one at a
time. All the kernels are listed together in driver.cu.
