Assignment 1 Read Me
While total_1 and total_2 use shared memory, total_3 and total_4 operate on global memory. A summary of each of these functions is provided in the table below.
Padding: Before I send my host input array to the kernel, I pad it with zeroes so that its size is an exact multiple of the block dimension. I do this for all the kernels. Since zero is the additive identity, the padding does not change the sum.
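A minimal sketch of the padding step, written as host-side C++ under the assumption that the input is held in a std::vector<float>. The helper name padToBlockMultiple is hypothetical and is not taken from driver.cu.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from driver.cu): returns a copy of `input`
// zero-padded so that its length is a multiple of `blockDim`.
std::vector<float> padToBlockMultiple(const std::vector<float>& input,
                                      std::size_t blockDim) {
    assert(blockDim > 0 && "block dimension must be positive");
    // Round the length up to the next multiple of blockDim.
    std::size_t padded = (input.size() + blockDim - 1) / blockDim * blockDim;
    std::vector<float> out(input);
    out.resize(padded, 0.0f);  // appended elements are zero, so the sum is unchanged
    return out;
}
```

For example, 1000 input values with a block dimension of 256 become 1024 values, the last 24 of which are zero.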
Function Description:

Function Name   Description
total_1         Sequential Addressing using Shared Memory: Within each block,
                the elements are summed using sequential addressing. A
                shared-memory array is allocated for every block and is
                initialized by its threads at the start of the kernel. Each
                block's result is accumulated in element 0, and these
                per-block results are added together in the main function.
total_2         Interleaved Addressing using Shared Memory: Within each block,
                the elements are summed using interleaved addressing. A
                shared-memory array is allocated for every block and is
                initialized by its threads at the start of the kernel. Each
                block's result is accumulated in element 0, and these
                per-block results are added together in the main function.
total_3         Sequential Addressing using Global Memory: Within each block,
                the elements are summed using sequential addressing. The
                kernel operates directly on global memory rather than shared
                memory. Each block's result is accumulated in the block's
                element 0, and these per-block results are added together in
                the main function.
total_4         Interleaved Addressing using Global Memory: Within each block,
                the elements are summed using interleaved addressing. The
                kernel operates directly on global memory rather than shared
                memory. Each block's result is accumulated in the block's
                element 0, and these per-block results are added together in
                the main function.
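The two addressing patterns in the table can be sketched on the CPU as follows. This is a hedged, host-side C++ simulation of the common textbook forms of these patterns, not the actual kernels in driver.cu: each inner loop stands in for the threads of one block, and the sketch assumes a power-of-two block size and input that is already zero-padded to a multiple of it. The names blockReduceSequential, blockReduceInterleaved and reduceAll are hypothetical. In a real kernel the buffer would be a per-block shared-memory array (total_1, total_2) or the global input array itself (total_3, total_4), with a __syncthreads() barrier between strides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side simulation (not CUDA). Each inner "for tid" loop stands in for
// the threads of one block; the outer loop boundary plays the role of the
// __syncthreads() barrier between strides. Assumes blockDim is a power of two.

// Sequential addressing: the stride starts at half the block and halves.
// Thread tid (tid < s) adds element tid + s into element tid, so the
// active threads always form one contiguous range.
void blockReduceSequential(std::vector<float>& data, std::size_t base,
                           std::size_t blockDim) {
    for (std::size_t s = blockDim / 2; s > 0; s /= 2) {
        for (std::size_t tid = 0; tid < s; ++tid) {
            data[base + tid] += data[base + tid + s];
        }
    }
}

// Interleaved addressing: the stride starts at 1 and doubles. Only threads
// whose index is a multiple of 2*s add element tid + s into element tid.
void blockReduceInterleaved(std::vector<float>& data, std::size_t base,
                            std::size_t blockDim) {
    for (std::size_t s = 1; s < blockDim; s *= 2) {
        for (std::size_t tid = 0; tid < blockDim; ++tid) {
            if (tid % (2 * s) == 0) {
                data[base + tid] += data[base + tid + s];
            }
        }
    }
}

// Reduces every block of a copy of `data`, then adds each block's result
// (its element 0) on the host, mirroring the final step in the main function.
float reduceAll(std::vector<float> data, std::size_t blockDim, bool sequential) {
    assert(blockDim > 0 && data.size() % blockDim == 0 && "input must be padded");
    float total = 0.0f;
    for (std::size_t base = 0; base < data.size(); base += blockDim) {
        if (sequential) {
            blockReduceSequential(data, base, blockDim);
        } else {
            blockReduceInterleaved(data, base, blockDim);
        }
        total += data[base];  // the block's partial sum lives in its element 0
    }
    return total;
}
```

Both patterns perform the same number of additions; they differ in which threads are active. Sequential addressing keeps the active threads contiguous, so whole warps go idle together, whereas interleaved addressing spreads the active threads across each warp, which typically causes divergent branches on the GPU.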
Average Runtimes:

Dataset    total_1     total_2     total_3     total_4
Dataset0   0.0665024   0.08832     0.088912    0.0679264
Dataset1   0.1025056   0.1121408   0.1037472   0.0877376
Dataset2   0.1048416   0.0924896   0.094928    0.1089248
Dataset3   0.097296    0.090336    0.1323584   0.1061632
Dataset4   0.1146208   0.1167808   0.1025984   0.1065056
Dataset5   0.102272    0.0886848   0.10704     0.117056
Dataset6   0.1087072   0.1168128   0.1076704   0.1284864
Dataset7   0.1314368   0.1143584   0.1339328   0.115504
Dataset8   0.1564992   0.1742336   0.1031648   0.2341344
Dataset9   0.1845984   0.1601152   0.2680192   0.2333856
Learning:
I learned the following:
1. What it is like to write a CUDA program for the first time.
2. How shared memory and global memory are used in CUDA kernels.
3. How reduction algorithms work.
4. The various kinds of reduction algorithms.
5. How runtimes vary with the choice of memory type and reduction algorithm.
Instructions to Run:
All the kernels are listed together in driver.cu. To run a kernel, uncomment it; the kernels have to be uncommented and run one at a time.