
An introduction to Thrust

A CUDA library of parallel algorithms

Jose Nunez Gonzalez


Michel A. Rivero Corona
What is Thrust?

Thrust is a C++ template library for CUDA.
Requires CUDA 3.0 or later (bundled with the toolkit since CUDA 4.0).
Thrust allows you to program GPUs with minimal programming effort, using an
interface similar to the C++ Standard Template Library (STL).
You just need to #include the appropriate header files in your .cu file and
compile with nvcc.
What is Thrust?

Thrust is a cohesive collection of algorithms and data structures in a single
package.
Thrust is self-contained and requires no additional libraries.
Thrust is open-source software.
Thrust has been tested extensively on Linux, Windows and Mac OS X systems.
Thrust components
Container Classes
  Store your data
  Vector, list, map ...

Algorithm Classes
  Frequently used algorithms
  sort, find, binary search, ...

Iterator Classes
Vector containers
Thrust provides

thrust::host_vector
thrust::device_vector

These vector data structures simplify memory management and the transfer of
data between the host and device.
Basic_vector.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <iostream>

int main(void)
{
    // H has storage for 4 integers
    thrust::host_vector<int> H(4);

    // initialize individual elements
    H[0] = 14; H[1] = 20; H[2] = 38; H[3] = 46;

    // resize H (keeps the first 2 elements)
    H.resize(2);

    // copy host_vector H to device_vector D
    thrust::device_vector<int> D = H;

    // elements of D can be modified from the host
    D[0] = 99;
    D[1] = 88;

    // print contents of D
    for (size_t i = 0; i < D.size(); i++)
        std::cout << "D[" << i << "] = " << D[i] << std::endl;

    // H and D are automatically deleted when the function returns
    return 0;
}
STL Vector::Member functions
Iterators:
  begin      Return iterator to beginning
  end        Return iterator to end

Capacity:
  size       Return size
  resize     Change size

Modifiers:
  assign     Assign vector content
  push_back  Add element at the end
  pop_back   Delete last element
  insert     Insert elements
  erase      Erase elements
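
The member functions above can be tried directly on a thrust::host_vector. The following is a minimal sketch, not taken from the original slides; the function name build_vector is a hypothetical helper introduced only for illustration.

```cuda
#include <thrust/host_vector.h>

// Hypothetical helper (not from the slides): exercises the STL-style
// member functions listed above on a thrust::host_vector.
thrust::host_vector<int> build_vector()
{
    thrust::host_vector<int> v;

    v.assign(3, 7);           // v = [7, 7, 7]
    v.push_back(9);           // v = [7, 7, 7, 9]
    v.pop_back();             // v = [7, 7, 7]
    v.insert(v.begin(), 1);   // v = [1, 7, 7, 7]
    v.erase(v.begin() + 1);   // v = [1, 7, 7]
    v.resize(5, 0);           // v = [1, 7, 7, 0, 0]

    return v;
}
```

The same calls are also available on thrust::device_vector, although modifying a device vector one element at a time from the host triggers a separate transfer for each access.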
An important question
Can I create a thrust::device_vector from memory I've allocated myself?

Answer: No. Instead, wrap your externally allocated raw pointer with
thrust::device_ptr and pass it to Thrust algorithms.
Wrap_pointer.cu
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t N = 10;

    // raw pointer to device memory
    int * raw_ptr;
    cudaMalloc((void **) &raw_ptr, N * sizeof(int));

    // wrap raw pointer with a device_ptr
    thrust::device_ptr<int> dev_ptr(raw_ptr);

    // use device_ptr in thrust algorithms
    thrust::fill(dev_ptr, dev_ptr + N, (int) 0);

    // access device memory through device_ptr
    dev_ptr[0] = 1;

    // free memory
    cudaFree(raw_ptr);

    return 0;
}
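
The opposite direction is also possible: thrust::raw_pointer_cast extracts the raw device pointer from a device_vector so it can be handed to the CUDA runtime or to a hand-written kernel. The sketch below is not part of the original slides; zero_with_cuda_runtime is a hypothetical helper name.

```cuda
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from the slides): zero-fills a device_vector
// through its raw pointer, using the CUDA runtime directly.
void zero_with_cuda_runtime(thrust::device_vector<int>& d_vec)
{
    // obtain the raw device pointer managed by the vector
    int* raw_ptr = thrust::raw_pointer_cast(d_vec.data());

    // raw_ptr can be passed to CUDA runtime calls or to your own kernels
    cudaMemset(raw_ptr, 0, d_vec.size() * sizeof(int));
}
```

The vector still owns the memory, so the raw pointer must not be passed to cudaFree.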
Thrust algorithms
● Linear Search
    find, find_if ...
● Subsequence Matching
    search, find_end ... (STL only; Thrust does not provide these)
● Counting Elements
    count, count_if
● for_each
● Comparing Two Ranges
    equal, mismatch ...
● Generalized Numeric Algorithms
    inner_product, adjacent_difference ...
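
As an illustration of the search, counting and numeric categories, here is a minimal sketch that is not from the original slides. The struct SearchSummary and the function summarize are hypothetical names chosen for this example.

```cuda
#include <thrust/device_vector.h>
#include <thrust/count.h>
#include <thrust/find.h>
#include <thrust/inner_product.h>

// Hypothetical result type (not from the slides).
struct SearchSummary
{
    int sevens;       // how many elements equal 7
    int first_seven;  // index of the first 7
    int dot;          // inner product of a and b
};

SearchSummary summarize()
{
    thrust::device_vector<int> a(5);
    a[0] = 3; a[1] = 7; a[2] = 2; a[3] = 7; a[4] = 5;
    thrust::device_vector<int> b(5, 2);   // b = [2, 2, 2, 2, 2]

    SearchSummary s;

    // counting elements: two elements equal 7
    s.sevens = static_cast<int>(thrust::count(a.begin(), a.end(), 7));

    // linear search: find returns an iterator; subtract begin() for the index
    s.first_seven = static_cast<int>(thrust::find(a.begin(), a.end(), 7) - a.begin());

    // generalized numeric algorithm: 2 * (3 + 7 + 2 + 7 + 5) = 48
    s.dot = thrust::inner_product(a.begin(), a.end(), b.begin(), 0);

    return s;
}
```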


Thrust algorithms
● Copy Ranges
    copy, copy_n ...
● Swapping Elements
    swap, swap_ranges ...
● Replacing Elements
    replace, replace_if, replace_copy ...
● Permuting Elements
    reverse
● Others
    sort, generate, random number generation
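
A short sketch of the replacing, permuting and copying categories follows. It is not from the original slides, and replace_and_reverse is a hypothetical function name.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/replace.h>
#include <thrust/reverse.h>
#include <thrust/copy.h>

// Hypothetical helper (not from the slides).
thrust::host_vector<int> replace_and_reverse()
{
    thrust::device_vector<int> d(4);
    d[0] = 1; d[1] = 2; d[2] = 1; d[3] = 3;

    // replacing elements: every 1 becomes 9
    thrust::replace(d.begin(), d.end(), 1, 9);   // d = [9, 2, 9, 3]

    // permuting elements: reverse in place
    thrust::reverse(d.begin(), d.end());         // d = [3, 9, 2, 9]

    // copying ranges: device to host
    thrust::host_vector<int> h(d.size());
    thrust::copy(d.begin(), d.end(), h.begin());

    return h;
}
```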


Thrust (STL) algorithms
Approximately 60 standard algorithms

● Search
● Sort
● Transformations
● Numeric

Most functions take the form

Function(Iter_begin, Iter_end, ...)
sort.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <cstdlib>

int main(void)
{
    // number of elements: n = 2^k
    const int k = 10;
    const int n = 1 << k;

    // generate random data on the host
    thrust::host_vector<int> h_vec(n);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer to device
    thrust::device_vector<int> d_vec = h_vec;

    // sort on device
    thrust::sort(d_vec.begin(), d_vec.end());

    return 0;
}
sum.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cstdlib>
#include <iostream>

int main(void)
{
    // generate random data on the host
    thrust::host_vector<int> h_vec(100);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer to device and compute sum
    // (note: with a large RAND_MAX the sum of 100 values can overflow int)
    thrust::device_vector<int> d_vec = h_vec;
    int x = thrust::reduce(d_vec.begin(), d_vec.end(), (int) 0,
                           thrust::plus<int>());

    std::cout << "sum = " << x << std::endl;
    return 0;
}
Iterators
● Iterators provide a means of accessing data stored in container classes
  such as a vector.
● Iterators can be thought of as limited pointers.
● Thrust algorithms (discussed before) use iterators.
● For instance, if you had a Thrust device vector storing integers, you could
  create an iterator for it as follows:

thrust::device_vector<int> d_vec;
thrust::device_vector<int>::iterator vecIterator;
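
Because iterators behave like limited pointers, they support arithmetic and dereferencing, which makes it easy to run an algorithm on part of a vector. The sketch below is not from the original slides; fill_subrange is a hypothetical function name.

```cuda
#include <thrust/device_vector.h>
#include <thrust/fill.h>

// Hypothetical helper (not from the slides): operates on a sub-range
// of a device_vector selected with iterator arithmetic.
thrust::device_vector<int> fill_subrange()
{
    thrust::device_vector<int> d_vec(6, 0);   // [0, 0, 0, 0, 0, 0]

    // iterators to the third element and to one before the end
    thrust::device_vector<int>::iterator first = d_vec.begin() + 2;
    thrust::device_vector<int>::iterator last  = d_vec.end() - 1;

    // algorithms accept any [first, last) range, not only whole vectors
    thrust::fill(first, last, 4);             // [0, 0, 4, 4, 4, 0]

    // dereferencing on the host writes a single element on the device
    *first = 8;                               // [0, 0, 8, 4, 4, 0]

    return d_vec;
}
```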
Iterators
Thrust provides

constant_iterator

counting_iterator
constant_iterator.cu
#include <thrust/iterator/constant_iterator.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>

// for printing
#include <thrust/copy.h>
#include <iterator>
#include <iostream>

int main(void)
{
    thrust::device_vector<int> data(4);
    data[0] = 3;
    data[1] = 7;
    data[2] = 2;
    data[3] = 5;

    // add 10 to all values in data
    thrust::transform(data.begin(), data.end(),
                      thrust::constant_iterator<int>(10),
                      data.begin(), thrust::plus<int>());

    // data is now [13, 17, 12, 15]

    // print result
    thrust::copy(data.begin(), data.end(),
                 std::ostream_iterator<int>(std::cout, "\n"));

    return 0;
}
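
The counting_iterator listed above works in the same spirit: it represents an increasing sequence of values without storing it in memory. A minimal sketch, not from the original slides, with hypothetical function names sum_zero_to and make_indices:

```cuda
#include <thrust/iterator/counting_iterator.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/reduce.h>

// Hypothetical helper (not from the slides): sums 0, 1, ..., n-1
// without allocating a vector for the sequence.
int sum_zero_to(int n)
{
    thrust::counting_iterator<int> first(0);
    thrust::counting_iterator<int> last = first + n;
    return thrust::reduce(first, last);
}

// Hypothetical helper (not from the slides): materializes the
// sequence 0, 1, ..., n-1 into a device_vector.
thrust::device_vector<int> make_indices(int n)
{
    thrust::counting_iterator<int> first(0);
    thrust::device_vector<int> indices(n);
    thrust::copy(first, first + n, indices.begin());
    return indices;
}
```

For example, sum_zero_to(10) adds the values 0 through 9.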
Iterators
Thrust provides

transform_iterator

zip_iterator
Zip_iterator.cu
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>
using namespace thrust;

int main(void)
{
    device_vector<int> A(3);
    device_vector<char> B(3);
    A[0] = 10; A[1] = 20; A[2] = 30;
    B[0] = 'x'; B[1] = 'y'; B[2] = 'z';

    // create zip iterators over the pairs (A[i], B[i])
    typedef tuple<device_vector<int>::iterator,
                  device_vector<char>::iterator> IteratorTuple;
    typedef zip_iterator<IteratorTuple> ZipIterator;

    ZipIterator begin = make_zip_iterator(make_tuple(A.begin(), B.begin()));
    ZipIterator end = make_zip_iterator(make_tuple(A.end(), B.end()));

    // begin[0] returns tuple(10, 'x')
    // begin[1] returns tuple(20, 'y')
    // begin[2] returns tuple(30, 'z')

    // maximum of [begin, end)
    maximum< tuple<int,char> > binary_op;
    tuple<int,char> init = begin[0];
    tuple<int,char> result = reduce(begin, end, init, binary_op); // tuple(30, 'z')

    return 0;
}
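
The transform_iterator listed above applies a function to each element as it is read, so a transformed range can be consumed without storing it. The sketch below is not from the original slides; sum_of_negated is a hypothetical function name.

```cuda
#include <thrust/iterator/transform_iterator.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>

// Hypothetical helper (not from the slides): sums the negated values
// of a vector without writing the negated values to memory.
int sum_of_negated()
{
    thrust::device_vector<int> v(3);
    v[0] = 10; v[1] = 20; v[2] = 30;

    // each element is negated on the fly while reduce reads it:
    // (-10) + (-20) + (-30) = -60
    return thrust::reduce(
        thrust::make_transform_iterator(v.begin(), thrust::negate<int>()),
        thrust::make_transform_iterator(v.end(),   thrust::negate<int>()));
}
```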
Troubleshooting
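Make sure that files that #include Thrust have a .cu extension.
With other extensions (e.g. .cpp), nvcc hands the file to the host compiler
without CUDA compilation, which typically produces error messages (unless the
file is compiled with nvcc's -x cu option).
Some C++ template constructs may not be supported by nvcc.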
