
CMP 464/788 – Final Spring 2020

Do 8 of the 16 problems.

1) Explain what my algorithm for multiplying a lower triangular Toeplitz matrix
by a vector achieves and what trade-off it makes.

2) Describe the Odd-Even Transposition Sort algorithm, indicate how fast it runs,
and explain how you know it is this efficient.

3) Describe the Shear Sort algorithm and indicate how fast it runs.

4) Given an N x N grid, where N is a positive power of 2, Shear Sort never
requires more than lg N + 1 row sorts and lg N column sorts.
Demonstrate that this is true.

5) Using big O notation, express the number of operations needed to perform
an FFT on an N element vector. How many parallel steps are required if you
have N parallel processors?

6) How many processors would you need to pipeline the FFT on N elements?
Explain the circumstances under which pipelining the FFT would be
beneficial. Describe the benefit.

7) Trace computation of the product Z0(t) x, where t = <2, 3, 5, 0> and
x = <4, 2, 0, 0> via FFT/IFFT. Recall: Z0(v) is the lower triangular Toeplitz
matrix defined by its first column vector v.

8) Trace computation of the product Z1(c) x, where c = <4, 2, 3, 1> and
x = <3, 5, 8, 3> via FFT/IFFT. Recall: Z1(v) is the circulant matrix defined by its
first column vector v.

9) Solve the circulant system of equations Cx = b where C = Z1(c), given that
c = <2, 4, 3, 1> and b = <34, 28, 26, 32>.

10) Trace computation of the product (8x^2 + 3x + 2)(7x - 5) via FFT/IFFT.

11) In today’s CPUs and GPUs, transistors are allocated very differently.
What is the main difference in allocation?

12) What is latency?

13) NVIDIA partitions shared memory into multiple banks.
Why did they do this?

14) Each thread in a warp executes the same instruction stream. How then
does each thread in a warp perform a different parallel task?

15) Why does NVIDIA claim that CUDA programmers should issue more
threads than there are thread processors?

16) FFTs are always provided in libraries for parallel hardware. A lot of
work goes into devising efficient implementations of these algorithms for
each new hardware design. Why?
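
Notation reference for problems 7-9: the sketch below illustrates the two "Recall" definitions, Z0(v) and Z1(v), on made-up vectors that are not taken from any problem. It is only a minimal illustration under stated assumptions: integer entries, a vector length that is a power of 2, and a textbook recursive radix-2 FFT. All function names (fft, ifft, circulant, lower_toeplitz, matvec, circulant_times, lower_toeplitz_times) are hypothetical helpers, not the course's own code, and the zero-padding route for Z0 is one common approach that may differ from the algorithm presented in class.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (assumes len(a) is a power of 2)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1  # the inverse transform uses conjugate twiddles
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def ifft(a):
    """Inverse FFT: conjugate-twiddle transform scaled by 1/n."""
    n = len(a)
    return [value / n for value in fft(a, invert=True)]


def circulant(v):
    """Z1(v): circulant matrix whose first column is v."""
    n = len(v)
    return [[v[(i - j) % n] for j in range(n)] for i in range(n)]


def lower_toeplitz(v):
    """Z0(v): lower triangular Toeplitz matrix whose first column is v."""
    n = len(v)
    return [[v[i - j] if i >= j else 0 for j in range(n)] for i in range(n)]


def matvec(m, x):
    """Direct O(n^2) matrix-vector product, used only as a cross-check."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def circulant_times(v, x):
    """Z1(v) x via FFT: multiply the transforms pointwise, then invert."""
    product = [a * b for a, b in zip(fft(v), fft(x))]
    # Rounding is valid only because the inputs are assumed to be integers.
    return [round(c.real) for c in ifft(product)]


def lower_toeplitz_times(v, x):
    """Z0(v) x via FFT: zero-pad to length 2n, multiply, keep the first n."""
    n = len(v)
    pad = [0] * n
    return circulant_times(list(v) + pad, list(x) + pad)[:n]


# Made-up example vectors (assumptions, not exam data).
v = [1, 2, 0, 1]
x = [3, 0, 1, 2]
print(matvec(circulant(v), x), circulant_times(v, x))
print(matvec(lower_toeplitz(v), x), lower_toeplitz_times(v, x))
```

Each printed line shows the direct product followed by the FFT-based product, so the two routes can be compared entry by entry.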
