Ph.D. School in

Scientific GPU Computing

Copenhagen, 23 to 27 May 2011

Syllabus

The course content is divided into five parts, detailed below.

Part 1 : Introduction to GPU basics

- Introduction to trends in graphics processing unit (GPU) hardware.
- Progression of NVIDIA GPUs.
- Background information & history on GPGPU (general purpose GPU) computing.
- Hardware considerations in GPU design.
- General Purpose GPU computing community & resources.
- CUDA programming basics.
- CUDA programming model and terminology.
- Asynchronous CPU/GPU compute model.
- Work flow for a GPGPU computation.
- Allocating storage arrays on the GPU device.
- Transferring data between host and device.
- The CUDA thread hierarchy.
- Invoking a CUDA kernel through special syntax.

   Hands-on Lab 1 : Mandelbrot Generator (CUDA)

Part 2 : Memory hierarchy, optimizations and libraries

- A simple CUDA kernel to add two vectors together.
- Catching CUDA errors.
- Timing CUDA kernels.
- How to compile and link CUDA programs using the nvcc compiler.
- Non-uniform memory architecture of GPGPU devices.
- Optimization techniques and case examples.
- Strategies for achieving high performance of CUDA (and OpenCL) kernels.
- Overview of NVIDIA's CUDA Toolkit.
- The nvcc compilation chain and intermediate compiler files.
- Debugging kernels with NVIDIA's cuda-gdb debugger.
- Profiling CUDA kernels with NVIDIA's Visual Profiler.
- Profiling CUDA kernels from the command line.
- Compiler optimization options (CUDA and OpenCL).

   Hands-on Lab 2 : Matrix-matrix operation (CUDA)

Part 3 : Programming tools and math libraries

- Building blocks for high-performance computing.
- CUDA Programming Tools.
- Profiling tools.
- Debugging tools and strategies.
- Standard libraries.
- Scripting for GPUs via Python (PyCUDA).

   Hands-on Lab 3 : Matrix-matrix operation via cuBLAS (CUDA)

Part 4 : OpenCL

- GPU hardware architectures (NVIDIA and AMD).
- Background to OpenCL: the OpenCL standard for heterogeneous computing on multicore architectures.
- CUDA vs. OpenCL (syntax, functionality, terminology, memory models).
- CUDA vs. OpenCL case examples.
- Scripting for GPUs via Python (PyOpenCL).
- Cross platform performance comparison.
- Porting CUDA to OpenCL using Swan.

   Hands-on Lab 4 : Mandelbrot Generator and Matrix-matrix operation (OpenCL)

Part 5 : GPU-based advanced PDE solvers

- GPU-accelerated Discontinuous Galerkin methods.
- Discontinuous Galerkin methods for building advanced solvers.
- Scientific computing challenges.
- Why GPUs Matter: trends.
- Parallel partitioning on multi-GPU systems.
- High-performance scientific computations.

   Hands-on Lab: Initiation of Project work (CUDA/OpenCL)

Relevant textbook (background reading):

David B. Kirk, Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann, 2010.

This book is now available in the bookshop at DTU: Polyteknisk Boghandel in Building 101.

Ph.D. School in Scientific GPU Computing, Richard Petersens Plads, DTU - Building 321, DK-2800 Lyngby
dcamm@mat.dtu.dk