Computing with GPUs and CUDA

Slides: 2.1 Computing with GPUs and CUDA-1.pdf

Transcription of the video lecture

Slide 2 – Four Key Points

There are four main points to take away from this lecture.

  • First, general-purpose computing on GPUs (GPGPU) required reformulating computational problems in terms of graphics primitives and using graphics APIs.
  • Second, since 2006, GPU computing frameworks allow us to go beyond GPGPU by bypassing graphics APIs and ignoring the underlying graphics concepts.
  • Third, there are three major approaches for programming GPUs. You can choose between low-level, compiler-directive, and library approaches.
  • Finally, CUDA is a framework for parallel computing on NVIDIA GPUs, based on extending C/C++ for programming GPUs, and it is the framework I will be using in this course.

 

Slide 3 – Timeline of GPU Computing

As a first step in this lecture, I would like to give you a historical perspective on GPU computing. The first NVIDIA GPU, the GeForce 256, was released in 1999. However, there was no public API to program such GPUs. For this reason, it was impossible for a user like us to program a GPU. We had to wait until 2003 for the first NVIDIA GPUs that could be programmed using graphics APIs, like OpenGL. In 2006, NVIDIA introduced CUDA, which allowed us programmers to write code without using graphics APIs but instead using parallel computing paradigms. While we will work a little bit with OpenGL in the next module, we focus only on GPU programming in this module.

 

Slide 4 - GPGPU Computing (2003)

As we saw in the previous slide, GPGPU was first introduced in 2003. GPGPU is the use of the GPU to perform computation in applications traditionally handled by the CPU. Examples of traditional CPU applications are computer simulations in fluid dynamics, weather forecasting, biology, and so on. GPGPU emerged in 2003 with the introduction of two key features that allow for general-purpose computing: programmable shaders and support for floating point on graphics processors. Before that, GPUs didn't have programmable shaders and used fixed-point arithmetic.

 

Slide 5 - GPGPU Computing – Do Computing Like It Were Graphics …

From 2003 on, we were able to program a GPU. But what was the problem? The problem was that we could program GPUs to solve general-purpose problems only by using graphics APIs, like OpenGL and DirectX. For this reason, if you wanted your simulation code to run on a GPU, you had to program it as if it were a graphics application, using graphics primitives and APIs. We will see in the next module that graphics APIs provide a lot of functions for matrix and vector calculations, so we could use these operations when designing our general-purpose application.

 

Slide 6 - GPU Computing – Computing without Graphics APIs (2006)

In 2006, NVIDIA introduced CUDA to bypass graphics APIs and ignore the underlying graphics concepts. We could simply program in C or C++ and use threads to make our code run in parallel on the GPU streaming processors. CUDA uses concepts from high-performance computing, like threading and shared-memory programming, instead of graphics APIs, and it does not require explicit conversion of the data to a graphical form.
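
To make this concrete, here is a minimal sketch of what that looks like (this example is mine, not from the slides): a kernel written in extended C++ that many GPU threads execute in parallel, each handling one element of a vector addition.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Kernel: each GPU thread computes one element of c = a + b.
    __global__ void add(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));  // memory visible to CPU and GPU
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        add<<<(n + threads - 1) / threads, threads>>>(a, b, c, n);  // parallel launch
        cudaDeviceSynchronize();  // wait for the GPU to finish

        printf("c[0] = %f\n", c[0]);  // 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }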

 

Slide 7 - GPU Computing: Three Major Approaches

When it comes to programming a GPU, we can choose among three different approaches that differ in the level of control they give you over the hardware and in the number of lines of code you need to write. The three approaches are the low-level, compiler-directive, and library approaches. Low-level approaches, like CUDA, give you more control over the hardware. To program the same algorithm, higher-level approaches allow us to write less code than low-level approaches. One important point to mention is that you don't need to commit to one approach over another: you can mix different approaches, as they can interoperate, as the sketch below shows.
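
As a small, hedged illustration of this interoperability (the names and sizes here are my own), the sketch below creates and fills a vector with the Thrust library and then hands the same device memory to a hand-written, low-level CUDA kernel:

    #include <thrust/device_vector.h>
    #include <thrust/sequence.h>
    #include <cstdio>

    // Low-level CUDA kernel operating on memory owned by a library container.
    __global__ void scale(float *x, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= s;
    }

    int main() {
        const int n = 1024;
        thrust::device_vector<float> v(n);     // library approach: Thrust
        thrust::sequence(v.begin(), v.end());  // fill with 0, 1, 2, ...

        float *raw = thrust::raw_pointer_cast(v.data());
        scale<<<(n + 255) / 256, 256>>>(raw, 2.0f, n);  // low-level approach: CUDA
        cudaDeviceSynchronize();

        printf("v[3] = %f\n", (float)v[3]);  // 6.0: the two approaches cooperated
        return 0;
    }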

 

Slide 8 - Low-Level Approaches: CUDA and OpenCL

NVIDIA CUDA is one of the most popular frameworks in the low-level category. CUDA is a parallel computing platform created and owned by NVIDIA. That also means that CUDA is a proprietary framework targeting only NVIDIA computing platforms; simply put, there is no CUDA for GPUs that are not NVIDIA. Instead, OpenCL – be careful here, don't confuse it with OpenGL, which is for graphics – allows us to program a whole range of GPUs and other accelerators, like FPGAs. OpenCL specifies a programming language, based on C99, for programming these devices. CUDA and OpenCL have many concepts in common, but they use different names for them, just to make it more fun. Here you can see a table with translations from CUDA terms to OpenCL terms: for example, a CUDA thread is an OpenCL work-item, a CUDA thread block is an OpenCL work-group, and CUDA shared memory is OpenCL local memory.

 

Slide 9 - Compiler Directives: OpenACC and OpenMP

In the compiler-directive approach, the programmer introduces annotations in the code, called compiler directives or pragmas, to tell the compiler which parts of the code should be executed on the GPU. You can see an example in the code presented in the right part of the slide: we add a compiler directive, using the pound sign, telling the compiler to divide the iterations of the loop among threads and run them on the GPU. There are two compiler-directive approaches: OpenACC and OpenMP. To tell you the truth, I am not really a big fan of this; it feels a bit like GPU programming for dummies. However, I am also guilty of having published a few papers on OpenACC performance, so I guess I am not the right person to tell you about this.
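
As a hedged reconstruction of the kind of annotated loop the slide shows (the exact code on the slide may differ), here is a saxpy loop offloaded to the GPU with a single directive; the commented line gives the OpenMP equivalent:

    #include <stdio.h>

    // The pragma asks the compiler to offload the loop to the GPU and
    // split its iterations among threads; the loop body is plain C.
    void saxpy(int n, float a, float *x, float *y) {
        // OpenACC version; the OpenMP equivalent directive would be:
        // #pragma omp target teams distribute parallel for
        #pragma acc parallel loop
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void) {
        float x[4] = {1, 2, 3, 4}, y[4] = {0, 0, 0, 0};
        saxpy(4, 2.0f, x, y);
        printf("y[3] = %f\n", y[3]);  // 8.0
        return 0;
    }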

 

Slide 10 - Libraries: Thrust and ArrayFire

There are also libraries that can be used for GPU programming. The two most popular libraries are Thrust and ArrayFire. Thrust is a C++ library, and it is basically the C++ Standard Template Library (STL) for GPUs. ArrayFire is also rather popular; it provides many functions, with interfaces for the most popular programming languages.
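
For a brief taste of the Thrust style (my own example, not from the slides): containers and algorithms mirror the STL, but the work runs on the GPU.

    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/reduce.h>
    #include <cstdio>

    int main() {
        thrust::device_vector<int> d(4);   // a vector living in GPU memory
        d[0] = 3; d[1] = 1; d[2] = 4; d[3] = 1;

        thrust::sort(d.begin(), d.end());                  // sort on the GPU
        int sum = thrust::reduce(d.begin(), d.end(), 0);   // reduction on the GPU

        printf("sum = %d, smallest = %d\n", sum, (int)d[0]);  // sum = 9, smallest = 1
        return 0;
    }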

 

Slide 11 - Why Are We Going to Use CUDA?

In this module, we are going to use CUDA. A legitimate question is: why is that? Well, there are two main reasons. First, CUDA is an interface providing all the basic concepts we need for basic GPU programming; in fact, we can move later to higher-level interfaces or to OpenCL rather easily. The second reason is rather obvious: CUDA provides the best performance on NVIDIA GPUs, and in this course we use NVIDIA GPUs. OpenCL for NVIDIA GPUs is implemented on top of CUDA, so, yes, we are certainly not getting a speed-up from it…

 

Slide 12 - CUDA Framework

In most of the systems we are using, CUDA is already installed. The CUDA framework has three main components: first, the driver, which is the low-level software that controls the graphics card; second, the toolkit, which includes the nvcc CUDA compiler among other things; and third, the SDK, which includes code examples and utilities.
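
On a system where CUDA is installed, you can check the first two components from the command line; the exact output depends on your setup:

    nvidia-smi       # reports the installed driver version and the GPUs it manages
    nvcc --version   # reports the version of the toolkit's CUDA compiler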

 

Slide 13 - CUDA APIs

CUDA provides two APIs: the runtime API, which is rather simple, and the driver API, which is the low-level CUDA API. One important point is that these APIs are mutually exclusive: you need to decide which one you want to use. We will only use the runtime API in this course. Yes, we are low-level programmers, but there is a chance to go even lower by using the CUDA driver API…
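
For a feel of the runtime-API style we will use, here is a minimal sketch (with illustrative names): everything goes through simple cuda* calls and the <<<...>>> launch syntax, whereas the driver API would instead require explicit cuInit, cuModuleLoad, and cuLaunchKernel steps.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void set(int *p) { *p = 42; }  // trivial kernel

    int main() {
        int *d_p, h_p = 0;
        cudaMalloc(&d_p, sizeof(int));        // runtime API: one call to allocate
        set<<<1, 1>>>(d_p);                   // runtime-API launch syntax
        cudaMemcpy(&h_p, d_p, sizeof(int), cudaMemcpyDeviceToHost);
        printf("%d\n", h_p);                  // 42
        cudaFree(d_p);
        return 0;
    }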

 

Slide 14 - CUDA Virtual Processor as nvcc Flag

When we compiled our first CUDA code, you might have noticed that we asked you to use the -arch=… (or --gpu-architecture=…) flag with nvcc. But what is that? With this flag, we provide the compiler with the virtual processor architecture of our GPU. A virtual processor is nothing more than a set of CUDA programming features our GPU supports. For instance, in the table on the right you can see the features of different virtual processor architectures.
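
For example (the architecture numbers below are only illustrative; pick the ones matching your GPU):

    # Target the compute_70 virtual architecture and generate real code for sm_70:
    nvcc -arch=compute_70 -code=sm_70 program.cu -o program

    # Common shorthand: -arch=sm_70 implies compute_70 as the virtual architecture.
    nvcc -arch=sm_70 program.cu -o program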

 

Slide 15 - CUDA C/C++ - CUDA Fortran - CUDA Python/PyCUDA

CUDA was born as an extension to the C/C++ languages, and the nvcc compiler is an LLVM-based C/C++ compiler. In addition, there are also Fortran and Python interfaces, if you want to use them. However, these interfaces, especially the Fortran one, might not support all the CUDA features.

 

Slide 16 – To Summarize

In summary, in this lecture we touched upon four main points. First, GPGPU initially required reformulating computational problems in terms of graphics primitives and using graphics APIs, such as OpenGL. Second, GPU computing frameworks, like CUDA, allow us to bypass graphics APIs and use more "HPC-style" interfaces based on threads. Third, there are three major approaches for programming GPUs: low-level, compiler-directive, and library approaches. Fourth, we are going to use CUDA, a framework for parallel computing on NVIDIA GPUs.