Intro to GPUs

Slides

Download 1.1 Intro to GPUs.pdf

Video Transcription

Slide 3 – GPUs

What are GPUs?

GPU stands for Graphics Processing Unit.

A GPU is a specialized microcircuit designed to accelerate the creation and manipulation of images in video frames intended for display devices.

A specialized microcircuit is hardware specifically designed to solve a certain kind of task or problem at maximum speed. However, it might not be particularly good or efficient at solving other kinds of tasks. A GPU is specialized in solving graphics problems, while it might not be good, for instance, at applications characterized by little parallelism and irregular data access.

In summary, when it comes to graphics tasks, GPUs are computational monsters that render real monsters on your display device!

 

Slide 4 – GPU Design Motivation: Process Pixels in Parallel

What are GPUs good for?

Well, GPUs are designed to process images at maximum speed. Image processing is a typical example of a data-parallel problem, where operations on data (in this case, the pixels of the image) are performed in parallel.

As an example, if we want to manipulate a frame of a video in 1080i or 1080p mode, we need to process 1920 × 1080 pixels, that is, roughly 2 million pixels. If we want to render in real time at something between 30 and 60 frames per second, we have at most approximately 33 ms per frame. A GPU can do this: process 2 million pixels in less than 33 ms. WOW!

Many image processing problems are characterized by per-pixel calculations that are independent of the calculations for other pixels. For this reason, these calculations can easily be performed in parallel without requiring synchronization and/or sophisticated control.
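
To make this concrete, here is a minimal CUDA sketch of such an independent per-pixel operation. The kernel name, the brightness adjustment, and the flat 8-bit grayscale layout are illustrative assumptions, not something from the slides:

    // Each thread brightens exactly one pixel; no thread ever needs
    // data produced by another thread, so no synchronization is required.
    __global__ void brighten(unsigned char *img, int numPixels, int delta)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global pixel index
        if (i < numPixels) {
            int v = img[i] + delta;        // independent per-pixel work
            img[i] = v > 255 ? 255 : v;    // clamp to the 8-bit range
        }
    }

    // Covering all ~2 million pixels of a 1920x1080 frame with
    // 256 threads per block:
    //   int n = 1920 * 1080;
    //   brighten<<<(n + 255) / 256, 256>>>(d_img, n, 20);

Each of the roughly two million pixels gets its own thread, which is exactly the kind of workload GPU hardware is built for.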

In summary, GPUs are good for applications that are data-parallel in nature and do not require synchronization and/or sophisticated control.

 

Slide 5 – GPUs are more and more present in HPC!

In the last decade, GPUs have become one of the main workhorses in HPC supercomputing centers that aim to lower the power and energy consumption of supercomputers.

But why is that?

Well, the reason is that GPUs provide lots of parallelism at a low clock speed. Keep in mind that the power consumption of a processor is roughly a cubic function of its clock speed. So, by increasing the available parallelism while at the same time decreasing the clock speed, GPUs provide an excellent ratio of floating-point operations per second per watt and are a very power-efficient architecture.
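
To see where the cubic relation comes from, here is the standard first-order model of dynamic power, a textbook approximation rather than a formula from the slides:

    % Dynamic power of a CMOS processor with switched capacitance C,
    % supply voltage V, and clock frequency f (first-order model):
    \[ P_{\mathrm{dynamic}} = C \, V^{2} f \]
    % Since the supply voltage must scale roughly with the frequency,
    % V \propto f, it follows that
    \[ P_{\mathrm{dynamic}} \propto f^{3}. \]

Under this model, halving the clock speed and doubling the parallelism delivers the same throughput at roughly one quarter of the power, which is exactly the trade-off GPUs exploit.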

It is not a surprise that the Green500 list, the list that ranks supercomputers according to their power efficiency, features supercomputers with GPUs in all of the first six places.

Slide 6 – Where do you find GPUs?

We can broadly divide GPUs into two categories: integrated GPUs and dedicated GPUs.

As the name says, integrated GPUs are built into the main processor. These are the most common GPUs in laptops like the one I am using. You have probably heard about Intel HD or Iris graphics. You might want to check your laptop's specifications and figure out whether you have one of these integrated GPUs.

I can also check mine … let's see. So I exit PowerPoint, open About This Mac, and yes, I do have an Iris graphics card.

The second category of GPUs is the dedicated GPU category. This is the category we are more interested in. Dedicated GPUs are standalone GPUs with their own processor and memory. They require more power (yes, we need to pay energy to move data between the GPU and the CPU), but these are the GPUs that provide the highest performance.

In this course, we are going to use dedicated GPUs.

What is the main difference between integrated and dedicated GPUs? Dedicated GPUs have their own memory while integrated GPUs share memory with the CPU.

Slide 7 – Vendors of dedicated GPUs

There are several dedicated GPU vendors. Among them, probably the most successful when it comes to GPUs for HPC and deep-learning applications is NVIDIA.

Slide 8 – GPUs as Accelerators

One of the problems with dedicated GPUs is that they don't run an operating system.

How do we solve this problem? The solution is a hybrid system consisting of a CPU and a GPU connected via a PCIe bus.

In this hybrid configuration, the CPU acts as the host and provides the GPU with basic management and services, since the GPU doesn't run an OS.

In this configuration, the GPU acts as an accelerator, or co-processor, and provides the computational power and increased parallelism.
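
This division of labor is visible even in the simplest CUDA program. Below is a minimal host-side sketch (buffer sizes and names are illustrative assumptions): the CPU code allocates memory, moves data, and launches work, while the GPU only executes the kernel.

    #include <cuda_runtime.h>
    #include <stdlib.h>

    // Trivial per-pixel kernel, as in the earlier sketch.
    __global__ void brighten(unsigned char *img, int n, int delta)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) img[i] = min(img[i] + delta, 255);
    }

    int main(void)
    {
        int n = 1920 * 1080;
        unsigned char *h_img = (unsigned char *)calloc(n, 1); // host (CPU) memory
        unsigned char *d_img;
        cudaMalloc(&d_img, n);                                // device (GPU) memory

        // The host manages the accelerator: it moves data over the PCIe bus,
        cudaMemcpy(d_img, h_img, n, cudaMemcpyHostToDevice);
        // launches the computation on the GPU,
        brighten<<<(n + 255) / 256, 256>>>(d_img, n, 20);
        // and copies the result back.
        cudaMemcpy(h_img, d_img, n, cudaMemcpyDeviceToHost);

        cudaFree(d_img);
        free(h_img);
        return 0;
    }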

Slide 9 – The weakness of GPUs

In this course, we are going to use dedicated GPUs connected to the CPU via a PCIe bus. Data movement between the GPU and the CPU, and vice versa, is relatively slow. When I say relatively slow, I mean compared to the speed of moving data between the GPU and its own memory, called GDRAM, and between the CPU and DRAM.

The figure in this slide shows the data transfer bandwidth between different parts of a system comprising an NVIDIA Tesla K40 connected to the CPU via third-generation PCIe. From this figure, you can see that the bandwidth between GPU and CPU memory is roughly ten times smaller than the bandwidth to GDRAM.
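
If you have access to a machine with an NVIDIA GPU, you can measure this yourself. Here is a small sketch that times a host-to-device copy with CUDA events; the 256 MB buffer size is an arbitrary choice.

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        size_t bytes = 256u << 20;       // 256 MB test buffer
        void *h_buf, *d_buf;
        cudaMallocHost(&h_buf, bytes);   // pinned host memory for a fair test
        cudaMalloc(&d_buf, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("Host-to-device bandwidth: %.1f GB/s\n", bytes / ms / 1.0e6);

        cudaFree(d_buf);
        cudaFreeHost(h_buf);
        return 0;
    }

On a PCIe 3.0 ×16 link you would expect on the order of 10 GB/s here, roughly an order of magnitude below the on-board memory bandwidth of a card like the K40, consistent with the figure on the slide.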

The good news is that this problem is greatly alleviated by the new NVIDIA GPUs arriving on the market! These GPUs provide a much faster link between GPU and CPU, called NVLink.

 

Slide 10 – To Summarize

In summary, in this lecture, I have focused on four main points.

First, I have emphasized that GPUs are specialized hardware, initially designed for graphics applications and now widely used in many other areas. Second, GPUs can either be integrated into the processor or come as a dedicated chip. In this course, we focus on dedicated GPUs. Third, when using dedicated GPUs, we need a CPU that acts as the host and provides OS services to the GPU. Fourth, moving data between GPU memory and CPU memory is relatively slow.

In the next lecture, we are going to look at the basic concepts and philosophy driving GPU architecture design.

So, talk to you in a bit!