Introduction to OpenACC - Data Management
The lecture slides are available for download here.
Lecture Transcripts
Slide 2 – Four Key Points
This lecture has four main points. First, we are going to look at OpenACC data regions. Defining data regions allows us to leave data in GPU device memory and reuse it across different functions. Second, data regions can be implicit, in which case the compiler automatically takes care of copying memory objects to and from GPU memory. Third, data regions can be explicit, in which case we use #pragma acc data to tell the compiler which memory objects to move in and out and which to leave on the GPU. Finally, we use data clauses to tell the compiler which memory objects to move to and from GPU memory.
Slide 3 – OpenACC memory model
In the OpenACC memory model, the host memory and the device memory are treated as separate. It is assumed that the host is not able to access device memory directly and the device is not able to access host memory directly. So the OpenACC model does not assume unified memory access. Just like in CUDA C, in OpenACC input data needs to be transferred from the host to the device before kernels are launched, and result data needs to be transferred back from the device to the host. However, unlike in CUDA C/C++, where programmers need to code data movement explicitly through API calls, in OpenACC they can just annotate which memory objects need to be transferred. The OpenACC compiler will automatically generate code for memory allocation, copying, and de-allocation.
Slide 4 – OpenACC Data Management
OpenACC has two modes of dealing with data: implicit and explicit data regions. With implicit data regions, we basically don’t do anything and we let the compiler decide what to do. This is simple, but in some cases we want more performance, so we tell the compiler that we are going to reuse the data on the GPU and that it should leave it there. We do this with explicit data regions, where we tell the compiler what to do.
Slide 5 – Implicit Data Management
This code is an example of an implicit data region. How do we know that it is implicit? Well, we don’t give any information about the data. The compiler can determine the size of a and b, and it will copy them to the device along with the kernel. So, in this case, the data movement of a and b is implicit.
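The slide's code is not reproduced in this transcript, so the following is only a minimal sketch of what an implicit data region can look like. The function name, the array length N, and the doubling operation are assumptions made for illustration; only the idea of fixed-size arrays a and b with no data clauses comes from the lecture.

```c
#include <assert.h>

#define N 1000  /* hypothetical array length, not from the slides */

/* Doubles every element of a into b inside a compute region that has no
   data clauses, checks the result on the host, and returns b[N - 1]. */
float double_on_device(void)
{
    float a[N], b[N];

    for (int i = 0; i < N; ++i)
        a[i] = (float)i;

    /* No data clauses here: a and b have a fixed, known size, so the
       compiler can generate the host/device copies on its own. */
    #pragma acc kernels
    for (int i = 0; i < N; ++i)
        b[i] = 2.0f * a[i];

    /* Back on the host: verify what came out of the compute region. */
    for (int i = 0; i < N; ++i)
        assert(b[i] == 2.0f * a[i]);

    return b[N - 1];
}
```

A compiler without OpenACC support simply ignores the pragma and runs the loop on the host, so the same source works in both cases.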
Slide 6 – Explicit Data Regions
To get an explicit data region, we add a #pragma acc data line followed by a pair of braces. In this way, we define an OpenACC data region. Why do we want data regions in some parts of the code? Well, data that is copied to the device for a data region persists on the device for the whole region. This lets us leave data in GPU device memory and reuse it, so we avoid copying data back and forth between the host and the GPU.
Slide 7 – Explicit Regions II
The example on this slide shows the use of the copyin clause to tell the compiler which variables to copy to GPU memory. The compiler will then copy the arrays a and b to device memory. We can tell the compiler the size of an array by putting it in the brackets after the array name. However, in this case we omit the size, so the compiler has to work out the sizes itself. An important point for us is that we can use a or b later inside the data region for another kernel without having to copy them in again. In this way, we gain performance.
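As a rough sketch of this reuse pattern, the function below copies a and b to the device once and then uses them in two separate kernels. The function name and the two computations (a dot product and a sum) are hypothetical. Because a and b are passed as pointers here, the sketch gives the array sections explicitly as a[0:n] and b[0:n] (start index and length), unlike the slide's example, which omits the sizes.

```c
#include <assert.h>

/* Computes dot(a, b) + sum(a) using two kernels that share one data region. */
float dot_plus_sum(const float *a, const float *b, int n)
{
    assert(a != 0 && b != 0 && n > 0);

    float dot = 0.0f;
    float sum = 0.0f;

    /* a and b are copied to the device once, at the start of the region. */
    #pragma acc data copyin(a[0:n], b[0:n])
    {
        /* First kernel: uses both arrays. */
        #pragma acc parallel loop reduction(+:dot)
        for (int i = 0; i < n; ++i)
            dot += a[i] * b[i];

        /* Second kernel: reuses a without another host-to-device copy. */
        #pragma acc parallel loop reduction(+:sum)
        for (int i = 0; i < n; ++i)
            sum += a[i];
    }

    return dot + sum;
}
```

Without the enclosing data region, each of the two kernels would have to arrange its own transfer of a, which is exactly the repeated copying the data region is meant to avoid.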
Slide 8 – Other Data Clauses
Along with the copyin clause, there are other data clauses. For instance, we can use copyout to create an array on the device and copy it to the host at the end of the region. Remember that the device array is not initialized from the host, so we still need to fill it on the device. We can use the create clause to create an array on the device without any copying. This is particularly useful for temporary arrays in GPU memory. The present clause tells the compiler that the data is already on the device. This clause is typically used in functions that are called while the data is already on the device, and it helps performance. Finally, the copy clause copies data to the device at the start of the region and back to the host at the end. Use it only when needed, because the extra transfers can hurt performance.
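To see how these clauses can fit together, here is a small sketch under stated assumptions: the function names, the scratch buffer tmp, and the arithmetic are all made up for illustration and do not come from the slides. It uses copyin for the input, create for a temporary array, copyout for the result, present inside a helper function, and copy for an in-place update.

```c
#include <assert.h>

/* Helper called from inside a data region: x is expected to be on the
   device already, so present avoids any new transfer. */
static void scale_in_place(float *x, int n, float factor)
{
    #pragma acc parallel loop present(x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] *= factor;
}

/* out[i] = 0.5 * in[i]^2 + 1, using tmp as a device-side temporary. */
void half_square_plus_one(const float *in, float *out, float *tmp, int n)
{
    assert(in != 0 && out != 0 && tmp != 0 && n > 0);

    /* in:  host -> device at region entry only
       tmp: allocated on the device, never copied
       out: allocated on the device, device -> host at region exit */
    #pragma acc data copyin(in[0:n]) create(tmp[0:n]) copyout(out[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i)
            tmp[i] = in[i] * in[i];

        scale_in_place(tmp, n, 0.5f);

        /* out is written on the device before it is copied back. */
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i)
            out[i] = tmp[i] + 1.0f;
    }
}

/* In-place update: copy moves x in at the start and back out at the end. */
void add_one(float *x, int n)
{
    assert(x != 0 && n > 0);

    #pragma acc parallel loop copy(x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] += 1.0f;
}
```

Note that the contents of tmp on the host after the region should not be relied on, since create never copies the temporary back.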
Slide 9 – To summarize
In summary, this lecture was about OpenACC data regions. Data regions can be implicit, when the compiler automatically takes care of copying memory objects to and from GPU memory. Data regions can also be explicit, when we use #pragma acc data to tell the compiler which memory objects to move in and out and which to leave on the GPU. Explicit data regions can help avoid unnecessary data transfers and increase performance.