Project Description
Project Background
The Particle-in-Cell (PIC) method is a computational technique for simulating electron and proton dynamics in space and astrophysical plasmas and in fusion devices.
Electrons and protons are simulated as charged particles moving under the effect of the electric field (E) and the magnetic field (B). At each simulation step, each particle's position coordinates (x, y and z) and velocity components in the three directions (u, v and w) are updated by solving the so-called equation of motion. For each particle we solve d vp/dt = (q/m) (Ep + vp × Bp) and d xp/dt = vp, where Ep and Bp are the electric and magnetic fields at the particle position, dt is the time step, and q/m is the charge-to-mass ratio of the particle.
In the PIC method, the three components of the electric field (Ex, Ey and Ez) and of the magnetic field (Bxn, Byn and Bzn) are defined on the nodes of a three-dimensional grid. The electric and magnetic fields at the particle position (Ep and Bp) are computed by linear interpolation of the values defined on the grid nodes.
The computational phase that calculates the new particle position and velocity, together with the electric and magnetic fields at the particle position, is called the "particle mover" or "particle pusher".
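To make the equations concrete, here is a deliberately simplified sketch of one explicit update step for a single particle. This is not the actual mover_PC (which uses a predictor-corrector scheme); variable names are illustrative, and Epx..Bpz are assumed to be the already-interpolated field components at the particle position.

// Simplified sketch of one explicit step of the equations of motion.
// NOT the actual mover_PC (that one uses a predictor-corrector scheme).
void push_particle(double *x, double *y, double *z,    /* position       */
                   double *u, double *v, double *w,    /* velocity       */
                   double Epx, double Epy, double Epz, /* E at particle  */
                   double Bpx, double Bpy, double Bpz, /* B at particle  */
                   double qom, double dt)              /* q/m, time step */
{
    /* d vp/dt = (q/m) (Ep + vp x Bp) */
    double du = qom * (Epx + (*v) * Bpz - (*w) * Bpy);
    double dv = qom * (Epy + (*w) * Bpx - (*u) * Bpz);
    double dw = qom * (Epz + (*u) * Bpy - (*v) * Bpx);
    *u += du * dt; *v += dv * dt; *w += dw * dt;
    /* d xp/dt = vp */
    *x += (*u) * dt; *y += (*v) * dt; *z += (*w) * dt;
}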
A second phase of the PIC method is the interpolation from particles to the grid (interpP2G). Each particle deposits part of its charge, with appropriate weights, to the nodes of the cell in which it resides, to compute ten quantities on the grid: rho (charge density), Jx, Jy, Jz (current density), and pxx, pxy, pxz, pyy, pyz, pzz (pressure tensor).
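On the GPU, the tricky part of interpP2G is that each particle writes to the 8 nodes of its cell, so threads handling different particles may update the same node concurrently and the deposits must be atomic. A hedged sketch of the pattern for rho only (the other nine moments follow the same pattern); all names are illustrative, and get_idx is assumed to have been copied from Alloc.h and marked __device__ (see the indexing section below):

// Sketch of a per-particle deposit of charge q to the 8 nodes of its cell.
// wx/wy/wz are trilinear interpolation weights; (ix, iy, iz) is the cell's
// lower-corner node; nyn/nzn are the node counts of the last two dimensions.
// Note: atomicAdd on double requires compute capability 6.0 or newer.
__device__ void deposit_rho(double *rho_flat,
                            const double wx[2], const double wy[2],
                            const double wz[2],
                            int ix, int iy, int iz,
                            double q, int nyn, int nzn)
{
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++) {
                long idx = get_idx(ix + i, iy + j, iz + k, nyn, nzn);
                atomicAdd(&rho_flat[idx], q * wx[i] * wy[j] * wz[k]);
            }
}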
We have developed a serial C PIC code, called sputniPIC, starting from the iPIC3D code (Markidis et al., 2010). The code has a serial implementation of the particle mover, called mover_PC, and of the interpolation, called interpP2G.
The sputniPIC code is available here.
Project Objective
The overall goal of the project is to prepare sputniPIC to run on GPUs, and to measure and optimize its performance.
In particular:
- int mover_PC(struct particles* part, struct EMfield* field, struct grid* grd, struct parameters* param) should run on GPU
- void interpP2G(struct particles* part, struct interpDensSpecies* ids, struct grid* grd) should run on GPU
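A natural decomposition for both functions is one GPU thread per particle. A hypothetical launch skeleton follows; the actual kernel must receive the flattened particle, field and grid arrays as device pointers rather than the host-side structs (part->nop is assumed to hold the particle count):

// Hypothetical skeleton: one thread per particle.
__global__ void mover_PC_kernel(double *x, double *u, /* ... */ long nop)
{
    long i = blockIdx.x * (long)blockDim.x + threadIdx.x;
    if (i >= nop) return;  // guard: the grid may overshoot the particle count
    // ... gather E and B at particle i, then update its velocity and
    //     position exactly as in the serial mover_PC ...
}

// Launch with enough blocks to cover all particles:
// int threads = 256;
// int blocks  = (part->nop + threads - 1) / threads;
// mover_PC_kernel<<<blocks, threads>>>(d_x, d_u, /* ... */, part->nop);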
We also expect you to use timers and profilers to evaluate the performance and to plan the optimizations. As optimization techniques, we suggest pinned memory, CUDA streams, and kernel fusion (you can merge mover_PC and interpP2G into one kernel).
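For instance, allocating the particle arrays with cudaMallocHost gives pinned host memory, which both speeds up transfers and allows cudaMemcpyAsync to overlap copies with kernel execution on a CUDA stream. A minimal sketch of the pattern, with illustrative names and double precision (adjust to the typedefs the code uses):

// Pinned host memory + a CUDA stream for asynchronous copies (sketch).
long nop = 1000000;                 // number of particles (illustrative)
double *x, *d_x;
cudaMallocHost((void**)&x, nop * sizeof(double));   // pinned host buffer
cudaMalloc((void**)&d_x, nop * sizeof(double));     // device buffer

cudaStream_t stream;
cudaStreamCreate(&stream);

// Returns immediately; the copy can overlap with work on other streams.
cudaMemcpyAsync(d_x, x, nop * sizeof(double),
                cudaMemcpyHostToDevice, stream);
// ... launch kernels on the same stream ...
cudaStreamSynchronize(stream);      // wait for copies and kernels to finish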
Project Deliverables
- Design Document (due Dec. 8, max 1 page). You need to prepare a document describing your initial plan for porting sputniPIC to the GPU: in particular, which files and functions you are going to work on, which variables you will move to the GPU, which tools and platforms you will use in your project, and which optimizations you will apply.
- Final report and code (due Jan. 14, max 8 pages). You need to prepare a final report describing your work in porting sputniPIC to the GPU.
References
Markidis, Stefano, Giovanni Lapenta, and Rizwan-uddin. "Multi-scale simulations of plasma with iPIC3D." Mathematics and Computers in Simulation 80.7 (2010): 1509-1519.
Hints and suggestions
This section provides a number of hints and suggestions that will help you get started with the porting.
Source code and compilation
Study the particle mover and the interpolation, which are in Particles.cu. A Makefile is already provided for you; the compiled code will be located in the bin folder. The data structures are declared in header files stored in the include folder, for example Grid.h, EMfield.h and Particles.h.
Input files and execution of the simulation
We have provided an input file called GEM_2D.inp. The compiled code can be executed as follows:
./bin/sputniPIC.out inputfiles/GEM_2D.inp
Checking and visualization of results
VTK files will be written to the data folder. Ensure that the folder exists before you execute the program! The files are written in text format and can be read with any text editor. For example, you can use the rho-net output as a check of your implementation. If you are interested, you can install ParaView to visualize the simulation output. You can control the output frequency using the FieldOutputCycle parameter in the input file.
Memory allocation
The code uses many multidimensional arrays. They are often declared as a pointer to pointers to the actual memory allocation. This allows memory to be accessed using subscripts (e.g. x[i][j][k]). However, since these arrays are not allocated contiguously (i.e. not with a single malloc), it is difficult to copy the data to the GPU in a single memory copy. For this reason, we implemented a memory allocator that forms a pointer chain, which eventually points to a memory region that is allocated in 1D. We have already implemented this for the most basic arrays that you will need for the porting. They are all named (name)_flat. For example, XN_flat in Grid.h is where the data is actually stored, and XN is a pointer chain that points into XN_flat.
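Because XN_flat is a single contiguous allocation, it can be moved to the GPU in one copy. A minimal sketch, assuming double precision and that grd->nxn, grd->nyn and grd->nzn hold the node counts (check Grid.h for the actual member names and typedefs):

// One contiguous host-to-device copy of a flat grid array (sketch).
size_t n = (size_t)grd->nxn * grd->nyn * grd->nzn;

double *d_XN_flat;
cudaMalloc((void**)&d_XN_flat, n * sizeof(double));
cudaMemcpy(d_XN_flat, grd->XN_flat, n * sizeof(double),
           cudaMemcpyHostToDevice);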
Multidimensional array index in 1D
Since you will be copying the flat arrays to the GPU, they have to be accessed using 1D indexing. For this reason, we have created a set of helper functions (in Alloc.h) that compute the equivalent 1D index from multidimensional indices. For example:
long get_idx(long x, long y, long z, long stride_y, long stride_z)
Computes the 1D index of a three-dimensional array. In the case of x[i][j][k], the 1D index can be computed with get_idx(i, j, k, (size of the second-to-last dimension), (size of the last dimension)).
Feel free to copy these helper functions to where you need them.
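For reference, here is an implementation consistent with the description above (the version in Alloc.h may differ in details). Marking it __host__ __device__ lets the same function be called from both CPU code and kernels:

// 1D index of element [x][y][z] of a row-major 3D array whose last two
// dimensions have sizes stride_y and stride_z.
__host__ __device__ long get_idx(long x, long y, long z,
                                 long stride_y, long stride_z)
{
    return (x * stride_y + y) * stride_z + z;
}

// Example: XN[i][j][k] corresponds to
// XN_flat[get_idx(i, j, k, grd->nyn, grd->nzn)]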