Grading Criteria
To achieve a given grade, you must meet all the criteria for that grade and for every grade below it. For example, to get a B, you must meet all the criteria for B, C, D, and E.
Grade |
Code |
Report |
A |
- Completely offload both the particle mover and interpolation to CUDA, with both steps performed in a single kernel.
- Implement a simple particle mover that uses OpenACC.
|
- The report is clear, readable and of high quality.
- Explain how you overcame the challenges that you discussed in grade C for interpolation.
- Illustrate the performance implication using nvprof.
- Discuss the overall performance change after completing all the grade levels.
- If the performance is not as expected, discuss and explain why, using results from nvprof.
- Compare the performance of the versions using OpenACC and CUDA.
|
B |
- Use pinned memory on the host.
- Use CUDA streams and asynchronous copies.
|
- Illustrate the performance when using pinned memory and asynchronous data movement, compared with the grade C version, using nvprof.
- Explain your strategy for overlapping computation.
- Illustrate the performance when overlapping computation, using nvprof.
- Discuss general performance changes from the previous version. If the performance is not as expected, explain why.
|
C |
- Implement mini-batching of particles in the particle mover so that the application can process more particles than fit on the GPU at once.
|
- Explain your mini-batching strategy.
- Illustrate the performance implication using nvprof. Repeat the experiments you did for grade E.
- Discuss performance changes from the previous version.
- Explain the challenges in implementing a completely offloaded interpolation.
|
D |
- Port part of the interpolation (interpP2G) to CUDA.
- All CUDA memory management is handled correctly.
|
- Explain how the interpolation is ported.
- Illustrate the performance implication using nvprof. Repeat the experiments you did for grade E.
- Discuss the performance bottleneck in this implementation.
|
E |
- Port the particle mover (mover_PC) to the GPU, without further optimization.
- All CUDA memory management is handled correctly.
- All the input files that are used to perform the experiments are provided.
|
- The final report has all five sections and is readable.
- A design specification is provided.
- Clearly describe your experiments and the testing environment.
- Explain how to reproduce your results so we can run the code.
- Plots are readable, have labels and are clearly explained.
|