Grading Criteria

To achieve a particular grade, all the criteria for that grade and for every grade below it must be met. For example, to get a B, all criteria for B, C, D, and E must be met, and so on.

Grade Code Report
A
  • Completely offload both the particle mover and interpolation to CUDA, with both steps performed in a single kernel.
    • OR
  • Implement a simple particle mover that uses OpenACC.
  • The report is clear, readable and of high quality.
  • Explain how you overcame the challenges that you discussed in grade C for interpolation.
    • AND
  • Illustrate the performance implication using nvprof.
    • AND
  • Discuss the overall performance change after you have completed all the grade levels.
    • AND
  • If the performance is not as expected, discuss and explain, using results from nvprof.
    • OR
  • Compare the performance of the versions using OpenACC and CUDA.
B
  • Use pinned memory on the host.
  • Use CUDA streams and asynchronous copies.
  • Illustrate the performance when using pinned memory and asynchronous data movement, compared with the version from grade C, using nvprof.
  • Explain your strategy for overlapping computation.
  • Illustrate the performance when overlapping computation, using nvprof.
  • Discuss general performance changes from the previous version. If the performance is not as expected, explain why.
C
  • Implement mini-batching of particles in the particle mover so that the application can process more particles than fit on the GPU.
  • Explain your mini-batching strategy.
  • Illustrate the performance implication using nvprof. Repeat the experiments you did for grade E.
  • Discuss performance changes from the previous version.
  • Explain the challenges in implementing a completely offloaded interpolation.
D
  • Port part of the interpolation (interpP2G) to CUDA.
  • All the CUDA memory management is correctly taken care of.
  • Explain how the interpolation is ported.
  • Illustrate the performance implication using nvprof. Repeat the experiments you did for grade E.
  • Discuss the performance bottleneck in this implementation.
E
  • Port the particle mover (mover_PC) of the code to run on the GPU, without further optimization.
  • All the CUDA memory management is correctly taken care of.
  • All the input files that are used to perform the experiments are provided.
  • The final report has all five sections and is readable.
  • A design specification is provided.
  • Describe clearly your experiments and the testing environment.
  • Explain how to reproduce your results so we can run the code.
  • Plots are readable, have labels and are clearly explained.
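
As an illustration of the mini-batching criterion in grade C, the host-side bookkeeping might be sketched as below. This is only one possible approach, not a required design: the names `Batch` and `plan_batches` are hypothetical, and `capacity` is assumed to be the number of particles that fit in the device buffers, however that number is determined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one contiguous slice of the particle arrays.
struct Batch {
    std::size_t offset;  // index of the first particle in this batch
    std::size_t count;   // number of particles in this batch
};

// Split `total` particles into consecutive batches of at most `capacity`
// particles each. `capacity` is an assumed input: the number of particles
// that fit in the device buffers at once.
std::vector<Batch> plan_batches(std::size_t total, std::size_t capacity) {
    assert(capacity > 0 && "device capacity must be positive");
    std::vector<Batch> batches;
    std::size_t offset = 0;
    while (offset < total) {
        // The last batch may be smaller than the others.
        std::size_t count = std::min(capacity, total - offset);
        batches.push_back({offset, count});
        offset += count;
    }
    return batches;
}
```

For each planned batch, the particle mover would then copy the slice starting at `offset` to the device, launch the kernel on `count` particles, and copy the results back before moving on to the next batch.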