OpenACC-based Snow Simulation
In recent years, the GPU platform has risen in popularity in high-performance computing due to its cost-effectiveness and the high computing power offered through its many parallel cores. A GPU's computing power can be harnessed using the low-level GPGPU programming APIs CUDA and OpenCL. While both CUDA and OpenCL give the programmer fine-grained control of a GPU's resources, both are generally considered difficult to use and can lead to complicated software design. To simplify GPGPU programming and bring GPUs into more mainstream use, there is growing interest in moving the complexity of GPGPU programming over to the compiler. This has led to the development of OpenACC, a directive-based standard for heterogeneous computing supported by NVIDIA, Cray, PGI, CAPS and others.

In this thesis, we explore using OpenACC on a high-performance snow simulator developed by the HPC-Lab at NTNU. The snow simulator consists of two main simulation components: the simulation of wind and the simulation of snow particle movement. The OpenACC version of the snow simulator is made by first updating the current CUDA version, then porting it to a sequential CPU implementation, and finally applying OpenACC directives to accelerate the compute-intensive regions of the code.
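The porting strategy described here, annotating sequential CPU loops instead of rewriting them as kernels, can be illustrated on a particle update step. This is a minimal sketch, not the thesis code: the `WindField` struct, `advect_particles`, the x-fastest grid layout, the nearest-cell wind lookup and the explicit Euler step are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wind-field layout (an assumption, not the thesis code):
   nx*ny*nz cells, one velocity vector per cell, stored as three
   separate arrays with x varying fastest. */
typedef struct {
    int nx, ny, nz;
    float cell;                  /* cell edge length */
    const float *u, *v, *w;      /* velocity components per cell */
} WindField;

/* Called from inside the offloaded loop, so it is marked as a
   sequential device routine. */
#pragma acc routine seq
static int clamp_index(int i, int hi)
{
    return i < 0 ? 0 : (i > hi ? hi : i);
}

/* One explicit Euler step: each particle moves with the wind of the cell
   it is in (nearest-cell lookup, no interpolation). Compiled without
   OpenACC, the pragmas are ignored and this is the sequential CPU loop. */
void advect_particles(int n, float *restrict px, float *restrict py,
                      float *restrict pz, WindField wf, float dt)
{
    assert(n >= 0 && wf.cell > 0.0f);
    /* Copy struct members into locals so the kernel only sees plain
       pointers and scalars (no deep copy of the struct is needed). */
    const float *u = wf.u, *v = wf.v, *w = wf.w;
    const int nx = wf.nx, ny = wf.ny, nz = wf.nz;
    const float cell = wf.cell;
    const size_t ncells = (size_t)nx * ny * nz;

    #pragma acc parallel loop copy(px[0:n], py[0:n], pz[0:n]) \
                              copyin(u[0:ncells], v[0:ncells], w[0:ncells])
    for (int i = 0; i < n; ++i) {
        int ix = clamp_index((int)(px[i] / cell), nx - 1);
        int iy = clamp_index((int)(py[i] / cell), ny - 1);
        int iz = clamp_index((int)(pz[i] / cell), nz - 1);
        size_t c = ((size_t)iz * ny + iy) * nx + ix;   /* x-fastest index */
        px[i] += u[c] * dt;
        py[i] += v[c] * dt;
        pz[i] += w[c] * dt;
    }
}
```

Built with an OpenACC compiler (for example `nvc -acc` or `gcc -fopenacc`), the loop is offloaded; built without one, the pragmas are ignored and the same source runs as the sequential CPU version. In a time-stepping loop, wrapping the steps in a `#pragma acc data` region keeps the particle and wind arrays resident on the device; since OpenACC 2.5, `copy` and `copyin` only transfer data that is not already present, so the kernel above then moves nothing per step.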
The OpenACC port is also optimized by reducing data movement between host and device using OpenACC library routines.

Due to the heterogeneous nature of OpenACC, we show that the inability to explicitly use shared memory as temporary storage, and the inability to use texture memory for hardware-based interpolation and 3D caching, are the largest performance bottlenecks compared with the CUDA version.

This is supported by the benchmarks: when scaling the number of simulated snow particles with a balanced wind-field dimension, the OpenACC implementation reaches only 40.6% of the CUDA version's performance, with an average speedup of 3.2x. When scaling the wind field with a constant number of snow particles, it reaches 58% of the CUDA performance, with an average speedup of 4.84x. The best real-time performance is found at about 1.5M snow particles when using a balanced wind field of about 524K grid cells.

Using OpenACC to accelerate high-performance graphical simulations can be a viable option if the goal is high code portability. However, when the goal is the best possible performance, our experience shows that it is still better to use the lower-level alternatives CUDA or OpenCL.
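The texture-memory bottleneck can be made concrete. In CUDA, a wind-field lookup can go through a 3D texture whose hardware performs trilinear filtering and caching; a portable OpenACC kernel instead has to load the eight surrounding grid values from memory and blend them itself. Below is a minimal sketch of that software path, assuming a hypothetical scalar field with x varying fastest and samples at integer grid coordinates; `trilinear`, `sample_field` and the helper names are illustrative, not taken from the thesis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grid layout (an assumption): one float per grid point,
   x varying fastest, sample points at integer coordinates. */

#pragma acc routine seq
static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

#pragma acc routine seq
static float lerp(float a, float b, float t)
{
    return a + t * (b - a);
}

/* Software trilinear interpolation of f at (x, y, z), clamped to the grid
   edges: the work a filtered 3D texture fetch does in hardware in CUDA.
   Needs nx, ny, nz >= 2; costs eight loads and seven lerps per call. */
#pragma acc routine seq
static float trilinear(const float *f, int nx, int ny, int nz,
                       float x, float y, float z)
{
    x = clampf(x, 0.0f, (float)(nx - 1));
    y = clampf(y, 0.0f, (float)(ny - 1));
    z = clampf(z, 0.0f, (float)(nz - 1));
    /* Lower corner of the enclosing cell, kept so that corner + 1 is valid. */
    int x0 = (int)x < nx - 2 ? (int)x : nx - 2;
    int y0 = (int)y < ny - 2 ? (int)y : ny - 2;
    int z0 = (int)z < nz - 2 ? (int)z : nz - 2;
    float tx = x - x0, ty = y - y0, tz = z - z0;
    size_t sx = 1, sy = (size_t)nx, sz = (size_t)nx * ny;   /* strides */
    const float *p = f + (size_t)z0 * sz + (size_t)y0 * sy + (size_t)x0;

    /* Blend along x, then y, then z. */
    float c00 = lerp(p[0],       p[sx],           tx);
    float c10 = lerp(p[sy],      p[sy + sx],      tx);
    float c01 = lerp(p[sz],      p[sz + sx],      tx);
    float c11 = lerp(p[sz + sy], p[sz + sy + sx], tx);
    return lerp(lerp(c00, c10, ty), lerp(c01, c11, ty), tz);
}

/* Sample the field at n particle positions in one offloaded loop. */
void sample_field(const float *restrict f, int nx, int ny, int nz, int n,
                  const float *restrict px, const float *restrict py,
                  const float *restrict pz, float *restrict out)
{
    assert(nx >= 2 && ny >= 2 && nz >= 2 && n >= 0);
    const size_t ncells = (size_t)nx * ny * nz;
    #pragma acc parallel loop copyin(f[0:ncells], px[0:n], py[0:n], pz[0:n]) \
                              copyout(out[0:n])
    for (int i = 0; i < n; ++i)
        out[i] = trilinear(f, nx, ny, nz, px[i], py[i], pz[i]);
}
```

Hardware texture filtering computes its blend weights in reduced precision, so the two paths should not be expected to match bit for bit; the software version gets full-precision weights but pays for the eight loads and seven interpolations per sample that the texture unit would otherwise handle.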