Performance benchmarking of a discontinuous Galerkin-based compressible flow solver on GPU computing platforms using cnsBench

Karakuş, Ali
Unnikrishnan, Umesh
Rowe, Kris
Patel, Saumil S
Heterogeneous computing architectures have become an integral feature of modern supercomputers. In this work, we present the performance benchmarking results of a discontinuous Galerkin, spectral-element solver for the compressible Navier-Stokes equations on different GPU-based computing platforms. The solver uses OCCA, an open-source library that provides the portability layer to offload targeted kernels across different architectures and vendor platforms, and achieve application portability. Profiling of the solver is conducted, and the most compute-expensive kernels are identified. A mini-app called cnsBench is developed based on the full solver to benchmark and investigate performance characteristics of different core kernels. The kernel performance metrics will be presented and compared across different GPU architectures, such as NVIDIA, Intel, and AMD GPUs, and programming models, such as CUDA, OpenCL and SYCL. The kernel algorithms and memory access patterns are analyzed to provide insights regarding computational bottlenecks and approaches to further optimize performance of these kernels. These efforts will guide future development of compressible flow applications that can leverage the full potential of next generation exascale supercomputers and beyond.
Bulletin of the American Physical Society


Implementation of a risc microcontroller using fpga
Gümüş, Raşit; Güran, Hasan; Department of Electrical and Electronics Engineering (2005)
In this thesis a microcontroller core is developed in an FPGA. Its instruction set is compatible with the microcontroller PIC16XX series by Microchip Technology. The microcontroller employs a RISC architecture with separate busses for instructions and data. Our goal in this research is to implement and evaluate the design in the FPGA. Increasing performance and gate capacity of recent FPGA devices permits complex logic systems to be implemented on a single programmable device. Such a growing complexity dema...
Performance Models for the Spike Banded Linear System Solver
Manguoğlu, Murat; Sameh, Ahmed; Grama, Ananth (Hindawi Limited, 2011)
With availability of large-scale parallel platforms comprised of tens-of-thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing performance of the iterative solvers them...
Implementation of an 8-bit microcontroller with system c
Kesen, Lokman; Aşkar, Murat; Department of Electrical and Electronics Engineering (2004)
In this thesis, an 8-bit microcontroller, 8051 core, is implemented using SystemC programming language. SystemC is a new generation co-design language which is capable of both programming software and describing hardware parts of a complete system. The benefit of this design environment appears while developing a System-on-Chip (SoC), that is a system consisting both custom hardware parts and embedded software parts. SystemC is not a completely new language, but based on C++ with some additional class libra...
A reconfigurable computing platform for real time embedded applications
Say, Fatih; Halıcı, Uğur; Department of Electrical and Electronics Engineering (2011)
Today’s reconfigurable devices successfully combine ‘reconfigurable computing machine’ paradigm and ‘high degree of parallelism’ and hence reconfigurable computing emerged as a promising alternative for computing-intensive applications. Despite its superior performance and lower power consumption compared to general purpose computing using microprocessors, reconfigurable computing comes with a cost of design complexity. This thesis aims to reduce this complexity by providing a flexible and user friendly dev...
Performance Evaluation of Different Real-Time Motion Controller Topologies Implemented on a FPGA
MUTLU, B. R.; Yaman, Ulaş; Dölen, Melik; Koku, Ahmet Buğra (2009-11-18)
This paper presents a comprehensive comparison of several real-time motion controller topologies implemented on a field programmable gate array (FPGA). Controller topologies are selected as proportional-integral-derivative controller with command feedforward, sliding mode controller, fuzzy controller, and a hysteresis controller. Controllers and other necessary modules are developed using Verilog HDL and they are implemented on a ML505 development board with a Xilinx Virtex-5 FPGA chip. In order to take ful...
Citation Formats
A. Karakuş, U. Unnikrishnan, K. Rowe, and S. S. Patel, “Performance benchmarking of a discontinuous Galerkin-based compressible flow solver on GPU computing platforms using cnsBench,” presented at the Bulletin of the American Physical Society, Indiana, Amerika Birleşik Devletleri, 2022, Accessed: 00, 2022. [Online]. Available: