Parallel implementation of the finite element method on graphics processors for the solution of incompressible flows

2014
Göçmen, Mahmut Murat
In recent years, the clock speeds and memory bandwidths of Graphics Processing Units (GPUs) have increased dramatically compared to those of CPUs. GPU vendors have also developed and freely released new programming tools that make scientific computing on GPUs easier. With these developments, the use of GPUs for general-purpose computing has become a popular research field. Researchers have previously demonstrated that GPUs may provide speed-ups of tens of times over CPU solvers for CFD methods such as Smoothed Particle Hydrodynamics, Lattice Boltzmann and Discontinuous Galerkin, which are known to offer very high parallelization potential. However, studies on the use of GPUs for classical finite volume and especially finite element based CFD codes are rare in the literature. This study involves the development of a flow solver based on the Finite Element Method (FEM) that runs in parallel on GPUs. The CUDA (Compute Unified Device Architecture) programming toolkit developed by NVIDIA is used for GPU programming. Three-dimensional, laminar, incompressible flows with possible heat transfer effects are considered. The governing equations are discretized using two different fractional step algorithms. The accuracy of the developed solver is tested using five benchmark problems, including a microchannel flow and flow inside a tube with conjugate heat transfer. Each step of the fractional step algorithm is investigated in detail on the CPU and the GPU for run time performance. Speed-up tests are performed on a series of meshes with the total number of unknowns between 700,000 and 6.7 million. Parallelization on the CPU is achieved using Intel's MKL library and OpenMP; on the GPU, mostly the CUBLAS, CUSPARSE and CUSP libraries are used, with custom-written GPU kernels where necessary. For the largest mesh tried, GPU usage resulted in speed-ups of 5.79 and 1.86 times compared to single-thread and 8-thread CPU solutions, respectively.
The use of single precision arithmetic is investigated from the accuracy and efficiency points of view, and it is seen that it does not degrade accuracy while providing an almost two-fold speed-up on both the CPU and the GPU. Compared to the explicit version, the implicit fractional step algorithm turned out to be advantageous in terms of run time for steady-state problems. On the other hand, the explicit method uses less memory, as expected.
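The two GPU speed-up figures quoted for the largest mesh (5.79 over one CPU thread, 1.86 over eight CPU threads) also imply how well the CPU solver itself scaled. The short sketch below works out that ratio. The function name is hypothetical, and the efficiency figure assumes ideal scaling would be 8 times on 8 threads; neither number is stated in the abstract itself.

```python
def implied_cpu_scaling(gpu_vs_single: float, gpu_vs_multi: float) -> float:
    """Return T_single / T_multi from two GPU speed-up figures.

    gpu_vs_single = T_single / T_gpu
    gpu_vs_multi  = T_multi  / T_gpu
    Dividing the first by the second cancels T_gpu.
    """
    return gpu_vs_single / gpu_vs_multi


# Figures quoted in the abstract for the largest mesh.
scaling = implied_cpu_scaling(5.79, 1.86)

# Assumption: ideal scaling on 8 threads would be a factor of 8.
efficiency = scaling / 8

print(f"8-thread CPU speed-up over 1 thread: {scaling:.2f}")  # 3.11
print(f"Parallel efficiency on 8 threads: {efficiency:.0%}")  # 39%
```

Read this way, the 8-thread CPU run was roughly 3.1 times faster than the single-thread run, which puts the 1.86 times GPU advantage over the multi-threaded CPU solver in context.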

Suggestions

GPU based real time stereoscopic ray tracing
Es, Alphan; İşler, Veysi (2007-11-09)
Over the last couple of years, graphics processing units (GPUs) found in graphics cards have evolved into general purpose parallel stream processors. This evolution allows GPUs to be used not only for traditional rasterization based rendering but also for global illumination techniques including ray tracing. Fast generation of stereo images is very important for virtual reality applications. Rendering stereo image pairs for the left and right eye separately doubles the frame time. This might be a problem for interactiv...
Optimization of Advanced Encryption Standard on Graphics Processing Units
Tezcan, Cihangir (2021-01-01)
Graphics processing units (GPUs) are specially designed for parallel applications and perform parallel operations much faster than central processing units (CPUs). In this work, we focus on the performance of the Advanced Encryption Standard (AES) on GPUs. We present optimizations which remove bank conflicts in shared memory accesses and provide 878.6 Gbps throughput for AES-128 encryption on an RTX 2070 Super, which is equivalent to 4.1 Gbps per Watt. Our optimizations provide more than 2.56x speed-up agai...
Data parallelism for ray casting large scenes on a cpu-gpu cluster
Topcu, Tümer; İşler, Veysi; Department of Computer Engineering (2008)
In the last decade, the computational power, memory bandwidth and programmability of graphics processing units (GPUs) have rapidly evolved. As a result, many studies have been carried out on using GPUs in advanced graphics rendering. Because of its high degree of parallelism, ray tracing has been one of the first algorithms studied on GPUs. However, the rendering of large scenes with ray tracing can easily exceed the GPU's memory capacity. The algorithm proposed in this work uses a data parallel approac...
Open problems in CEM: Porting an explicit time-domain volume-integral-equation solver on GPUs with OpenACC
Ergül, Özgür Salih; Al-Jarro, Ahmed; Clo, Alain; Bagci, Hakan (Institute of Electrical and Electronics Engineers (IEEE), 2014-01-01)
Graphics processing units (GPUs) are gradually becoming mainstream in high-performance computing, as their ability to improve the performance of a wide spectrum of scientific applications many-fold compared to multi-core CPUs has been clearly identified and proven. In this paper, implementation and performance-tuning details for porting an explicit marching-on-in-time (MOT)-based time-domain volume-integral-equation (TDVIE) solver onto GPUs are described in detail. To this end, a high-level ap...
Numerical Investigation on Cooling of Small form Factor Computer Cases
ORHAN, OMER EMRE; Tarı, İlker (2008-11-01)
In this study, cooling of small form factor computers is numerically investigated. The problem is a conjugate heat transfer problem in which ambient air is the final heat transfer medium. In modeling the problem, heat transfer using heat pipes running from the CPU to the heat exchanger in the back end of the chassis, forced convection inside the chassis, ventilation of the chassis air, conduction paths inside the chassis, and natural convection from the chassis walls to the ambient air are considered. The n...
Citation Formats
M. M. Göçmen, “Parallel implementation of the finite element method on graphics processors for the solution of incompressible flows,” M.S. - Master of Science, Middle East Technical University, 2014.