Optimization of Advanced Encryption Standard on Graphics Processing Units

Graphics processing units (GPUs) are specially designed for parallel applications and perform parallel operations much faster than central processing units (CPUs). In this work, we focus on the performance of the Advanced Encryption Standard (AES) on GPUs. We present optimizations which remove bank conflicts in shared memory accesses and provide 878.6 Gbps throughput for AES-128 encryption on an RTX 2070 Super, which is equivalent to 4.1 Gbps per Watt. Our optimizations provide more than 2.56x speed-up against the best GPU results in the literature. Our optimized AES implementations on GPUs even outperform any CPU using the hardware level AES New Instructions (AES-NI) and legacy FPGA-based cluster architectures like COPACOBANA and RIVYERA. Even on a low-end GPU like MX 250, we obtained 60.0 Gbps throughput for AES-256 which is generally faster than the read/write speeds of solid disks. Thus, transition from AES-128 to AES-256 when using GPUs would provide military grade security with no visible performance loss. With these breakthrough performances, GPUs can be used as a cryptographic co-processor for file or full disk encryption to remove performance loss coming from CPU encryption. With a single GPU as a co-processor, busy SSL servers can be free from the burden of encryption and use their whole CPU power for other operations. Moreover, these optimizations can help GPUs to practically verify theoretically obtained cryptanalysis results or their reduced versions in reasonable time.


Parallel implementation of the finite element method on graphics processors for the solution of incompressible flows
Göçmen, Mahmut Murat; Sert, Cüneyt; Department of Mechanical Engineering (2014)
In recent years clock speeds and memory bandwidths of Graphics Processing Units (GPUs) increased dramatically compared to CPUs. Also GPU vendors developed and freely released new programming tools to make scientific computing on GPUs easier. With these recent developments the use of GPUs for general purpose computing becomes a popular research field. Researchers previously demonstrated that use of GPUs may provide tens of times of speeds-ups compared to CPU solvers for CFD methods such as Smoothed Particle ...
Accelerated regular grid traversals using extended anisotropic chessboard distance fields on a parallel stream processor
Es, Alphan; İşler, Veysi (Elsevier BV, 2007-11-01)
Modern graphics processing units (GPUs) are an implementation of parallel stream processors. In recent years, there have been a few studies on mapping ray tracing to the GPU. Since graphics processors are not designed to process complex data structures, it is crucial to explore data structures and algorithms for efficient stream processing. In particular ray traversal is one of the major bottlenecks in ray tracing and direct volume rendering methods. In this work we focus on the efficient regular grid based...
Open problems in CEM: Porting an explicit time-domain volume-integral- equation solver on GPUs with OpenACC
Ergül, Özgür Salih; Al-Jarro, Ahmed; Clo, Alain; Bagci, Hakan (Institute of Electrical and Electronics Engineers (IEEE), 2014-01-01)
Graphics processing units (GPUs) are gradually becoming mainstream in high-performance computing, as their capabilities for enhancing performance of a large spectrum of scientific applications to many fold when compared to multi-core CPUs have been clearly identified and proven. In this paper, implementation and performance-tuning details for porting an explicit marching-on-in-time (MOT)-based time-domain volume-integral-equation (TDVIE) solver onto GPUs are described in detail. To this end, a high-level ap...
GPU based real time stereoscopic ray tracing
Es, Alphan; İşler, Veysi (2007-11-09)
Over the last couple of years graphics processing units (GPU) found in graphics cards evolved into general purpose parallel stream processors. This evolution allows for using GPUs not only for traditional rasterization based rendering but also for global illumination techniques including ray tracing. Fast generation of stereo images is very important for virtual reality applications. Rendering stereo image pairs for left and right eye separately doubles the frame time. This might be a problem for interactiv...
Acceleration of direct volume rendering with programmable graphics hardware
Yalim Keles, Hacer; Es, Alphan; İşler, Veysi (Springer Science and Business Media LLC, 2007-01-01)
We propose a method to accelerate direct volume rendering using programmable graphics hardware (GPU). In the method, texture slices are grouped together to form a texture slab. Rendering non-empty slabs from front to back viewing order generates the resultant image. Considering each pixel of the image as a ray, slab silhouette maps (SSMs) are used to skip empty spaces along the ray direction per pixel basis. Additionally, SSMs contain terminated ray information. The method relies on hardware z-occlusion cul...
Citation Formats
C. Tezcan, “Optimization of Advanced Encryption Standard on Graphics Processing Units,” IEEE ACCESS, pp. 67315–67326, 2021, Accessed: 00, 2021. [Online]. Available: https://hdl.handle.net/11511/90722.