Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit

Dülger, Özcan
Oğuztüzün, Mehmet Halit S.
Owing to the many cores in its architecture, the graphics processing unit (GPU) offers promise for parallel execution of the particle filter. A stage of the particle filter that is particularly challenging to parallelize is resampling. Parallel resampling algorithms exist in the literature, such as Metropolis resampling, which does not require a collective operation such as a cumulative sum over the weights and does not suffer from numerical instability. However, with a large number of particles, Metropolis resampling becomes slow because of non-coalesced access to the global memory of the GPU. In this article, we offer solutions to this problem. We introduce two implementation techniques, named Metropolis-C1 and Metropolis-C2, and compare them with the original Metropolis resampling on an NVIDIA Tesla K40 board. In the first scenario, where these two techniques achieve their fastest execution times, Metropolis-C1 is the fastest of the three but yields the worst resampling quality, whereas Metropolis-C2 is closer to Metropolis resampling in quality. In the second scenario, where all three algorithms yield similar quality, Metropolis-C1 and Metropolis-C2 become slower but are still faster than the original Metropolis resampling.
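
To make the abstract's two central points concrete, the following is a minimal sequential sketch in Python of standard Metropolis resampling as it is usually described in the literature, together with a toy model of why its memory accesses are scattered. It is not the authors' GPU code, and it does not implement Metropolis-C1 or Metropolis-C2, whose details are not given here. The function names, the iteration count `num_iters`, and the simplified memory model (a warp of 32 threads, 32 four-byte weights per 128-byte segment) are assumptions made for illustration only.

```python
import random


def metropolis_resample(weights, num_iters, rng):
    """Return one ancestor index per particle (sequential sketch).

    Only ratios of two weights are compared, so no cumulative sum or
    normalization over all weights is required.
    """
    n = len(weights)
    ancestors = []
    for i in range(n):  # on a GPU, each value of i would be one thread
        k = i  # the chain starts at the particle's own index
        for _ in range(num_iters):
            u = rng.random()      # uniform in [0, 1)
            j = rng.randrange(n)  # uniformly random candidate index
            # Accept j with probability min(1, w[j] / w[k]); written as a
            # product so that a zero weight never causes a division.
            if u * weights[k] < weights[j]:
                k = j
        ancestors.append(k)
    return ancestors


def segments_touched(indices, warp_size=32, weights_per_segment=32):
    """Toy model (assumption): count the distinct memory segments that each
    warp-sized group of threads would read for the given weight indices."""
    counts = []
    for start in range(0, len(indices), warp_size):
        warp = indices[start:start + warp_size]
        counts.append(len({idx // weights_per_segment for idx in warp}))
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4096
    weights = [rng.random() for _ in range(n)]
    ancestors = metropolis_resample(weights, num_iters=16, rng=rng)
    print("first ancestors:", ancestors[:8])

    # Neighbouring threads reading neighbouring weights share a segment,
    # while the random candidates used by Metropolis resampling do not.
    contiguous = list(range(64))
    scattered = [rng.randrange(n) for _ in range(64)]
    print("segments per warp, contiguous:", segments_touched(contiguous))
    print("segments per warp, random:    ", segments_touched(scattered))
```

In this toy model, a warp that reads contiguous weights touches a single segment, whereas a warp whose threads each read a random index `j` touches many segments. This is the non-coalesced access pattern that the abstract identifies as the cause of the slowdown for large numbers of particles.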


Accelerated regular grid traversals using extended anisotropic chessboard distance fields on a parallel stream processor
Es, Alphan; İşler, Veysi (Elsevier BV, 2007-11-01)
Modern graphics processing units (GPUs) are an implementation of parallel stream processors. In recent years, there have been a few studies on mapping ray tracing to the GPU. Since graphics processors are not designed to process complex data structures, it is crucial to explore data structures and algorithms for efficient stream processing. In particular, ray traversal is one of the major bottlenecks in ray tracing and direct volume rendering methods. In this work we focus on the efficient regular grid based...
Data parallelism for ray casting large scenes on a cpu-gpu cluster
Topcu, Tümer; İşler, Veysi; Department of Computer Engineering (2008)
In the last decade, the computational power, memory bandwidth and programmability of graphics processing units (GPUs) have evolved rapidly. As a result, much research has been carried out on using GPUs for advanced graphics rendering. Because of its high degree of parallelism, ray tracing has been one of the first algorithms studied on GPUs. However, the rendering of large scenes with ray tracing can easily exceed the GPU's memory capacity. The algorithm proposed in this work uses a data parallel approac...
Massive crowd simulation with parallel processing
Yılmaz, Erdal; İşler, Veysi; Department of Information Systems (2010)
This thesis analyzes how parallel processing with Graphics Processing Unit (GPU) could be used for massive crowd simulation, not only in terms of rendering but also the computational power that is required for realistic simulation. The extreme population in massive crowd simulation introduces an extra computational load, which is quite difficult to meet by using Central Processing Unit (CPU) resources only. The thesis shows the specific methods and approaches that maximize the throughput of GPU parallel com...
Parallel resampling methods for particle filters on graphics processing unit
Dülger, Özcan; Oğuztüzün, Mehmet Halit S.; Department of Computer Engineering (2017)
This thesis addresses the implementation of the resampling stage of the particle filter on the graphics processing unit (GPU). Well-known sequential resampling methods include Multinomial, Stratified and Systematic resampling. They have dependencies in their loop structure that impede their parallel implementation. Although such impediments have been overcome in their GPU implementations, these algorithms suffer from numerical instability due to the accumulation of rounding errors when single precision is...
Acceleration of direct volume rendering with programmable graphics hardware
Yalim Keles, Hacer; Es, Alphan; İşler, Veysi (Springer Science and Business Media LLC, 2007-01-01)
We propose a method to accelerate direct volume rendering using programmable graphics hardware (GPU). In the method, texture slices are grouped together to form a texture slab. Rendering non-empty slabs in front-to-back viewing order generates the resultant image. Considering each pixel of the image as a ray, slab silhouette maps (SSMs) are used to skip empty spaces along the ray direction on a per-pixel basis. Additionally, SSMs contain terminated ray information. The method relies on hardware z-occlusion cul...
Citation Formats
Ö. Dülger, M. H. S. Oğuztüzün, and M. Demirekler, “Memory Coalescing Implementation of Metropolis Resampling on Graphics Processing Unit,” Journal of Signal Processing Systems for Signal, Image, and Video Technology, pp. 433–447, 2018.