Performance Models for the Spike Banded Linear System Solver

2011
Manguoğlu, Murat
Sameh, Ahmed
Grama, Ananth
With the availability of large-scale parallel platforms comprising tens of thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. Some prior work has characterized the performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing the performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners compared to the state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners can deliver significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, and (iii) we show the excellent prediction capabilities of our model, on the basis of which we argue for the high scalability of our solver. Our pseudo-analytical performance model is based on an analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations.
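To make the solver structure concrete, here is a minimal, hypothetical sketch of the truncated Spike idea, restricted to a diagonally dominant tridiagonal system (bandwidth one) split into `p` equal partitions. The names `thomas` and `truncated_spike` are illustrative and do not come from the paper. Each partition is solved independently; the spikes `V_j` and `W_j` that couple neighboring partitions lead to a reduced system on the interface unknowns, and truncation drops the spike tips that decay to near zero under diagonal dominance, so the reduced system splits into independent 2-by-2 systems. For simplicity the sketch computes full spikes with the Thomas algorithm and runs the partitions sequentially; a real parallel implementation would distribute partitions across processes and would likely compute only the spike tips it needs.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system by the Thomas algorithm.

    a: sub-diagonal (a[0] is ignored), b: diagonal,
    c: super-diagonal (c[-1] is ignored), d: right-hand side.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def truncated_spike(a, b, c, f, p):
    """Approximate solve of a diagonally dominant tridiagonal system
    using p equal partitions and a truncated Spike reduced system (sketch)."""
    n = len(b)
    k = n // p
    assert k * p == n and (p == 1 or k >= 2), "need equal partitions of size >= 2"
    g, V, W = [], [], []
    for j in range(p):
        lo, hi = j * k, (j + 1) * k
        aj, bj, cj = a[lo:hi], b[lo:hi], c[lo:hi]
        # Local solve A_j g_j = f_j; coupling entries aj[0], cj[-1] are ignored.
        g.append(thomas(aj, bj, cj, f[lo:hi]))
        # Right spike V_j = A_j^{-1} (c[hi-1] e_k), coupling to partition j+1.
        if j < p - 1:
            e = [0.0] * k
            e[-1] = c[hi - 1]
            V.append(thomas(aj, bj, cj, e))
        else:
            V.append(None)
        # Left spike W_j = A_j^{-1} (a[lo] e_1), coupling to partition j-1.
        if j > 0:
            e = [0.0] * k
            e[0] = a[lo]
            W.append(thomas(aj, bj, cj, e))
        else:
            W.append(None)
    # Truncated reduced system: one independent 2x2 system per interface,
    # dropping the decayed tips W_j[-1] and V_{j+1}[0].
    xb = [0.0] * (p - 1)  # bottom unknown of partition j
    xt = [0.0] * (p - 1)  # top unknown of partition j+1
    for j in range(p - 1):
        v, w = V[j][-1], W[j + 1][0]
        det = 1.0 - v * w
        xb[j] = (g[j][-1] - v * g[j + 1][0]) / det
        xt[j] = (g[j + 1][0] - w * g[j][-1]) / det
    # Retrieval: x_j = g_j - V_j * (top of j+1) - W_j * (bottom of j-1).
    x = []
    for j in range(p):
        xj = list(g[j])
        for i in range(k):
            if j < p - 1:
                xj[i] -= V[j][i] * xt[j]
            if j > 0:
                xj[i] -= W[j][i] * xb[j - 1]
        x.extend(xj)
    return x


if __name__ == "__main__":
    n, p = 64, 4
    a, b, c = [-1.0] * n, [4.0] * n, [-1.0] * n  # strictly diagonally dominant
    f = [float(i % 5) for i in range(n)]
    x = truncated_spike(a, b, c, f, p)
    ref = thomas(a, b, c, f)
    print("max deviation from direct solve:", max(abs(u - v) for u, v in zip(x, ref)))
```

The truncation is only accurate when the dropped spike tips are negligible, which for diagonally dominant systems holds once partitions are long compared with the decay of the entries of each inverse block; otherwise the full, untruncated reduced system would be required.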
All of our results are validated on diverse heterogeneous multiclusters, platforms for which performance prediction is particularly challenging. Finally, we use our model to predict the scalability of the Spike algorithm on up to 65,536 cores. This paper extends the results presented at the Ninth International Symposium on Parallel and Distributed Computing.
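The pseudo-analytical approach described above (an analytical cost form per phase whose coefficients are fitted to measured runtimes and then used to extrapolate) can be illustrated with a small least-squares fit. This is a hedged sketch: the model form `T(n, p) = c0*(n/p) + c1 + c2*log2(p)` and the helper names `features`, `fit`, `solve`, and `predict` are assumptions chosen for illustration, not the paper's actual per-phase model, and the timing samples are synthetic, generated from known coefficients rather than measured on any platform.

```python
import math


def features(n, p):
    """Hypothetical cost terms for one solve on p cores (assumed model form)."""
    return [
        n / p / 1e6,   # local banded work, in millions of rows per core
        1.0,           # fixed per-solve overhead
        math.log2(p),  # communication assumed to grow with log2(p)
    ]


def solve(A, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for k in range(col, m + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (M[i][m] - s) / M[i][i]
    return x


def fit(samples):
    """Least-squares coefficients from (n, p, seconds) samples via normal equations."""
    m = 3
    ata = [[0.0] * m for _ in range(m)]
    atb = [0.0] * m
    for n, p, t in samples:
        phi = features(n, p)
        for i in range(m):
            atb[i] += phi[i] * t
            for j in range(m):
                ata[i][j] += phi[i] * phi[j]
    return solve(ata, atb)


def predict(coef, n, p):
    """Evaluate the fitted model for problem size n on p cores."""
    return sum(c * f for c, f in zip(coef, features(n, p)))


if __name__ == "__main__":
    # Synthetic "measurements" generated from known coefficients (not real data).
    true_coef = [0.5, 0.01, 0.003]
    samples = [(n, p, predict(true_coef, n, p))
               for n in (2**22, 2**24) for p in (16, 64, 256, 1024)]
    coef = fit(samples)
    print("fitted coefficients:", coef)
    print("predicted time on 65,536 cores:", predict(coef, 2**24, 65536))
```

Scaling the work term to millions of rows per core keeps the normal equations well conditioned; with real, noisy timings one would typically fit each solver phase separately and check the residuals before trusting an extrapolation far beyond the measured core counts.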
Scientific Programming

Suggestions

Power-Delay Analysis of an ABACUS Parallel Integer Multiplier VLSI Implementation
Ercan, Furkan; Muhtaroglu, Ali (2015-03-26)
ABACUS parallel architecture was previously proposed as an alternate integer multiplication approach with column compression and parallel carry futures. This paper presents a VLSI implementation for ABACUS and benchmarks it against the conventional Wallace Tree Multiplier (WTM). Simulations are conducted with UMC180nm technology in Cadence environment. Although WTM implementation results in 26.6% fewer devices, ABACUS implementation has 8.6% less power dissipation with matched delay performance, due to 27.8...
Data-parallel programming on Helios, Parallel environment and PVM
Sener, C; Paker, Y; Kiper, A (1996-09-27)
Parallel computing, increasingly used for computationally intensive problems, requires considerable expertise and time, limiting its widespread use. This article presents a data-parallel programming tool to simplify the task of developing parallel programs based on data-parallel type. It has been originally developed for the Helios operating system running on a network of Transputers, and then ported to the IBM SP/2 system executing two parallel programming environments. With its interface to the C languag...
Sound Perception in Virtual Environments
Doğan, Aslı Zeynep; Sorguç, Arzu; Department of Building Science in Architecture (2022-12-20)
Virtual environments have been developing for a long time and are changing the understanding of a space by means of how we design, perceive and use it. This new understanding of space requires people to adapt by gaining a new type of spatial cognition that can help people to combine the possibilities of a virtual space with the physical space they are used to: by making use of their different sensory skills. The aim of this study is to contribute to the literature on the improvement of the auditory percepti...
WEIGHTED MATRIX ORDERING AND PARALLEL BANDED PRECONDITIONERS FOR ITERATIVE LINEAR SYSTEM SOLVERS
Manguoğlu, Murat; Sameh, Ahmed H.; Grama, Ananth (Society for Industrial & Applied Mathematics (SIAM), 2010-01-01)
The emergence of multicore architectures and highly scalable platforms motivates the development of novel algorithms and techniques that emphasize concurrency and are tolerant of deep memory hierarchies, as opposed to minimizing raw FLOP counts. While direct solvers are reliable, they are often slow and memory-intensive for large problems. Iterative solvers, on the other hand, are more efficient but, in the absence of robust preconditioners, lack reliability. While preconditioners based on incomplete factor...
Boosting performance of HLS optimization for SoC based hardware accelerators.
Kocaay, Aziz Berkin; Bazlamaçcı, Cüneyt F.; Department of Electrical and Electronics Engineering (2020)
Modern large-scale computing algorithms require a huge amount of computational power. In adapting to increasing computation demands, FPGA-based SoC platforms provide an alternative to traditional CPU or GPU units, which suffer from thermal problems, power issues, etc. However, design flow for FPGA based development may be hard and time-consuming for an average software engineer who has limited knowledge about hardware design. A new approach in FPGA-based system development without the need for a hardware engi...
Citation Formats
M. Manguoğlu, A. Sameh, and A. Grama, “Performance Models for the Spike Banded Linear System Solver,” Scientific Programming, pp. 13–25, 2011, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/50994.