Power-delay optimized VLSI threshold detection circuits and their use in parallel integer multiplication

Ercan, Furkan
Threshold detection is a fundamental logic function with broad use in arithmetic processors and other digital applications. Any improvement in the power and/or delay of threshold detection therefore contributes significantly to digital circuit design. A recently reported parallel integer multiplier architecture, ABACUS, uses column compression networks to compress partial products through the final addition network. The architecture of the ABACUS column compression network is well suited to threshold logic, and it enables a higher order of optimization with threshold logic than conventional multipliers do. In this study, we investigate the opportunity to optimize the power-delay performance of ABACUS through threshold detection circuits in order to close a previously identified gap with the Wallace Tree Multiplier (WTM). We study digital (CMOS-based) threshold detection solutions, namely compound-CMOS, transmission-gate, and gate-level designs, as well as analog solutions, namely voltage-, current-, and charge-based designs. The solutions are first compared for power-delay performance as the number of inputs and the threshold values change. The threshold logic that provides the best power-delay product is then used to implement the ABACUS multiplier, which is compared with the conventional Wallace Tree Multiplier in terms of power-delay performance. At the end of this work, (i) accurate trend lines were extracted to project the best threshold logic circuit implementations as the number of inputs grows, (ii) the solutions were used in a novel multiplication circuit (ABACUS) in order to measure performance in application, and (iii) pre-layout simulations, post-layout simulations, and on-chip measurements for the ABACUS architecture with threshold logic were presented to demonstrate that the performance gap with the basic WTM implementation has indeed been closed through threshold logic. Further optimizations for a better power-delay product were also addressed.
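As a behavioural illustration only, the threshold function discussed in the abstract can be sketched in Python. The sketch below does not model any of the CMOS or analog circuits studied in the thesis, and the abstract does not describe the internal organization of the ABACUS compression network. The function names (`threshold_detect`, `column_count`, `power_delay_product`) are hypothetical, and using a bank of detectors to count the ones in a column is an assumed example of how threshold outputs can feed a column compressor.

```python
from typing import Sequence


def threshold_detect(bits: Sequence[int], threshold: int) -> int:
    """Return 1 when at least `threshold` of the input bits are 1, else 0."""
    return 1 if sum(bits) >= threshold else 0


def column_count(bits: Sequence[int]) -> int:
    """Count the ones in a column using only threshold detectors.

    When k inputs are high, exactly the detectors with thresholds 1..k
    fire, so adding the detector outputs recovers k. (Illustrative
    assumption, not the ABACUS circuit itself.)
    """
    return sum(threshold_detect(bits, t) for t in range(1, len(bits) + 1))


def power_delay_product(power_watts: float, delay_seconds: float) -> float:
    """Power-delay product in joules: the figure of merit is power times delay."""
    return power_watts * delay_seconds


if __name__ == "__main__":
    column = [1, 0, 1, 1]  # hypothetical partial-product column
    print(threshold_detect(column, 3))  # at least three inputs high?
    print(column_count(column))         # number of ones in the column
```

A lower power-delay product means less energy per operation, which is why the thesis compares candidate circuits on this product rather than on power or delay alone.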


Power-Delay Analysis of an ABACUS Parallel Integer Multiplier VLSI Implementation
Ercan, Furkan; Muhtaroglu, Ali (2015-03-26)
ABACUS parallel architecture was previously proposed as an alternate integer multiplication approach with column compression and parallel carry features. This paper presents a VLSI implementation for ABACUS and benchmarks it against the conventional Wallace Tree Multiplier (WTM). Simulations are conducted with UMC180nm technology in Cadence environment. Although WTM implementation results in 26.6% fewer devices, ABACUS implementation has 8.6% less power dissipation with matched delay performance, due to 27.8...
Investigation of a method to identify energy efficient organization of compression unit in parallel tree multipliers
Rashid, Muhammad Saleh; Muhtaroğlu, Ali; Sustainable Environment and Energy Systems (2018-1)
In this study, high-performance integer multipliers extensively used in digital signal processing are investigated in the context of energy-aware system organization. Among different functional blocks in an integer multiplier, the compression unit is primarily targeted for optimization as the largest section of the multiplier with significant energy consumption. Five different realizations of the most popular Wallace tree are investigated in this work, which consists of Dual Pass Logic (DPL) circuit impleme...
Data-parallel programming on Helios, Parallel environment and PVM
Sener, C; Paker, Y; Kiper, A (1996-09-27)
Parallel computing, increasingly used for computationally intensive problems, requires considerable expertise and time, limiting its widespread use. This article presents a data-parallel programming tool to simplify the task of developing parallel programs of the data-parallel type. It was originally developed for the Helios operating system running on a network of Transputers, and then ported to the IBM SP/2 system executing two parallel programming environments. With its interface to the C languag...
Implementation and performance of parallellised turbo decoders
Yılmaz, Ali Özgür; Yılmaz, Ayşen (2011-01-04)
In this study, the authors discuss the implementation of a low latency decoding algorithm for turbo codes and repeat accumulate codes and compare the implementation results in terms of maximum available clock speed, resource consumption, error correction performance and the data (information bit) rate. In order to decrease the latency, a parallellised decoder structure is introduced for these codes, and the results are obtained by implementing the decoders on a field programmable gate array. The mem...
Efe, Giray; Cenk, Murat; Department of Cryptography (2022-3-07)
Polynomial multiplication on the quotient ring Z[x]/<x^n ± 1> is one of the most fundamental, general-purpose operations frequently used in cryptographic algorithms. Therefore, a possible improvement over a multiplication algorithm directly affects the performance of algorithms used in a cryptographic application. Well-known multiplication algorithms such as Schoolbook, Karatsuba, and Toom-Cook are dominant choices against NTT in small and ordinary input sizes. On the other hand, how these approaches are imp...
Citation Formats
F. Ercan, “Power-delay optimized VLSI threshold detection circuits and their use in parallel integer multiplication,” M.S. - Master of Science, Middle East Technical University, 2015.