Faster Residue Multiplication Modulo 521-bit Mersenne Prime and an Application to ECC

2018-08-01
Ali, Shoukat
Cenk, Murat
We present faster algorithms for the residue multiplication modulo 521-bit Mersenne prime on 32- and 64-bit platforms by using Toeplitz matrix-vector product. The total arithmetic cost of our proposed algorithms is less than that of existing algorithms, with algorithms for 64- and 32-bit residue multiplication giving the best timing results on our test machine. The transition from 64- to 32-bit implementation is full of challenges because the number of limbs doubles and the limbs' bitlengths are cut in half. Without using any intrinsics or SIMD/assembly instructions in our implementation on an Intel(R) Core i5 - 6402P CPU @ 2.80GHz, we find 136 and 550 cycles for our 64- and 32-bit residue multiplications, respectively. In addition, we implement constant-time variable- and fixed-base scalar multiplication for the standard NIST curve P-521 and Edwards curve E-521.
Citation Formats
S. Ali and M. Cenk, “Faster Residue Multiplication Modulo 521-bit Mersenne Prime and an Application to ECC,” pp. 2477–2490, 2018, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/31401.