I would like to square integers with more than 10 million decimal digits. What is the fastest way to do this in C++ or assembly language using the latest Intel processor instructions (e.g. AVX2, BMI2, MMX, SSE4.2 on a Kaby Lake i7 processor)?
For example, Intel mentions the following, using the carry-less multiplication instruction PCLMULQDQ:
Quick Squaring: (256-bit)^2 = 512-bit
// c3c2c1c0 = a1a0 * a1a0
{
    c0 = _mm_clmulepi64_si128(a0, a0, 0x00);
    c1 = _mm_clmulepi64_si128(a0, a0, 0x11);
    c2 = _mm_clmulepi64_si128(a1, a1, 0x00);
    c3 = _mm_clmulepi64_si128(a1, a1, 0x11);
}
Any suggestions for a fast algorithm, ideally with source code in C++ or x64 assembly?
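A note on the PCLMULQDQ snippet quoted in the question: carry-less multiplication works on polynomials over GF(2), so partial products are combined with XOR and no carries propagate. That is why the quoted squaring needs no cross terms, and also why its result is generally not the integer square. The sketch below is a portable software model of one 64x64 carry-less multiply, written only to illustrate the difference; the names `U128`, `clmul64`, and `check_clmul_is_not_integer_square` are hypothetical and not part of any Intel API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a 128-bit result as two 64-bit halves.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Software model of a 64x64 -> 128-bit carry-less multiply.
// Each set bit of b contributes a shifted copy of a, combined with XOR
// instead of addition, so no carries ever propagate.
U128 clmul64(std::uint64_t a, std::uint64_t b) {
    U128 r{0, 0};
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            if (i != 0) {
                r.hi ^= a >> (64 - i);  // bits shifted out of the low half
            }
        }
    }
    return r;
}

// Illustration: 3 is the polynomial x + 1, and (x + 1)^2 = x^2 + 1 over GF(2),
// which is the bit pattern 5, whereas the integer square of 3 is 9.
void check_clmul_is_not_integer_square() {
    U128 r = clmul64(3, 3);
    assert(r.lo == 5 && r.hi == 0);
    assert(r.lo != 3u * 3u);
}
```

So a carry-less square only spreads the input bits to the even positions; it can be useful for GF(2) arithmetic, but it does not by itself give an integer square.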
1 Reply
richter, dan, I would use the AVX2 instructions, since they are more performance-efficient.
Thanks.
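For reference, here is a minimal sketch of what an integer (carrying) square looks like at the limb level, assuming the number is stored as little-endian 64-bit limbs. It is a plain schoolbook product of the number with itself and does not exploit the symmetry of squaring. It relies on the `unsigned __int128` extension of GCC and Clang (not available in MSVC), which compilers typically lower to the scalar 64x64 -> 128-bit multiply; AVX2 itself has no such widening 64-bit multiply with carry propagation. The function name `square_schoolbook` is hypothetical. This quadratic method is only a building block: at 10 million digits, big-integer libraries such as GMP switch to FFT-based multiplication, so this should not be read as the fastest approach.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension (assumption: not MSVC)

// Schoolbook square of a little-endian limb vector: returns a * a
// as a vector of 2 * a.size() limbs. O(n^2) limb multiplications.
std::vector<u64> square_schoolbook(const std::vector<u64>& a) {
    const std::size_t n = a.size();
    std::vector<u64> r(2 * n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        u64 carry = 0;
        for (std::size_t j = 0; j < n; ++j) {
            // 64x64 -> 128-bit product plus the accumulated limb and carry.
            // The sum cannot overflow 128 bits:
            // (2^64 - 1)^2 + 2 * (2^64 - 1) = 2^128 - 1.
            u128 t = static_cast<u128>(a[i]) * a[j] + r[i + j] + carry;
            r[i + j] = static_cast<u64>(t);
            carry = static_cast<u64>(t >> 64);
        }
        r[i + n] = carry;  // this limb has not been written by earlier rows
    }
    return r;
}
```

For example, squaring the single limb 0xFFFFFFFFFFFFFFFF (2^64 - 1) yields the limbs {1, 0xFFFFFFFFFFFFFFFE}, i.e. 2^128 - 2^65 + 1.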