Software Archive

uint32_t __m512i union

P__Robert

Hello,

I am trying to learn the basics of optimizing with intrinsics for native Xeon Phi applications.

The target application I chose is libethereum, because it implements strict memory-hard hashing functions:

https://bitslog.files.wordpress.com/2013/12/memohash-v0-3.pdf

The codebase uses nodes of 512 bits, which seems ideal for targeting the Xeon Phi natively.

However, I cannot figure out the appropriate syntax for the union typedef.

I have tried numerous combinations; all of them compile, but every one segfaults at runtime.

The only enabled change is to the structure of the union.

The segfault occurs any time a node is accessed via uint32_t words.

The specifics are outlined in this pull request:

https://github.com/ethereum/libethereum/pull/251/files

Any idea what's causing the segfaults here?

typedef union node {
	uint8_t bytes[NODE_WORDS * 4] __attribute__((aligned(64)));
	uint32_t words[NODE_WORDS] __attribute__((aligned(64)));
	uint64_t double_words[NODE_WORDS / 2] __attribute__((aligned(64)));

#if defined(_M_X64) && ENABLE_SSE
	__m128i xmm[NODE_WORDS/4] __attribute__((aligned(64)));
#endif
#if defined(__MIC__)
	__m512i zmm;
#endif

} node;
#define FNV_PRIME 0x01000193

static inline uint32_t fnv_hash(uint32_t const x, uint32_t const y)
{
    return x * FNV_PRIME ^ y;
}



#if defined(_M_X64) && ENABLE_SSE
            {
                __m128i fnv_prime = _mm_set1_epi32(FNV_PRIME);
                __m128i xmm0 = _mm_mullo_epi32(fnv_prime, mix.xmm[0]);
                __m128i xmm1 = _mm_mullo_epi32(fnv_prime, mix.xmm[1]);
                __m128i xmm2 = _mm_mullo_epi32(fnv_prime, mix.xmm[2]);
                __m128i xmm3 = _mm_mullo_epi32(fnv_prime, mix.xmm[3]);
                mix.xmm[0] = _mm_xor_si128(xmm0, dag_node->xmm[0]);
                mix.xmm[1] = _mm_xor_si128(xmm1, dag_node->xmm[1]);
                mix.xmm[2] = _mm_xor_si128(xmm2, dag_node->xmm[2]);
                mix.xmm[3] = _mm_xor_si128(xmm3, dag_node->xmm[3]);
            }
            #elif defined(__MIC__) && 0
            {
                // TODO: __m512i implementation
                //    Each vector register (zmm) can store sixteen 32-bit integer numbers
                // <--- Begin Critical Section --->
                __m512i fnv_prime = _mm512_set1_epi32(FNV_PRIME);
                __m512i zmm0 = _mm512_mullo_epi32(fnv_prime, mix.zmm);
                mix.zmm = _mm512_xor_si512(zmm0, dag_node->zmm);
                // <--- End Critical Section --->
            }
            #else
            {
                for (unsigned w = 0; w != NODE_WORDS; ++w) {
                    mix.words[w] = fnv_hash(mix.words[w], dag_node->words[w]);
                }
            }
#endif

P__Robert

Debugger screenshot attached:

[Attachment: node_debug.png]

P__Robert

Resolved by allocating with _mm_malloc and freeing with _mm_free.
