I was trying to understand the difference between `_mm256_loadu_epi16` and `_mm256_loadu_si256`.
According to the intrinsics manual, they both return the same type and take a pointer with no alignment requirement, but they result in different instructions: `vmovdqu` vs. `vmovdqu16`.
Since `_mm256_loadu_epi16` requires more feature flags (AVX512BW + AVX512VL) and has slightly worse latency than `_mm256_loadu_si256`, which requires only AVX, I do not understand what the benefit of the explicit `epi16` variant is.
In addition, there is also `_mm256_lddqu_si256` that should be equivalent to `_mm256_loadu_si256` but "may perform better than _mm256_loadu_si256 when the data crosses a cache line boundary".
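To make the comparison concrete, here is a minimal sketch of what I mean by "equivalent". It only covers the two AVX loads, because the `epi16` variant needs AVX512BW + AVX512VL hardware. The helper names (`load_si256`, `load_lddqu`, `loads_agree`) are my own and purely illustrative, and the `target` attribute and `__builtin_cpu_supports` assume GCC or Clang on x86.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 32-byte unaligned load via _mm256_loadu_si256. */
__attribute__((target("avx")))
static void load_si256(const void *src, void *out) {
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)out, v);
}

/* Hypothetical helper: the same 32-byte load via _mm256_lddqu_si256. */
__attribute__((target("avx")))
static void load_lddqu(const void *src, void *out) {
    __m256i v = _mm256_lddqu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)out, v);
}

/* Returns 1 if both loads yield the source bytes, 0 if they differ,
 * and -1 if the CPU does not report AVX support.
 * `offset` shifts the source pointer so the load can be misaligned. */
static int loads_agree(size_t offset) {
    uint8_t buf[96];
    uint8_t a[32], b[32];

    assert(offset + 32 <= sizeof buf);
    if (!__builtin_cpu_supports("avx")) {
        return -1;
    }
    for (size_t i = 0; i < sizeof buf; i++) {
        buf[i] = (uint8_t)i;
    }
    load_si256(buf + offset, a);
    load_lddqu(buf + offset, b);
    return memcmp(a, b, sizeof a) == 0 &&
           memcmp(a, buf + offset, sizeof a) == 0;
}
```

Calling `loads_agree(1)` exercises a deliberately misaligned address. This only checks that the loaded bits match; it says nothing about latency or cache-line behavior, which is the part I am asking about.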
Any advice or explanation?
I appreciate any help you can provide.
Yes, I agree with you, I don't see any benefit from the comparison tables either.
