optimization - SSE2 intrinsics: access memory directly
Many SSE instructions allow the source operand to be a 16-byte aligned memory address. For example, take the various (un)pack instructions; PUNPCKLBW has the following signature:
PUNPCKLBW xmm1, xmm2 / m128
Now, this doesn't seem to be possible at all with intrinsics. It looks like it is mandatory to use the _mm_load* intrinsics to read anything from memory. This is the intrinsic for PUNPCKLBW:
__m128i _mm_unpacklo_epi8 (__m128i a, __m128i b);
(As far as I know, __m128i type always refers to an XMM register.)
Now, why is this? It's rather sad, because I see some optimization potential in addressing memory directly ...
Intrinsics correspond relatively directly to actual instructions, but compilers are not obliged to emit the corresponding instructions. Folding a load followed by an operation (even when written with intrinsics) into the memory-operand form of the operation is a common optimization performed by all respectable compilers, when it is beneficial to do so.
TLDR: Write the load and the operation with intrinsics, and let the compiler optimize it.
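To make that advice concrete, here is a minimal sketch. The helper name `interleave_low_bytes` and its byte-pointer interface are my own assumptions for illustration, not something from the question. It writes the loads and the unpack as separate intrinsics and leaves it to the optimizer to decide whether the second load becomes a memory operand:

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: interleave the low 8 bytes of two adjacent
   16-byte blocks starting at src and store the 16-byte result in dst.
   Both pointers are assumed to be 16-byte aligned. */
void interleave_low_bytes(const uint8_t *src, uint8_t *dst)
{
    /* _mm_load_si128 and _mm_store_si128 require aligned addresses. */
    assert(((uintptr_t)src % 16) == 0 && ((uintptr_t)dst % 16) == 0);

    __m128i a = _mm_load_si128((const __m128i *)src);        /* bytes 0..15  */
    __m128i b = _mm_load_si128((const __m128i *)(src + 16)); /* bytes 16..31 */

    /* Result layout: a0, b0, a1, b1, ..., a7, b7 */
    _mm_store_si128((__m128i *)dst, _mm_unpacklo_epi8(a, b));
}
```

Nothing in the source forces a particular instruction sequence; the code only states what should be computed, and the compiler picks the register or memory form.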
Edit: Trivial example:
#include <emmintrin.h>

__m128i foo(__m128i *addr)
{
    __m128i a = _mm_load_si128(addr);
    __m128i b = _mm_load_si128(addr + 1);
    return _mm_unpacklo_epi8(a, b);
}
Compiling with gcc -Os -fomit-frame-pointer gives:
_foo:
    movdqa (%rdi), %xmm0
    punpcklbw 16(%rdi), %xmm0
    retq
See? The optimizer will sort it out.
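One caveat, sketched under my own assumptions: the folding depends on the load being the aligned kind. The legacy SSE memory-operand form of PUNPCKLBW requires a 16-byte aligned address, so if the data has no alignment guarantee and you use `_mm_loadu_si128`, a compiler targeting plain SSE2 generally has to keep a separate unaligned load. The variant below (`foo_unaligned` is a hypothetical name) computes the same result as `foo` for arbitrary addresses:

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical variant of foo for data with no alignment guarantee.
   Reads 32 bytes starting at addr and interleaves the low 8 bytes of
   the first 16-byte block with the low 8 bytes of the second. */
__m128i foo_unaligned(const uint8_t *addr)
{
    __m128i a = _mm_loadu_si128((const __m128i *)addr);        /* unaligned load */
    __m128i b = _mm_loadu_si128((const __m128i *)(addr + 16)); /* unaligned load */
    return _mm_unpacklo_epi8(a, b);
}
```

The source still looks the same as the aligned version; only the choice of load intrinsic tells the compiler which forms it is allowed to use.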