optimization - SSE2 intrinsics: access memory directly -


Many SSE instructions allow the source operand to be a 16-byte aligned memory address. For example, the various (un)pack instructions. PUNPCKLBW has the following signature:

PUNPCKLBW xmm1, xmm2 / m128

Now, this does not seem to be possible with intrinsics at all. It looks like it is mandatory to use the _mm_load* intrinsics to read anything from memory. This is the intrinsic for PUNPCKLBW:

__m128i _mm_unpacklo_epi8 (__m128i a, __m128i b);

(As far as I know, the __m128i type always refers to an XMM register.)

Now, why is this? It is rather unfortunate, because I see some optimization potential in addressing memory directly...
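To make the pattern in question concrete, here is a minimal sketch of the explicit load-then-unpack sequence written with intrinsics. The function name `interleave_lo_bytes` and its parameter names are made up for illustration, and all three pointers are assumed to be 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: interleave the low 8 bytes of two 16-byte blocks.
   Assumption: src_a, src_b and dst are all 16-byte aligned. */
void interleave_lo_bytes(const uint8_t *src_a, const uint8_t *src_b, uint8_t *dst)
{
    /* Both operands are read through explicit aligned-load intrinsics. */
    __m128i a = _mm_load_si128((const __m128i *)src_a);
    __m128i b = _mm_load_si128((const __m128i *)src_b);

    /* Result bytes are a0 b0 a1 b1 ... a7 b7 (low halves interleaved). */
    __m128i r = _mm_unpacklo_epi8(a, b);

    _mm_store_si128((__m128i *)dst, r);
}
```

At the source level there is no way to hand `_mm_unpacklo_epi8` a pointer: both arguments are `__m128i` values, so the loads have to be spelled out.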

The intrinsics correspond relatively directly to actual instructions, but compilers are not obligated to issue the corresponding instructions. Folding a load followed by an operation (even when written with intrinsics) into the memory form of the operation is a common optimization performed by all respectable compilers when it is advantageous to do so.

TL;DR: write the load and the operation with intrinsics, and let the compiler optimize it.

Edit: trivial example:

  #include <emmintrin.h>

  __m128i foo(__m128i *addr) {
      /* Two explicit aligned loads from adjacent 16-byte blocks. */
      __m128i a = _mm_load_si128(addr);
      __m128i b = _mm_load_si128(addr + 1);
      return _mm_unpacklo_epi8(a, b);
  }

Compiling with gcc -Os -fomit-frame-pointer gives:

  _foo:
      movdqa    (%rdi), %xmm0
      punpcklbw 16(%rdi), %xmm0
      retq

See? The optimizer will sort it out.
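The same idea carries over to a loop. The sketch below is only an illustration: `unpack_pairs` and its parameters are hypothetical names, and both buffers are assumed to be 16-byte aligned. Whether the second load actually becomes a memory operand depends on the compiler, its version, and the flags used, so it is worth checking the generated assembly rather than assuming it.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: for each pair of adjacent 16-byte blocks in src,
   interleave their low 8 bytes and write one 16-byte block to dst.
   Assumption: src and dst are 16-byte aligned. */
void unpack_pairs(const __m128i *src, __m128i *dst, size_t n_pairs)
{
    for (size_t i = 0; i < n_pairs; i++) {
        /* First operand: ends up in a register, since the instruction's
           destination is also its first source. */
        __m128i a = _mm_load_si128(&src[2 * i]);

        /* Second operand: a candidate for being folded into the memory
           form of punpcklbw by the compiler. */
        __m128i b = _mm_load_si128(&src[2 * i + 1]);

        _mm_store_si128(&dst[i], _mm_unpacklo_epi8(a, b));
    }
}
```

Compiling with something like gcc -Os -S and reading the output shows which form was chosen. Note that only one of the two loads can be folded, because PUNPCKLBW takes a single xmm2/m128 source and its other operand must be a register.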

