Intel® C++ Compiler
Support and discussions for creating C++ code that runs on platforms based on Intel® processors.

Are mm_load/store required here?

stauff4
Beginner

I'm struggling to understand whether I can iterate through a __m128i memory segment directly (without issues) or whether load/store intrinsics are required. The emphasis is on "without issues" because my code operates correctly in most cases, but I begin to see odd behavior when my system is starved for resources. I don't see runtime exceptions or errors, but I am not checking any intrinsic return values or status. Please ignore syntax errors; the code here is just for the sake of discussion.

 

First, a task allocates multiple __m128i memory segments and I save the returned values:

for (int i = 0; i < SOME_N; i++)
{
   __m128i *pFrame = (__m128i *)_mm_malloc(sizeof(__m128i) * SOME_LENGTH, sizeof(__m128i));
   someList[i] = pFrame;
}
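Since the post mentions that no return values are checked and the odd behavior appears only under resource pressure, one thing worth ruling out is a failed allocation: `_mm_malloc` returns a null pointer when it cannot satisfy the request. Below is a minimal sketch of the allocation loop with that check. The helper names `allocate_frames` and `free_frames` are hypothetical and not part of the original code.

```cpp
#include <immintrin.h>  // __m128i, _mm_malloc, _mm_free
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: fills list[0..count) with 16-byte-aligned buffers of
// `length` vectors each. On failure it frees the buffers allocated so far,
// resets those entries to nullptr, and returns false.
bool allocate_frames(__m128i** list, std::size_t count, std::size_t length) {
    for (std::size_t i = 0; i < count; ++i) {
        void* p = _mm_malloc(sizeof(__m128i) * length, alignof(__m128i));
        if (p == nullptr) {
            for (std::size_t j = 0; j < i; ++j) {
                _mm_free(list[j]);
                list[j] = nullptr;
            }
            return false;
        }
        // Sanity check: the buffer must be aligned for __m128i access.
        assert(reinterpret_cast<std::uintptr_t>(p) % alignof(__m128i) == 0);
        list[i] = static_cast<__m128i*>(p);
    }
    return true;
}

// Hypothetical helper: releases every buffer with _mm_free (not free/delete)
// and clears the entries.
void free_frames(__m128i** list, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        _mm_free(list[i]);
        list[i] = nullptr;
    }
}
```

A caller would test the result of `allocate_frames(someList, SOME_N, SOME_LENGTH)` before handing any pointer to another task, and later release the buffers with `free_frames`.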

 

Some other task will extract pointers from that list and copy (8-bit non-intrinsic-type) data into that memory:

for (int i = 0; i < SOME_LENGTH; i++)
{
   // Will this work?
   pFrame[i] = _mm_insert_epi8(pFrame[i], pData[i], 0);

   // Or do I need to do something like this?
   __m128i p128i = _mm_load_si128(&pFrame[i]);
   _mm_store_si128(&pFrame[i], _mm_insert_epi8(p128i, pData[i], 0));
}


Similarly, I will need to pull the data out at the end:

for (int i = 0; i < SOME_LENGTH; i++)
{
   // Does this work?
   pData[i] = _mm_extract_epi8(pFrame[i], 0);

   // Or do I need to do something like this?
   pData[i] = _mm_extract_epi8(_mm_load_si128(&pFrame[i]), 0);
}

4 Replies
MadhuK_Intel
Moderator

Hi,

 

Thank you for posting in Intel Communities.

 

>> "I begin to see odd behavior when my system is starved for resources"

 

Could you please elaborate more on the difficulty you are facing? Could you please provide the complete sample reproducer code and steps to reproduce your issue at our end?

And also, please share the platform details, operating system, Intel compiler, and oneAPI toolkit version you are using.

 

Please refer to the provided URL for more information about Intel intrinsics.

 

URL: intel.com/content/www/us/en/docs/intrinsics-guide/index.html

 

Best regards,

Madhu 

 

stauff4
Beginner

Unfortunately, I have a proprietary decoder/system that I cannot share. This question is more about proper use of the API than about what is wrong with my system. I've found that my code works under optimal conditions without using mm_load/mm_store, but I am trying to understand the undefined behavior in non-optimal conditions. I can rewrite my code to use load/store, but that would take several days of work and testing. My hope was that someone could tell me either "the epi insert/extract methods should be sufficient" (in which case I would look elsewhere for the problem) or "yes, the intended use of the API is to access the __m128i memory blocks through the load/store methods" before I spend several days on what might be a dead end.

 

Is that enough information to answer yes or no if the mm_load/mm_store operations should be used in the provided example code?

Viet_H_Intel
Moderator

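To my knowledge, you don't need mm_load/mm_store here. Dereferencing a properly aligned __m128i pointer is sufficient, because the compiler generates the aligned load and store for you.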
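To illustrate the two access styles from the question side by side, here is a minimal sketch that packs and unpacks byte 0 of each vector both ways. The function names are hypothetical, and the `USE_SSE41` macro is an assumption added so that the SSE4.1 intrinsics `_mm_insert_epi8` and `_mm_extract_epi8` build on GCC or Clang without `-msse4.1`. The frame memory is assumed to be 16-byte aligned and already initialized.

```cpp
#include <immintrin.h>  // SSE2 load/store, SSE4.1 insert/extract
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: enable SSE4.1 per function on GCC/Clang; other compilers
// (e.g. MSVC, or a build using -msse4.1) need no attribute.
#if defined(__GNUC__) || defined(__clang__)
#define USE_SSE41 __attribute__((target("sse4.1")))
#else
#define USE_SSE41
#endif

// Style A: plain dereference of an aligned __m128i pointer.
USE_SSE41 void pack_deref(__m128i* frame, const std::uint8_t* data, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(frame) % 16 == 0);
    for (std::size_t i = 0; i < n; ++i)
        frame[i] = _mm_insert_epi8(frame[i], data[i], 0);
}

// Style B: explicit aligned load and store around the same insert.
USE_SSE41 void pack_explicit(__m128i* frame, const std::uint8_t* data, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(frame) % 16 == 0);
    for (std::size_t i = 0; i < n; ++i) {
        __m128i v = _mm_load_si128(&frame[i]);
        _mm_store_si128(&frame[i], _mm_insert_epi8(v, data[i], 0));
    }
}

// Style A for reading: dereference, then extract byte 0.
USE_SSE41 void unpack_deref(const __m128i* frame, std::uint8_t* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<std::uint8_t>(_mm_extract_epi8(frame[i], 0));
}

// Style B for reading: explicit aligned load, then extract byte 0.
USE_SSE41 void unpack_explicit(const __m128i* frame, std::uint8_t* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<std::uint8_t>(
            _mm_extract_epi8(_mm_load_si128(&frame[i]), 0));
}
```

Both styles should produce the same bytes, so data written with one can be read back with the other. If results differ only under load, that points away from the access style and toward something else, such as an unchecked allocation or unsynchronized access between tasks.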


Thanks,

Viet


Viet_H_Intel
Moderator

Hi,


Let's close this thread. If you have any other questions/concerns, please create a new one.


Thanks,

