Intel Cilk Plus VSM memory limitations

Christoph_L_ · ‎01-10-2014

Hi,

We're currently implementing some tests for the Xeon Phi before porting a new library to the MIC. During comparison tests of OpenMP and Cilk Plus, we noticed that Cilk Plus apparently does not allow us to use the entire memory available on the cards.

The test machine has two Xeon Phi cards with roughly 8 GB of memory each, of which about 7.5 GB should be available for use (observable using micsmc). With OpenMP offload or offload_transfer pragmas, it is possible to use the entire memory on both cards. With the Cilk Plus _Offload_shared_malloc calls, however, it is only possible to use less than 2 GB of memory.

Regardless of whether the memory is acquired with several smaller _Offload_shared_malloc calls or a single one, as soon as the total acquired during an application run exceeds 2 GB, the application crashes with:

HOST--ERROR:myoiExPLExtendVSM: VSM size exceeds the limitation (4294967296) now!

HOST--ERROR:myoiExMalloc:662 Fail to get a new memory chunk!
HOST--ERROR:myoArenaMalloc: Fail to get free memory space!
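
For readers who want to try the allocation pattern described above, here is a minimal sketch, not the original test code. The helper names `allocate_shared_chunks` and `free_shared_chunks` are hypothetical, and so is the assumption that `<offload.h>` declares `_Offload_shared_malloc` and `_Offload_shared_free` for the Intel compiler. Without the Intel offload compiler the sketch falls back to plain `malloc`/`free`, so it only exercises the bookkeeping on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef __INTEL_OFFLOAD
#include <offload.h> /* assumption: declares the _Offload_shared_* calls */
#define SHARED_MALLOC(n) _Offload_shared_malloc(n)
#define SHARED_FREE(p)   _Offload_shared_free(p)
#else
/* Host-only fallback so the sketch builds without the Intel compiler. */
#define SHARED_MALLOC(n) malloc(n)
#define SHARED_FREE(p)   free(p)
#endif

/* Hypothetical helper: request `count` chunks of `chunk_bytes` each and
 * return the total number of bytes obtained. Stops at the first request
 * that returns NULL. `chunks` must have room for `count` pointers. */
size_t allocate_shared_chunks(void **chunks, size_t count, size_t chunk_bytes)
{
    size_t total = 0;
    assert(chunks != NULL && chunk_bytes > 0);

    /* Clear all slots first so a partial run can be freed safely. */
    for (size_t i = 0; i < count; ++i)
        chunks[i] = NULL;

    for (size_t i = 0; i < count; ++i) {
        chunks[i] = SHARED_MALLOC(chunk_bytes);
        if (chunks[i] == NULL)
            break;
        total += chunk_bytes;
    }
    return total;
}

/* Release every chunk that was obtained and reset its slot to NULL. */
void free_shared_chunks(void **chunks, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (chunks[i] != NULL) {
            SHARED_FREE(chunks[i]);
            chunks[i] = NULL;
        }
    }
}
```

Calling `allocate_shared_chunks` with, say, 40 chunks of 64 MB each would cross the 2 GB mark. Note that in the report above the failure shows up as a crash with the HOST--ERROR messages rather than as a NULL return, so the NULL check may never be reached on an affected installation.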

The reported limit (4294967296) is exactly 4 GB (2^32 bytes), assuming the number refers to bytes.
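
The arithmetic can be checked with a few lines of C. This is only a sanity check of the number in the error message; the names `bytes_to_gib` and `check_reported_limit` are made up for illustration, and the unit (bytes) is the same assumption as above.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_GIB (UINT64_C(1) << 30)

/* Convert a byte count to binary gigabytes (2^30 bytes each). */
double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (double)BYTES_PER_GIB;
}

/* Confirm that the value from the error message is 2^32 bytes,
 * i.e. exactly 4 binary gigabytes. */
void check_reported_limit(void)
{
    const uint64_t reported = UINT64_C(4294967296);
    assert(reported == (UINT64_C(1) << 32));
    assert(bytes_to_gib(reported) == 4.0);
}
```

So the message itself is consistent with a 4 GB cap, which is why a failure at roughly 2 GB of requested memory is surprising.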

I have two urgent questions: how can the observed maximum of about 2 GB be explained, given the stated limit of 4 GB? And is it possible to change this limit, e.g. by changing a configuration file? I did not find any information on the topic.

Best,
Christoph

TaylorIoTKidd · ‎03-17-2014

Christoph,

It looks like you are processing large amounts of data. The explicit offloading model, using "#pragma offload", is probably better suited to what you want to do.

The intent of the virtual shared memory model is to increase the expressiveness of the data structures and code you can share, which is why it supports C++ classes and objects. Since shared memory is implemented in software, there are practical, performance-based limits on the amount of data that can be shared.

You are running up against one of these practical limits.

I recommend you look at using the explicit offload model. Though you cannot directly share classes and objects, you can transfer the data to populate those objects on the coprocessor side.
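
To make the suggestion concrete, here is a rough sketch of that pattern, not code from any Intel sample. The `Particle` struct, `weighted_sum`, and `offload_weighted_sum` are hypothetical names. The host flattens the object fields into plain arrays, the arrays are transferred with `in(...)` clauses, and the coprocessor side rebuilds the objects from them. The pragma and the target attribute are guarded by `__INTEL_OFFLOAD`, so without the Intel offload compiler the same code simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef __INTEL_OFFLOAD
#define MIC_TARGET __attribute__((target(mic)))
#else
#define MIC_TARGET /* host-only build: no target attribute */
#endif

/* Hypothetical object type that cannot be shared directly. */
typedef struct {
    double position;
    double mass;
} Particle;

/* Runs on the coprocessor when offloaded: rebuilds each object from
 * the flat arrays and accumulates position * mass. */
MIC_TARGET double weighted_sum(const double *pos, const double *mass, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        Particle p = { pos[i], mass[i] };
        sum += p.position * p.mass;
    }
    return sum;
}

/* Host side: flatten the objects, transfer the arrays, compute remotely.
 * Returns 0.0 if the temporary arrays cannot be allocated. */
double offload_weighted_sum(const Particle *particles, size_t n)
{
    double result = 0.0;
    double *pos, *mass;

    assert(particles != NULL && n > 0);
    pos = malloc(n * sizeof *pos);
    mass = malloc(n * sizeof *mass);
    if (pos == NULL || mass == NULL) {
        free(pos);
        free(mass);
        return 0.0;
    }

    for (size_t i = 0; i < n; ++i) {
        pos[i] = particles[i].position;
        mass[i] = particles[i].mass;
    }

#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) in(pos, mass : length(n)) out(result)
#endif
    {
        result = weighted_sum(pos, mass, n);
    }

    free(pos);
    free(mass);
    return result;
}
```

For three particles with (position, mass) of (1, 2), (3, 4), and (5, 6), `offload_weighted_sum` returns 44. The cost of this approach is the extra flattening step on the host; the benefit is that the transferred arrays are not subject to the shared-memory limit discussed in this thread.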

Also, since the virtual shared memory model is a software implementation, fine-grained synchronization between coprocessor and host is too slow to be practical.

A good reference is "Effective Use of the Intel Compiler's Offload Features" by Kevin Davis.

Regards

--
Taylor

PAUL_R_Intel30 · ‎05-09-2014

Christoph, Taylor,

The MYO team was able to reproduce and fix a bug that imposed an artificially low limit on shared virtual memory allocations. The messages above were a symptom of that bug. The fix will be released in version 3.3 of the MPSS. In its current state, MYO allows a little less than half of the GDDR size on the card for application use. For example, on a MIC card with 16 GB of GDDR, we successfully allocated, read, and wrote 6.5 GB.

Paul R.

MYO Team

Christoph_L_ · ‎05-12-2014

Hello Paul,

It is great to hear that the bug has been resolved and that more memory can now be used with the virtual shared memory model. However, since the main limitation of the cards is currently memory and not processing power, it would be great to have this limit even higher!

Best
Christoph