I am pretty sure I have a configuration error, but I can't seem to figure it out. Hopefully someone here can help. I am running the latest version of Cluster Studio XE. I have C++/OpenMP code that I would like to profile to find the problem areas. It is also worth noting that at the moment I can compile successfully and run other Intel tools (or the few I have tried) from any of the nodes as well as from the frontend of the cluster.
On the frontend, I can open the GUI, load the project, run it, and view the results. If I copy the command line that the GUI generates, I get:
amplxe-cl -v -collect hotspots -- /home/chris/code/intel/run
Running that on the frontend from the command line gives the expected output. I can then open that output in the GUI and view the details.
However, the frontend is a weakling in comparison to the compute nodes. So I tried to run that command on the compute nodes:
$ export INTEL_LICENSE_FILE=28518@frontend01
$ source /software/intel/composer_xe_2013/bin/compilervars.sh intel64
$ source /software/intel/vtune_amplifier_xe/amplxe-vars.sh
$ amplxe-cl -v -collect hotspots -- /home/chris/code/intel/run
Using result path `/home/chris/tmp/r007hs'
Executing actions 8 % Clearing the database
The database has been cleared, elapsed time is 0.623 seconds.
Executing actions 16 % Loading raw data to the database
Raw data has been loaded to the database, elapsed time is 0.009 seconds.
Executing actions 50 % Loading raw data to the database
Finalizing the result took 0.653 seconds.
Executing actions 50 % done
Error: Error 0x4000001e (Cannot load raw collector data)
Err.....OK. I went looking for that error and I found a few similar errors, but none that I saw apply to me (that I can tell).
I decided to try again on the login nodes (which have GNOME installed; the compute nodes are command-line only). It ran just fine in the GUI, but failed on the command line with the same error. I figured that it must be something in the environment variables, but checking `env` on all of the nodes yields similar results. So I decided to install all the same packages on the login and compute nodes (the cluster is still in early testing, so I can do that without angry users :P). Now the GUI fails to run the job on the login node, in addition to the command-line version (I haven't figured out how to get better error output yet, but what it does say is just as minimal as the command-line output). On the compute node, I can now open the GUI version, but it too fails like the login node does now. So at the moment, the only way to run the VTune application with any success is on the frontend. This is obviously a problem for me and the cluster...
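Since "similar" `env` output can still hide a one-variable difference, a sorted diff of two saved dumps makes the comparison exact. This is a minimal sketch: the helper name `env_diff` and the dump file names are made up for illustration, and it assumes no variable has a multi-line value (exported shell functions would confuse the sort).

```shell
# Hypothetical helper: compare two files produced by `env > file` on two hosts.
# Prints the differing lines and returns diff's exit status
# (0 = identical after sorting, 1 = differences found).
env_diff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    sort "$1" > "$a"
    sort "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}

# Example usage (file names are placeholders):
#   on the frontend:      env > "$HOME/env.frontend"
#   on a compute node:    env > "$HOME/env.compute"
#   then, on either host: env_diff "$HOME/env.frontend" "$HOME/env.compute"
```

Lines starting with `<` appear only in the first dump and lines starting with `>` only in the second, so differences in variables such as PATH or LD_LIBRARY_PATH show up directly.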
My end goal here is to eventually allow the users to run the GUI on the login nodes and submit jobs through the Torque resource manager. However, before I get to that step, I need to be able to run the job successfully from the command line on a compute node. I don't know what failed or where, as that error is the most informative output I have found so far.
Can anyone please give me some pointers as to what configuration I may have messed up here?
Is it possible that you do not have permission to write to the result directory? You could try adding an option such as "-r /tmp/r001hs" to store the result somewhere else.
If the problem still persists, go to https://premier.intel.com and submit this issue along with the result directory.
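A quick way to rule out the permission question before rerunning the collection is to test the parent of the result directory directly. This is only a sketch: `can_store_result` is a made-up helper name, and the `amplxe-cl` line is left commented out so that the check runs on a machine without VTune; its `-r` option and paths are taken from the posts above.

```shell
# Hypothetical helper: succeed only if a new result directory
# could be created under the directory given as $1.
can_store_result() {
    [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}

result_parent=/tmp
if can_store_result "$result_parent"; then
    echo "$result_parent is writable by $(id -un)"
    # Then rerun the collection with an explicit result directory:
    # amplxe-cl -v -collect hotspots -r "$result_parent/r001hs" -- /home/chris/code/intel/run
else
    echo "$result_parent is NOT writable by $(id -un)"
fi
```

If the check passes and the collection still fails with the same error, the result directory permissions are unlikely to be the cause.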
I have run into installation issues with all recent versions of VTune. As you didn't give details of your platform, it may not apply, but in my case it was necessary to remove the Amplifier installation and delete all driver modules before re-installing the current one. It's possible to have the wrong driver in use if any older ones are present on your system. This problem has hit enough people that an update 4 may be issued to ease this installation issue.
@Peter: I did check the permissions and they are fine. The tmp directory is actually in my home directory, so I have full access. I also tried running in the same folder where the GUI writes its successful runs and received the same error.
@TimP: I am running this on a Scientific Linux 6.3 cluster. This was a fresh install, so no previous version existed.
Is there a way to get more information than what '-v' provides? If I could put this into a proper debug mode, maybe then I could figure out what is going on.
Thanks for the replies!