I'm trying to do microarchitectural exploration on an SQL server that is being driven by a test bench. The idea is to collect statistics for the server under load. However, once the driver reaches certain portions of the test, it crashes. It looks like the server process handling the connection crashes as well, but that is caught and masked by the main server process.
Ideally, I would like not only to get this working, but also to attach VTune to the process mid-test in order to skip the warm-up phases.
I've also managed to crash external programs (e.g. Firefox) and deadlock my entire system doing this, so obviously I would like to avoid that in the future.
My setup is as follows:
CPU: Intel i7 2600K
OS: Ubuntu 15.10 64bit
server: Postgres 10devel (compiled from source at https://github.com/postgres/postgres)
driver: oltpbench tpcc (compiled from source at https://github.com/oltpbenchmark/oltpbench)
VTune info: Amplifier XE 2016 Update 4 (build 470476)
To reproduce, I run the following two commands in separate terminals (PG_DATA is a data directory for postgres):
./amplxe-cl -collect general-exploration -strategy=:trace:trace -target-duration-type=veryshort --duration 20 -- postgres -D $PG_DATA
./oltpbenchmark --create true --bench tpcc --execute true --config ./config/tpcc_config_postgres.xml
Setting up oltpbenchmark requires some specific steps, which I can give if you want to replicate this.
The errors from the driver vary depending on which benchmark is used. As it uses a Java interface to postgres, the exceptions are usually RuntimeException or NullPointerException. Often this is caused by the server closing the connection.
The postgres server seems to be masking the dead server processes, so VTune sees none of this and exits normally (provided the entire system doesn't crash).
From the postgres logs:
2017-02-14 13:25:41.236 PST  LOG: execute <unnamed>: SELECT su_suppkey, su_name, su_address, su_phone, total_revenue FROM supplier, revenue0 WHERE su_suppkey = supplier_no AND total_revenue = (select max(total_revenue) from revenue0) ORDER BY su_suppkey
2017-02-14 13:25:41.610 PST  LOG: server process (PID 27318) was terminated by signal 11: Segmentation fault
2017-02-14 13:25:41.610 PST  DETAIL: Failed process was running: SELECT c_last, c_id, o_id, o_entry_d, o_ol_cnt, sum(ol_amount) AS amount_sum FROM customer, oorder, order_line WHERE c_id = o_c_id AND c_w_id = o_w_id AND c_d_id = o_d_id AND ol_w_id = o_w_id AND ol_d_id = o_d_id AND ol_o_id = o_id GROUP BY o_id, o_w_id, o_d_id, c_id, c_last, o_entry_d, o_ol_cnt HAVING sum(ol_amount) > 200 ORDER BY amount_sum DESC, o_entry_d
2017-02-14 13:25:41.610 PST  LOG: terminating any other active server processes
2017-02-14 13:25:41.610 PST  WARNING: terminating connection because of crash of another server process
2017-02-14 13:25:41.610 PST  DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-02-14 13:25:41.610 PST  HINT: In a moment you should be able to reconnect to the database and repeat your command.
The log output also changes depending on the benchmark I run. Without VTune, the same commands run flawlessly.
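In case it helps with triage across benchmarks, here is a minimal sketch of how the crash entries could be pulled out of a log like the one above. It assumes the exact line format shown in my log (timestamp, time zone, then LOG: or DETAIL:); the function name find_crashes and the log path passed on the command line are just placeholders of mine, not anything provided by postgres.

```python
import re
import sys

# Matches postmaster lines such as:
#   "... LOG: server process (PID 27318) was terminated by signal 11: Segmentation fault"
CRASH_RE = re.compile(
    r"^(?P<ts>\S+ \S+ \S+)\s+LOG:\s+server process \(PID (?P<pid>\d+)\) "
    r"was terminated by signal (?P<sig>\d+): (?P<name>.+)$"
)
# Matches the DETAIL line that names the statement the failed process was running.
DETAIL_RE = re.compile(r"DETAIL:\s+Failed process was running: (?P<query>.+)$")


def find_crashes(lines):
    """Return one dict per signal-terminated server process found in the log lines."""
    crashes = []
    for line in lines:
        line = line.rstrip("\n")
        m = CRASH_RE.match(line)
        if m:
            crashes.append({
                "timestamp": m.group("ts"),
                "pid": int(m.group("pid")),
                "signal": int(m.group("sig")),
                "signal_name": m.group("name"),
                "query": None,  # filled in if a matching DETAIL line follows
            })
            continue
        d = DETAIL_RE.search(line)
        if d and crashes and crashes[-1]["query"] is None:
            crashes[-1]["query"] = d.group("query")
    return crashes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python find_crashes.py /path/to/postgres.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for crash in find_crashes(fh):
            print(crash["timestamp"], crash["pid"], crash["signal_name"])
            print("  query:", crash["query"])
```

Running this over the logs from each benchmark would at least show whether the segfault always lands in the same kind of query or moves around.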
Any help in figuring this out is greatly appreciated. I'm also somewhat surprised that VTune can crash the whole system depending on the application being profiled.