Here's an example:
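    #include <ctime>
    #include <iostream>

    std::clock_t start = std::clock();
    // do some work
    std::clock_t finish = std::clock();
    // std::clock() returns processor time in clock ticks; divide by CLOCKS_PER_SEC to get seconds
    double time_span = static_cast<double>(finish - start) / CLOCKS_PER_SEC;
    std::cout << "Execution time: " << time_span << " s\n";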
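For finer resolution, you can at least use the RDTSCP instruction, which is only partially ordered: it waits until all earlier instructions have executed before reading the time-stamp counter, but later instructions may begin executing before that read is performed.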
Be aware that an assembly implementation under test may run in the "shadow" of the RDTSCP instruction, i.e. its instructions may reach the execution stage of the pipeline before RDTSCP itself does, so part of the measured work can fall outside the timed interval.
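A minimal sketch of how this might look, assuming GCC or Clang on x86-64, where the `__rdtscp` and `_mm_lfence` intrinsics come from `<x86intrin.h>` (MSVC provides them in `<intrin.h>`). The helper names `read_tsc` and `measure_ticks` are made up for illustration. Placing an LFENCE right after RDTSCP is one common way to reduce the "shadow" effect; it is an assumption added here, not something the answer above prescribes.

```cpp
#include <cstdint>
#include <utility>
#include <x86intrin.h>  // __rdtscp, _mm_lfence (GCC/Clang, x86 only)

// Hypothetical helper: read the time-stamp counter with RDTSCP, then issue
// LFENCE so that later instructions do not start before the read completes.
inline std::uint64_t read_tsc() {
    unsigned int aux = 0;              // receives IA32_TSC_AUX (unused here)
    std::uint64_t t = __rdtscp(&aux);  // waits for earlier instructions to execute
    _mm_lfence();                      // holds back the instructions that follow
    return t;
}

// Hypothetical helper: return the number of TSC ticks spent in work().
// TSC ticks are not necessarily core clock cycles, and the result includes
// the overhead of the two counter reads.
template <typename F>
std::uint64_t measure_ticks(F&& work) {
    const std::uint64_t start = read_tsc();
    std::forward<F>(work)();
    const std::uint64_t finish = read_tsc();
    return finish - start;
}
```

It could be called as `measure_ticks([] { /* do some work */ })`. Repeating the measurement and taking the minimum or median is usually more informative than a single reading, since interrupts and thread migration between cores can distort individual samples.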