- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The following code optimizes well for `custom_swap` and `restrict_std_swap`, but has an additional `mov` instruction for `std_swap`:
void custom_swap(int * lhs, int * rhs) {
int temp = *lhs;
*lhs = *rhs;
*rhs = temp;
}
void restrict_std_swap(int * __restrict lhs, int * __restrict rhs) {
int temp = *lhs;
*lhs = 0;
*lhs = *rhs;
*rhs = temp;
}
void std_swap(int * lhs, int * rhs) {
int temp = *lhs;
*lhs = 0;
*lhs = *rhs;
*rhs = temp;
}
Compiles into this assembly with -O1, -O2, -Os, or -O3:
custom_swap(int*, int*):
mov eax, DWORD PTR [rsi] #3.13
mov edx, DWORD PTR [rdi] #2.17
mov DWORD PTR [rdi], eax #3.6
mov DWORD PTR [rsi], edx #4.6
ret #5.1
restrict_std_swap(int*, int*):
mov edx, DWORD PTR [rdi] #8.17
mov eax, DWORD PTR [rsi] #10.13
mov DWORD PTR [rdi], eax #10.6
mov DWORD PTR [rsi], edx #11.6
ret #12.1
std_swap(int*, int*):
mov edx, DWORD PTR [rdi] #15.17
mov DWORD PTR [rdi], 0 #16.6
mov eax, DWORD PTR [rsi] #17.13
mov DWORD PTR [rdi], eax #17.6
mov DWORD PTR [rsi], edx #18.6
ret #19.1
As we see from the example that annotates the parameters with __restrict, the problem appears to be that the risk of *lhs aliasing *rhs disables the optimizer's ability to remove the dead store in the second line of std_swap. It is able to see that if they don't alias, the store in line 2 is dead. It is not able to see that if they do alias, the store in line 3 is dead and the store in line 2 is dead.
See it live: https://godbolt.org/z/8nCTnL
The real life problem here is that types that manage a resource but do not implement a custom std::swap, as well as all types that recursively contain a type that manages a resource, suffer from reduced performance for using std::swap. The larger, slightly more meaningful test case showing how I arrived at this reduction and its relationship to std::swap:
struct unique_ptr {
unique_ptr():
ptr(nullptr)
{
}
unique_ptr(unique_ptr && other) noexcept:
ptr(other.ptr)
{
other.ptr = nullptr;
}
unique_ptr & operator=(unique_ptr && other) noexcept {
delete ptr;
ptr = nullptr;
ptr = other.ptr;
other.ptr = nullptr;
return *this;
}
~unique_ptr() noexcept {
delete ptr;
}
int * ptr;
};
void custom_swap(unique_ptr & lhs, unique_ptr & rhs) noexcept {
int * temp = lhs.ptr;
lhs.ptr = rhs.ptr;
rhs.ptr = temp;
}
void inlined_std_swap(unique_ptr & lhs, unique_ptr & rhs) noexcept {
int * temp_ptr = lhs.ptr;
lhs.ptr = nullptr;
delete lhs.ptr;
lhs.ptr = nullptr;
lhs.ptr = rhs.ptr;
rhs.ptr = nullptr;
delete rhs.ptr;
rhs.ptr = nullptr;
rhs.ptr = temp_ptr;
temp_ptr = nullptr;
delete temp_ptr;
}
void std_swap(unique_ptr & lhs, unique_ptr & rhs) noexcept {
auto temp = static_cast(lhs);
lhs = static_cast(rhs);
rhs = static_cast(temp);
}
See it live: https://godbolt.org/z/tWjmzo
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
Link Copied
0 Replies

Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page