Yes, I am going to do the padding in a similar way, though not exactly the same.
The cache_aligned_allocator uses a constant of 128 for the sake of platforms where cache lines are 128 bytes (e.g. the Intel Itanium processor). Again, you are right that ideally it should be determined at runtime for each HW (and I hope it will be reworked this way); but as most current CPUs (well, at least Intel processors) have either 64 or 128 bytes in a cache line, the bigger of the two was chosen for simplicity.
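To illustrate the effect of picking the bigger constant, here is a minimal sketch. It is not the actual cache_aligned_allocator code; the names kMaxCacheLineSize and PaddedCounter are made up for the example, and it assumes a C++11 compiler that supports alignas with a 128-byte alignment.

```cpp
#include <cstddef>

// Assumption: 128 covers both the 64-byte and the 128-byte cache line case.
// The name is hypothetical and is not taken from the TBB sources.
constexpr std::size_t kMaxCacheLineSize = 128;

// Hypothetical per-thread counter. alignas makes every instance start on a
// 128-byte boundary and rounds sizeof up to 128, so two neighbouring
// instances in an array can never share a cache line.
struct alignas(kMaxCacheLineSize) PaddedCounter {
    long value;
};

static_assert(sizeof(PaddedCounter) == kMaxCacheLineSize,
              "each counter occupies exactly one 128-byte slot");
static_assert(alignof(PaddedCounter) == kMaxCacheLineSize,
              "each counter starts on a 128-byte boundary");
```

On a CPU with 64-byte lines this wastes one extra line per object, which is the price of the simple fixed constant.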
In the scalable allocator, it was considered less important; but my current view is that good cache behavior is more important than smaller memory overhead, so I will rework it soon. The block header will be 128 bytes; the fields changed by "foreign" threads will reside in the second half, while the fields changed only by the owning thread will reside in the first half. I expect this will improve performance on the hot path for systems with 64 bytes per cache line, and on Itanium processors it will at least eliminate the situation where the block header shares its cache line with data.
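Roughly, the planned header layout could look like the sketch below. This is only an illustration of the split described above, not the real scalable allocator header: the struct name and all field names are hypothetical, and it assumes C++11 alignas and a 64-byte half.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kHalfSize   = 64;   // assumed size of one half
constexpr std::size_t kHeaderSize = 128;  // whole header, as described above

// Hypothetical block header; field names are illustrative only.
struct alignas(kHeaderSize) BlockHeader {
    // First half: touched only by the owning thread (the hot path).
    void*         bumpPtr;
    void*         freeList;
    std::uint32_t allocatedCount;

    // Second half: touched by "foreign" threads, e.g. when they return
    // objects to this block. alignas pushes it to offset 64.
    alignas(kHalfSize) std::atomic<void*>         publicFreeList;
    std::atomic<std::uint32_t>                    foreignFreeCount;
};

static_assert(sizeof(BlockHeader) == kHeaderSize,
              "header fills 128 bytes, so user data never shares its line");
static_assert(offsetof(BlockHeader, publicFreeList) == kHalfSize,
              "foreign-thread fields start in the second half");
```

With 64-byte lines the two halves land on different cache lines, so foreign frees do not invalidate the line the owner uses. With 128-byte lines both halves still share one line, but no user data does.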