From what I understand, when different Linux processes use the same shared library, its text (machine code) is mapped from the same file-backed pages, with copy-on-write applying only if a process writes to them. The same object code therefore occupies physical memory only once, even though its virtual addresses may differ from process to process. Will those processes also end up sharing I-cache entries for the shared library code? What about instruction lines in L2/L3, branch-prediction state, and ITLB entries? I doubt the ITLB is shared, but I would like someone to verify this.
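To make the physical-sharing premise concrete, here is a minimal sketch of how one might check it, assuming a Linux system with `/proc` mounted. It lists the executable, file-backed mappings of two processes and reports the ones backed by the same file region. The helper names (`parse_maps_line`, `text_mappings`, `shared_text`) are my own and not part of any library. Matching `(dev, inode, offset)` only shows that both mappings are backed by the same file region. Confirming identical physical frames would require reading `/proc/<pid>/pagemap`, which as far as I know needs elevated privileges on recent kernels.

```python
from typing import List, NamedTuple, Optional, Tuple


class TextMapping(NamedTuple):
    start: int   # first virtual address of the mapping
    end: int     # one past the last virtual address
    offset: int  # offset into the backing file
    dev: str     # device as "major:minor"
    inode: int   # inode of the backing file
    path: str    # pathname of the backing file


def parse_maps_line(line: str) -> Optional[TextMapping]:
    """Parse one /proc/<pid>/maps line; keep only executable, file-backed mappings."""
    fields = line.split(None, 5)
    if len(fields) < 6:
        return None  # no pathname column: anonymous mapping
    addr, perms, offset, dev, inode, path = fields
    if "x" not in perms or inode == "0":
        return None  # not executable, or a pseudo-mapping such as [vdso]
    start, end = (int(part, 16) for part in addr.split("-"))
    return TextMapping(start, end, int(offset, 16), dev, int(inode), path.strip())


def text_mappings(pid="self") -> List[TextMapping]:
    """All executable, file-backed mappings of a process (assumes /proc is readable)."""
    with open(f"/proc/{pid}/maps") as f:
        return [m for m in map(parse_maps_line, f) if m is not None]


def shared_text(pid_a, pid_b) -> List[Tuple[TextMapping, TextMapping]]:
    """Pairs of mappings in two processes that are backed by the same file region."""
    def key(m: TextMapping):
        return (m.dev, m.inode, m.offset)

    a = {key(m): m for m in text_mappings(pid_a)}
    b = {key(m): m for m in text_mappings(pid_b)}
    return [(a[k], b[k]) for k in a.keys() & b.keys()]
```

Calling `shared_text(pid_a, pid_b)` for two running processes that both link libc should return pairs whose `start` addresses typically differ (because of ASLR) while the backing file region is the same, which is the situation I am asking about.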
From what I understand, the L1 and L2 caches are virtually indexed and L3 is physically indexed, but I'm not sure how this affects cache-line sharing for shared library code across different processes.
I'm trying to understand the performance tradeoffs of moving from a single-process, multi-threaded architecture to a multi-process one, for an application that spends much of its time stalled on I-cache misses.