The main problem with this is that you cannot expect the relationship you are seeing to stay the same on another computer because, if I remember correctly, the OS assignment varies depending on the BIOS APIC table and the OS scheduler logic.
Perhaps the best way would be to use Initial APIC ID because successive numbers seem to be representing adjacent cores for each physical package. If I understand your numbers correctly you would want to assign threads to the cores with APIC ID 0, 2, 4 and 6.
The number of cores sharing each cache can be found by enumerating the deterministic cache parameters leaf (CPUID instruction with EAX=4). You can find more information about cache sharing among cores and threads on page 33, section 18.104.22.168, of AP-485, Intel Processor Identification and the CPUID Instruction, document order #241618.
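As a rough sketch of what that enumeration could look like: the snippet below decodes the EAX value that CPUID leaf 4 returns for each sub-leaf. The struct and function names are my own, not from AP-485, and the live query assumes GCC or Clang on x86 with `<cpuid.h>` available. Note that bits 25:14 give the maximum number of addressable logical processor IDs sharing the cache (minus one), which can be larger than the number actually present.

```c
#include <stdint.h>

/* Fields decoded from EAX of CPUID leaf 4 (deterministic cache parameters).
   Names here are illustrative, not taken from any Intel reference code. */
typedef struct {
    unsigned type;        /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;       /* cache level, starting at 1 */
    unsigned max_sharing; /* max addressable logical processor IDs sharing this cache */
} cache_leaf_info;

/* Decode the EAX value returned for one sub-leaf of CPUID leaf 4. */
static cache_leaf_info decode_cache_leaf_eax(uint32_t eax)
{
    cache_leaf_info info;
    info.type = eax & 0x1Fu;                          /* bits 4:0 */
    info.level = (eax >> 5) & 0x7u;                   /* bits 7:5 */
    info.max_sharing = ((eax >> 14) & 0xFFFu) + 1u;   /* bits 25:14, stored minus one */
    return info;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Walk the sub-leaves (ECX = 0, 1, 2, ...) until the cache type field reads 0.
   Returns the sharing count for the first data or unified cache at the
   requested level, or 0 if leaf 4 is unsupported or no such cache is found. */
static unsigned max_sharing_for_level(unsigned wanted_level)
{
    unsigned eax, ebx, ecx, edx;
    for (unsigned sub = 0; sub < 32; ++sub) {
        if (!__get_cpuid_count(4, sub, &eax, &ebx, &ecx, &edx))
            return 0;
        cache_leaf_info info = decode_cache_leaf_eax(eax);
        if (info.type == 0)
            break;                       /* no more cache descriptors */
        if (info.level == wanted_level && info.type != 2)
            return info.max_sharing;
    }
    return 0;
}
#endif
```

Calling `max_sharing_for_level(2)` on the thread that is pinned to the core of interest should then report how many logical processor IDs can share its L2.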
I did some measurements on this based on the following assumption:
[ (0, 2), (3, 4) ][ (1, 5), (6, 7) ]
Where the brackets represent physical processors, the parentheses represent the two cache elements within each physical processor, and the numbers represent the logical processors the OS understands, such that 1 << N is the processor affinity mask for a given logical processor index N.
I ran a highly cache-dependent operation a few thousand times, concurrently on all possible pairs of logical processors, i.e. (0, 2), (0, 3), (0, 4), ..., (1, 2), (1, 3), (1, 4), ..., (2, 3), (2, 4), ..., (6, 7).
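For anyone wanting to repeat the test, generating the pairs and their affinity masks can be sketched roughly as follows. The struct, the function name and the fixed count of 8 logical processors are assumptions for illustration. Actually pinning the threads with these masks is OS-specific and is left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_LOGICAL_PROCESSORS 8  /* assumption: matches the 0..7 layout above */

/* One unordered pair of logical processors plus their affinity masks. */
typedef struct {
    int first;
    int second;
    uint64_t first_mask;   /* 1 << first  */
    uint64_t second_mask;  /* 1 << second */
} lp_pair;

/* Fill `out` with every unordered pair (a, b) with a < b.
   Returns the number of pairs written, never more than `capacity`. */
static size_t enumerate_pairs(lp_pair *out, size_t capacity)
{
    size_t count = 0;
    for (int a = 0; a < NUM_LOGICAL_PROCESSORS; ++a) {
        for (int b = a + 1; b < NUM_LOGICAL_PROCESSORS; ++b) {
            if (count == capacity)
                return count;
            out[count].first = a;
            out[count].second = b;
            out[count].first_mask = (uint64_t)1 << a;
            out[count].second_mask = (uint64_t)1 << b;
            ++count;
        }
    }
    return count;
}
```

With 8 logical processors this gives 28 pairs, each of which gets one timed run with the two worker threads pinned to `first_mask` and `second_mask`.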
I get a performance measurement of ~2 ms for every combination residing on separate caches. Being on a different physical processor does not provide any additional benefit. I get a measurement of ~4.5 ms for each of the four pairs sharing a cache element.
Mysteriously (?), I get ~3.5 ms for each pair that includes LP 0. The pair that shares a cache with core 0 actually returns an elevated figure of ~4.8 ms. I presume this is because some portion of the OS resides permanently on the first core.
So, on my current processors (E5440) the first two APICs within a physical package use the first cache element in that physical processor. The question, however, still stands as to whether this is behavior I can depend on or not.
For your information, we're working on an update of the white paper on processor topology enumeration and the associated reference code. I expect them to be ready in the June time frame.
The update is expected to include enhancement in several areas:
1. System topology enumeration using x2APIC ID where available. Enumeration using initial APIC ID will also be supported when x2APIC ID is not available.
2. Reference code for cache topology enumeration will also be included along with CPU topology.
The cache topology enumeration is based on the method published in the Intel 64 Architecture Software Optimization Manual.
Apparently there is a CacheIndex encoded into the APIC_ID. There is pseudo-code showing how to extract it in section 7.10.3 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide.
The logic I ended up using looks like this:

    int nL2CacheIDMaskWidth = find_maskwidth(nLogicalProcessorsPerL2Cache_supported);
    char nL2CacheIDMask = (char) (0xFF << nL2CacheIDMaskWidth);
    int nL2CacheIndex = ((nAPIC_ID & nL2CacheIDMask) >> nL2CacheIDMaskWidth);
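For completeness, here is a self-contained sketch of the same idea. The body of find_maskwidth below is my own guess at what the helper does (the number of bits needed to represent the count, i.e. the ceiling of log2), not a copy of Intel's reference code, and l2_cache_index is a hypothetical name. It shifts the APIC ID right instead of masking first, which gives the same result for 8-bit APIC IDs.

```c
/* Number of low-order APIC ID bits needed to distinguish `count` items,
   i.e. the smallest width with (1 << width) >= count.
   Assumed behaviour; the real reference helper may differ in detail. */
static unsigned find_maskwidth(unsigned count)
{
    unsigned width = 0;
    while (width < 31 && (1u << width) < count)
        ++width;
    return width;
}

/* Drop the bits that select a logical processor within one L2 cache;
   what remains identifies the cache itself (package bits included). */
static unsigned l2_cache_index(unsigned apic_id, unsigned lps_per_l2)
{
    return apic_id >> find_maskwidth(lps_per_l2);
}
```

With two logical processors per L2, APIC IDs 0 and 1 map to index 0, IDs 2 and 3 to index 1, and so on, which matches what I measured: the first two APIC IDs within a package share a cache.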