Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.
1696 Discussions

## Measuring the Variance of Prefetching in the L1D Cache

Beginner
390 Views
Hi All,

I have an array of 1024 bytes. I access this array 40 times in a function
at random points as shown in the snippet below.

const unsigned char array[1024] = { an array of uniformly distributed random numbers }
unsigned char pts[40];

my_func(unsigned char *pts){
pts[0] = array[pts[0]];
pts[1] = array[pts[1]];
...
...
pts[39] = array[pts[39]];
}

call_my_func(){
for (i=0; i<40; ++i)
pts = random() & 0xff;

for (i=0; i < (1 << 24); ++i){ // function called 2^24 times
t1 = rdtsc() // timestamp
my_func();
time = rdtsc() - t1 // timing for my_func
}
}

I plot a graph with the value of pts[0] (range from 0 to 255) on the x-axis
and the corresponding average time taken on the y-axis. What I get is a
triangular wave of the form /\\\\/\\\\/\\\\/\\\\/\\\\/\\\\ (as rightly pointed out by Jim Dempsey).
I get 8 such /\\ in the graph.

The guess is that this waveform is due the prefetching in the L1 cache.

My questions are as follows.
1. Is prefetching the only reason for getting this particular wave pattern ?
2. I want to analyze the variance in the timing with respect to the size
of the array. Shouldn't the variance reduce with increase in array size ?

Any help in this regard would be greatly appreciated

Regards

Chet

2 Replies
Honored Contributor III
390 Views
Chet,

Could you post a complete working sample? (you can attach a file to your post)

There are some nuances in your code snip that I would rather not interpret for your design.
Examples:

Your array has dimension of 1024 of unsigned char and is indexed by an unsigned char from pts. Meaning only the first 256 elements of this array will be used. Does this array have an alignment restriction in your code? Does pts array have an alignment?

Your function call my_func() in your timed loop has no args yet the function in your snip has one arg (unsigned char* pts). I suspect your real test code has something quite different. Perhapse:

Array is 1024 of unsigned char aligned at 1024 byte boundary

void my_func(unsigned char* pts, unsigned char* arraySlice)

Meaning pts is, or can be for testing,a 40 byte slice of a larger pts array.

and where your test program test the time for a sliding 256 byte window on the array
and also optional test a sliding window on the 40 byte pts.

Jim Dempsey

Honored Contributor III
390 Views

The title of your post "Measuring the Variance of Prefetching in the L1D Cache" would seem to require your code is more like this:

for (i=0; i < (1 << 24); ++i){ // function called 2^24 times
doSomethingHereToAssureArrayAndPtsNotInAnyCache();
t1 = rdtsc() // timestamp
my_func();
time = rdtsc() - t1 // timing for my_func
}

(with the potential alterations to the args to my_func())

Prefetching is enabled/disabled via BIOS setting and/or MSW setting (or may always be on or off per processor model).

Without the doSomethingHereToAssureArrayAndPtsNotInAnyCache();, the L1 cache would fully contain the used portion of array and pts after the 1st iteration. (code would not test prefetching)

Jim Dempsey