Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Atomic bit test with a shared memory

CyrIng
Novice
861 Views

Hello,

For multi-threaded synchronization, I'm using LOCK BTS and LOCK BTR on a shared memory location.

However, how can I test this bit when BT does not accept the LOCK prefix?

LOCK AND could be a solution, but the destination operand, the shared memory, is overwritten by the result of this logical operation.

Thanks for any help.

 

7 Replies
JWong19
Beginner
861 Views

The pseudo instruction 'LOCK BT' should have the same effect as the instruction 'BT', because memory is normally read once with the instruction 'BT'. Therefore the instruction 'BT' should serve your original need.
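
As a minimal sketch of this point (not the original poster's code), the C11 fragment below sets and clears a flag bit with atomic read-modify-write operations and tests it with a plain atomic load. The names `sync_word`, `READY_BIT`, and the three helper functions are hypothetical. On x86, compilers typically lower the fetch-or and fetch-and to LOCK-prefixed instructions, while the load becomes an ordinary read, which is already atomic for an aligned 32-bit word.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define READY_BIT 0u  /* hypothetical bit position used for synchronization */

/* Shared 32-bit word; an _Atomic uint32_t is naturally aligned. */
static _Atomic uint32_t sync_word;

/* Atomically set the bit and return its previous value (the role of LOCK BTS). */
static bool bit_test_and_set(_Atomic uint32_t *w, unsigned bit)
{
    uint32_t mask = UINT32_C(1) << bit;
    return (atomic_fetch_or(w, mask) & mask) != 0;
}

/* Atomically clear the bit and return its previous value (the role of LOCK BTR). */
static bool bit_test_and_reset(_Atomic uint32_t *w, unsigned bit)
{
    uint32_t mask = UINT32_C(1) << bit;
    return (atomic_fetch_and(w, ~mask) & mask) != 0;
}

/* Non-destructive test: one atomic load, then the bit is examined in a register. */
static bool bit_test(_Atomic uint32_t *w, unsigned bit)
{
    return ((atomic_load(w) >> bit) & 1u) != 0;
}
```

The test never writes to the shared word, so nothing is destroyed, and the value it sees is always one that some thread actually stored.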

Anyway, you may disclose more details of what you are doing to confirm the above.

CyrIng
Novice
861 Views

Thanks for your reply.

Although the LOCK prefix is documented as not allowed with BT (the assembler rejects it), BT serves my need and seems to be atomic in a low-speed context (500 ms).

However, I need to write a multi-threaded stress test to check whether BT is really atomic in different cases:

1- Two logical cores (HTT)

2- Two physical cores

3- Same core ?
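
A stress test along these lines could be sketched as follows, assuming POSIX threads and C11 atomics rather than hand-written assembly. One thread toggles the flag bit with atomic read-modify-write operations while another repeatedly loads the word and checks that every bit other than the flag still holds a known pattern. All names (`shared_word`, `FLAG`, `PATTERN`, `ROUNDS`, `run_stress`) are hypothetical. Core pinning for the three cases is left out; on Linux it could be added with `pthread_setaffinity_np`.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG    UINT32_C(0x00000001)  /* bit toggled by the writer */
#define PATTERN UINT32_C(0xA5A5A5A4)  /* constant remaining bits, flag clear */
#define ROUNDS  200000                /* arbitrary iteration count */

static _Atomic uint32_t shared_word = PATTERN;
static atomic_bool writer_done;

/* Writer: set and clear the flag with atomic RMW operations. */
static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        atomic_fetch_or(&shared_word, FLAG);
        atomic_fetch_and(&shared_word, ~FLAG);
    }
    atomic_store(&writer_done, true);
    return NULL;
}

/* Reader: plain atomic loads; count any value that is not PATTERN or PATTERN|FLAG. */
static void *reader(void *arg)
{
    long *bad = arg;
    while (!atomic_load(&writer_done)) {
        uint32_t v = atomic_load_explicit(&shared_word, memory_order_relaxed);
        if ((v & ~FLAG) != PATTERN)
            (*bad)++;
    }
    return NULL;
}

/* Returns the number of invalid observations, or -1 if a thread could not start. */
long run_stress(void)
{
    long bad = 0;
    pthread_t w, r;

    atomic_store(&shared_word, PATTERN);
    atomic_store(&writer_done, false);
    if (pthread_create(&r, NULL, reader, &bad) != 0)
        return -1;
    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        atomic_store(&writer_done, true);  /* let the reader exit */
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return bad;
}
```

A result of zero does not prove atomicity; a test like this can only reveal a failure, so the architectural guarantee in the manual remains the real argument.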

JWong19
Beginner
861 Views

According to Intel's manual, reading a byte, reading a word aligned on a 16-bit boundary, and reading a doubleword aligned on a 32-bit boundary have been guaranteed atomic since the Intel486 processor.

So make sure that your 32-bit variable is properly aligned.

The next thing you need to pay attention to is memory ordering, as your purpose is synchronization.
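
To illustrate the ordering concern, here is a minimal sketch using C11 atomics. The `struct slab` layout, the `READY` bit, and the function names are assumptions for illustration, not the actual driver code. The producer writes the payload before setting the bit with release ordering, and the consumer tests the bit with acquire ordering before reading the payload, so a consumer that sees the bit set also sees the payload written before it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define READY UINT32_C(1)  /* hypothetical synchronization bit */

struct slab {              /* hypothetical layout */
    _Atomic uint32_t sync;
    uint64_t payload;
};

/* Producer: publish the payload, then set READY with release ordering. */
static void produce(struct slab *s, uint64_t value)
{
    s->payload = value;
    atomic_fetch_or_explicit(&s->sync, READY, memory_order_release);
}

/* Consumer: test READY with an acquire load; read the payload only if set. */
static bool try_consume(struct slab *s, uint64_t *out)
{
    if ((atomic_load_explicit(&s->sync, memory_order_acquire) & READY) == 0)
        return false;
    *out = s->payload;
    /* Hand the slab back to the producer by clearing the bit. */
    atomic_fetch_and_explicit(&s->sync, ~READY, memory_order_release);
    return true;
}
```

The sketch assumes the producer does not overwrite the payload until it has observed the bit cleared again.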

--

The 'BT' instruction is probably not as widely used as the 'AND' instruction, so 'BT' may perform worse (longer latency; longer instruction decode time) than 'AND' on some microarchitectures.

CyrIng
Novice
861 Views
Thank you. In my Linux driver, the L1 cache alignment relies on slab memory allocated by the kernel. Both threads, the producer (driver) and the consumer (user space), are pinned to the same core ID; thus I have removed the LOCK prefix from BTS and BTR, and synchronize them with BT. It seems to work. I would also like to test the bit with AND, but this instruction destroys the destination operand: the slab memory cell.
JWong19
Beginner
861 Views
When your slab memory is the source operand instead of the destination operand, you'll get what you want:

Opcode   Instruction
21 /r    AND m32, r32 ; memory address is the destination operand
23 /r    AND r32, m32 ; memory address is the source operand

--

BTW, beware that interrupts can appear in the middle of an instruction (especially in the case of a REP-prefixed instruction).

CyrIng
Novice
861 Views

Unfortunately the assembler refuses to assemble AND with the LOCK prefix when the destination is not a memory address.

I'm facing this case because a third thread aggregates the values of all slab memories: this thread, also a consumer and not CPU-pinned, needs to test the availability of each slab using the synchronization bit. So I believe a LOCK is required to guarantee the atomicity of the test.

Do you mean that an interrupt can "preempt" execution between the REP prefix and the rest of the opcode?

Such as LOCK <i> AND dest, src

where <i> is the position at which the interrupt happens.

JWong19
Beginner
861 Views


To test the availability of each slab, is an atomic read necessary? Does your third thread execute on the same logical processor as the first two threads?
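
For reference, an aggregator that only tests the bit could be sketched as below, assuming C11 atomics and a hypothetical `struct slab` with a `sync` word and a `value` field (neither taken from the actual driver). Each test is one aligned 32-bit load, which is atomic on its own, so the scan writes nothing to the slabs and needs no LOCK prefix.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define READY UINT32_C(1)  /* hypothetical synchronization bit */

struct slab {              /* hypothetical layout */
    _Atomic uint32_t sync;
    uint64_t value;
};

/*
 * Sum the values of all slabs whose READY bit is set.
 * Returns the number of slabs that were included in the sum.
 */
static size_t aggregate(struct slab *slabs, size_t n, uint64_t *sum)
{
    size_t ready = 0;

    *sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Read-only test: the slab's sync word is left untouched. */
        uint32_t w = atomic_load_explicit(&slabs[i].sync, memory_order_acquire);
        if (w & READY) {
            *sum += slabs[i].value;
            ready++;
        }
    }
    return ready;
}
```

Whether this is sufficient depends on the protocol: if the producer may rewrite a slab while the aggregator is reading its value, the bit test alone does not protect the read that follows it.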

A LOCK-prefixed instruction costs ~70 cycles on average; is that still too expensive in your case?

--

According to Intel's manual, interrupts are taken at instruction boundaries. For a REP-prefixed instruction, the interrupt is taken at the current iteration (e.g. at the 50th iteration when 100 iterations are specified).
