1. Yes, it applies to both MLC and LLC latency as well. An x87 load that hits the MLC or LLC will have a latency 2 cycles longer than an integer load that hits the MLC or LLC.
2. Yes, a load that uses base+offset addressing with offset<2048 will have an MLC or LLC latency 1 cycle shorter than a load that doesn't meet that condition.
3. Checking... it does seem like a typo.
4. Yes, this is a design limitation related to moving twice as much data between the memory and AVX stacks.
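
To make points 1 and 2 concrete, here is a rough sketch of the arithmetic. The function name, its parameters, and the baseline value of 14 cycles are all hypothetical placeholders, not measured numbers. It also assumes the two adjustments simply add together when both apply, which is not confirmed above.

```python
def adjusted_load_latency(baseline_cycles, is_x87=False, small_offset=False):
    """Estimate MLC/LLC load-to-use latency in cycles.

    baseline_cycles: latency of an integer load that does NOT use
        base+offset addressing with offset < 2048 (hypothetical input;
        substitute the figure for your own part).
    is_x87: the load is an x87 load (2 cycles longer, per point 1).
    small_offset: the load uses base+offset with offset < 2048
        (1 cycle shorter, per point 2).
    """
    latency = baseline_cycles
    if is_x87:
        latency += 2  # point 1: x87 loads pay 2 extra cycles
    if small_offset:
        latency -= 1  # point 2: base+offset with offset < 2048 saves 1 cycle
    return latency


# Placeholder baseline, NOT a measured value.
BASELINE = 14
print(adjusted_load_latency(BASELINE))                     # integer load, other addressing
print(adjusted_load_latency(BASELINE, small_offset=True))  # integer load, offset < 2048
print(adjusted_load_latency(BASELINE, is_x87=True))        # x87 load, other addressing
```

The baseline is passed in rather than hard-coded because the actual MLC and LLC latencies depend on the part and are not given here.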