The hardware prefetchers stop at the end of the page in order to avoid page-miss side effects. If you want to prime the Xeon's TLB, so that resolution of a possible DTLB miss gets started early, you need a software prefetch, either written explicitly in the source code or generated by a compiler option. I don't read this as part of the original question, though.
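To make the explicit software prefetch concrete, here is a minimal sketch in C. It assumes a GCC- or Clang-compatible compiler (for `__builtin_prefetch`), 4 KiB pages, and an array of doubles; the function name `sum_with_page_prefetch` and the one-page-ahead distance are my own choices for illustration, not something from the original question. A prefetch is only a hint, so whether it actually starts the page walk early depends on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages, array of doubles. */
#define PAGE_BYTES 4096u
#define DOUBLES_PER_PAGE (PAGE_BYTES / sizeof(double))

/* Sum an array while issuing a software prefetch hint one page ahead.
 * The hardware prefetchers stop at the page boundary, so the hint is
 * meant to touch the next page before the loop reaches it. */
double sum_with_page_prefetch(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        /* Once per page-sized stride, hint at the element one page ahead.
         * The array need not be page-aligned for this to be useful. */
        if (i % DOUBLES_PER_PAGE == 0 && i + DOUBLES_PER_PAGE < n) {
            __builtin_prefetch(&a[i + DOUBLES_PER_PAGE],
                               0 /* read access */,
                               3 /* high temporal locality */);
        }
        sum += a[i];
    }
    return sum;
}
```

The result is the same with or without the prefetch; only timing can differ, so any benefit has to be measured on the target machine.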
The question does raise interesting consequences. If an array has been modified by a different core, the hardware prefetcher would speed up getting it back to the core that needs the update. Only one forward-going array in the page could be accelerated this way by hardware prefetch, and one additional backward-going array could be accelerated only into L2.