The HPL benchmark on a host plus 2 MIC cards reaches only 719 GFLOPS. The host system has 128 GB of memory. The theoretical peak is 1.2 TF + 1.2 TF + 460 GFLOPS ≈ 2.86 TF, so efficiency is only about 25%. How can I optimize the HPL performance? I used offload execution with the executable xhpl_offload_intel64 (manually compiled from the source of Intel linpack 11.1.1).
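For reference, the efficiency figure can be recomputed from the numbers quoted in the post. This is only a sketch of the arithmetic: the variable and function names are made up for illustration, and the peak values are the ones stated above, not independently measured.

```python
# Figures quoted in the post, all in GFLOPS (not independently verified).
host_peak = 460.0      # host CPUs
mic_peak = 1200.0      # one MIC card
num_mic_cards = 2
measured = 719.0       # reported HPL result


def hpl_efficiency(measured_gflops, peak_gflops):
    """Return measured performance as a fraction of theoretical peak."""
    return measured_gflops / peak_gflops


theoretical_peak = host_peak + num_mic_cards * mic_peak
efficiency = hpl_efficiency(measured, theoretical_peak)

print(f"peak = {theoretical_peak:.0f} GFLOPS, efficiency = {efficiency:.1%}")
# prints: peak = 2860 GFLOPS, efficiency = 25.1%
```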
1 Reply
Try a different floating-point model and enable huge pages, which should also improve your performance; that can be a factor of 2 or 3 if you use both. It also depends on the problem you want to compute: do you have many memory accesses?
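Before changing anything, it may help to check whether huge pages are configured on the host at all. The sketch below assumes a Linux host and only reads the standard huge-page fields from `/proc/meminfo`; the function names are hypothetical, and it does not inspect the MIC cards or change any setting.

```python
def parse_hugepage_info(meminfo_text):
    """Extract huge-page related fields from /proc/meminfo-style text.

    HugePages_* values are page counts; Hugepagesize and AnonHugePages
    are reported by the kernel in kB.
    """
    fields = ("HugePages_Total", "HugePages_Free", "Hugepagesize", "AnonHugePages")
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in fields:
            # First token after the colon is the number; a unit may follow.
            info[key] = int(rest.split()[0])
    return info


def read_hugepage_info(path="/proc/meminfo"):
    """Read the fields from the running system (assumes Linux)."""
    try:
        with open(path) as f:
            return parse_hugepage_info(f.read())
    except OSError:
        return {}  # not Linux, or file not readable


if __name__ == "__main__":
    print(read_hugepage_info())
```

If `HugePages_Total` is 0 and `AnonHugePages` stays near 0 kB while the benchmark runs, the run is probably not using huge pages on the host side.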
Best regards
