Hello,
I am using the intrinsic for square root. I know from the Optimization Manual that I could use the reciprocal square root with an approximation algorithm, but I need the accuracy.
The thing is that AVX shows no improvement over SSE. The Intrinsics Guide gave me some hints. Is it true that the square root operation is not pipelined for either SSE or AVX? At least the latency and throughput figures indicate this. AVX processes twice as much data per operation, but with double the latency and half the throughput, does it all add up to the same performance?
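The arithmetic behind that question can be written down as a minimal sketch. The cycle counts below are illustrative placeholders chosen only to match the pattern described (the 256-bit form costing twice as many cycles per instruction as the 128-bit form). They are not values quoted from the Intrinsics Guide, and `cycles_per_element` is a hypothetical helper.

```cpp
// Cost per element for an instruction that is not pipelined: a new one can
// start only every `cycles_per_instruction` cycles, and each one produces
// `lanes` results.
double cycles_per_element(double cycles_per_instruction, int lanes)
{
    return cycles_per_instruction / lanes;
}

// Illustrative placeholder numbers (assumption, neither measured nor quoted):
const double kSseCycles = 14.0; // 128-bit square root, 4 floats per instruction
const double kAvxCycles = 28.0; // 256-bit square root, 8 floats per instruction
```

With these placeholders, `cycles_per_element(kSseCycles, 4)` and `cycles_per_element(kAvxCycles, 8)` both come out at 3.5 cycles per float, so under this model the wider register alone would give no speedup.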
My test system is an i5-2410M. In the Intrinsics Guide (I updated to the newest version) I only find latency and throughput for Sandy Bridge. Has the performance of these instructions improved in Ivy Bridge? Could anyone explain the CPUID(s) a little bit? 06_2A means Sandy Bridge, doesn't it? Does this cover all Sandy Bridge CPUs (regardless of desktop or mobile, i3, i5, or i7)?
For CPUID(s) I found: http://software.intel.com/en-us/articles/intel-architecture-and-processor-identification-with-cpuid-model-and-family-numbers
Does the intrinsics guide refer to a combination of family and model number? What about model numbers not mentioned in the intrinsics guide like Ivy Bridge?
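For reference, the `06_2A` notation is a display family and display model pair, both derived from the value that CPUID leaf 1 returns in EAX. The sketch below only decodes such a value and does not execute CPUID itself. The struct and function names are hypothetical, and the signatures mentioned afterwards are example inputs rather than an exhaustive list.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

struct CpuSignature {
    unsigned family;   // display family
    unsigned model;    // display model
    unsigned stepping;
};

// Decode the processor signature (CPUID leaf 1, EAX).
CpuSignature decode_signature(std::uint32_t eax)
{
    const unsigned stepping   = eax & 0xF;
    const unsigned model      = (eax >> 4) & 0xF;
    const unsigned family     = (eax >> 8) & 0xF;
    const unsigned ext_model  = (eax >> 16) & 0xF;
    const unsigned ext_family = (eax >> 20) & 0xFF;

    CpuSignature s;
    s.stepping = stepping;
    // The extended family is added only when the base family is 0xF.
    s.family = (family == 0xF) ? family + ext_family : family;
    // The extended model supplies the high nibble for families 0x6 and 0xF.
    s.model = (family == 0x6 || family == 0xF) ? ((ext_model << 4) | model) : model;
    return s;
}

// Format as "FF_MM", the notation used in the question above.
std::string family_model_string(const CpuSignature& s)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02X_%02X", s.family, s.model);
    return buf;
}
```

A signature of `0x000206A7` decodes to `06_2A`, and `0x000306A9` decodes to `06_3A`. Desktop or mobile and i3, i5, or i7 are not encoded in these fields, so parts built on the same core design can report the same family and model pair.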
Ah, this sounds quite good.
Let me know when you have the updated project. I will run it on two Sandy Bridge systems (a mobile i5 and a high-end desktop i7).
I think the code is in good shape now, so let's start the tests and see what we get.
Thanks, Sergey.
I think the combination of our test scenarios gives quite a good overview.
You mean I run the test on another machine and then we post it together?
Yes, that's fine. I think I can run the tests this evening and post the results tomorrow.
Please also email me the exe so that we test the same thing. Are you working with VS2008 and the Intel Compiler?
Here are the test results, based on the project provided above. All additional information can be found in the output itself.
///////////////////////////////////////////////////////////////////////////////
CONSOLE APPLICATION : SqrtTestApp Project Overview
///////////////////////////////////////////////////////////////////////////////
Release Notes:
6. Tests on Sandy Bridge system:
>> 32-bit <<
32-bit Windows platform
Notes:
- Processing is Normalized - Tests calculate 8 sqrt values per iteration
- Number of iterations is 33554432
Tests started
CRT Sqrt - float
Calculating the Square Roots - Done in 47 ticks
625.000^0.5 = 25.000
SSE Sqrt - float
Calculating the Square Roots - Done in 172 ticks
625.000^0.5 = 25.000
AVX Sqrt - float
Calculating the Square Roots - Done in 31 ticks
625.000^0.5 = 25.000
STL vector size: 67108864 ( float elements )
Number of tests: 4
STL vector: STL sqrt - float
Calculating the Square Roots
Test 1: 327 ticks
Test 2: 343 ticks
Test 3: 344 ticks
Test 4: 327 ticks
Average: 335 ticks
STL vector: SSE sqrt - float
Calculating the Square Roots
Test 1: 93 ticks
Test 2: 94 ticks
Test 3: 78 ticks
Test 4: 94 ticks
Average: 89 ticks
STL vector: AVX sqrt - float
Calculating the Square Roots
Test 1: 94 ticks
Test 2: 78 ticks
Test 3: 93 ticks
Test 4: 94 ticks
Average: 89 ticks
Tests completed
Press ESC to Exit...
>> 64-bit <<
64-bit Windows platform
Notes:
- Processing is Normalized - Tests calculate 8 sqrt values per iteration
- Number of iterations is 33554432
Tests started
CRT Sqrt - float
Calculating the Square Roots - Done in 47 ticks
625.000^0.5 = 25.000
SSE Sqrt - float
Calculating the Square Roots - Done in 187 ticks
625.000^0.5 = 25.000
AVX Sqrt - float
Calculating the Square Roots - Done in 16 ticks
625.000^0.5 = 25.000
STL vector size: 67108864 ( float elements )
Number of tests: 4
STL vector: STL sqrt - float
Calculating the Square Roots
Test 1: 328 ticks
Test 2: 343 ticks
Test 3: 343 ticks
Test 4: 343 ticks
Average: 339 ticks
STL vector: SSE sqrt - float
Calculating the Square Roots
Test 1: 78 ticks
Test 2: 78 ticks
Test 3: 93 ticks
Test 4: 78 ticks
Average: 81 ticks
STL vector: AVX sqrt - float
Calculating the Square Roots
Test 1: 78 ticks
Test 2: 94 ticks
Test 3: 93 ticks
Test 4: 94 ticks
Average: 89 ticks
Tests completed
Press ESC to Exit...
5. Sandy Bridge system:
OS Name Microsoft Windows 7 Home Premium
Version 6.1.7601 Service Pack 1 Build 7601
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name DANIELA-LAPTOP
System Manufacturer SAMSUNG ELECTRONICS CO., LTD.
System Model RV420/RV520/RV720/E3530/S3530/E3420/E3520
System Type x64-based PC
Processor Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz, 2301 MHz, 2 Core(s), 4 Logical Processor(s)
BIOS Version/Date Phoenix Technologies Ltd. 03PQ, 08.07.2011
SMBIOS Version 2.6
Windows Directory C:\Windows
System Directory C:\Windows\system32
Boot Device \Device\HarddiskVolume1
Locale Austria
Hardware Abstraction Layer Version = "6.1.7601.17514"
User Name Daniela-Laptop\Daniela
Time Zone Central European Time
Installed Physical Memory (RAM) 6.00 GB
Total Physical Memory 5.98 GB
Available Physical Memory 4.28 GB
Total Virtual Memory 12.0 GB
Available Virtual Memory 10.3 GB
Page File Space 5.98 GB
Page File C:\pagefile.sys
4. Tests on Ivy Bridge system:
>> 32-bit <<
..\SqrtTestApp\Release>SqrtTestApp32.exe
32-bit Windows platform
Notes:
- Processing is Normalized - Tests calculate 8 sqrt values per iteration
- Number of iterations is 33554432
Tests started
CRT Sqrt - float
Calculating the Square Roots - Done in 62 ticks
625.000^0.5 = 25.000
SSE Sqrt - float
Calculating the Square Roots - Done in 109 ticks
625.000^0.5 = 25.000
AVX Sqrt - float
Calculating the Square Roots - Done in 31 ticks
625.000^0.5 = 25.000
STL vector size: 67108864 ( float elements )
Number of tests: 4
STL vector: STL sqrt - float
Calculating the Square Roots
Test 1: 343 ticks
Test 2: 359 ticks
Test 3: 343 ticks
Test 4: 359 ticks
Average: 351 ticks
STL vector: SSE sqrt - float
Calculating the Square Roots
Test 1: 62 ticks
Test 2: 47 ticks
Test 3: 47 ticks
Test 4: 62 ticks
Average: 54 ticks
STL vector: AVX sqrt - float
Calculating the Square Roots
Test 1: 47 ticks
Test 2: 62 ticks
Test 3: 47 ticks
Test 4: 47 ticks
Average: 50 ticks
Tests completed
>> 64-bit <<
..\SqrtTestApp\x64\Release>SqrtTestApp64.exe
64-bit Windows platform
Notes:
- Processing is Normalized - Tests calculate 8 sqrt values per iteration
- Number of iterations is 33554432
Tests started
CRT Sqrt - float
Calculating the Square Roots - Done in 47 ticks
625.000^0.5 = 25.000
SSE Sqrt - float
Calculating the Square Roots - Done in 109 ticks
625.000^0.5 = 25.000
AVX Sqrt - float
Calculating the Square Roots - Done in 31 ticks
625.000^0.5 = 25.000
STL vector size: 67108864 ( float elements )
Number of tests: 4
STL vector: STL sqrt - float
Calculating the Square Roots
Test 1: 359 ticks
Test 2: 343 ticks
Test 3: 359 ticks
Test 4: 343 ticks
Average: 351 ticks
STL vector: SSE sqrt - float
Calculating the Square Roots
Test 1: 47 ticks
Test 2: 62 ticks
Test 3: 47 ticks
Test 4: 47 ticks
Average: 50 ticks
STL vector: AVX sqrt - float
Calculating the Square Roots
Test 1: 47 ticks
Test 2: 47 ticks
Test 3: 47 ticks
Test 4: 47 ticks
Average: 47 ticks
Tests completed
3. Ivy Bridge system:
OS Name Microsoft Windows 7 Professional
Version 6.1.7601 Service Pack 1 Build 7601
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name DELLPM
System Manufacturer Dell Inc.
System Model Precision M4700
System Type x64-based PC
Processor Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz, 2801 Mhz, 4 Core(s), 8 Logical Processor(s)
BIOS Version/Date Dell Inc. A05, 08/10/2012
SMBIOS Version 2.7
Windows Directory C:\Windows
System Directory C:\Windows\System32
Boot Device \Device\HarddiskVolume2
Locale Canada
Hardware Abstraction Layer Version = "6.1.7601.17514"
User Name DellPM\Admin
Time Zone Mountain Standard Time
Installed Physical Memory (RAM) 16.0 GB
Total Physical Memory 15.9 GB
Available Physical Memory 14.3 GB
Total Virtual Memory 47.9 GB
Available Virtual Memory 46.3 GB
Page File Space 32.0 GB
Page File C:\pagefile.sys
2. When int iNumberOfIterations = 268435456 ( 2^28 ), a Microsoft C++
exception std::length_error ( Vector is too long ) is thrown and the application crashes
Fixed. A different way of processing is used now.
1. Renamed aligned_alloc.h to AlignedAlloc.h
///////////////////////////////////////////////////////////////////////////////
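The release notes do not say what the replacement processing in item 2 looks like. One common way to avoid requesting a single oversized `std::vector` is to reuse a fixed-size buffer and walk the total element count in chunks. The sketch below assumes that approach; `sqrt_in_chunks` and its parameters are hypothetical and are not taken from the SqrtTestApp project.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Computes the square root of `value` for `total` elements while allocating
// only `chunk` elements at a time, so the allocation does not grow with
// `total`. Returns the sum of the results so the work has an observable
// effect and is not discarded by the optimizer.
double sqrt_in_chunks(unsigned long long total, std::size_t chunk, float value)
{
    if (chunk == 0) {
        return 0.0; // nothing can be processed with an empty buffer
    }
    std::vector<float> buffer(chunk);
    double sum = 0.0;
    while (total > 0) {
        // Number of elements handled in this pass (the last pass may be short).
        const std::size_t n = static_cast<std::size_t>(
            std::min<unsigned long long>(total, chunk));
        std::fill(buffer.begin(), buffer.begin() + n, value);
        for (std::size_t i = 0; i < n; ++i) {
            buffer[i] = std::sqrt(buffer[i]);
            sum += buffer[i];
        }
        total -= n;
    }
    return sum;
}
```

The iteration count is then limited only by the width of the counter, not by `std::vector<float>::max_size()` or by the memory available to a 32-bit process.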
Sergey, thanks for describing all the important details!
