marcel-
Beginner
369 Views

Atom optimization: -xSSE3_ATOM vs. -xSSSE3 and -minstruction=movbe

Sorry if this is in the wrong forum or has already been asked (many times) in other places, but I am completely new to these forums and a search did not answer my questions.

In the article about optimization for Intel Atom processors, Robert Müller-Albrecht recommends ICC flags to use when compiling for Intel Atom. He writes:

The compiler optimizations specifically targeting the Intel Atom Processor can be grouped into those related to the in-order instruction scheduler and thus minimizing dependency stalls caused by instruction latencies, those taking advantage of new or preferable instructions added to the instruction set and lastly those who take advantage of some of the advanced features like SSE3 instructions and bi-endianness support the Intel Atom Processor shares with some other Intel processors. Taking advantage of these features is triggered by using the

-xL (Linux*) or /QxL (Windows*)

and with the Intel C++ Compiler 11.x also the

-xSSE3_ATOM (Linux*) or /QxSSE3_ATOM (Windows*)

optimization switch.

According to my /proc/cpuinfo, my (32-bit) Intel Atom N270 CPU also supports SSSE3 instructions:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 xtpr pdcm movbe lahf_lm
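The same check can be scripted rather than read by eye. The following is a minimal sketch, assuming a Linux-style `flags` line like the one quoted above; the function name `cpu_flags` is hypothetical. Note that Linux reports SSE3 as `pni`.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flag names from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


# Sample line copied from the N270 output quoted in this post.
SAMPLE = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon "
    "pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 xtpr pdcm movbe lahf_lm"
)

flags = cpu_flags(SAMPLE)
for name in ("pni", "ssse3", "movbe"):  # pni is the Linux name for SSE3
    print(name, "yes" if name in flags else "no")
```

On a real machine, the text could be read with `open("/proc/cpuinfo").read()` instead of using the sample string.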

Therefore I could also use flag

-xT (deprecated) = -xSSSE3

1) Müller-Albrecht recommends using both -xL and -xSSE3_ATOM, but it turns out that the last -x flag overrides any previous ones. So, should I use -xSSSE3 or is it better (i.e., does ICC generate better optimized code) to use -xSSE3_ATOM?

2) Should I also use

-minstruction=movbe

when using -xSSE3_ATOM? The manpage is not totally clear on this:

To use this option, you must also specify -xSSE3_ATOM (Linux and Mac OS X)
[...]
The options are ON by default when -xSSE3_ATOM or /QxSSE3_ATOM is specified.

The second implies that it's automatically selected when using -xSSE3_ATOM, but the first implies that it is useless to turn MOVBE instructions on with this flag when using something different than -xSSE3_ATOM (so, I can't combine this flag with -xSSSE3, if I am right).
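The "last -x flag wins" observation from question 1 can be pictured with a toy model. This is only a sketch of the behavior described in the post, not ICC's actual option parser; the function name `effective_x_flag` is hypothetical.

```python
def effective_x_flag(args):
    """Return the last -x<target> option in args, or None if there is none.

    Toy model of the behavior described in the post: a later -x option
    replaces any earlier one instead of being combined with it.
    """
    chosen = None
    for arg in args:
        if arg.startswith("-x") and len(arg) > 2:
            chosen = arg  # a later -x option overrides the earlier choice
    return chosen


print(effective_x_flag(["-O2", "-xL", "-xSSE3_ATOM"]))
```

Under this model, passing both -xL and -xSSE3_ATOM is equivalent to passing only whichever comes last.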
Hubert_H_Intel
Employee
369 Views
Marcel,
-xSSSE3 -minstruction=movbe will create code for Atom including movbe, and it might take advantage of the extended SSSE3 instruction set (depending on the application), but it will not use the Atom-specific heuristics for code optimization, including in-order scheduling. So -xSSE3_ATOM is the compiler switch of choice for Atom.
Hubert.


9 Replies
aazue
New Contributor I
369 Views
Hi Marcel,
I think that having a duplicated flag
(an option that is unnecessary because another or earlier option already implies it)
is not a problem for the compiler; it only takes each one once.
Better to have too many flags than too few and get a faulty result.
Also test with your own source to evaluate the real effect of the flags;
that is far better than anything you can read.
I have run several tests with the N270, N280, 230, 330 and 450.
A large part of the improvement depends on how the source is written.
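Testing on your own code, as suggested here, can be as simple as timing the same workload built with each flag set. The following is a minimal sketch, assuming each build is an executable you can run from the command line; `median_runtime` and the placeholder command are hypothetical and not from this thread.

```python
import statistics
import subprocess
import sys
import time


def median_runtime(cmd, runs=5):
    """Run cmd several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholder workload; replace with the binary built using each flag set.
    cmd = [sys.executable, "-c", "sum(range(100000))"]
    print(f"median: {median_runtime(cmd):.4f} s")
```

The median is used rather than the mean so that a single slow run (for example, a cold cache on the first run) does not dominate the comparison.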

Personally, I am convinced that this type of processor is currently the best in the world
when you consider size, power consumption, performance and the low price.

Best regards
Nb
Please, do you have any information or a link about the flags required for Atom when using the GNU compiler?

Hubert_H_Intel
Employee
369 Views
Marcel,


All you need to specify for Intel Atom processor specific optimizations is to add the switch

-xSSE3_ATOM (Linux*) or /QxSSE3_ATOM (Windows*)

which also implicitly enables generation of movbe instructions.

-xL and /QxL were the previous Atom processor switches for the 10.x Intel Compilers; they are deprecated now and will not be available in future compiler versions. So please use the new switch only.

Regards,

Hubert.

marcel-
Beginner
369 Views
Thanks Hubert, but is it better to use

-xSSE3_ATOM

than

-xSSSE3

(as my Intel Atom also seems to support SSSE3 instructions)?
marcel-
Beginner
369 Views
Hi Bustaf,

Although I couldn't make much sense of your post (but I suppose you're, like me, not a native English speaker and I don't mean to offend you), I know that writing good source in the first place will give me a much better improvement than just tweaking compiler optimization flags. Having said that, my question regards already existing pieces of software (in this case, Firefox).

For GCC, you should use -march=atom -mssse3 -mfpmath=sse and compile with GCC >= 4.5.
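The flag sets mentioned so far in this thread can be collected in one place when scripting builds. This is a minimal sketch: the flags are the ones quoted in the posts above, while the helper name `compile_command`, the `-O2` level, and the file names are assumptions added for illustration.

```python
# Atom-specific flags as given in this thread (ICC by Hubert, GCC >= 4.5 by Marcel).
ATOM_FLAGS = {
    "icc": ["-xSSE3_ATOM"],
    "gcc": ["-march=atom", "-mssse3", "-mfpmath=sse"],
}


def compile_command(compiler, source, output):
    """Build an argument list for compiling one source file with Atom flags."""
    flags = ATOM_FLAGS[compiler]  # KeyError for compilers not covered here
    # -O2 is an assumed general optimization level, not taken from the thread.
    return [compiler, "-O2", *flags, "-o", output, source]


for name in ("icc", "gcc"):
    print(" ".join(compile_command(name, "main.c", "main")))
```

The returned list can be passed directly to `subprocess.run` without going through a shell.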
TimP
Black Belt
369 Views
-xSSSE3 optimizes for Woodcrest/Core 2 CPUs. If you're looking for portability, it won't work on older CPUs, or CPUs of other brands, and won't give best results on newer ones. The only way I could see it being "better" is where you want best performance on early Core 2, but don't care about performance on Atom or Core i3,5,7, and want to exclude more CPU types. As Atom is more dependent on efficient instruction scheduling by the compiler, you might expect the Atom specific option to show more gain on Atom than loss on other CPU types.
aazue
New Contributor I
369 Views
Hi
Although I couldn't make much sense of your post (but I suppose you're, like me, not a native English speaker and I don't mean to offend you)
So that you can understand better, or learn:
a year ago I already compiled Firefox, SeaMonkey, Epiphany and other Gecko-type browsers for all
processor types, Intel Atom included. I have not shared a thread about how to build Firefox, but with
every flag or option I could use, the ICC result (for Gecko-type browsers) was slower than the GNU compiler.
I analyzed the timings directly against a reference received inside Apache 2 (a trace in the Apache 2 source pool), using internal and external requests to exclude potential MTU/MRU problems.
As you may learn with Atom-type processors, the only problem is that if you
use too many forks or threads, the result can be catastrophic for all OS tasks
(it is even difficult to move the mouse while the process is busy); this problem does not exist, or is smaller, with other Intel processor types.
Thanks for your link about GNU.
Best regards.

marcel-
Beginner
369 Views
Hi bustaf,

the ICC result (for Gecko-type browsers) was slower than the GNU compiler

I already feared that. I managed to compile it without any optimizations (no Atom flags, no IPO) and SunSpider resulted in a significantly slower total time (I know it only tests the speed of the JavaScript engine) than when building FF using GCC 4.5 with Atom optimizations and PGO (in case someone is interested, you can read about my experiences about building FF using ICC in the Arch Linux Forums). I'm doing a last try building FF with ICC using Atom optimization and IPO (following these instructions, it's really something you don't want to do on an Atom anyway), but you may be right, I don't think it's worth it.

Thanks for your link about GNU.

You're welcome!

Regards, Marcel
aazue
New Contributor I
369 Views
Hi,
I never use SunSpider as a reference for evaluating a Gecko-type browser.
If you really want to evaluate performance, write a new module for Apache 2 that uses a specially reserved sub-stream socket;
with that you can directly trace the exact communication time between server and browser.
Only in that way can you evaluate the value of the flags used to improve the result.
The results can vary a lot, depending on whether you serve a standard page or a heavy task such as a database
in the backend that must also answer dynamically.
The JavaScript engine is only one part, and it may not be significant in this situation.
Unfortunately, you also cannot modify
the Firefox source to specialize it for your own programming.
I have started looking into changing to a new deflate module for Apache and using it in an
asynchronous programming style (on the Apache side only).
I don't know whether it is a judicious way to improve the exchange, but hope is vital.
I'll run new combined tests of Firefox, Apache, database and CGI with ICC, but I am not expecting a miracle.
Best regards.