Developing Games on Intel Graphics

Vertex shader / interpolator slow with OpenGL (SWVP or HWVP enabled?)

survivorx
The Developer Guide says one should use shaders instead of the fixed-function pipeline. So far, I'm still trying to get close to the performance of the fixed-function pipeline in OpenGL.

My GPU is a "Mobile Intel 4 Series Express Chipset Family" (DELL Latitude E6500) with the latest drivers from Intel. I have done some experiments with my lighting shader to improve performance on mobile Intel GPUs. Normally, per-vertex lighting should be much faster than per-fragment lighting, because the fragment color is just interpolated between the vertex colors. You therefore do as much work as possible in the vertex shader, because it is executed much less often than the fragment shader.
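For reference, this is roughly what that conventional split looks like in legacy GLSL; a minimal sketch assuming a single directional light, with uLightDir and uLightColor as illustrative uniform names (not from my actual shader):

```glsl
// Conventional per-vertex lighting (legacy GLSL, compatibility built-ins).

// --- vertex shader ---
uniform vec3 uLightDir;   // normalized light direction in eye space
uniform vec3 uLightColor;

void main()
{
    vec3 n = normalize(gl_NormalMatrix * gl_Normal);
    float diff = max(dot(n, uLightDir), 0.0);
    gl_FrontColor = vec4(diff * uLightColor, 1.0); // lighting computed per vertex
    gl_Position = ftransform();
}

// --- fragment shader ---
void main()
{
    gl_FragColor = gl_Color; // just the interpolated vertex color
}
```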

On my Intel GPU, it's the other way around! The vertex shader seems to be much slower than the fragment shader. In particular, passing varyings from the vertex shader to the fragment shader has a huge performance impact. My lighting shader performs best if I do only the absolutely necessary work in the vertex shader, pass as little information as possible to the fragment shader, and do everything else there.

As mentioned before, even per-vertex lighting, where just the vertex color is passed to the fragment shader, which then only does "gl_FragColor = gl_Color;", is much slower than passing position and normal to the fragment shader and doing everything there.
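Sketched the same way, the inverted split that runs faster here; I pass eye-space position and normal and add a simple Blinn specular term so the interpolated position is actually used (again, uniform names are just for illustration):

```glsl
// Inverted split: minimal vertex shader, all lighting in the fragment shader.

// --- vertex shader ---
varying vec3 vNormal;   // eye-space normal
varying vec3 vPos;      // eye-space position

void main()
{
    vNormal = gl_NormalMatrix * gl_Normal;
    vPos = vec3(gl_ModelViewMatrix * gl_Vertex);
    gl_Position = ftransform();
}

// --- fragment shader ---
uniform vec3 uLightDir;   // normalized, eye space
uniform vec3 uLightColor;

varying vec3 vNormal;
varying vec3 vPos;

void main()
{
    vec3 n = normalize(vNormal);
    vec3 v = normalize(-vPos);          // view direction in eye space
    vec3 h = normalize(uLightDir + v);  // Blinn half-vector
    float diff = max(dot(n, uLightDir), 0.0);
    float spec = pow(max(dot(n, h), 0.0), 32.0);
    gl_FragColor = vec4((diff + spec) * uLightColor, 1.0);
}
```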

I suppose the vertex shader is either a software implementation or runs in software mode by default. Your Developer Guide says in section 3.1:

Support for both Hardware Vertex Processing (HWVP) and Software Geometry Processing (SWGP) is included. SWGP is a superset of software-based processing that includes software vertex processing (SWVP). HWVP peak vertex throughput has been significantly improved in the Intel GMA 4 Series (twice as fast as the previous generation), and by default, HWVP is enabled.

But I think that applies only to DirectX. Is there a way to improve vertex shader performance with OpenGL? Or am I doing something wrong?

Edit: My test scene is a sphere. There seems to be a hard limit beyond which vertex lighting becomes slow: if I set the sphere's subdivision/segment count above 32, the vertex-lighting frame rate is halved (I assume this is the point where the vertex data exceeds 32 KB). The resolution is 640x480 without multisampling. The scene contains three directional lights, which the shader renders in a single pass.

I've tested three different systems:
- DELL Latitude E6500 Notebook (Core2Duo 2.8 GHz, GMA X4500MHD)
- 7-year-old Nexoc Osiris E604 Notebook (Pentium M 2.0 GHz, ATI Mobility Radeon 9700)
- Gaming PC (AMD Phenom II X4 955 3.4 GHz, nVidia GTX 275)

Legend:
  • The three columns under each GPU are: fixed-function pipeline / per-vertex lighting / per-fragment lighting
  • All values are frames per second; values in brackets are from a multi-pass shader, the rest from a single pass
  • Green indicates a shader that does as little as possible in the vertex shader; red indicates normal use of the vertex shader

Here are the results:

| Sphere seg. | Triangles | Vertices | X4500MHD fixed | X4500MHD per-vertex | X4500MHD per-fragment | 9700M fixed | 9700M per-vertex | 9700M per-fragment | GTX 275 fixed | GTX 275 per-vertex | GTX 275 per-fragment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 2142 | 1794 | 797 | 600 (323) | 297 (231) | 1009 | 977 (450) | 382 (146) | 2600 | 4900 (4900) | 4900 (4600) |
| 31 | 2260 | 1854 | 796 | 580 (311) | 296 (230) | 1005 | 969 (444) | 380 (145) | 2600 | 4900 (4900) | 4900 (4600) |
| 32 | 2382 | 1916 | 802 | 573 (297) | 294 (227) | 1007 | 967 (439) | 379 (144) | 2600 | 4900 (4900) | 4900 (4600) |
| 33 | 2508 | 1980 | 804 | 243 (140) | 294 (111) | 1006 | 960 (435) | 379 (145) | 2600 | 4900 (4900) | 4900 (4600) |
| 34 | 2638 | 2046 | 803 | 232 (131) | 291 (107) | 998 | 945 (428) | 375 (143) | 2600 | 4900 (4880) | 4900 (4600) |
| 37 | 3052 | 2256 | 800 | 200 (112) | 287 (90) | 992 | 920 (413) | 372 (141) | 2600 | 4900 (4880) | 4900 (4600) |
| 64 | 8396 | 4952 | 586 | 78 (38) | 211 (31) | 892 | 613 (270) | 340 (122) | 2150 | 4900 (4850) | 4900 (3330) |
| 128 | 32720 | 17184 | 226 | 20 (10) | 62 (8) | 523 | 239 (83) | 279 (87) | 1300 | 4900 (2200) | 4700 (1439) |
| 256 | 130512 | 66208 | 63 | 5 (2) | 16 (2) | 151 | 69 (21) | 194 (36) | 543 | 2410 (680) | 2400 (419) |


Edit 2: OK, I found this document. Is there a way to force the Intel driver to use HWVP? The "Vertex Processing" option in the driver's taskbar app has no effect.


Regards

Andrew_McDonald
Wow, good timing - I think I've just hit the same thing (see my other post). My app runs at half the frame rate after converting from fixed-function to shaders. If what you're saying is true, the vertex shaders could be running on the CPU.

I'm using a desktop Clarkdale part rather than a mobile one, with the latest drivers. The driver control panel offers three options for vertex processing: "application settings", which presumably applies only to D3D, since GL has no API for this; "default settings"; and "enable software processing", which presumably actually means "force software processing".

However, all three have the same effect on my app. The document you found implies the driver only forces SWVP for applications on a specific list - perhaps there's a bug here? It's also four years old; is any more recent information available?

Did you contact Intel about this, or get a reply from them? A shame not to after you've put in so much effort!
survivorx
No, I didn't get any response. The only advice I can give you is to use a single-pass shader and do as little as possible in the vertex unit. Pass everything to the fragment unit (preferably as uniforms, since varyings are slow) and calculate the lighting there.
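To illustrate, a sketch of that setup for a scene like my test case with three directional lights in one pass; the light parameters live in uniform arrays set once by the application, and the vertex shader only forwards the eye-space normal (names are illustrative):

```glsl
// Single pass, three directional lights, minimal vertex work.

// --- vertex shader ---
varying vec3 vNormal;   // eye-space normal

void main()
{
    vNormal = gl_NormalMatrix * gl_Normal;
    gl_Position = ftransform();
}

// --- fragment shader ---
uniform vec3 uLightDir[3];    // normalized light directions, eye space
uniform vec3 uLightColor[3];

varying vec3 vNormal;

void main()
{
    vec3 n = normalize(vNormal);
    vec3 color = vec3(0.0);
    // Accumulate diffuse contributions from all three lights in one pass.
    for (int i = 0; i < 3; ++i)
        color += max(dot(n, uLightDir[i]), 0.0) * uLightColor[i];
    gl_FragColor = vec4(color, 1.0);
}
```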
Andrew_McDonald
OK, thanks. I wanted to share as much GL code as possible between my iOS and Windows versions, but if I'm going to have to branch the code to work around this problem, I might as well stick with the fixed-function pipeline on Windows, since I don't actually need any custom shaders. In fact, I suppose I could still target OpenGL ES 1 for the iOS version, assuming Apple won't drop support for it in the future.

Regardless, I'll try escalating this issue through Intel and see what they say...