Developing Games on Intel Graphics

Vertex shader / interpolator slow with OpenGL (SWVP or HWVP enabled?)

survivorx
The Developer Guide says one should use shaders instead of the fixed-function pipeline. So far, I'm still trying to get close to the performance of the fixed-function pipeline in OpenGL.

My GPU is a "Mobile Intel 4 Series Express Chipset Family" (DELL Latitude E6500) with the latest drivers from Intel. I have done some experiments with my lighting shader to improve performance on mobile Intel GPUs. Normally, per-vertex lighting should be much faster than per-fragment lighting, because the fragment color is just interpolated between the vertex colors. You therefore do as much work as possible in the vertex shader, because it is executed much less often than the fragment shader.
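For reference, this is roughly what that conventional split looks like in legacy GLSL; a minimal sketch assuming a single directional light, with uLightDir and uLightColor as illustrative uniform names (not from my actual shader):

```glsl
// Conventional per-vertex lighting (legacy GLSL, compatibility built-ins).

// --- vertex shader ---
uniform vec3 uLightDir;   // normalized light direction in eye space
uniform vec3 uLightColor;

void main()
{
    vec3 n = normalize(gl_NormalMatrix * gl_Normal);
    float diff = max(dot(n, uLightDir), 0.0);
    gl_FrontColor = vec4(diff * uLightColor, 1.0); // lighting computed per vertex
    gl_Position = ftransform();
}

// --- fragment shader ---
void main()
{
    gl_FragColor = gl_Color; // just the interpolated vertex color
}
```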

On my Intel GPU, it's the other way around! The vertex shader seems to be much slower than the fragment shader. In particular, passing varyings from the vertex shader to the fragment shader has a huge performance impact. My lighting shader performs best if I do only the absolutely necessary work in the vertex shader, pass as little information as possible to the fragment shader, and do everything else there.

As mentioned before, even per-vertex lighting, where just the vertex color is passed to the fragment shader, which then only does "gl_FragColor = gl_Color;", is much slower than passing position and normal to the fragment shader and doing everything there.
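Sketched the same way, the inverted split that runs faster here; I pass eye-space position and normal and add a simple Blinn specular term so the interpolated position is actually used (again, uniform names are just for illustration):

```glsl
// Inverted split: minimal vertex shader, all lighting in the fragment shader.

// --- vertex shader ---
varying vec3 vNormal;   // eye-space normal
varying vec3 vPos;      // eye-space position

void main()
{
    vNormal = gl_NormalMatrix * gl_Normal;
    vPos = vec3(gl_ModelViewMatrix * gl_Vertex);
    gl_Position = ftransform();
}

// --- fragment shader ---
uniform vec3 uLightDir;   // normalized, eye space
uniform vec3 uLightColor;

varying vec3 vNormal;
varying vec3 vPos;

void main()
{
    vec3 n = normalize(vNormal);
    vec3 v = normalize(-vPos);          // view direction in eye space
    vec3 h = normalize(uLightDir + v);  // Blinn half-vector
    float diff = max(dot(n, uLightDir), 0.0);
    float spec = pow(max(dot(n, h), 0.0), 32.0);
    gl_FragColor = vec4((diff + spec) * uLightColor, 1.0);
}
```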

I suppose the vertex shader is either a software implementation or runs in software mode by default. Your Developer Guide says in section 3.1:

Support for both Hardware Vertex Processing (HWVP) and Software Geometry Processing (SWGP) is included. SWGP is a superset of software-based processing that includes software vertex processing (SWVP). HWVP peak vertex throughput has been significantly improved in the Intel GMA 4 Series (twice as fast as the previous generation), and by default, HWVP is enabled.

But I think that applies only to DirectX. Is there a way to improve vertex shader performance with OpenGL? Or am I doing something wrong?

Edit: My test scene is a sphere. There seems to be a hard limit beyond which vertex lighting becomes slow: if I set the sphere's subdivision/segment count above 32, the vertex-lighting frame rate is halved (I assume this is the point where the vertex data exceeds 32 KB). The resolution is 640x480 without multisampling. The scene contains three directional lights, which the shader renders in a single pass.

I've tested three different systems:
- DELL Latitude E6500 Notebook (Core2Duo 2.8 GHz, GMA X4500MHD)
- 7-year-old Nexoc Osiris E604 Notebook (Pentium M 2.0 GHz, ATI Mobility Radeon 9700)
- Gaming PC (AMD Phenom II X4 955 3.4 GHz, nVidia GTX 275)

Legend:
  • The three columns under each GPU are: fixed-function pipeline / per-vertex lighting / per-fragment lighting
  • All values are frames per second; values in brackets are from a multi-pass shader, the rest from a single pass
  • Green indicates a shader that does as little as possible in the vertex shader; red indicates normal use of the vertex shader

Here are the results:

| Sphere seg. | Triangles | Vertices | X4500MHD fixed | X4500MHD per-vertex | X4500MHD per-fragment | 9700M fixed | 9700M per-vertex | 9700M per-fragment | GTX 275 fixed | GTX 275 per-vertex | GTX 275 per-fragment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 2142 | 1794 | 797 | 600 (323) | 297 (231) | 1009 | 977 (450) | 382 (146) | 2600 | 4900 (4900) | 4900 (4600) |
| 31 | 2260 | 1854 | 796 | 580 (311) | 296 (230) | 1005 | 969 (444) | 380 (145) | 2600 | 4900 (4900) | 4900 (4600) |
| 32 | 2382 | 1916 | 802 | 573 (297) | 294 (227) | 1007 | 967 (439) | 379 (144) | 2600 | 4900 (4900) | 4900 (4600) |
| 33 | 2508 | 1980 | 804 | 243 (140) | 294 (111) | 1006 | 960 (435) | 379 (145) | 2600 | 4900 (4900) | 4900 (4600) |
| 34 | 2638 | 2046 | 803 | 232 (131) | 291 (107) | 998 | 945 (428) | 375 (143) | 2600 | 4900 (4880) | 4900 (4600) |
| 37 | 3052 | 2256 | 800 | 200 (112) | 287 (90) | 992 | 920 (413) | 372 (141) | 2600 | 4900 (4880) | 4900 (4600) |
| 64 | 8396 | 4952 | 586 | 78 (38) | 211 (31) | 892 | 613 (270) | 340 (122) | 2150 | 4900 (4850) | 4900 (3330) |
| 128 | 32720 | 17184 | 226 | 20 (10) | 62 (8) | 523 | 239 (83) | 279 (87) | 1300 | 4900 (2200) | 4700 (1439) |
| 256 | 130512 | 66208 | 63 | 5 (2) | 16 (2) | 151 | 69 (21) | 194 (36) | 543 | 2410 (680) | 2400 (419) |


Edit 2: OK, I found this document. Is there a way to force the Intel driver to use HWVP? The "Vertex Processing" option in the driver's taskbar app has no effect.


Regards

Andrew_McDonald
Wow, good timing - I think I've just hit the same thing (see my other post). My app runs at half the frame rate after converting from fixed-function to shaders. If what you're saying is true, the vertex shaders could be running on the CPU.

I'm using a desktop Clarkdale part rather than a mobile one, with the latest drivers. The driver control panel offers three options for vertex processing: "application settings", which presumably applies only to D3D, since GL has no API for this; "default settings"; and "enable software processing", which presumably actually means "force software processing".

However, all three have the same effect on my app. The document you found implies the driver only forces SWVP for applications on a specific list - perhaps there's a bug here? It's also four years old; is any more recent information available?

Did you contact Intel about this, or get a reply from them? A shame not to after you've put in so much effort!
survivorx
No, I didn't get any response. The only advice I can give you is to use a single-pass shader and do as little as possible in the vertex unit. Pass everything to the fragment unit (preferably as uniforms, since varyings are slow) and calculate the lighting there.
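To illustrate, a sketch of that setup for a scene like my test case with three directional lights in one pass; the light parameters live in uniform arrays set once by the application, and the vertex shader only forwards the eye-space normal (names are illustrative):

```glsl
// Single pass, three directional lights, minimal vertex work.

// --- vertex shader ---
varying vec3 vNormal;   // eye-space normal

void main()
{
    vNormal = gl_NormalMatrix * gl_Normal;
    gl_Position = ftransform();
}

// --- fragment shader ---
uniform vec3 uLightDir[3];    // normalized light directions, eye space
uniform vec3 uLightColor[3];

varying vec3 vNormal;

void main()
{
    vec3 n = normalize(vNormal);
    vec3 color = vec3(0.0);
    // Accumulate diffuse contributions from all three lights in one pass.
    for (int i = 0; i < 3; ++i)
        color += max(dot(n, uLightDir[i]), 0.0) * uLightColor[i];
    gl_FragColor = vec4(color, 1.0);
}
```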
Andrew_McDonald
OK, thanks. I wanted to share as much GL code as possible between my iOS and Windows versions, but if I'm going to have to branch the code to work around this problem, I might as well stick with the fixed-function pipeline on Windows, since I don't actually need any custom shaders. In fact, I suppose I could still target OpenGL ES 1 for the iOS version, assuming Apple won't drop support for it in the future.

Regardless, I'll try escalating this issue through Intel and see what they say...