The Developer Guide says one should use shaders instead of the fixed-function pipeline. I'm still trying to get anywhere close to the performance of the fixed-function pipeline in OpenGL.
My GPU is a "Mobile Intel 4 Series Express Chipset Family" (DELL Latitude E6500) with latest drivers from Intel. I have done some experiments with my lighting shader to improve performance on mobile Intel GPUs. Normally, per vertex lighting should be much faster than per fragment lighting because the fragment color is just interpolated between the vertex colors. Therefore, you do as much work as possible in the vertex shader, because it is executed much less often than the fragment shader.
On my Intel GPU, it's different! It seems the vertex shader is much slower than the fragment shader. Especially passing varyings from vertex shader to fragment shader is a huge performance impact. My lighting shader works best if I do only the absolute necessary stuff in vertex shader, pass as less information as possible to the fragment shader and do everything else there.
As mentioned before, even per vertex lighting, where just the vertex color is passed to the fragment shader which only does "gl_FragColor = gl_Color;" is much slower than passing vertex and normal to the fragment shader and doing everything there.
I suppose the vertex shader is eigther a software solution or is by default in software mode. Your Developer Guide says in section 3.1:
But I think that is just for DirectX. Is there a way to improve vertex shader performance with OpenGL? Or am I doing something wrong?
Edit: My test scene is a sphere. There seems to be a hard limit where vertex lighting becomes slow. If I set the subdivision / segment count of the sphere > 32, vertex lighting fps is halved (I assume it's when the vertex data becomes > 32 kb). The resolution is 640x480 without multisampling. There are three directional lights in the scene which are rendered in a single pass by the shader.
I've tested three different systems:
- DELL Latitude E6500 Notebook (Core2Duo 2.8 GHz, GMA X4500MHD)
- 7 year old Nexoc Osiris E604 Notebook (Pentium M 2.0 GHz, ATi Mobility Radeon 9700)
- Gaming PC (AMD Phenom II X4 955 3.4 GHz, nVidia GTX 275)
Legend:
Here are the results:
Edit2: Ok, I found this document. Is there a way to force the Intel driver to do HWVP? The option "Vertex Processing" in the driver's task bar app has no effect.
Regards
My GPU is a "Mobile Intel 4 Series Express Chipset Family" (DELL Latitude E6500) with latest drivers from Intel. I have done some experiments with my lighting shader to improve performance on mobile Intel GPUs. Normally, per vertex lighting should be much faster than per fragment lighting because the fragment color is just interpolated between the vertex colors. Therefore, you do as much work as possible in the vertex shader, because it is executed much less often than the fragment shader.
On my Intel GPU, it's the other way around! The vertex shader seems to be much slower than the fragment shader. In particular, passing varyings from the vertex shader to the fragment shader has a huge performance impact. My lighting shader works best if I do only the absolutely necessary work in the vertex shader, pass as little information as possible to the fragment shader, and do everything else there.
As mentioned above, even per-vertex lighting, where just the vertex color is passed to the fragment shader and the fragment shader only does "gl_FragColor = gl_Color;", is much slower than passing the position and normal to the fragment shader and doing all the lighting there.
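The faster variant on the GMA ends up looking something like this: the vertex shader only transforms the position and forwards the eye-space normal and position, and all lighting math moves to the fragment shader. Again, this is a sketch with a single directional light, not the exact benchmarked shader, and the varying names are my own:

```glsl
// Vertex shader: absolute minimum -- transform and forward.
varying vec3 v_normal;   // eye-space normal
varying vec3 v_position; // eye-space position; forwarded to match "vertex and normal",
                         // though a plain directional light doesn't actually need it

void main()
{
    v_normal   = gl_NormalMatrix * gl_Normal;
    v_position = vec3(gl_ModelViewMatrix * gl_Vertex);
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
```

```glsl
// Fragment shader: all lighting per fragment.
varying vec3 v_normal;
varying vec3 v_position;

void main()
{
    vec3 n = normalize(v_normal);
    vec3 l = normalize(gl_LightSource[0].position.xyz);
    float ndotl = max(dot(n, l), 0.0);

    gl_FragColor = gl_FrontMaterial.ambient * gl_LightSource[0].ambient
                 + gl_FrontMaterial.diffuse * gl_LightSource[0].diffuse * ndotl;
}
```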
I suppose the vertex shader is either a software implementation or runs in software mode by default. Your Developer Guide says in section 3.1:
> Support for both Hardware Vertex Processing (HWVP) and Software Geometry Processing (SWGP) is included. SWGP is a superset of software-based processing that includes software vertex processing (SWVP). HWVP peak vertex throughput has been significantly improved in Intel GMA Series 4 (twice as fast as the previous generation), and by default, HWVP is enabled.
But I think that applies only to DirectX. Is there a way to improve vertex shader performance with OpenGL? Or am I doing something wrong?
Edit: My test scene is a sphere. There seems to be a hard limit above which vertex lighting becomes slow: if I set the subdivision/segment count of the sphere above 32, the vertex-lighting FPS is halved (I assume this is the point where the vertex data grows beyond 32 KB). The resolution is 640x480 without multisampling. There are three directional lights in the scene, which the shader renders in a single pass.
I've tested three different systems:
- DELL Latitude E6500 Notebook (Core2Duo 2.8 GHz, GMA X4500MHD)
- 7-year-old Nexoc Osiris E604 Notebook (Pentium M 2.0 GHz, ATi Mobility Radeon 9700)
- Gaming PC (AMD Phenom II X4 955 3.4 GHz, nVidia GTX 275)
Legend:
- The three values in each GPU column are FPS for: fixed-function pipeline / per-vertex lighting / per-fragment lighting
- Values in brackets are from a multi-pass shader; all other values are single-pass
- Green indicates a shader that does as little as possible in the vertex shader; red indicates normal use of the vertex shader
Here are the results:
| Sphere seg. | Triangles | Vertices | GMA X4500MHD (fixed func / per vertex / per fragment) | Radeon 9700M (fixed func / per vertex / per fragment) | GTX 275 (fixed func / per vertex / per fragment) |
|---|---|---|---|---|---|
| 30 | 2142 | 1794 | 797 / 600 (323) / 297 (231) | 1009 / 977 (450) / 382 (146) | 2600 / 4900 (4900) / 4900 (4600) |
| 31 | 2260 | 1854 | 796 / 580 (311) / 296 (230) | 1005 / 969 (444) / 380 (145) | 2600 / 4900 (4900) / 4900 (4600) |
| 32 | 2382 | 1916 | 802 / 573 (297) / 294 (227) | 1007 / 967 (439) / 379 (144) | 2600 / 4900 (4900) / 4900 (4600) |
| 33 | 2508 | 1980 | 804 / 243 (140) / 294 (111) | 1006 / 960 (435) / 379 (145) | 2600 / 4900 (4900) / 4900 (4600) |
| 34 | 2638 | 2046 | 803 / 232 (131) / 291 (107) | 998 / 945 (428) / 375 (143) | 2600 / 4900 (4880) / 4900 (4600) |
| 37 | 3052 | 2256 | 800 / 200 (112) / 287 (90) | 992 / 920 (413) / 372 (141) | 2600 / 4900 (4880) / 4900 (4600) |
| 64 | 8396 | 4952 | 586 / 78 (38) / 211 (31) | 892 / 613 (270) / 340 (122) | 2150 / 4900 (4850) / 4900 (3330) |
| 128 | 32720 | 17184 | 226 / 20 (10) / 62 (8) | 523 / 239 (83) / 279 (87) | 1300 / 4900 (2200) / 4700 (1439) |
| 256 | 66208 | 130512 | 63 / 5 (2) / 16 (2) | 151 / 69 (21) / 194 (36) | 543 / 2410 (680) / 2400 (419) |
Edit 2: OK, I found this document. Is there a way to force the Intel driver to use HWVP? The "Vertex Processing" option in the driver's taskbar app has no effect.
Regards
3 Replies
Wow, good timing - I think I've just hit the same thing (see my other post). My app is running at half the frame rate after converting from fixed function to shaders. If what you're saying is true, the vertex shaders could be running on the CPU.
I'm using a desktop Clarkdale part rather than a mobile one, with the latest drivers. The driver control panel offers three options for vertex processing: "application settings", which presumably applies only to D3D, as GL doesn't have an API for this; "default settings"; and "enable software processing", which presumably in fact means "force software processing".
However, all three have the same effect on my app. The doc you found implies the driver will only force SWVP if it's on a specific list - perhaps there's a bug here? It's also four years old; is any more recent info available?
Did you contact/get a reply from Intel about this? A shame not to after you've put so much effort in!
No, I didn't get any response. The only advice I can give you is to use a single-pass shader and do as little as possible in the vertex unit. Pass everything to the fragment unit (preferably as uniforms, since varyings are slow) and calculate the lighting there.
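Something along these lines, sketched for three directional lights in a single pass, with the light and material data as uniforms. The uniform/varying names are just placeholders, not from any particular code:

```glsl
// Fragment shader sketch: three directional lights, one pass, all lighting per fragment.
// The matching vertex shader only transforms the position and forwards the eye-space normal.
uniform vec3 u_lightDir[3];     // normalized eye-space light directions
uniform vec4 u_lightDiffuse[3]; // diffuse color per light
uniform vec4 u_ambient;         // global ambient term
uniform vec4 u_matDiffuse;      // material diffuse color

varying vec3 v_normal;          // eye-space normal from the vertex shader

void main()
{
    vec3 n = normalize(v_normal);
    vec4 color = u_ambient * u_matDiffuse;

    // Accumulate all three lights in the same pass.
    for (int i = 0; i < 3; ++i)
        color += u_matDiffuse * u_lightDiffuse[i] * max(dot(n, u_lightDir[i]), 0.0);

    gl_FragColor = color;
}
```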
OK, thanks. I wanted to share as much GL code as possible between my iOS and Windows versions, but if I have to branch the code to work around this problem, I might as well stick with the fixed-function pipeline on Windows, since I don't actually need any custom shaders. In fact, I suppose I could still target OpenGL ES 1 for the iOS versions, assuming Apple won't drop support for it in the future.
Regardless, I'll try escalating this issue through Intel and see what they say...