Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

NVD3DUM Eating CPU Cycles

LeeBamber
Beginner
2,656 Views

I've been doing a little profiling work on my games engine and have noticed that an inordinate amount of time can be spent inside the NVD3DUM.DLL module: VTune reports that up to 50% of my CPU workload happens inside it.  For now I have resigned myself to accepting that this is probably the driver handling the preparation and submission of my graphics data to the GPU, but that's just a guess, and there is almost no information about this module to be found.  There also do not appear to be any VS symbol (PDB) files available for it, so I can't tell exactly what it's doing when I drill down into it, which is another shroud of mystery to tackle.

Does anyone have any links or information on this module: what it is doing, how it can be optimized, and perhaps how to get more symbol-style information on it, at least enough to give me a clue as to what general area I should be optimizing?  If anyone posts something useful, I promise to add it to my daily blog and lavish you with praise for digging out information on what seems to be a black box library that is stealing more than half of my processing cycles!

All I can guess is that NVD3DUM stands for NVIDIA DirectX 3D Unmanaged Driver, but beyond that, it's a mystery module!

0 Kudos
41 Replies
Bernard
Valued Contributor I
1,134 Views

@Lee

My last answer (#20) related more to your previous posts, where you were talking about the slow performance.

0 Kudos
Bernard
Valued Contributor I
1,134 Views

Do you have any updates?

0 Kudos
Vitaly_S_Intel
Employee
1,134 Views

Hi Lee!

Sorry for the late response. Did you know that GPU profiling is available right in VTune? Assuming you use a recent version, there is an additional option in the "Hotspot" and "Advanced Hotspot" analyses: "Analyze GPU Usage". It collects data similar to what GPUView collects and presents both CPU and GPU profiling data in the same view, so you will be able to tell whether your application is CPU bound or GPU bound.

If you use GPA Frame Analyzer, try GPA Platform Analyzer (versions released in 2014). It shows the DMA packet queue and also the DirectX calls, so you will be able to find the reasons for GPU activity.

0 Kudos
Bernard
Valued Contributor I
1,134 Views

 >>>Assuming you use recent versions there is an additional option in "Hotspot" and "Advanced Hotspot" analysis - "Analyze GPU Usage">>>

I was under the assumption that the "Analyze GPU Usage" option was only available for Intel GPUs.

0 Kudos
Vitaly_S_Intel
Employee
1,134 Views

It works for any GPU. For Intel Graphics we are able to provide meaningful names for the GPU engines, while on other GPUs there will be just numbers, as in GPUView.

0 Kudos
Bernard
Valued Contributor I
1,134 Views

Thank you for your answer.

0 Kudos
LeeBamber
Beginner
1,134 Views

My current problem is that the game I need to test consumes about 1GB of system memory. It runs fine as is, but when run through the GPA Monitor it crashes. I suspect the monitor is consuming extra memory to capture the data, so that by the time the game has initialized completely it simply runs out of memory (it is a 32-bit application).  Do you know any GPA tricks to ensure it does NOT consume any extra memory in the process of monitoring the targeted application?  I will continue my experiments, maybe with a smaller game, but ideally I want to performance tune the big stuff!!

0 Kudos
Vitaly_S_Intel
Employee
1,134 Views

Hi Lee!

When you say "run through the GPA Monitor", which tool do you mean: Frame Analyzer, Platform Analyzer or System Analyzer? Also, which version of GPA do you use?

0 Kudos
LeeBamber
Beginner
1,134 Views

Before you can use GPA Platform Analyzer, you need a .gpa_trace file as the source of the analysis.  To create this file I needed to run the Intel(r) GPA Monitor and select Analyze Application...  When I do this on my 1GB application, the application crashes due to lack of memory.  If I run the application normally, it works fine.  I suspect that the Intel(r) GPA Monitor is consuming some memory here.

0 Kudos
Bernard
Valued Contributor I
1,134 Views

 >>>I am suspecting it's the Intel(r) GPA Monitor is consuming some memory here>>>

Can you check the virtual memory usage of GPA Monitor, or rather VTune, with the VMmap tool?

0 Kudos
LeeBamber
Beginner
1,134 Views

I tried VMmap and launched my game application through it. It is a pretty good tool and shows me every memory allocation made by the software!  When I launched the game through the tool, the tool crashed, but when I ran the game first and attached the tool once it had loaded, it worked fine. The problem is that since VMmap crashes when it launches the game, and the game crashes during loading under the GPA Monitor, there is no way I can finish the game loading step and THEN attach the VMmap tool.  I have had some success with smaller game levels, so I will be using those to get to a point where there is no crash. In the meantime, the information about memory allocation will come in pretty handy for spotting large, unnecessary commits of memory. In fact I just spotted a 125MB reservation made by a DLL I no longer use, so time to investigate!  If anyone from VMmap is reading, here is the crash log:

Problem signature:

  Problem Event Name:    APPCRASH
  Application Name:    vmmap.exe
  Application Version:    3.12.0.0
  Application Timestamp:    532a6358
  Fault Module Name:    vmmap.exe
  Fault Module Version:    3.12.0.0
  Fault Module Timestamp:    532a6358
  Exception Code:    40000015
  Exception Offset:    0003f36e
  OS Version:    6.1.7601.2.1.0.256.1
  Locale ID:    2057
  Additional Information 1:    6bc8
  Additional Information 2:    6bc8b499bf770c4064979b0a658871fa
  Additional Information 3:    7409
  Additional Information 4:    74094e6bac091eff0133b989e0f30bd2

 

0 Kudos
Bernard
Valued Contributor I
1,134 Views

>>> Exception Code:    40000015>>>

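Exception code 0x40000015 is STATUS_FATAL_APP_EXIT ("Fatal Application Exit"). It is usually raised when the application terminates itself abnormally, for example through a call to abort(), rather than by a hardware fault such as an access violation.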

The Exception Offset of 0003f36e is the offset, from the start of the fault module (vmmap.exe), of the instruction that raised the exception.
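
For anyone who wants to decode such values themselves, here is a minimal Python sketch, assuming the standard NTSTATUS bit layout (2 severity bits, a customer bit, a reserved bit, a 12-bit facility and a 16-bit code). The function names and the 0x00400000 base address are hypothetical, chosen only for illustration; the real load address of vmmap.exe would have to be read from a debugger or from the dump.

```python
SEVERITIES = ("success", "informational", "warning", "error")


def decode_ntstatus(status: int) -> dict:
    """Split a 32-bit NTSTATUS value into its documented bit fields."""
    return {
        "severity": SEVERITIES[(status >> 30) & 0x3],  # bits 31-30
        "customer": bool((status >> 29) & 0x1),        # bit 29
        "facility": (status >> 16) & 0xFFF,            # bits 27-16
        "code": status & 0xFFFF,                       # bits 15-0
    }


def fault_address(module_base: int, exception_offset: int) -> int:
    """Address of the faulting instruction, given where the module is loaded."""
    return module_base + exception_offset


info = decode_ntstatus(0x40000015)
print(info)

# Hypothetical base address, for illustration only.
print(hex(fault_address(0x00400000, 0x0003F36E)))
```

The severity field comes out as "informational" rather than "error", which fits an exit that the program requested itself rather than a fault detected by the CPU.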

0 Kudos
Bernard
Valued Contributor I
1,134 Views

 >>>If anyone from VMmap is reading, here is the crash log:>>>

Can you locate the VMmap minidump crash file and send it in a private message?

0 Kudos
Bernard
Valued Contributor I
1,134 Views

>>>If anyone from VMmap is reading, here is the crash log:>>>

AFAIK Mark Russinovich is one of the VMmap developers, and support is probably provided through the Sysinternals forum.

0 Kudos
Vitaly_S_Intel
Employee
1,134 Views

Lee, which version of GPA do you use?

Can you please update to the most recent version available here: https://software.intel.com/en-us/gpa ?

0 Kudos
LeeBamber
Beginner
1,134 Views

2014 R2

0 Kudos
Vitaly_S_Intel
Employee
1,134 Views

OK, thanks! Can you please try decreasing the Trace Duration value?

Right-click on the icon -> Profiles -> Tracing -> Trace Duration (sec). Set it to 5, for example.

0 Kudos
LeeBamber
Beginner
1,134 Views

Some good news. Reducing the trace from 5 seconds down to 1 second may have helped, as I was able to run a smaller level and then use the 'Capture Trace' option, and now I have some data in the Platform Analyzer, which is a step forward. The 'Frame Capture' still showed an 'out of memory' error when I pressed CTRL+SHIFT+C, but I dare say I can coax that into working if the level is even smaller, etc.  I have already found some LOCK commands I could do without to increase performance, so onwards and upwards!

profilefromplatformanalyser.jpg

0 Kudos
Bernard
Valued Contributor I
1,134 Views

I must admit that the GPA output looks friendlier than GPUView's. So the reason for the poor performance was the VertexBuffer lock/unlock commands?

0 Kudos
Vitaly_S_Intel
Employee
1,038 Views

Hi Lee!

Glad that you were able to get useful data with Platform Analyzer. We would be grateful if you could share the performance improvement numbers you achieve using Platform Analyzer. Any feedback and suggestions for the tool are highly appreciated. Thanks in advance!

As for the memory issues: is your application linked with /LARGEADDRESSAWARE? This option allows 32-bit applications to use up to 4 GB of virtual memory on 64-bit systems. Both Platform Analyzer and Frame Analyzer need additional virtual memory inside the application's process.
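
One way to confirm that the flag actually made it into a binary is to look for the IMAGE_FILE_LARGE_ADDRESS_AWARE bit (0x0020) in the Characteristics field of the PE file header. Below is a minimal Python sketch of that check; the helper names are hypothetical, and the header built by make_minimal_header is a synthetic stand-in for a real executable, used only so the example is self-contained.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(image: bytes) -> bool:
    """Return True if the PE image has the large-address-aware bit set."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # Characteristics is the last 2 bytes of the 20-byte COFF header,
    # which follows the 4-byte signature: 4 + 18 = 22.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 22)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def make_minimal_header(characteristics: int) -> bytes:
    """Build a synthetic, non-loadable PE header purely for demonstration."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature right after DOS header
    # Machine (0x014C = x86), NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    coff = struct.pack("<HHIIIHH", 0x014C, 0, 0, 0, 0, 0, characteristics)
    return bytes(dos) + b"PE\0\0" + coff


with_flag = is_large_address_aware(make_minimal_header(0x0122))
without_flag = is_large_address_aware(make_minimal_header(0x0102))
print(with_flag, without_flag)
```

For a real binary, something like is_large_address_aware(open("game.exe", "rb").read()) should work, where game.exe is a placeholder path.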

0 Kudos
LeeBamber
Beginner
1,038 Views

Awesome advice!  I used the flag and I am now able to exceed the 1.8GB system memory cap that caused crashes in my engine.  I have not tried the Analyzers yet, but I have a feel-good factor about it now and I wanted to report right away that the LARGEADDRESSAWARE flag worked a treat ;)

0 Kudos