I'm getting a very large CPI when profiling an application: around 120. Do clock ticks refer to total clock ticks (1,200,000,000 * sec) or only the clock ticks while my app is in context? Are cache misses included in this, meaning a stall of 40 ticks will be added in? How does it account for device driver code that is run in an arbitrary context (my app)?
For the purposes of calculating CPI, it refers only to clockticks counted while your app is executing. Anything that affects the execution of instructions will affect CPI, so yes, cache misses will increase the number of clockticks it takes to execute the code.
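To make the arithmetic concrete, here is a minimal sketch of how stalls feed into CPI, assuming CPI is simply clockticks divided by instructions retired. The instruction count, base cycle count, and miss count are made-up numbers for illustration; only the 40-tick stall comes from the question.

```python
def cpi(clockticks: int, instructions_retired: int) -> float:
    """Cycles per instruction from two event counts."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return clockticks / instructions_retired


# Hypothetical workload: 1,000,000 instructions at 1 cycle each with no stalls.
instructions = 1_000_000
base_cycles = 1_000_000

# Hypothetical miss count; the 40-tick stall is the figure from the question.
cache_misses = 50_000
stall_ticks = 40

# Stall cycles are counted as clockticks even though no instruction retires.
total_cycles = base_cycles + cache_misses * stall_ticks

print(cpi(base_cycles, instructions))   # 1.0
print(cpi(total_cycles, instructions))  # 3.0
```

In this toy model, one miss in every 20 instructions is enough to triple CPI, so a very high CPI usually suggests that most clockticks are being spent stalled rather than retiring instructions.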
Regarding driver code, you should ignore the Process and Thread views and look at the Module view for all processes and all threads since, as you point out, the process and thread a driver happens to run in are not relevant to it. In Options, you can specify that Modules should be the initial view of sampling data (this is the default).