When considering application tuning, I see that most of the tuning tries to increase the cache hit ratio. But why don't people try to do the same thing with the main memory hit ratio (for example, when a big file is loaded from disk and processed)?
Optimizing memory performance is a good thing to do, but you need a tool other than VTune for it. VTune can tell you about cache misses and other statistics internal to the processor, and its call graph can show you that file I/O takes most of your time. You can perhaps do something about that, but managing memory is really an OS job. Prefetching data into memory in anticipation of a request is, I think, already done, at least partially, in some form. Today's smart disks have caches and do some prefetching at the disk level, and there are research papers on other techniques for prefetching from disk into DRAM.
I agree. In general, the OS is going to handle buffering file I/O and there isn't much you can do about it, except maybe try to access the file sequentially, which is what the OS and disk subsystem are expecting and optimized for.
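To make the sequential-access point concrete, here is a minimal sketch in Python, assuming a Linux or other POSIX system. It reads a file front to back in fixed-size chunks and passes the kernel an advisory `POSIX_FADV_SEQUENTIAL` hint first. The function name `read_sequentially` and the 1 MiB chunk size are my own illustrative choices, and the hint is only advice: the kernel is free to ignore it, and whether it helps depends on the OS and filesystem.

```python
import os


def read_sequentially(path, chunk_size=1 << 20):
    """Read a file front to back in fixed-size chunks; return total bytes read.

    Hypothetical helper for illustration; chunk_size is an arbitrary choice.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Advisory hint that we intend to read sequentially. It is optional,
        # so skip it where the call is unavailable or the filesystem refuses.
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                pass
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty result means end of file
                break
            total += len(chunk)  # stand-in for real processing of the chunk
    finally:
        os.close(fd)
    return total
```

The access pattern matters more than the hint: reading in order with reasonably large chunks is what lets the OS read-ahead do its job.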
I'm new to VTune and still haven't got it working, so I'm not sure how to use it to help with I/O tuning. Aside from that, when I'm developing programs that need high I/O bandwidth I usually bypass the OS buffering and use direct I/O (O_DIRECT) or asynchronous I/O (aio libraries).
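As a rough illustration of the direct I/O route, here is a minimal Python sketch, assuming Linux. `O_DIRECT` needs the buffer, the file offset and the transfer size to be suitably aligned, so the sketch reads into an anonymous `mmap` buffer (which is page-aligned) in multiples of an assumed 4096-byte block. The names `read_direct`, `_drain` and `BLOCK` are hypothetical, the block size is an assumption that may not match a given device, and the fallback to buffered reads on `EINVAL` is there because some filesystems reject `O_DIRECT`. Asynchronous I/O is not shown.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; the real requirement depends on the device


def _drain(path, flags, chunk_size):
    """Open path with the given flags and read it to the end; return bytes read."""
    fd = os.open(path, flags)
    buf = mmap.mmap(-1, chunk_size)  # anonymous mapping, page-aligned
    total = 0
    try:
        while True:
            n = os.readv(fd, [buf])  # fills the aligned buffer in place
            total += n
            # A short read on a regular file means end of file. Stopping here
            # also avoids issuing a direct read at an unaligned offset.
            if n < chunk_size:
                break
    finally:
        buf.close()
        os.close(fd)
    return total


def read_direct(path, chunk_size=256 * BLOCK):
    """Return (total_bytes, used_direct); chunk_size must be a multiple of BLOCK."""
    direct_flag = getattr(os, "O_DIRECT", 0)  # 0 where the platform lacks it
    if direct_flag:
        try:
            return _drain(path, os.O_RDONLY | direct_flag, chunk_size), True
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
            # The filesystem refused direct I/O; fall through to buffered reads.
    return _drain(path, os.O_RDONLY, chunk_size), False
```

Bypassing the page cache only pays off when the application does its own buffering and issues large, aligned requests; for a single pass over a file, the buffered sequential path is often just as fast.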
It's now running. I'm using the Windows XP eval version with RDC on Itanium Linux. I had trouble installing RDC. One server was missing a library, and on another I had problems building a compatible sampling driver. I downgraded the kernel on one of the servers and now it's up and crawling! I managed to get sampling on a short run of my process but it was awfully slow. I don't mean my process is slow; VTune is slow. Running the project Wizard takes +30sec per screen, and then when you run an activity nothing happens for minutes. I see no CPU or I/O activity on my workstation or the server, just the occasional flicker of network traffic. After several minutes my process starts on the server, runs for 20sec and stops. There's another delay of +5min before I see any results. This morning I tried running my process on extra data. It takes 3-4 minutes, but VTune and vtserver hang near the end of the first sampling run. The calibration run seemed to run OK. At the time of the hang, the VTune client is 'Not responding' and vtserver can't be sigint'd or sigkill'd. I can kill it from root. There's oodles of disk space and RAM, and a 100M LAN. Regards, Colin
When calibration is on, your experiment actually runs twice, once to "calibrate," and once to collect data. Just to experiment, consider:
1) Turning calibration off, then launching the activity.
2) Deliberately restricting the run of the profiling activity to, say, 20 seconds, then running it. How long does it then physically take for data to pop up?
3) Repeating the experiments at different times of day (morning, noon, evening) to see if the results are consistent. Unusually high network traffic can slow things down, because there's a LOT of data transferring over the wire after vtserver has finished its work.
Thanks for your help, but sorry to say it isn't anything you suggested.
I use the Wizard to create a new sampling project, select the remote server, enter the application name and click next. It takes about 30sec before the next screen appears. I get similar delays as I move to each screen in the Wizard. When I run the activity from the client there is a +30 second delay before vtserver reports it is starting the process, and when the process ends (it's a console app and writes start/finish times to stderr) there is a similar delay before the client recognises it and starts to transfer the sample data. The sample transfers run quite fast and show high network transfer rates. It looks to me like a connection timeout between the client and vtserver, but I can't understand why this would happen. In the client I configure the server by IP address, not name.
I'm actually getting results, but it would be nice to solve this problem.