
I use and try out a lot of software, simply because I do a lot of different kinds of work. I edit audio, I draw vector graphics, I edit photos and textures, I edit 3D models, and I work on presentations, spreadsheets and written documents. I also work as a software programmer, with an IDE, a text editor, a debugger, and the actual program I'm debugging. Sometimes there are two or three copies of the IDE and debugging tools running, for various reasons. I even surf the web and read e-mail on occasion, not to mention managing tasks and my calendar. The list goes on; in general, I'll have 10-15 separate application buttons on my Windows taskbar. Then there's AOL Instant Messenger, Safely Remove Hardware, the virus scanner, the MagicISO mounter and a few other assorted tray icons.
Every so often, a download of some sort needs to restart the computer. Typically, this is because it wants to update some DLL that is already in use by an open application. Less often, it's because I've updated a driver of some sort (graphics drivers are a common culprit here). This means I have to quit all the applications, restart the computer, wait for the computer to come to its senses, and then start all the applications again. This rant is about a number of things that I think the OS and applications do wrong, and that (literally) make this a 20-minute ordeal. Think about that: 20 minutes is over 4% of an eight-hour working day! Just to restart the computer!
I often meet application programmers who believe that just memory mapping a file, and accessing it "directly" through a pointer, will automatically make their application "high performance." Nothing could be further from the truth. In fact, for most applications, memory mapping is the wrong thing to do.
Memory mapping exists so that the operating system kernel can merge the file buffer cache and the virtual memory page cache. Computer RAM is organized into pages, typically 4 kB each, and the virtual memory system works on physical memory in whole pages, 4 kB or more at a time. The theory is that if you use memory mapped files, there will be more memory available than if you allocate a chunk of memory and then read file data into it. When you read the file into memory, one copy may be made in the file buffer, and a second copy is then made in your process space for you to do your work on; a mapped file can use the cached pages directly instead.
However, in real life, reads of a file of any size should not actually be cached. This may sound counterintuitive to naive programmers -- "but if I do I/O, surely reading from the memory cache is faster than reading from the disk?" That is only true if you, say, read 200 bytes, then another 50 bytes, and then 100 bytes after that. If you read in chunks of 4 kB or more at a time, the file cache isn't very useful to you. Windows has a flag (FILE_FLAG_NO_BUFFERING, passed to CreateFile) that makes file reads bypass the cache, and most applications should set this flag and then read the whole file in one single gulp, after which they can parse the file as-is in RAM. This not only avoids the unnecessary creation of file cache pages, but also gives the best possible throughput, because the entire file is read from the hard disk in one gulp. (In BeOS, we made it so that any aligned read of 64 kB or more was automatically uncached, and this made for a nice system speed-up.)
Hard disks hate small transfers. Each access has to wait for the disk head to move to the right track, and then for the platter to spin until the right sector is under the head. (More likely, these days, the drive will read the entire track into its on-drive cache as it passes under the head, and deliver the requested sectors as they become available.) Thus, reading small files in chunks of 4 kB, or even slightly larger files in chunks of 64 kB, isn't very efficient, because each chunk can cost another seek, especially if other I/O interleaves with your file I/O and moves the head elsewhere.
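To make the two approaches concrete, here is a minimal Python sketch, with Python's standard `open` and `mmap` modules standing in for the native Windows calls (the function names are hypothetical). `read_copy` gives the process its own private copy of the bytes, while `map_view` returns a mapping whose pages the kernel can share with its file cache:

```python
import mmap


def read_copy(path: str) -> bytes:
    """Read the file into a buffer owned by this process.

    The kernel may also keep the same bytes in its file cache, so the data
    can end up in RAM twice: once in the cache, once in the returned object.
    """
    with open(path, "rb") as f:
        return f.read()


def map_view(path: str) -> mmap.mmap:
    """Map the whole file read-only; its pages come from the kernel's cache.

    A length of 0 maps the entire file (empty files cannot be mapped).
    The mapping stays valid after the file object is closed.
    """
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Note that indexing into `map_view(path)` at an offset whose page isn't resident triggers a page fault, which is exactly the behavior discussed below.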
So, if you set the right flag on your file open call, and read the entire file in one gulp (a good idea for most files under a hundred MB, say), then you will get ideal performance, and won't pollute the VM/file page cache.
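A minimal sketch of the "one gulp" half of that advice, in Python: it asks the OS for the whole file in a single read request and only loops if the OS returns a short read. The uncached half is not shown: Python's `os.open` does not expose the Windows `FILE_FLAG_NO_BUFFERING` flag (a `CreateFile` flag that also requires sector-aligned buffers, offsets and transfer sizes), and Linux's rough counterpart, `O_DIRECT`, has similar alignment rules. The function name is hypothetical.

```python
import os


def read_whole_file(path: str) -> bytes:
    """Read an entire file using as few read calls as possible.

    Requests the full file size at once; the loop only repeats if the
    OS hands back a short read.
    """
    # O_BINARY only exists on Windows; elsewhere this adds no flag.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        remaining = os.fstat(fd).st_size
        chunks = []
        while remaining > 0:
            chunk = os.read(fd, remaining)  # ask for everything that's left
            if not chunk:  # file shrank underneath us
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

Once the bytes are in RAM, the application can parse them as-is, without further trips to the disk.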
Meanwhile, if you memory map the file and start accessing it through a pointer, the OS will only page a little bit of the file in at a time, as you move the pointer through the file. Each of those page-ins means another head seek and another wait for the platter to spin. While each of those operations takes only a few milliseconds (say, 5-10 milliseconds on average), the time quickly piles up. The effective throughput of your hard disk will be terrible, because it will spend all its time seeking to itty-bitty chunks of your file, instead of streaming it all in one big gulp. The OS may be smart enough to page in 64 kB at a time instead of 4 kB when you touch a new page with your pointer, but that's still a far cry from the ideal throughput of a single read operation.
Also, a pointer access to a non-resident page will completely stall the application thread until the page becomes available: the thread is suspended while the kernel resolves the page fault. That can make the application noticeably unresponsive to the user. This synchronous behavior is intrinsic to memory mapped files, and it is bad for the user experience.
By contrast, if you read all your data in a single gulp, you can do it in the background, either on a separate thread or with an asynchronous I/O request. When the read is complete, you notify the main thread that it has data to use. Meanwhile, the main thread can keep responding to mouse moves, keypresses, Windows messages and whatever else the user sees fit to throw at the application.
So, using the hard disk and file system correctly, and avoiding memory mapped files in all but the most specialized cases (no, your case is likely not one of those specialized cases), should lead to much better application and overall system performance.
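To see how quickly the seeks pile up, here is a back-of-the-envelope model, sketched in Python. It assumes the worst case, where every chunk pays one full seek (as when other I/O keeps moving the head away), and the 50 MB/s sustained transfer rate and 20 MB file size used below are illustrative assumptions, not measurements:

```python
def estimated_read_seconds(file_bytes: int, chunk_bytes: int,
                           seek_ms: float, stream_mb_per_s: float) -> float:
    """Worst-case model: one seek per chunk, plus raw transfer time.

    Sizes are in bytes; the transfer rate uses decimal megabytes.
    """
    chunks = -(-file_bytes // chunk_bytes)  # ceiling division
    seek_time = chunks * seek_ms / 1000.0
    transfer_time = file_bytes / (stream_mb_per_s * 1_000_000)
    return seek_time + transfer_time
```

With a 5 ms seek, `estimated_read_seconds(20_000_000, 4096, 5, 50)` comes to about 24.8 s, 64 kB chunks (65536 bytes) bring it down to about 1.9 s, and a single read of the whole file takes about 0.4 s. Readahead and on-drive caching make real numbers better than this worst case, but the ratio is the point.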
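A minimal Python sketch of that pattern, assuming a simple queue as the notification channel (a native Windows program might instead post a window message to the main thread, or use overlapped `ReadFile`); `start_background_read` is a hypothetical name:

```python
import queue
import threading


def start_background_read(path: str, results: queue.Queue) -> threading.Thread:
    """Read the whole file on a worker thread and post the result.

    The main (UI) thread never blocks on disk I/O; it just checks
    `results` between events. Each message is (path, data, error).
    """
    def worker() -> None:
        try:
            with open(path, "rb") as f:
                data = f.read()  # one big request for the whole file
            results.put((path, data, None))
        except OSError as exc:
            results.put((path, None, exc))

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

The main loop would call `results.get_nowait()` between handling events, catching `queue.Empty` when nothing has arrived yet, so the user interface never waits on the disk.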
Memory leaks are bad, because they are often hard to detect, and your marketing department may let you ship with a bug that only hits your most dedicated users -- those who expect your application to keep running 24/7. To combat memory leaks, programmers have come up with a number of solutions. One of those solutions is the RAII pattern (Resource Acquisition Is Initialization), in which memory (or another resource) is allocated in a class constructor and deallocated in the class destructor. As long as you arrange for your objects to have symmetric allocation/deallocation, you won't have a leak!
Programmers will typically make a global Application object in their program, and have it manage Document and Window objects, etc. Each of those objects is managed through RAII or a related technique known as smart pointers. Thus, when the application process goes away, the destructor for the global Application object is called, which in turn causes the Documents and Windows to be deallocated, and so on. All the memory of the process is returned to the memory manager before the process goes away entirely. It's neat, tidy, and symmetrical.
It's also totally useless.
If the process is going away, then the kernel will clean up all the memory pages of the process. It will close any open file handles (and flush any native file handles, but not any higher-level cached constructs such as FILE* streams). That's the whole job of the kernel -- it manages resources, and reclaims them when the application doesn't need them. For the application to "give back" resources when the process is about to go away is just busy-work with no real benefit to the system. It does, however, slow down application termination somewhat.
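For concreteness, here is a sketch of the symmetric teardown described above. It's written in Python for consistency with the other examples, so explicit `close()`/`shutdown()` methods stand in for C++ destructors; the class names mirror the text but are otherwise hypothetical. The count it returns is the number of objects the teardown has to touch, and every one of them must be resident in RAM at the moment it's touched:

```python
class Window:
    def __init__(self) -> None:
        self.buffer = bytearray(64 * 1024)  # stand-in for per-window state

    def close(self) -> int:
        self.buffer = None  # writing to the object touches its page
        return 1


class Document:
    def __init__(self, n_windows: int) -> None:
        self.windows = [Window() for _ in range(n_windows)]

    def close(self) -> int:
        touched = 1
        for w in self.windows:
            touched += w.close()
        self.windows.clear()
        return touched


class Application:
    def __init__(self, n_docs: int, windows_per_doc: int) -> None:
        self.documents = [Document(windows_per_doc) for _ in range(n_docs)]

    def shutdown(self) -> int:
        """Tear down every owned object in turn; return how many were visited."""
        touched = 1
        for d in self.documents:
            touched += d.close()
        self.documents.clear()
        return touched
```

With three documents of two windows each, `Application(3, 2).shutdown()` visits 10 objects; in a real program, any of them that had been paged out would have to be paged back in just to be thrown away.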
Now, enter the problem of virtual memory, and the file buffer pages created by sloppy use of file APIs. It's quite possible that the system decided that the programs I wasn't using before shutting down were lower priority, and paged their memory out to disk to make room for more file buffers (or for other programs I actually am using). If a program just called ExitProcess() without calling any global/static destructors in itself or in any loaded DLLs, then terminating a paged-out application would be very quick. However, this is not what programs do. While the main Windows message loop may still be resident in memory, all the window, document and other data that hasn't been touched for a while may now be paged out. Because the application walks all its data structures to "clean up" and "return" all that memory as it terminates, it has to page all that memory back in. And, what's worse: virtual memory works just like memory mapped files! Small pieces are paged in at a time, and it's done in a synchronous fashion that means you can't actually interact with the thread/program doing the paging.
Thus, you have the frustrating, pointless problem of having to wait longer for a program to shut down than to start up. All the while, the hard disk is grinding. In an ideal world, the application would keep a simple table of its windows, documents and their dirty states in a single, linear array. Upon quit, this array (which fits in a single page) could be paged in (if it's not already in memory), and if nothing needs saving, the application could just call ExitProcess(). As long as no global/static objects are used, that's a very quick process, and because the kernel cleans up just as well after one ExitProcess() as after another, it will be a lot less work for the system.
But how do we get this information out to all those application programmers? I don't know. These things are certainly not taught in most trade schools, or in 4-year computer science or software engineering programs.
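A minimal sketch of that quit path, in Python. `DocEntry` and `quit_fast` are hypothetical names, and `os._exit()` stands in for `ExitProcess()`: it ends the process immediately without running `atexit` handlers, object finalizers or stdio buffer flushes, and leaves the cleanup to the OS. The exit function is a parameter only so the logic can be tested without killing the process:

```python
import os
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DocEntry:
    """One row of the quit table: small and flat, no pointers to chase."""
    name: str
    dirty: bool


def quit_fast(table: List[DocEntry],
              exit_fn: Callable[[int], None] = os._exit) -> List[str]:
    """Exit at once if nothing needs saving; otherwise return what does.

    Only this small table has to be resident; the (possibly paged-out)
    window and document data is never walked.
    """
    dirty = [entry.name for entry in table if entry.dirty]
    if dirty:
        return dirty  # the caller asks the user about saving these first
    exit_fn(0)  # with the default os._exit, this call never returns
    return []
```

The design choice is simply that the information needed to decide "can I quit now?" lives in one compact place, so answering it costs at most one page-in.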
Let me know if you have any bright ideas!
Comments
Sure, but...
Programmer productivity and program stability are typically far more important than how fast your program exits. This is not a problem that will be solved by the people who write the programs you're using.
But there is a way to solve many of these problems - by providing the right abstractions. If the language or library used in production makes it easy to "do the fast thing", the fast thing will be done, for productivity reasons. If it doesn't make the program unstable/unsafe/incorrect, it will continue to be used.
One parallel that can be drawn is to SQL injections. These are probably present in the majority of small sites that use a SQL database. The reason for this, I think, is that the libraries encourage using dynamically generated SQL. So people learn to do it this way, and they tell everybody else that this is the way to do it in tutorials, etc.
As a result, you can't make dynamically generated SQL "harder" to do, since most newcomers will only be familiar with that way of doing it, and would be alienated by an API that doesn't provide it (or makes it harder).
A good API is easy to use correctly and hard to use incorrectly.
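To make the commenter's point concrete, here is a small sketch using Python's built-in `sqlite3` module (an illustrative choice; the table and function names are hypothetical). The first function builds SQL by string formatting, the way many tutorials teach it; the second uses the driver's `?` placeholder, which is just as short to write but keeps user input out of the SQL text:

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Dynamically generated SQL: the input becomes part of the query text.
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized query: the driver passes `name` as data, never as SQL.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)).fetchall()
```

With the input `x' OR '1'='1`, the first query returns every row in the table, while the second looks for a user literally named that and finds none.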
The fact that libraries make
The fact that libraries make it easier or harder to write well-performing applications does not excuse a programmer from the requirement that his program not unnecessarily waste the resources of the machine or, worse, the user.
If I spend 15 minutes a day waiting for my laptop to resume from hibernation in the morning, mostly because programs use virtual memory unwisely (I have enough RAM for all of them), that affects me, the user, quite badly. 15 minutes is more than 3% of a working day. And imagine the horror if I have to reboot once during the day -- that adds another 15-20 minutes of my time, wasted! Now I'm up to 7% overhead. Where does it stop? When should lazy programmers (who are few) stop wasting the time of users (who are many)?
It's all because of the race to the bottom. All you have to do is ship something that SEEMS to work, as soon as possible and as cheaply as possible, and you will get paid. If someone tries to take the high road, and actually tests and tunes their system, their higher costs will quickly drive them out of business. This is mostly because bean counters see the higher cost as unnecessary: worker productivity is not visible in a balance sheet column, whereas direct expenditures are. It's the same reason creative people are forced to work in cubicles, even though, according to well-published research, offices would pay back their cost several times over.
Now you, as a programmer, can do your bit: do not allow a piece of code to be shipped if you know that it contains performance sins.
I'm going to blame the tool anyway
You can blame the programmer, or the management, of consumer software companies, but it won't change anything. In order to compete, you have to prioritize, and functionality often comes before speed.
But the point I really want to make is that you can try to solve the problem either by changing the way every company and programmer in the world works, or by changing how every API works. There are far fewer APIs to fix than people, and APIs don't have habits, superstitions and objections.