PC Timers -- why it's so hard to keep time

by jwatte

On PC hardware, there are basically three timers that are generally available. None of them is very reliable:

  1. the "tick" counter. This is a millisecond-type timer, and is read by calls such as timeGetTime() or GetTickCount(). It is driven by the age-old ISA interrupt controller (which by now is probably virtualized into some do-everything-legacy chip three bridges away from the CPU on the motherboard). This is sort-of millisecond precise. Reading it requires a bus transaction all the way out to the LPC ISA interface, so it's fairly slow as such things go (say, 1-2 microseconds). (boo!) If there's heavy bus traffic, such as running a game which makes good use of AGP and DMA for graphics and audio, this timer will glitch and lose a tick or two every so often; slowly falling behind. (boo!)
  2. the "performance" counter. This is a microsecond-type timer, and is read by the QueryPerformanceCounter() call. It typically advances by a little over 1 million ticks per second, or a little over 3 million ticks per second, depending on bus. It is derived from, I think, some PCI controller counter. Reading it requires, at least, a CPU->northbridge bus transaction, and as such is not super snappy. Call it 1 microsecond. (boo!) If there's heavy bus traffic, on many common chip sets, even very new ones, this timer may suddenly jump between one and four seconds forward in time (boo!). I believe there is some internal race WRT this counter being 64 bits, but kept in two separate 32-bit registers; don't hold me to that, though.
  3. the "time stamp" counter. This is a register internal to the CPU, which counts, using a 64-bit counter, the number of CPU clocks executed since system power-on (more or less). On a four GHz machine, it will take over a hundred and thirty years for this counter to wrap (yay!). Reading this counter stays within the CPU, and as such is really fast (call it 40 cycles) (yay!). Because it really measures CPU clock ticks on all but the most exotic architectures (read: Transmeta :-), this counter is extremely useful for determining exactly how long a piece of code takes to execute, on a micro-benchmarking scale. You still have to compensate for the fact that you may take an interrupt in the middle of measurement, so run the same piece of code 100 times and pick the BEST value -- that's how well your code can perform, which is what you want to measure unless you're trying to model starting with cold caches. (yay!)
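The best-of-100 trick from point 3 can be sketched portably; this is my own minimal illustration (the helper name best_of is mine), using std::chrono::steady_clock as a stand-in for a raw rdtsc read:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Best-of-N micro-benchmark: time the same snippet N times and keep the
// minimum. The minimum approximates the interrupt-free cost of the code,
// since any run that took an interrupt mid-measurement reads longer.
template <typename F>
std::int64_t best_of( int runs, F &&snippet ) {
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for( int i = 0; i < runs; ++i ) {
        auto a = std::chrono::steady_clock::now();   // stand-in for rdtsc()
        snippet();
        auto b = std::chrono::steady_clock::now();
        std::int64_t ns = (std::int64_t)
            std::chrono::duration_cast<std::chrono::nanoseconds>( b - a ).count();
        best = std::min( best, ns );
    }
    return best;
}
```

A call like best_of( 100, []{ /* code under test */ } ) then returns the fastest observed run in nanoseconds.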

Unfortunately, this third model, if you want to turn it into real time, needs the relation between CPU ticks and seconds. This can be had by profiling the CPU on start-up, using some of the other timers available. Such profiling always has a little bit of imprecision (boo!). What's worse, on laptops, the rate of advance on this register will fluctuate wildly, because, you guessed it, the CPU speed fluctuates wildly. (boo!)

Thus, there is no fool-proof way of measuring elapsed time on a PC in a really accurate manner. If you're trying to keep a simulation in sync over a network, say, this may be of high importance to you, so you'll probably, in the end, come up with some system that uses a combination of all three. I use the cycle counter for most measurements, because it's fast and cheap, and then I continually re-calibrate by using the other two counters and having them vote. If they vote differently, I discard that sample and wait until later. This seems to work reasonably well.
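The voting step can be illustrated with a small sketch; the function name and the two-of-three tolerance scheme here are my own reading of the idea, not code from the original:

```cpp
#include <cassert>
#include <cmath>

// Vote between the two reference timers (tick counter and QPC), both
// converted to elapsed seconds over the same interval. If they disagree
// by more than `tol`, one of them glitched (a lost tick, or a QPC forward
// jump), so the sample is discarded and we try again later.
bool vote_elapsed( double tick_s, double qpc_s, double tol, double *accepted ) {
    if( std::fabs( tick_s - qpc_s ) > tol )
        return false;                         // timers disagree; discard sample
    *accepted = ( tick_s + qpc_s ) / 2;       // average the agreeing pair
    return true;
}
```

For example, readings of 1.000 s and 1.002 s agree and are accepted, while 1.0 s against 3.5 s (a QPC jump of the kind described above) is rejected.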

On the CPU-speed-switching machines, you are mostly toast. The best you can do is selectively instantiate another class for your clock, and hope that the effect of jumping timer inaccuracy won't totally kill your application.


I wrote the following answer to a similar question on the GDAlgorithms mailing list:

We use QueryPerformanceCounter()/QueryPerformanceFrequency() to get the initial two measurements, putting a Sleep(50) in the middle, and looping back if QPC says we slept for more than 200 milliseconds. There's a watch-dog, so after 5 tries, it gives up, and accepts whatever numbers we got as true.

Also, the QPC call itself isn't allowed to take more than some number of RDTSC cycles (I think we settled on 100,000) to avoid pre-emption or interrupts during the timing from throwing us off.

As I've said before on this list, we use rdtsc during intra-frame measurements, but once a second or so, we re-calibrate the timers by calling timeGetTime() and QueryPerformanceCounter() to vote about who slipped the least, and update the baseline. We also use these measurements to update the estimate for CPU frequency. We use the highest CPU frequency ever detected, because of SpeedStep; this means that during CPU idle, time using RDTSC will advance more slowly, and then skip to catch up when we next re-calibrate the clock. For us, this is better than sometimes returning time values that are ahead, because the CPU slowed down and we don't know it yet.
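The "highest frequency ever detected" rule above can be sketched as a small recalibration structure; the struct and member names are mine, invented for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the once-a-second recalibration. The frequency
// estimate only ratchets upward, so when SpeedStep slows the CPU, TSC-based
// time advances too slowly and then skips forward at the next recalibration,
// rather than ever running ahead of real time.
struct TscCalib {
    std::int64_t base_tsc = 0;   // TSC value at the last recalibration
    double base_time = 0;        // reference time (seconds) at that point
    double hz = 1;               // highest ticks-per-second seen so far
    void recalibrate( std::int64_t tsc, double ref_time ) {
        if( ref_time > base_time )
            hz = std::max( hz, double( tsc - base_tsc ) / ( ref_time - base_time ) );
        base_tsc = tsc;          // update the baseline either way
        base_time = ref_time;
    }
    double now( std::int64_t tsc ) const {
        return base_time + double( tsc - base_tsc ) / hz;
    }
};
```

If a recalibration interval shows a lower tick rate (the CPU idled), hz is left alone; only a faster observed rate replaces the estimate.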

No, there are no good real-time timers on current laptops. The next generation of chip sets will support the "advanced timer" feature, which will hopefully make this better -- but that was the idea with the PCI timers (QPC), too, and bugs in implementation made them much less useful than they should have been.

Thus, the code looks something like:

 void read_counters( __int64 * otsc, __int64 * opc ) {
   __int64 rdtsc_a, rdtsc_b;
   __int64 pca;
   int cnt = 0;
 again:
   rdtsc_a = rdtsc();
   QueryPerformanceCounter( (LARGE_INTEGER *)&pca );
   rdtsc_b = rdtsc();
   // re-read if the QPC call took too long (pre-emption or an interrupt)
   if( rdtsc_b - rdtsc_a > READING_THRESHOLD && cnt++ < 5 ) goto again;
   *otsc = (rdtsc_a + rdtsc_b)/2;
   *opc = pca;
 }

 __int64 calibrate_rdtsc() {
   __int64 pca, pcb, pcf, tsca, tscb;
   double interval;

   QueryPerformanceFrequency( (LARGE_INTEGER *)&pcf );

   for( int ix = 0; ix < 5; ++ix ) {
     read_counters( &tsca, &pca );
     Sleep( 50 );
     read_counters( &tscb, &pcb );
     interval = double(pcb-pca)/double(pcf);
     // only accept the sample if we didn't oversleep (the watch-dog)
     if( interval < 0.2 ) {
       return __int64( double(tscb-tsca) / interval );
     }
   }
   // five failed tries: give up and accept the last sample as true
   return __int64( double(tscb-tsca) / interval );
 }