iMac energy efficiency

hellman wrote on :

I recently bought a 20" 2 GHz G5 iMac and compared its power consumption (using a Brand power meter) with that of my previous 17" 800 MHz G4 iMac. For those interested, the results are below; the bottom line is that I am very pleased with Apple's obvious attention to making these machines use their power effectively.

The 17" iMac 800 used 1.5 watts OFF, 3.5 watts SLEEP, 32 watts with the screen dimmed or black, 47-75 watts WORKING, and about 50 watts on average while working.

The 20" iMac used 0.5 watts OFF, 2.8 watts SLEEP, 90-120 watts booting up, 95 watts during normal processing (e.g., MS Word) with the screen at maximum brightness (82 watts at minimum brightness), 50 watts with the screen asleep and the CPU awake, and 133 watts when taxing the CPU.

It's interesting that I had to compute 2^1000000 (two to the millionth power) with bc in the Terminal application to tax the CPU enough to get a useful reading. I started with 2^10000 and then tried 2^100000, but they were computed so quickly that the power meter didn't register them adequately. Two to the millionth power took 17 seconds. Not bad.

Niels Jørgen Kruse replied on :

Thanks for these numbers.

I tried 2^1000000 in bc on my 1.8GHz iMac G5 running Mac OS X 10.3.9, and it took about 3:20, so Tiger must come with a better bc. The power consumption of the CPU peaked at 11.58V * 2.98A = 34.5W, as measured on the 12V side of the CPU voltage regulator.

To really spike power consumption on a G5, nothing beats the dnetc client running RC5-72, provided that CPU performance is set to maximum rather than automatic, since dnetc runs at a very low priority. With dnetc running, I measured power consumption as high as 11.54V * 3.41A = 39.4W at a CPU temperature of 76.9C. (Power consumption increases with temperature.)

Hardware Monitor (http://www.bresink.com/osx/HardwareMonitor.html) can be used to read the iMac's sensors.

Other fun things to play with:

Running a simple program like

    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval t0, t;
        double last = 0, now;
        int i;

        /* Reference time; the timezone argument is obsolete, so pass NULL. */
        gettimeofday(&t0, NULL);
        for (i = 0; i < 1000000; i++) {
            gettimeofday(&t, NULL);
            /* Seconds elapsed since t0, with microsecond resolution. */
            now = (t.tv_sec - t0.tv_sec)
                 + 1e-06 * (t.tv_usec - t0.tv_usec);
            /* Print the time and the gap since the previous sample;
               smaller deltas mean faster loop iterations. */
            printf("now = %.6g, delta = %.3g.\n", now, now - last);
            last = now;
        }
        return 0;
    }

it is possible to see when the CPU switches from half to full clock speed. In this excerpt, each iteration takes about 5 µs until a single gap of 285 µs, after which the deltas settle at 2-3 µs:

    now = 0.721408, delta = 5e-06.
    now = 0.721413, delta = 5e-06.
    now = 0.721418, delta = 5e-06.
    now = 0.721423, delta = 5e-06.
    now = 0.721428, delta = 5e-06.
    now = 0.721713, delta = 0.000285.
    now = 0.721721, delta = 8e-06.
    now = 0.721723, delta = 2e-06.
    now = 0.721726, delta = 3e-06.
    now = 0.721728, delta = 2e-06.
    now = 0.721731, delta = 3e-06.

hellman replied on :

Niels,

Thanks for the extra info and thoughts. I redid two to the millionth power and it took 17 seconds again. To be safe, here are three lines of the Terminal session showing the request and the first two lines of output. If Panther really takes over 10 times as long, then there must be a difference between the bc programs that come with it and with Tiger, which surprises me. One other possibility is memory. I have 1.5 GB of RAM. If you have only 1 GB, that could account for the difference, since the computation outputs over 300,000 digits.

    2^1000000
    99006562292958982506979236163019032507336242417875673328663961145317\
    09483309486103054614551234648391482431507034583723883510658989416314\

Martin

Niels Jørgen Kruse replied on :

I have 2 GB of RAM here. The version of bc is 1.05. While I monitored memory use, bc never used more than 1 MB of RAM.

For comparison:

    Mnementh:~ njk$ time bc -q <<%
    2^1000000
    quit
    %
    99006562292958982506979236163019032507336242417875673328663961145317\
    09483309486103054614551234648391482431507034583723883510658989416314\
    ...
    57100428209192420228872667749052301871236104888403162747109376

    real    3m9.117s
    user    2m46.320s
    sys     0m0.850s

Paul Russell replied on :

If you really want to crank up the power consumption, then you need to write a chunk of code that exercises both the AltiVec unit and the two FPUs simultaneously. (Most code only ever uses one or the other at any given time.) Unfortunately, as the CPU core temperature rises, instruction throttling kicks in, but you should be able to max out the power consumption for a while.

Paul

Niels Jørgen Kruse replied on :

As it happens, I helped tweak a loop that did just that. It sustains 2 DP FMADDs per clock along with 1 SP SIMD FMADD and 71/72 SIMD permutes (one permute had to be sacrificed for the branch back to loop entry; otherwise there would be a dispatch stall). There is no instruction throttling with this code; that is a figment of your imagination.

Running this loop, I measured a maximum of 11.54V * 3.36A = 38.8W at a CPU temperature of 71.2C.

Paul Russell replied on :

I remember that the G4 had instruction throttling when the core temperature got too high. Looking at the 970 manual, though, it seems that the G5 does not have this?

Is this just running with code and data in L1? If so, I expect you could push the power consumption higher by making sure you also hit L2 and DRAM (without slowing down the processing, though).

Paul

Niels Jørgen Kruse replied on :

I don't see instruction throttling in the 970 manual either. Dropping to half clock frequency serves the same purpose, though. When I wrote my response, I didn't distinguish between flavours of thermal throttling; it was more of a knee-jerk reaction. Thermal throttling is such an easy cop-out when a loop doesn't run as fast as it should.

For the record, the current in the loop measurement I reported earlier was 3.36A, i.e. 11.54V * 3.36A = 38.8W at a CPU temperature of 71.2C.

The loop doesn't touch memory. It was written by a programmer who believed that FP arithmetic was the way to hot code. I only tweaked the instruction order within each dispatch group. He refused to believe that anything could be wrong with his loop, so thermal throttling was the only possible explanation when it achieved only 80% of the expected iterations per second. Getting the loop to 100% was the only way to make him surrender.

I agree that exercising the memory subsystem (like dnetc does) is necessary to get the highest power consumption.