USS Clueless - An unbelievable kludge

Stardate 20020814.1823

(On Screen): My favorite vocal Mac advocate, Brian Tiemann, talks about the new systems announced by Apple. The long-awaited speed bump has finally arrived, but as Brian says, the new machines are a decidedly mixed bag, and he's confused by what's happened. I think I can help him out. On his list of bad things is this one:

167 MHz system bus (what is up with that? Dell's top-end systems are up to 533)

It explains everything, actually. Let's do some math, shall we?

1 GHz / 133 MHz FSB == 7.5:1 clock multiplier
1.25 GHz / 167 MHz FSB == 7.5:1 clock multiplier

The "new" 1.25 GHz G4's aren't new. They're 1 GHz G4's which are being overclocked 25%. Apple is selecting G4's which can run that speed, and they've designed their new top-end system around them.
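The arithmetic can be checked with a short sketch. It assumes the nominal "133 MHz" and "167 MHz" buses are really 400/3 and 500/3 MHz, which is my assumption rather than anything Apple has published; the function name is just for illustration.

```python
from fractions import Fraction

# Assumption: the nominal bus speeds are exact thirds, not round numbers.
FSB_OLD_MHZ = Fraction(400, 3)  # marketed as "133 MHz"
FSB_NEW_MHZ = Fraction(500, 3)  # marketed as "167 MHz"


def clock_multiplier(core_mhz, fsb_mhz):
    """Core clock divided by front-side bus clock."""
    return Fraction(core_mhz) / Fraction(fsb_mhz)


old_multiplier = clock_multiplier(1000, FSB_OLD_MHZ)
new_multiplier = clock_multiplier(1250, FSB_NEW_MHZ)

# How much faster the bus (and therefore the whole chip) is being driven.
overclock = FSB_NEW_MHZ / FSB_OLD_MHZ - 1

print(float(old_multiplier), float(new_multiplier), float(overclock))
```

Same multiplier on both lines, and the bus itself is running 25% faster; that's what an overclock looks like.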

But although they determined that enough 7.5x G4's could run at 167 MHz, SDRAM cannot. Its base speed is 133 MHz, and it's possible to buy selected SDRAM which will run at 150 MHz, but there isn't enough of it that will run at 167 MHz. So for these new machines Apple had to switch to a faster RAM technology, DDR-SDRAM.

That, in turn, meant that they had to design a new mobo controller, which had a DDR interface on its backside, because otherwise the new 1.25 GHz machines would be even more starved for RAM bandwidth than the existing ones are. But even if the RAM is capable of doing 266 MHz, the real bandwidth into the CPUs is bottlenecked on the 167 MHz FSB. (I have a sneaking suspicion that they're underclocking the RAM to synchronize it with the FSB.)
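Here's a rough sketch of why the FSB is the choke point. It assumes 64-bit (8-byte) data paths on both the FSB and the memory interface, which is my assumption, and it ignores latency and protocol overhead entirely, so treat the numbers as peak ceilings and nothing more.

```python
# Assumption: both the FSB and the RAM interface are 64 bits (8 bytes) wide.
BUS_WIDTH_BYTES = 8


def peak_mb_per_s(transfers_mhz, width_bytes=BUS_WIDTH_BYTES):
    """Peak transfer rate in MB/s for a bus doing `transfers_mhz`
    million transfers per second."""
    return transfers_mhz * width_bytes


def effective_mb_per_s(fsb_mhz, ram_mhz):
    """Data has to cross both links, so the slower one sets the ceiling."""
    return min(peak_mb_per_s(fsb_mhz), peak_mb_per_s(ram_mhz))


old_system = effective_mb_per_s(fsb_mhz=133, ram_mhz=133)  # SDRAM at 133
new_system = effective_mb_per_s(fsb_mhz=167, ram_mhz=266)  # DDR at 266
ddr_peak = peak_mb_per_s(266)

print(old_system, new_system, ddr_peak)
```

Under those assumptions the DDR could deliver more than the FSB can carry, so the CPUs only ever see the FSB figure.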

A new mobo controller chip isn't something that you conjure up in two weeks, and the fact that they actually designed an entirely new piece of silicon to support this kludge means that Apple has been making its engineering plans based on the assumption of another Moto speed stall.

In other words, Moto has stopped developing the G4, and Apple has known it for a long time, long enough to develop this new bus controller chip. If Moto were continuing to work on faster G4's and expected to release new ones with higher multipliers, Apple wouldn't have bothered doing something like this.

This was a desperation move, a way of wringing one final speed bump out of a terminated processor design.

And it's probably the last speed bump, too. It's hard to believe that they could wring even more speed out of a trick like this, so until such time as they come up with something else entirely, this is the end. Either Moto against expectation actually delivers the G5 (and rumor is that they cancelled it a year ago when they started making major cuts in their semiconductor group) or IBM comes through with its desktop Power4 and Apple releases an entirely new class of machines based on that, which is what I now expect.

But there's absolutely no way to know when IBM will be ready with the new Power4 chip in quantity; it could happen in October or it could be a year from now. Until it happens, Apple is stuck with what they've just released, to compete against PCs which are expected to use processors from AMD and Intel which will continue to increase in speed and drop in price. But we can make a pretty shrewd guess that Apple expects the Power4 to be later rather than sooner. If they expected Power4's in two months they would not have designed these systems. They wouldn't have used such a large part of their engineering on a stopgap if something much better were coming shortly thereafter.

Brian also asks why these machines are so expensive. It's because the number of 7.5x G4's which can actually run this fast is limited, and they need to price them high so that they don't sell so many that they outstrip their supply of parts. The 1.0 GHz G4's have to remain attractive and sell in quantity, because they have to move a lot of slower 7.5x G4's for every fast one they ship.

To an even greater extent than the latest iMac, these systems indicate the truly dire engineering dilemma facing Apple because of Moto's failure to stay competitive with AMD and Intel. The LCD iMac showed that the problem was serious, because Apple was trying to produce a new sexy machine without a dramatic improvement in the computing hardware inside it, and the only way to do that was with extravagant packaging.

These new overclocked machines show that the problem is terminal. Apple would not have done anything like this unless they had no other choice. (And the characterization of this as "rock solid engineering" is more than a bit laughable.)

Update: And you know how I said that Apple keeps announcing solutions to problems that it doesn't have until they're solved? They've done it again. The new machine designs "eliminate data bottlenecks", but the old machines didn't have any data bottleneck, or at least they never admitted that they did. In fact, there's good reason to believe that these new machines are still going to be bottlenecked, because the processors share a single bus to the controller. PC duallies have separate buses for each CPU. Irrespective of how much L3 cache is connected to the controller, or how fast the RAM behind it runs, the data is choked on that FSB which runs 133 MHz for the 1.0GHz duallies, and 167 MHz for the 1.25 GHz duallies. The new "faster" bus merely keeps pace with the degree of choking; it doesn't relieve it. To relieve it, the G4 bus interface would have had to be redesigned, but that would have required Moto to roll the chip design, and it's clear they are not going to be doing that. If Apple was expecting a new G4 with a new FSB architecture, they would never have created these monsters.
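To put a number on "keeps pace with the degree of choking", here's a sketch that works out how many FSB cycles each CPU gets per core clock cycle when two processors share one bus. It assumes the two CPUs contend equally and that the nominal bus speeds are really 400/3 and 500/3 MHz; both are my assumptions, and the function name is just for illustration.

```python
from fractions import Fraction


def bus_cycles_per_core_cycle(fsb_mhz, core_mhz, cpus_sharing=2):
    """FSB cycles available to each CPU per core clock cycle, assuming
    `cpus_sharing` processors split one bus equally."""
    return Fraction(fsb_mhz) / (Fraction(core_mhz) * cpus_sharing)


# Figures as given above: 133 MHz bus for the 1.0 GHz duallies,
# 167 MHz bus for the 1.25 GHz duallies.
old_dual = bus_cycles_per_core_cycle(Fraction(400, 3), 1000)
new_dual = bus_cycles_per_core_cycle(Fraction(500, 3), 1250)

print(old_dual, new_dual, old_dual == new_dual)
```

The two ratios come out identical: each core gets exactly as little bus per clock as it did before.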

Update 20020816: I made a mistake on this, though it's not a serious one. The new dual 1.25 GHz system is indeed a 7.5:1 G4 running 25% overclocked. The new dual 1.0 GHz system is not the same chip running slower; rather, they're using the 6:1 G4 (nominally "800 MHz") and also overclocking it 25%, so that it uses the same mobo with the same 167 MHz FSB. There are probably two reasons for that. The only difference between these two new systems is which version of the CPU Apple plugs in, which will raise volume on the rest of the system and help their manufacturing a bit. And it means they can try to claim that the new dual 1 GHz is faster than the previous dual 1 GHz because of a 25% increase in FSB bandwidth to somewhat relieve the memory bottleneck. First evidence is that the new systems are still badly bottlenecked: for a moderately broad set of benchmarks run by a Mac fan (not by Apple itself), the performance increase with these new machines is negligible when compared to the previous 1 GHz system using SDRAM and a 133 MHz FSB. In fact, in some cases the new systems are actually slower.

The likely reason is that though DDR-SDRAM, which is used by the new systems, has substantially greater throughput than SDRAM as used in the older ones, the new systems are largely wasting it because of the FSB bottleneck. On the other hand, DDR-SDRAM has more latency than SDRAM, which they're eating in full. When the Athlon went to DDR they ran it at full speed so they more than made up in bandwidth what they lost in latency. Apple, however, is incapable of taking advantage of most of the DDR bandwidth, yet takes the full hit from the latency. It appears to be nearly a wash, and the systems are still badly bottlenecked on the pipe between the CPUs and mobo control chip, a problem which L3 cache ultimately can't solve. The bottleneck is the FSB itself, in essence the bus pins on the G4's, and L3 cache is on the wrong side of it and just as constrained as anything else is. So while memory accesses to the L3 cache are faster than to main memory, they can't be any faster than the FSB. A complete redesign of the FSB is what's needed, but that would require Moto to respin the chip, which they have not done, and which I now think they will never do. And since Apple can't unilaterally alter the FSB interface on the CPU's, there is ultimately nothing they can do to really relieve this problem.

Except to claim that they have solved the problem, even though they don't have the ability to do so, and to continue to lie about how fast the new systems are by using rigged benchmarks.

Update 20020819: I have been informed that the L3 cache in Apple duallies is per-CPU, and access to the L3 cache doesn't rely on the shared FSB to the mobo controller. That is definitely a good thing because it will somewhat alleviate the bottleneck of the mobo bus. But this also explains some of