USS Clueless - Shared bus

Stardate 20020821.0911

(Captain's log): It's been amazing over the last week to see all the links to me from various Mac-oriented sites because of my post about the newest PowerMacs. While it's little surprise that some of those have called me an idiot, a surprising number of the people on various discussion systems have been taking my idea seriously.

There was a burst of activity on that post that died down in a couple of days, which is the usual pattern when any of my posts gets a lot of link attention. Then on Monday Macintouch linked to it and my traffic went through the roof, helped along by a whole new set of other sites which linked to me because they'd seen it on Macintouch. It generated as much referral traffic as a link from Reynolds does. Today, Macintouch put my link on the front page again, along with a response from some anonymous person who wrote:

Mr. Den Beste is just that, clueless. Apple is using no "overclocked" chips. I can state unequivocally that the 1.2ghz chips are a new rev and are not overclocked! [...] When the new systems (1.2Ghz) are available his claims will easily be disproven.

It's interesting that he provides no evidence at all. Either he is pretending to be an insider, or else he has manifest faith in the infallibility of Apple and Motorola. Good luck to him.

Starting yesterday I began to receive outraged letters from Mac fans who claimed that I owed them and every other Mac user an apology and a retraction, apparently for no reason other than that I actually had the gall to say something negative. (Not that this is the first time a Mac user has demanded an apology from me.) Sorry, guys; I wrote what I thought, and I haven't changed my mind. If you want to quibble about what the word "overclock" means, be my guest, but that doesn't change the substance of my analysis, which is that these CPUs are using the same clock multipliers as the old ones, and that the purported increase in speed is entirely due to increasing the base clock rate.

In the meantime, testing and analysis are beginning to reveal that these new Macs are horribly bottlenecked on the front side bus (FSB) which connects the two CPUs to the mobo bus controller. The older Macs used an FSB speed of 133 MHz, while the new ones increase that to 167 MHz. That increases the theoretical maximum bandwidth of the FSB from 1 gigabyte per second to 1.33 gigabytes per second. The comparable speed for the latest P4 systems is four times that, at 5.3 gigabytes per second.
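The arithmetic behind those FSB figures can be sketched as follows. The bus width is an assumption: the post gives only clock rates and resulting bandwidths, and those numbers are consistent with a 64-bit (8-byte) bus moving one transfer per clock. The function name and parameters are made up for illustration.

```python
# Peak (theoretical) bus bandwidth = clock rate x bytes per transfer x transfers per clock.
# ASSUMPTION: an 8-byte-wide bus doing one transfer per clock. The post does not
# state the width; it is inferred from the 1 GB/s and 1.33 GB/s figures.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_gb_per_s(clock_mhz, bus_width_bytes=BUS_WIDTH_BYTES,
                            transfers_per_clock=1):
    """Return the theoretical maximum bandwidth in gigabytes per second."""
    bytes_per_second = clock_mhz * 1e6 * bus_width_bytes * transfers_per_clock
    return bytes_per_second / 1e9


old_fsb = peak_bandwidth_gb_per_s(133)  # older Macs, 133 MHz FSB
new_fsb = peak_bandwidth_gb_per_s(167)  # new Macs, 167 MHz FSB

print(f"133 MHz FSB: {old_fsb:.2f} GB/s")
print(f"167 MHz FSB: {new_fsb:.2f} GB/s")
print(f"gain: {new_fsb / old_fsb:.2f}x")
```

Under that assumption the gain is just the ratio of the clocks, about 1.26x, and it has to be shared between both CPUs because they sit on the same bus.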

Preliminary tests have shown that the new machines are essentially the same speed as older ones which used the 133 MHz bus, and at this point it seems that the most likely explanation is that what Apple gained by increasing the FSB bandwidth, they lost by cutting the L3 cache from 2MB to 1MB (mostly, I suspect, to get the cost down because cache RAM is spendy). The result is more or less a wash, overall. (The new machines are going to be somewhat faster at some things and slower at others.)

The FSB architecture can't be changed without a chip redesign, and Apple has probably now done all it can in terms of manipulating the raw clock speed to eke out a bit more bandwidth from it. But the evidence so far is that the main thing the new, faster CPUs do is spend more time waiting for the bus. They seem to do little more computing.

When I first considered the new systems, the one thing that leaped out at me was the fact that the two CPUs share a single FSB to the mobo controller. If they're so damned choked on FSB bandwidth, why not double it? This morning I realized the answer: they can't.

There are two major engineering issues involved in making symmetric multiprocessing work. The first is software: designing the OS scheduler to distribute the jobs between the two CPUs without designating one the system master that tells the other what to do. (That's what the "symmetric" part means.) The other problem is hardware, and it's a bitch.

When either processor writes to memory, that potentially makes the other processor's cache obsolete if it happens to be holding that memory location. So every time either processor writes to any memory location, the other processor has to know so that it can update its on-chip cache if necessary. (I suspect what they do is just cease to mark that address as being cached, so that the next access to it goes to main memory instead to retrieve the new value. Actually updating the cache value would be much too difficult.)

On an SMP system with a shared FSB, each processor watches the bus while the other is using it, and grabs the address from every memory write so that it can prevent anachronisms in its own cache. If Apple had designed its mobo controller to give each processor its own FSB, the two CPUs would no longer have been able to keep their caches synchronized, and the system would fail. (Cache anachronisms would be fatal at the software level; I doubt that the OS would even boot.)
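Here's a toy model of that, offered as a sketch rather than a description of how the G4 or Apple's controller actually works. It assumes write-through caches and the invalidate-on-snoop behavior guessed at above; the `Bus` and `CPU` classes and the `run` function are invented for illustration. It runs the same sequence of reads and writes twice, once with both CPUs on one bus and once with each CPU on its own bus to the same memory.

```python
class Bus:
    """A front side bus connecting one or more CPUs to main memory."""

    def __init__(self, memory):
        self.memory = memory  # dict shared by every bus in the system
        self.cpus = []

    def attach(self, cpu):
        self.cpus.append(cpu)
        cpu.bus = self

    def read(self, addr):
        return self.memory.get(addr, 0)

    def write(self, writer, addr, value):
        self.memory[addr] = value
        # Every other CPU on THIS bus sees the write address go by.
        for cpu in self.cpus:
            if cpu is not writer:
                cpu.snoop_write(addr)


class CPU:
    def __init__(self, name):
        self.name = name
        self.cache = {}
        self.bus = None

    def load(self, addr):
        if addr not in self.cache:  # miss: fetch from main memory
            self.cache[addr] = self.bus.read(addr)
        return self.cache[addr]

    def store(self, addr, value):
        # ASSUMPTION: write-through, so every store appears on the bus.
        self.cache[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr):
        # Stop treating the address as cached; the next load goes to memory.
        self.cache.pop(addr, None)


def run(shared_bus):
    memory = {}
    a, b = CPU("A"), CPU("B")
    if shared_bus:
        bus = Bus(memory)
        bus.attach(a)
        bus.attach(b)
    else:
        Bus(memory).attach(a)  # each CPU gets a private bus
        Bus(memory).attach(b)
    a.store(0x10, 1)
    b.load(0x10)         # B now holds the value 1 in its cache
    a.store(0x10, 2)     # A overwrites the same location
    return b.load(0x10)  # what does B think is there?


print("shared bus:   B reads", run(shared_bus=True))
print("separate bus: B reads", run(shared_bus=False))
```

With the shared bus, B sees A's second write, drops its cached copy, and reads the new value from memory. With separate buses and nothing else changed, B never sees the write and keeps using the old value, which is exactly the kind of anachronism that would be fatal.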

The FSB would have to actually be designed in such a way that the mobo controller could feed each processor's memory writes out the other bus to the other CPU so it could see it happen. That must be what the Athlon duallies do, because they do indeed each have a separate bus to the mobo controller. But that would be a different kind o