AMD SHOWS OFF HOME-GROWN PENTIUM KILLER - WITH RISC CORE (October 21st 1994)
Advanced Micro Devices unveiled the most impressive threat yet to Intel's Pentium this week, in the shape of its K5 processor. The company claims that the new chip will be around 30% faster than Intel's chip at the same clock speed; however, confirmation will be some time coming. Originally the company had hoped to say that the chip had reached 'tape-out stage' in September. However, at the Microprocessor Forum, AMD Director Mike Johnson amended this to "the next couple of days".
Of all the x86 processors announced, AMD's is the most architecturally interesting, diverging radically from the approach taken by Pentium's designers. At its heart, the K5 isn't an Intel-compatible chip at all - it's a RISC processor somewhat akin to AMD's superscalar Am29000 chips. However, the RISC core and its instruction set (called ROPs - pronounced ar-ops) are hidden from the end user. Instead, x86 instructions are converted into these RISC operations, which are then handled by six parallel execution units: one floating point unit, two integer units, two load/store units and a branch unit. It is an approach similar to NexGen's Nx586 (PowerPC News XXX story 2XXX); however, AMD has included some extra technology at the beginning of the translation process, which the company claims will allow up to four x86 instructions to be dispatched concurrently. This is pushing it a bit, since only the very simplest x86 instructions, such as register-to-register adds, will map directly onto a single ROP; most operations take two or three. It's still impressive if it works, though.
The x86 instruction set, like the whole CISC tribe, presents a couple of problems for the designer bent on producing an x86-RISC hybrid. The first big one is that x86 instructions are of variable length, meaning that the processor has to search the instruction byte stream as it comes in from memory or cache, looking for the start of each instruction. AMD uses an innovative approach to overcome this - the x86 instructions are partially decoded as they are pulled into cache. It might be thought that this process would slow the rate at which the processor pulls instructions from memory - but AMD points out that memory accesses are comparatively sluggish anyway, and says that it can hide the time needed to pre-decode within this bigger lag. As instructions are pulled from the cache, the processor translates them into the appropriate ROPs, which are placed four per cycle into a byte-queue ready for dispatch. The queue will always attempt to dispatch four ROPs, irrespective of the instruction boundaries of the original x86 instructions. This leads to the curious situation where the processor can actually be executing, say, 1-and-a-bit Intel instructions in a given cycle.
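The boundary-blind dispatch described above can be illustrated with a small sketch. This is not AMD's design, only a toy model under stated assumptions: the function names, the example ROP counts and the idea of crediting each ROP with an equal fraction of its parent instruction are all hypothetical, chosen just to show how a four-ROP dispatch group can end up holding "1-and-a-bit" x86 instructions.

```python
from fractions import Fraction
from typing import List, Tuple

DISPATCH_WIDTH = 4  # ROPs dispatched per cycle, as stated in the article

# A ROP is modelled as (x86 instruction index, ROP index within that instruction).
Rop = Tuple[int, int]


def dispatch_groups(rop_counts: List[int], width: int = DISPATCH_WIDTH) -> List[List[Rop]]:
    """Pack ROPs into groups of `width`, ignoring x86 instruction boundaries.

    rop_counts[i] is the (hypothetical) number of ROPs that x86
    instruction i translates into.
    """
    queue = [(i, j) for i, n in enumerate(rop_counts) for j in range(n)]
    return [queue[k:k + width] for k in range(0, len(queue), width)]


def x86_instructions_per_group(groups: List[List[Rop]], rop_counts: List[int]) -> List[Fraction]:
    """Credit each ROP with 1/n of its parent instruction and sum per group."""
    return [sum(Fraction(1, rop_counts[i]) for i, _ in group) for group in groups]


if __name__ == "__main__":
    # Hypothetical stream: two 3-ROP instructions followed by two 1-ROP ones.
    counts = [3, 3, 1, 1]
    groups = dispatch_groups(counts)
    for cycle, (group, share) in enumerate(zip(groups, x86_instructions_per_group(groups, counts))):
        print(f"cycle {cycle}: ROPs {group} = {float(share):.2f} x86 instructions")
```

With this made-up stream, the first group holds all of instruction 0 plus one third of instruction 1, and the second group holds the remainder of instruction 1 plus two whole single-ROP instructions.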
The RISC core offers full out-of-order issue and completion, and so it appears that AMD has more or less overcome many of the nasty issues which have previously been thought to hamper parallel execution of variable-length Intel instructions. It looks so simple that the average cynical journalist might start looking for the smoke and mirrors. Still, no-one stood on their chairs and cried "Foul" at, or after, the Forum presentation - and they are the experts.
What does remain to be seen is how the processor behaves with real-life code; if it turns out that users' applications are rich in x86 instructions that map onto three ROPs - well, AMD will kiss its claimed 30% advantage over Pentium good-bye. Not surprisingly, AMD has been doing its own simulation, and reports that typical 16-bit x86 applications have an instruction mix that works out at 1.9 ROPs per instruction. But the even better news, as far as AMD is concerned, is its belief that 32-bit code brings this down to 1.3 ROPs per instruction. In other words, as 32-bit applications and operating systems become more popular, the K5 should begin to do better in Intel benchmarks.
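To see why the ROPs-per-instruction figure matters, here is a back-of-envelope calculation. It assumes, purely for illustration, that the only limit is the four-ROPs-per-cycle dispatch rate; real throughput would also depend on caches, branches and execution-unit contention, none of which is modelled. The instruction mix shown is invented so that it averages 1.9 ROPs, and is not AMD's data.

```python
from typing import Dict

DISPATCH_WIDTH = 4  # ROPs per cycle, as stated in the article


def average_rops(mix: Dict[int, float]) -> float:
    """Weighted average ROPs per x86 instruction.

    `mix` maps a ROP count to the fraction of instructions needing that many ROPs.
    """
    total = sum(mix.values())
    return sum(rops * share for rops, share in mix.items()) / total


def peak_x86_per_cycle(rops_per_instruction: float, width: int = DISPATCH_WIDTH) -> float:
    """Upper bound on x86 instructions dispatched per cycle (dispatch-limited only)."""
    return width / rops_per_instruction


if __name__ == "__main__":
    # Hypothetical 16-bit mix: 40% one-ROP, 30% two-ROP, 30% three-ROP instructions.
    hypothetical_16bit_mix = {1: 0.4, 2: 0.3, 3: 0.3}
    print(f"average ROPs per instruction: {average_rops(hypothetical_16bit_mix):.1f}")

    # AMD's reported averages from the article.
    for label, rops in (("16-bit code", 1.9), ("32-bit code", 1.3)):
        print(f"{label}: at most {peak_x86_per_cycle(rops):.2f} x86 instructions per cycle")
```

On these assumptions the dispatch ceiling works out to roughly 2.1 x86 instructions per cycle at 1.9 ROPs per instruction and roughly 3.1 at 1.3, which is why a shift towards 32-bit code would favour the K5.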
On paper it looks good, although two key things have yet to be seen. Number one is the silicon and number two is the price. Could it be that Intel will actually lose the performance lead in the Intel-compatible market? That certainly wouldn't be a pleasant prospect for the world's number one on the desktop. However, the company is promising systems based on the P6 by the end of next year - and that is rumoured to have a RISC core. What's the betting that internally, Intel's very own Pentium killer turns out looking rather like AMD's and NexGen's?