This is Part 4 of the story of a 6-CPU server from 1997. In this final part we first look briefly at the theory of running a six-CPU SMP system built from processors designed for two- or four-way operation, and then move on to benchmarking and overclocking the system.
For the background of the ALR 6×6 and Pentium Pro processors that form the basis of this project please see:
Previous Parts of the Series
Part 1: Mini-Mainframe at Home – Introduction
Part 2: Mini-Mainframe at Home: Installing a Modern OS
Part 3: Mini-Mainframe at Home: The ALR 6×6 Hardware and BIOS
Architecture and operation of the six-CPU system
Because the server originally shipped with six Pentium Pro “Black” processors, I decided to add, for contrast, six Pentium Pro “Gold” processors running at 200 MHz with 256 KB of L2 cache each. That is exactly four times less cache per processor than the Black chips have, so it will be interesting to see how much cache size matters at this scale: six megabytes in total versus one and a half. But before starting the tests, I will describe how the six processors interact in this system. To get around Intel's limit of four processors per system, ALR engineers, with support from Unisys, proposed an inter-processor scheme based on bus arbitration:
The theory behind this architecture is as simple as it is powerful. Inside new six-way systems are two Tri-6 CPU cards, A and B (Figure 1). Each of these cards is an independent, three-processor-ready SMP bus, complete with all logic, Active CPR processor protection, and auto-recovery technology built on each CPU card. These two Tri-6 CPU cards are then plugged into a 64-bit parity SMP bus. This design keeps the processors closely coupled, just like a parallel bus architecture, without the related heat and design problems. A separate four-way interleaved memory card is attached to the bus, supporting a sustained data bandwidth of 533 MB per second. This bandwidth is ample to support two full PCI buses as well as an EISA bus bridge.
To overcome the logical limitations of the Pentium Pro chip, six-way servers use a unique expanded bus arbitration configuration referred to as Dynamic Orchestration. The best way to understand how this system works is to compare it to a typical four-way SMP architecture. On a four-way system, bus arbitration is implemented in a “round robin” fashion. That is, each processor has equal rights to the bus, and access is handled in an orderly fashion. For example, if all processors needed access to the bus, CPU 0 would gain access first, followed by CPU 1, CPU 2, CPU 3, and then back to CPU 0. If CPU 2 was executing a cycle, and both CPU 3 and CPU 1 requested use of the bus, control would first pass to CPU 3, before cycling back to CPU 1.
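The four-way round-robin rule described above can be sketched in a few lines of Python. This is only an illustrative model, not the actual Pentium Pro bus logic: the function name `next_owner` and the representation of pending requests as a set of CPU IDs are my own assumptions.

```python
def next_owner(current, requests, n_agents=4):
    """Return the ID of the CPU that gets the bus next, or None.

    Hypothetical model: starting with the CPU after `current`, scan the
    IDs in ascending order (wrapping around) and grant the bus to the
    first CPU that is requesting it.
    """
    for step in range(1, n_agents + 1):
        candidate = (current + step) % n_agents
        if candidate in requests:
            return candidate
    return None  # nobody is requesting the bus


# Case 1: all four CPUs want the bus; start so that CPU 0 goes first.
order = []
owner = 3
for _ in range(5):
    owner = next_owner(owner, {0, 1, 2, 3})
    order.append(owner)

# Case 2: CPU 2 owns the bus while CPU 1 and CPU 3 are requesting.
first = next_owner(2, {1, 3})
second = next_owner(first, {1})

print(order, first, second)
```

In this model the first case cycles through CPU 0, 1, 2, 3 and back to 0, and in the second case CPU 3 is served before CPU 1, which matches the two examples in the text.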
For purposes of this four-way arbitration, processors are identified using the two-bit ID code. The six-way solution borrows this convention, with some important modifications. Within each Tri6 CPU card, individual processors are identified using the two-bit ID code. This yields four possible combinations, although only ID codes 0 through 2 are needed. A chip on each Tri6 card handles the arbitration, following the “round robin” scheme found in a four-way system. In this case, however, the fourth processor has been replaced by a sort of “phantom” processor that actually represents the other Tri6 card:
The figure above shows the six-processor arrangement of the ALR Revolution 6×6 server board and its clones. This approach is what made systems with 8, 10, and more processors possible.
While laying out a chessboard from various Pentium Pro models, I thought I would never find a physically larger processor. Even the 32-core AMD Threadripper 2990WX does not look that big next to the Intel Pentium Pro.
However, The CPU Shack sent me this photo. On the left is an engineering sample of the Xeon Gold 6142 for the LGA3647 socket; on the right is another engineering sample, this time an Intel Xeon Phi, also in LGA3647 form. As you can see, history has come full circle, and perhaps future processors will no longer fit in the open palm of a hand. Processors for LGA2066, though, are still nowhere near the size of the Intel Pentium Pro.
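To make the “phantom” processor idea more concrete, here is a toy Python model of the two-level arbitration. The white paper does not spell out the exact hand-over rules, so everything here is an assumption for illustration: the class `Tri6Card`, the choice of ID 3 as the phantom slot, and the rule that the phantom slot wins only when the other card has a pending request.

```python
PHANTOM = 3  # assumed: the unused fourth ID stands in for the other card


class Tri6Card:
    """Hypothetical per-card arbiter: round robin over ID slots 0..3."""

    def __init__(self):
        self.last = PHANTOM  # so that CPU 0 is first in line

    def grant(self, local_requests, other_card_wants_bus):
        """Return a local CPU ID, PHANTOM (hand over), or None (idle)."""
        for step in range(1, 5):
            slot = (self.last + step) % 4
            if slot == PHANTOM:
                if other_card_wants_bus:
                    self.last = PHANTOM
                    return PHANTOM
            elif slot in local_requests:
                self.last = slot
                return slot
        return None


def simulate(requests, n_grants):
    """requests: {"A": set, "B": set} of CPU IDs that always want the bus."""
    cards = {"A": Tri6Card(), "B": Tri6Card()}
    active, other = "A", "B"
    trace = []
    while len(trace) < n_grants:
        slot = cards[active].grant(requests[active], bool(requests[other]))
        if slot is None:
            break  # nobody wants the bus
        if slot == PHANTOM:
            active, other = other, active  # the bus passes to the other card
            continue
        trace.append(f"{active}{slot}")
    return trace


saturated = simulate({"A": {0, 1, 2}, "B": {0, 1, 2}}, 7)
print(saturated)
```

With all six processors requesting, this sketch serves the three CPUs on card A, then the three on card B, then returns to card A, so from the outside the two four-slot arbiters behave like one six-way round robin. The real hardware may order or weight the hand-over differently.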
Overclocking 6 cores together and separately