Archive for the 'CPU of the Day' Category

August 28th, 2019 ~ by admin

Sushi Tacos and Lasers: Marking Intel Processors

Intel ink stamp used for marking chips in the 1970’s

In 1987 Intel became the first semiconductor manufacturer to use lasers to mark all component parts, including ceramic packages (they still used ink for some but had the capability and eventually rolled out laser marking to most all of their assembly/test locations).  Conventional ink marking for ceramic packages required a post-mark ink cure time and production yields ranged from 96%-98% before rework.  That percentage may be good on a school exam, but in the production environment, having to rework 2-4% of everything off the line is unacceptable.  It costs resources, money and time that do not go to making profit.

Intel A80387-20B SX024 remarked with a laser

With lasers, however, the cure operation was not needed and yields increased to better than 99.95%.  Lasers were so consistent that marking became a zero rework process and overall productivity increased by 25%.  Throughput also increased significantly (less rework, and lasers are faster) and inspection requirements dropped by 95%.  These lasers were originally developed for ceramic packages but were found to work well on plastic packages as well.  They also made remarking significantly easier: old markings could be crossed out with the laser and new markings made.  No stencils, pads or masks were needed; the lasers were programmable and very fast.

Intel continues to use laser marking today (as do most manufacturers).  Intel uses laser marking systems from Rofin-Sinar (now owned by Coherent).  These lasers are typically from the PowerLine E line, which are diode end-pumped Nd:YVO4 (neodymium-doped yttrium vanadate) lasers.  These are basically a high-end, high-power version of the diode lasers used in laser pointers.  Intel went with diode lasers as they were faster and cleaner than CO2 lasers (at the same power levels).

Intel Package marked SUSHI TACO SALAD. Perhaps the technician was getting hungry while trying to dial in the laser settings.

These lasers typically run in the 10-40 Watt range.  Most commonly they are a 532nm laser (green light).  In order to achieve the speeds needed, these marking systems are run in a pulsed mode, 1-200KHz depending on the speed and material being marked.  This allows the laser to run at very high power, for very short pulses.

This of course requires some tuning, essentially simple trial and error to find the right setting for a given material.  Today’s packages are very thin, and marking on the organic substrate (or the silicon die itself) must be done in a way that leaves the markings visible, but does not damage the underlying structure.  These markings are often only a few microns deep on silicon and 25 microns on a package, as deeper than that is the chip’s circuitry.

Motorola PP603 Engineering Sample with ROFIN BAASEL test marking on the die

Rofin offers testing and calibration for some of their bigger customers (such as Intel) where they help develop the settings needed.  This results in a lot of ‘oddly’ marked chips.  Companies will ship packages, dies and whatever else needs to be marked to Rofin along with specifications of the markings (how wide, tall, deep etc.) and the systems/settings are worked out to make it workable on the production line.  Anyone who has used a CO2 desktop laser knows they are not the fastest thing around.  An engraving project’s completion time is measured in minutes.  When marking chips, speed and accuracy are of paramount importance.  Rofin advertises their lasers as such: “Our semiconductor marking solutions achieve marking speeds up to 1600 characters/second. Even at a character height of 0.2 mm and line widths of less than 30 µm they still ensure best readability.”

Package with laser settings engraved

Here we have a test chip package from Intel, marked up by Rofin.  There are tests of the 3D bar code, lot numbers, S-specs and others.  There are also some calibration markings; it’s useful to engrave the settings used for the test as part of the test itself.  In this case we see 25k, 650mms and 23.8A.  These are 3 of the fundamental settings for the laser system.  25k is the pulse rate (25KHz) of the laser, and 650mms is the speed, or feed rate: 650mm per sec (about 2ft/sec).  That’s a relatively slow speed, but it was probably one step in the calibration process.  The 23.8A is the current for the laser, in amps.  It’s a rather high current compared to, say, a continuous wave CO2 laser, which runs currents in the milliamps, but these are pulsed lasers, so that current is only needed for a fraction of a second.
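
As a rough illustration of how the engraved pulse rate and feed rate relate, the sketch below works out how far the beam moves between consecutive pulses.  Only the 25KHz and 650mm/sec figures come from the package; the function names are made up for this example, and real marking systems involve spot size and overlap settings that are not modelled here.

```python
def pulse_spacing_um(feed_mm_per_s: float, pulse_rate_hz: float) -> float:
    """Distance the beam travels between two consecutive pulses, in microns."""
    return feed_mm_per_s / pulse_rate_hz * 1000.0


def pulses_per_mm(feed_mm_per_s: float, pulse_rate_hz: float) -> float:
    """Number of laser pulses landing on each millimetre of travel."""
    return pulse_rate_hz / feed_mm_per_s


# Settings engraved on the test package: 25k (25KHz) and 650mms (650mm/sec).
spacing = pulse_spacing_um(650.0, 25_000.0)
density = pulses_per_mm(650.0, 25_000.0)
print(f"{spacing:.1f} um between pulses, {density:.1f} pulses per mm")
```

At these settings the pulses land about 26 microns apart, which is in the same ballpark as the sub-30 µm line widths Rofin advertises.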

Marking can also be done on the die itself.  Here we see a sample (probably an actual marketing sample given away to customers) of a flip chip die, with ROFIN SINAR markings on it, and even their phone number for their location in Tempe, AZ (only a few miles from several fabs in Chandler, AZ, including Intel and Motorola (now NXP)).

Flip chip marking marketing sample by ROFIN SINAR in Tempe, AZ

As chips become smaller, marking technology continues to evolve with them.  Markings today have become much less about what the consumer sees, and much more about traceability and trackability: being able to follow a device through the supply chain, or trace a defective device back to when/where it was produced.  Marking enhancements also play a great role in combating counterfeiting, helping keep counterfeits out of the supply chain.

There is a lot that goes into designing, making, assembling and even marking a computer chip, and often times things that seem the simplest, such as placing marking on a chip, are anything but simple, and just as important as the fabrication of the die itself.

Posted in:
CPU of the Day

June 1st, 2019 ~ by admin

All Boxed up: Retail Boxed CPU’s

New In Box MOS MCS6502 CPU from 1975 (Michael Steil – pagetable.com)

Today most all processors are permanently installed in their device (soldered in) or were taken from a bulk tray and installed by an OEM such as Dell or HP.  AMD has, at least with their higher end CPU’s, gotten quite creative with the marking on the chip itself, and both AMD and Intel still offer some pretty amazing retail packaging for their enthusiast processors (the i9 in a dodecahedron package is pretty cool).  There was a time when almost all processors were available in retail packaging.  This was the time of physical computer shops, largely bypassed now by the Internet, where the packaging of a processor helped sell it.

I collect such New In Box (NIB) processors, as it is pretty neat to see the branding/marketing that went with the CPU’s of years past, and was reminded of this when I saw perhaps one of the oldest NIB CPU’s I have ever seen on Michael Steil’s pagetable.com blog: an original MOS 6502 processor from 1975 in its original shipping box, as close to NIB as one can get.  MOS’s packaging would make Apple proud with its simplicity and design, keeping everything tidy and the MCS6502 visible as soon as the box is opened (I am happy they didn’t use miserable black foam either, so the CPU is pristine after nearly 45 years).  Even the original invoice is included: $25 for the CPU ($118 in 2019 dollars) and $10 for documentation ($47 in 2019 dollars, nearly half the cost of the CPU).

Cyrix 83D87 386 FPU Bundled with Borland Quattro PRO Spreadsheet software (a big thing back in 1992)

Intel started offering retail boxed CPUs with the 8087 coprocessor.  This was really the first chip designed as a user upgrade to their PC (a new thing back then).  Before this, Intel’s closest thing to a NIB was University Kits or Dev Kits for various chips/processors.  With the introduction of the PC, and the many thousands of beige box clones that followed, people began buying processors and building computers for themselves at a much greater pace than before.  There were many companies making compatible processors at the time, so packaging helped set them apart.  This began with upgrade products; math coprocessors for the 808x, 286 and 386 were the most common (by Intel, AMD, IIT, ULSI, Cyrix and more).  Eventually processors themselves started getting the NIB treatment: Intel made OverDrive processors (still technically an upgrade product) for the 486, followed by actual Pentium CPUs in the retail box.  By the late 1990’s everything from Celerons to Xeon server processors could be had in a retail box.  Buying a retail boxed Xeon for your rackmount server seems like an odd thing to do, but apparently Intel figured it would need to be done.

Quad AMD Opteron 6128s in Retail Box

Other companies such as AMD, Cyrix and VIA made NIB processors, but they are much less common, and in a lot of ways more interesting.  AMD made retail Durons, Athlons, and Opterons, and in one of the most unusual things I have seen for a NIB, an actual 4-pack of Opteron 6128s (pictured).  The Opteron 6128 is an 8-core Magny-Cours server processor introduced in 2010 that cost $266 each at that time.  This NIB set is dated late 2011, so it would probably have been a bit cheaper, but still $800 or so, and the large SWATX motherboards needed to run 4 socket G34 processors require somewhat special cases and PSU’s, but at least you can have a half terabyte of RAM.  Inside the retail box are 4 smaller boxes, each containing an Opteron 6128 CPU, installation instructions, warranty info, and a case badge (you get 4 total case badges).  It seems this packaging was designed to support different configurations (probably a single Opteron 6128, and duals).

April 18th, 2019 ~ by admin

Tiered up for 3D-FPGAs: The Story of the Tier Logic FPGA-ASIC

100K LUT Tier Logic FPGA TL1F100 on the left and TL1A100 ASIC on the right

This is the CPU Shack Museum, but occasionally I find a chip that’s not really a CPU but is of such interest that I keep it, especially if it’s novel and relatively unknown.  So today we have a bit of the story of Tier Logic.  Tier Logic set out to make FPGAs (Field Programmable Gate Arrays) better, and to make the transition (or choice) between them and ASICs (Application Specific Integrated Circuits) easier.

FPGA’s are great for smaller product runs: they are configurable and relatively easy to reprogram, and designs can easily be updated/tested with no additional cost.  FPGA’s, however, are large in terms of die area, power budgets, and cost per chip.  ASIC’s, on the other hand, take longer to develop (re-spinning silicon every time an error is found) and have a much larger upfront cost, as well as an entirely different tool chain to design with.  They are, however, smaller, use less power, and once the design is finalized, the per unit cost is very low.  This presents a dilemma in design: which should one choose for a project?  What if you didn’t have to choose?  What if you could have the flexibility of an FPGA, and the benefits of an ASIC all at once?
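
The FPGA versus ASIC dilemma is, at its simplest, a break-even calculation between a high per-chip cost and a high upfront cost.  The sketch below shows the shape of that trade-off.  Every dollar figure in it is a hypothetical placeholder chosen only for illustration; none of them are Tier Logic’s (or anyone else’s) real costs.

```python
import math


def total_cost(upfront: int, unit_cost: int, units: int) -> int:
    """Upfront (NRE) cost plus per-chip cost for a given production volume."""
    return upfront + unit_cost * units


def break_even_units(fpga_upfront: int, fpga_unit: int,
                     asic_upfront: int, asic_unit: int):
    """Smallest volume at which the ASIC is no more expensive than the FPGA.

    Returns None if the ASIC is not cheaper per unit, in which case it
    never catches up on volume alone.
    """
    if asic_unit >= fpga_unit:
        return None
    extra_upfront = asic_upfront - fpga_upfront
    saving_per_unit = fpga_unit - asic_unit
    return max(0, math.ceil(extra_upfront / saving_per_unit))


# Hypothetical numbers only: a $40 FPGA with $50,000 of development cost
# versus a $5 ASIC with $1,500,000 of development and mask cost.
volume = break_even_units(50_000, 40, 1_500_000, 5)
print(f"ASIC becomes the cheaper option at about {volume} units")
```

Below that volume the FPGA wins on cost, and above it the ASIC does, which is why the choice depends so heavily on the expected size of the product run.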

It is exactly this that Tier Logic set out to do.  Tier Logic was founded by FPGA process-technology pioneer Raminda Madurawe (from Altera) in 2003 and was led by Doug Laird, a founder of Transmeta (famous for the Crusoe VLIW processors).  For 7 years they worked to design a solution, working in what is known as ‘stealth mode.’  Stealth mode is a way for companies to work quietly, with little to no PR, until they have a product ready to release.  Often the company exists but is completely unknown to outsiders.  This has some definite benefits: there is no constant barrage of having to answer/report to the media and others, and there is less risk of someone seeing what you are doing and trying to beat you to market.  Seven years, however, is a very long time to be in stealth mode, and the reason for this is that Tier Logic was not only inventing a new style of FPGA/ASIC, they also had to develop a new silicon process to make it work.

March 31st, 2019 ~ by admin

CPU of the Day: CS603RMP-200 PowerPC 603r Goes Golden

Chip Supply Inc. CS603RMP-200 – 2005 Production Miltemp PowerPC 603r

The original PowerPC 603 was released way back in 1994, made on a 0.5u process and running at 75MHz.  A year later, the greatly improved PowerPC 603e was released, made on the same process, but supporting speeds of up to 200MHz.  It doubled the L1 caches to 16K each (for Instruction and Data) and introduced some Power Down modes useful for mobile and other low power applications.  A die shrink to 0.35u allowed speeds of up to 300MHz.

The 603e was available in both BGA and cerquad packages, which worked for most applications.  But what if you wanted something a bit different?  What if your application needed something a bit more robust?  This is where packaging and die specialist companies come into play.  Motorola/IBM had no desire to make short runs of oddball packages and/or dies screened for higher end use.  Other companies, however, did…

Motorola MPC603ERX100LN – 2000 vintage PowerPC 603e

Chip Supply Inc. was founded back in 1978 in Orlando, FL just for this purpose.  Chip Supply provided die testing and packaging services for many different companies.  They also provided a service known as ‘die banking’ and, just as the name implies, this involves collecting and storing wafers and/or dies for future use.  This helped with end-of-life products especially.  As manufacturers slowed, changed, or stopped production of a device, dies for it could be made available through firms like Chip Supply.

In 1997 Chip Supply Inc. signed an agreement with Motorola giving them access to bare dies and known good dies for the PowerPC 603e, MPC106/7 PCI Bridge, and the MC68000 line.  This allowed Chip Supply to source dies from Motorola and screen them for higher spec (Military and Industrial temp typically).  Motorola had a similar agreement with Thomson-CSF (this line was later acquired by Atmel) who did the same thing, but also made radiation tested parts for space use (notably used on the original Iridium satellite constellation).

16×16 PGA in a 50mm package. Pins are 6mm long (twice as long as a Socket 7 Pentium)

The CS603RMP-200 is a 200MHz PowerPC 603r processor.  The 603r is nearly identical to the 603e, but allows for lower voltages (2.5V) and is made on a 0.29u process.  Chip Supply packaged this in a 16×16 CPGA package that is 50mmx50mm (nearly 2 inches square).  It includes a large, gold plated heatspreader that’s about the same size as a typical BGA PowerPC 603e.  These use original Motorola dies, upscreened to Military temperature (-55C to 125C) and tested to run at 200MHz.  The large heatspreader and ceramic package allow for better thermal management, and better mechanical support.  Thermal cycling and vibrations often result in BGA connection failures (a familiar problem on some game consoles in the early 2000’s), something a properly mounted PGA chip is much more tolerant of.

Chip Supply Inc. was acquired by Micross Components in 2010, a company that formed in 1998 and provided the same services with the addition of radiation testing.  It appears that this was the end of the line for the entire PowerPC line by Chip Supply, though it’s likely that custom orders could be fulfilled for some time after the acquisition.  Someday perhaps we’ll find out what applications the PGA PowerPC 603s were used in.

March 1st, 2019 ~ by admin

CPU of the Day: UTMC UT69R000: The RISC with a Trick

UTMC UT69R000-12WCC 12MHz 16-bit RISC -1992

We have previously covered several MIL-STD-1750A compatible processors as well as the history and design of them.  As a reminder, the 1750A standard is an Instruction Set Architecture, specifying exactly what instructions the processor must support, and how it should process interrupts etc.  It is agnostic, meaning it doesn’t care how that ISA is implemented; a designer can implement the design in CMOS, NMOS, Bipolar, or anything else needed to meet the physical needs, as long as it can process 1750A instructions.

Today we are going to look at the result of that by looking at a processor that ISN’T a 1750A design.  That processor is a 16-bit RISC processor originally made by UTMC (United Technologies Microelectronics Center).  UTMC was based in Colorado Springs, CO, and was originally formed to bring a semiconductor arm to United Technologies, including their acquisition of Mostek, which was later sold to Thomson of France.  After selling Mostek, UTMC focussed on the military/high reliability market, making many ASICs and radhard parts including MIL-STD-1553 bus products and 1750A processors.  The UT69R000 was designed in the late 1980’s for use in military and space applications and is a fairly classic RISC design with 20 16-bit registers, a 32-bit Accumulator, a 64K data space and a 1M address space.  Internally it is built around a 32-bit ALU and can process instructions in 2 clock cycles, resulting in 8MIPS at 16MHz.  The 69R000 is built on a 1.5u twin-well CMOS process that is designed to be radiation hardened (this isn’t your normal PC processor after all).  In 1998 UTMC sold its microelectronics division to Aeroflex, and today it is part of the English company Cobham.
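
The headline numbers for the UT69R000 follow from simple arithmetic, sketched below.  The clock, cycles per instruction and space sizes are the ones quoted above; treating 64K and 1M as binary counts of addressable locations (65,536 and 1,048,576) is an assumption made for this example, as are the function names.

```python
def mips(clock_mhz: float, cycles_per_instruction: float) -> float:
    """Peak instruction rate in millions of instructions per second."""
    return clock_mhz / cycles_per_instruction


def address_bits(locations: int) -> int:
    """Number of address bits needed to select one of `locations` entries."""
    return (locations - 1).bit_length()


print(mips(16.0, 2.0))          # 2 clocks per instruction at 16MHz
print(address_bits(64 * 1024))  # bits needed for a 64K data space
print(address_bits(1024 ** 2))  # bits needed for a 1M address space
```

Under those assumptions the 64K data space fits a 16-bit address, matching the register width, while the 1M address space needs 20 bits.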

UTMC UT1750AR – 1990 RISC based 1750A Emulation

UTMC also made a 1750A processor, known as the UT1750AR, and you might wonder why the ‘R’ is added at the end.  The ‘R’ denotes that this 1750A has a RISC mode available.  If the M1750 pin is tied high, the processor works as a 1750A processor; tied low, it runs in 16-bit RISC mode.  How is this possible?  Because the UT1750AR is a UT69R000 processor internally.  It’s the same die inside the package, and the pinout is almost the same (internally it may be but that’s hard to tell).  In order for the UT1750AR to work as a 1750A it needs an 8Kx16 external ROM.  This ROM (supplied by UTMC) includes translations from 1750A instructions to RISC macro-ops, not unlike how modern day processors handle x86.  The processor receives a 1750A instruction, passes it to the ROM for translation, and then processes the result in its native RISC instructions.  There is of course a performance penalty: processing code this way results in 1750A code execution rates of 0.8MIPS at 16MHz, a 90% performance hit over the native RISC.  For comparison’s sake, the Fairchild F9450 processor, also a 1750A compatible CPU, executes around 1.5MIPS at 20MHz (clock for clock, about 50% faster), and that’s in a power hungry Bipolar process, so the RISC translation isn’t terrible for most uses.
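
To put the translation penalty on a per-clock footing, here is a small sketch using only the MIPS and MHz figures quoted in this post.  The dictionary layout and names are purely illustrative.

```python
# (MIPS, clock in MHz) as quoted in the text.
PARTS = {
    "UT69R000 native RISC": (8.0, 16.0),
    "UT1750AR in 1750A mode": (0.8, 16.0),
    "Fairchild F9450": (1.5, 20.0),
}


def mips_per_mhz(mips: float, mhz: float) -> float:
    """Instructions completed per clock cycle."""
    return mips / mhz


per_clock = {name: mips_per_mhz(*figures) for name, figures in PARTS.items()}

# Fraction of native RISC throughput lost when emulating the 1750A ISA.
emulation_penalty = 1.0 - PARTS["UT1750AR in 1750A mode"][0] / PARTS["UT69R000 native RISC"][0]

# How the bipolar F9450 compares with the emulated 1750A mode, clock for clock.
f9450_ratio = per_clock["Fairchild F9450"] / per_clock["UT1750AR in 1750A mode"]

for name, value in per_clock.items():
    print(f"{name}: {value:.3f} MIPS/MHz")
print(f"emulation penalty: {emulation_penalty:.0%}, F9450 ratio: {f9450_ratio:.2f}x")
```

By this arithmetic the emulated mode gives up 90% of the native rate, and the F9450 does about 1.5 times the work per clock (put the other way, the UT1750AR is about a third slower per clock).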

NASA Aeronomy of Ice in the Mesosphere – Camera powered by RISC

By today’s standards, even of space based processors, the UT69R000 is a bit underpowered, but it still has found wide use in space applications.  Not as a main processor, but as a support processor, usually supporting equipment that needs to be always on, and always ready.  One of the more famous missions the UT69R000 served on was the DAWN asteroid mission (which ended only late last year), where it powered the twin uplink computers.  It was also used on various instrumentation on the now retired Space Shuttles.  The CPU also powered the camera system on the (also retired) Earth Observing-1 Satellite, taking stellar pictures of our planet for 16 years from 2000-2017.  Another user is the NASA AIM satellite that explores clouds at the edge of space; originally designed to last a couple of years, its mission, which started in 2007, is still going.  The cameras providing the pretty pictures are powered by the UT69R000.

JAXA/ESA Hinode SOLAR-B Observatory

A JAXA/ESA mission known as SOLAR-B/Hinode is also still flying and running a Sun observing telescope powered by the little RISC processor.

There are many, many more missions and uses of the UT69R000.  Finding them all is a bit tricky, as rarely does a processor like this get any of the press; that almost always goes to the Command/Data Processor, these days things like the BAE RAD750 and LEON SPARC processors.  But for many things in space, and on Earth, 16 bits is all the RISC you need.

January 24th, 2019 ~ by admin

Intel Everest Goes to Auction

Last summer we wrote about the Intel Everest series of high end CPU’s.  These are processors which Intel makes for very specific customers (in this case High Frequency stock trading).  They often have very little official information about them, and are sold at prices around $20,000 each.  The latest in the series is the Intel Core i9-9990XE, with a max Turbo Frequency of 5.1GHz.  According to Anandtech, these will be auctioned off to the highest bidder.  These chips are a 14-core processor dissipating 255W, so they will require rather good cooling, motherboard and power supply support.  The chips will be auctioned to ‘select OEM’s’ once per quarter throughout 2019.  Intel likely isn’t deliberately making these chips scarce to increase the price; rather, these are very rare speed bins for chips to attain.  Out of thousands of chips tested, only a few will pass screening at this level of performance.  These typically come from the center of a wafer (defects typically increase towards the edge of a wafer).  It will be interesting to see what prices these attain, but then again, we may never know.

December 29th, 2018 ~ by admin

The End is Near (of the year) – A Look Back at Y2K

AMD Y2Kids Career Day – K6-2 Custom Painted 

Think back 19 years, the year is 1999 and in just a few days the world is apparently coming to an end due to programmers of the 60’s and 70’s deciding to save precious memory and use 2-digits for the year instead of 4.  Or perhaps they just assumed that in 30-40 years we really wouldn’t be using the same systems. Either way the world (and by world we mean mainly the media) was prepared to go dark as everything technology driven ground to a halt as the clocks struck midnight.  Kids pondered if this would mean an extended holiday break, while parents wondered if they would still have a job, or money in their computer controlled checking account.
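
For anyone who never saw the bug first hand, here is a minimal sketch of how a 2-digit year goes wrong at the rollover, along with ‘windowing’, one of the common fixes of the era.  The function names and the pivot value of 70 are arbitrary choices for this illustration, not taken from any particular system.

```python
def elapsed_years_2digit(start_yy: int, end_yy: int) -> int:
    """Interval computed the memory-saving way, with years stored as 00-99."""
    return end_yy - start_yy


def expand_year(yy: int, pivot: int = 70) -> int:
    """Windowing fix: 2-digit years below the pivot are read as 20xx,
    the rest as 19xx.  The pivot of 70 is an arbitrary example value."""
    return 2000 + yy if yy < pivot else 1900 + yy


# An account opened in 1985 ('85') and checked on January 1st, 2000 ('00').
broken = elapsed_years_2digit(85, 0)
fixed = expand_year(0) - expand_year(85)
print(f"2-digit arithmetic says {broken} years, windowed years say {fixed}")
```

The 2-digit subtraction reports an account that is minus 85 years old, which is exactly the kind of result people feared would ripple through billing, payroll and banking systems.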

Thankfully (though perhaps looking back that is becoming murky to some) it was a complete non-event; life, and technology, continued at a record pace.  And who would want to miss it?  The GHz war between AMD and Intel was neck and neck at the turn of the millennium, with AMD set to win it by a few days.  This was the age of the Pentium 3, the Athlon and the K6-2.  Technology was glamorous and some of its downsides seen today were relegated to sci fi movies.  AMD and other companies held job fairs to acquire new talent, and also hosted Career Days for younger kids to see what went on in the exciting tech industry.  This specially painted AMD K6-2 CPU was likely handed out during such an event, probably either in Austin, TX (where AMD had a large fab) or Santa Clara, CA.  It’s an NTK made package with an AMD package # 26351, the standard from 1998-2000 and used for most all late K6-2 CPUs.  The child who would have received this, probably a middle schooler at the time, would now be around 30.  Who knows how such an event affected them, but it would be neat if they ended up working at AMD (or Globalfoundries) or at the very least using an AMD powered computer.

September 30th, 2018 ~ by admin

Peavey and the Motorola DSP56000

Motorola XSP56001ZL20 – 20.5MHz 1990

In 1985 Motorola was looking to create a DSP (Digital Signal Processor) line of processors to go with their very popular 68000 series of general purpose processors.  DSP’s are similar to a normal processor but, as their name implies, are designed to work on signals, versus data stored in memory.  Typical signal data is audio, video, RF (such as RADAR information) and anything else that comes in via an ADC.  These signals are processed via algorithms such as FFTs (Fast Fourier Transforms) to manipulate, change or analyse them.  In audio, this can be used for cleaning up an audio stream, adding effects to it, or even generating audio.
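
To give a feel for what ‘processing a signal’ means, the sketch below takes a block of samples and finds which frequency dominates it.  It uses a plain discrete Fourier transform rather than a true FFT (an FFT produces the same result with far fewer operations), and the sample count and test tone are arbitrary values for the example.  It is in no way DSP56000 code.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each frequency bin up to half the sample count.

    This is the naive O(N^2) transform; an FFT computes the same values
    much faster, which is why DSPs are built around it.
    """
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        magnitudes.append(abs(acc))
    return magnitudes


# A pure tone that completes exactly 5 cycles over 64 samples.
N = 64
TONE_BIN = 5
signal = [math.sin(2 * math.pi * TONE_BIN * i / N) for i in range(N)]

mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(f"strongest component is in bin {peak_bin}")
```

Once a signal is in this form, filtering or adding effects becomes a matter of adjusting the bins and transforming back.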

In the 1980’s the main single chip DSP competitors were the still in use TI TMS320 series, the ATT/WE DSP16 series, and some DSP’s from OKI/NEC.  When Motorola began work on what would become the DSP56000 they asked one of their long time customers, Peavey, what they would like to see in a DSP.  Peavey is an audio equipment manufacturer, making such things as guitar amps and keyboards, so they would have a good idea of what would be useful in a DSP designed for audio signals.

These were packaged in a ‘SLAM’ package. The contacts/traces were easily damaged by leaking batteries.

The DSP56000 is a 24-bit processor made on a 1.5u HCMOS process with around 150,000 transistors.  24 bits were selected as that was ideal for audio sampling at the time (and most ADCs/DACs at the time max’d out at 20 bits of resolution anyways).  These DSP’s had a 3-stage pipeline and ran at 20.5MHz, 27MHz and 33MHz.  This provided around 10.25 MIPS of performance (at 20.5MHz).  They were a fixed point (no floating point support in hardware) design, which was adequate at the time.  A total of 62 instructions were provided.
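
Fixed point simply means every sample is an integer with an implied binary point.  The sketch below models a 24-bit signed fraction (1 sign bit, 23 fraction bits) in plain Python to show the idea.  It is a simplified illustration under that assumption: the truncating shift and the helper names are choices made here, and the real chip’s rounding and accumulator behaviour are not modelled.

```python
FRAC_BITS = 23                  # 24-bit word: 1 sign bit + 23 fraction bits
SCALE = 1 << FRAC_BITS
Q_MIN, Q_MAX = -SCALE, SCALE - 1


def saturate(value: int) -> int:
    """Clamp an integer to the range a 24-bit signed word can hold."""
    return max(Q_MIN, min(Q_MAX, value))


def to_fixed(x: float) -> int:
    """Convert a value in [-1.0, 1.0) to a 24-bit fraction."""
    return saturate(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a 24-bit fraction back to a float for display."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fractions; the wide product is shifted back to 23
    fraction bits (truncating) and saturated."""
    return saturate((a * b) >> FRAC_BITS)


half = to_fixed(0.5)
quarter = fixed_mul(half, half)
print(half, quarter, to_float(quarter))

# The quoted 10.25 MIPS at 20.5MHz works out to 2 clocks per instruction.
print(20.5 / 10.25)
```

Scaling an audio sample by a gain is just this multiply, which is why the lack of floating point hardware was not a handicap for audio work.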

The DSP56001 is identical to the DSP56000 except that it has 512×24-bits of on-chip program RAM instead of 3.75K of program ROM, and a 32×24-bit bootstrap ROM for loading the program RAM.  This is the version that became most popular.  Peavey used the 56001 (3 of them actually) to power the DPM3 SE keyboard back in 1990.  Recently J. Acorn, from Crasno Electronics in Canada, sent The CPU Shack Museum an e-mail inquiring if I had a few of these now obsolete 56001 DSPs spare, to rebuild some dead Peavey keyboards.  As a Museum, I not only like to collect and present vintage IC’s but also regularly help people with projects such as this, and have thousands of CPU’s sitting around that have been acquired through the years (really it’s a bit crazy how much I have collected lol).  Mr. Acorn needed 2 of these DSPs to replace ones destroyed by a leaking battery in a keyboard, and two is exactly what I had spare.  I dug them out, packaged them, and off to Canada they went.  The result?  A restored and working Peavey keyboard.  You can read about the restoration process on Crasno’s site.

The 56000 series continued to be made by Motorola (and then Freescale) up until 2012, when it was announced it would be discontinued as a standalone product.  The 56000 series cores live on, though, inside other Freescale (now NXP) products.

August 25th, 2018 ~ by admin

CPU of the Day: FOCUS on 32-bits

1983 HP FOCUS Board set – Pre FPU. Top left: Memory. Top Right: I/O and CPU bottom center

The year is 1981, Intel is making the 8/16-bit 8086/8088, and Motorola has released the 16/32-bit 68000 processor to much fanfare.  Motorola marketed this as the first 32-bit processor, but while it supports 32-bit instructions/data it does so with a 16-bit ALU.  HP had used the MC68000 in their 9000 Series 200 line of computers, providing rather good performance for 1981.  But this was the 1980’s and HP wasn’t satisfied with good; they wanted more.  They wanted to implement a full 32-bit computer on something less than the 5,000 IC’s typically used to implement one at that time.  This meant making a processor like nothing else before, something with more than the 68,000 transistors of the MC68000 or even the 134,000 transistors of the new i286 Intel had announced.  What HP made is simply remarkable: in 1981 they announced the HP 9000 Series 500 computers, powered by an all new fully 32-bit processor called the FOCUS.  FOCUS was made on HP’s high density NMOS-III process, a 1.5u process, and used 450,000 transistors.  That’s 450,000 transistors on a single 40.8mm² piece of 1.5u silicon in 1981, a smaller die than the Intel 286.

August 15th, 2018 ~ by admin

CPU of the Day: The 61 Knights of the Intel Xeon Phi

Xeon Phi – Knights Corner – Engineering Sample

In June of 2013, 20 years after the release of the Intel Pentium Processor, Intel released a new processor, technically a co-processor, that Intel referred to as a MIC (Many Integrated Core).  It was branded as a Xeon, specifically the Xeon Phi 7000 series, but at its core it was nothing like a Xeon of 2013.  Code named Knights Corner, it built on Knights Ferry.  Knights Ferry used many Larrabee GPGPU cores and was not designed as a commercial product.  Knights Corner, however, was, and to make it one Intel stuck with an architecture that customers were very familiar with, x86.  The Knights Corner integrated 61 Pentium P54CS cores onto a single chip.  The original Pentium P54CS was made on a 0.35u process and topped out at 200MHz.  It included 16K of L1 cache on die, and typically 256-512K of L2 cache off chip.  The implementation of the Pentium on the Phi gets a bit of an upgrade.  The cores are made on a 22nm process (16 times smaller) and clocked at up to 1.2GHz.  L1 cache has been increased to 64K per core (32K Instruction + 32K Data).  L2 cache remains at 512K per core, but at 22nm, integrating all 30.5MB of cache on the same die becomes relatively easy.

Knights Corner Die. – 62 Cores – 8 GDDR5 Memory Controllers

The biggest change to the cores is adding support for 64 bit instructions, as well as adding a new execution unit called the VPU.  This VPU (Vector Processing Unit) has its own 512-bit wide SIMD instruction set, integer support, Fused Multiply/Add, and other advanced features that are more commonly found in GPU’s.  The VPU is the result of Intel’s work with Larrabee, the precursor to Knights Corner.  Interestingly, MMX/SSE are not supported by the cores natively; this is handled in software (using virtualization), leveraging the VPU included with the 61 Pentium cores.  With the VPU, each core has 4 execution units (VPU, FXU, and 2 x Integer units).  This allows the cores to support 4-way multi-threading; in practice, 2 threads are most common as 2 execution units are usually tied up calculating memory addresses.
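
The cache and process figures above can be checked with a few lines of arithmetic.  The sketch below uses only numbers quoted in this post; the ‘ideal area’ figure is a naive square of the linear shrink and is an assumption for illustration, since real designs never scale that cleanly.

```python
CORES_IN_USE = 61
L2_PER_CORE_KB = 512
P54CS_PROCESS_NM = 350   # the original 0.35u Pentium process
PHI_PROCESS_NM = 22

# Total on-die L2 across the 61 active cores, in megabytes.
total_l2_mb = CORES_IN_USE * L2_PER_CORE_KB / 1024

# Linear feature-size ratio between the two processes.
linear_shrink = P54CS_PROCESS_NM / PHI_PROCESS_NM

# Naive area ratio if everything scaled with the square of feature size.
ideal_area_shrink = linear_shrink ** 2

print(f"L2 total: {total_l2_mb} MB")
print(f"linear shrink: {linear_shrink:.1f}x, ideal area shrink: {ideal_area_shrink:.0f}x")
```

The linear shrink of roughly 16 times is what makes fitting 61 cores and their caches on one die plausible at all.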

Knights Corner Sample – This is a 1.09GHz part while production versions were bumped to 1.1GHz – Elpida 2Gbit GDDR5 RAM chips surround the core.

For some reason Intel was very vague about information on die sizes/transistor count on the Phi.  Many sources claim a 350mm² die with 5 Billion transistors.  Taking apart a Phi shows that the die is actually much larger.  In fact the Xeon Phi die is 705mm² and has 5.1 Billion transistors.  A 22nm Haswell Xeon with 18 cores has a die area of 622mm² containing 5.6 Billion transistors.  This means the Xeon Phi die wasn’t the most efficient in its use of space, likely due to the amount of room needed for the very large rings used to connect all the cores.  Looking at the die you can also see a lot of unused space.  There are actually 62 cores per die (with only 61 used max).  This means 31MB of L2 cache, which at 6 transistors per cell (bit) accounts for 1.5 Billion of the transistors.  L1 Cache is 64K per core, so another 190 Million transistors there.  That leaves the bulk of the die for the cores, memory controllers, and the 3 interprocessor communication rings that handle communication between cores, MC’s (8 GDDR5 Memory Controllers per die), and the outside world.
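
The transistor budget above can be reproduced with a rough sketch.  It counts only the 6-transistor data cells and ignores tags, ECC and control logic, and it treats K and M as binary (1024-based) units; both are simplifying assumptions for this example.

```python
TRANSISTORS_PER_BIT = 6      # classic 6T SRAM cell
CORES_ON_DIE = 62


def sram_transistors(size_bytes: int) -> int:
    """Transistors in the data array of an SRAM of the given size."""
    return size_bytes * 8 * TRANSISTORS_PER_BIT


def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average transistor density in millions per square millimetre."""
    return transistors / area_mm2 / 1e6


l2_transistors = sram_transistors(31 * 1024 * 1024)           # 62 x 512K
l1_transistors = sram_transistors(CORES_ON_DIE * 64 * 1024)   # 62 x 64K

phi_density = density_millions_per_mm2(5.1e9, 705)
haswell_density = density_millions_per_mm2(5.6e9, 622)

print(f"L2: {l2_transistors / 1e9:.2f} billion, L1: {l1_transistors / 1e6:.0f} million")
print(f"Phi: {phi_density:.1f} M/mm2, Haswell Xeon: {haswell_density:.1f} M/mm2")
```

That gives about 1.56 billion transistors for L2 and about 195 million for L1 (about 190 million if K is taken as 1000), and shows the Phi averaging roughly 7 million transistors per mm² against roughly 9 million for the Haswell Xeon.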

Each Xeon Phi board includes the processor, as well as 6-16GB of GDDR5 Memory (8GB on the Engineering Sample here).  Memory is handled by 32 Elpida EDW2032BBBG-6 2Gbit GDDR5 6 Gbps chips.  This gives the card 352 GB/s of memory bandwidth and 1 TFLOPS of computing performance, all in a PCI-E card that dissipates around 300W.  Card/System management is provided by a NXP LPC2365FBD100 72MHz ARM7TDMI processor.
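
The memory capacity follows directly from the chip count, as the short sketch below shows.  The helper names are made up for this example, and the chip counts it prints for other capacities are hypothetical; they assume 2Gbit parts throughout, which may not be how the other boards were actually built.

```python
def capacity_gb(chip_count: int, gbit_per_chip: int) -> float:
    """Total memory in gigabytes for a number of chips of a given density."""
    return chip_count * gbit_per_chip / 8


def chips_needed(target_gb: int, gbit_per_chip: int) -> int:
    """Chips of a given density required to reach a target capacity."""
    return target_gb * 8 // gbit_per_chip


# The Engineering Sample: 32 chips of 2Gbit each (16 per side of the board).
print(capacity_gb(32, 2))

# Hypothetical: how many 2Gbit chips the 6GB and 16GB ends of the range would take.
print(chips_needed(6, 2), chips_needed(16, 2))
```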

Knights Corner Xeon Phi with cooler removed. 16x 2Gbit GDDR5 (+16 on the back)

In January of 2013 the Texas Advanced Computing Center in Austin, TX announced the Stampede Supercomputer, the first large scale deployment of Xeon Phi Processors.  It used 6880 of them in its 6400 compute nodes and could hit nearly 10PFLOPS of performance. In June of 2013 the Chinese supercomputer Tianhe-2 became the fastest supercomputer in the world, a title it held until the end of 2015.  It was powered by 32,000 Intel Xeon E5-2692 2.2GHz 12C Ivy Bridge processors and a massive 48,000 Xeon Phi co-processors resulting in over 33PFLOPs.

Tianhe 2 Super Computer with 48,000 Knights Corner Processors.

Intel made a successor to Knights Corner, known as Knights Landing, that was based on the Atom core, but then began to wind down the project.   Avinash Sodani, chief architect of the Knights Landing chip took a job at Cavium Networks (who make multicore MIPS networking processors), and Intel then hired Raja Koduri, the chief architect of AMD’s GPU processors.  Intel’s future seems to be one based on Xeon, and GPU’s.

Like the Knights of old, the Xeon Phi has been passed up by other technologies; certainly still useful, but destined for the halls of museums and history books.  It came, it conquered the Top500 Supercomputer list, and then it quietly faded away.  On July 27th Intel quietly announced the discontinuation of the Xeon Phi line, with last orders accepted at the end of this August (2018).