December 3rd, 2012 ~ by admin

GPU of the Day: SGI GE7 Geometry Engine

SGI Extreme 4GE7MCM 256 MFLOPS 320,000 Gates

At 2.6 oz (75 grams) and 2.25 inches (6 cm) square, the SGI 4GE7MCM is a beast of a graphics chip. More properly called a Geometry Engine, the GE7 was responsible for all the graphics processing in SGI Indigo2 workstations. The Indigo2 Extreme graphics option consisted of a pair of these MCMs (Multi-Chip Modules). Each one contains 4 GE7 Geometry Engines providing 32 MFLOPS of performance each. Each GE7 consists of a custom 80,000-gate array from LSI (for a total of 320,000 gates and 128 MFLOPS per MCM). This performance level was, ironically, better than that of the main system CPU (35 MFLOPS for the 200MHz R4400 option).

Each of the black ‘caps’ on the chip covers a single GE7 engine. A similar design was used for the XZ graphics system, which had only 4 GE7 cores in total. This was implemented either with 2 of the large MCMs containing only 2 GE7s each (same package, however), marked 2GE7MCM, or, later, with a single surface-mount MCM containing 4 GE7 engines. All were manufactured by LSI. In total the Extreme graphics subsystem had no fewer than 31 custom gate arrays from LSI, for a total of over 1.2 million gates. At an average of 2 transistors per gate, that works out to around 2.4 million transistors, a considerable amount for a graphics system in 1993. Today’s graphics chips pack in transistors by the billions (the GeForce GTX 680 has 3.54 billion) and offer performance measured in TFLOPS (3.09 TFLOPS for that same GTX 680). Today’s graphics chips cannot, however, compete with the magnificent looks of the GE7’s giant MCM package.
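
For those keeping score, the numbers above work out as follows; a quick sketch using only the figures quoted in this post:

```python
# Working out the article's figures explicitly; all inputs come from the text.
engines_per_mcm = 4
mflops_per_engine = 32
mcms_in_extreme = 2

print(engines_per_mcm * mflops_per_engine)                    # 128 MFLOPS per MCM
print(engines_per_mcm * mflops_per_engine * mcms_in_extreme)  # 256 MFLOPS for the pair

gates = 1_200_000   # "over 1.2 million gates" across the 31 LSI arrays
print(gates * 2)    # ~2.4 million transistors at 2 transistors per gate
```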

Sources:
Indigo2 Product Guide (PDF)
Indigo2 Technical Report 



Posted in:
GPU

November 23rd, 2012 ~ by admin

The WITCH Rises – Dekatron Computing

Harwell Dekatron

Recently another antique computer was restored to working condition. Originally called the Harwell Dekatron, the WITCH (Wolverhampton Instrument for Teaching Computing from Harwell) was built in 1949-1951. Back in the early days of computing, it often took years to build a computer, rather than the minutes it takes to make an iPhone in today’s factories.

The WITCH was a decimal computer, storing data not in the 0s and 1s of the transistor age, but as actual decimal digits. It originally could store twenty 8-digit numbers (0-99,999,999, roughly 27 binary bits each) but was later upgraded to support up to 40, which was considered more than enough (such short-sighted statements did not end in the ’50s; recall the famous, if likely apocryphal, Bill Gates remark that 640K of RAM was all anyone would need). Data on the WITCH was stored in Dekatron tubes, cold-cathode devices filled with neon (or argon), each holding a single decimal digit as one of ten cathode positions. Sending a pulse to the Dekatron would cause the glowing neon dot (and its associated high voltage) to move from cathode to cathode, thus allowing data to be stored and counted. One side effect of decimal data held as glowing orange dots of volatile storage is that you can literally SEE what is in memory.
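
For the curious, here is a minimal sketch of how a Dekatron register behaves, in modern code. The class names and carry logic are illustrative assumptions, not the WITCH’s actual circuitry:

```python
# A Dekatron holds one decimal digit as the position of a glowing dot
# among ten cathodes; a pulse advances the dot by one position.
class Dekatron:
    def __init__(self):
        self.position = 0  # which of the 10 cathodes currently glows

    def pulse(self):
        """Advance the glow by one cathode; return True on wrap-around (carry)."""
        self.position = (self.position + 1) % 10
        return self.position == 0

class Register:
    """An 8-digit register built from 8 dekatrons, like a WITCH store line."""
    def __init__(self, digits=8):
        self.tubes = [Dekatron() for _ in range(digits)]  # tubes[0] = least significant

    def add_one(self):
        # Pulse the least-significant tube and ripple any carry upward.
        for tube in self.tubes:
            if not tube.pulse():
                break  # no carry, stop rippling

    def value(self):
        # The stored number is directly readable from the glowing positions.
        return int("".join(str(t.position) for t in reversed(self.tubes)))

reg = Register()
for _ in range(1009):
    reg.add_one()
print(reg.value())  # 1009 -- visible at a glance on a real machine
```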

Dekatron in operation (courtesy of tube-tester.com)

The WITCH was mainly used to perform mathematical computations. It was not a fast computer; it took a good 10-15 seconds to perform a multiplication. Many humans with adding machines could actually work the problems faster, but the WITCH never complained of carpal tunnel, nor did it need breaks. The Harwell Dekatron was slow, but it was steady and quite reliable. It could go for days (provided it had enough problems fed to it on paper tape) without error or breakdown, and that is what made it so useful and worth restoring.

Check out the BBC article and video of its operation and listen to the relays click, and see the glowing Neon of computational history.

Posted in:
Museum News

November 18th, 2012 ~ by admin

48 Cores and Beyond – Why more cores?

Intel 48-core Single-chip Cloud Computer

Recently two companies announced 48-core processors. Intel announced they are working on a 48-core processor concept for smartphones and tablets. They believe it will be in use within 10 years, which is an eternity in computer terms. Meanwhile Cavium, maker of MIPS-based networking processors, announced a new 48-core, 64-bit MIPS networking processor. The Octeon III, running at 2.5GHz, is expected to begin shipping soon. Cavium already makes and ships a 32-core MIPS processor. So clearly multi-core processors are not something we need to wait 10 years for.

Tilera, another processor company, is ramping up production of the TILE-Gx family. This processor, running at 1-1.2GHz, supports from 9 to 100 cores (currently they are shipping 36-core versions). NetLogic (now owned by Broadcom) made a 32-core MIPS64 processor, and Azul Systems has been shipping a 54-core processor for several years now. Adapteva is now shipping a custom 64-core processor (the Epiphany-IV), a design expected to scale to many thousands of cores.

Why is this all important?

Tilera multi-core wafer

While the dual- or quad-core processor in your personal computer (or perhaps even a new 10-core Intel Xeon) is practical for most general computing, such processors are not adequate for many jobs. Ramping up clock speed, the GHz wars, was long thought to be the solution to increasing performance in computing. Just making the pipe faster and faster, and reducing the bottlenecks that fed it new instructions (memory, disk, cache, etc.), was the proposed solution to needing more performance. To a point it worked, until a wall was hit: power and thermal requirements. With increasing clock speed processors ran hotter and began drawing immense amounts of current (some processors were pulling well over 100 amps, albeit at low voltage). This was somewhat alleviated by process shrinks, but still, performance per watt was decreasing.
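
The wall comes from how dynamic power scales in CMOS, roughly P ≈ C·V²·f. A rough sketch, with made-up capacitance and voltage figures purely for illustration:

```python
# Rough illustration of the CMOS dynamic-power wall: P ~ C * V^2 * f.
# The capacitance and voltage values below are invented for illustration.

def dynamic_power(c_eff, voltage, freq_hz):
    """Approximate switching power of a CMOS chip, in watts."""
    return c_eff * voltage**2 * freq_hz

C = 1e-9  # effective switched capacitance in farads (illustrative)

p1 = dynamic_power(C, 1.2, 2e9)  # baseline core
p2 = dynamic_power(C, 1.2, 4e9)  # doubling the clock alone doubles power...
p3 = dynamic_power(C, 1.5, 4e9)  # ...but higher clocks usually need higher
                                 # voltage, and power scales with V squared
print(f"{p1:.1f} W -> {p2:.1f} W -> {p3:.1f} W")

# Two cores at the original clock and a *lower* voltage can match the
# doubled-clock core's throughput for far less power:
p4 = 2 * dynamic_power(C, 1.0, 2e9)
print(f"two slower cores: {p4:.1f} W")
```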

Many computing tasks are repetitive: do the exact same thing to each of a set of data, with results that are not interdependent, meaning A does not have to happen before you can do B. You can perform an operation on A, B, and C all at once and then spit out the results. This is typically true of processing network data, video, audio, and many other tasks. Coding and compiling methods had to be updated, allowing programs to run in many ‘threads’ which could be split amongst many cores (either real or virtual) on a processor, but once done, the performance gains were tremendous.
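
As a minimal sketch of what such data-parallel code looks like (the checksumming workload here is an invented stand-in for packet or media processing):

```python
# The same operation applied independently to every element of a data set,
# split across however many cores the machine has.
import hashlib
from concurrent.futures import ProcessPoolExecutor

def process_block(block: bytes) -> str:
    # Each block is independent: no result depends on any other block,
    # so all blocks can be processed simultaneously.
    return hashlib.sha256(block).hexdigest()

if __name__ == "__main__":
    blocks = [bytes([i]) * 1_000_000 for i in range(64)]  # fake "packets"

    # The executor spreads the blocks across all available cores;
    # the code itself never needs to know how many cores there are.
    with ProcessPoolExecutor() as pool:
        digests = list(pool.map(process_block, blocks))

    print(f"processed {len(digests)} blocks in parallel")
```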

Clearspeed CSX700 192 cores @ 250MHz

This allows a processor to have increased performance at a relatively low clock speed. Workloads can also be balanced: a task that does not lend itself to parallelism can be assigned to a single core, while the other cores stay busy with other work.

There are several main benefits to multi-cores:

Increased performance for parallel tasks: This was the original goal; split a single problem into many smaller ones and process them all at once. That is why massively multi-core processors first appeared in the embedded world, handling digital signal processing and networking.

Dynamic Performance: Dynamic clocking of multi-core processors has led to tremendous power savings. Some tasks don’t need all the performance of all the cores, so a modern multi-core processor can dynamically scale the clock speed, and voltage, of each core as needed. If not all cores are needed, the idle ones can be put to sleep, saving power. If a non-parallel task is encountered, a single core can be dedicated to it at an increased clock speed.

Upgradeability: If a system is designed correctly, and the code is written/compiled well, the system does not know, or care, how many cores the processor has. This means that performance can, in general, be upgraded just by swapping the processor for one with more cores. This is common in larger supercomputers and other systems. HP even made a custom Itanium, called the Hondo mx2, that integrated 2 Madison cores on a single Itanium module. This allowed their Superdome servers to be upgraded with nothing more than a processor swap, much cheaper than replacing the entire server.

Not all tasks are easily handled in a parallel fashion, and for this reason clock speed is still important in applications where B cannot happen until A is complete (data dependencies). There are, and will continue to be, systems where this is the focus, such as the IBM zEC12, which runs at a stunning 5.5GHz. However, as power becomes a more and more important aspect of computing, we will continue to see an ever increasing number of cores per chip in many applications. Is there a limit? Many think not, and Intel has made a case for the use of 1000+ core processors.
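
The ceiling imposed by the serial portion of a task is captured by Amdahl’s law. A quick sketch, where the 10% serial fraction is an arbitrary example, not a figure from this article:

```python
# Amdahl's law: if a fraction s of a task is inherently serial, the best
# possible speedup on n cores is 1 / (s + (1 - s) / n).

def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for n in (1, 2, 8, 48, 1000):
    print(f"{n:5d} cores -> {amdahl_speedup(0.10, n):5.2f}x speedup")

# Even with 1000 cores the speedup stalls near 10x -- which is why clock
# speed still matters for serially-dependent work.
```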

Posted in:
Processor News

October 31st, 2012 ~ by admin

Cyrix Joshua Processor – From Peppers to the Bible

Cyrix Joshua Sample

Perhaps one of the most confusing and misreported processor stories is that of the Cyrix Joshua processor, more correctly known as the VIA Cyrix III Joshua. Cyrix began sampling this successor to the MII in 1999, a tumultuous time in Cyrix’s history, as they were in the midst of being sold to VIA by National Semiconductor. The Joshua never made it into full production, being quickly killed off by the Centaur-designed Samuel core. Centaur, the processor division of IDT that produced the WinChip series, was bought by VIA only a month after their purchase of Cyrix.

Adding to the confusion was Cyrix’s bountiful use of code names for its upcoming products, with many seeming to overlap, change, or be redundant. Understanding their naming methodology greatly increases one’s understanding of the products. Cyrix used one code name for the core of a processor, and a separate name for the product that core would be used in. Just as Intel used the P6 core for the PII, Celeron, and Xeon, Cyrix intended each of its cores to be used in several products.

In the late 1990s Cyrix had two new cores under development. The first was the Cayenne, an evolution of the 6x86MX/MII processor. The Cayenne was essentially an MII with a dual-issue (rather than single-issue) FPU, support for 3DNow! instructions, and, perhaps most importantly, a 256K 8-way set-associative on-die L2 cache. It retained the 7-stage pipeline of the MII, the 256-byte scratchpad L0 cache, an almost identical X-Y integer unit, and the same 64K L1 cache. Cyrix had long had industry-leading integer performance but always lagged in FPU performance; the dual-issue FPU was their attempt to help remedy this. However, FPU-intensive benchmarks, such as Quake 3, showed the Cayenne core to be about half as fast as a Celeron of equal rating (500MHz vs. PR500 Cyrix). Business apps, heavy on integer and light on floating point, showed the integer strength of the Cyrix, with a 400MHz Cyrix matching a 500MHz Celeron.

The Cayenne core was slated to be used in at least 3 different products. The first was the MXi, the successor to the MediaGX, and thus highly integrated, including a PCI bus controller, SDRAM controller, MPEG/DVD acceleration, 2D/3D graphics, and audio capabilities. The Jedi was to be a Socket 7 (Super 7, really) compatible processor based on the Cayenne core. It was canceled in 1999 (nothing to do with potential lawsuits from Lucasfilm, as was often rumored). The third use of the Cayenne core was the Gobi, a Socket 370 compatible processor, and it is this version that was widely sampled and benchmarked by hardware review sites, magazines, etc. When VIA purchased Cyrix on June 30, 1999, the Gobi project was allowed to continue; the MXi and other projects were quickly shut down. The Gobi codename did not fit VIA’s core naming scheme, however, so it was renamed.



Posted in:
CPU of the Day

October 26th, 2012 ~ by admin

Paul Allen’s Living Computer Museum Opens To Public In Seattle

Paul Allen, co-founder of Microsoft, has just opened the Living Computer Museum in Seattle. Living, of course, because many of the vintage computers on display are working units. Some very rare systems are on display, including the only working PDP-7 in the entire world (UNIX was created to run on the PDP-7, so it’s a rather famous machine) and other DECs. There are original IBMs, TRS-80s, Novas, and yes, even some Apples. No Apple 1 as of yet; perhaps Paul could pick up the latest one at auction? It should go cheap, as it seems to be lacking an original MOS 6502 CPU.

Posted in:
Museum News

October 16th, 2012 ~ by admin

Renesas: The Auto Bailout of the Semiconductor Industry

In 2003 Renesas Technology was formed as a joint venture between Hitachi and Mitsubishi, combining their semiconductor operations. In 2010 Renesas Electronics was created by the merger of NEC Electronics and Renesas Technology. This created the largest supplier of microcontrollers in the world, combining the product portfolios of NEC, Mitsubishi, and Hitachi. It allowed them to stop competing amongst themselves and compete with Samsung, Infineon, and other suppliers.

Renesas ended up with the following microcontroller families:

  • Hitachi: H8, H8S, H8SX, SuperH
  • Mitsubishi: M16, M32, R32, 720, 740
  • NEC: V850, 78K

In addition, Renesas has developed its own designs, including:

  • RX Series – a replacement for the Hitachi H8SX and Mitsubishi R32C designs
  • RL78 Series – a replacement that combines the NEC 78K and Mitsubishi R8C devices
  • RH850 Series – a successor to the NEC V850 for automotive use
  • R8C Series – a value derivative of the Mitsubishi M16C

Hitachi SH-3

One of the largest markets for these microcontrollers (and associated other parts) is the automotive industry, with today’s vehicles containing, on average, $350 of ICs per car. $350 may not sound like much when a car costs $20,000, but the Average Sale Price (ASP) per component is 33 cents, meaning there are, on average, over 1,000 ICs in a modern car ($350 ÷ $0.33 ≈ 1,060), of which 50-100 are microcontrollers. They do everything from running the stereo to monitoring and adjusting engine parameters. As more features (entertainment, navigation, stability control, etc.) are added, the count goes up.

The market downturn in 2008-2009 hit the automotive industry, and its suppliers, very hard. With very thin profit margins, Renesas felt it acutely. Combined with increasing competition from Samsung, this has driven Renesas into high levels of debt and a distinct lack of profitability.


October 8th, 2012 ~ by admin

Apple A6 vs Rockchip RK3066: 4 Years vs. 6 weeks of design

The introduction of the iPhone 5 was also the introduction of Apple’s first truly original application processor design. The original iPhone, the 3G, and the 3GS all featured designs by Samsung. The iPhone 4 introduced the A4, which was closely based on the Hummingbird Cortex-A8 core developed with Samsung and Intrinsity: again, not a truly Apple design. The iPhone 4S introduced the A5 (also used in the iPad 2, with the A5X variant in the third-generation iPad). The A5 is based on the ARM Cortex-A9 MPCore, a standard ARM design, albeit with many added features; architecturally, the processor is not original, just customized.

ARM provides core designs for use by developers, such as the Cortex-A9, A8, etc. These are complete processor designs that you can drop into your system design as a block; add your own functions, such as a graphics system, audio processing, image handling, or radio control, and you have your processor. This is the way many processor vendors go about things. They do not have to spend the time and effort to design a processor core, just pick one that meets their needs (power budget, speed, die area) and add any peripherals. Many of these peripherals are also licensed as Intellectual Property (IP) blocks, making building a processor in some ways similar to construction with Legos. This is not to say that this is easy, or the wrong way to go about things; it is in fact the only way to get a design to market in a matter of weeks, rather than years, and it allows for a wide product portfolio that can meet many customers’ needs. The blocks are often offered for a specific process, so not only can you purchase a license to a Cortex-A9 MPCore, you can purchase one that is hardware-ready for a TSMC 32nm high-k metal gate process, or a 28nm GlobalFoundries process. This greatly reduces the amount of work needed to make a design work with a chosen process. This is what ARM calls the Processor Foundry Program.



Posted in:
Research

October 5th, 2012 ~ by admin

CPU of the Day: Fairchild F9450 – Commercial Military

Fairchild F9450 – 1985 – 10MHz

In 1980 the United States Air Force published a standard for a 16-bit Instruction Set Architecture (ISA) to meet their needs for computers aboard fighters and other aircraft. This standard, known as MIL-STD-1750A, laid out what the processor needed to be able to do, but not how, or with what, it would be accomplished. This allowed manufacturers to implement the standard in any way they wanted. It could be done in CMOS, bipolar, SOS, GaAs, or even ECL. It was designed (like the Signetics 8X300 and the Ferranti F100) with real-time processing in mind, similar to what we would call a DSP today.

Many companies made 1750A-compatible processors, including Honeywell, Performance Semiconductor (now Pyramid), Bendix (Allied), Fairchild, McDonnell Douglas, and others. The processors ended up finding uses in many things outside the USAF, including many satellites and spacecraft such as the Mars Global Surveyor. The standard was not restricted to military use; in fact, commercializing it was encouraged, as this would increase production volumes, which would help decrease costs for the military.

Fairchild designed the F9450 to meet both the commercial and military markets. Initial availability was in 1985, and the F9450 provided an on-board floating-point unit, something that was an optional second chip in other implementations. Fairchild also made an F9451 MMU (Memory Management Unit) and an F9452 BPU (Block Protection Unit). The 9450 was manufactured in a bipolar process (Fairchild called it I3L, for Isoplanar Integrated Injection Logic). This helped boost speed, as well as greatly increasing reliability, as bipolar is much less susceptible to high radiation levels than CMOS. Bipolar processes also generate heat, lots of it, and to help counter this Fairchild used a somewhat unusual (for a processor) ceramic package made of beryllium oxide (BeO). BeO has a higher thermal conductivity than any other non-metal except diamond, and actually exceeds that of some metals. Normally the ceramic on a CPU package is some form of alumina (Al2O3). Beryllium itself is a carcinogen, so grinding or applying acid to BeO is not recommended. The bottom of the 9450 was made of a different ceramic, as the goal was to get the heat away from the chip and not back into the PCB. 9450s were available in speeds of 10, 15, and 20MHz, and in commercial or military temperature ratings. MIL-STD-883 screening was, of course, available.

By 1996 the 1750A architecture was declared inactive and not recommended for new designs. However, due to its extensive software support, reliability, and familiarity, it enjoys continued use and is still being manufactured by several companies.

September 27th, 2012 ~ by admin

EPROM of the Day: AMD AM27C2048 – Shrinking Dies

AMD AM27C2048-150DC – 3 Dies (Click to view larger)

In the semiconductor industry, process shrinks are highly sought after. They result in smaller dies for the same part, which means more chips per wafer and thus more revenue. There are other benefits (typically speed increases and power decreases, leakage aside), but from a purely economic standpoint, the smaller dies result in more profit.
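
A rough sketch of those economics, using the standard dies-per-wafer approximation; the wafer size and die areas below are illustrative assumptions, not measurements of these EPROMs:

```python
# Why shrinks pay: a common approximation for gross dies per wafer is
#   DPW ~ pi * (d/2)^2 / A  -  pi * d / sqrt(2 * A)
# where d is the wafer diameter and A is the die area; the second term
# accounts for partial dies lost at the wafer edge.
import math

def dies_per_wafer(wafer_mm: float, die_area_mm2: float) -> int:
    r = wafer_mm / 2
    return int(math.pi * r**2 / die_area_mm2
               - math.pi * wafer_mm / math.sqrt(2 * die_area_mm2))

# Successively smaller dies for the same part (illustrative areas only):
for area in (40.0, 32.0, 20.0):
    print(f"{area:4.0f} mm^2 die -> {dies_per_wafer(150, area)} dies per 150 mm wafer")
```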

Rarely do you get to SEE the result of these process changes. UV-EPROMs fortunately have a window, for erasing them with UV light, that also lets the die be seen. Here are three AMD AM27C2048 EPROMs. These are CMOS 2-Mbit EPROMs, pretty common in the 1990s. As you can see, while they are all the same part, the dies are significantly different. While it’s hard to say for sure without a die analysis, we can make some good estimations based on what foundries AMD had at the time these devices were made. The first EPROM is dated late 1993, which likely means a 1 micron process. The second EPROM, dated mid-1997, is a bit smaller, around 20% smaller, which fits with AMD’s 0.8 micron fabs. The last, and latest, EPROM was made in 1998, likely at the joint AMD-Fujitsu (FASL) plant in Japan. This would make it a 0.5 micron device. The plant was transitioning to 0.35 micron at the time, but that capacity was most likely used for the higher-profit Flash memory devices. By 1998 EPROMs were on the decline.

Also of note are the different copyright dates. The first two are copyright 1989, while the third is 1997. It’s hard to know for sure (I do not have the microscopes/tools needed to do die analysis), but it is likely the 1 micron to 0.8 micron move was an optical shrink. Literally, this means the die (and masks) are scaled down to the new, smaller process with no architectural changes, which is simple and inexpensive. Sometimes changes must be made to support a new process, or to make full use of its benefits, so a new layout and masks are made. This is likely the case with the 1997-copyrighted EPROM: the design was altered to work with the new, smaller process, and the change was significant enough to warrant a new copyright.


Posted in:
EPROM of the Day

September 21st, 2012 ~ by admin

CPU of the Day: MicroModule Systems Pentium Gemini

MicroModule Systems GV1-D0-3S-60-120A 120MHz (top side)

MicroModule Systems (MMS) began operations in 1992, following the completion of an agreement to acquire the assets and license rights to the technology of Digital Equipment Corporation’s MCM (Multi-Chip Module) engineering and manufacturing business in Cupertino, California. The MicroModule Systems vision was to lead the next wave of electronic integration technology, the previous waves being discrete components (1950s), integrated circuits (1960s), large-scale integration (1980s), and system-on-a-chip (mid-1990s).

The MMS Gemini was a module that includes the National Semiconductor chipset dies (x2), a P54CSLM Pentium die, tag RAM, and cache RAM (128K x2), as well as an LM75A temperature sensor for thermal management. MMS used Intel D0-revision P54C processors (with the exception of some early C0 dies), a stepping Intel never packaged themselves (it was used solely for the ‘known good die’ program). When Intel discontinued selling fully tested dies, MMS had no way to build the Gemini and later MMX modules, and in 1998 went out of business. The Gemini was used in many mobile and rugged PC applications, such as the Motorola MW520 computer used in many police cars.

MMS also produced MCM modules for ROSS, used to make the HyperSPARC processor, as well as the Intel Pentium Pro 1MB MCM. For a company that was only in existence for 6 years, their impact was tremendous. MMS was not alone in their production of Intel Pentium processor modules…

Fujitsu also made modules using Intel dies.  These were again used in rugged PC applications, laptops, and industrial computers.

Fujitsu MRN-3546 120MHz

Fujitsu made 100, 120, 133MHz, and MMX processors in an MCM-type package where the individual components are bonded/soldered to a ceramic substrate (rather than to a PC board).


Posted in:
CPU of the Day