Stratus: Servers that won’t quit – The 24 year running computer.
Making the rounds this week is the Computerworld story of a Stratus Technologies computer at a parts manufacturer in Michigan. This computer has not had an unscheduled outage in 24 years, which is rather impressive. Originally installed in 1993, it has served well. In 2010 it was recognized as the longest-serving Stratus computer, then at 17 years. Phil Hogan, who installed the computer in 1993 and continues to maintain it to this day, said in 2010, “Around Y2K, we thought it might be time to update the hardware, but we just didn’t get around to it.” In other words, if it’s not broke, don’t fix it.
Stratus computers are designed very similarly to those used in space. The two main differences are: 1) no need for radiation-tolerant designs (let’s face it, if radiation tolerance becomes an issue in Michigan, there are things of greater importance than the server crashing) and 2) hot-swappable components. Nearly everything on a Stratus is hot-swappable. Stratus servers of this type are based on an architecture they refer to as pair and spare. Each logical processor is actually made from 4 physical CPUs, arranged in 2 pairs.
Each pair executes the exact same code in lock-step. CPU check logic checks the results from each CPU in a pair, and if there is a discrepancy (one CPU comes up with a different result than the other), the system immediately disables that pair and uses the remaining pair. Since both pairs are working at the same time, there is no fail-over delay; it’s seamless and instant. The technician can then pull the misbehaving processor rack out and replace it while the system is running. Memory, power supplies, etc. all work in similar fashion.
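The pair-and-spare behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not Stratus's actual design: the class names (`LockstepPair`, `LogicalProcessor`), the `step` and `replace_pair` methods, and the idea of modelling a CPU as a plain function are all assumptions made for the example. Real hardware compares results in dedicated check logic on every cycle rather than in software.

```python
from typing import Callable, List, Optional

# Hypothetical model: a "CPU" is just a function from an input to a result.
Cpu = Callable[[int], int]


class LockstepPair:
    """Two CPUs running the same code; a comparator checks their results."""

    def __init__(self, cpu_a: Cpu, cpu_b: Cpu) -> None:
        self.cpus = (cpu_a, cpu_b)
        self.online = True

    def step(self, x: int) -> Optional[int]:
        """Run both CPUs on x; take the pair offline if they disagree."""
        if not self.online:
            return None
        a = self.cpus[0](x)
        b = self.cpus[1](x)
        if a != b:
            # The comparator cannot tell which CPU is wrong,
            # so the whole pair is disabled.
            self.online = False
            return None
        return a


class LogicalProcessor:
    """One logical processor built from two lock-stepped pairs (4 CPUs)."""

    def __init__(self, pairs: List[LockstepPair]) -> None:
        self.pairs = pairs

    def step(self, x: int) -> int:
        # Both pairs always do the work, so a failure needs no fail-over:
        # the surviving pair's result is already available.
        results = [pair.step(x) for pair in self.pairs]
        good = [r for r in results if r is not None]
        if not good:
            raise RuntimeError("all pairs offline")
        return good[0]

    def replace_pair(self, index: int, pair: LockstepPair) -> None:
        """Model a hot swap: put a fresh pair in while the system runs."""
        self.pairs[index] = pair


def healthy(x: int) -> int:
    return x * 2


def faulty(x: int) -> int:
    return x * 2 + 1  # simulated hardware fault


processor = LogicalProcessor(
    [LockstepPair(healthy, faulty), LockstepPair(healthy, healthy)]
)
result = processor.step(21)  # first pair disagrees and is disabled
processor.replace_pair(0, LockstepPair(healthy, healthy))  # hot swap
```

In this sketch the caller of `processor.step` still gets a correct answer even though one pair failed, and the failed pair can be swapped for a fresh one without stopping anything, which is the property the article describes.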
These systems are typically used where downtime is absolutely unacceptable: banking, credit card processing, and similar operations. The exact server in this case is a Stratus XA/R 10. This was Stratus’s gap filler. Since the company’s creation in the early 1980s, its servers had been based on Motorola 68k processors, but in the late 1980s Stratus decided to move to a RISC architecture and chose HP’s PA-RISC. There was a small problem with this: it wasn’t ready, so Stratus developed the XA line to fill the gap of several years. The first XA/R systems became available in early 1991 and cost from $145,000 to over $1 million.
The XA is based on another RISC processor, the Intel i860XR/XP. Initial systems were based on 32MHz i860XR processors. The i860XR has 4K of I-cache and 8K of D-cache and typically ran at 33MHz. Stratus’s 32MHz rating may reflect the effective speed after the CPU check logic is applied, or Stratus may have downclocked the chip slightly for reliability. Later XA/R systems were based on the second-generation i860XP, which ran at 48MHz, had larger caches (16K/16K), and had some other enhancements as well. These servers continued to be made until the Continuum product line (using Hewlett-Packard’s PA-RISC architecture) was released in March of 1995.
This type of hardware redundancy is largely a thing of the past, at least for commercial systems. Cloud server farms made of hundreds, thousands, and often more computers that are transparent to the user achieve much the same goal, provided one’s connection to the cloud is also redundant. Mainframes and supercomputers are still designed for fault tolerance, but most of it is now handled in software rather than pure hardware.
January 29th, 2017 at 2:42 pm
Stratus the company still exists, but in a significantly changed form. I’m sitting across the street from their new main office at the moment, in Maynard, MA.
January 29th, 2017 at 5:34 pm
“CPU check logic checks the results from each, and if their is a discrepancy”
Oh come on!
January 29th, 2017 at 5:46 pm
Thanks! I don’t proofread them as well as I should.
January 29th, 2017 at 6:46 pm
“lets face it, if radiation tolerance becomes an issue in Michigan, there are things of greater importance then the server crashing”
Said like someone who has zero clue about radiation or its effects on electronics.
January 29th, 2017 at 9:05 pm
Actually no, the statement was made to add a bit of humour to the post. It wasn’t the best place to discuss the differences between radiation hardening and radiation tolerance and the pros/cons of each; however, that would be a good post.
January 29th, 2017 at 9:06 pm
Looks like there is an error on this page: http://www.cpushack.com/chippics/
(Awesome site, by the way!)
January 29th, 2017 at 9:14 pm
Oh interesting, that’s from the old gallery. I am guessing that it’s failed since the PHP version got changed recently.
Have to see if I can fix that
January 29th, 2017 at 9:19 pm
Yup, that was the problem, fixed.
Thanks!
January 30th, 2017 at 12:28 am
Hi Jeff,
“… if radiation tolerance becomes an issue in Michigan, there are things of greater importance then the server crashing and…”
Not to climb on the bandwagon, but it should be “…THAN the server crashing…”
Hope this helps!
Cheers!
January 30th, 2017 at 12:33 am
Not a problem, readers help make the articles better, and I am usually terrible with my then/than (I feel bad for people who have to learn English as a second language lol)
January 30th, 2017 at 3:52 am
“Nearly everything on a Straus is hot-swappable. Straus servers”
Straus?
January 30th, 2017 at 7:56 am
s/Straus/Stratus/
January 30th, 2017 at 10:49 am
It could be like the Ship of Theseus 😉 Does anyone know how much of this server has been hot-swapped/replaced during scheduled downtime?
January 30th, 2017 at 5:49 pm
Little known fact… the very first generation of Bloomberg trading systems, built in collaboration with Merrill Lynch, relied on a Stratus-based communications front end… one of the very first Stratus applications rolled out into production.
January 30th, 2017 at 5:54 pm
That would make sense, certainly an application where downtime would not be acceptable.
February 13th, 2017 at 8:08 am
Great article and great site! Congratulations!
February 27th, 2017 at 4:27 am
OMG, your page is going to ruin my work day, so much stuff to read!
May 22nd, 2021 at 6:38 am
There not their.