DIVIDED IT FAILS - PENTIUM ARITHMETIC BUG ANGERS USERS
(December 2nd 1994) It is not often that the president of an $8bn (and
rising) company spends the weekend drafting a message to be posted to a
newsgroup. But that is what Andy Grove, head of Intel Corp, was doing
last weekend. Over the past couple of weeks the Usenet newsgroup
comp.sys.intel has been dominated by an angry debate over a bug in
Pentium's floating point unit which causes errors in the occasional
division sum. If you are using a Pentium machine today then it will have
the bug - Intel is now saying that it is sampling fixed chips with its
manufacturer-customers, but that machines with corrected chips are not
likely to appear in the shops until early next year.

What so enraged the Internet-based users was not so much the bug
itself; bugs *do* appear in processors and all processors go through a
constant process of improvement. Rather, it was Intel's apparent
attitude to the problem. The company acknowledged that it had known
about the problem since the summer; however, the perception was that it
didn't
actually let on until Dr. Thomas R. Nicely of Lynchburg College let the
cat out of the bag. Dr Nicely had been doing some heavy duty number
crunching when he realised that the answer to one sum, 1/824633702441,
was accurate to only eight significant figures, rather than fifteen
decimal places. He had noted the problem in June and, having excluded
all other sources of error, reported it to Intel on October 16th. The
matter became public on October 30, when a memo to his colleagues was
re-posted on Compuserve. Other researchers quickly chipped in and it was
discovered that the problem extended across a range of numbers. The
clearest analysis of the problem so far is contained within a Frequently
Asked Question (FAQ) document put together by Mike Carlton of the
University of Southern California Information Sciences Institute.
Currently no-one outside Intel is sure exactly how many division-pairs
will cause errors; however, it is known that at least 1,738 unique cases
result in accuracy less than single precision, and of these 87 cases
produce answers accurate to only around four decimal places.
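
For readers who want to see what "accurate to N figures" means in
practice, here is a minimal sketch of how a single division can be
checked against exact arithmetic. It is not Dr Nicely's program or the
method used in the FAQ; the function names relative_error and
correct_digits are made up for illustration. The idea is simply to
compare the quotient the floating point hardware returns with the exact
rational value, using Python's standard fractions module.

```python
from fractions import Fraction
import math


def relative_error(numerator: int, denominator: int) -> float:
    """Relative error of the hardware quotient against the exact rational value."""
    exact = Fraction(numerator, denominator)
    # The division below is carried out by the machine's floating point unit.
    computed = Fraction(float(numerator) / float(denominator))
    return float(abs(computed - exact) / abs(exact))


def correct_digits(rel_err: float) -> float:
    """Approximate number of correct significant decimal digits."""
    if rel_err == 0.0:
        return math.inf
    return -math.log10(rel_err)


if __name__ == "__main__":
    # The reciprocal quoted in the article.
    err = relative_error(1, 824633702441)
    print("relative error:", err)
    print("correct digits:", correct_digits(err))
```

On a correctly working double precision unit the relative error should
be no larger than about 2**-52, or roughly fifteen to sixteen correct
digits. A result in the region of eight digits for the reciprocal above
would match the behaviour described here.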

Intel's initial public response stoked the flames rather than calming
them: the company set up a fax-back system to brief worried users. The
message described the bug as a "subtle flaw" and estimated that the
average "spreadsheet user" would encounter the problem only once in
every 27,000 years. The idea that Intel wanted to get across was that
the rest of the PC was bound to fall apart before your Pentium processor
produced an incorrect answer. However, users immediately interpreted
this as meaning that around 3 spreadsheet users a day worldwide would be
getting erroneous results from their spreadsheets, with even more
frequent errors for people doing serious scientific work. Most
importantly, anyone doing iterative functions, where a variable is
repeatedly calculated, could see the inaccuracies snowball through their
calculations.
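
The two readings of Intel's figure are the same number seen from
different ends, and the arithmetic is easy to sketch. The code below
assumes, purely for illustration, that errors strike users
independently and at a uniform rate; the function names are
hypothetical, and the size of the user base is derived from the two
figures quoted above rather than taken from Intel or the newsgroup.

```python
DAYS_PER_YEAR = 365.25


def expected_errors_per_day(users: float, years_between_errors: float = 27_000) -> float:
    """Errors per day across all users if each one sees an error once per interval."""
    return users / (years_between_errors * DAYS_PER_YEAR)


def implied_users(errors_per_day: float, years_between_errors: float = 27_000) -> float:
    """User base needed to produce the given number of errors per day."""
    return errors_per_day * years_between_errors * DAYS_PER_YEAR


if __name__ == "__main__":
    users = implied_users(3)
    print("implied spreadsheet users:", round(users))
    print("errors per day at that size:", expected_errors_per_day(users))
```

Under these assumptions, three errors a day corresponds to a user base
of about 29.6 million. One error per user every 27,000 years sounds
negligible, yet spread over that many users it is several wrong answers
somewhere in the world every day.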

But above all, the question raised by the newsgroup was "Why didn't you
tell us as soon as you knew that there was a problem, rather than
keeping us in the dark?" The second question is invariably "Will you
replace my chip" to which the answer seems to be "probably not". Unless
you can show Intel that you are doing high powered mathematics that
needs full double precision figures, Intel is unlikely to oblige. To
date we have only two reported examples to draw on. One Pentium user, an
undergraduate mathematics student, says that he had his request for a
replacement chip turned down, despite the fact that he could be doing
these complex calculations on his PC. The other user, using his computer
for medical analysis ("if you were going under the knife, would you want
to know that the analysis may be wrong?") says that he was put on the
list for a replacement after 10 minutes of discussion with an Intel rep.

Intel now admits that it should have been more open about the bug from
the start. It was, if you'll excuse the gallows humour, a miscalculation
on its part. But, it says, its initial engineering analysis convinced it
that the bug was very unlikely to ever affect users. So, the problem was
noted and forwarded through the usual channels to be fixed in the chip's
mask. To give a feel for how often this happens, the 486 mask has been
through around 30 revisions. The changes to the Pentium weren't rushed
through; the idea was to trickle them into the channel. It is incorrect
to say that Intel did nothing until Dr Nicely dropped his small
bombshell-ette - corrective action was already underway, it says. As a
matter of interest, Nicely is now consulting for Intel, and has signed a
non-disclosure agreement.

The message from Grove apologised for the situation, and revealed just
how problematical it was for the company: "We would like to find all
users of the Pentium processor who are engaged in work involving heavy
duty scientific/floating point calculations and resolve their problem in
the most appropriate fashion including, if necessary, by replacing their
chips with new ones. We don't know how to set precise rules on this, so
we decided to do it thru individual discussions between each of you and
a technically trained Intel person... I would like to ask for your
patience here." By Wednesday the company had received at least 5,000
calls worldwide. The problem is compounded, of course, by the fact that
Intel had been partially targeting Pentium machines as low-end
workstation replacements.

While Intel and users debate how often the error is likely to occur, the
question of how this will affect Intel's business in the short, medium
and long term also remains to be resolved. That depends on how long the
issue remains "news" and so remains in the public's mind. At the
beginning of the week, most financial analysts were saying that the
story was interesting, but suggested no one would remember it in a
week's time. Indeed, an initial 2% slump in Intel's share price last
Friday was followed by a swift recovery on Monday. Then in the middle
of the week analysts at Prudential Securities said they believed that
the technical difficulties with Pentium's FDIV instruction were more
deep-seated than previously thought, and a rumour spread on Wall St that
all the faulty Pentiums would be recalled. Intel denied both suggestions
and its share price stabilised again. However, one of the most
interesting aspects of the story is the Internet's role in all this -
the story fermented in the Internet newsgroups for some time before
bubbling over into the mainstream media. EE Times gets the credit
for first picking up the story on November 7th, though it buried it
somewhat. Since then however, CNN and the Washington Post/Wall St
Journal double-act have done their pieces, and the problem has appeared
in The Economist, which pointed out that some banks track interest rates
with a degree of precision that takes them into the danger zone. Even
Channel 4 News in the UK took a bite at the cherry; not its usual fodder
at all. Meanwhile IBM has announced that it will be replacing faulty
processors for its customers.

Intel's latest admission, that machines with the fixed chips will not
appear until next year, is also guaranteed to keep the story bubbling,
and no doubt the trade mags will keep an eye on the situation, looking
for the first bug-free machine to ship. And of course, things will carry
on bubbling on the Internet: already users are talking about pursuing
Intel or its suppliers through the courts on the grounds of selling
faulty goods; there's nothing like a bit of litigation to keep people
interested.

There is even the possibility that one of the leaner, hungrier x86
processor-clone makers could be tempted into running an advertising
campaign along the "99% Pentium-compatible, trust us, you don't want the
other 5%" lines. Doing so would be risky, positioning the advertiser in
a hostage-to-fortune position; still the US advertising market is a
rough and tumble place and no doubt someone will take a dig at the Intel
Inside campaign, or 'Insel Intide' as The Economist dubbed it.

But perhaps the worst news for Intel is that the jokes have already
started. Every human or marketing disaster is swiftly followed by black
jokes; for a long time in the UK the car maker Skoda became the butt of
jokes about its build quality - "Q. How do you double the value of a
Skoda? A. Fill its tank with gasoline." It took a long time for the
company to shift that image, despite the fact that Volkswagen took over
the company and improved quality beyond recognition. Even today, Skoda
drivers in the UK walk around with a sheepish air. 

The fact that it took less than a week for jokes such as:

Q. How many Pentium engineers does it take to change a lightbulb? 
A. Errr, we're not quite sure, but don't worry, bulbs don't blow very
often.

to begin flying across the Internet suggests that Intel's damage control
has completely failed. The problem is that people no longer really care
that the bug is almost certain not to affect them; Pentium's inability
to count has already become an urban myth and the jokes will continue to
fly, irrespective of calming messages from Andy Grove on the Internet.
(C) PowerPC News -  Free by mailing: add@power.globalnews.com