Newsgroups: comp.parallel,comp.sys.super
From: eugene@sally.nas.nasa.gov (Eugene N. Miya)
Reply-To: eugene@george.arc.nasa.gov (Eugene N. Miya)
Subject: [l/m 1/2/98] IBM and Amdahl -- comp.parallel (20/28) FAQ
Organization: NASA Ames Research Center, Moffett Field, CA
Date: 20 Feb 1998 13:03:20 GMT
Message-ID: <6cjuuo$1n6$1@cnn.nas.nasa.gov>

Archive-Name: superpar-faq
Last-modified: 2 Jan 1998

20	IBM and Amdahl (the man, the article)
22	Grand challenges and HPCC
24	Suggested (required) readings
26	Dead computer architecture society
28	Dedications
 2	Introduction and Table of Contents and justification
 4	Comp.parallel news group history
 6	parlib
 8	comp.parallel group dynamics
10	Related news groups, archives and references
12
14
16
18	Supercomputing and Crayisms

Keywords and phrases:
	Watson memo, marketing, COBOL, mythology, aspiring blue boxes, DEC,
	laws: Amdahl, other, challenge, debates, prizes: Karp, Bell, Hillis


Is a supercomputer a mainframe?
===============================
Short answer: yes.

Are all mainframes supercomputers?
==================================
Short answer: no.  Think subsets.

Is a cluster a mainframe?
-------------------------
Maybe not.  (No one ever promised you yes-or-no answers.)

Why do people dump on IBM in this news group?
=============================================
We'll get to the good, positive stuff in a moment, but first the
chronology: IBM is a late entry into the supercomputing market.  Its
lateness, its aloofness, and a certain degree of sales arrogance have
turned the readership of this news group off.  You merely need to assert
IBM's superiority, and you will find out.  Actually, you might get dead
silence, because the long-timers in this group don't care much anymore.
It's historic.  BASICALLY: don't worry about it.

A few people will say that IBM claims credit for inventing everything.
These people had not heard of the IBM VAMP.

Not to completely dump on IBM: IBM does make important peripheral
technology.

	"Some vice presidents of IBM assert that the speed of light goes
	just a little bit faster in Armonk."
				--An IBM Vice President
	[Yes, it's humor.]

Anonymous contribution:
	I was in White Plains, and I heard their biggy---the chemist---say
	"Supercomputing is just a marketing word."

This is, in fact, also the title of a paper.  It's these kinds of
comments which will continue to plague IBM.  The phrase does have a
little truth to it.  It's also derivative of Watson's comments about
computers in general from the late 1940s.  This group keeps a copy of
the text of Watson's memo about the performance of the CDC 6600 [then
the most powerful supercomputer].

%A Ad Emmen
%A Jaap Hollenberg
%T Supercomputer is just an advertisement word, an interview with Enrico Clementi
%J Supercomputer
%C Amsterdam
%D July-September, 1986
%P 24-33
%X Never very technical, but interesting reading.

Don't forget that the "B" stands for "Business" machines, and many
people regard this marketplace as lying outside business machines.  The
reality is that the SP2 and many other IBM architectures aren't IBM 370
clones.

The problem created by a comment like "Supercomputing is just a
marketing word" is that potential customers have a hard time justifying
supercomputer purchases internally.  If you think we are rubbing IBM's
face in the dirt, we aren't.  We are listing a history based on net
discussion.

Does IBM have a supercomputer (currently)?  Depends on your perspective.
See: definition of a supercomputer on the earliest panels.
On the other hand, it could be argued that IBM has snubbed potential
customers.
Touche.

The following stuff is included because 1) it's frequently mentioned and
2) it's frequently quoted out of context, incomplete, etc.  These are
from the published documents:

	MEMORANDUM

					August 28, 1963

	Memorandum To:	Messrs. A. L. Williams
			T. V. Learson
			H. W. Miller, Jr.
			E. R. Piore
			O. M. Scott
			M. B. Smith
			A. K. Watson

	Last week CDC had a press conference during which they officially
	announced their 6600 system.  I understand that in the laboratory
	developing this system there are only 34 people, "including the
	janitor."  Of these, 14 are engineers and 4 are programmers, and
	only one person has a Ph.D., a relatively junior programmer.

	To the outsider, the laboratory appeared to be cost conscious,
	hard working, and highly motivated.

	Contrasting this modest effort with our own vast development
	activities, I fail to understand why we have lost our industry
	leadership by letting someone else offer the world's most
	powerful computer.  At Jenny Lake, I think top priority should be
	given to a discussion as to what we are doing wrong and how we
	should go about changing it immediately.

					T. J. Watson, Jr.

	TJW,Jr:jmc
	cc: Mr. W. B. McWhirter

Reproduced in "A Few Good Men from Univac."

On hearing about this memo:
	"It seems Mr. Watson has answered his own question."
				--Seymour Cray

# http://cip2.e-technik.uni-erlangen.de:8080/hyplan/dl4rcg/texte/quotes.col
# This link now appears to be dead.
# I'll remove this link if a suitable replacement is not found in
# a few months.

What is the significance of this memo?
--------------------------------------
1) As pointed out by the book "A Few Good Men from Univac":
   bureaucratic overhead is lower in a small, intimate organization.
   [I.e., communication requiring complete connectivity is roughly an
   O(n^2) problem.]
2) It takes a little bit of a slap at education (the Ph.D.).
   [The presence of Woz, Jobs, and Gates in some ways makes this less
   impressive (all rich men with no more than two years of college, or
   who finished it only after making their fortune; none building
   supers, of course).]

COBOL?
------
People have on occasion asked for a COBOL compiler in this group.
Comp.lang.cobol is a better place to ask.
COBOL-X: mentioned in the STAR-OS manual.

It is difficult to assess the role of the IBM clones: Amdahl and, in
particular, the major Japanese vendors Fujitsu, NEC, and Hitachi.  Many
exist.  IBM will likely continue to assert itself as a "player" in this
market.

What does IBM imply?
--------------------
32-bit, byte-oriented (the big-endian vs. little-endian debate), CISC
architecture, EBCDIC characters, hexadecimal (base-16) floating point,
IBM channel I/O rates (3.3-4.3 MB/s).  Or PCs: Intel or RS6K processors;
I/O: SCSI bus, Micro Channel.
It is easy to challenge this, but it is even easier to confirm these.
See: definition of a supercomputer on the earliest panels.

Are these "super" features?
---------------------------
Generally no.  It should be noted that some communities can do 32-bit
work (fixed-point; image processing is typical).  The questions then to
ask are: Does 32-bit mode run twice as fast as 64-bit?  What happens
with DOUBLE PRECISION declarations?  What happens with integers?
Good questions.  8^)
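One way to answer the first of those questions on a particular machine
is simply to time the same kernel in both precisions.  Here is a
minimal C sketch; the loop length, repeat count, and AXPY-style kernel
are illustrative choices of ours, not an IBM benchmark:

	/* prec.c -- crude timing of the same kernel in 32-bit and 64-bit
	 * floating point.  Whether single precision really runs ~2x faster
	 * depends entirely on the processor, memory system, and compiler.
	 */
	#include <stdio.h>
	#include <time.h>

	#define N    200000
	#define REPS 50

	static float  xs[N], ys[N];
	static double xd[N], yd[N];

	int main(void)
	{
	    clock_t t0;
	    double  t32, t64;
	    int     i, r;

	    for (i = 0; i < N; i++) {
	        xs[i] = 1.0f;  ys[i] = 2.0f;
	        xd[i] = 1.0;   yd[i] = 2.0;
	    }

	    t0 = clock();                        /* 32-bit AXPY */
	    for (r = 0; r < REPS; r++)
	        for (i = 0; i < N; i++)
	            ys[i] += 3.0f * xs[i];
	    t32 = (double) (clock() - t0) / CLOCKS_PER_SEC;

	    t0 = clock();                        /* 64-bit AXPY */
	    for (r = 0; r < REPS; r++)
	        for (i = 0; i < N; i++)
	            yd[i] += 3.0 * xd[i];
	    t64 = (double) (clock() - t0) / CLOCKS_PER_SEC;

	    printf("checksum %g\n", ys[N-1] + yd[N-1]);  /* keep loops live */
	    printf("32-bit: %.3f s   64-bit: %.3f s\n", t32, t64);
	    return 0;
	}

The same kind of experiment, with INTEGER or Fortran DOUBLE PRECISION
declarations, addresses the other two questions.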
IBM Positives
-------------
The SP/2:
	http://ibm.tc.cornell.edu/ibm/pps/doc/
	http://lscftp.kgn.ibm.com/pps/vibm/index.html
	http://www.tc.cornell.edu/~jeg/config.html

%A J. M. Kuzela
%T IBM POWERparallel System - SP2 Performance Measurements
%R manual
%I Power Parallel Systems, IBM
%D October 1994

Interesting research (not business) machines:
	GF-11 (11 GFLOPS)
	TF-1 (proposed, 1 TFLOPS sustained [3-5 peak])
	RP-3 (Research Parallel Processor Project)
	Vulcan
	lCAP
	....
	VAMP

Production machines:
	3090 Vector Facility

What's the Mythical-MIPS/MegaFLOPS?
-----------------------------------
It's a corollary to the law/book The Mythical Man-Month by Fred Brooks
(one of the architects of the IBM OS/360 operating system):
	If a woman can have a baby in 9 months,
	then clearly 9 women can have a baby in one month.
Generalize this to parallel computation.
See also Amdahl's Law (Amdahl also worked on the 360).
------------
The second edition of The Mythical Man-Month was just released.

Why should the FLOPS section be mentioned under IBM?
----------------------------------------------------
The metric is generally associated first with IBM machines.
Also see the above note.

What's a TFLOP?  TeraFLOPS?
---------------------------

What comes after Tera?
======================
	prefix	symbol	multiplier
	--------------------------
	tera	T	10^12
	peta	P	10^15
	exa	E	10^18
	zetta	Z	10^21
	yotta	Y	10^24

What's a FLOP?
--------------
FLOating Point operation.  FLOPS: FLOPs per second.

	"The greatest mistake made by computer science was to invent
	floating point..."
				--Robert Holzman

"When does a FLOP become a success?"  Burton's paper.

Several definitions of floating point equivalence exist.  Perhaps the
most notable is Don Knuth's "Empirical Study of Fortran Programs" in the
first volume of Software: Practice and Experience.  This was not a
benchmarking study per se.  Another equivalence is presented by Dave
Bailey and John Barton in the original NAS Kernel benchmark technical
report.  They were not aware of Knuth at the time.  Without doubt there
are other equivalences.

%A Thomas Sterling
%A Paul Messina
%A Paul H. Smith
%T Enabling Technologies for Petaflops Computing
%S Science and Engineering Computation Series
%I MIT Press
%C Cambridge, MA
%D 1995
%i Caltech Concurrent Supercomputing Facilities, Pasadena, CA
%r CCSF-45
%d July 1994
%K book, text, supercomputers, parallel processing,
%O ISBN 0262691760
%X Workshop on Enabling Technologies for Peta(FL)OPS Computing (Feb. 1994).
Dreaming?
%X Perhaps the most interesting assertion is that PetaByte memories
aren't needed.  Only time will tell on this one.  Not optically
optimistic.

PETAFLOPS, petaflops reference:
	http://cesdis.gsfc.nasa.gov/petaflops/archive/other/collection.html

We are diverging.

What's this I hear about "Amdahl's law/rule"?
=============================================
See the end.  The text of the entire article is appended as a reference.

When are we going to have automatic parallelization?
====================================================
Right after we have automatic programming as conceived in the
1950s/1960s.

What's the Karp Prize?  [Karp Challenge]
========================================
Alan Karp, then at the IBM Palo Alto Scientific Center (different from
the Research Centers), now at H-P Labs, wrote a CACM Letter to the
Editor challenging the MIMD community to show a 200 times speedup on a
real-world application (test programs and "embarrassingly parallel"
programs did not count).
%A Alan Karp
%T Letter to the Editor
%J Communications of the ACM, Forum
%V 29
%N 2
%D February 1986
%P 87

%A Alan Karp
%T Letter to the Editor, Follow-up, CACM Forum
%J Communications of the ACM
%V 30
%N 1
%D January 1987
%P 7

Improved text from Alan (net):

  I have just returned from the Second SIAM Conference on Parallel
  Processing for Scientific Computing in Norfolk, Virginia.  There I
  heard about 1,000 processor systems, 4,000 processor systems, and even
  a proposed 1,000,000 processor system.  Since I wonder if such systems
  are the best way to do general purpose, scientific computing, I am
  making the following offer.

  I will pay $100 to the first person to demonstrate a speed-up of at
  least 200 on a general purpose, MIMD computer used for scientific
  computing.  This offer will be withdrawn at 11:59 PM on 31 December
  1995.

  Some definitions are in order.

  Speed-up: The time taken to run an application on one processor using
  the best sequential algorithm divided by the time to run the same
  application on N processors using the best parallel algorithm.

  General purpose: The parallel system should be able to run a wide
  variety of applications.  For the purposes of this test, the machine
  must run 3 different programs that demonstrate its ability to solve
  different problems.  I suggest a large, dynamic structures
  calculation, a trans-sonic fluid flow past a complex barrier, and a
  large problem in econometric modelling.  These are only suggestions; I
  will be quite flexible in the choice of the problems.  Several people
  have volunteered to adjudicate disputes in the selection of the
  applications.

  Application: The problems run for this test must be complete
  applications, no computational kernels allowed.  They must contain all
  input, data transfers from host to parallel processors, and all
  output.  The problems chosen should be the kind of job that a working
  scientist or engineer would submit as a batch job to a large
  supercomputer.  In addition, I am arbitrarily disqualifying all
  problems that Cleve Moler calls "embarrassingly parallel".  These
  include signal processing with multiple independent data streams, the
  computation of the Mandelbrot set, etc.

  There are some rather obvious ground rules.  Simulations of the
  hardware are not permitted.  I am looking for a demonstration on a
  running piece of hardware.  The same problem should be run on the
  sequential and parallel processors.  It is not fair to cripple the
  sequential processor.  For example, if your operating system uses 99%
  of one processor, the single processor run will spend all its time in
  the operating system.  Super-linear speed-up as the number of
  processors is increased is evidence of this problem.  It is not fair
  to memory-starve the single processor run.  If you have 100K words of
  memory on each of 1,000 processors, and you run a 10 MW problem, it is
  not fair for the sequential run to be made on a 100 KW processor.
  After all, we are not interested in seeing the impact of additional
  memory; we want to see how much the extra CPUs help.

  It may not be possible to follow all these additional rules.  For
  example, you can't be expected to build a single processor with lots
  of extra memory or write a new operating system just for this test.
  However, you should do your best to make the demonstration fair.  A
  third party has agreed to adjudicate any scaling disputes.

  Anyone claiming this prize will be expected to present his or her
  results at a suitable conference.
  At the end of the presentation I will have the audience vote on
  whether the demonstration was successful.  Remember, the purpose of
  the bet is to force developers to demonstrate the utility of massively
  parallel systems, not to show they can find holes in the rules.

	Alan Karp
	na.karp@su-score.arpa

The Prize was won by the team of Gustafson and Montry (see refs below).

It wasn't a lot of money.
-------------------------
It was never intended to be.  As noted: this news group and CACM have
very academic biases.  Several "prizes" have been proposed in and out of
computing which were more for the "challenge": human-powered flight
(various degrees), cracking public key encryption [D-H was claimed, but
RSA still holds], etc.  Karp's prize was just another exercise of this
type.  You didn't have to apply.

Who is Alan Karp?
-----------------
Alan was trained as an astronomer and basically works as a computational
numerical physicist.

References (the few main ones) to the winners:

%A John L. Gustafson
%A Gary R. Montry
%A Robert E. Benner
%Z Sandia National Labs.
%T Development of Parallel Methods for a 1024-Processor Hypercube
%J SIAM Journal on Scientific and Statistical Computing
%V 9
%N 4
%D July 1988
%K fluid dynamics, hypercubes, MIMD machines, multiprocessor performance,
parallel computing, structural analysis, supercomputing, wave mechanics,
%K grequired91,
%K jlh, dp, hds, dar,
%X Introduces concept of operation efficiency, scaled speed-up.
Also covers communication cost, beam strain analysis, and a bit on
benchmarking.  Winner of 1988 Bell and Karp Prizes.
%X (JLH & DP) This paper reports interesting results in using a
large-scale NCUBE.  The authors won the Gordon Bell prize with their
work.  They also suggest the idea of problem scaling to overcome the
limitations of sequential portions of an application.
%X (DAR) Some application flavor mixed with performance analysis.

%A J. Dongarra
%A A. Karp
%A K. Kennedy
%T First Gordon Bell Awards Winners Achieve Speedup of 400
%J IEEE Software
%V 5
%N 3
%D May 1988
%P 108-112
%X Award for achieving >= 200x computing speedup, Sandia [Montry,
Gustafson].  An announcement and news article rather than a technical
paper.

%A J. Voelcker
%T Sandia Trio Wins Awards at Compcon
%J The Institute
%I IEEE
%V 12
%N 5
%D May 1988
%P 8
%K Karp/Bell parallel award for achieving >= 200x computing speedup
%X Winners got near-linear speedup: 1020 speedup on 1024 processors in
the best case.  They did it on an NCUBE 10-D hypercube with 1024
processors.  They developed mathematical methods and algorithms to
minimize communication between processors, and to overlap communication
and computation.  Also used a logarithmic system-level algorithm to load
a program into the hypercube in 11 steps, rather than inputting the same
program serially into each of the 2^10 processors.  The problems were
wave motion through deflectors, vortex motion in a fluid, and strain
analysis of a cantilevered beam.
%X The above refer entry does not even mention the names of the people
who did the work.  Search for Gary Montry or John Gustafson.
The above is just an article, not a paper.  See the Compcon proceedings.

Alt:
%A R. E. Benner
%A J. L. Gustafson
%A R. E. Montry
%T Development and analysis of scientific application programs on a
1024-processor hypercube
%R SAND 88-0317
%I Sandia National Laboratories
%C Albuquerque, N.M.
%D February 1988

%A John L. Gustafson
%A Gary R. Montry
%T Programming and Performance on a Cube-Connected Architecture
%J Compcon '88
%I IEEE
%C San Francisco, CA.
%D February-March, 1988
%P 97-100
%K NCUBE hypercubes, Ensemble Paradigm, Language, Debugging,
Communications, Load Balance, Alan Karp Prize, Gordon Bell Prize,

What's the Bell Prize?
======================
Following Karp's lead.  The judges are Ken Miura (Fujitsu), Horst Simon
(now NERSC at LBL; was SGI; was NASA Ames NAS; was ...), and Jack
Dongarra (ORNL/UTK; was ANL).

Gordon Bell Awards, taken from IEEE Computer, July 1995, p. 79:

  "The year's (1995) competition continues the pattern of little or no
  gain in performance and price/performance following a year with a
  large increase," said Alan Karp of Hewlett-Packard, who serves as
  Gordon Bell Prize administrator and as one of the three judges.
  "While this year's best performance of 0.1 Tflops (trillion
  floating-point operations per second) is impressive," Karp continued,
  "it doesn't equal that from last year.  The price/performance only
  increased 40 percent over last year's winner to 3.6 Gflops (billion
  floating-point operations per second) per $1 million.  In contrast to
  other years with modest performance gains, none of the entrants who
  solved significantly more difficult problems reported impressive
  performance."

  Michael Heath of the University of Illinois and Don Heller of Ames
  Laboratory at Iowa State University are also serving as competition
  judges.  Gordon Bell, a former National Science Foundation division
  director and now an independent consultant in Los Altos, CA [currently
  working for Microsoft Corp.], is sponsoring the prizes for 10 years to
  promote practical parallel-processing research.  This is the eighth
  year of the prize.

  Entries were solicited in three categories: PERFORMANCE, in which the
  submitted program must be shown to run faster than any other
  comparable engineering or scientific application; PRICE/PERFORMANCE,
  in which the entrant must show that the performance of the application
  divided by the list price of the smallest system needed to achieve the
  reported performance is better than that of any other entry; and
  COMPILER PARALLELIZATION, in which judges look for the combination of
  compiler and application that generates the greatest speedup.
  Computer magazine coordinates the entries.

It isn't a lot of money.
------------------------
It was never intended to be.

Who is Gordon Bell?
-------------------
Currently a consultant (Microsoft); formerly a CMU EE/CS professor
responsible for the development of many machines, most notably the DEC
VAX and PDP-11 architectures, and multiprocessors including the C.mmp
and C.vmp.  Gordon is noted for "salty" language and strong opinions.
He has also worked with Encore and Ardent.  He recently left SUN for
Microsoft.

References
----------
%A Jack Dongarra
%A Alan H. Karp
%A Ken Kennedy
%T First Gordon Bell Awards Winners Achieve Speedup of 400
%J IEEE Software
%V 5
%N 3
%D May 1988
%P 108-112

%A Jim Browne
%A Jack Dongarra
%A Alan H. Karp
%A Ken Kennedy
%A Dave Kuck
%T Gordon Bell Prize 1988
%J IEEE Software
%V 6
%N 3
%D May 1989
%P 78-85

%A Jack Dongarra
%A Alan H. Karp
%A Ken Kennedy
%A David Kuck
%T Gordon Bell Prize 1989
%J IEEE Software
%V 7
%N 3
%D May 1990
%P 100-110

%A Jack Dongarra
%A Alan H. Karp
%A Ken Miura
%A Horst Simon
%T Gordon Bell Prize 1990
%J IEEE Software
%V 8
%N 3
%D May 1991
%P 92-102

%A Alan H. Karp
%A Ken Miura
%A Horst Simon
%T Gordon Bell Prize 1992
%J IEEE Computer
%V 26
%N 1
%D January 1993
%P 77-82
%A Alan H. Karp
%A Don Heller
%A Horst Simon
%T 1993 Gordon Bell Prize Winners
%J IEEE Computer
%V 27
%N 1
%D January 1994
%P 69-75

I don't have the reference for 1995, which appeared in a January issue
of IEEE Computer.  Also, the prize skipped six months when it moved from
the spring (COMPCON) to the fall (Supercomputing); this shifted the
publication dates, i.e., the 1992 awards were published in January 1993,
etc.

What's the Bell-Hillis debate?
(Or isn't this supposed to be an IBM panel?)
============================================
Start:
%A George Gilder
%T Hillis vs. Bell: the law of the microcosm
%J Upside
%V ?
%N ?
%D January 1992
%P 24-42
%X Non-technical but interesting characterization (you don't have to
agree with some of the simplifications) of the Hillis-Bell bet on
performance and market share of distributed memory massively parallel
machines versus shared-memory machines by the end of 1995.  Sides with
Bell.
%X Update: January 1996 issue: Hillis pays off Bell (Bell wins).

End:
%A Willie Schatz
%T Laying a Bet To Rest
%J Upside
%V
%N
%D January 1996
%P 39-45
%K massive parallelism,
%O http://www.upside.com/resource/print/9601/teraflop.html
%X See the 1992 Upside article on the Gordon Bell-Danny Hillis bet.
%X Gordon Bell claims to be the winner of his bet with Danny Hillis.
The author believes that the real issue isn't who has the fastest
machine, but whether anybody will still make Big Iron machines in the
future.  The author believes specifically that there will be a few niche
markets for this kind of hardware, but that general purpose machines
with lots of processors are dead.  Indeed, the author believes that the
market never really happened and that it was really fed by Cold War
paranoia.  The facing page of the cover is a drawing showing Danny and
someone else in a graveyard, with two gravestones reading RIP Thinking
Machines 1983-94 and Kendall Square Research 1986-95, with Danny rolling
a pair of dice.  Sprinkled through the piece is a collection of
gravestones with all of the big supercomputer vendors.  The author
doesn't particularly side with either Hillis or Bell.  There's a sidebar
about Tera Computer and its recent IPO entitled "$40 Million Vaporware,
Anyone? - How's this for a hot prospect?"

Stock value of companies that make HPC?  (Okay, only some of them.)
The buzzword value of the Hot New Concepts?

  Most buzzwords burst onto the scene, dominate public discussion for a
  year or two, then fade into nothingness, leaving only a slight imprint
  on history.
	-- Stanley Chow; schow@bnr.ca, stanley.chow-ott@nt.com;
	   (613) 763-2831

Bell's Laws
===========
1. You can never have too much address space.
   [Based on experience with the PDP-11 - like many other computers, it
   lived so long in the marketplace that it ran out of bits with which
   to address memory.]
2. People are smarter than machines.
   [In other words, regardless of how crazy a computer's architecture
   is, somebody will be able to find an application and code it up in a
   way that makes that computer run fast.]
3. ...
   [I forget what I had decided was Bell's third law - but it was
   something he said pretty recently.  If I remember I'll pass it on.]

Back to IBM....

Other IBM misc. attempts (non-products, special purpose processors)
====================================================================
QCD engines: GF-11, TF-1.
Ever reach 11 GFLOPS?  "Not quite" and "Never built."

Chess Machines
--------------
Current architectures are slightly more general purpose than older
machines, but notable these days are Deep Blue and Deeper Blue.
They appear to be variations on the SP2 architecture.  By themselves, it
is doubtful that commercial versions of these architectures will appear.
Keywords: brute force.

Are these supercomputers?  Yes; maybe not general purpose, but qualified
as SPECIAL PURPOSE, the use of the term can't be considered
objectionable.  Chess machines aren't among the Grand Challenge
problems.  On the other hand, this community does not push the image
generation machines used in flight simulators as major supercomputers
either.

What's Amdahl's Law?
====================
What's this I hear about "Amdahl's other law/rule"?
===================================================
The following is probably one of the single most important papers
(several people have recommended this article be read weekly)
challenging parallelism.  Well, we'll make it monthly here.  It's kinda
like UFOs.

The following document is Copyright AFIPS 1967.  Copyright pending
(checked with the Copyright Clearance Center).  This document's use
shall be for non-profit educational use only.

Citation:
%A Dr. Gene M. Amdahl
%Z International Business Machines Corporation, Sunnyvale, California
%T Validity of the single processor approach to achieving large scale
computing capabilities
%J AFIPS Proc. of the SJCC
%V 30
%D 1967
%P 483-485


	Validity of the single processor approach to achieving
	large scale computing capabilities

	DR. GENE M. AMDAHL
	International Business Machines Corporation
	Sunnyvale, California

The indented table structure is editorial, to improve readability; the
original paper did not have it.

INTRODUCTION

For a decade prophets have voiced the contention that the organization
of a single computer has reached its limits and that truly significant
advances can be made only by interconnection of a multiplicity of
computers in such a manner as to permit cooperative solution.  Variously
the proper direction has been pointed out as general purpose computers
with a generalized interconnection of memories, or as specialized
computers with geometrically related memory interconnections and
controlled by one or more instruction streams.  Demonstration is made of
the continued validity of the single processor approach and of the
weaknesses of the multiple processor approach in terms of application to
real problems and their attendant irregularities.

The arguments presented are based on statistical characteristics of
computation on computers over the last decade and upon the operational
requirements within problems of physical interest.  An additional
reference will be one of the most thorough analyses of relative computer
capabilities currently published -- "Changes in Computer Performance,"
*Datamation*, September 1966, Professor Kenneth E. Knight, Stanford
School of Business Administration.

The first characteristic of interest is the fraction of the
computational load which is associated with data management
housekeeping.  This fraction has been very nearly constant for about ten
years, and accounts for 40% of the executed instructions in production
runs.  In an entirely dedicated special purpose environment this might
be reduced by a factor of two, but it is highly improbably [sic] that it
could be reduced by a factor of three.  The nature of this overhead
appears to be sequential so that it is unlikely to be amenable to
parallel processing techniques.  Overhead alone would then place an
upper limit on throughput of five to seven times the sequential
processing rate, even if the housekeeping were done in a separate
processor.
The non-housekeeping part of the problem could exploit at most a
processor of performance three to four times the performance of the
housekeeping processor.  A fairly obvious conclusion which can be drawn
at this point is that the effort expended on achieving high parallel
processing rates is wasted unless it is accompanied by achievement in
sequential processing rates of very nearly the same magnitude.

Data management housekeeping is not the only problem to plague
oversimplified approaches to high speed computation.  The physical
problems which are of practical interest tend to have rather significant
complications.  Examples of these complications are as follows:
boundaries are likely to be irregular; interiors are likely to be
inhomogeneous; computations required may be dependent on the states of
the variables at each point; propagation rates of different physical
effects may be quite different; the rate of convergence, or convergence
at all, may be strongly dependent on sweeping through the array along
different axes on succeeding passes; etc.

The effect of each of these complications is very severe on any computer
organization based on geometrically related processors in a paralleled
processing system.  Even the existence of rectangular boundaries has the
interesting property that for spatial dimension N there are 3^N
different point geometries to be dealt with in a nearest neighbor
computation.  If the second nearest neighbor were also involved, there
would be 5^N different point geometries to contend with.  An irregular
boundary compounds this problem, as does an inhomogeneous interior.
Computations which are dependent on the states of variables would
require the processing at each point to consume approximately the same
computational time as the sum of the computations of all physical
effects within a large region.  Differences or changes in propagation
rates may affect the mesh point relationships.

Ideally the computation of the action of the neighboring points upon the
point under consideration involves their values at a previous time
proportional to the mesh spacing and inversely proportional to the
propagation rate.  Since the time step is normally kept constant, a
faster propagation rate for some effects would imply interactions with
more distant points.  Finally, the fairly common practice of sweeping
through the mesh along different axes on succeeding passes poses
problems of data management which affects all processors; however, it
affects geometrically related processors more severely by requiring
transposing all points in storage in addition to the revised
input-output scheduling.

A realistic assessment of the effect of these irregularities on the
actual performance of a parallel processing device, compared to its
performance on a simplified and regularized abstraction of the problem,
yields a degradation in the vicinity of one-half to one order of
magnitude.

To sum up the effects of data management housekeeping and of problem
irregularities, the author has compared three different machine
organizations involving equal amounts of hardware.  Machine A has thirty
two arithmetic execution units controlled by a single instruction stream
[SIMD].  Machine B has pipelined arithmetic execution units with up to
three overlapped operations on vectors of eight elements.  Machine C has
the same pipelined execution units, but initiation of individual
operations at the same rate as Machine B permitted vector operations.
The performance of these three machines is plotted in Figure 1 as a
function of the fraction of the number of instructions which permit
parallelism.  The probable region of operation is centered around a
point corresponding to 25% data management overhead and 10% of the
problem operations forced to be sequential.

The historic performance versus cost of computers has been explored very
thoroughly by Professor Knight.  The carefully analyzed data he presents
reflects not just the execution times for arithmetic operations and the
cost of minimum recommended configurations.  He includes memory capacity
effects, input-output overlap experienced, and special functional
capabilities.  The best statistical fit obtained corresponds to a
performance proportional to the square of the cost at any technological
level.  This result very effectively supports the often invoked
"Grosch's law."  Utilizing this analysis, one can argue that if twice
the amount of hardware were exploited in a single system, one could
expect to obtain four times the performance.  The only difficulty is
involved in knowing how to exploit this additional hardware.  At any
point in time it is difficult to foresee how the previous bottlenecks in
a sequential computer will be effectively overcome.  If it were easy
they would not have been left as bottlenecks.  It is true by historical
example that the successive obstacles have been hurdled, so it is
appropriate to quote the Rev. Adam Clayton Powell -- "Keep the faith,
baby!"

If alternatively one decided to improve the performance by putting two
processors side by side with shared memory, one would find approximately
2.2 times as much hardware.  The additional two tenths in hardware
accomplishes the crossbar switching for the sharing.  The resulting
performance achieved would be about 1.8.  The latter figure is derived
from the assumption that each processor utilized half of the memories
about half of the time.  The resulting memory conflicts in the shared
system would extend the execution of one to two operations by one
quarter of the time.  The net result is a price/performance degradation
to 0.8 rather than an improvement to 2.0 for the single larger
processor.

Comparative analysis with associative processors is far less easy and
obvious.  Under certain conditions of regular formats there is a fairly
direct approach.  Consider an associative processor designed for pattern
recognition, in which decisions within individual elements are forwarded
to some set of other elements.  In the associative processor design the
receiving elements would have a set of source addresses which recognize
by associative techniques whether or not it is to receive the decision
of the currently declaring element.  To make a corresponding special
purpose non-associative processor one would consider a receiving element
and its source addresses as an instruction, with binary decisions
maintained in registers.  Considering the use of thin film memory, an
associative cycle would be longer than a non-destructive read cycle.  In
such a technology the special purpose non-associative processor can be
expected to take about one-fourth as many memory cycles as the
associative version and only about one-sixth the time.  These figures
were computed on the full recognition task, with somewhat differing
ratios in each phase.  No blanket claim is intended here, but rather
that each requirement should be investigated from both approaches.

END of paper.
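The argument above is usually boiled down to the speedup bound quoted in
the next section (and computed by the amlaw(1) command mentioned there).
A minimal C sketch of that bound follows; the 10% serial fraction is an
illustrative assumption of ours, not a number from the paper:

	/* amdahl.c -- minimal sketch of the Amdahl's Law speedup bound.
	 * Not the UNICOS amlaw(1) utility; just the formula
	 *     S(P) = 1 / (s + (1-s)/P),   S -> 1/s as P -> infinity,
	 * where s is the serial fraction of the work.
	 */
	#include <stdio.h>

	static double speedup(double s, double p)
	{
	    return 1.0 / (s + (1.0 - s) / p);
	}

	int main(void)
	{
	    double s = 0.10;   /* assumed serial fraction: 10% */
	    int    p;

	    for (p = 1; p <= 1024; p *= 4)
	        printf("P = %4d   S = %5.2f\n", p, speedup(s, (double) p));
	    printf("P -> oo    S = %5.2f\n", 1.0 / s);
	    return 0;
	}

With s = 0.10 the bound never exceeds 10 no matter how many processors
are added; a 200-fold speedup of the kind the Karp and Bell prizes ask
for needs a serial fraction below roughly half a percent (or a scaled
problem, as in the Gustafson/Montry work cited above).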
See also: CRI UNICOS has a useful command, amlaw(1), which does simple
number crunching on Amdahl's Law.
------------
On a CRI system type: `man amlaw`.

	                  1            1
	S  =  lim    -----------  =   ---
	     P->oo        1-s          s
	             s + -----
	                   P

	S   = speedup which can be achieved with P processors
	s   (small sigma) = proportion of a calculation which is serial
	1-s = parallelizable portion

	Speedup_overall =
	    1 / ((1 - Fraction_enhanced) + (Fraction_enhanced / Speedup_enhanced))


Articles to parallel@ctc.com (Administrative: bigrigg@ctc.com)
Archive: http://www.hensa.ac.uk/parallel/internet/usenet/comp.parallel