Newsgroups: comp.parallel,comp.sys.super
From: eugene@sally.nas.nasa.gov (Eugene N. Miya)
Reply-To: eugene@george.arc.nasa.gov (Eugene N. Miya)
Subject: [l/m 1/2/98] IBM and Amdahl -- comp.parallel (20/28) FAQ
Organization: NASA Ames Research Center, Moffett Field, CA
Date: 20 Feb 1998 13:03:20 GMT
Message-ID: <6cjuuo$1n6$1@cnn.nas.nasa.gov>

Archive-Name: superpar-faq
Last-modified: 2 Jan 1998

20	IBM and Amdahl (the man, the article)
22	Grand challenges and HPCC
24	Suggested (required) readings
26	Dead computer architecture society
28	Dedications
 2	Introduction and Table of Contents and justification
 4	Comp.parallel news group history
 6	parlib
 8	comp.parallel group dynamics
10	Related news groups, archives and references
12
14
16
18	Supercomputing and Crayisms

Keywords and phrases:
	Watson memo, marketing, COBOL, mythology, aspiring blue boxes, DEC,
	laws: Amdahl, other, challenge, debates, prizes: Karp, Bell, Hillis


Is a supercomputer a mainframe?
===============================
Short answer: yes.

Are all mainframes supercomputers?
==================================
Short answer: no.  Think subsets.

Is a cluster a mainframe?
-------------------------
Maybe not.  (No one ever promised you yes-or-no answers.)

Why do people dump on IBM in this news group?
=============================================
We'll get to the good, positive stuff in a moment, but first the
chronology: IBM is a late entry into the supercomputing market.  Its
lateness, its aloofness, and a certain degree of sales arrogance have
turned the readership of this news group off.  You merely need to assert
IBM's superiority, and you will find out.  Actually, you might get dead
silence, because the long-timers in this group don't care much anymore.
It's historic.  BASICALLY: don't worry about it.

A few people will say that IBM claims credit for inventing everything.
These people had not heard of the IBM VAMP.

Not to completely dump on IBM: IBM does make important peripheral
technology.

	"Some vice presidents of IBM assert that the speed of light goes
	just a little bit faster in Armonk."
				--An IBM Vice President
	[Yes, it's humor.]

Anonymous contribution:
	I was in White Plains, and I heard their biggy---the chemist---say
	"Supercomputing is just a marketing word."

This is, in fact, also the title of a paper.  It's these kinds of
comments which will continue to plague IBM.  The phrase does have a
little truth to it.  It's also derivative of Watson's comments about
computers in general from the late 1940s.  This group keeps a copy of
the text of Watson's memo about the performance of the CDC 6600 [then
the most powerful supercomputer].

%A Ad Emmen
%A Jaap Hollenberg
%T Supercomputer is just an advertisement word, an interview with Enrico Clementi
%J Supercomputer
%C Amsterdam
%D July-September, 1986
%P 24-33
%X Never very technical, but interesting reading.

Don't forget that the "B" stands for "Business" machines, and many
people regard this marketplace as lying outside business machines.  The
reality is that the SP2 and many other IBM architectures aren't IBM 370
clones.

The problem created by a comment like "Supercomputing is just a
marketing word" is that potential customers have a hard time justifying
supercomputer purchases internally.  If you think we are rubbing IBM's
face in the dirt, we aren't.  We are listing a history based on net
discussion.

Does IBM have a supercomputer (currently)?  Depends on your perspective.
See: definition of a supercomputer on the earliest panels.
On the other hand, it could be argued that IBM has snubbed potential
customers.
Touche.

The following stuff is included because 1) it's frequently mentioned and
2) it's frequently quoted out of context, incomplete, etc.  These are
from the published documents:

	MEMORANDUM

					August 28, 1963

	Memorandum To:	Messrs. A. L. Williams
			T. V. Learson
			H. W. Miller, Jr.
			E. R. Piore
			O. M. Scott
			M. B. Smith
			A. K. Watson

	Last week CDC had a press conference during which they officially
	announced their 6600 system.  I understand that in the laboratory
	developing this system there are only 34 people, "including the
	janitor."  Of these, 14 are engineers and 4 are programmers, and
	only one person has a Ph.D., a relatively junior programmer.

	To the outsider, the laboratory appeared to be cost conscious,
	hard working, and highly motivated.

	Contrasting this modest effort with our own vast development
	activities, I fail to understand why we have lost our industry
	leadership by letting someone else offer the world's most
	powerful computer.  At Jenny Lake, I think top priority should be
	given to a discussion as to what we are doing wrong and how we
	should go about changing it immediately.

					T. J. Watson, Jr.

	TJW,Jr:jmc
	cc: Mr. W. B. McWhirter

Reproduced in "A Few Good Men from Univac."

On hearing about this memo:
	"It seems Mr. Watson has answered his own question."
				--Seymour Cray

# http://cip2.e-technik.uni-erlangen.de:8080/hyplan/dl4rcg/texte/quotes.col
# This link now appears to be dead.
# I'll remove this link if a suitable replacement is not found in
# a few months.

What is the significance of this memo?
--------------------------------------
1) As pointed out by the book "A Few Good Men from Univac":
   bureaucratic overhead is lower in a small, intimate organization.
   [I.e., communication requiring complete connectivity is roughly an
   O(n^2) problem.]
2) It takes a little bit of a slap at education (the Ph.D.).
   [The presence of Woz, Jobs, and Gates in some ways makes this less
   impressive (all rich men with no more than two years of college, or
   who finished it only after making their fortune; none building
   supers, of course).]

COBOL?
------
People have on occasion asked for a COBOL compiler in this group.
Comp.lang.cobol is a better place to ask.
COBOL-X: mentioned in the STAR-OS manual.

It is difficult to assess the role of the IBM clones: Amdahl and, in
particular, the major Japanese vendors Fujitsu, NEC, and Hitachi.  Many
exist.  IBM will likely continue to assert itself as a "player" in this
market.

What does IBM imply?
--------------------
32-bit, byte-oriented (the big-endian vs. little-endian debate), CISC
architecture, EBCDIC characters, hexadecimal (base-16) floating point,
IBM channel I/O rates (3.3-4.3 MB/s).  Or PCs: Intel or RS6K processors;
I/O: SCSI bus, Micro Channel.
It is easy to challenge this, but it is even easier to confirm these.
See: definition of a supercomputer on the earliest panels.

Are these "super" features?
---------------------------
Generally no.  It should be noted that some communities can do 32-bit
work (fixed-point; image processing is typical).  The questions then to
ask are: Does 32-bit mode run twice as fast as 64-bit?  What happens
with DOUBLE PRECISION declarations?  What happens with integers?
Good questions.  8^)
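One way to answer the first of those questions on a particular machine
is simply to time the same kernel in both precisions.  Here is a
minimal C sketch; the loop length, repeat count, and AXPY-style kernel
are illustrative choices of ours, not an IBM benchmark:

	/* prec.c -- crude timing of the same kernel in 32-bit and 64-bit
	 * floating point.  Whether single precision really runs ~2x faster
	 * depends entirely on the processor, memory system, and compiler.
	 */
	#include <stdio.h>
	#include <time.h>

	#define N    200000
	#define REPS 50

	static float  xs[N], ys[N];
	static double xd[N], yd[N];

	int main(void)
	{
	    clock_t t0;
	    double  t32, t64;
	    int     i, r;

	    for (i = 0; i < N; i++) {
	        xs[i] = 1.0f;  ys[i] = 2.0f;
	        xd[i] = 1.0;   yd[i] = 2.0;
	    }

	    t0 = clock();                        /* 32-bit AXPY */
	    for (r = 0; r < REPS; r++)
	        for (i = 0; i < N; i++)
	            ys[i] += 3.0f * xs[i];
	    t32 = (double) (clock() - t0) / CLOCKS_PER_SEC;

	    t0 = clock();                        /* 64-bit AXPY */
	    for (r = 0; r < REPS; r++)
	        for (i = 0; i < N; i++)
	            yd[i] += 3.0 * xd[i];
	    t64 = (double) (clock() - t0) / CLOCKS_PER_SEC;

	    printf("checksum %g\n", ys[N-1] + yd[N-1]);  /* keep loops live */
	    printf("32-bit: %.3f s   64-bit: %.3f s\n", t32, t64);
	    return 0;
	}

The same kind of experiment, with INTEGER or Fortran DOUBLE PRECISION
declarations, addresses the other two questions.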
IBM Positives
-------------
The SP/2:
	http://ibm.tc.cornell.edu/ibm/pps/doc/
	http://lscftp.kgn.ibm.com/pps/vibm/index.html
	http://www.tc.cornell.edu/~jeg/config.html

%A J. M. Kuzela
%T IBM POWERparallel System - SP2 Performance Measurements
%R manual
%I Power Parallel Systems, IBM
%D October 1994

Interesting research (not business) machines:
	GF-11 (11 GFLOPS)
	TF-1 (proposed, 1 TFLOPS sustained [3-5 peak])
	RP-3 (Research Parallel Processor Project)
	Vulcan
	lCAP
	....
	VAMP

Production machines:
	3090 Vector Facility

What's the Mythical-MIPS/MegaFLOPS?
-----------------------------------
It's a corollary to the law/book The Mythical Man-Month by Fred Brooks
(one of the architects of the IBM OS/360 operating system):
	If a woman can have a baby in 9 months,
	then clearly 9 women can have a baby in one month.
Generalize this to parallel computation.
See also Amdahl's Law (Amdahl also worked on the 360).
------------
The second edition of The Mythical Man-Month was just released.

Why should the FLOPS section be mentioned under IBM?
----------------------------------------------------
The metric is generally associated first with IBM machines.
Also see the above note.

What's a TFLOP?  TeraFLOPS?
---------------------------

What comes after Tera?
======================
	prefix	symbol	multiplier
	--------------------------
	tera	T	10^12
	peta	P	10^15
	exa	E	10^18
	zetta	Z	10^21
	yotta	Y	10^24

What's a FLOP?
--------------
FLOating Point operation.  FLOPS: FLOPs per second.

	"The greatest mistake made by computer science was to invent
	floating point..."
				--Robert Holzman

"When does a FLOP become a success?"  Burton's paper.

Several definitions of floating point equivalence exist.  Perhaps the
most notable is Don Knuth's "Empirical Study of Fortran Programs" in the
first volume of Software: Practice and Experience.  This was not a
benchmarking study per se.  Another equivalence is presented by Dave
Bailey and John Barton in the original NAS Kernel benchmark technical
report.  They were not aware of Knuth at the time.  Without doubt there
are other equivalences.

%A Thomas Sterling
%A Paul Messina
%A Paul H. Smith
%T Enabling Technologies for Petaflops Computing
%S Science and Engineering Computation Series
%I MIT Press
%C Cambridge, MA
%D 1995
%i Caltech Concurrent Supercomputing Facilities, Pasadena, CA
%r CCSF-45
%d July 1994
%K book, text, supercomputers, parallel processing,
%O ISBN 0262691760
%X Workshop on Enabling Technologies for Peta(FL)OPS Computing (Feb. 1994).
Dreaming?
%X Perhaps the most interesting assertion is that PetaByte memories
aren't needed.  Only time will tell on this one.  Not optically
optimistic.

PETAFLOPS, petaflops reference:
	http://cesdis.gsfc.nasa.gov/petaflops/archive/other/collection.html

We are diverging.

What's this I hear about "Amdahl's law/rule"?
=============================================
See the end.  The text of the entire article is appended as a reference.

When are we going to have automatic parallelization?
====================================================
Right after we have automatic programming as conceived in the
1950s/1960s.

What's the Karp Prize?  [Karp Challenge]
========================================
Alan Karp, then at the IBM Palo Alto Scientific Center (different from
the Research Centers), now at H-P Labs, wrote a CACM Letter to the
Editor challenging the MIMD community to show a 200 times speedup on a
real-world application (test programs and "embarrassingly parallel"
programs did not count).
%A Alan Karp
%T Letter to the Editor
%J Communications of the ACM, Forum
%V 29
%N 2
%D February 1986
%P 87

%A Alan Karp
%T Letter to the Editor, Follow-up, CACM Forum
%J Communications of the ACM
%V 30
%N 1
%D January 1987
%P 7

Improved text from Alan (net):

  I have just returned from the Second SIAM Conference on Parallel
  Processing for Scientific Computing in Norfolk, Virginia.  There I
  heard about 1,000 processor systems, 4,000 processor systems, and even
  a proposed 1,000,000 processor system.  Since I wonder if such systems
  are the best way to do general purpose, scientific computing, I am
  making the following offer.

  I will pay $100 to the first person to demonstrate a speed-up of at
  least 200 on a general purpose, MIMD computer used for scientific
  computing.  This offer will be withdrawn at 11:59 PM on 31 December
  1995.

  Some definitions are in order.

  Speed-up: The time taken to run an application on one processor using
  the best sequential algorithm divided by the time to run the same
  application on N processors using the best parallel algorithm.

  General purpose: The parallel system should be able to run a wide
  variety of applications.  For the purposes of this test, the machine
  must run 3 different programs that demonstrate its ability to solve
  different problems.  I suggest a large, dynamic structures
  calculation, a trans-sonic fluid flow past a complex barrier, and a
  large problem in econometric modelling.  These are only suggestions; I
  will be quite flexible in the choice of the problems.  Several people
  have volunteered to adjudicate disputes in the selection of the
  applications.

  Application: The problems run for this test must be complete
  applications, no computational kernels allowed.  They must contain all
  input, data transfers from host to parallel processors, and all
  output.  The problems chosen should be the kind of job that a working
  scientist or engineer would submit as a batch job to a large
  supercomputer.  In addition, I am arbitrarily disqualifying all
  problems that Cleve Moler calls "embarrassingly parallel".  These
  include signal processing with multiple independent data streams, the
  computation of the Mandelbrot set, etc.

  There are some rather obvious ground rules.  Simulations of the
  hardware are not permitted.  I am looking for a demonstration on a
  running piece of hardware.  The same problem should be run on the
  sequential and parallel processors.  It is not fair to cripple the
  sequential processor.  For example, if your operating system uses 99%
  of one processor, the single processor run will spend all its time in
  the operating system.  Super-linear speed-up as the number of
  processors is increased is evidence of this problem.  It is not fair
  to memory-starve the single processor run.  If you have 100K words of
  memory on each of 1,000 processors, and you run a 10 MW problem, it is
  not fair for the sequential run to be made on a 100 KW processor.
  After all, we are not interested in seeing the impact of additional
  memory; we want to see how much the extra CPUs help.

  It may not be possible to follow all these additional rules.  For
  example, you can't be expected to build a single processor with lots
  of extra memory or write a new operating system just for this test.
  However, you should do your best to make the demonstration fair.  A
  third party has agreed to adjudicate any scaling disputes.

  Anyone claiming this prize will be expected to present his or her
  results at a suitable conference.
  At the end of the presentation I will have the audience vote on
  whether the demonstration was successful.  Remember, the purpose of
  the bet is to force developers to demonstrate the utility of massively
  parallel systems, not to show they can find holes in the rules.

	Alan Karp
	na.karp@su-score.arpa

The Prize was won by the team of Gustafson and Montry (see refs below).

It wasn't a lot of money.
-------------------------
It was never intended to be.  As noted: this news group and CACM have
very academic biases.  Several "prizes" have been proposed in and out of
computing which were more for the "challenge": human-powered flight
(various degrees), cracking public key encryption [D-H was claimed, but
RSA still holds], etc.  Karp's prize was just another exercise of this
type.  You didn't have to apply.

Who is Alan Karp?
-----------------
Alan was trained as an astronomer and basically works as a computational
numerical physicist.

References (the few main ones) to the winners:

%A John L. Gustafson
%A Gary R. Montry
%A Robert E. Benner
%Z Sandia National Labs.
%T Development of Parallel Methods for a 1024-Processor Hypercube
%J SIAM Journal on Scientific and Statistical Computing
%V 9
%N 4
%D July 1988
%K fluid dynamics, hypercubes, MIMD machines, multiprocessor performance,
parallel computing, structural analysis, supercomputing, wave mechanics,
%K grequired91,
%K jlh, dp, hds, dar,
%X Introduces concept of operation efficiency, scaled speed-up.
Also covers communication cost, beam strain analysis, and a bit on
benchmarking.  Winner of 1988 Bell and Karp Prizes.
%X (JLH & DP) This paper reports interesting results in using a
large-scale NCUBE.  The authors won the Gordon Bell prize with their
work.  They also suggest the idea of problem scaling to overcome the
limitations of sequential portions of an application.
%X (DAR) Some application flavor mixed with performance analysis.

%A J. Dongarra
%A A. Karp
%A K. Kennedy
%T First Gordon Bell Awards Winners Achieve Speedup of 400
%J IEEE Software
%V 5
%N 3
%D May 1988
%P 108-112
%X Award for achieving >= 200x computing speedup, Sandia [Montry,
Gustafson].  An announcement and news article rather than a technical
paper.

%A J. Voelcker
%T Sandia Trio Wins Awards at Compcon
%J The Institute
%I IEEE
%V 12
%N 5
%D May 1988
%P 8
%K Karp/Bell parallel award for achieving >= 200x computing speedup
%X Winners got near-linear speedup: 1020 speedup on 1024 processors in
the best case.  They did it on an NCUBE 10-D hypercube with 1024
processors.  They developed mathematical methods and algorithms to
minimize communication between processors, and to overlap communication
and computation.  Also used a logarithmic system-level algorithm to load
a program into the hypercube in 11 steps, rather than inputting the same
program serially into each of the 2^10 processors.  The problems were
wave motion through deflectors, vortex motion in a fluid, and strain
analysis of a cantilevered beam.
%X The above refer entry does not even mention the names of the people
who did the work.  Search for Gary Montry or John Gustafson.
The above is just an article, not a paper.  See the Compcon proceedings.

Alt:
%A R. E. Benner
%A J. L. Gustafson
%A R. E. Montry
%T Development and analysis of scientific application programs on a
1024-processor hypercube
%R SAND 88-0317
%I Sandia National Laboratories
%C Albuquerque, N.M.
%D February 1988

%A John L. Gustafson
%A Gary R. Montry
%T Programming and Performance on a Cube-Connected Architecture
%J Compcon '88
%I IEEE
%C San Francisco, CA.
%D February-March, 1988
%P 97-100
%K NCUBE hypercubes, Ensemble Paradigm, Language, Debugging,
Communications, Load Balance, Alan Karp Prize, Gordon Bell Prize,

What's the Bell Prize?
======================
Following Karp's lead.  The judges are Ken Miura (Fujitsu), Horst Simon
(now NERSC at LBL; was SGI; was NASA Ames NAS; was ...), and Jack
Dongarra (ORNL/UTK; was ANL).

Gordon Bell Awards, taken from IEEE Computer, July 1995, p. 79:

  "The year's (1995) competition continues the pattern of little or no
  gain in performance and price/performance following a year with a
  large increase," said Alan Karp of Hewlett-Packard, who serves as
  Gordon Bell Prize administrator and as one of the three judges.
  "While this year's best performance of 0.1 Tflops (trillion
  floating-point operations per second) is impressive," Karp continued,
  "it doesn't equal that from last year.  The price/performance only
  increased 40 percent over last year's winner to 3.6 Gflops (billion
  floating-point operations per second) per $1 million.  In contrast to
  other years with modest performance gains, none of the entrants who
  solved significantly more difficult problems reported impressive
  performance."

  Michael Heath of the University of Illinois and Don Heller of Ames
  Laboratory at Iowa State University are also serving as competition
  judges.  Gordon Bell, a former National Science Foundation division
  director and now an independent consultant in Los Altos, CA [currently
  working for Microsoft Corp.], is sponsoring the prizes for 10 years to
  promote practical parallel-processing research.  This is the eighth
  year of the prize.

  Entries were solicited in three categories: PERFORMANCE, in which the
  submitted program must be shown to run faster than any other
  comparable engineering or scientific application; PRICE/PERFORMANCE,
  in which the entrant must show that the performance of the application
  divided by the list price of the smallest system needed to achieve the
  reported performance is better than that of any other entry; and
  COMPILER PARALLELIZATION, in which judges look for the combination of
  compiler and application that generates the greatest speedup.
  Computer magazine coordinates the entries.

It isn't a lot of money.
------------------------
It was never intended to be.

Who is Gordon Bell?
-------------------
Currently a consultant (Microsoft); formerly a CMU EE/CS professor
responsible for the development of many machines, most notably the DEC
VAX and PDP-11 architectures, and multiprocessors including the C.mmp
and C.vmp.  Gordon is noted for "salty" language and strong opinions.
He has also worked with Encore and Ardent.  He recently left SUN for
Microsoft.

References
----------
%A Jack Dongarra
%A Alan H. Karp
%A Ken Kennedy
%T First Gordon Bell Awards Winners Achieve Speedup of 400
%J IEEE Software
%V 5
%N 3
%D May 1988
%P 108-112

%A Jim Browne
%A Jack Dongarra
%A Alan H. Karp
%A Ken Kennedy
%A Dave Kuck
%T Gordon Bell Prize 1988
%J IEEE Software
%V 6
%N 3
%D May 1989
%P 78-85

%A Jack Dongarra
%A Alan H. Karp
%A Ken Kennedy
%A David Kuck
%T Gordon Bell Prize 1989
%J IEEE Software
%V 7
%N 3
%D May 1990
%P 100-110

%A Jack Dongarra
%A Alan H. Karp
%A Ken Miura
%A Horst Simon
%T Gordon Bell Prize 1990
%J IEEE Software
%V 8
%N 3
%D May 1991
%P 92-102

%A Alan H. Karp
%A Ken Miura
%A Horst Simon
%T Gordon Bell Prize 1992
%J IEEE Computer
%V 26
%N 1
%D January 1993
%P 77-82
%A Alan H. Karp
%A Don Heller
%A Horst Simon
%T 1993 Gordon Bell Prize Winners
%J IEEE Computer
%V 27
%N 1
%D January 1994
%P 69-75

I don't have the reference for 1995, which appeared in a January issue
of IEEE Computer.  Also, the prize skipped six months when it moved from
the spring (COMPCON) to the fall (Supercomputing); this shifted the
publication dates, i.e., the 1992 awards were published in January 1993,
etc.

What's the Bell-Hillis debate?
(Or isn't this supposed to be an IBM panel?)
============================================
Start:
%A George Gilder
%T Hillis vs. Bell: the law of the microcosm
%J Upside
%V ?
%N ?
%D January 1992
%P 24-42
%X Non-technical but interesting characterization (you don't have to
agree with some of the simplifications) of the Hillis-Bell bet on
performance and market share of distributed memory massively parallel
machines versus shared-memory machines by the end of 1995.  Sides with
Bell.
%X Update: January 1996 issue: Hillis pays off Bell (Bell wins).

End:
%A Willie Schatz
%T Laying a Bet To Rest
%J Upside
%V
%N
%D January 1996
%P 39-45
%K massive parallelism,
%O http://www.upside.com/resource/print/9601/teraflop.html
%X See the 1992 Upside article on the Gordon Bell-Danny Hillis bet.
%X Gordon Bell claims to be the winner of his bet with Danny Hillis.
The author believes that the real issue isn't who has the fastest
machine, but whether anybody will still make Big Iron machines in the
future.  The author believes specifically that there will be a few niche
markets for this kind of hardware, but that general purpose machines
with lots of processors are dead.  Indeed, the author believes that the
market never really happened and that it was really fed by Cold War
paranoia.  The facing page of the cover is a drawing showing Danny and
someone else in a graveyard, with two gravestones reading RIP Thinking
Machines 1983-94 and Kendall Square Research 1986-95, with Danny rolling
a pair of dice.  Sprinkled through the piece is a collection of
gravestones with all of the big supercomputer vendors.  The author
doesn't particularly side with either Hillis or Bell.  There's a sidebar
about Tera Computer and its recent IPO entitled "$40 Million Vaporware,
Anyone? - How's this for a hot prospect?"

Stock value of companies that make HPC?  (Okay, only some of them.)
The buzzword value of the Hot New Concepts?

  Most buzzwords burst onto the scene, dominate public discussion for a
  year or two, then fade into nothingness, leaving only a slight imprint
  on history.
	-- Stanley Chow; schow@bnr.ca, stanley.chow-ott@nt.com;
	   (613) 763-2831

Bell's Laws
===========
1. You can never have too much address space.
   [Based on experience with the PDP-11 - like many other computers, it
   lived so long in the marketplace that it ran out of bits with which
   to address memory.]
2. People are smarter than machines.
   [In other words, regardless of how crazy a computer's architecture
   is, somebody will be able to find an application and code it up in a
   way that makes that computer run fast.]
3. ...
   [I forget what I had decided was Bell's third law - but it was
   something he said pretty recently.  If I remember I'll pass it on.]

Back to IBM....

Other IBM misc. attempts (non-products, special purpose processors)
====================================================================
QCD engines: GF-11, TF-1.
Ever reach 11 GFLOPS?  "Not quite" and "Never built."

Chess Machines
--------------
Current architectures are slightly more general purpose than older
machines, but notable these days are Deep Blue and Deeper Blue.
They appear to be variations on the SP2 architecture.  By themselves, it
is doubtful that commercial versions of these architectures will appear.
Keywords: brute force.

Are these supercomputers?  Yes; maybe not general purpose, but qualified
as SPECIAL PURPOSE, the use of the term can't be considered
objectionable.  Chess machines aren't among the Grand Challenge
problems.  On the other hand, this community does not push the image
generation machines used in flight simulators as major supercomputers
either.

What's Amdahl's Law?
====================
What's this I hear about "Amdahl's other law/rule"?
===================================================
The following is probably one of the single most important papers
(several people have recommended this article be read weekly)
challenging parallelism.  Well, we'll make it monthly here.  It's kinda
like UFOs.

The following document is Copyright AFIPS 1967.  Copyright pending
(checked with the Copyright Clearance Center).  This document's use
shall be for non-profit educational use only.

Citation:
%A Dr. Gene M. Amdahl
%Z International Business Machines Corporation, Sunnyvale, California
%T Validity of the single processor approach to achieving large scale
computing capabilities
%J AFIPS Proc. of the SJCC
%V 30
%D 1967
%P 483-485


	Validity of the single processor approach to achieving
	large scale computing capabilities

	DR. GENE M. AMDAHL
	International Business Machines Corporation
	Sunnyvale, California

The indented table structure is editorial, to improve readability; the
original paper did not have it.

INTRODUCTION

For a decade prophets have voiced the contention that the organization
of a single computer has reached its limits and that truly significant
advances can be made only by interconnection of a multiplicity of
computers in such a manner as to permit cooperative solution.  Variously
the proper direction has been pointed out as general purpose computers
with a generalized interconnection of memories, or as specialized
computers with geometrically related memory interconnections and
controlled by one or more instruction streams.  Demonstration is made of
the continued validity of the single processor approach and of the
weaknesses of the multiple processor approach in terms of application to
real problems and their attendant irregularities.

The arguments presented are based on statistical characteristics of
computation on computers over the last decade and upon the operational
requirements within problems of physical interest.  An additional
reference will be one of the most thorough analyses of relative computer
capabilities currently published -- "Changes in Computer Performance,"
*Datamation*, September 1966, Professor Kenneth E. Knight, Stanford
School of Business Administration.

The first characteristic of interest is the fraction of the
computational load which is associated with data management
housekeeping.  This fraction has been very nearly constant for about ten
years, and accounts for 40% of the executed instructions in production
runs.  In an entirely dedicated special purpose environment this might
be reduced by a factor of two, but it is highly improbably [sic] that it
could be reduced by a factor of three.  The nature of this overhead
appears to be sequential so that it is unlikely to be amenable to
parallel processing techniques.  Overhead alone would then place an
upper limit on throughput of five to seven times the sequential
processing rate, even if the housekeeping were done in a separate
processor.
The non-housekeeping part of the problem could exploit at most a
processor of performance three to four times the performance of the
housekeeping processor.  A fairly obvious conclusion which can be drawn
at this point is that the effort expended on achieving high parallel
processing rates is wasted unless it is accompanied by achievement in
sequential processing rates of very nearly the same magnitude.

Data management housekeeping is not the only problem to plague
oversimplified approaches to high speed computation.  The physical
problems which are of practical interest tend to have rather significant
complications.  Examples of these complications are as follows:
boundaries are likely to be irregular; interiors are likely to be
inhomogeneous; computations required may be dependent on the states of
the variables at each point; propagation rates of different physical
effects may be quite different; the rate of convergence, or convergence
at all, may be strongly dependent on sweeping through the array along
different axes on succeeding passes; etc.

The effect of each of these complications is very severe on any computer
organization based on geometrically related processors in a paralleled
processing system.  Even the existence of rectangular boundaries has the
interesting property that for spatial dimension N there are 3^N
different point geometries to be dealt with in a nearest neighbor
computation.  If the second nearest neighbor were also involved, there
would be 5^N different point geometries to contend with.  An irregular
boundary compounds this problem, as does an inhomogeneous interior.
Computations which are dependent on the states of variables would
require the processing at each point to consume approximately the same
computational time as the sum of the computations of all physical
effects within a large region.  Differences or changes in propagation
rates may affect the mesh point relationships.

Ideally the computation of the action of the neighboring points upon the
point under consideration involves their values at a previous time
proportional to the mesh spacing and inversely proportional to the
propagation rate.  Since the time step is normally kept constant, a
faster propagation rate for some effects would imply interactions with
more distant points.  Finally, the fairly common practice of sweeping
through the mesh along different axes on succeeding passes poses
problems of data management which affects all processors; however, it
affects geometrically related processors more severely by requiring
transposing all points in storage in addition to the revised
input-output scheduling.

A realistic assessment of the effect of these irregularities on the
actual performance of a parallel processing device, compared to its
performance on a simplified and regularized abstraction of the problem,
yields a degradation in the vicinity of one-half to one order of
magnitude.

To sum up the effects of data management housekeeping and of problem
irregularities, the author has compared three different machine
organizations involving equal amounts of hardware.  Machine A has thirty
two arithmetic execution units controlled by a single instruction stream
[SIMD].  Machine B has pipelined arithmetic execution units with up to
three overlapped operations on vectors of eight elements.  Machine C has
the same pipelined execution units, but initiation of individual
operations at the same rate as Machine B permitted vector operations.
The performance of these three machines is plotted in Figure 1 as a
function of the fraction of the number of instructions which permit
parallelism.  The probable region of operation is centered around a
point corresponding to 25% data management overhead and 10% of the
problem operations forced to be sequential.

The historic performance versus cost of computers has been explored very
thoroughly by Professor Knight.  The carefully analyzed data he presents
reflects not just the execution times for arithmetic operations and the
cost of minimum recommended configurations.  He includes memory capacity
effects, input-output overlap experienced, and special functional
capabilities.  The best statistical fit obtained corresponds to a
performance proportional to the square of the cost at any technological
level.  This result very effectively supports the often invoked
"Grosch's law."  Utilizing this analysis, one can argue that if twice
the amount of hardware were exploited in a single system, one could
expect to obtain four times the performance.  The only difficulty is
involved in knowing how to exploit this additional hardware.  At any
point in time it is difficult to foresee how the previous bottlenecks in
a sequential computer will be effectively overcome.  If it were easy
they would not have been left as bottlenecks.  It is true by historical
example that the successive obstacles have been hurdled, so it is
appropriate to quote the Rev. Adam Clayton Powell -- "Keep the faith,
baby!"

If alternatively one decided to improve the performance by putting two
processors side by side with shared memory, one would find approximately
2.2 times as much hardware.  The additional two tenths in hardware
accomplishes the crossbar switching for the sharing.  The resulting
performance achieved would be about 1.8.  The latter figure is derived
from the assumption that each processor utilized half of the memories
about half of the time.  The resulting memory conflicts in the shared
system would extend the execution of one to two operations by one
quarter of the time.  The net result is a price/performance degradation
to 0.8 rather than an improvement to 2.0 for the single larger
processor.

Comparative analysis with associative processors is far less easy and
obvious.  Under certain conditions of regular formats there is a fairly
direct approach.  Consider an associative processor designed for pattern
recognition, in which decisions within individual elements are forwarded
to some set of other elements.  In the associative processor design the
receiving elements would have a set of source addresses which recognize
by associative techniques whether or not it is to receive the decision
of the currently declaring element.  To make a corresponding special
purpose non-associative processor one would consider a receiving element
and its source addresses as an instruction, with binary decisions
maintained in registers.  Considering the use of thin film memory, an
associative cycle would be longer than a non-destructive read cycle.  In
such a technology the special purpose non-associative processor can be
expected to take about one-fourth as many memory cycles as the
associative version and only about one-sixth the time.  These figures
were computed on the full recognition task, with somewhat differing
ratios in each phase.  No blanket claim is intended here, but rather
that each requirement should be investigated from both approaches.

END of paper.
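The argument above is usually boiled down to the speedup bound quoted in
the next section (and computed by the amlaw(1) command mentioned there).
A minimal C sketch of that bound follows; the 10% serial fraction is an
illustrative assumption of ours, not a number from the paper:

	/* amdahl.c -- minimal sketch of the Amdahl's Law speedup bound.
	 * Not the UNICOS amlaw(1) utility; just the formula
	 *     S(P) = 1 / (s + (1-s)/P),   S -> 1/s as P -> infinity,
	 * where s is the serial fraction of the work.
	 */
	#include <stdio.h>

	static double speedup(double s, double p)
	{
	    return 1.0 / (s + (1.0 - s) / p);
	}

	int main(void)
	{
	    double s = 0.10;   /* assumed serial fraction: 10% */
	    int    p;

	    for (p = 1; p <= 1024; p *= 4)
	        printf("P = %4d   S = %5.2f\n", p, speedup(s, (double) p));
	    printf("P -> oo    S = %5.2f\n", 1.0 / s);
	    return 0;
	}

With s = 0.10 the bound never exceeds 10 no matter how many processors
are added; a 200-fold speedup of the kind the Karp and Bell prizes ask
for needs a serial fraction below roughly half a percent (or a scaled
problem, as in the Gustafson/Montry work cited above).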
See also: CRI UNICOS has a useful command, amlaw(1), which does simple
number crunching on Amdahl's Law.
------------
On a CRI system type: `man amlaw`.

	                  1            1
	S  =  lim    -----------  =   ---
	     P->oo        1-s          s
	             s + -----
	                   P

	S   = speedup which can be achieved with P processors
	s   (small sigma) = proportion of a calculation which is serial
	1-s = parallelizable portion

	Speedup_overall =
	    1 / ((1 - Fraction_enhanced) + (Fraction_enhanced / Speedup_enhanced))


Articles to parallel@ctc.com (Administrative: bigrigg@ctc.com)
Archive: http://www.hensa.ac.uk/parallel/internet/usenet/comp.parallel