Newsgroups: comp.parallel,comp.sys.super
From: eugene@sally.nas.nasa.gov (Eugene N. Miya)
Reply-To: eugene@george.arc.nasa.gov (Eugene N. Miya)
Subject: [l/m 2/27/98] network resources -- comp.parallel (10/28) FAQ
Organization: NASA Ames Research Center, Moffett Field, CA
Date: 10 Mar 1998 13:03:19 GMT
Message-ID: <6e3dmn$836$1@cnn.nas.nasa.gov>
Archive-Name: superpar-faq
Last-modified: 20 Jan 1998

10 Related news groups, archives, test codes, and other references
12 User/developer communities
14 References, biblios
16
18 Supercomputing and Crayisms
20 IBM and Amdahl
22 Grand challenges and HPCC
24 Suggested (required) readings
26 Dead computer architecture society
28 Dedications
 2 Introduction and Table of Contents and justification
 4 Comp.parallel news group history
 6 parlib
 8 comp.parallel group dynamics

Related News Groups
-------------------
Child groups:
comp.parallel.pvm (unmoderated)
        http://www.epm.ornl.gov/pvm/pvm_home.html
        http://www.netlib.org/pvm3/index.html
        http://www.netlib.org/pvm3/faq_html/faq.html
        http://www.netlib.org/pvm3/book/node1.html
        ftp://netlib2.cs.utk.edu/pvm3
        http://www.nas.nasa.gov/NAS/Tools/Outside/
comp.parallel.mpi (unmoderated)
        http://www.mcs.anl.gov/mpi

comp.arch (unmoderated)                 # our parent news group.
comp.arch.arithmetic (unmoderated)      # step kids
comp.arch.storage: many news groups discuss parallelism/HPC
comp.os.research (moderated, D. Long, UCSC)
comp.sys.convex (unmoderated)   # I wonder if this will become an h-p group.
comp.sys.alliant (unmoderated)
comp.sys.isis: Isis is a commercial message passing package for C and
        Fortran (at least); features fault-tolerance.
# defunct:
# comp.sys.large (unmoderated)  # The term "Big iron" is used.
#                               # more mainframes and distributed networks
comp.sys.super (unmoderated)
comp.sys.transputer (unmoderated) (consider also OCCAM here)
comp.unix.cray (unmoderated)
comp.research.japan (moderated, R.S., UA)/soc.culture.japan (unmoderated)
sci.math.* in many different forms; you will even find parallelism in
        places like bionet.computational, but it is not the intent of
        this list to be anywhere near complete.  Locate application
        areas of interest.
comp.benchmarks
aus.comp.parallel
fj.comp.parallel (can require 16-bit character support)
alt.folklore.computers: computing history, fat chewing
others

Note: all these news groups (and other news groups) are given as options.
Nothing will stop you from posting in this news group on most any topic.

Where are the parallel applications?
------------------------------------
Where are the parallel codes?
-----------------------------
Where can I find parallel benchmarks?
=====================================

High performance computing has important historical roots with some
"sensitivity:"

1) Remember the first computers were used to calculate the trajectory
of artillery shells, crack enemy codes, and figure out how an atomic
bomb would work.  You are fooling yourself if you think those
applications have disappeared.

2) The newer users, the simulators and analysts, tend to work for
industrial and economic concerns which are highly competitive with one
another.  You are fooling yourself if you think someone is going to
just place their industrial strength code here.  Or give it to you.

So where might I find academic benchmarks?

        parlib@hubcap.clemson.edu       send index
        netlib@ornl.gov                 send index from benchmark
        nistlib@cmr.ncsl.nist.gov       send index

See also: Why is this news group so quiet?
Other news groups: sci.military.moderated

We also tend to have many "chicken-and-egg" problems.

"We need a big computer."
"We can design one for you. Can you give us a sample code?" "No." ... New benchmarks might be best requested in various application fields. sci.aeronuatics (moderated) sci.geo.petroleum sci.electronics sci.physics.* sci.bio.* etc. Be extremely mindful of the sensitive nature of collecting benchmarks. Obit quips: MIP: Meaningless Indicators of Performance Parallel MFLOPS: The "Guaranteed not to exceed speed." Where can I find machine time/access? ===================================== Ask the owners of said machines. What's a parallel computer? =========================== A bunch of expensive components. Parallelism is not obvious. If you think it is, I can sell you a bridge. The terminology is abysmal. Talk to me about Miya's exercise. the problem is mostly (but not all) in the semantics. Is parallel computing easier or harder than "normal, serial" programming? ========================================================================= Ha. Take your pick. Jones says no harder. Grit and many others say yes harder. It's subjective. Jones equated programming to also mean "systems programming." In 1994, Don Knuth in a "Fire Side Chat" session at a Conference when asked, (not me): "Will you write an "Art of Parallel Programming?" replied: "No." Knuth did not. One group of comp.parallel people hold that parallel algorithm is an oxymoron: that an algorithm is inherently serial by definition. How can you scope out a supercomputing/parallel processing firm? ================================================================ Lack of software. What's your ratio of hardware to software people? Lack of technical rather than marketing documentation. When will you have architecture and programming manuals? Excessive claims about automatic parallelization. What languages are you targeting? See Also: What's holding back parallel computer development? ================================================== "I do not know what the language of the year 2000 will look like but it will be called FORTRAN." --Attributed to many people including Dan McCracken, Seymour Cray, John Backus... All the Perlis Epigrams on this language: 42. You can measure a programmer's perspective by noting his attitude on the continuing vitality of FORTRAN. --Alan Perlis (Epigrams) 70. Over the centuries the Indians developed sign language for communicating phenomena of interest. Programmers from different tribes (FORTRAN, LISP, ALGOL, SNOBOL, etc.) could use one that doesn't require them to carry a blackboard on their ponies. --Alan Perlis (Epigrams) 85. Though the Chinese should adore APL, it's FORTRAN they put their money on. --Alan Perlis (Epigrams) See also #68 and #9. FORTRAN | C | C++ | etc. ------------------------ Why don't you guys grow up and use real languages? ================================================== The best way to answer this question first is to determine what languages the questioner is asking (sometimes called 'language bigots'). What's a 'real' langauge? This is a topic guaranteed to get yawns from the experienced folk, you will only argue with newbies. In two words, many of the existing application programs are: "Dusty decks." You remember what a 'card deck' was right? These programs are non-trivial: thousands and sometimes millions of lines of code whose authors have sometimes retired and not kept on retainer. A missing key concept is "conversion." Users don't want to convert their programs (rewrite, etc.) to use other languages. Incentives. 
See also: Statement: Supercomputers are too important to run interactive
operating systems, text editors, etc.

Don't language converters like f2c help?
----------------------------------------

No.  Problems fall into several categories:

1) Implementation specific features: you have a software architecture
   to take advantage of certain hardware specific features (it doesn't
   have to be vectors; it could be I/O, for instance).  A delicate
   tradeoff exists between using said features vs. not using them for
   reasons like portability and long-term program life.
   E.g., Control Data Q8xxxxxx based subprogram calls, while having
   proper legal FORTRAN syntax, involved calls to hardware and software
   which didn't exist on other systems.  Some of these calls could be
   replaced with non-vector code, but why?  You impulse purchased the
   machine for its speed to solve immediate problems.

2) Some language features don't have precisely matching/corresponding
   semantics.  E.g., dynamic vs. static memory use.

3) Etc.

These little "gotchas" are very annoying and frequently compound into
serious labor.

What's wrong with FORTRAN?  What are its problems for parallel computing?
--------------------------------------------------------------------------

The best non-language-specific explanation of the parallel computing
problem was written in 1980 by Anita Jones on the Cm* Project.
Paraphrasing:

1) Lack of facilities to protect and insure the consistency of results.
   [Determinism and consistency.]
2) Lack of adequate communication facilities.
   [What's wrong with READ and WRITE?]
3) Lack of synchronization (explicit or implicit) facilities.
   [Locks, barriers, and all those things.]
4) Exception handling (miscellaneous things).

The problems she cited were: consistency, deadlock, and starvation.

FORTRAN's (from 1966 to current) problems:
        Side effects (mixed blessing: re: random numbers)
        GOTOs (the classic software engineering reason)
        Relatively rigid, poor data structures
        Relatively static run time environment semantics

68. If we believe in data structures, we must believe in independent
    (hence simultaneous) processing.  For why else would we collect
    items within a structure?  Why do we tolerate languages that give
    us the one without the other?
        --Alan Perlis (Epigrams)

9. It is better to have 100 functions operate on one data structure
   than 10 functions on 10 data structures.
        --Alan Perlis (Epigrams)

A few people (Don Knuth included) would argue that the definition of an
algorithm contradicts certain aspects of parallelism.  Fine.  We can
speak of parallel (replicated) data structures, but the problem of
programming languages and architectures covers more than education and
math.

Programming language types (people) tend either to develop specialized
languages for parallelism or to add operating system features.  The
issue is assuming determinism and consistency during a computation.
If you don't mind the odd inconsistent error, then you are lucky.
Such a person must clearly write perfect code every time.  The rest of
us must debug.

"Drop in" parallel speed-up is the Holy Grail of high performance
computing.  The Holy Grail of programming and software engineering has
been "automatic programming."  If you believe we have either, then I
have a big bridge to sell you.

Attempts to write parallel languages fall into two categories:

completely new languages: with new semantics in some cases,
        e.g., APL, VAL, ID, SISAL, etc.
add ons to old languages: with new semantics and hacked on syntax.
The latter fall into two types:

        OS like constructs such as semaphores, monitors, etc., which
        tend not to scale.  ("Oh, yeah, you want concurrency, well, let
        me help you with these....")  Starting with Concurrent Pascal,
        Modula, etc.

        Constructs for message passing or barriers thought up by
        numerical analysts (actually these are two vastly different
        subtypes (oversimplified)).  Starting with "meta-adjective"
        FORTRAN.

Compilers and architectures ARE an issue (can be different).
One issue is programmability or ease of programming.
Two camps:
        parallel programming is no different than any other
        programming.  [Jones is an early ref.]
and
        Bull shit!  It's at least comparable in difficulty to "systems"
        programming.  [Grit and McGraw is an early ref.]

Take a look at the use of the full-empty bit on Denelcor HEP memory
(and soon Tera).  This stuff is weird if you have never encountered it.
I'm going to use this as one example feature, but keep in mind that
other features exist.  You can find "trenches" war stories (mine fields
for Tera to avoid [they know it]).  Why?  Because the programmers are
very confident they (we) know what they (we) are doing.  BUZZT!  We
(I mean Murphy) screw up.  The difficulty comes (side effects) when you
deal with global storage (to varying degrees, if you have ever seen
TASK COMMON).  You have difficulty tracing the scope.  Architecture
issues.

I'd like to see serial codes which have dead-lock and other problems.
I think we should collect examples (including error messages) and put
them on display as warnings (tell that to the govt. ha!).

The use of atomic full-empty bits might be the parity bits of the
future (noting that the early supercomputers didn't have parity).  How
consistent do you like your data?  Debug any lately?  Don't get fooled
that message passing is any safer.  See the Latest Word on Message
Passing.  You can get just as confused.

Ideally, the programmer would LOVE to have all this stuff hidden.
I wonder when that will happen?  What makes us think that as we scale
up processors, we won't make changes in our memory systems?  Probably
because von Neumann memories are so easily made.

Communication: moving data around consistently is tougher than most
people give credit, and it's not parallelism.  Floating point gets too
much attention.

Solutions (heuristic):

education: I think we need to make emulations of older designed
machines like the HEP available (public domain for schools).  The
problem is that I don't trust some of those emulators, because I think
we really need to run them on parallel machines, and many are
proprietary and written for sequential machines.  The schools
potentially have a complex porting job.  I fear that old emulations
have timing gotchas which never got updated as the designs moved into
hardware.  Just as PC software got hacked, some of these emulators
could use some hacking.

Another thing I thought was cool: free compilers.  Tim Bubb made his
APL compiler available.  I'm not necessarily a fan of APL, but I moved
it over to a Convex for people to play with during a trip back East.
I doubt people (Steve: [don't fall off those holds with Dan belaying])
had time to bring APL up on the Convex and get vector code generation
working.  The information learned from that kind of experience needs to
get fed back to compiler writers.  That's not happening now.

Patrick and I spent an amusing Dec.
evening standing outside the Convex HQ pondering what it might be like
raising a generation of APL (or parallel language) hacker kids: either
they will be very good, or they will be confused as hell.

For a while the ASPLOS conferences were pretty great conferences (Arch.
support for PLs and OSes).  Have not been to one lately.

Theory alone won't cut it.  You want a Wozniak.  This is debatable, of
course.  He needs the right tools.  They don't exist now.  Maybe.
Math won't be enough.  To quote Knuth: mathematicians don't understand
the cost of operations.  (AMM)  Perlis had that too (for the LISP guys,
not directly for parallelism, in the FAQ).  Beware the Turing tarpit.

Compilers reflect the architecture, and there is some influence on
architecture by compilers, but that vicious circle doesn't have enough
components to be worth considering.  Blind men attempting to describe
an elephant.

> The computer languages field seems to have no Dawkins, no Gould, no
> popularizers and not even any very good text books.  Every one of the
> books I have tried has come across as boring, poorly structured, making
> no obvious attempt to justify things and completely unwilling to stand by
> itself.

That's largely because Dawkins and Gould are making observations.  They
are not attempting to construct things which have never existed before.
I say that after reading Gould's text book (O&P, very good), not his
popular books (like Mismeasure of Man), which are enjoyable.

Wirth and Ichbiah are pretty bright guys, but they are not above making
mistakes.  Niklaus himself wrote a letter/article when he forgot an
important piece of syntax in Pascal: the catch-all exception to the
multiway branch, a.k.a. "OTHERWISE" in some compilers, "ELSE" in other
compilers, and a slew of other keywords having identical semantics.
It was almost impossible to get this simple (ha!) fix added to the
language.  Ritchie is pretty bright, too.

I recommend The History of Programming Languages II (HOPL-II) published
by ACM Press/Addison-Wesley.  I can tell you there are no Silicon
Valley copies at Comp Lit Bookshop as I cleaned them all out for
friends (you might find a copy at Stacey's in Palo Alto; the Sunnyvale
library has a copy (I was impressed)).  Backus is also bright.  Bill
Wulf in conversation with me suggested that the Griswolds are also
bright.  Oh, a LISP cross post: I occasionally see John McCarthy at
Stanford and Printers Inc.  John is also quite bright.  I signed his
petition against Computing the Future.  All bright guys, and they all
learned (made mistakes along the way).

The brightest, most inspired language designers I can think of might be
Alan Kay and Adele Goldberg and their work on Smalltalk-80.  If you are
using a windowing system, you are most likely using a system inspired
by them.  A very impressive chapter in HOPL-II is about them (see the
paragraphs referring to "management").

You merely need a decent library or a six-year IEEE member to get the
gist.  Two articles stand out (one comes from the MIT AI Lab [and
Stanford]).  The two articles stand as an interesting contrast: one is
a perfect example of the problems cited by the other.  The order in
which you read these articles might highly influence your perception,
so I will cite them in page order.  Fair enough?  [The annotations are
NOT all mine (collected over time).  In particular see the last
sentence of the first annotation to the first article.]

%A Cherri M. Pancake
%A Donna Bergmark
%T Do Parallel Languages Respond to the Needs of Scientific Programmers?
%J Computer
%I IEEE
%V 23
%N 12
%D December 1990
%P 13-23
%K fortran, shared memory, concurrency,
%X This article is a must read about the problems of designing,
programming, and "marketing" parallel programming languages.  It does
not present definitive solutions but is a descriptive
"state-of-the-art" survey of the semantic problem.  The paper reads
like the "war of the sexes": computer scientist versus computational
scientist; some subtle topics (like shared memory models) are
mentioned.  An excellent table summarizes the article, but I think
there is one format error [e.g. of barriers versus subroutines].  It is
ironically followed by an article by computer scientists typifying the
authors' thesis.
%X Points out the hierarchical model of "model-making" (4-level), very
similar to Rodrigue's (LLNL) parallelism model (real world -> math
theory -> numerical algorithms -> code).
%X Table 1:
Category        For scientific researcher         For computer scientist

Convenience     Fortran 77 syntax                 Structured syntax and
                                                  abstract data types
                Minimal number of new             Extensible constructs
                constructs to learn
                Structures that provide           Less need for fine-grain
                low-overhead parallelism          parallelism

Reliability     Minimal number of changes to      Changes that provide
                familiar constructs               clarification
                No conflict with Fortran models   Support for nested scoping
                of data storage and use           and packages
                Provision of deterministic        Provision of non-deterministic
                high-level constructs (like       high-level constructs (like
                critical sections, barriers)      parallel sections,
                                                  subroutine invocations)
                Syntax that clearly               Syntax distinctions less
                distinguishes parallel from       critical
                serial constructs

Expressiveness  Conceptual models that support    Conceptual models adaptable
                common scientific programming     to wide range of programming
                strategies                        strategies
                High-level features for           High-level features for
                distributing data across          distributing work across
                processors                        processors
                Parallel operators for array/     Parallel operators for
                vector operands                   abstract data types
                Operators for regular patterns    Operators for irregular
                of process interaction            patterns of process
                                                  interaction

Compatibility   Portability across range of       Vendor specificity or
                vendors, product lines            portability to related
                                                  machine models
                Conversion/upgrading of           Conversion less important
                existing Fortran code             (formal maintenance
                                                  procedures available)
                Reasonable efficiency on most     Tailorability to a variety
                machine models                    of machine models
                Interfacing with visualization    Minimal visualization
                routines                          support
                Compatibility with parallel       Little need for "canned"
                subroutine libraries              routines

%A Andrew Berlin
%A Daniel Weise
%T Compiling Scientific Code Using Partial Evaluation
%J Computer
%I IEEE
%V 23
%N 12
%D December 1990
%P 25-37
%r AIM 1145
%i MIT
%d July 1989
%O 21 pages, $3.25
%Z Computer Systems Lab, Stanford University, Stanford, CA
%d March 1990
%O 31 pages, $5.20
%K partial evaluation, scientific computation, parallel architectures,
parallelizing compilers,
%K scheme, LISP,
%X Scientists are faced with a dilemma: Either they can write abstract
programs that express their understanding of a problem, but which do
not execute efficiently; or they can write programs that computers can
execute efficiently, but which are difficult to write and difficult to
understand.  We have developed a compiler that uses partial evaluation
and scheduling techniques to provide a solution to this dilemma.
%X Partial evaluation converts a high-level program into a low-level
program that is specialized for a particular application.  We describe
a compiler that uses partial evaluation to dramatically speed up
programs.  We have measured speedups over conventionally compiled code
that range from seven times faster to ninety one times faster.  Further
experiments have also shown that by eliminating inherently sequential
data structure references and their associated conditional branches,
partial evaluation exposes the low-level parallelism inherent in a
computation.  By coupling partial evaluation with parallel scheduling
techniques, this parallelism can be exploited for use on heavily
pipelined or parallel architectures.  We have demonstrated this
approach by applying a parallel scheduler to a partially evaluated
program that simulates the motion of a nine body solar system.

The Latest Word on Message Passing
----------------------------------
OpenMP --- http://www.openmp.org
(A short illustrative MPI sketch appears later in this panel.)

Why does the computer science community insist upon writing these
=================================================================
esoteric papers on theory which no one uses anyway?
=======================================================
Why don't the computer engineers just throw the chips together?
===============================================================

It's the communications, stupid!  CREW, EREW, etc., etc.

Over the years, many pieces of email were exchanged in private
complaining about the parallel processing community from applications
people.  Topics which specifically appear to irk applications people in
discussion:

        Operating systems.
        New programming languages.
        Multistage interconnection networks.
        Load balancing.

This is a short list; I know that I will remember other topics and
other people will remind me (anonymously).  Of course the applications
people would like "drop-in" automatic parallelization (it will come
after we have drop-in "automatic programming").  A.k.a.: the
anything-to-get-my-program-to-run-faster crowd.  Short of added cost.
One noted early paper paraphrased: If a cook can parallel process, why
can't computer people?

Boy, you guys sure argue a lot.
===============================

It's academic.  The bark so far is worse than the bite.  The name
calling can be found in various parts of the literature (e.g., "Polo
players of science...").  Many misunderstandings have evolved:

``An exciting thing was happening at Livermore.  They were building a
supercomputer [the LLNL/USN S-1], and I will certainly confess to being
a cycle junkie.  Computers are never big enough or fast enough.  I have
no patience at all with these damned PC's.  What I didn't realize when
I went over to Livermore was that as long as physicists are running the
show you're never going to get any software.  And if you don't get any
software, you're never going to get anywhere.  Physicists have the most
abysmal taste in programming environments.  It's the software
equivalent of a junk-strewn lab with plug boards, bare wires and
alligator clips.  They also seem to think that computers (and
programmers for that matter) are the sorts of things to which you
submit punched card decks like you did in the mid-sixties.''
        --- Bill Gosper, in ``More Mathematical People: Contemporary
        Conversations'' (Donald J. Albers et al., Eds.; Harcourt Brace
        Jovanovich)  [Gosper is a well-known LISPer.]
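Since this panel points at comp.parallel.pvm, comp.parallel.mpi, and
"The Latest Word on Message Passing" without showing any code, here is
a minimal sketch of explicit message passing using the standard MPI C
binding.  It is illustrative only, not a recommendation from any of the
postings collected here; the compiler wrapper (mpicc) and launcher
(mpirun) names vary by installation, and the tag value and variable
names are arbitrary.

/* Minimal MPI sketch (C binding): every rank except 0 sends its rank
   number to rank 0, which prints whatever arrives.  Typically built
   with something like "mpicc hello.c" and run with
   "mpirun -np 4 ./a.out" (command names are installation-dependent). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank != 0) {
        /* Workers: send our rank to rank 0, message tag 0. */
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        int i, who;
        MPI_Status status;
        for (i = 1; i < size; i++) {
            /* Accept messages in any order; senders finish out of order. */
            MPI_Recv(&who, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &status);
            printf("rank 0 heard from rank %d\n", who);
        }
    }

    MPI_Finalize();
    return 0;
}

Note that nothing here orders the arrivals; only the fact that rank 0
does all the printing keeps the output coherent.  That is exactly the
kind of I/O-synchronization caveat raised under the debugging question
later in this panel.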
Computing the future: a broader agenda for computer science and
engineering / Juris Hartmanis and Herbert Lin, editors; Committee to
Assess the Scope and Direction of Computer Science and Technology,
Computer Science and Telecommunications Board, Commission on Physical
Sciences, Mathematics, and Applications, National Research Council.
Washington, D.C.: National Academy Press, 1992.

Petition by John McCarthy, John Backus, Don Knuth, Marvin Minsky, Bob
Boyer, Barbara Grosz, Jack Minker, and Nils Nilsson rebutting
"Computing the Future," 1992.
        http://www-formal.stanford.edu/jmc/petition.html

The FAQ maintainer is one of the signatories of this petition and one
of the few people apparently to have read the Report.  The Report has a
good set of references further citing problems.  Some of these problems
go away when checkbooks are brought out.  Don't let anyone tell you
there isn't a gulf between CS and other activities; in some cases it is
better than this, and in other cases it's worse.

Perhaps here's another FAQ? ;)

Q: How do experts debug parallel code?
======================================

A1: Debug?  Just don't make any programming mistakes.
A2: Use print statements.

Debugging: Yep, we discuss it.  No good ways.  People who say they have
good ways have heavy context.  What does the FAQ recommend?  Long
timers point to A2.  I know people who pride themselves on A1; they
really believe they do it.  One reader so far recommends Fox's
textbooks (1994 in particular).  This section needs more thrashing,
which takes time I don't have.  You can gauge a person's perspective by
how advanced they believe debugging technology is.

What's holding back parallel computer development?
==================================================

Software, in one word.  Don't hold your breath for "automatic
parallelization."  [Yes, we have a tendency to butcher language in this
field.  I have a Grail to sell you.]  See my buzzword: "computational
break-even" (like break-even in controlled fusion research).  We have
yet to reach computational break-even.

I think I first saw, but did not realize, the nature of the problem
when I saw the ILLIAC IV hardware being removed when I arrived here.
If you ever see a photo of this old machine, you should realize that
the ribbon cable (the first ever developed) connecting the control unit
(typically on the left side of the picture) to the most physically
distant processors had a length IDENTICAL to that for the nearest
processors (big circular loops of wire).  This hardware makes a
difference in software which only the naive (like me at the time) don't
appreciate (those concentrating on the hardware have less appreciation
for software synchronization).  I do have photos which I took for Alan
Huang of similar Cyber 205 coils of wire.  This problem exists today on
even the simplest SIMD or MIMD systems.  Parallelism does not imply
(you cannot assume) synchronous systems.

Another very bad implicit assumption is that software tools developed
on sequential machines are useful stepping stones for parallel
machines.  This is not the case.  Debuggers are good examples.
Debuggers are bad (they are well-intended, but they tell you what you
ask of them, not what you want to know; a reasonable set of literature
exists on debugger problems), and debuggers also tend to be
non-standard, dependent on system specific features (you will hear the
phrase "PRINT statements" a lot, but that assumes I/O synchronization).
The typical comparison is that programming parallel machines is like
taking a step back to the assembly language era.
This is simultaneously (this is the parallelism group after all) true
and false (get used to paradox).  Current problems are compounded by
coding in lower-level, non-standard languages (e.g., assembly),
different OSes, libraries, calls, tools, etc.  I will assert that one
of the reasons why microcomputers (serial machines) and interaction got
ahead of batch oriented systems was the result of interactive tools
like interpreters.  I think they were easily hacked with improvements
(gross generalization) and have not had the applications pressure of
commercial and some research systems.

When will parallel computing get easy?
======================================

God knows.

A real story: In 1986 or so, I was introduced to a 16 year old son of
two computer science professors.  The kid was clearly bright.  He
planned to construct a "two-processor parallel processor machine" using
MC68000s which his parents would purchase.  Those were his words, not
"dual-processor."  This is a great idea, and I listened to his plans
for shared registers, locks, etc.  I can't remember if the machine was
going to be a master-slave or a symmetric multiprocessor, but the kid
had thought about this.  On the issue of programming and software, the
kid said, "I will leave it to the programmer to keep things straight."
Typical hardware designer.  My immediate concluding comment was:
"Have your mother tell you about operating systems."

I suspect the future might lie with the students of Seymour Papert, the
Lego(tm) Chair of Computing at MIT, except that the kids of that group
put together Lego Blocks(tm) the way scientists need to assemble their
future.  Such Lego computing must:
1) be easily assembled (like plugging in wall sockets),
2) have software which will "work" in a plug and play fashion,
3) have ideas I have not even begun to cover.

But before we have Lego computing, I suspect we will have high school
students who do parallel computing projects, and I've not seen many of
those yet.  It's just too hard, too resource intensive.  In my
generation it was to build a two-inch long ruby laser; a later
generation made warm superconductors, but we have yet to see parallel
machines.  See also SuperQuest.

What are the especially notable schools to study parallelism and
================================================================
supercomputing?
===============

Concerning the parlib files, I think a lot of this is way out of date,
especially the list of schools supporting parallel processing work.
I think a better way to present this information would be a web page
(perhaps extracted from my page) which gives a pointer to parallel
groups at the various schools.

Those schools associated with supercomputing centers:
        U. Ill./NCSA
        CMU/PSC
        UMn/MSC
        UCSD/SDSU/SDSC
        U. Tsukuba
Others:
        Cornell
        SUNY
        Syracuse
        OGI
        Rice, Stanford, MIT, Berkeley, UWashington, UMaryland,
        and Harvard

My, my, this group seems to be dominated by FORTRAN PDE solvers.
================================================================
What about LISP (and Prolog, etc.) machines?
============================================

LISP people post in this group on occasion to ask about LISP
supercomputers.  From the late 60s, the LISP people have been derided.
It's just a sort of fact of life in this field.  We did have a few
discussions about the porting of Common LISP on the Cray when it
happened (world's fastest LISP) and the initial use of *LISP on the CM.
But that's about it.  Nothing will stop you from posting on the topic
except yourself (ignore the LISP hecklers).
See also:
        comp.lang.lisp.franz
        comp.lang.lisp.mcl
        comp.lang.lisp.x
        comp.org.lisp-users
        comp.std.lisp
        de.comp.lang.lisp
        comp.ai.alife
        comp.ai.edu
        comp.ai.fuzzy
        comp.ai.genetic
        comp.ai.jair.announce
        comp.ai.nat-lang
        comp.ai.neural-nets
        comp.ai.nlang-know-rep
        comp.ai.philosophy
        comp.ai.shells
        comp.ai.vision
        comp.ai.games
        comp.ai.jair.papers

Search for discussions on "AI winter."

NOTE: THERE IS NOTHING WRONG WITH CROSS-POSTING.  (This differs from a
multi-post to separate news groups.)  JUST DO IT INTELLIGENTLY.
REMEMBER TO SET Followup-To: WITH A NEWS GROUP OR A SMALLER NUMBER OF
RELEVANT CROSS-POSTED NEWS GROUPS.  Or SET TO "poster".
                                               ^^^^^^^^

What about fault-tolerant parallel systems?
===========================================

What is the MTBF of 2000 PCs?

Several well known parallel computing books have pointed out the need
for fault-tolerant design from the start.  Parallel machines are not
inherently more reliable than sequential machines.  They require that
reliability be designed into systems from conception.

If you want to start a joke, use the words: "Graceful degradation."

Yes, we know, and the group has discussed, Seymour Cray's "Parity is
for farmers" quote.  And the group notes that the Cray-1 (and all
subsequent Cray architectures) have parity.  This is old hat.
"Even farmers buy computers."  This may be apocryphal.
[**What's the joke?  It's kind of a midwest 70's type of thing.  In the
1970's, the farm price support programs were known as parity (to give
the farmer parity between the selling price and the cost of producing
the food? -- in any case, it was the government paying farmers).**]

Additional context: It has been pointed out that, when the 6600 was
designed, parity error detection might well have caused more problems
than it solved (due to the relative reliability of 1963 vintage logic
circuits and core memory).

But what about things like the Tandem?  Like LISP, the question gets
asked here, but it lacks critical mass discussion.  Yes, Tandem is on
the net [I had a climbing partner who used to work there]; there's a
mail-list and a news group for them.  Unfortunately, our news group
biases stay with us.  You are welcome to discuss/raise the topic here,
but no guarantees exist that a discussion will spark (like wet wood).

BASIC IDEA: Computers are used as a means to solution of problems for
one primary and one secondary reason (and numerous tertiary reasons):

1) Performance: computers can do something faster than some other tools
(analog, general purpose vs. special purpose, etc.).  If something can
be done faster w/o a computer, that thing will likely dominate.  This
is the analytic justification which covers areas such as the
non-simulation, grand challenge problems, cryptography, etc.

2) With the emergence of computers, the flexibility of software has
proven computers quite useful independent of performance.  This is the
"What if?" justification which is the reason for simulation, VR, etc.

Traffic influences what gets posted in the group.  Interest influences
traffic.

See also:
        comp.os.research
        comp.risks

Naw, don't let me bias you, post away in this group.

What about Japanese supers?
===========================
What about the Japanese?

Yes, they are here.  They read this group, and they redistribute
postings internally as they see fit.  Oh yes, there is also
fj.comp.parallel in the Other news groups category, but you need to
know how to display and read 16-bit Kanji.

One special reference to this group is Dr. David Kahaner.  You will
hear about the Kahaner Reports.
David is a numerical analyst by training.  Formerly with LANL and
NBS/NIST, and under funding of ONR, he did a stint reporting
electronically on technology events happening in Japan from Tokyo for
many years, and then became popular on the conference keynote speaking
circuit.  These reports became particularly popular in the late 80s
with "people in the know."  This group found DKK reports propagating to
a now defunct magazine because of a typo in his name (this is called
traffic analysis).  David actually asked for help in the news group
soc.culture.japan, proof that the soc.* groups are useful.  [The net
was smaller at that time.]  David does not appreciate being quoted out
of context (i.e. as evidence that American academia and industry are
keeping up with developments in other parts of the world, as evidenced
once in comp.os.research; he does believe the US does not keep track of
world developments).

To this end: David has now left the wing of the ONR and has set up the
ATIP: Asian Technology Information Program.  This highly useful and
deserving program can also use other sources of funding.  See also:
comp.research.japan.  If you quote David's work, please cite him.
David and the ATIP appreciate large monetary contributions.

And yes, people have posted racist statements in comp.sys.super.
The three letter shortening of "Japan" is appreciated by no one
significant in this group.  Save yourself the trouble: don't bother
pissing me off.  Don't use it.  Don't encourage others to use it.

Ref:
%A Raul Mendez, ed.
%T High Performance Computing: Research and Practice in Japan
%I Wiley
%C West Sussex, England
%D 1992
%K book, text,
%O ISBN 0-471-92867-4
And many others

Where are the Killer Micros?
----------------------------
Where are the Networks of Workstations (NOW)?
=============================================

See the separate PVM and MPI news groups.
I need to flesh this out with Hugh LaMaster.

What are the Killer Micros?
===========================

Commodity microprocessors, usually cited in the context of loosely
coupled networked workstations or assembled into massively parallel
systems.  The term was quoted and first appeared at Supercomputing '89
in Reno, NV, by Eugene Brooks, III (Caltech and LLNL).  Hey, I got my
killer micros tee-shirt.

Supercomputing people can identify the real orientation of a computer
person by asking: How many bits are in "single-precision?"  And they
should know why.  Answers tend to fall in three facial expressions.
Exponent bits are also important.  Now, what a machine does during
arithmetic is a different issue.

"Our new computer is *going* to be as fast as 'insert fastest existing
CRI CPU...'" -- Kiss of death  (See the Xerox ad on the same theme.)

What's this I hear about 'super-linear speed up?'
=================================================

Q: What is superlinear speedup?

This is when the time to run a code on a "P" processor machine is less
than 1/P of the time to run it on a single processor.  This is a
fallacy.  Typically, this phenomenon is attributed to caching ("P"
caches are available in parallel execution, whereas only a single cache
is utilized for the uniprocessor execution).

|I think the issue of cache usage enhancement, which is probably the
|most frequent reason for seeing "superlinear speedup" (and why this
|isn't really superlinear), is worth mentioning.  It brings up two of
|_my_ favorite points in high performance computing:
|1) memory bandwidth is and always has been a critical factor in
|   supercomputing
|2) Dave Bailey's "How to fool the masses..."
|   should be read and understood by everyone in the business.

A division seems to be forming between those doing AI/genetic search
algorithms and those solving, say, "dense systems of linear equations."
It is essential to know what camp you are reading/listening to.  The
part of the parallel processing community most concerned with, and
impressed by, speedup as a metric is the parallel AI community.  They
tend to be very impressed with parallel search running real fast.  They
are the most academic.  The most jaded members tend to be the numeric
folk with big problems.  IDENTIFY YOUR COMMUNITY CLEARLY (who you are
talking with).

Q: What is speedup?

Speedup is a function of "P", the number of processors, and is the time
to execute on a single processor divided by the time to execute the
same code on P processors.

Q: What is efficiency?

This is the parallel algorithm's efficiency, defined as the best known
time to solve the problem sequentially, divided by P times the parallel
execution time on P processors.

---------------------------------------------------------

Everyone should read:

%A David H. Bailey
%T Twelve Ways to Fool the Masses When Giving Performance Results on
Parallel Supercomputers
%J Supercomputing Review
%V 4
%N 8
%D August 1991
%r RNR Technical Report RNR-91-020
%d November 1991
%P 54-55
%K Performance Evaluation and Modelling,
%O http://www.nas.nasa.gov/NAS/TechReports/RNRreports/dbailey/RNR-91-020/RNR-91-020.html
%X A brief jeremiad on the abuse of performance statistics.  The author
describes a dozen ways to mislead an audience about the performance of
one's machine, and illustrates these techniques with examples drawn
from the literature.  Afraid of losing that all-important sale?  Quote
your own single-precision, assembly-level megafloppage; it's sure to
look good against your competitor's double-precision, compiled-FORTRAN
figures.
%X Abstract: Many of us in the field of highly parallel scientific
computing recognize that it is often quite difficult to match the run
time performance of the best conventional supercomputers.  This
humorous article outlines twelve ways commonly used in scientific
papers and presentations to artificially boost performance rates and to
present these results in the ``best possible light'' compared to other
systems.
%X Summary:
1. Quote 32-bit performance results, not 64-bit results, or compare
   your 32-bit results with others' 64-bit results. **
2. Present inner kernel performance figures as the performance of the
   entire application.
3. Quietly employ assembly code and other low-level language
   constructs, or compare your assembly-coded results with others'
   Fortran or C implementations.
4. Scale up the problem size with the number of processors, but don't
   clearly disclose this fact.
5. Quote performance results linearly projected to a full system. **
6. Compare your results against scalar, unoptimized code on Crays.
7. Compare with an old code on an obsolete system.
8. Base MFLOPS operation counts on the parallel implementation instead
   of on the best sequential implementation.
9. Quote performance in terms of processor utilization, parallel
   speedups or MFLOPS per dollar (peak MFLOPS, not sustained). **
10. Mutilate the algorithm used in the parallel implementation to match
    the architecture.  In other words, employ algorithms that are
    numerically inefficient, compared to the best known serial or
    vector algorithms for this application, in order to exhibit
    artificially high MFLOPS rates.
11.
    Measure parallel run times on a dedicated system, but measure
    conventional run times in a busy environment.
12. If all else fails, show pretty pictures and animated videos, and
    don't talk about performance.

See also a similarly humorous paper by Globus on scientific
visualization inspired by 12).

Statement: Supercomputers are too important to run
==================================================
interactive operating systems,
==============================
text editors, etc.
=============

Troll.  You've lost.  This is a long time debate which dates from the
60s and ended in the late 80s with ETA/CDC (and will undoubtedly come
back in the future).  The usual answers are: the human or
scientist/analyst/programmer time is more valuable than the machine
time.  You haven't seen some of the files I have had to text edit on
the Cray.  Interactive text editing of huge files on supercomputers
isn't just fun, it's productive.  You are worried about text searching?
Then grab the spies and tell them to stop buying supercomputers.

Where's the list of vendors?
============================

The list of vendors is intentionally spread (distributed, decomposed,
partitioned) across the FAQ.  We will try to have email, name, and
smail address contacts.

The PARALLEL Processing Connection - What Is It?
The PARALLEL Processing Connection is an entrepreneurial association;
parallel@netcom.com (B. Mitchell Loebel).

Dead Computer Society
.....................

Burroughs
        ILLIAC IV
        1 64-processor quadrant of a planned 4-quad, 256-PE machine.
        Numerous reference papers and a retrospective book were
        produced.  The ILLIAC IV was placed at NASA Ames (civilian
        aeronautics agency) during the height of the Vietnam War to
        avoid possible damage during on-college-campus student riots
        (computer bowl trivia).

        BSP
        One made, never delivered.

Texas Instruments (TI)
        Advanced Scientific Computer (ASC)
        Big long vector CPUs.  64-bit (or near) machine?
        (7 systems built -- "used internally" = systems used for
        Seismic Data Processing by GSI (Geophysical Services Inc.),
        a division of T.I.)
        ASC 1   prototype                       cpus=1
        ASC 1A  used internally                 cpus=1
        ASC 2   used internally                 cpus=1
        ASC 3   Army Ballistic Missile Research Centre,
                Huntsville, Alabama             (cpus=2 ?)
        ASC 4   GFDL/NOAA Princeton             cpus=4
        ASC 5   used internally                 cpus=1
        ASC 6   used internally                 cpus=1
        ASC 7   Naval Research Laboratory (NRL),
                Washington DC (1976)            (cpus=2?)

        ASC 3, later moved to:?  CEWES (Corps of Engineers Waterways
        Experiment Station, Vicksburg, MS, USA) isn't among them.  In
        the new computer center building, there's a series of photos
        which portray the History of the Center.  This includes a photo
        whose caption says "TI ASC, January 79 to November 80".  It
        doesn't give a serial number, but the dates are about right.
        MVS like OS?

Evans and Sutherland
        ES-1.  Evans & Sutherland Computer Division (ESCD) shipped two
        beta parallel supers in 1989 before taking the bullet from the
        mother company.  Nine made, THREE BETA?

        Why was E&S significant?  Evans & Sutherland is one of the
        foremost non-mainstream computer companies.  Most number
        crunchers have never heard of it, because they are best known
        for state-of-the-art computer graphics systems like those used
        in real time flight simulators (training and research).  The
        feeling was that if any company had the experience to challenge
        Seymour Cray and give him a run for his money it was going to
        be ESCD.
It has been said of E&S, the parent company that more real in-hardware matrix multiplies have been done on installed E&S machines than on most other high-performance computers. Based in Salt Lake City, Utah. ESCD was in Mountain View, CA. E&S has been somewhat slow to get into the workstation market. Ivan Sutherland (no longer with the company now at SUN) is cited as the father of modern computer graphics and was also at ARPA helping to start the ARPAnet, perhaps one of the ten or so unknown unsung scientists in computing. The most recent dead body is ACRI, the heavily subsidised French supercomputer startup that was shut down this Spring after spending $100M+ over five years. A partial prototype was running at the end. Section on CDC/ETA moved to panel 26. Ametek Hypercube && Grids (meshes)? Culler Scientific [Miya is a child of the Culler-Fried System.] Pre-ELI/VLIW Multiflow 7/128, 8/256 Superscalar, ELI, VLIW, enormously long instruction (word) very long instruction word Myrias Canadian multiprocessor firm Started with 68K based scalar processors and proprietary back end processors. Portions of the software part of the company will still exist. SEL (Systems Engineering Laboratories) (purchased by Gould Electronics; became Gould Computer Systems Division). Gould Computer Systems Division (NP-1 was its first minisupercomputer). Much of Gould was purchased by a Japanese mining concern. The Computer Systems Division was sold to Encore. [I know about this because a friend of mine used to be Manager of Latin American Sales at Gould CSD's Corporate HQ in Fort Lauderdale, FL. One NP-1 was sold in 1987 to the Mexican Institute of Petroleum. The CSD sale to Encore happened not long after that.] BBN Computer GP1000 (Butterfly), TC2000 Voice funnel, signal processing, American Supercomputer Mike Flynn (SU) Prisma GaAs, Colorado Springs, CO Vitesse GaAs, a Division of Rockwell Supercomputer Systems, Inc. (Eau Claire, Wisc.) SSI (1) Steve Chen SS-1: Photos of a 2 CPU unit with Chen in a Faraday cage in Fortune. See Chen Systems and Sequent (S. Chen, CTO). Supercomputer Systems, Inc. (San Diego) Very little is known about this firm. I know one person who worked for them. Goodyear/Loral Aerospace See also the STARAN associative SIMD processor in the literature A version of the STARAN is used in the AWACS. MPP (always rumors of a second to be delivered to the NSA) decommissioned. The first MPP machine. No floating point. VAX/VMS based front-end. Cost $10M. Bit-serial, 2 Kb memory per processing element? 8 bit wide paths. Status: the one system was donated by the NASA GSFC to the Smithsonian. It is not on display but in storage. The MPP was replaced with Maspar systems. Inmos Transputers Popular processor (1982) in some circles. Well thought out communication scheme, but problems with scalability. Crippled because it's lacking US popularity. |However, you must mention Transputers (something developed in EUROPE, |outside of the U.S.A., name comes from TRANSistor and comPUTER) and the |related companies: |* INMOS (from GB), now bought by SGS Thompson (French), who was the |inventor and sole manufacturer of transputers |* Parsytec (still alive, but does not use Transputers any more, Germany) |* Meiko (GB) produced the "computing surface" |* IBM had an internal project (codenamed VICTOR) |and there are many more. Transputers had a built in communication agent and |it was very easy to connect them together to form large message passing |machines in regular, fixed topologies. 
|INMOS' idea was that they should be the "transistors" (building
|blocks) of the new parallel computers.

Cray Computer Corp. (CCC)
        One of the nifty Cray 3's was the Cray-3/SSS.  It used some of
        the stuff done along the "beltway" if you know what I mean.
        I have the docs on the 3 and the 4.  The following comes from
        some docs I got from the CCC team.

Cray-3
------
Logic Circuits          GaAs SDFL, 500 Gate-Equivalent, 3.935 x 3.835 mm
Memory Circuits         Silicon CMOS SRAM, 4Meg x 1, 25ns Cycle Time,
                        8 MWords per Module
Modules                 4.1 x 4.1 x 0.25 Inches, 4 Modules per CPU,
                        69 Electrical Layers, 22000 Z-Axis Connections,
                        Twisted Pair Interconnect
Cooling                 Chilled Water/Fluorinert
Cabinets                System Tank and C-Prod
Typical Footprint       252 Square Feet

Cray-4
------
Logic Circuits          GaAs DCFL, 5000 Gate-Equivalent, 5.4 x 5.4 mm
Memory Circuits         21ns Cycle Time, 16 MWords per Module (SAA)
Modules                 5.2 x 5.2 x 0.33 Inches, 1 Module per CPU,
                        90 Electrical Layers, 36000 Z-Axis Connections,
                        Micro-Coaxial Interconnect
Cooling                 SAA (Also Air Cooled)
Cabinets                One Cabinet
Typical Footprint       215 Square Feet

The Cray-3/SSS had PIM's.  Nifty.

The thing is made of 5.2" x 5.2" x .33" modules, one for the CPU and
one per 16 MW of memory.  (This is 21 ns cycle time Toshiba 4Mx1 SRAM,
the same as in the Cray-3; they promise to go to 4Mx4 SRAM this year,
doubling memory size and bandwidth.)  The ratio is 4 memory modules per
CPU.  72-bit words with SECDED.

The CPU is solid GaAs.  The chips are shaved thin, placed in 4x4 arrays
on some sort of MCM-like board, and a 3x3 array of boards makes one
logic layer.  (The memory is different, as the memory chips aren't
square.)  The module has 4 logic layers (stacked on the outside) and
apparently 90 interconnect layers with 36,000 Z-axis vias.  The CPU
runs at 1 GHz.  The GaAs chips are billed as "5000 gate-equivalent",
whatever that means, in "DCFL" logic, a term I know not.

The modules are stacked against each other and Fluorinert is chilled
and pumped up to drain through them by gravity.  The CPU and memory
modules are bolted at one side to bus bars delivering 3 supply voltages
+ gnd.  On the other is a familiar-looking mat of grey wires.  The
wires are apparently actually miniature coax cables!  The connectors
are single pins on a fine (0.7 mm or so) pitch, with the male ends
being hollow gold-plated cylinders and the females being pins recessed
in plastic.  (I suppose this circularly symmetric arrangement is good
for controlled impedance or something.)

The end of the CPU module contains 4 rows of 4 sockets, each socket
containing 2 rows of 39 pins.  2/3 of the pins were signal; the others
were grounds in a S-G-S S-G-S configuration.  This provides for a total
of 832 signals from the CPU.  The memory modules leave one row of
sockets empty, leaving only 624 signals.  The sockets attached to plugs
that tended to have 3 or 6 pins and 2 or 4 wires coming out the back
ends.  It was all some sort of white plastic.  The plugs were pretty
easy to insert and remove when I tried them.  Some cables also had
plugs in the middle.

The basic unit of processing is a 4-processor "quartet" with 256 MW of
memory.  Each processor has a full-duplex 32-bit HiPPI channel.  The
processor is billed as using the IEEE floating-point format but not, I
note, as doing IEEE math.

Now, on to the architecture.  All registers are 64 bits wide.  There
are three main sets of three buses, one result bus and two operand
buses.
These are:

        Vi, Vj, Vk: The vector result and operand buses (respectively;
                    Vi is the result)
        Si, Sj, Sk: The scalar result and operand buses
        Ai, Aj, Ak: The address result and operand buses

The memory part of the diagram confuses me.  There are clearly 2
bidirectional memory ports.  Each has a "Retry Buffer".  Each is
connected to 8 rows of 4 "Memory Ranks for Each Port"; the D unit (they
are labelled A, B, C and D) of each row is shown connected to one of
the 8 "Octant"s.  Each "Octant" contains 18 (eighteen) memory banks.
Instruction fetch (8 instruction buffers with 32 entries each) is
connected to port 0.  (The line is marked "Fetch Control".)  The HiPPI
and console interfaces are connected to port 1.  They are also shown
connected to memory (directly, not via one of the ports, which confuses
me), the vector registers, the scalar registers, the address registers,
the instruction buffers, the program counter, and some utility
registers:

        - Exchange Address
        - Limit Address
        - Base Address
        - Error Register
        - Status Register

Address bus Ai, scalar bus Si and vector bus Vi all connect to a line
between the two memory ports that seems to mean "either".

There are three integer vector functional units: integer, logical, and
shift.  These get inputs from Vj, Vk, Sj, Sk, Ak or the vector mask
register and deliver results to Vi or the vector mask register.

There are two floating point functional units, shared between vector
and scalar.  These take inputs from Vjk or Sjk, and deliver results to
Vi, Si or (apparently) the vector mask register.

There are three integer scalar functional units: integer, logical and
shift.  These take input from Sj, Sk or Ak and deliver results to Si.

A 64-bit real time clock is readable on Si and writeable from Sk.  The
vector length register is readable on Ai and writeable on Ak.  There is
an address adder and a 35-bit address multiplier.  Inputs on Aj and Ak
and output on Ai.  The functional units are marked as supporting
"chaining" and "tailgating", which I don't understand and neglected to
ask about the meaning of.

The program register is readable on Ai and writeable from Aj, as well
as being fed from the instruction buffers, the console/HiPPI interface
and a built-in incrementor.  (+1/3/5)

There are 8 vector registers of length 64, triple-ported on Vi (write),
Vj and Vk (read), 8 scalar registers (Si, Sj and Sk) and 8 address
registers (Ai, Aj and Ak), as well as 64 "temporary" registers
(apparently an addition since the Cray-3) that are read/write
accessible on Vi, Si and Ai.  "Local memory has been eliminated and
replaced by a new set of registers -- the temporary (T) registers."
There are also "up to" 64 semaphore flags.

The memory transfer rate is billed as 2 GW/sec/processor.  Up to 32
processors may go into a single "node" in one cabinet.  A "cluster" bus
of 2 GB/sec (full-duplex, per node, <= 4 nodes) is promised "mid-1995"
for systems up to 128 processors.  There's some mention on a features
list of 64-bit HiPPI.  ("Support for 200 Megabyte per second, 64-bit
HiPPI channels is planned for mid-1995.")  Apparently a >4 processor
system can be subdivided and operated as a series of smaller systems.
Only even power-of-2 divisions are shown.

The console (one per node) is a MIPS processor running some flavour of
Unix.  It's claimed to be lots better than the Cray-3 console.  The
machine itself runs "Extended Unix" (CSOS) and they talk about F77
(with F90 extensions) and ANSI C compilers.  They say they don't plan
parallel C, although if someone makes a standard they'll implement it.
(Various noises about X, TCP/IP, Motif, and open systems whatnot
deleted.)  The Cray-3 "Foreground processor" was eliminated.  The
system fits all into one cabinet: 53"w x 34"d x 48"h for one quartet
and 77"w x 43"d x 48"h for up to 4 quartets.  It uses chilled water,
although there's something about "Also Air-Cooled Versions".

That's essentially all that was in the glossy.  There's a scary-looking
photo of a GaAs die-testing head.  The glossy was printed in November
'94.  Apparently to test the system without boiling off oodles of
Fluorinert they run it dry, pulsing it on for a few hundred ns and off
for some milliseconds (apparently the pulses are distinguishable to the
ear, implying about 10/sec) to run diagnostics.

Cray-3, Cray-3/SSS [on the NSA Web page, BTW, I recommend a personal
visit to the National Cryptologic Museum (in person) to actually play
with a captured Nazi Enigma machine: one of the main reasons why the
world has computers today], Cray-4 (and some Cray-2s).

The Cray-4 was a marked departure from the 1,2,3 line.  Dropped from
2,3 was the use of local memory, back to a large number of local
temporary registers (T).  IEEE floating point was adopted.  Chaining,
which existed on the Cray-1 but not on the 2/3, was present, along with
"tailgating" (a feature of late-model Cray-2 and Cray-3 machines).

You really *ought* to mention GaAs and the assembly/cooling technology
somewhere.  You should probably also note that the Cray-4 was the first
(and thus far the only) 1 GHz machine - let us all hope it wasn't the
last.  Those of us who have worked for both companies (some on more
than one occasion) cannot help but note that Cray Computer has actually
outlasted Cray Research as an independent company (in spite of
bankruptcy and never actually *selling* a machine) - thus completing
the transition from a manufacturer of "Big Iron" to one of "Big Irony".
[I want credit for that one ;-)]
        --Stephen O Gombosi

By request now here: It's time to move CRI to "the dead computer
society" - it's no more alive than Supertek or FPS (except as a
trademark): Silicon Graphics acquired it.

Cray Research, Inc. (CRI)
        1 800 BUG CRAY
        comp.unix.cray
        Cray-1
        Cray-2
        Cray X-MP
        Cray Y-MP
        Cray MP (C-90) (etc. J-90)
        Cray T3D, T3E, T90, SN, etc.
        CS6400 -- The CS6400 line was acquired by SUN Micro Systems.

        Cray Users Group (CUG) and unicos-l mailing list
        http://www.cug.org
        http://persephone.inel.gov/CUG/GSIC.html
        http://wwwCUG.uni-stuttgart.de/CUG/cug.html

        If a mailing list, for example, is called "unicos-l@cug.org",
        then the -request address can be inferred from this to be
        "unicos-l-request@cug.org".  To subscribe to a mailing list,
        simply send a message with the word "subscribe" in the Subject:
        field to the -request address of that list.  To unsubscribe
        from a mailing list, simply send a message with the word
        "unsubscribe."

Kendall Square Research (KSR)

Ardent
        The Western trade.
        Interesting multiprocessor MIPS R2000 and R3000 with vector
        units (Weitek) based Unix workstation boxes.  Almost serious
        7600 in a box.  Very notable and powerful Fortran compilers.
        Graphics engines.  Dore' graphics system.  An early leader in
        "scientific visualization."

Stellar
        The Eastern trade.
        Interesting multiprocessor with vector units (Weitek) based
        Unix workstation boxes.  Almost serious 7600 in a box.
        Very notable and powerful Fortran compilers (purchased from
        Convex).  X-based graphics engines.

Stardent
        The "shot gun" wedding of Ardent and Stellar.
             ^^^^^^^^^^ not my term, but I wish it was.
        See also KPC.  Became.....

Kubota Pacific Computer (KPC)
        Now, .....
Picker Medical Imaging Systems

Live Computer Society
.....................

CPP - {Cambridge Parallel Processing}/ICL/Active Memory Technology (AMT)/
    DAP (Distributed Array Processor) -- Several generations. SIMD, medium grain, special-purpose (like MPP), transputers or SPARC processors.
    Centennial Court, Easthampstead Road, Bracknell, Berks. RG12 1JA.
    Irvine, CA

H-P (was Convex Computer)
    comp.sys.convex
    C-1, C-2, C-3, C-4, Exemplar Series
    It has been said that once a machine gets popular enough for a news group that it becomes old hat. We hope.

    Data from the mini-super marketplace.
    Convex (company founded 1982, first shipment in 1985)
        C1  1982-1985  3 years
        C2  1985-1988  3 years (actually first shipment was end of 1987)
        C3  1988-1991  3 years
        C4  1991-1994  3 years
    The C1 was successful, the C2 was wildly successful, the C3 was late and too expensive, and the C4 was late due to technology availability delays. Each product was 2-3 times faster per processor than its predecessor. No technology was projected to be available by 1997 that would provide a similar speedup to justify a C5. Basically, high-density high-performance gate arrays were the driving technology for vector mini-supers, and there was not enough volume in the business to fund the basic technology R&D to keep pace with the rate of improvement in the microprocessor business.

    From 1987 through 1994, the average annual revenues were in the range of $100-200M per year. R&D was 17% of revenues, and the company was publicly traded that whole time. So, proof by example: that part of the model can work. What did not work was that microprocessors were improving by a factor of 2 every two years, while mini-supers were improving by a factor of 2 every three years. That growth rate differential will beat anyone eventually (over twelve years it is a factor of 64 versus a factor of 16). Which is why Convex started working on parallel-microprocessor-based systems in 1993.

    Useful popular reference: pre-Convex:
    %A Tracy Kidder
    %T The Soul of a New Machine
    %X Data General development of the MV/8000 at the same time as the DEC VAX-11/780.

    The Convex dead competitor graveyard. Digitize for a web page somewhere.

DEC (Digital Equipment Corp.)
    ftp://ftp.dbit.com/pub/pdp10/info/soul.txt
    DEC Alpha chip used in Cray T3[DE].
    DEC resells for MasPar and CRI (EL/J series). Company alive. Digital owns about 20% of the mid-range high-performance computing market, second after SGI/Cray! (according to IDC).

    Also in here should be the DECmpp, the SIMD machine manufactured by MasPar and sold by Digital. The reason it should be here is that it started as a research project in Digital; after a working 16k-processor prototype was built (it became operational in 1986), the rights to parts of the design were transferred to MasPar, and it became the basis of the MP-1. The research prototype consisted of 16k 4-bit processors, a multi-stage bi-directional interconnection network, and a controller. Research software consisted of a Pascal compiler, a debugger, and various demos.

    History:
    The VAX 9000
    FireFly bus-oriented multiprocessor (non-product)
        The original Firefly was based on MicroVAX II chips. A later version, the Firefox, used MicroVAX III chips, and was commercialized as the 3520 (2 processors) and 3540 (4 processors) workstations. I had a demo of my Linda implementation (the one you mention as from LRW Systems; thanks for noticing me - I'm still there, though just barely) at DECWORLD that used a 3540, among other systems. It ran fine. The 35x0's were intended as a partial answer to Stellar/Ardent's model of a super graphics system.
They came with all kinds of graphics software. Unfortunately, even 4 times a MicroVAX III chip wasn't super fast by the time the machine came out (about 13 MIPS if you managed to get linear speedup). If you wanted to play with a real parallel implementation, it was nice, though.

        The original paper on the Firefly (by, I think, Chuck Thacker and a few others) had two great quotes, both in response to criticisms that the design was too conservative, especially in using the by-then very slow MicroVAX II chipset: "Sometimes it's better to have 10 million cycles by Friday than 10 million cycles per second"; and, describing the machine, "It may not be that fast, but it's got a lot of torque".

    Andromeda/M31
        The *real* DEC experimental multiprocessor was the Andromeda, a shared memory machine with up to 64 MicroVAX II processors. I got to run some experiments on one with about 28 processors - it was a dead machine by then, and the available processor cards were split between two machines (plus VMS support was limited to 32 processors anyway). There was a Sisal implementation project for the thing, too. Worked fine, for what it was. When the VMS implementors learned about it, they liked to use it to really pound on their SMP code.

Fujitsu
    IBM-cloned architectures as well as independent research efforts.
    VP-200 [later VP-50/100/400]
    VP-2000 series (up to VP-2600)
    AP-1000
    VPP-500
    VPP-300
    VPP-700 (like the CMOS-based 300 except w/ external X-bar)
    Fujitsu VPP-500: Up to 222 processors (140 the largest actually delivered to a customer), each having 8 add/mul pipes, 10ns clock, shared memory, crossbar network with 400MB x2 point-to-point bandwidth; 1.6 GF/PE, 236 GF peak for the 140-PE machine with 9.5ns clock (delivered 1993).

Hitachi
    IBM-cloned architectures as well as independent research efforts.
    Vector supercomputers: S-810, S-820, S-3600 and S-3800
    Hitachi S-3800: 4 vector processors, each having 8 add/mul pipes, 2ns clock, shared memory -> 8 GF/PE, 32 Gflops peak (delivered in Jan. 1994 to the University of Tokyo).
    SR-2201 (PAX follow-on), QCD machine
    http://www.cs.cmu.edu/Web/Groups/scandal/www/vendors.html
    RISC Cluster Server: 3500 Cluster
    MPPs: SR2001, SR2201

IBM
    Starting with the 3090-600 VF to SP-1 and SP-2
    http://ibm.tc.cornell.edu/ibm/pps/doc/
    http://lscftp.kgn.ibm.com/pps/vibm/index.html
    http://www.tc.cornell.edu/~jeg/config.html
    GF-11 (Monty's machines)
    TF-1
    RP3
        IBM SJ RC, open seminar presentation: IBMer in the audience: "Is it IBM 370 instruction set compatible?" [After the presenter noted the CPU was based on the 801.]
    Vulcan
    RS/6000 based
    Superscalar......
    Numerous IBM user groups exist, from SHARE to much smaller groups.
    Mailing lists: vfort, s-comp (Bitnet)

Intel Scientific Computer, now called Intel Scalable Systems
    iPSC/1
    iPSC/2
    Touchstone Alpha, Beta, Gamma, Delta, Sigma
    Paragon
    Systolic product

KAI: Kuck and Associates, Inc. (software only)
    Cedar Project/Alliant
    The KAP/Pro Toolset ( http://www.kai.com ) is a new direction that takes KAP to the next level of difficulty. We have reintroduced the PCF/X3H5 programming model directives into the Fortran language, so that a single parallel program can be easily recompiled on various machines. However, since this is an explicit programming model, rather than the implicit/automatic model that KAP previously used, we have introduced a new notion of automatic debugging. The radically different part of the KAP/Pro Toolset is Assure. It dramatically alters how you look at the creation and debugging of parallel programs.
It tells you why (and where) your parallel program will produce different results from running the same program serially.

MasPar
    Bit-serial SIMD in Sunnyvale. Transitioning to a software company?
    MP-1, MP-2

Meiko
    Computing Surface
    The Computing Surface was Meiko's first attempt. They now have a SPARC-based machine called the CS-2 (CS = Computing Surface). Machines located at LLNL (largest) and UCSB, others. Precisely the same mistakes as ICL, Acorn, and so many others. Technically good - pity about the contact with reality.
    This section pending changes from "anonymous."

nCUBE
    Impressive because small portable computers are made and demonstrated on road shows.

NEC
    SX-2, SX-1, SX-3, SX-4
    Cenju series
    Long vector shared memory machines. Cenju series currently made with MIPS R4400 CPUs.
    NEC SX-4: Up to 512 processors (shipment started December 1995), 2 Gflops each. Up to 32 processors share memory. 16 nodes connected by crossbar.
    The NEC SX-4 does not have virtual memory. The SX-4 uses paged map addressing, but it does not demand-page. All the instructions and data must be resident in main memory for execution. A reference to an unloaded page aborts the program. The advantage of paged map addressing is that there is no need to garbage collect (the infamous storage move operation), and it greatly enhances the ease and efficiency of partial swapping.

ParaSoft (Express)
    Arthur Hicken
    ParaSoft Corporation
    Technical Support
    voice: (818) 792-9941
    fax:   (818) 792-0819
    email: ahicken@parasoft.com

Parsytec
    Carsten Rietbrock
    Parsytec Computer GmbH, Marketing
    Roermonderstrasse 197
    52072 Aachen
    Germany
    Tel: +49(0)241-8889-0
    Fax: +49(0)241-8889-50
    Email: carsten@parsytec.de
    Cserve: 100303,3362
    http://www.parsytec.de

    Parsytec has been manufacturing and marketing parallel systems for industry and R&D since 1985. More than 1600 systems have been installed world-wide to date, comprising single-processor board-level systems and up to 1024-processor MPP systems. Parsytec first utilized the INMOS Transputer series and in 1993 switched to the emerging PowerPC family. Current systems (PowerXplorer, GC/PowerPlus and the new CC-Series) are being used in industrial applications like postal automation, check processing and related computationally intensive pattern recognition tasks, as well as in scientific applications like CFD, FEM and related fields in simulation and processing (currently up to 15 GFlops). Parsytec has subsidiaries in the US (Chicago), Israel (Tel Aviv), Germany (Aachen is the head office, Chemnitz), the Netherlands (Oss, you can see the wizard there) and sales partners world-wide such as Matsushita (Japan), Sundance (UK), PSS (Sweden), Paratec (Korea) etc.

    Parsytec Inc.
    245 W. Roosevelt Rd.
    Bldg. 9, Unit 60
    West Chicago, IL 60185
    708-293-9500
    cindy@parsytec.de

Hitex (.de)
    Cluster workstation parallelism
    URL: http://www.hitex.com/parastation

Portland Group
    http://www.pgroup.com
    HPF Compilers
    SMP Compilers

Thinking Machines Corp. (TMC, frequently mistaken for TMI) [half-alive]
    SIMD: CM-1, CM-2, CM-200
    MIMD: CM-5 (SPARC based)
    http://north.pacificnet.net/bklaw/chapter11doc.html

TERA
    http://www.tera.com
    Follow-on to Denelcor.
    Learning to live with latency.
    Process handling in hardware.
    Full-empty bit memory.

Unisys (See also Burroughs)
    OPUS, the Unisys entry into the MPP (or SPP, "scalable PP", as the marketing department prefers to call it). It is based on the Paragon mesh, except with the nodes replaced with our own design using Pentium processors and Ethernet and SCSI on every node to adapt it to database use.
The OS is based on the Chorus microkernel with a SysVr4.2 shell and a single system image, so all messaging is with standard Unix IPC (don't get me going on that; I'll just say I preferred the iPSC/860 and Paragon API). Still has a light show, just our low-buck commercial version which lacks the Darth Vader look. The OS group is now in the building with John Van Zandt. (line to be removed) The marketing blurbs are off the Unisys home page and the Intel SSD pages. My current project is getting Windows NT running on it, using Intel's reference work as a base, for the next generation system (P6 based, using the same mesh as their "teraflop" machine). I don't think Sandia will be asking for any SW from me on that one. I will try to round up some of the brochures floating around the office for you (handy fire starters).

Are/were they supers?
.....................

Arete

Avalon
    A12. http://www.teraflop.com/acs/acs.html

Encore
    Gordon Bell helped to found it. A bus-oriented multiprocessor; purchased Gould CSD in Florida.

FLEX -- Flexible Computer
    Bus-based multiprocessor similar to the Sequent. A follow-on to the NASA LaRC Finite Element Machine (FEM[1|2]). Based in Texas. Used 68020? I used the LaRC machine once, and we had one presentation. Reliability was a concern, but they did not understand the OS market. It's not clear why they folded other than market pressure (make out later).

HaL
    SPARC based server

Tandem
    Himalayan
    http://www.tandem.com

Chen Systems (follow on)
    Eau Claire, Wisc.
    http://www.chensys.com
    CHEN-1000 servers, keywords: Pentium, Autopilot

Sequent
    A promising, bus-oriented multiprocessor company which has decided to go the business software route. Forked off a division of Intel back in the days of the 432 processor.
    http://www.sequent.com/public/fast.html
    http://www.sequent.com/public/big.html
    http://www.sequent.com/public/smart.html

SUN (Microsystems)
    A relatively late player in the multiprocessor game. They have concentrated more on loosely coupled workstations and moderate speed networks.

Applied Parallel Research (APR)
    1723 Professional Drive
    Sacramento CA 95825
    Voice: (916) 481-9891
    FAX:   (916) 481-7924
    E-mail: support@apri.com
    APR Web Page: http://www.infomall.org/apri

CHARM
    L.V. Kale                          kale@cs.uiuc.edu
    Department of Computer Science     (217) 244-0094
    University of Illinois at Urbana-Champaign
    1304 W. Springfield Ave., Urbana, IL 61801

Linda
    Scientific Computing Assoc.
    One Century Tower
    265 Church Street
    New Haven, CT 06510
    (203) 777-7442
    (203) 776-4074 (fax)
    email: software@sca.com

    LRW Systems (VMS based)
    24 Old Orchard Lane
    Stamford, CT 06903
    (203) 329 0921
    Email: lrw@lrw.com

    Public Domain Linda:
    --------------------
    ariadne.csi.forth.gr:posybl/POSYBL-1.102.TAR.Z
    by Ioannis Schoinas (sxoinas@csd.uch.gr) on a network of Sun/3 or Sun/4 or MIPS.

    Non-Public Domain Linda:
    ------------------------
    AugmenTek has two that are not free but are inexpensive. One is a C implementation and runs on Amiga computers, another is a REXX implementation that runs on either Amiga or OS/2. See AugmenTek's web site:
    (general)      http://www.halcyon.com/sbr/
    (REXX version) http://www.halcyon.com/sbr/rexinda.html
    (C version)    http://www.halcyon.com/sbr/torqueware.html

Where's the list of research projects?
======================================

To repeat: many projects are sensitive by their very nature, so most of this list is likely to be academic. The open list of research projects.

Q: What are some portable C-like parallel languages?
Split-C from Berkeley
    http://http.cs.berkeley.edu/projects/parallel/castle/split-c/

    Split-C is a parallel extension of the C programming language that supports efficient access to a global address space on current distributed memory multiprocessors. It retains the "small language" character of C and supports careful engineering and optimization of programs by providing a simple, predictable cost model. This is in stark contrast to languages that rely on extensive program transformation at compile time to obtain performance on parallel machines. Split-C programs do what the programmer specifies; the compiler takes care of addressing and communication, as well as code generation. Thus, the ability to exploit parallelism or locality is not limited by the compiler's recognition capability, nor is there need to second-guess the compiler transformations while optimizing the program. The language provides a small set of global access primitives and simple parallel storage layout declarations. These seem to capture most of the useful elements of shared memory, message passing, and data parallel programming in a common, familiar context. (A rough illustrative sketch appears below, after the "Miya's Exercise" item.)

    Split-C is currently implemented on the Thinking Machines Corp. CM-5, the Intel Paragon, the IBM SP-2, the Meiko CS-2, the Cray T3D, the SGI Challenge, and on any platform supporting MPI or PVM. All versions are built using the Free Software Foundation's GCC and the message passing systems available on each machine. Faster implementations are underway for networks of workstations using Active Messages. It has been used extensively as a teaching tool in parallel computing courses and hosts a wide variety of applications. Split-C may also be viewed as a compilation target for higher level parallel languages.

Why is this FAQ structured this way?
====================================

Because I first coined the term and started Johnny Appleseeding FAQs with different ideas to work on problems like people asking: "Where's the FAQ?" This is called a "chained FAQ" or a "chain." It's a method of distributing or partitioning. These files tend to grow, and they have to be broken up in space and time anyway. Another reason is that because it goes out regularly, it's like a lighthouse beacon to let you know that connectivity is (or isn't) taking place.

The panels make a good divider (it's a bit tough on those people reading this group via email, but they can adjust). You are encouraged to adopt an intelligent news reader which uses killfiles (and you can kill these panels by version and read them only when a change has taken place). Some smart mailers have killfile capability. Supersedes: and Expires: don't always work. Hey, Burton and Dorothy Smith approved.

What is Miya's Exercise?
........................

This was first posted to net.arch in the 1984ish time range. The inspiration is derived from attempting to understand quantum mechanics and the nature of light. Is light a particle or a wave? They said, "Okay, on Mondays, Wednesdays, and Fridays light is a particle. On Tuesdays, Thursdays, and Saturdays, light is a wave. On Sunday we rest."

Take whatever your terminology is: multiprocessing, parallel processing, etc., and throw it out for a week. Use another set of words to mean the same thing. Throw those words out. Select another: e.g. competitive and cooperative processing (these are particularly good ones which came out over time). The discipline is harder than you might imagine. Continue removing words and phrases. Don't laugh, it has proven useful.
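[Referring back to the Split-C entry above, as promised, a tiny sketch of what the "global access primitives" look like in use. This is from memory of the Berkeley papers, so treat the exact syntax as approximate rather than authoritative; in particular the helper name "toglobal" (for building a global pointer from a processor number and a local address) is my assumption, not a checked part of the language:]

    /* Approximate Split-C, SPMD style: all PROCS processors run this code.
     * Each processor reads its neighbor's counter with a split-phase get,
     * overlapping the communication with any independent local work.
     * Syntax from memory; "toglobal" is an assumed helper name.
     */
    #include <stdio.h>

    int counter;                 /* one copy of this lives on every processor */

    splitc_main(int argc, char **argv)
    {
        int mine;
        int *global p;           /* global pointer = (processor, local address) */

        counter = MYPROC;        /* initialize my own copy */
        barrier();

        /* point at the right-hand neighbor's copy of "counter" */
        p = toglobal((MYPROC + 1) % PROCS, &counter);

        mine := *p;              /* split-phase get: the fetch starts here...  */
        /* ...independent local work could overlap the communication here...  */
        sync();                  /* ...and this waits for it to complete       */

        printf("proc %d of %d saw %d\n", MYPROC, PROCS, mine);
    }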
-----

Q: What are some realistic (non-PRAM) models for analyzing the complexities of parallel algorithms?

    LogP: Towards a Realistic Model of Parallel Computation
    ftp://ftp.cs.berkeley.edu/ucb/TAM/ppopp93.ps

    Block Distributed Memory Model
    ftp://ftp.cs.umd.edu/pub/papers/papers/3207/3207.ps.Z

    Bulk Synchronous Parallel Model
    http://www.scs.carleton.ca/~palepu/BSP.html

What is Salishan?
-----------------

The Salishan Lodge is a Five-star, seaside, golf resort hotel and conference center (it has its own small runway) on the coast of Oregon near Gleneden Beach (about a three-hour drive from Portland). The Conference on High Speed Computing is sponsored by the DoE's LANL and LLNL and was the brainchild of B. Buzbee, G. Michael, and R. Ewald. Occasionally other agencies (DOD, NASA, etc.) also contribute. The Conference is invitation only, for about 100 people, to keep discussion manageable. A waiting list exists. [Personally I stopped going after chairing a session; too many managers wanting to meet too many famous people. I found another, better conference, not super admittedly.]

The Conference (which usually does not have proceedings) attempts to bring together people who have big computing problems with knowledgeable people with technical ideas and with people who have money, etc. Similar meetings exist elsewhere (like Asilomar). The basic idea is: 1) get rid of marketing, 2) keep it technical, 3) keep it informal [this is evident with one specific room, a lot of wine, and a chalk board (it's a somewhat academic Conference in nature)].

The setting: The Lodge is part of a very small chain which includes the Salish Lodge near Seattle. The TV series Twin Peaks was filmed near and occasionally inside the Salish Lodge (called "The Great Northern"; this gives a very small sense of what the hotel is like). Trivia.

What makes it Five-Star?
------------------------

It's very expensive. It has complete, full-time 24-hour service. Some people don't like it.

Aren't we digressing from supercomputing?
-----------------------------------------

You asked. This type of conference is popular. From the Salishan Meetings sprang the Supercomputing'88 and subsequent Supercomputing'xx Conferences. This is the Conference's logo (simplified):

    [ASCII logo: a triangle with "Architecture" running up one side, "Algorithms" up the other, and "Language" across the base.]

Flynn's terminology
-------------------

SISD: [Flynn's terminology] Single-Instruction stream, Single-Data stream
SIMD: [Flynn's terminology] Single-Instruction stream, Multiple-Data stream
MISD: [Flynn's terminology] Multiple-Instruction stream, Single-Data stream
MIMD: [Flynn's terminology] Multiple-Instruction stream, Multiple-Data stream

%A M. J. Flynn
%T Very High-Speed Computing Systems
%J Proceedings of the IEEE
%V 54
%N 12
%D December 1966
%P 1901-1909
%K btartar
%K maeder biblio: parallel architectures,
%K grecommended91,
%K JLb,
%K process migration,
%X The original paper which classified computer systems into instruction and data stream dimensions (SISD, SIMD, etc.).
%X Classification of parallel architectures, based on the division of control and data. SISD, SIMD, MIMD, MISD.

Are/were there MISD machines?
-----------------------------

It depends whom you talked to. Generally yes; a few people said no.

From: David Bader
Subject: comp.parallel

I read the FAQ proposal you posted and liked it a lot. Covers most of the dirt. Here are some ideas that I had, in no particular order (just to make your life a little more difficult ;) [sigh!]
I consider myself a long-time supercomputing/parallel junkie, since I remember the formation of comp.hypercube ;) [you qualify]

---------------------------------------------------------
Pointers to Benchmarks:

micro benchmarks:
    PARKBENCH: Parallel Kernels and Benchmarks Report
    http://www.epm.ornl.gov/~walker/parkbench/

    Genesis Benchmark Interface Service
    http://hpcc.soton.ac.uk/RandD/gbis/papiani-new-gbis-top.html

macro benchmarks:
    The NAS Parallel Benchmarks
    http://www.nas.nasa.gov/NAS/NPB/

macro benchmarks for shared memory machines:
    SPLASH from the Stanford Flash Project
    http://www-flash.stanford.edu

---------------------------------------------------------
What is the HPF (high performance fortran) standard?

See High Performance Fortran Forum
    http://www.erc.msstate.edu/hpff/home.html

The High Performance Fortran Forum (HPFF), led by Ken Kennedy of the Center for Research in Parallel Computing, is a coalition of industry, academic and laboratory representatives working to define extensions to Fortran 90 for the purpose of providing access to high-performance architecture features while maintaining portability across platforms.

DEC: http://www.digital.com/info/hpc/f90/
http://www.fortran.com/fortran/

hpff            Main HPFF list, including meeting minutes
hpff-core       "Core" list, for attendees at the meetings
hpff-interpret  Questions and answers about interpretation of the HPF specification
hpff-distribute Discussions of advanced data mapping (e.g. irregular distributions) for HPF 2.0
hpff-task       Discussions of task parallelism and parallel input/output for HPF 2.0
hpff-external   Discussions of external interfaces (e.g. linking with C++) for HPF 2.0

From now on, the correct procedure for adding yourself to the hpff-xxx list is to send mail to hpff-xxx-request@cs.rice.edu. In the body of the message (*not the Subject:, as before*), put the line "subscribe", optionally followed by an address (it defaults to the message sender). You should probably only include it if people often have difficulty replying to your e-mail directly. Similarly, to remove yourself from the hpff-xxx list, you should send mail to hpff-xxx-request@cs.rice.edu with the line "unsubscribe".

---------------------------------------------------------
What is the MPI (message passing interface) standard?

See Message Passing Interface
    http://www.mcs.anl.gov/mpi/index.html
    http://WWW.ERC.MsState.Edu:80/mpi/
    http://www.epm.ornl.gov/~walker/mpi/

MPI stands for Message Passing Interface. The goal of MPI, simply stated, is to develop a widely used standard for writing message-passing programs. As such, the interface should establish a practical, portable, efficient, and flexible standard for message passing. Message passing is a paradigm used widely on certain classes of parallel machines, especially those with distributed memory. Although there are many variations, the basic concept of processes communicating through messages is well understood. Over the last ten years, substantial progress has been made in casting significant applications in this paradigm. Each vendor has implemented its own variant. More recently, several systems have demonstrated that a message passing system can be efficiently and portably implemented. It is thus an appropriate time to try to define both the syntax and semantics of a core of library routines that will be useful to a wide range of users and efficiently implementable on a wide range of computers.
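For the curious, here is the flavor of the interface itself. This is plain MPI-1 C, nothing specific to any one vendor's implementation; a minimal sketch rather than a tutorial:

    #include <stdio.h>
    #include <mpi.h>

    /* Minimal MPI program: every rank reports in, and rank 0 receives a
     * message from rank 1 (if it exists) using plain point-to-point calls. */
    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I?          */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many of us?    */

        printf("hello from rank %d of %d\n", rank, size);

        if (size > 1) {
            int token;
            if (rank == 1) {
                token = 42;
                MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            } else if (rank == 0) {
                MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
                printf("rank 0 received %d from rank 1\n", token);
            }
        }

        MPI_Finalize();
        return 0;
    }

How you compile and launch it varies by implementation; under MPICH it is typically something like mpicc followed by mpirun -np 4.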
IEEE
    URL: http://computer.org/parascope/

URL: http://www.tgc.com/HPCwire.html
Here's the text from that page:

    HPCwire
    The Text-on-Demand E-zine for High Performance Computing
    Delivered weekly to over 19,000 readers.

    If you're interested in news and information about high-performance computing and you have access to Internet e-mail, send an e-mail message to: e-zine@hpcwire.tgc.com to receive a free 6-week (this is now 4-week) trial subscription to HPCwire. Every issue includes news, information, and analysis related to both engineering/scientific and commercial applications. HPCwire also includes employment opportunity and conference listings.

    Topics regularly covered include:
    * Supercomputing
    * Mass Storage
    * Technology Transfer
    * On-line Transaction Processing
    * Parallel Processing
    * Industrial Computing
    * Client/Server Applications
    * Fluid Dynamics

    HPCwire Sponsors:
    * Ampex
    * Avalon Computer
    * Cray Research, Inc.
    * Digital Equipment Corp.
    * Fujitsu America
    * Genias Software
    * HNSX Supercomputers
    * IBM
    * Intel
    * MasPar Computer
    * Maximum Strategy
    * nCUBE
    * The Portland Group
    * Silicon Graphics, Inc.
    * Sony Corp.

    HPCwire is published by Tabor Griffin Communications, the leader in Internet-delivered news and information.

    Tabor Griffin Communications
    8445 Camino Santa Fe, Suite 202
    San Diego, CA 92121
    619-625-0070
    human@tgc.com

START section on academic projects

Mentat and Legion programming environments.
    http://legion.virginia.edu/
    Mentat uses dataflow on the object level for this. The programmer has to decide what the unit of work is and encapsulate it into an object; the run-time system then wrings out whatever parallelism can be found between these objects. It's definitely an interesting approach.

NIST Parallel Processing
    http://www.itl.nist.gov/div895/sasg/parallel/

=============================
The "Ordnance Museum"
=============================

Histories of various Supercomputer centers (yes, _some_ will not be able to participate). It would answer questions like which machine replaced which machine. Also, which machines were long-lived, and which weren't. Call it the "Ordnance Museum" (if old programmers tell war stories... ), or some such. (And there really is an Ordnance Museum at BRL.)

List of supercomputer facilities (yes, we know). Some of this could probably be found on various Web sites. Check.
    LLNL
    LANL
    BRL
    NCAR
    The spooks.
    CEWES (Corps of Engineers Waterways Experiment Station, Vicksburg, MS, USA)

CEWES (from the photos on the wall):
    IBM 650              8/57 - 11/62
    GE-225               8/62 - 10/68
    GE-437               8/68 -  6/72
    GE-635               3/72 -  7/81
    TI ASC               1/79 - 11/80
    Honeywell DPS 1      7/81 - 12/84
    Honeywell DPS 8/70  12/84 - 10/91
    Cray YMP            11/89 - 12/96
    CDC 962              4/90 -
    Cray C90             7/93 -
    SGI PCA              7/96 -
    Cray T3E             2/97 -
    SGI O2K              3/97 -
    IBM SP               8/97 -

Articles to parallel@ctc.com (Administrative: bigrigg@ctc.com)
Archive: http://www.hensa.ac.uk/parallel/internet/usenet/comp.parallel