Newsgroups: comp.parallel,comp.sys.super
From: eugene@sally.nas.nasa.gov (Eugene N. Miya)
Reply-To: eugene@george.arc.nasa.gov (Eugene N. Miya)
Subject: [l/m 2/27/98] network resources -- comp.parallel (10/28) FAQ
Organization: NASA Ames Research Center, Moffett Field, CA
Date: 10 Mar 1998 13:03:19 GMT
Message-ID: <6e3dmn$836$1@cnn.nas.nasa.gov>
Archive-Name: superpar-faq
Last-modified: 20 Jan 1998

10 Related news groups, archives, test codes, and other references
12 User/developer communities
14 References, biblios
16
18 Supercomputing and Crayisms
20 IBM and Amdahl
22 Grand challenges and HPCC
24 Suggested (required) readings
26 Dead computer architecture society
28 Dedications
 2 Introduction and Table of Contents and justification
 4 Comp.parallel news group history
 6 parlib
 8 comp.parallel group dynamics

Related News Groups
-------------------
Child groups:
comp.parallel.pvm (unmoderated)
        http://www.epm.ornl.gov/pvm/pvm_home.html
        http://www.netlib.org/pvm3/index.html
        http://www.netlib.org/pvm3/faq_html/faq.html
        http://www.netlib.org/pvm3/book/node1.html
        ftp://netlib2.cs.utk.edu/pvm3
        http://www.nas.nasa.gov/NAS/Tools/Outside/
comp.parallel.mpi (unmoderated)
        http://www.mcs.anl.gov/mpi

comp.arch (unmoderated)                 # our parent news group.
comp.arch.arithmetic (unmoderated)      # step kids
comp.arch.storage: many news groups discuss parallelism/HPC
comp.os.research (moderated, D. Long, UCSC)
comp.sys.convex (unmoderated)   # I wonder if this will become an h-p group.
comp.sys.alliant (unmoderated)
comp.sys.isis: Isis is a commercial message passing package for C and
        Fortran (at least); features fault-tolerance.
# defunct:
# comp.sys.large (unmoderated)  # The term "Big iron" is used.
#                               # more mainframes and distributed networks
comp.sys.super (unmoderated)
comp.sys.transputer (unmoderated) (consider also OCCAM here)
comp.unix.cray (unmoderated)
comp.research.japan (moderated, R.S., UA)/soc.culture.japan (unmoderated)
sci.math.* in many different forms; you will even find parallelism in
        places like bionet.computational, but it is not the intent of
        this list to be anywhere near complete.  Locate application
        areas of interest.
comp.benchmarks
aus.comp.parallel
fj.comp.parallel (can require 16-bit character support)
alt.folklore.computers: computing history, fat chewing
others

Note: all these news groups (and other news groups) are given as options.
Nothing will stop you from posting in this news group on most any topic.

Where are the parallel applications?
------------------------------------
Where are the parallel codes?
-----------------------------
Where can I find parallel benchmarks?
=====================================

High performance computing has important historical roots with some
"sensitivity:"

1) Remember the first computers were used to calculate the trajectory
of artillery shells, crack enemy codes, and figure out how an atomic
bomb would work.  You are fooling yourself if you think those
applications have disappeared.

2) The newer users, the simulators and analysts, tend to work for
industrial and economic concerns which are highly competitive with one
another.  You are fooling yourself if you think someone is going to
just place their industrial strength code here.  Or give it to you.

So where might I find academic benchmarks?

        parlib@hubcap.clemson.edu       send index
        netlib@ornl.gov                 send index from benchmark
        nistlib@cmr.ncsl.nist.gov       send index

See also: Why is this news group so quiet?
Other news groups: sci.military.moderated

We also tend to have many "chicken-and-egg" problems.

"We need a big computer."
"We can design one for you. Can you give us a sample code?" "No." ... New benchmarks might be best requested in various application fields. sci.aeronuatics (moderated) sci.geo.petroleum sci.electronics sci.physics.* sci.bio.* etc. Be extremely mindful of the sensitive nature of collecting benchmarks. Obit quips: MIP: Meaningless Indicators of Performance Parallel MFLOPS: The "Guaranteed not to exceed speed." Where can I find machine time/access? ===================================== Ask the owners of said machines. What's a parallel computer? =========================== A bunch of expensive components. Parallelism is not obvious. If you think it is, I can sell you a bridge. The terminology is abysmal. Talk to me about Miya's exercise. the problem is mostly (but not all) in the semantics. Is parallel computing easier or harder than "normal, serial" programming? ========================================================================= Ha. Take your pick. Jones says no harder. Grit and many others say yes harder. It's subjective. Jones equated programming to also mean "systems programming." In 1994, Don Knuth in a "Fire Side Chat" session at a Conference when asked, (not me): "Will you write an "Art of Parallel Programming?" replied: "No." Knuth did not. One group of comp.parallel people hold that parallel algorithm is an oxymoron: that an algorithm is inherently serial by definition. How can you scope out a supercomputing/parallel processing firm? ================================================================ Lack of software. What's your ratio of hardware to software people? Lack of technical rather than marketing documentation. When will you have architecture and programming manuals? Excessive claims about automatic parallelization. What languages are you targeting? See Also: What's holding back parallel computer development? ================================================== "I do not know what the language of the year 2000 will look like but it will be called FORTRAN." --Attributed to many people including Dan McCracken, Seymour Cray, John Backus... All the Perlis Epigrams on this language: 42. You can measure a programmer's perspective by noting his attitude on the continuing vitality of FORTRAN. --Alan Perlis (Epigrams) 70. Over the centuries the Indians developed sign language for communicating phenomena of interest. Programmers from different tribes (FORTRAN, LISP, ALGOL, SNOBOL, etc.) could use one that doesn't require them to carry a blackboard on their ponies. --Alan Perlis (Epigrams) 85. Though the Chinese should adore APL, it's FORTRAN they put their money on. --Alan Perlis (Epigrams) See also #68 and #9. FORTRAN | C | C++ | etc. ------------------------ Why don't you guys grow up and use real languages? ================================================== The best way to answer this question first is to determine what languages the questioner is asking (sometimes called 'language bigots'). What's a 'real' langauge? This is a topic guaranteed to get yawns from the experienced folk, you will only argue with newbies. In two words, many of the existing application programs are: "Dusty decks." You remember what a 'card deck' was right? These programs are non-trivial: thousands and sometimes millions of lines of code whose authors have sometimes retired and not kept on retainer. A missing key concept is "conversion." Users don't want to convert their programs (rewrite, etc.) to use other languages. Incentives. 
See also: Statement: Supercomputers are too important to run interactive
operating systems, text editors, etc.

Don't language converters like f2c help?
----------------------------------------

No.  Problems fall into several categories:

1) Implementation specific features: you have a software architecture
   to take advantage of certain hardware specific features (it doesn't
   have to be vectors; it could be I/O, for instance).  A delicate
   tradeoff exists between using said features vs. not using them for
   reasons like portability and long-term program life.
   E.g., Control Data Q8xxxxxx based subprogram calls, while having
   proper legal FORTRAN syntax, involved calls to hardware and software
   which didn't exist on other systems.  Some of these calls could be
   replaced with non-vector code, but why?  You impulse purchased the
   machine for its speed to solve immediate problems.

2) Some language features don't have precisely matching/corresponding
   semantics.  E.g., dynamic vs. static memory use.

3) Etc.

These little "gotchas" are very annoying and frequently compound into
serious labor.

What's wrong with FORTRAN?  What are its problems for parallel computing?
--------------------------------------------------------------------------

The best non-language-specific explanation of the parallel computing
problem was written in 1980 by Anita Jones on the Cm* Project.
Paraphrasing:

1) Lack of facilities to protect and insure the consistency of results.
   [Determinism and consistency.]
2) Lack of adequate communication facilities.
   [What's wrong with READ and WRITE?]
3) Lack of synchronization (explicit or implicit) facilities.
   [Locks, barriers, and all those things.]
4) Exception handling (miscellaneous things).

The problems she cited were: consistency, deadlock, and starvation.

FORTRAN's (from 1966 to current) problems:
        Side effects (mixed blessing: re: random numbers)
        GOTOs (the classic software engineering reason)
        Relatively rigid, poor data structures
        Relatively static run time environment semantics

68. If we believe in data structures, we must believe in independent
    (hence simultaneous) processing.  For why else would we collect
    items within a structure?  Why do we tolerate languages that give
    us the one without the other?
        --Alan Perlis (Epigrams)

9. It is better to have 100 functions operate on one data structure
   than 10 functions on 10 data structures.
        --Alan Perlis (Epigrams)

A few people (Don Knuth included) would argue that the definition of an
algorithm contradicts certain aspects of parallelism.  Fine.  We can
speak of parallel (replicated) data structures, but the problem of
programming languages and architectures covers more than education and
math.

Programming language types (people) tend either to develop specialized
languages for parallelism or to add operating system features.  The
issue is assuming determinism and consistency during a computation.
If you don't mind the odd inconsistent error, then you are lucky.
Such a person must clearly write perfect code every time.  The rest of
us must debug.

"Drop in" parallel speed-up is the Holy Grail of high performance
computing.  The Holy Grail of programming and software engineering has
been "automatic programming."  If you believe we have either, then I
have a big bridge to sell you.

Attempts to write parallel languages fall into two categories:

completely new languages: with new semantics in some cases,
        e.g., APL, VAL, ID, SISAL, etc.
add ons to old languages: with new semantics and hacked on syntax.
The latter fall into two types:

        OS like constructs such as semaphores, monitors, etc., which
        tend not to scale.  ("Oh, yeah, you want concurrency, well, let
        me help you with these....")  Starting with Concurrent Pascal,
        Modula, etc.

        Constructs for message passing or barriers thought up by
        numerical analysts (actually these are two vastly different
        subtypes (oversimplified)).  Starting with "meta-adjective"
        FORTRAN.

Compilers and architectures ARE an issue (can be different).
One issue is programmability or ease of programming.
Two camps:
        parallel programming is no different than any other
        programming.  [Jones is an early ref.]
and
        Bull shit!  It's at least comparable in difficulty to "systems"
        programming.  [Grit and McGraw is an early ref.]

Take a look at the use of the full-empty bit on Denelcor HEP memory
(and soon Tera).  This stuff is weird if you have never encountered it.
I'm going to use this as one example feature, but keep in mind that
other features exist.  You can find "trenches" war stories (mine fields
for Tera to avoid [they know it]).  Why?  Because the programmers are
very confident they (we) know what they (we) are doing.  BUZZT!  We
(I mean Murphy) screw up.  The difficulty comes (side effects) when you
deal with global storage (to varying degrees, if you have ever seen
TASK COMMON).  You have difficulty tracing the scope.  Architecture
issues.

I'd like to see serial codes which have dead-lock and other problems.
I think we should collect examples (including error messages) and put
them on display as warnings (tell that to the govt. ha!).

The use of atomic full-empty bits might be the parity bits of the
future (noting that the early supercomputers didn't have parity).  How
consistent do you like your data?  Debug any lately?  Don't get fooled
that message passing is any safer.  See the Latest Word on Message
Passing.  You can get just as confused.

Ideally, the programmer would LOVE to have all this stuff hidden.
I wonder when that will happen?  What makes us think that as we scale
up processors, we won't make changes in our memory systems?  Probably
because von Neumann memories are so easily made.

Communication: moving data around consistently is tougher than most
people give credit, and it's not parallelism.  Floating point gets too
much attention.

Solutions (heuristic):

education: I think we need to make emulations of older designed
machines like the HEP available (public domain for schools).  The
problem is that I don't trust some of those emulators, because I think
we really need to run them on parallel machines, and many are
proprietary and written for sequential machines.  The schools
potentially have a complex porting job.  I fear that old emulations
have timing gotchas which never got updated as the designs moved into
hardware.  Just as PC software got hacked, some of these emulators
could use some hacking.

Another thing I thought was cool: free compilers.  Tim Bubb made his
APL compiler available.  I'm not necessarily a fan of APL, but I moved
it over to a Convex for people to play with during a trip back East.
I doubt people (Steve: [don't fall off those holds with Dan belaying])
had time to bring APL up on the Convex and get vector code generation
working.  The information learned from that kind of experience needs to
get fed back to compiler writers.  That's not happening now.

Patrick and I spent an amusing Dec.
evening standing outside the Convex HQ pondering what it might be like
raising a generation of APL (or parallel language) hacker kids: either
they will be very good, or they will be confused as hell.

For a while the ASPLOS conferences were pretty great conferences (Arch.
support for PLs and OSes).  Have not been to one lately.

Theory alone won't cut it.  You want a Wozniak.  This is debatable, of
course.  He needs the right tools.  They don't exist now.  Maybe.
Math won't be enough.  To quote Knuth: mathematicians don't understand
the cost of operations.  (AMM)  Perlis had that too (for the LISP guys,
not directly for parallelism, in the FAQ).  Beware the Turing tarpit.

Compilers reflect the architecture, and there is some influence on
architecture by compilers, but that vicious circle doesn't have enough
components to be worth considering.  Blind men attempting to describe
an elephant.

> The computer languages field seems to have no Dawkins, no Gould, no
> popularizers and not even any very good text books.  Every one of the
> books I have tried has come across as boring, poorly structured, making
> no obvious attempt to justify things and completely unwilling to stand by
> itself.

That's largely because Dawkins and Gould are making observations.  They
are not attempting to construct things which have never existed before.
I say that after reading Gould's text book (O&P, very good), not his
popular books (like Mismeasure of Man), which are enjoyable.

Wirth and Ichbiah are pretty bright guys, but they are not above making
mistakes.  Niklaus himself wrote a letter/article when he forgot an
important piece of syntax in Pascal: the catch-all exception to the
multiway branch, a.k.a. "OTHERWISE" in some compilers, "ELSE" in other
compilers, and a slew of other keywords having identical semantics.
It was almost impossible to get this simple (ha!) fix added to the
language.  Ritchie is pretty bright, too.

I recommend The History of Programming Languages II (HOPL-II) published
by ACM Press/Addison-Wesley.  I can tell you there are no Silicon
Valley copies at Comp Lit Bookshop as I cleaned them all out for
friends (you might find a copy at Stacey's in Palo Alto; the Sunnyvale
library has a copy (I was impressed)).  Backus is also bright.  Bill
Wulf in conversation with me suggested that the Griswolds are also
bright.  Oh, a LISP cross post: I occasionally see John McCarthy at
Stanford and Printers Inc.  John is also quite bright.  I signed his
petition against Computing the Future.  All bright guys, and they all
learned (made mistakes along the way).

The brightest, most inspired language designers I can think of might be
Alan Kay and Adele Goldberg and their work on Smalltalk-80.  If you are
using a windowing system, you are most likely using a system inspired
by them.  A very impressive chapter in HOPL-II is about them (see the
paragraphs referring to "management").

You merely need a decent library or a six-year IEEE member to get the
gist.  Two articles stand out (one comes from the MIT AI Lab [and
Stanford]).  The two articles stand as an interesting contrast: one is
a perfect example of the problems cited by the other.  The order in
which you read these articles might highly influence your perception,
so I will cite them in page order.  Fair enough?  [The annotations are
NOT all mine (collected over time).  In particular see the last
sentence of the first annotation to the first article.]

%A Cherri M. Pancake
%A Donna Bergmark
%T Do Parallel Languages Respond to the Needs of Scientific Programmers?
%J Computer
%I IEEE
%V 23
%N 12
%D December 1990
%P 13-23
%K fortran, shared memory, concurrency,
%X This article is a must read about the problems of designing,
programming, and "marketing" parallel programming languages.  It does
not present definitive solutions but is a descriptive
"state-of-the-art" survey of the semantic problem.  The paper reads
like the "war of the sexes": computer scientist versus computational
scientist; some subtle topics (like shared memory models) are
mentioned.  An excellent table summarizes the article, but I think
there is one format error [e.g. of barriers versus subroutines].  It is
ironically followed by an article by computer scientists typifying the
authors' thesis.
%X Points out the hierarchical model of "model-making" (4-level), very
similar to Rodrigue's (LLNL) parallelism model (real world -> math
theory -> numerical algorithms -> code).
%X Table 1:
Category        For scientific researcher         For computer scientist

Convenience     Fortran 77 syntax                 Structured syntax and
                                                  abstract data types
                Minimal number of new             Extensible constructs
                constructs to learn
                Structures that provide           Less need for fine-grain
                low-overhead parallelism          parallelism

Reliability     Minimal number of changes to      Changes that provide
                familiar constructs               clarification
                No conflict with Fortran models   Support for nested scoping
                of data storage and use           and packages
                Provision of deterministic        Provision of non-deterministic
                high-level constructs (like       high-level constructs (like
                critical sections, barriers)      parallel sections,
                                                  subroutine invocations)
                Syntax that clearly               Syntax distinctions less
                distinguishes parallel from       critical
                serial constructs

Expressiveness  Conceptual models that support    Conceptual models adaptable
                common scientific programming     to wide range of programming
                strategies                        strategies
                High-level features for           High-level features for
                distributing data across          distributing work across
                processors                        processors
                Parallel operators for array/     Parallel operators for
                vector operands                   abstract data types
                Operators for regular patterns    Operators for irregular
                of process interaction            patterns of process
                                                  interaction

Compatibility   Portability across range of       Vendor specificity or
                vendors, product lines            portability to related
                                                  machine models
                Conversion/upgrading of           Conversion less important
                existing Fortran code             (formal maintenance
                                                  procedures available)
                Reasonable efficiency on most     Tailorability to a variety
                machine models                    of machine models
                Interfacing with visualization    Minimal visualization
                routines                          support
                Compatibility with parallel       Little need for "canned"
                subroutine libraries              routines

%A Andrew Berlin
%A Daniel Weise
%T Compiling Scientific Code Using Partial Evaluation
%J Computer
%I IEEE
%V 23
%N 12
%D December 1990
%P 25-37
%r AIM 1145
%i MIT
%d July 1989
%O 21 pages, $3.25
%Z Computer Systems Lab, Stanford University, Stanford, CA
%d March 1990
%O 31 pages, $5.20
%K partial evaluation, scientific computation, parallel architectures,
parallelizing compilers,
%K scheme, LISP,
%X Scientists are faced with a dilemma: Either they can write abstract
programs that express their understanding of a problem, but which do
not execute efficiently; or they can write programs that computers can
execute efficiently, but which are difficult to write and difficult to
understand.  We have developed a compiler that uses partial evaluation
and scheduling techniques to provide a solution to this dilemma.
%X Partial evaluation converts a high-level program into a low-level
program that is specialized for a particular application.  We describe
a compiler that uses partial evaluation to dramatically speed up
programs.  We have measured speedups over conventionally compiled code
that range from seven times faster to ninety one times faster.  Further
experiments have also shown that by eliminating inherently sequential
data structure references and their associated conditional branches,
partial evaluation exposes the low-level parallelism inherent in a
computation.  By coupling partial evaluation with parallel scheduling
techniques, this parallelism can be exploited for use on heavily
pipelined or parallel architectures.  We have demonstrated this
approach by applying a parallel scheduler to a partially evaluated
program that simulates the motion of a nine body solar system.

The Latest Word on Message Passing
----------------------------------
OpenMP --- http://www.openmp.org
(A short illustrative MPI sketch appears later in this panel.)

Why does the computer science community insist upon writing these
=================================================================
esoteric papers on theory which no one uses anyway?
=======================================================
Why don't the computer engineers just throw the chips together?
===============================================================

It's the communications, stupid!  CREW, EREW, etc., etc.

Over the years, many pieces of email were exchanged in private
complaining about the parallel processing community from applications
people.  Topics which specifically appear to irk applications people in
discussion:

        Operating systems.
        New programming languages.
        Multistage interconnection networks.
        Load balancing.

This is a short list; I know that I will remember other topics and
other people will remind me (anonymously).  Of course the applications
people would like "drop-in" automatic parallelization (it will come
after we have drop-in "automatic programming").  A.k.a.: the
anything-to-get-my-program-to-run-faster crowd.  Short of added cost.
One noted early paper paraphrased: If a cook can parallel process, why
can't computer people?

Boy, you guys sure argue a lot.
===============================

It's academic.  The bark so far is worse than the bite.  The name
calling can be found in various parts of the literature (e.g., "Polo
players of science...").  Many misunderstandings have evolved:

``An exciting thing was happening at Livermore.  They were building a
supercomputer [the LLNL/USN S-1], and I will certainly confess to being
a cycle junkie.  Computers are never big enough or fast enough.  I have
no patience at all with these damned PC's.  What I didn't realize when
I went over to Livermore was that as long as physicists are running the
show you're never going to get any software.  And if you don't get any
software, you're never going to get anywhere.  Physicists have the most
abysmal taste in programming environments.  It's the software
equivalent of a junk-strewn lab with plug boards, bare wires and
alligator clips.  They also seem to think that computers (and
programmers for that matter) are the sorts of things to which you
submit punched card decks like you did in the mid-sixties.''
        --- Bill Gosper, in ``More Mathematical People: Contemporary
        Conversations'' (Donald J. Albers et al., Eds.; Harcourt Brace
        Jovanovich)  [Gosper is a well-known LISPer.]
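Since this panel points at comp.parallel.pvm, comp.parallel.mpi, and
"The Latest Word on Message Passing" without showing any code, here is
a minimal sketch of explicit message passing using the standard MPI C
binding.  It is illustrative only, not a recommendation from any of the
postings collected here; the compiler wrapper (mpicc) and launcher
(mpirun) names vary by installation, and the tag value and variable
names are arbitrary.

/* Minimal MPI sketch (C binding): every rank except 0 sends its rank
   number to rank 0, which prints whatever arrives.  Typically built
   with something like "mpicc hello.c" and run with
   "mpirun -np 4 ./a.out" (command names are installation-dependent). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank != 0) {
        /* Workers: send our rank to rank 0, message tag 0. */
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        int i, who;
        MPI_Status status;
        for (i = 1; i < size; i++) {
            /* Accept messages in any order; senders finish out of order. */
            MPI_Recv(&who, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &status);
            printf("rank 0 heard from rank %d\n", who);
        }
    }

    MPI_Finalize();
    return 0;
}

Note that nothing here orders the arrivals; only the fact that rank 0
does all the printing keeps the output coherent.  That is exactly the
kind of I/O-synchronization caveat raised under the debugging question
later in this panel.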
Computing the future: a broader agenda for computer science and
engineering / Juris Hartmanis and Herbert Lin, editors; Committee to
Assess the Scope and Direction of Computer Science and Technology,
Computer Science and Telecommunications Board, Commission on Physical
Sciences, Mathematics, and Applications, National Research Council.
Washington, D.C.: National Academy Press, 1992.

Petition by John McCarthy, John Backus, Don Knuth, Marvin Minsky, Bob
Boyer, Barbara Grosz, Jack Minker, and Nils Nilsson rebutting
"Computing the Future," 1992.
        http://www-formal.stanford.edu/jmc/petition.html

The FAQ maintainer is one of the signatories of this petition and one
of the few people apparently to have read the Report.  The Report has a
good set of references further citing problems.  Some of these problems
go away when checkbooks are brought out.  Don't let anyone tell you
there isn't a gulf between CS and other activities; in some cases it is
better than this, and in other cases it's worse.

Perhaps here's another FAQ? ;)

Q: How do experts debug parallel code?
======================================

A1: Debug?  Just don't make any programming mistakes.
A2: Use print statements.

Debugging: Yep, we discuss it.  No good ways.  People who say they have
good ways have heavy context.  What does the FAQ recommend?  Long
timers point to A2.  I know people who pride themselves on A1; they
really believe they do it.  One reader so far recommends Fox's
textbooks (1994 in particular).  This section needs more thrashing,
which takes time I don't have.  You can gauge a person's perspective by
how advanced they believe debugging technology is.

What's holding back parallel computer development?
==================================================

Software, in one word.  Don't hold your breath for "automatic
parallelization."  [Yes, we have a tendency to butcher language in this
field.  I have a Grail to sell you.]  See my buzzword: "computational
break-even" (like break-even in controlled fusion research).  We have
yet to reach computational break-even.

I think I first saw, but did not realize, the nature of the problem
when I saw the ILLIAC IV hardware being removed when I arrived here.
If you ever see a photo of this old machine, you should realize that
the ribbon cable (the first ever developed) connecting the control unit
(typically on the left side of the picture) to the most physically
distant processors had a length IDENTICAL to that for the nearest
processors (big circular loops of wire).  This hardware makes a
difference in software which only the naive (like me at the time) don't
appreciate (those concentrating on the hardware have less appreciation
for software synchronization).  I do have photos which I took for Alan
Huang of similar Cyber 205 coils of wire.  This problem exists today on
even the simplest SIMD or MIMD systems.  Parallelism does not imply
(you cannot assume) synchronous systems.

Another very bad implicit assumption is that software tools developed
on sequential machines are useful stepping stones for parallel
machines.  This is not the case.  Debuggers are good examples.
Debuggers are bad (they are well-intended, but they tell you what you
ask of them, not what you want to know; a reasonable set of literature
exists on debugger problems), and debuggers also tend to be
non-standard, dependent on system specific features (you will hear the
phrase "PRINT statements" a lot, but that assumes I/O synchronization).
The typical comparison is that programming parallel machines is like
taking a step back to the assembly language era.
This is simultaneously (this is the parallelism group after all) true
and false (get used to paradox).  Current problems are compounded by
coding in lower-level, non-standard languages (e.g., assembly),
different OSes, libraries, calls, tools, etc.  I will assert that one
of the reasons why microcomputers (serial machines) and interaction got
ahead of batch oriented systems was the result of interactive tools
like interpreters.  I think they were easily hacked with improvements
(gross generalization) and have not had the applications pressure of
commercial and some research systems.

When will parallel computing get easy?
======================================

God knows.

A real story: In 1986 or so, I was introduced to a 16 year old son of
two computer science professors.  The kid was clearly bright.  He
planned to construct a "two-processor parallel processor machine" using
MC68000s which his parents would purchase.  Those were his words, not
"dual-processor."  This is a great idea, and I listened to his plans
for shared registers, locks, etc.  I can't remember if the machine was
going to be a master-slave or a symmetric multiprocessor, but the kid
had thought about this.  On the issue of programming and software, the
kid said, "I will leave it to the programmer to keep things straight."
Typical hardware designer.  My immediate concluding comment was:
"Have your mother tell you about operating systems."

I suspect the future might lie with the students of Seymour Papert, the
Lego(tm) Chair of Computing at MIT, except that the kids of that group
put together Lego Blocks(tm) the way scientists need to assemble their
future.  Such Lego computing must:
1) be easily assembled (like plugging in wall sockets),
2) have software which will "work" in a plug and play fashion,
3) have ideas I have not even begun to cover.

But before we have Lego computing, I suspect we will have high school
students who do parallel computing projects, and I've not seen many of
those yet.  It's just too hard, too resource intensive.  In my
generation it was to build a two-inch long ruby laser; a later
generation made warm superconductors, but we have yet to see parallel
machines.  See also SuperQuest.

What are the especially notable schools to study parallelism and
================================================================
supercomputing?
===============

Concerning the parlib files, I think a lot of this is way out of date,
especially the list of schools supporting parallel processing work.
I think a better way to present this information would be a web page
(perhaps extracted from my page) which gives a pointer to parallel
groups at the various schools.

Those schools associated with supercomputing centers:
        U. Ill./NCSA
        CMU/PSC
        UMn/MSC
        UCSD/SDSU/SDSC
        U. Tsukuba
Others:
        Cornell
        SUNY
        Syracuse
        OGI
        Rice, Stanford, MIT, Berkeley, UWashington, UMaryland,
        and Harvard

My, my, this group seems to be dominated by FORTRAN PDE solvers.
================================================================
What about LISP (and Prolog, etc.) machines?
============================================

LISP people post in this group on occasion to ask about LISP
supercomputers.  From the late 60s, the LISP people have been derided.
It's just a sort of fact of life in this field.  We did have a few
discussions about the porting of Common LISP on the Cray when it
happened (world's fastest LISP) and the initial use of *LISP on the CM.
But that's about it.  Nothing will stop you from posting on the topic
except yourself (ignore the LISP hecklers).
See also:
        comp.lang.lisp.franz
        comp.lang.lisp.mcl
        comp.lang.lisp.x
        comp.org.lisp-users
        comp.std.lisp
        de.comp.lang.lisp
        comp.ai.alife
        comp.ai.edu
        comp.ai.fuzzy
        comp.ai.genetic
        comp.ai.jair.announce
        comp.ai.nat-lang
        comp.ai.neural-nets
        comp.ai.nlang-know-rep
        comp.ai.philosophy
        comp.ai.shells
        comp.ai.vision
        comp.ai.games
        comp.ai.jair.papers

Search for discussions on "AI winter."

NOTE: THERE IS NOTHING WRONG WITH CROSS-POSTING.  (This differs from a
multi-post to separate news groups.)  JUST DO IT INTELLIGENTLY.
REMEMBER TO SET Followup-To: WITH A NEWS GROUP OR A SMALLER NUMBER OF
RELEVANT CROSS-POSTED NEWS GROUPS.  Or SET TO "poster".
                                               ^^^^^^^^

What about fault-tolerant parallel systems?
===========================================

What is the MTBF of 2000 PCs?

Several well known parallel computing books have pointed out the need
for fault-tolerant design from the start.  Parallel machines are not
inherently more reliable than sequential machines.  They require that
reliability be designed into systems from conception.

If you want to start a joke, use the words: "Graceful degradation."

Yes, we know, and the group has discussed, Seymour Cray's "Parity is
for farmers" quote.  And the group notes that the Cray-1 (and all
subsequent Cray architectures) have parity.  This is old hat.
"Even farmers buy computers."  This may be apocryphal.
[**What's the joke?  It's kind of a midwest 70's type of thing.  In the
1970's, the farm price support programs were known as parity (to give
the farmer parity between the selling price and the cost of producing
the food? -- in any case, it was the government paying farmers).**]

Additional context: It has been pointed out that, when the 6600 was
designed, parity error detection might well have caused more problems
than it solved (due to the relative reliability of 1963 vintage logic
circuits and core memory).

But what about things like the Tandem?  Like LISP, the question gets
asked here, but it lacks critical mass discussion.  Yes, Tandem is on
the net [I had a climbing partner who used to work there]; there's a
mail-list and a news group for them.  Unfortunately, our news group
biases stay with us.  You are welcome to discuss/raise the topic here,
but no guarantees exist that a discussion will spark (like wet wood).

BASIC IDEA: Computers are used as a means to solution of problems for
one primary and one secondary reason (and numerous tertiary reasons):

1) Performance: computers can do something faster than some other tools
(analog, general purpose vs. special purpose, etc.).  If something can
be done faster w/o a computer, that thing will likely dominate.  This
is the analytic justification which covers areas such as the
non-simulation, grand challenge problems, cryptography, etc.

2) With the emergence of computers, the flexibility of software has
proven computers quite useful independent of performance.  This is the
"What if?" justification which is the reason for simulation, VR, etc.

Traffic influences what gets posted in the group.  Interest influences
traffic.

See also:
        comp.os.research
        comp.risks

Naw, don't let me bias you, post away in this group.

What about Japanese supers?
===========================
What about the Japanese?

Yes, they are here.  They read this group, and they redistribute
postings internally as they see fit.  Oh yes, there is also
fj.comp.parallel in the Other news groups category, but you need to
know how to display and read 16-bit Kanji.

One special reference to this group is Dr. David Kahaner.  You will
hear about the Kahaner Reports.
David is a numerical analyst by training.  Formerly with LANL and
NBS/NIST, and under funding of ONR, he did a stint reporting
electronically on technology events happening in Japan from Tokyo for
many years, and then became popular on the conference keynote speaking
circuit.  These reports became particularly popular in the late 80s
with "people in the know."  This group found DKK reports propagating to
a now defunct magazine because of a typo in his name (this is called
traffic analysis).  David actually asked for help in the news group
soc.culture.japan, proof that the soc.* groups are useful.  [The net
was smaller at that time.]  David does not appreciate being quoted out
of context (i.e. as evidence that American academia and industry are
keeping up with developments in other parts of the world, as evidenced
once in comp.os.research; he does believe the US does not keep track of
world developments).

To this end: David has now left the wing of the ONR and has set up the
ATIP: Asian Technology Information Program.  This highly useful and
deserving program can also use other sources of funding.  See also:
comp.research.japan.  If you quote David's work, please cite him.
David and the ATIP appreciate large monetary contributions.

And yes, people have posted racist statements in comp.sys.super.
The three letter shortening of "Japan" is appreciated by no one
significant in this group.  Save yourself the trouble: don't bother
pissing me off.  Don't use it.  Don't encourage others to use it.

Ref:
%A Raul Mendez, ed.
%T High Performance Computing: Research and Practice in Japan
%I Wiley
%C West Sussex, England
%D 1992
%K book, text,
%O ISBN 0-471-92867-4
And many others

Where are the Killer Micros?
----------------------------
Where are the Networks of Workstations (NOW)?
=============================================

See the separate PVM and MPI news groups.
I need to flesh this out with Hugh LaMaster.

What are the Killer Micros?
===========================

Commodity microprocessors, usually cited in the context of loosely
coupled networked workstations or assembled into massively parallel
systems.  The term was quoted and first appeared at Supercomputing '89
in Reno, NV, by Eugene Brooks, III (Caltech and LLNL).  Hey, I got my
killer micros tee-shirt.

Supercomputing people can identify the real orientation of a computer
person by asking: How many bits are in "single-precision?"  And they
should know why.  Answers tend to fall in three facial expressions.
Exponent bits are also important.  Now, what a machine does during
arithmetic is a different issue.

"Our new computer is *going* to be as fast as 'insert fastest existing
CRI CPU...'" -- Kiss of death  (See the Xerox ad on the same theme.)

What's this I hear about 'super-linear speed up?'
=================================================

Q: What is superlinear speedup?

This is when the time to run a code on a "P" processor machine is less
than 1/P of the time to run it on a single processor.  This is a
fallacy.  Typically, this phenomenon is attributed to caching ("P"
caches are available in parallel execution, whereas only a single cache
is utilized for the uniprocessor execution).

|I think the issue of cache usage enhancement, which is probably the
|most frequent reason for seeing "superlinear speedup" (and why this
|isn't really superlinear), is worth mentioning.  It brings up two of
|_my_ favorite points in high performance computing:
|1) memory bandwidth is and always has been a critical factor in
|   supercomputing
|2) Dave Bailey's "How to fool the masses..."
|   should be read and understood by everyone in the business.

A division seems to be forming between those doing AI/genetic search
algorithms and those solving, say, "dense systems of linear equations."
It is essential to know what camp you are reading/listening to.  The
part of the parallel processing community most concerned with, and
impressed by, speedup as a metric is the parallel AI community.  They
tend to be very impressed with parallel search running real fast.  They
are the most academic.  The most jaded members tend to be the numeric
folk with big problems.  IDENTIFY YOUR COMMUNITY CLEARLY (who you are
talking with).

Q: What is speedup?

Speedup is a function of "P", the number of processors, and is the time
to execute on a single processor divided by the time to execute the
same code on P processors.

Q: What is efficiency?

This is the parallel algorithm's efficiency, defined as the best known
time to solve the problem sequentially, divided by P times the parallel
execution time on P processors.

---------------------------------------------------------

Everyone should read:

%A David H. Bailey
%T Twelve Ways to Fool the Masses When Giving Performance Results on
Parallel Supercomputers
%J Supercomputing Review
%V 4
%N 8
%D August 1991
%r RNR Technical Report RNR-91-020
%d November 1991
%P 54-55
%K Performance Evaluation and Modelling,
%O http://www.nas.nasa.gov/NAS/TechReports/RNRreports/dbailey/RNR-91-020/RNR-91-020.html
%X A brief jeremiad on the abuse of performance statistics.  The author
describes a dozen ways to mislead an audience about the performance of
one's machine, and illustrates these techniques with examples drawn
from the literature.  Afraid of losing that all-important sale?  Quote
your own single-precision, assembly-level megafloppage; it's sure to
look good against your competitor's double-precision, compiled-FORTRAN
figures.
%X Abstract: Many of us in the field of highly parallel scientific
computing recognize that it is often quite difficult to match the run
time performance of the best conventional supercomputers.  This
humorous article outlines twelve ways commonly used in scientific
papers and presentations to artificially boost performance rates and to
present these results in the ``best possible light'' compared to other
systems.
%X Summary:
1. Quote 32-bit performance results, not 64-bit results, or compare
   your 32-bit results with others' 64-bit results. **
2. Present inner kernel performance figures as the performance of the
   entire application.
3. Quietly employ assembly code and other low-level language
   constructs, or compare your assembly-coded results with others'
   Fortran or C implementations.
4. Scale up the problem size with the number of processors, but don't
   clearly disclose this fact.
5. Quote performance results linearly projected to a full system. **
6. Compare your results against scalar, unoptimized code on Crays.
7. Compare with an old code on an obsolete system.
8. Base MFLOPS operation counts on the parallel implementation instead
   of on the best sequential implementation.
9. Quote performance in terms of processor utilization, parallel
   speedups or MFLOPS per dollar (peak MFLOPS, not sustained). **
10. Mutilate the algorithm used in the parallel implementation to match
    the architecture.  In other words, employ algorithms that are
    numerically inefficient, compared to the best known serial or
    vector algorithms for this application, in order to exhibit
    artificially high MFLOPS rates.
11.
    Measure parallel run times on a dedicated system, but measure
    conventional run times in a busy environment.
12. If all else fails, show pretty pictures and animated videos, and
    don't talk about performance.

See also a similarly humorous paper by Globus on scientific
visualization inspired by 12).

Statement: Supercomputers are too important to run
==================================================
interactive operating systems,
==============================
text editors, etc.
=============

Troll.  You've lost.  This is a long time debate which dates from the
60s and ended in the late 80s with ETA/CDC (and will undoubtedly come
back in the future).  The usual answers are: the human or
scientist/analyst/programmer time is more valuable than the machine
time.  You haven't seen some of the files I have had to text edit on
the Cray.  Interactive text editing of huge files on supercomputers
isn't just fun, it's productive.  You are worried about text searching?
Then grab the spies and tell them to stop buying supercomputers.

Where's the list of vendors?
============================

The list of vendors is intentionally spread (distributed, decomposed,
partitioned) across the FAQ.  We will try to have email, name, and
smail address contacts.

The PARALLEL Processing Connection - What Is It?
The PARALLEL Processing Connection is an entrepreneurial association;
parallel@netcom.com (B. Mitchell Loebel).

Dead Computer Society
.....................

Burroughs
        ILLIAC IV
        1 64-processor quadrant of a planned 4-quad, 256-PE machine.
        Numerous reference papers and a retrospective book were
        produced.  The ILLIAC IV was placed at NASA Ames (civilian
        aeronautics agency) during the height of the Vietnam War to
        avoid possible damage during on-college-campus student riots
        (computer bowl trivia).

        BSP
        One made, never delivered.

Texas Instruments (TI)
        Advanced Scientific Computer (ASC)
        Big long vector CPUs.  64-bit (or near) machine?
        (7 systems built -- "used internally" = systems used for
        Seismic Data Processing by GSI (Geophysical Services Inc.),
        a division of T.I.)
        ASC 1   prototype                       cpus=1
        ASC 1A  used internally                 cpus=1
        ASC 2   used internally                 cpus=1
        ASC 3   Army Ballistic Missile Research Centre,
                Huntsville, Alabama             (cpus=2 ?)
        ASC 4   GFDL/NOAA Princeton             cpus=4
        ASC 5   used internally                 cpus=1
        ASC 6   used internally                 cpus=1
        ASC 7   Naval Research Laboratory (NRL),
                Washington DC (1976)            (cpus=2?)

        ASC 3, later moved to:?  CEWES (Corps of Engineers Waterways
        Experiment Station, Vicksburg, MS, USA) isn't among them.  In
        the new computer center building, there's a series of photos
        which portray the History of the Center.  This includes a photo
        whose caption says "TI ASC, January 79 to November 80".  It
        doesn't give a serial number, but the dates are about right.
        MVS like OS?

Evans and Sutherland
        ES-1.  Evans & Sutherland Computer Division (ESCD) shipped two
        beta parallel supers in 1989 before taking the bullet from the
        mother company.  Nine made, THREE BETA?

        Why was E&S significant?  Evans & Sutherland is one of the
        foremost non-mainstream computer companies.  Most number
        crunchers have never heard of it, because they are best known
        for state-of-the-art computer graphics systems like those used
        in real time flight simulators (training and research).  The
        feeling was that if any company had the experience to challenge
        Seymour Cray and give him a run for his money it was going to
        be ESCD.
It has been said of E&S, the parent company that more real in-hardware matrix multiplies have been done on installed E&S machines than on most other high-performance computers. Based in Salt Lake City, Utah. ESCD was in Mountain View, CA. E&S has been somewhat slow to get into the workstation market. Ivan Sutherland (no longer with the company now at SUN) is cited as the father of modern computer graphics and was also at ARPA helping to start the ARPAnet, perhaps one of the ten or so unknown unsung scientists in computing. The most recent dead body is ACRI, the heavily subsidised French supercomputer startup that was shut down this Spring after spending $100M+ over five years. A partial prototype was running at the end. Section on CDC/ETA moved to panel 26. Ametek Hypercube && Grids (meshes)? Culler Scientific [Miya is a child of the Culler-Fried System.] Pre-ELI/VLIW Multiflow 7/128, 8/256 Superscalar, ELI, VLIW, enormously long instruction (word) very long instruction word Myrias Canadian multiprocessor firm Started with 68K based scalar processors and proprietary back end processors. Portions of the software part of the company will still exist. SEL (Systems Engineering Laboratories) (purchased by Gould Electronics; became Gould Computer Systems Division). Gould Computer Systems Division (NP-1 was its first minisupercomputer). Much of Gould was purchased by a Japanese mining concern. The Computer Systems Division was sold to Encore. [I know about this because a friend of mine used to be Manager of Latin American Sales at Gould CSD's Corporate HQ in Fort Lauderdale, FL. One NP-1 was sold in 1987 to the Mexican Institute of Petroleum. The CSD sale to Encore happened not long after that.] BBN Computer GP1000 (Butterfly), TC2000 Voice funnel, signal processing, American Supercomputer Mike Flynn (SU) Prisma GaAs, Colorado Springs, CO Vitesse GaAs, a Division of Rockwell Supercomputer Systems, Inc. (Eau Claire, Wisc.) SSI (1) Steve Chen SS-1: Photos of a 2 CPU unit with Chen in a Faraday cage in Fortune. See Chen Systems and Sequent (S. Chen, CTO). Supercomputer Systems, Inc. (San Diego) Very little is known about this firm. I know one person who worked for them. Goodyear/Loral Aerospace See also the STARAN associative SIMD processor in the literature A version of the STARAN is used in the AWACS. MPP (always rumors of a second to be delivered to the NSA) decommissioned. The first MPP machine. No floating point. VAX/VMS based front-end. Cost $10M. Bit-serial, 2 Kb memory per processing element? 8 bit wide paths. Status: the one system was donated by the NASA GSFC to the Smithsonian. It is not on display but in storage. The MPP was replaced with Maspar systems. Inmos Transputers Popular processor (1982) in some circles. Well thought out communication scheme, but problems with scalability. Crippled because it's lacking US popularity. |However, you must mention Transputers (something developed in EUROPE, |outside of the U.S.A., name comes from TRANSistor and comPUTER) and the |related companies: |* INMOS (from GB), now bought by SGS Thompson (French), who was the |inventor and sole manufacturer of transputers |* Parsytec (still alive, but does not use Transputers any more, Germany) |* Meiko (GB) produced the "computing surface" |* IBM had an internal project (codenamed VICTOR) |and there are many more. Transputers had a built in communication agent and |it was very easy to connect them together to form large message passing |machines in regular, fixed topologies. 
|INMOS' idea was that they should be the "transistors" (building
|blocks) of the new parallel computers.

Cray Computer Corp. (CCC)
        One of the nifty Cray 3's was the Cray-3/SSS.  It used some of
        the stuff done along the "beltway" if you know what I mean.
        I have the docs on the 3 and the 4.  The following comes from
        some docs I got from the CCC team.

Cray-3
------
Logic Circuits          GaAs SDFL, 500 Gate-Equivalent, 3.935 x 3.835 mm
Memory Circuits         Silicon CMOS SRAM, 4Meg x 1, 25ns Cycle Time,
                        8 MWords per Module
Modules                 4.1 x 4.1 x 0.25 Inches, 4 Modules per CPU,
                        69 Electrical Layers, 22000 Z-Axis Connections,
                        Twisted Pair Interconnect
Cooling                 Chilled Water/Fluorinert
Cabinets                System Tank and C-Prod
Typical Footprint       252 Square Feet

Cray-4
------
Logic Circuits          GaAs DCFL, 5000 Gate-Equivalent, 5.4 x 5.4 mm
Memory Circuits         21ns Cycle Time, 16 MWords per Module (SAA)
Modules                 5.2 x 5.2 x 0.33 Inches, 1 Module per CPU,
                        90 Electrical Layers, 36000 Z-Axis Connections,
                        Micro-Coaxial Interconnect
Cooling                 SAA (Also Air Cooled)
Cabinets                One Cabinet
Typical Footprint       215 Square Feet

The Cray-3/SSS had PIM's.  Nifty.

The thing is made of 5.2" x 5.2" x .33" modules, one for the CPU and
one per 16 MW of memory.  (This is 21 ns cycle time Toshiba 4Mx1 SRAM,
the same as in the Cray-3; they promise to go to 4Mx4 SRAM this year,
doubling memory size and bandwidth.)  The ratio is 4 memory modules per
CPU.  72-bit words with SECDED.

The CPU is solid GaAs.  The chips are shaved thin, placed in 4x4 arrays
on some sort of MCM-like board, and a 3x3 array of boards makes one
logic layer.  (The memory is different, as the memory chips aren't
square.)  The module has 4 logic layers (stacked on the outside) and
apparently 90 interconnect layers with 36,000 Z-axis vias.  The CPU
runs at 1 GHz.  The GaAs chips are billed as "5000 gate-equivalent",
whatever that means, in "DCFL" logic, a term I know not.

The modules are stacked against each other and Fluorinert is chilled
and pumped up to drain through them by gravity.  The CPU and memory
modules are bolted at one side to bus bars delivering 3 supply voltages
+ gnd.  On the other is a familiar-looking mat of grey wires.  The
wires are apparently actually miniature coax cables!  The connectors
are single pins on a fine (0.7 mm or so) pitch, with the male ends
being hollow gold-plated cylinders and the females being pins recessed
in plastic.  (I suppose this circularly symmetric arrangement is good
for controlled impedance or something.)

The end of the CPU module contains 4 rows of 4 sockets, each socket
containing 2 rows of 39 pins.  2/3 of the pins were signal; the others
were grounds in a S-G-S S-G-S configuration.  This provides for a total
of 832 signals from the CPU.  The memory modules leave one row of
sockets empty, leaving only 624 signals.  The sockets attached to plugs
that tended to have 3 or 6 pins and 2 or 4 wires coming out the back
ends.  It was all some sort of white plastic.  The plugs were pretty
easy to insert and remove when I tried them.  Some cables also had
plugs in the middle.

The basic unit of processing is a 4-processor "quartet" with 256 MW of
memory.  Each processor has a full-duplex 32-bit HiPPI channel.  The
processor is billed as using the IEEE floating-point format but not, I
note, as doing IEEE math.

Now, on to the architecture.  All registers are 64 bits wide.  There
are three main sets of three buses, one result bus and two operand
buses.
These are:

        Vi, Vj, Vk: The vector result and operand buses (respectively;
                    Vi is the result)
        Si, Sj, Sk: The scalar result and operand buses
        Ai, Aj, Ak: The address result and operand buses

The memory part of the diagram confuses me.  There are clearly 2
bidirectional memory ports.  Each has a "Retry Buffer".  Each is
connected to 8 rows of 4 "Memory Ranks for Each Port"; the D unit (they
are labelled A, B, C and D) of each row is shown connected to one of
the 8 "Octant"s.  Each "Octant" contains 18 (eighteen) memory banks.
Instruction fetch (8 instruction buffers with 32 entries each) is
connected to port 0.  (The line is marked "Fetch Control".)  The HiPPI
and console interfaces are connected to port 1.  They are also shown
connected to memory (directly, not via one of the ports, which confuses
me), the vector registers, the scalar registers, the address registers,
the instruction buffers, the program counter, and some utility
registers:

        - Exchange Address
        - Limit Address
        - Base Address
        - Error Register
        - Status Register

Address bus Ai, scalar bus Si and vector bus Vi all connect to a line
between the two memory ports that seems to mean "either".

There are three integer vector functional units: integer, logical, and
shift.  These get inputs from Vj, Vk, Sj, Sk, Ak or the vector mask
register and deliver results to Vi or the vector mask register.

There are two floating point functional units, shared between vector
and scalar.  These take inputs from Vjk or Sjk, and deliver results to
Vi, Si or (apparently) the vector mask register.

There are three integer scalar functional units: integer, logical and
shift.  These take input from Sj, Sk or Ak and deliver results to Si.

A 64-bit real time clock is readable on Si and writeable from Sk.  The
vector length register is readable on Ai and writeable on Ak.  There is
an address adder and a 35-bit address multiplier.  Inputs on Aj and Ak
and output on Ai.  The functional units are marked as supporting
"chaining" and "tailgating", which I don't understand and neglected to
ask about the meaning of.

The program register is readable on Ai and writeable from Aj, as well
as being fed from the instruction buffers, the console/HiPPI interface
and a built-in incrementor.  (+1/3/5)

There are 8 vector registers of length 64, triple-ported on Vi (write),
Vj and Vk (read), 8 scalar registers (Si, Sj and Sk) and 8 address
registers (Ai, Aj and Ak), as well as 64 "temporary" registers
(apparently an addition since the Cray-3) that are read/write
accessible on Vi, Si and Ai.  "Local memory has been eliminated and
replaced by a new set of registers -- the temporary (T) registers."
There are also "up to" 64 semaphore flags.

The memory transfer rate is billed as 2 GW/sec/processor.  Up to 32
processors may go into a single "node" in one cabinet.  A "cluster" bus
of 2 GB/sec (full-duplex, per node, <= 4 nodes) is promised "mid-1995"
for systems up to 128 processors.  There's some mention on a features
list of 64-bit HiPPI.  ("Support for 200 Megabyte per second, 64-bit
HiPPI channels is planned for mid-1995.")  Apparently a >4 processor
system can be subdivided and operated as a series of smaller systems.
Only even power-of-2 divisions are shown.

The console (one per node) is a MIPS processor running some flavour of
Unix.  It's claimed to be lots better than the Cray-3 console.  The
machine itself runs "Extended Unix" (CSOS) and they talk about F77
(with F90 extensions) and ANSI C compilers.  They say they don't plan
parallel C, although if someone makes a standard they'll implement it.
(Various noises about X, TCP/IP, Motif, and open systems whatnot
deleted.)  The Cray-3 "Foreground processor" was eliminated.  The
system fits all into one cabinet: 53"w x 34"d x 48"h for one quartet
and 77"w x 43"d x 48"h for up to 4 quartets.  It uses chilled water,
although there's something about "Also Air-Cooled Versions".

That's essentially all that was in the glossy.  There's a scary-looking
photo of a GaAs die-testing head.  The glossy was printed in November
'94.  Apparently to test the system without boiling off oodles of
Fluorinert they run it dry, pulsing it on for a few hundred ns and off
for some milliseconds (apparently the pulses are distinguishable to the
ear, implying about 10/sec) to run diagnostics.

Cray-3, Cray-3/SSS [on the NSA Web page, BTW, I recommend a personal
visit to the National Cryptologic Museum (in person) to actually play
with a captured Nazi Enigma machine: one of the main reasons why the
world has computers today], Cray-4 (and some Cray-2s).

The Cray-4 was a marked departure from the 1,2,3 line.  Dropped from
2,3 was the use of local memory, back to a large number of local
temporary registers (T).  IEEE floating point was adopted.  Chaining,
which existed on the Cray-1 but not on the 2/3, was present, along with
"tailgating" (a feature of late-model Cray-2 and Cray-3 machines).

You really *ought* to mention GaAs and the assembly/cooling technology
somewhere.  You should probably also note that the Cray-4 was the first
(and thus far the only) 1 GHz machine - let us all hope it wasn't the
last.  Those of us who have worked for both companies (some on more
than one occasion) cannot help but note that Cray Computer has actually
outlasted Cray Research as an independent company (in spite of
bankruptcy and never actually *selling* a machine) - thus completing
the transition from a manufacturer of "Big Iron" to one of "Big Irony".
[I want credit for that one ;-)]
        --Stephen O Gombosi

By request now here: It's time to move CRI to "the dead computer
society" - it's no more alive than Supertek or FPS (except as a
trademark): Silicon Graphics acquired it.

Cray Research, Inc. (CRI)
        1 800 BUG CRAY
        comp.unix.cray
        Cray-1
        Cray-2
        Cray X-MP
        Cray Y-MP
        Cray MP (C-90) (etc. J-90)
        Cray T3D, T3E, T90, SN, etc.
        CS6400 -- The CS6400 line was acquired by SUN Micro Systems.

        Cray Users Group (CUG) and unicos-l mailing list
        http://www.cug.org
        http://persephone.inel.gov/CUG/GSIC.html
        http://wwwCUG.uni-stuttgart.de/CUG/cug.html

        If a mailing list, for example, is called "unicos-l@cug.org",
        then the -request address can be inferred from this to be
        "unicos-l-request@cug.org".  To subscribe to a mailing list,
        simply send a message with the word "subscribe" in the Subject:
        field to the -request address of that list.  To unsubscribe
        from a mailing list, simply send a message with the word
        "unsubscribe."

Kendall Square Research (KSR)

Ardent
        The Western trade.
        Interesting multiprocessor MIPS R2000 and R3000 with vector
        units (Weitek) based Unix workstation boxes.  Almost serious
        7600 in a box.  Very notable and powerful Fortran compilers.
        Graphics engines.  Dore' graphics system.  An early leader in
        "scientific visualization."

Stellar
        The Eastern trade.
        Interesting multiprocessor with vector units (Weitek) based
        Unix workstation boxes.  Almost serious 7600 in a box.
        Very notable and powerful Fortran compilers (purchased from
        Convex).  X-based graphics engines.

Stardent
        The "shot gun" wedding of Ardent and Stellar.
             ^^^^^^^^^^ not my term, but I wish it was.
        See also KPC.  Became.....

Kubota Pacific Computer (KPC)
        Now, .....
Picker Medical Imaging Systems

Live Computer Society
.....................

CPP - {Cambridge Parallel Processing}/ICL/Active Memory Technology (AMT)/
    DAP (Distributed Array Processor) -- Several generations. SIMD, medium grain, special-purpose (like MPP), transputers or SPARC processors.
    Centennial Court, Easthampstead Road, Bracknell, Berks. RG12 1JA.
    Irvine, CA

H-P (was Convex Computer)
    comp.sys.convex
    C-1, C-2, C-3, C-4, Exemplar Series
    It has been said that once a machine gets popular enough for a news group that it becomes old hat. We hope.

    Data from the mini-super marketplace.
    Convex (company founded 1982, first shipment in 1985)
        C1  1982-1985  3 years
        C2  1985-1988  3 years (actually first shipment was end of 1987)
        C3  1988-1991  3 years
        C4  1991-1994  3 years
    The C1 was successful, the C2 was wildly successful, the C3 was late and too expensive, and the C4 was late due to technology availability delays. Each product was 2-3 times faster per processor than its predecessor. No technology was projected to be available by 1997 that would provide a similar speedup to justify a C5. Basically, high-density high-performance gate arrays were the driving technology for vector mini-supers, and there was not enough volume in the business to fund the basic technology R&D to keep pace with the rate of improvement in the microprocessor business.

    From 1987 through 1994, the average annual revenues were in the range of $100-200M per year. R&D was 17% of revenues, and the company was publicly traded that whole time. So, proof by example: that part of the model can work. What did not work was that microprocessors were improving by a factor of 2 every two years, while mini-supers were improving by a factor of 2 every three years. That growth rate differential will beat anyone eventually (over twelve years it is a factor of 64 versus a factor of 16). Which is why Convex started working on parallel-microprocessor-based systems in 1993.

    Useful popular reference: pre-Convex:
    %A Tracy Kidder
    %T The Soul of a New Machine
    %X Data General development of the MV/8000 at the same time as the DEC VAX-11/780.

    The Convex dead competitor graveyard. Digitize for a web page somewhere.

DEC (Digital Equipment Corp.)
    ftp://ftp.dbit.com/pub/pdp10/info/soul.txt
    DEC Alpha chip used in Cray T3[DE].
    DEC resells for MasPar and CRI (EL/J series). Company alive. Digital owns about 20% of the mid-range high-performance computing market, second after SGI/Cray! (according to IDC).

    Also in here should be the DECmpp, the SIMD machine manufactured by MasPar and sold by Digital. The reason it should be here is that it started as a research project in Digital; after a working 16k-processor prototype was built (it became operational in 1986), the rights to parts of the design were transferred to MasPar, and it became the basis of the MP-1. The research prototype consisted of 16k 4-bit processors, a multi-stage bi-directional interconnection network, and a controller. Research software consisted of a Pascal compiler, a debugger, and various demos.

    History:
    The VAX 9000
    FireFly bus-oriented multiprocessor (non-product)
        The original Firefly was based on MicroVAX II chips. A later version, the Firefox, used MicroVAX III chips, and was commercialized as the 3520 (2 processors) and 3540 (4 processors) workstations. I had a demo of my Linda implementation (the one you mention as from LRW Systems; thanks for noticing me - I'm still there, though just barely) at DECWORLD that used a 3540, among other systems. It ran fine. The 35x0's were intended as a partial answer to Stellar/Ardent's model of a super graphics system.
They came with all kinds of graphics software. Unfortunately, even 4 times a MicroVAX III chip wasn't super fast by the time the machine came out (about 13 MIPS if you managed to get linear speedup). If you wanted to play with a real parallel implementation, it was nice, though.

        The original paper on the Firefly (by, I think, Chuck Thacker and a few others) had two great quotes, both in response to criticisms that the design was too conservative, especially in using the by-then very slow MicroVAX II chipset: "Sometimes it's better to have 10 million cycles by Friday than 10 million cycles per second"; and, describing the machine, "It may not be that fast, but it's got a lot of torque".

    Andromeda/M31
        The *real* DEC experimental multiprocessor was the Andromeda, a shared memory machine with up to 64 MicroVAX II processors. I got to run some experiments on one with about 28 processors - it was a dead machine by then, and the available processor cards were split between two machines (plus VMS support was limited to 32 processors anyway). There was a Sisal implementation project for the thing, too. Worked fine, for what it was. When the VMS implementors learned about it, they liked to use it to really pound on their SMP code.

Fujitsu
    IBM-cloned architectures as well as independent research efforts.
    VP-200 [later VP-50/100/400]
    VP-2000 series (up to VP-2600)
    AP-1000
    VPP-500
    VPP-300
    VPP-700 (like the CMOS-based 300 except w/ external X-bar)
    Fujitsu VPP-500: Up to 222 processors (140 the largest actually delivered to a customer), each having 8 add/mul pipes, 10ns clock, shared memory, crossbar network with 400MB x2 point-to-point bandwidth; 1.6 GF/PE, 236 GF peak for the 140-PE machine with 9.5ns clock (delivered 1993).

Hitachi
    IBM-cloned architectures as well as independent research efforts.
    Vector supercomputers: S-810, S-820, S-3600 and S-3800
    Hitachi S-3800: 4 vector processors, each having 8 add/mul pipes, 2ns clock, shared memory -> 8 GF/PE, 32 Gflops peak (delivered in Jan. 1994 to the University of Tokyo).
    SR-2201 (PAX follow-on), QCD machine
    http://www.cs.cmu.edu/Web/Groups/scandal/www/vendors.html
    RISC Cluster Server: 3500 Cluster
    MPPs: SR2001, SR2201

IBM
    Starting with the 3090-600 VF to SP-1 and SP-2
    http://ibm.tc.cornell.edu/ibm/pps/doc/
    http://lscftp.kgn.ibm.com/pps/vibm/index.html
    http://www.tc.cornell.edu/~jeg/config.html
    GF-11 (Monty's machines)
    TF-1
    RP3
        IBM SJ RC, open seminar presentation: IBMer in the audience: "Is it IBM 370 instruction set compatible?" [After the presenter noted the CPU was based on the 801.]
    Vulcan
    RS/6000 based
    Superscalar......
    Numerous IBM user groups exist, from SHARE to much smaller groups.
    Mailing lists: vfort, s-comp (Bitnet)

Intel Scientific Computer, now called Intel Scalable Systems
    iPSC/1
    iPSC/2
    Touchstone Alpha, Beta, Gamma, Delta, Sigma
    Paragon
    Systolic product

KAI: Kuck and Associates, Inc. (software only)
    Cedar Project/Alliant
    The KAP/Pro Toolset ( http://www.kai.com ) is a new direction that takes KAP to the next level of difficulty. We have reintroduced the PCF/X3H5 programming model directives into the Fortran language, so that a single parallel program can be easily recompiled on various machines. However, since this is an explicit programming model, rather than the implicit/automatic model that KAP previously used, we have introduced a new notion of automatic debugging. The radically different part of the KAP/Pro Toolset is Assure. It dramatically alters how you look at the creation and debugging of parallel programs.
It tells you why (and where) your parallel program will produce different results from running the same program serially.

MasPar
    Bit-serial SIMD in Sunnyvale. Transitioning to a software company?
    MP-1, MP-2

Meiko
    Computing Surface
    The Computing Surface was Meiko's first attempt. They now have a SPARC-based machine called the CS-2 (CS = Computing Surface). Machines located at LLNL (largest) and UCSB, others. Precisely the same mistakes as ICL, Acorn, and so many others. Technically good - pity about the contact with reality.
    This section pending changes from "anonymous."

nCUBE
    Impressive because small portable computers are made and demonstrated on road shows.

NEC
    SX-2, SX-1, SX-3, SX-4
    Cenju series
    Long vector shared memory machines. Cenju series currently made with MIPS R4400 CPUs.
    NEC SX-4: Up to 512 processors (shipment started December 1995), 2 Gflops each. Up to 32 processors share memory. 16 nodes connected by crossbar.
    The NEC SX-4 does not have virtual memory. The SX-4 uses paged map addressing, but it does not demand-page. All the instructions and data must be resident in main memory for execution. A reference to an unloaded page aborts the program. The advantage of paged map addressing is that there is no need to garbage collect (the infamous storage move operation), and it greatly enhances the ease and efficiency of partial swapping.

ParaSoft (Express)
    Arthur Hicken
    ParaSoft Corporation
    Technical Support
    voice: (818) 792-9941
    fax:   (818) 792-0819
    email: ahicken@parasoft.com

Parsytec
    Carsten Rietbrock
    Parsytec Computer GmbH, Marketing
    Roermonderstrasse 197
    52072 Aachen
    Germany
    Tel: +49(0)241-8889-0
    Fax: +49(0)241-8889-50
    Email: carsten@parsytec.de
    Cserve: 100303,3362
    http://www.parsytec.de

    Parsytec has been manufacturing and marketing parallel systems for industry and R&D since 1985. More than 1600 systems have been installed world-wide to date, comprising single-processor board-level systems and up to 1024-processor MPP systems. Parsytec first utilized the INMOS Transputer series and in 1993 switched to the emerging PowerPC family. Current systems (PowerXplorer, GC/PowerPlus and the new CC-Series) are being used in industrial applications like postal automation, check processing and related computationally intensive pattern recognition tasks, as well as in scientific applications like CFD, FEM and related fields in simulation and processing (currently up to 15 GFlops). Parsytec has subsidiaries in the US (Chicago), Israel (Tel Aviv), Germany (Aachen is the head office, Chemnitz), the Netherlands (Oss, you can see the wizard there) and sales partners world-wide such as Matsushita (Japan), Sundance (UK), PSS (Sweden), Paratec (Korea) etc.

    Parsytec Inc.
    245 W. Roosevelt Rd.
    Bldg. 9, Unit 60
    West Chicago, IL 60185
    708-293-9500
    cindy@parsytec.de

Hitex (.de)
    Cluster workstation parallelism
    URL: http://www.hitex.com/parastation

Portland Group
    http://www.pgroup.com
    HPF Compilers
    SMP Compilers

Thinking Machines Corp. (TMC, frequently mistaken for TMI) [half-alive]
    SIMD: CM-1, CM-2, CM-200
    MIMD: CM-5 (SPARC based)
    http://north.pacificnet.net/bklaw/chapter11doc.html

TERA
    http://www.tera.com
    Follow-on to Denelcor.
    Learning to live with latency.
    Process handling in hardware.
    Full-empty bit memory.

Unisys (See also Burroughs)
    OPUS, the Unisys entry into the MPP (or SPP, "scalable PP", as the marketing department prefers to call it). It is based on the Paragon mesh, except with the nodes replaced with our own design using Pentium processors and Ethernet and SCSI on every node to adapt it to database use.
The OS is based on the Chorus microkernel with a SysVr4.2 shell and a single system image, so all messaging is with standard Unix IPC (don't get me going on that; I'll just say I preferred the iPSC/860 and Paragon API). Still has a light show, just our low-buck commercial version which lacks the Darth Vader look. The OS group is now in the building with John Van Zandt. (line to be removed) The marketing blurbs are off the Unisys home page and the Intel SSD pages. My current project is getting Windows NT running on it, using Intel's reference work as a base, for the next generation system (P6 based, using the same mesh as their "teraflop" machine). I don't think Sandia will be asking for any SW from me on that one. I will try to round up some of the brochures floating around the office for you (handy fire starters).

Are/were they supers?
.....................

Arete

Avalon
    A12. http://www.teraflop.com/acs/acs.html

Encore
    Gordon Bell helped to found it. A bus-oriented multiprocessor; purchased Gould CSD in Florida.

FLEX -- Flexible Computer
    Bus-based multiprocessor similar to the Sequent. A follow-on to the NASA LaRC Finite Element Machine (FEM[1|2]). Based in Texas. Used 68020? I used the LaRC machine once, and we had one presentation. Reliability was a concern, but they did not understand the OS market. It's not clear why they folded other than market pressure (make out later).

HaL
    SPARC based server

Tandem
    Himalayan
    http://www.tandem.com

Chen Systems (follow on)
    Eau Claire, Wisc.
    http://www.chensys.com
    CHEN-1000 servers, keywords: Pentium, Autopilot

Sequent
    A promising, bus-oriented multiprocessor company which has decided to go the business software route. Forked off a division of Intel back in the days of the 432 processor.
    http://www.sequent.com/public/fast.html
    http://www.sequent.com/public/big.html
    http://www.sequent.com/public/smart.html

SUN (Microsystems)
    A relatively late player in the multiprocessor game. They have concentrated more on loosely coupled workstations and moderate speed networks.

Applied Parallel Research (APR)
    1723 Professional Drive
    Sacramento CA 95825
    Voice: (916) 481-9891
    FAX:   (916) 481-7924
    E-mail: support@apri.com
    APR Web Page: http://www.infomall.org/apri

CHARM
    L.V. Kale                          kale@cs.uiuc.edu
    Department of Computer Science     (217) 244-0094
    University of Illinois at Urbana-Champaign
    1304 W. Springfield Ave., Urbana, IL 61801

Linda
    Scientific Computing Assoc.
    One Century Tower
    265 Church Street
    New Haven, CT 06510
    (203) 777-7442
    (203) 776-4074 (fax)
    email: software@sca.com

    LRW Systems (VMS based)
    24 Old Orchard Lane
    Stamford, CT 06903
    (203) 329 0921
    Email: lrw@lrw.com

    Public Domain Linda:
    --------------------
    ariadne.csi.forth.gr:posybl/POSYBL-1.102.TAR.Z
    by Ioannis Schoinas (sxoinas@csd.uch.gr) on a network of Sun/3 or Sun/4 or MIPS.

    Non-Public Domain Linda:
    ------------------------
    AugmenTek has two that are not free but are inexpensive. One is a C implementation and runs on Amiga computers, another is a REXX implementation that runs on either Amiga or OS/2. See AugmenTek's web site:
    (general)      http://www.halcyon.com/sbr/
    (REXX version) http://www.halcyon.com/sbr/rexinda.html
    (C version)    http://www.halcyon.com/sbr/torqueware.html

Where's the list of research projects?
======================================

To repeat: many projects are sensitive by their very nature, so most of this list is likely to be academic. The open list of research projects.

Q: What are some portable C-like parallel languages?
Split-C from Berkeley
    http://http.cs.berkeley.edu/projects/parallel/castle/split-c/

    Split-C is a parallel extension of the C programming language that supports efficient access to a global address space on current distributed memory multiprocessors. It retains the "small language" character of C and supports careful engineering and optimization of programs by providing a simple, predictable cost model. This is in stark contrast to languages that rely on extensive program transformation at compile time to obtain performance on parallel machines. Split-C programs do what the programmer specifies; the compiler takes care of addressing and communication, as well as code generation. Thus, the ability to exploit parallelism or locality is not limited by the compiler's recognition capability, nor is there need to second-guess the compiler transformations while optimizing the program. The language provides a small set of global access primitives and simple parallel storage layout declarations. These seem to capture most of the useful elements of shared memory, message passing, and data parallel programming in a common, familiar context. (A rough illustrative sketch appears below, after the "Miya's Exercise" item.)

    Split-C is currently implemented on the Thinking Machines Corp. CM-5, the Intel Paragon, the IBM SP-2, the Meiko CS-2, the Cray T3D, the SGI Challenge, and on any platform supporting MPI or PVM. All versions are built using the Free Software Foundation's GCC and the message passing systems available on each machine. Faster implementations are underway for networks of workstations using Active Messages. It has been used extensively as a teaching tool in parallel computing courses and hosts a wide variety of applications. Split-C may also be viewed as a compilation target for higher level parallel languages.

Why is this FAQ structured this way?
====================================

Because I first coined the term and started Johnny Appleseeding FAQs with different ideas to work on problems like people asking: "Where's the FAQ?" This is called a "chained FAQ" or a "chain." It's a method of distributing or partitioning. These files tend to grow, and they have to be broken up in space and time anyway. Another reason is that because it goes out regularly, it's like a lighthouse beacon to let you know that connectivity is (or isn't) taking place.

The panels make a good divider (it's a bit tough on those people reading this group via email, but they can adjust). You are encouraged to adopt an intelligent news reader which uses killfiles (and you can kill these panels by version and read them only when a change has taken place). Some smart mailers have killfile capability. Supersedes: and Expires: don't always work. Hey, Burton and Dorothy Smith approved.

What is Miya's Exercise?
........................

This was first posted to net.arch in the 1984ish time range. The inspiration is derived from attempting to understand quantum mechanics and the nature of light. Is light a particle or a wave? They said, "Okay, on Mondays, Wednesdays, and Fridays light is a particle. On Tuesdays, Thursdays, and Saturdays, light is a wave. On Sunday we rest."

Take whatever your terminology is: multiprocessing, parallel processing, etc., and throw it out for a week. Use another set of words to mean the same thing. Throw those words out. Select another: e.g. competitive and cooperative processing (these are particularly good ones which came out over time). The discipline is harder than you might imagine. Continue removing words and phrases. Don't laugh, it has proven useful.
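[Referring back to the Split-C entry above, as promised, a tiny sketch of what the "global access primitives" look like in use. This is from memory of the Berkeley papers, so treat the exact syntax as approximate rather than authoritative; in particular the helper name "toglobal" (for building a global pointer from a processor number and a local address) is my assumption, not a checked part of the language:]

    /* Approximate Split-C, SPMD style: all PROCS processors run this code.
     * Each processor reads its neighbor's counter with a split-phase get,
     * overlapping the communication with any independent local work.
     * Syntax from memory; "toglobal" is an assumed helper name.
     */
    #include <stdio.h>

    int counter;                 /* one copy of this lives on every processor */

    splitc_main(int argc, char **argv)
    {
        int mine;
        int *global p;           /* global pointer = (processor, local address) */

        counter = MYPROC;        /* initialize my own copy */
        barrier();

        /* point at the right-hand neighbor's copy of "counter" */
        p = toglobal((MYPROC + 1) % PROCS, &counter);

        mine := *p;              /* split-phase get: the fetch starts here...  */
        /* ...independent local work could overlap the communication here...  */
        sync();                  /* ...and this waits for it to complete       */

        printf("proc %d of %d saw %d\n", MYPROC, PROCS, mine);
    }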
-----

Q: What are some realistic (non-PRAM) models for analyzing the complexities of parallel algorithms?

    LogP: Towards a Realistic Model of Parallel Computation
    ftp://ftp.cs.berkeley.edu/ucb/TAM/ppopp93.ps

    Block Distributed Memory Model
    ftp://ftp.cs.umd.edu/pub/papers/papers/3207/3207.ps.Z

    Bulk Synchronous Parallel Model
    http://www.scs.carleton.ca/~palepu/BSP.html

What is Salishan?
-----------------

The Salishan Lodge is a Five-star, seaside, golf resort hotel and conference center (it has its own small runway) on the coast of Oregon near Gleneden Beach (about a three-hour drive from Portland). The Conference on High Speed Computing is sponsored by the DoE's LANL and LLNL and was the brainchild of B. Buzbee, G. Michael, and R. Ewald. Occasionally other agencies (DOD, NASA, etc.) also contribute. The Conference is invitation only, for about 100 people, to keep discussion manageable. A waiting list exists. [Personally I stopped going after chairing a session; too many managers wanting to meet too many famous people. I found another, better conference, not super admittedly.]

The Conference (which usually does not have proceedings) attempts to bring together people who have big computing problems with knowledgeable people with technical ideas and with people who have money, etc. Similar meetings exist elsewhere (like Asilomar). The basic idea is: 1) get rid of marketing, 2) keep it technical, 3) keep it informal [this is evident with one specific room, a lot of wine, and a chalk board (it's a somewhat academic Conference in nature)].

The setting: The Lodge is part of a very small chain which includes the Salish Lodge near Seattle. The TV series Twin Peaks was filmed near and occasionally inside the Salish Lodge (called "The Great Northern"; this gives a very small sense of what the hotel is like). Trivia.

What makes it Five-Star?
------------------------

It's very expensive. It has complete, full-time 24-hour service. Some people don't like it.

Aren't we digressing from supercomputing?
-----------------------------------------

You asked. This type of conference is popular. From the Salishan Meetings sprang the Supercomputing'88 and subsequent Supercomputing'xx Conferences. This is the Conference's logo (simplified):

    [ASCII logo: a triangle with "Architecture" running up one side, "Algorithms" up the other, and "Language" across the base.]

Flynn's terminology
-------------------

SISD: [Flynn's terminology] Single-Instruction stream, Single-Data stream
SIMD: [Flynn's terminology] Single-Instruction stream, Multiple-Data stream
MISD: [Flynn's terminology] Multiple-Instruction stream, Single-Data stream
MIMD: [Flynn's terminology] Multiple-Instruction stream, Multiple-Data stream

%A M. J. Flynn
%T Very High-Speed Computing Systems
%J Proceedings of the IEEE
%V 54
%N 12
%D December 1966
%P 1901-1909
%K btartar
%K maeder biblio: parallel architectures,
%K grecommended91,
%K JLb,
%K process migration,
%X The original paper which classified computer systems into instruction and data stream dimensions (SISD, SIMD, etc.).
%X Classification of parallel architectures, based on the division of control and data. SISD, SIMD, MIMD, MISD.

Are/were there MISD machines?
-----------------------------

It depends whom you talked to. Generally yes; a few people said no.

From: David Bader
Subject: comp.parallel

I read the FAQ proposal you posted and liked it a lot. Covers most of the dirt. Here are some ideas that I had, in no particular order (just to make your life a little more difficult ;) [sigh!]
I consider myself a long-time supercomputing/parallel junkie, since I remember the formation of comp.hypercube ;) [you qualify]

---------------------------------------------------------
Pointers to Benchmarks:

micro benchmarks:
    PARKBENCH: Parallel Kernels and Benchmarks Report
    http://www.epm.ornl.gov/~walker/parkbench/

    Genesis Benchmark Interface Service
    http://hpcc.soton.ac.uk/RandD/gbis/papiani-new-gbis-top.html

macro benchmarks:
    The NAS Parallel Benchmarks
    http://www.nas.nasa.gov/NAS/NPB/

macro benchmarks for shared memory machines:
    SPLASH from the Stanford Flash Project
    http://www-flash.stanford.edu

---------------------------------------------------------
What is the HPF (high performance fortran) standard?

See High Performance Fortran Forum
    http://www.erc.msstate.edu/hpff/home.html

The High Performance Fortran Forum (HPFF), led by Ken Kennedy of the Center for Research in Parallel Computing, is a coalition of industry, academic and laboratory representatives working to define extensions to Fortran 90 for the purpose of providing access to high-performance architecture features while maintaining portability across platforms.

DEC: http://www.digital.com/info/hpc/f90/
http://www.fortran.com/fortran/

hpff            Main HPFF list, including meeting minutes
hpff-core       "Core" list, for attendees at the meetings
hpff-interpret  Questions and answers about interpretation of the HPF specification
hpff-distribute Discussions of advanced data mapping (e.g. irregular distributions) for HPF 2.0
hpff-task       Discussions of task parallelism and parallel input/output for HPF 2.0
hpff-external   Discussions of external interfaces (e.g. linking with C++) for HPF 2.0

From now on, the correct procedure for adding yourself to the hpff-xxx list is to send mail to hpff-xxx-request@cs.rice.edu. In the body of the message (*not the Subject:, as before*), put the line "subscribe", optionally followed by an address (it defaults to the message sender). You should probably only include it if people often have difficulty replying to your e-mail directly. Similarly, to remove yourself from the hpff-xxx list, you should send mail to hpff-xxx-request@cs.rice.edu with the line "unsubscribe".

---------------------------------------------------------
What is the MPI (message passing interface) standard?

See Message Passing Interface
    http://www.mcs.anl.gov/mpi/index.html
    http://WWW.ERC.MsState.Edu:80/mpi/
    http://www.epm.ornl.gov/~walker/mpi/

MPI stands for Message Passing Interface. The goal of MPI, simply stated, is to develop a widely used standard for writing message-passing programs. As such, the interface should establish a practical, portable, efficient, and flexible standard for message passing. Message passing is a paradigm used widely on certain classes of parallel machines, especially those with distributed memory. Although there are many variations, the basic concept of processes communicating through messages is well understood. Over the last ten years, substantial progress has been made in casting significant applications in this paradigm. Each vendor has implemented its own variant. More recently, several systems have demonstrated that a message passing system can be efficiently and portably implemented. It is thus an appropriate time to try to define both the syntax and semantics of a core of library routines that will be useful to a wide range of users and efficiently implementable on a wide range of computers.
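For the curious, here is the flavor of the interface itself. This is plain MPI-1 C, nothing specific to any one vendor's implementation; a minimal sketch rather than a tutorial:

    #include <stdio.h>
    #include <mpi.h>

    /* Minimal MPI program: every rank reports in, and rank 0 receives a
     * message from rank 1 (if it exists) using plain point-to-point calls. */
    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I?          */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many of us?    */

        printf("hello from rank %d of %d\n", rank, size);

        if (size > 1) {
            int token;
            if (rank == 1) {
                token = 42;
                MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            } else if (rank == 0) {
                MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
                printf("rank 0 received %d from rank 1\n", token);
            }
        }

        MPI_Finalize();
        return 0;
    }

How you compile and launch it varies by implementation; under MPICH it is typically something like mpicc followed by mpirun -np 4.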
IEEE
    URL: http://computer.org/parascope/

URL: http://www.tgc.com/HPCwire.html
Here's the text from that page:

    HPCwire
    The Text-on-Demand E-zine for High Performance Computing
    Delivered weekly to over 19,000 readers.

    If you're interested in news and information about high-performance computing and you have access to Internet e-mail, send an e-mail message to: e-zine@hpcwire.tgc.com to receive a free 6-week (this is now 4-week) trial subscription to HPCwire. Every issue includes news, information, and analysis related to both engineering/scientific and commercial applications. HPCwire also includes employment opportunity and conference listings.

    Topics regularly covered include:
    * Supercomputing
    * Mass Storage
    * Technology Transfer
    * On-line Transaction Processing
    * Parallel Processing
    * Industrial Computing
    * Client/Server Applications
    * Fluid Dynamics

    HPCwire Sponsors:
    * Ampex
    * Avalon Computer
    * Cray Research, Inc.
    * Digital Equipment Corp.
    * Fujitsu America
    * Genias Software
    * HNSX Supercomputers
    * IBM
    * Intel
    * MasPar Computer
    * Maximum Strategy
    * nCUBE
    * The Portland Group
    * Silicon Graphics, Inc.
    * Sony Corp.

    HPCwire is published by Tabor Griffin Communications, the leader in Internet-delivered news and information.

    Tabor Griffin Communications
    8445 Camino Santa Fe, Suite 202
    San Diego, CA 92121
    619-625-0070
    human@tgc.com

START section on academic projects

Mentat and Legion programming environments.
    http://legion.virginia.edu/
    Mentat uses dataflow on the object level for this. The programmer has to decide what the unit of work is and encapsulate it into an object; the run-time system then wrings out whatever parallelism can be found between these objects. It's definitely an interesting approach.

NIST Parallel Processing
    http://www.itl.nist.gov/div895/sasg/parallel/

=============================
The "Ordnance Museum"
=============================

Histories of various Supercomputer centers (yes, _some_ will not be able to participate). It would answer questions like which machine replaced which machine. Also, which machines were long-lived, and which weren't. Call it the "Ordnance Museum" (if old programmers tell war stories... ), or some such. (And there really is an Ordnance Museum at BRL.)

List of supercomputer facilities (yes, we know). Some of this could probably be found on various Web sites. Check.
    LLNL
    LANL
    BRL
    NCAR
    The spooks.
    CEWES (Corps of Engineers Waterways Experiment Station, Vicksburg, MS, USA)

CEWES (from the photos on the wall):
    IBM 650              8/57 - 11/62
    GE-225               8/62 - 10/68
    GE-437               8/68 -  6/72
    GE-635               3/72 -  7/81
    TI ASC               1/79 - 11/80
    Honeywell DPS 1      7/81 - 12/84
    Honeywell DPS 8/70  12/84 - 10/91
    Cray YMP            11/89 - 12/96
    CDC 962              4/90 -
    Cray C90             7/93 -
    SGI PCA              7/96 -
    Cray T3E             2/97 -
    SGI O2K              3/97 -
    IBM SP               8/97 -

Articles to parallel@ctc.com (Administrative: bigrigg@ctc.com)
Archive: http://www.hensa.ac.uk/parallel/internet/usenet/comp.parallel