Newsgroups: comp.parallel,comp.sys.super
From: eugene@sally.nas.nasa.gov (Eugene N. Miya)
Reply-To: eugene@george.arc.nasa.gov (Eugene N. Miya)
Subject: [l/m xx/xx/xx] Dead Comp. Arch. Society c.par/c.s.super (26/28) FAQ
Organization: NASA Ames Research Center, Moffett Field, CA
Date: 26 May 1998 12:03:05 GMT
Message-ID: <6keb1p$1sj$1@sun500.nas.nasa.gov>
Archive-Name: superpar-faq
Last-modified: 30 Apr 1998

26 Dead computer architecture society    < * This Panel * >
27 Special call
28 Dedications
 2 Introduction and Table of Contents and justification
 4 Comp.parallel news group history
 6 parlib
 8 comp.parallel group dynamics
10 Related news groups, archives and references
12
14
16
18 Supercomputing and Crayisms
20 IBM and Amdahl
22 Grand challenges and HPCC
24 Suggested (required) readings

This space intentionally left blank (temporarily).

UNDER DEVELOPMENT

This is a roughly chronological list of past supercomputer, parallel
computer, or especially "interesting" architectures, not paper designs
(see panel 14 for references on those).  Computer archeology is important
(not merely interesting), because it is in the failed projects that real
learning takes place.  Even Seymour Cray designed "failed" machines.

DCAS took its name from a so-so Robin Williams movie, Dead Poets Society
(DPS), which nerdy CS students went to see (trust me, he's better in live
performance).  The dead-architecture, lessons-learned discussion started
in comp.arch later that same year.  The idea was to collect material from
knowledgeable ex-engineers and former scientists, anonymously if need be,
before it was lost (since the companies had either died or evolved).  The
problem is that the academic and commercial literature is fraught with
useless, glowing marketing/sales language.  We (the net; I didn't do this
alone) collected comments, anonymously where necessary, so the lessons
would not be lost.  The idea was that anyone could comment.  Netters had
hashed over this material so many times before that it seemed useful to
capture it (like an FAQ ;^).

We assembled a list of architectures.  Maybe a third of the way through
the list, I was asked by certain people within CRI to suspend the
discussion, because CRI was starting to acquire Supertek (which I
personally always thought was a mistake).  We never resumed.  We lost the
inertia.

Ever hear of the Gibbs Project?  If not, you should not be surprised.

Around that same time, ASPLOS came to Santa Clara, where they held a Dead
Computer Architecture Society panel session.  I had a meeting of some sort
(possibly SIGGRAPH) and missed the starting hour.  I gave Peter Capek of
IBM TJW a video camera, but I did not keep the tape, because I merely
wanted to see what I had missed (if I had kept it, I would have given it
to J. Fisher, who sat on the panel).  I did not regard that as recording
history.  The panel session discussed the various failed
minisupercomputer firms (perhaps I should use more flowery marketing
language like "attempted?").  Either way, the lessons were there in front
of 200+ architects, OS and language designers.  Perhaps there was another
video camera in the room.....

Let's see, what were the four architectures represented?
  Elxsi
  ...
  Multiflow
  ...

One poster asked: "Why no mention of the Symbolics 3600, LMI, or TI LISP
machines?"  I am not averse to including the lessons from those machines;
however, the DCAS discussion was about minisupercomputers.
The 3600 and other LISP machines fell more into the class of workstations
of their time, competing with the Xerox "D-machines" [Dorado, Dolphin, and
Dandelion], Sun, SGI, VAXstation, etc.  Most at the time were not even
parallel machines.  But if you can pitch me a good case, I'll consider
them.  Do it.  Also useful: old header files for those systems which ran
C compilers.

Most recently, I am reminded of a warm fall Saturday morning in a house on
a hill overlooking the beautiful Santa Barbara Channel.  George Michael,
whom I had driven down just to see Glen Culler (who had suffered a stroke
some time back), was talking about "war stories."  Ms. Culler [wife and
David's mother] chimed in: "I really think you need a better title for
your book {one GAM was working on}.  No one will buy it with a word like
'war stories' in the title...."  The three of us in the room chuckled.
She is great.

The Dead Computer Architecture Society
======================================

Floating Point Systems (FPS)
----------------------------
(Purchased by Cray Research)

FPS AP-series (Culler-based design with VLIW attributes):
7600 performance attached to your PDP-11.  Roots with Culler-Harris
(CHI), Inc.  FPS started with specialized attached processors such as the
FPS AP-120B, and scaled from there to the FPS T-series hypercubes.  The
AP-120 line could be attached to machines as small as a PDP-11.  They
were controlled by specialized Fortran (and later C) system calls (a
software emulator existed for code development: obviously slow).  Known
as an FFT and MXM box.  It was marketed in 1977 in Scientific American as
7600 power on your minicomputer, and showed quite respectable, but
economical, number-crunching power (I/O was still a problem).  38-bit
words.  Pipelined; a precursor to VLIW?  Perhaps.

Later models: FPS-164, FPS-264, FPS-500, APP.
Larger 64-bit attached processors.  Pre-IEEE FP.  Attached processors
became useful and popular for signal processing and medical applications.

FPS T-series (hypercubes):
Someone else (maybe Stevenson) can write a T-series paragraph.

Absorbed by Cray Research.  This business unit was sold by SGI to Sun at
the time of the SGI/Cray merger, 7/96.  [Current living incarnation.]
The former CS-6400 line: the current living incarnation is the
UltraEnterprise 10000 and UltraHPC 10000 (two different names for two
different markets, same box).

Denelcor
--------
The Denelcor Heterogeneous Element Processor (HEP) was perhaps the most
unusual architecture a student will never get a chance to see.  My first
knowledge of this machine came from Mike Muuss (BRL, scheduled to get one
[4 PEMs delivered]) at a time when the DEC VAX-11/780 was the only VAX
around.  Later I would invite representatives to Ames.

7600-class scalar CPUs at a time when the Cray-1 was out and the X-MP was
just being delivered.  64-bit machine.  1978-1984.  Full/Empty bits on
the memory, which go way beyond mere Test-and-Set instructions (see the
sketch at the end of this entry).  Separate physical instruction (128 MB)
and data (1 GB) memory.  Based in Aurora, CO, east of Denver.  Operating
systems: HEP-OS and HEP Unix.  Programming and architecture manuals at
the Museum.  Keywords: dataflow (limited), 13 systems delivered.  Photos.

Sites (Messina list, 13 sites):
  BRL (only 4 PEMs)
  Argonne
  LANL
  GIT
  XXX (probably)
  Lufthansa
  7 to go.

Problems: somewhat underpowered for its time, programming difficulties.
Hardware deadlock.  Early inexperience with serious parallel systems.
Software.  Ambitious.  Pipelining.  Dataflow.

I would hope that a HEP simulator sees the light of day, one of these
days.
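The HEP's Full/Empty bits turned every word of data memory into a
synchronization point: a load could be made to wait until the word was
"full" and leave it "empty", while a store waited for "empty" and left it
"full".  As a rough illustration (a software sketch using POSIX threads,
not HEP hardware, HEP-OS, or any Denelcor interface), the C fragment below
emulates a single full/empty word with a mutex and condition variable; a
producer and a consumer hand values through it in strict alternation.

/* A minimal sketch of full/empty-bit semantics (illustration only): a
 * "read" waits until the word is FULL and leaves it EMPTY, a "write"
 * waits until it is EMPTY and leaves it FULL. */
#include <pthread.h>
#include <stdio.h>

typedef struct {
    long value;
    int full;                    /* the full/empty bit */
    pthread_mutex_t lock;
    pthread_cond_t  changed;
} fe_word;

static void fe_init(fe_word *w) {
    w->full = 0;
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->changed, NULL);
}

/* Writer: wait for EMPTY, store, mark FULL. */
static void fe_write(fe_word *w, long v) {
    pthread_mutex_lock(&w->lock);
    while (w->full)
        pthread_cond_wait(&w->changed, &w->lock);
    w->value = v;
    w->full = 1;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
}

/* Reader: wait for FULL, load, mark EMPTY. */
static long fe_read(fe_word *w) {
    pthread_mutex_lock(&w->lock);
    while (!w->full)
        pthread_cond_wait(&w->changed, &w->lock);
    long v = w->value;
    w->full = 0;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
    return v;
}

static fe_word channel;

static void *producer(void *arg) {
    (void)arg;
    for (long i = 1; i <= 5; i++)
        fe_write(&channel, i);   /* blocks until the consumer empties it */
    return NULL;
}

int main(void) {
    fe_init(&channel);
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    for (int i = 0; i < 5; i++)
        printf("consumed %ld\n", fe_read(&channel));
    pthread_join(t, NULL);
    return 0;
}

On the real machine the handshake was a hardware property of memory
words, which is what put it "way beyond mere Test-and-Set"; compile the
sketch with something like cc -pthread to watch the alternation.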
It is suggested that the Horizon simulator is a close approximation to
the HEP.  I do not know how to obtain it (but I know roughly where); I
just don't have the time.  Successor machines: HEP-2 (design 70%
complete?) and HEP-3.  Horizon (paper design).  Tera (1 machine).
Keywords: learning to live with latency.  See Dennis Shasha's book Out of
Their Minds for a barely adequate profile of Burton Smith (too short).

Elxsi
-----
Sunnyvale, CA based super-minicomputer.  ECL technology, bus-oriented,
true 64-bit, the first IEEE 754 FP machine, SISD (non-vector) CPUs (1-10,
later 1-16 CPUs).  Impressive for its time (designed to compete against
the VAX-11/780 AND low-end CDC supercomputers).
EMBOS ("Unix-like" operating system.  "We renamed `grep` to `find`."
"Ah?  And what did you rename `find` to?")
ENIX
Tata Elxsi, sites in Australia and India.  Over 200 sites, and many CPUs.
1 CPU per board.  Photos exist.  The firm dissolved in the late 1980s;
people went to H-P.

Personal experience: saw and briefly used a 4-processor system which
replaced a Cyber 172 (since replaced by networked workstations).  That
application was real-time flight data analysis on experimental aircraft.

Lessons?
--------
ECL is expensive.
Don't screw around with OSes.
Understand the market.

Alliant
-------
Once called Data Flow Systems.  comp.sys.alliant
FX/8, FX/80, FX/2800, etc.
The FX/8 (the first architecture) had a particularly slow scalar system,
using MC68008 CPUs for the interactive front-end processors, at a time
when Sun workstations had the better-known 68010 processors.  The
multiple back-end Computational Engines (CEs) were a proprietary design
with vector instructions.  The Berkeley Unix port was a mixed beast.
Basis for the U. Ill. Cedar Project.  Fizzled?  Acquired Raster
Technologies (graphics).

The Friday before Memorial Day 1992.  At least that's when 80% of us got
laid off.
1) Undercapitalized in a market not as big as it first appeared.
   > I disagree about the "undercapitalized".
2) Technology changing faster than we could keep up with.  (Small
   Unix-CPU systems can be designed and shipped far faster than a
   parallel system.)
   > True.
3) Relying on Intel for a part that didn't _end_ in "86".
   > True.
4) Long lead time on sales of MPP systems.

#include "alliant.h"
Surviving news group: comp.sys.alliant.
Museum has one FX/8 ("Do not run classified data on this machine") and
one FX/1 (former Wallach desk stand).

Multiflow
---------
VLIW.
A couple of us pondered whatever happened to SN#1 of this machine.
I saw it!  Even typed 'ls' on it!
Is the third flag at the assembly room at Convex Computer: Maryland?
Museum History Center (Trace 14).
Lessons?  Among others: waited too long to go to ECL.

Myrias
------
What was the US DOD doing funding a Canadian company?  That was the first
question that ever came up on the net.
Home base was Edmonton, Alberta, Canada.
Formed by some academic types from U of Edmonton ca 1984?  (or was it U
of Alberta at Edmonton?)
Original design "on a napkin at a bar"...

Weakness: Hardware.
  SPS-1  68000, "proof of concept": a hierarchy of busses, 4 68k's per
         bus, 16 of those busses on another bus in a box (called a
         "cage"); hook as many cages together as you want/can afford.
         ca 1986?, none installed.
  SPS-2  as above but with 68020 + 68881/2 + MMU, "production system",
         4 MB per 68020.  The largest system actually built was about
         1088 CPUs (~ 1024 + 64), a "benchmark system", proof of concept
         (again).  ca 1988?, <~10 installed.
  SPS-3  as above but with 68040; one or two actually built.  ca 1990?,
         ~0 installed.

Strength: Software.
Basic idea: take VM Unix, remove the pager/swapper, and replace the pager
with a custom pager which swaps pages between processors according to
rules which create the illusion of a single address space: SPMD.  Hey,
this is pre-SCI, KSR, et al.

Neatest thing: the debugger could make "ghost pages" (one ghost word per
data word), which contained a count of the number of reads/writes per
word; it could find uninitialized words easier than anyone.  (Notice I
didn't say "faster".)  A small sketch of the idea appears at the end of
this entry.

Funniest thing: we had more trouble getting our Canadian friends into
NASA than into NSA...

Q: What did "SPS" stand for?
A: Oh, Scalable Processor System, or Super Parallel System, or something.
   If you can make up something sensible from "SPS", someone at Myrias
   probably used the term at least once...

Q: Why did US DoD fund a Canadian startup?
A: No one else had a provably _scalable_ system at the time.

They went bankrupt without warning: got a call at 1430 to come to the
office, "you're out of work at 1700".  Don't like it?  Sue us in
Edmonton.  "You're toast, eh?"

AFAIK, today they're still alive and well, selling the software for
workstation clusters.  You can see them at Supercomputing '9?, perhaps
sharing a small booth.

Lessons Learnt: Myrias: Hardware matters.  The best software in the world
needs something to run on.  300 Kflops/processor hasn't been
supercomputing for quite a while now.
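To make the "ghost pages" idea above concrete, here is a minimal
user-level sketch (my illustration, not Myrias code; the helper names and
array sizes are made up): shadow counters record reads and writes per
word, so a word read before it has ever been written can be reported,
which is exactly the class of bug the Myrias debugger was good at
finding.

/* Ghost-word sketch (illustration only): one read counter and one write
 * counter per data word; a load of a never-written word is flagged. */
#include <stdio.h>

#define NWORDS 8

static double data[NWORDS];
static unsigned ghost_reads[NWORDS];   /* reads per word  */
static unsigned ghost_writes[NWORDS];  /* writes per word */

static void store(int i, double v) {
    ghost_writes[i]++;
    data[i] = v;
}

static double load(int i) {
    ghost_reads[i]++;
    if (ghost_writes[i] == 0)
        fprintf(stderr, "ghost: word %d read before ever written\n", i);
    return data[i];
}

int main(void) {
    for (int i = 0; i < NWORDS - 1; i++)   /* "forget" the last word */
        store(i, (double)i);

    double sum = 0.0;
    for (int i = 0; i < NWORDS; i++)       /* trips the check on word 7 */
        sum += load(i);

    printf("sum = %g\n", sum);
    for (int i = 0; i < NWORDS; i++)
        printf("word %d: %u reads, %u writes\n",
               i, ghost_reads[i], ghost_writes[i]);
    return 0;
}

The real system kept the counters in separate ghost pages, one ghost word
per data word, rather than in ordinary arrays as here.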
Flexible Computer (FLEX)
------------------------
FLEX-32.
Saw it!  (Langley Research Center)  Even typed 'ls' on it!
NS32032 processors on a bus.  Competitor to Sequent.

Scientific Computer Systems (SCS)
---------------------------------
SCS-40, SCS-30 (in order of development time, not performance).
An attempt at a low-cost, binary-compatible Cray X-MP clone, mostly for
software development, but also marketed to those who could not afford a
full-sized Cray.  This was probably a bad business decision on their
part.  "Come down from your Cray...."  Kiss of death.  The term
"Crayette" was first used with this machine.  The last hosts running COS
and CTSS (CIVIC: the Fortran which replaced LRLTRAN).  Licensed COS 1.13
from CRI.  Meanwhile CRI was transitioning to UNICOS (tm).  They secretly
hoped UNICOS was going to fail.  Then they hoped for a remaining
(surviving) COS/CTSS market: also failed.  Shipped a few dozen Cray
clones.  The first machines developed were delivered to San Diego.  Roots
also in Portland, OR, and Boeing (which also purchased a couple) in
Seattle.

Supertek
--------
(Purchased by Cray Research)
Another attempt at a low-cost, binary-compatible Cray X-MP clone.  Mike
Fung of H-P tried this.  Santa Clara, CA.
S-1 (not to be confused with the LLNL S-1 project), S-2.
The last host running CTSS.
You should probably note that the S-1 was sold by Cray Research after the
buyout as the "XMS", and the S-2 (which was still under development at
the time of purchase) was sold as the "Y-MP/EL".

Culler-Harris Inc. (CHI)
------------------------

Culler Scientific
-----------------

Ametek
------
A conglomerate with something like 40 divisions, one of which produced a
hypercube clone similar to Intel's and NCUBE's, in Arcadia, CA.  Used at
Caltech and other sites.  Never saw one.

Guiltech
--------
Based in Santa Clara, CA.  A somewhat mysterious company (Guilfoyle).
Originally an optical interconnect; it changed to a systolic design.  The
only VMS-based "supercomputer."  Two? delivered (JPL and TRW) in beta
test.  Its last PR gasp was when an employee sold a manual to the Soviets
in the mid-80s.  That employee was sent to prison for violating export
control laws.

Cydrome
-------
Milpitas, CA.  Hosted SIGBIG meetings.
Cydra 5 (black boxes).  Two delivered?  Pittsburgh Supercomputer Center
(PSC), and one Cydrome was delivered to Yale, where a water pipe running
through the machine room burst over it.
  Bill Gropp (at Yale 1982-1990) -- http://www.mcs.anl.gov/~gropp
Hashed addresses: the Cydra 5 (aka MXCL 5) did this.  It was one of the
things that made the memory system expensive (it didn't take 0 cycles,
but it did make access to memory pretty uniform, independent of stride).
I should try to find Richard about this and also see if he retains any
old manuals.
Museum History Center (Cydra 5).

Cray Computer Corporation
-------------------------
Colorado Springs, CO.
Cray-3
Cray-4
A computer company doing research when the parent research company was
doing computers.  Forked from its parent CRI sometime after an
unsuccessful Cray Labs in Boulder, CO, about the same time as SSI (1).
The Cray-3 was intended to be a 16-processor machine with a 2.0 ns clock
cycle (1 instruction per cycle, unlike the Cray-2).  The Cray-4 was to be
a 1 cubic foot cube.  The 4 abandoned the local memory and brought back
the B and T registers.  GaAs technology (Vitesse).  The founder was
killed by complications from injuries suffered in a car crash.
Successor: SRC Computer.
Cray-5
Cray-6

Supercomputer Systems Inc. (1)
------------------------------
Eau Claire, WI.  Steve Chen.
Heavily funded by IBM, with not a lot to show for it.  One 2-CPU
prototype.  Photo in BusinessWeek inside a Faraday cage.  Stories that a
hunk of the machine was not properly cooled on first power-up, and that
the hulk was later found abandoned by the side of a road.  Scheduled to
be a ramped-up 64-processor Y-MP with another memory stage.

Cray Research Incorporated
--------------------------
Acquired by Silicon Graphics.

Still Birth
===========

American Supercomputer
----------------------
A project by Mike Flynn (Stanford).

CHoPP
-----
Forgot what the CH stands for... PP was parallel processing.  1970s era,
by Sullivan.

Supercomputer Systems Inc. (2)
------------------------------
San Diego, CA.  Very little is known about this firm.

Half-alive companies (software, services, different products only)
====================

CDC (now CDS)
---
This section will be added later.  comp.sys.cdc

ICL DAP (not totally dead)
-------
ICL, sometimes called the IBM of England.  Sometimes considered
competition to the Goodyear/Loral MPP.  An English SIMD machine, later
a.k.a. Active Memory Technology (Irvine offices).
Versions: Transputer, SPARC?

Inmos Transputers
-----------------
Not a supercomputer per se, but an interesting attempt at a component
with real concern for I/O.  A popular processor (1982) in some circles.
A well-thought-out communication scheme, but problems with scalability.
Crippled by its lack of US popularity.

|However, you must mention Transputers (something developed in EUROPE,
|outside of the U.S.A.; the name comes from TRANSistor and comPUTER) and
|the related companies:
|* INMOS (from GB), the inventor and sole manufacturer of transputers,
|  now bought by SGS-Thomson (French)
|* Parsytec (still alive, but does not use Transputers any more, Germany)
|* Meiko (GB) produced the "computing surface"
|* IBM had an internal project (codenamed VICTOR)
|and there are many more.  Transputers had a built-in communication agent
|and it was very easy to connect them together to form large
|message-passing machines in regular, fixed topologies.  INMOS' idea was
|that they should be the "transistors" (building blocks) of the new
|parallel computers.
The Inmos transputer has earned a place in this file now that SGS-Thomson
has issued the last-time-buy warning (end of '98; last deliveries end of
'99).  The moral of this one: don't try to change everything at once
(language, processing model, hardware).

SIMD machines in general

Thinking Machines Corp.
-----------------------
Thinking Machines was founded by Danny Hillis to develop the concept of
massively parallel (SIMD) computers.  TMC sold over 100 systems called
"Connection Machines" between 1989 and 1996.
  CM-2  up to 65,536 single-bit computers with FP accelerator
  CM-5  up to 1024 32-bit (SPARC) computers with vector accelerator
They went out of the computer business in 1996 and are still alive
(barely) making data mining software.
CM-1, CM-200, special projects.

MasPar
------
Now a data mining software company.  A SIMD mini-Connection Machine-1,
also resold by DEC minus the lights and the black cabinet.

KSR1/KSR2
---------
Home base was Waltham, MA, USA.  It was a nondescript, plain red-brick
building at the end of a long driveway past other office buildings.
There was _no_ identifying signage, and no indication which door was the
front door.  Formed by some academic types from Cambridge; the first
office was actually on Kendall Square, hence the name.

Strength: AllCache
The goal is to have a logically shared memory in a scalable architecture.
So you connect your processors, with their caches, to main memory.  What
does main memory do?
  1. It gives you a bottleneck, and
  2. It provides the value which any datum is assumed to have, and
  3. It doubles the memory costs of your computer.
Thus, if you can figure out how to do (2), you can eliminate (1) and (3).
The AllCache Engine solves (2) as follows: connect the processors
together using a high-speed, unidirectional ring to give high bandwidth
and allow all processor caches to stay coherent.  The size of the ring
was 34 = 32 + 2 nodes.  Use 32 of the nodes for processors, and the other
two for linking to other rings.  Configure the rings in a hierarchical
fashion, using 32-processor rings as the base level, rings connecting to
rings with 32 processors as the next higher level, rings connected to
those rings as the next higher level, etc.  Tell yourself that "data
locality" means you'll rarely have a memory access go through the higher
levels of the rings.  Voila: scalable shared memory.

AllCache level 0 is a ring of (up to) 32 processors.
AllCache level 1 is a ring whose nodes connect to the rings with 32
processors.  Any KSR1 with more than 32 processors had level 1.
Max is 32 * 34 = 1088.
AllCache level >1 was never built, but was allowed by the architecture.

AllCache moved cache lines, which were 128 bytes.  (Not to be confused
with subcache lines, which were 64 bytes.)
  Subcache size:     512 KB
  Cache size:        32 MB (i.e., "per processor")
  Level 0 AllCache:  1 GB  (32 processors)
  Level 1 AllCache:  34 GB (34 * Level 0)
(A back-of-envelope sketch pulling these numbers together appears at the
end of this entry.)

KSR1 - 20 MHz processors; the largest ever built was installed at BRL and
       was 384 processors.  Sites included CTC, ORNL, GT, NCSC, UMi, UFl,
       and a few more.
KSR2 - 40 MHz processors --> but the same speed as the KSR1 memory system
       !?!?  (Some of my sales friends say it worked.  None of my
       sysadmin friends ever said it worked.)
KSR3 - same as KSR1 & KSR2, but would use IBM PPC processors.

Weakness: Implementation
KSR made their own processors: 20 MHz with a fused fadd/fmul instruction
gave a luminal speed of 40 Mflops per processor.  Two instruction
streams: arithmetic and memory.  IEEE FP.  It was a 64-bit processor with
64-bit addresses in 1991.
It had no speculative execution or branch prediction, etc.  I/O ran
through the processor and worked by "cycle stealing": when the I/O
subsystem wanted the processor to do something, it would stop instruction
issue and insert its own instructions in the memory-op instruction
stream.

AllCache latencies were approximate (_no_ memory time on the KSR1 was
determinate; all were averages; too many microstates for the same
macrostate):
  data item is somewhere in subcache              -   2 clocks
  data item is somewhere in cache                 -  20 clocks
  data item is somewhere in your level 0 AllCache -  50 clocks
  data item is somewhere in your level 1 AllCache - 150 clocks
The subcache was two-way set associative with random replacement.  The
cache was 16-way set associative with random replacement, but 4 of the 16
ways were tied down by the OS.  The processor didn't have a scoreboard,
and nobody really knew just exactly where, at any time, a data item might
be located, so a subcache miss stalled the processor for _at least_ tens
of clocks.

The bottom line was that the KSR1 was a difficult beast to program *for
high efficiency*.  The programmer had to keep in mind which subcache line
a data item would use, and which cache line a data item would use, all
the while trying to make (typically vector) code have behavior resembling
cache re-use.  One thing which was supposed to help was an instruction
called "prefetch", which could move a data item to where it was needed
prior to the actual data request.  In Fortran, a prefetch looked like a
function call (which the compiler would silently ignore).  It didn't work
in general, and who wants to code prefetches?  Why not just go with
message passing?

Neatest thing: free lunches in house.  This saved the company a lot of
time (employees didn't spend it driving to restaurants or stuck in
traffic) and kept people from talking shop where others could overhear.
It was a good meal (and I'm a fussy eater).

Funniest thing: where other startups had newspaper clippings on the walls
describing some victory the company achieved, at KSR the most popular
clipping was a gag article some local business reporter wrote on how fast
startup CEOs drove their fancy sports cars on Rt. 128.  Henry claimed 138
mph (speed limit 55 mph).

Q: If AllCache was such a good idea, why did KSR die?
A: They were caught inflating the revenue the company had actually
   received.  They were sued by the stockholders, who were paid off
   largely in stock.  The day after the court finalized the settlement,
   the company declared bankruptcy (the capital didn't give any more to
   management).

Q: Why was KSR so secretive?
A: AllCache is a simple idea, and it's not clear the patents would be
   upheld in court (post Myrias, SCI).

Was laid off in true KSR style.  Found out when my login didn't work
anymore.

Lessons Learnt: KSR: Don't assume that having troubles in the Numerical
World means you're ready for the Transaction/Data Mining World (if you
can't make an NSF computer center happy, how are you going to make a bank
happy?); use, ahem, standard accounting practices, at least after you go
public.
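As an addendum, here is a back-of-envelope sketch (mine, not KSR's)
pulling together the AllCache figures quoted in this entry: it recomputes
the 1088-processor maximum and the level-0/level-1 capacities from the
ring sizes, and uses the latency figures above to show how even a small
miss rate out of the subcache drags the average load time well past 2
clocks.  The hit-rate mix in it is invented purely for illustration.

/* Back-of-envelope arithmetic on the AllCache numbers quoted above.
 * Only the sizes and clock counts come from the text; the hit fractions
 * in the latency estimate are invented for illustration. */
#include <stdio.h>

int main(void) {
    const int ring_nodes = 34;            /* 32 processor slots + 2 links */
    const int procs_ring = 32;
    const int max_procs  = procs_ring * ring_nodes;          /* 1088   */

    const double cache_mb = 32.0;         /* per-processor cache, MB     */
    const double lvl0_gb  = procs_ring * cache_mb / 1024.0;  /* 1 GB   */
    const double lvl1_gb  = ring_nodes * lvl0_gb;             /* 34 GB  */

    /* Latencies (clocks) from the text; hit fractions are assumptions. */
    const double lat[4]  = { 2.0, 20.0, 50.0, 150.0 };
    const double frac[4] = { 0.90, 0.07, 0.02, 0.01 };        /* invented */
    double avg = 0.0;
    for (int i = 0; i < 4; i++)
        avg += frac[i] * lat[i];

    printf("max processors   : %d\n", max_procs);
    printf("level-0 AllCache : %.0f GB\n", lvl0_gb);
    printf("level-1 AllCache : %.0f GB\n", lvl1_gb);
    printf("avg load latency : %.1f clocks (assumed hit mix)\n", avg);
    return 0;
}

With the invented 90/7/2/1 percent mix the average comes out near 6
clocks, which is one way to read the "difficult beast to program for high
efficiency" remark.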
Evans and Sutherland Computer Division ES-1
-------------------------------------------
E&S is well-known in the computer graphics community for making some of
the finest high-performance real-time computer graphics hardware.  This
image generation hardware is used in $10-100M flight (and other)
simulators.  When it was announced that E&S was getting into the
supercomputer arena, they were perceived as a serious/credible new
contender.  Gordon Bell, however, takes a dimmer view of them.
One representative machine is in storage at the Museum (over Gordon's
dead body).

ES-1
----
Jean Yves LeClerk studied under Dave Evans and, upon getting his degree,
went back to France.  When he got an idea of how to build a
supercomputer, he came back to Evans for advice on how to raise capital
to fund the project.  Evans said "I won't tell you how to raise money;
I'll fund you myself."  Thus, Evans & Sutherland got a computer division,
ESCD.  It was located in Mountain View, CA, USA, just off US 101.  Formed
ca 1986; product shipped 1989, near midnight at the end of September (so
shipped in the quarter promised).

Basic idea: the building block was an 8 x 8 nonblocking crossbar, which
could also connect to another similar crossbar without using one of the
8 x 8 connections.  Use two crossbars to connect 16 processors to a
memory system with 16 banks.  Virtual memory, with translation done on
the memory side of the crossbar (to allow faster context switches).  The
processor had a small TLB (512 ? page entries).  Use another 8 x 8
crossbar to connect 8 of those together, and you have 128 processors in
one system.  Note that this scheme may be extended: 8 128-processor
systems could be connected by another 8 x 8 crossbar, etc.  Great for
data-parallel; too bad there wasn't any HPF in 1988 :-(  (ESCD played
around with PCF, IIRC.)  The system was a (theoretically) scalable
shared-memory NUMA computer.

ESCD had a unique nomenclature: the processors were called "computational
units", and the set of 16 computational units and memory was called a
"processor".  Memory was 128 MB per "processor".  Use MACH for your OS,
so you'll have a "parallel" Unix.  You need the parallel file system to
drive the very good I/O subsystem, which was rated at 200 MB/s per
processor (1.6 GB/s per full system).

Neatly finessing the issue of custom v. off-the-shelf processors, ESCD
made their own processor, but used Weitek chips for the floating point.
(This was back in the days when a processor was a "chip set", rather than
the single die for sale today.)  During development, the clock was 40
MHz, with plans to go to 50 MHz by the time of production.  32-bit words,
but the Weiteks would do 64-bit FP nicely.  Luminal speed was about 12
Mflops per computational unit.  Measured speeds (i.e., operands in memory
rather than in registers) were more like 2-3 Mflops per.  There were some
unexpected problems with the pipelines in the processors: certain
instructions couldn't be issued at particular clocks after the issue of
certain other instructions.  The French called this "pipeline hazards";
the Californians called it "cursed instruction sequences".  It was a
closely guarded secret, and caused ESCD to release neither the
instruction set nor the assembler.

Biggest problem with the design was memory access:
  processor to memory: return the physical address of this virtual
                       address
  memory to processor: here's the physical address
  processor to memory: read/write data to/from this physical address
  memory to processor (or processor to memory): here's the data
So a memory op was actually 4 messages rather than 2 any time the
physical address wasn't in the processor's small TLB.  Think Linpack.
512 pages just ain't supercomputin'.
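As a rough illustration of that point (my sketch, not ESCD code, with an
invented page count and a trivial FIFO replacement policy): each
reference whose page misses a 512-entry TLB costs 4 messages instead of
2, and a Linpack-style sweep over a working set larger than the TLB
misses essentially every time.

/* Toy model (illustration only) of the ES-1 memory path: a reference
 * whose page is in the TLB costs 2 messages, a miss costs 4 (address
 * request, address reply, data request, data reply).  A working set
 * larger than the TLB is swept repeatedly and the messages are tallied. */
#include <stdio.h>

#define TLB_ENTRIES 512
#define PAGES       4096          /* working set larger than the TLB */

static long tlb[TLB_ENTRIES];
static int  tlb_next;             /* FIFO replacement pointer */

static int tlb_hit(long page) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i] == page)
            return 1;
    tlb[tlb_next] = page;         /* miss: install the translation */
    tlb_next = (tlb_next + 1) % TLB_ENTRIES;
    return 0;
}

int main(void) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        tlb[i] = -1;

    long messages = 0, refs = 0;
    for (int sweep = 0; sweep < 4; sweep++)        /* repeated sweeps */
        for (long page = 0; page < PAGES; page++) {
            messages += tlb_hit(page) ? 2 : 4;
            refs++;
        }

    printf("%ld references, %ld messages (%.2f per reference)\n",
           refs, messages, (double)messages / refs);
    return 0;
}

Run as written, every reference in the sweep misses, so the tally comes
out at 4 messages per reference rather than 2.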
Serials 1 and 2 went to Caltech and U. Colorado at Boulder (can't recall
which got which).  Up to about serial 7 or so were in some stage of
production when the project ended.  The ES-1 at CU Boulder was installed
right beside, and during the same week as, the Myrias SPS-2.
Head-to-head competition.  (Myrias was in and out in a couple of days.
ESCD needed a couple of weeks.)

Neatest thing: being within walking distance of Shoreline City Park, so a
walk would clear the head of the frustrations of working on beta (alpha?)
HW and SW.

Funniest thing: culturally, ESCD couldn't take a meal together, because
the French wouldn't eat without wine, which the Mormons wouldn't touch.
The Californians acted well-fed after mass quantities of pizza and beer,
while the sales team was out looking for a 3+ star restaurant to put on
their expense reports.

Mort d'ES-1
The project ended when Evans resigned from the Board of E&S.  Then a 4/3
vote in favor of the project became a 3/4 vote against.  We got 60 days'
notice (U.S. plant-closing law); it was announced at the Supercomputing
convention in Reno.  (If you have to end my project, end it at the
Supercomputing convention.  Most of the places I might look for work are
within walking distance.  For the record, Henry Cornelius was the first
headhunter to our booth after the announcement, within 3 or 4 minutes.)

Lessons Learnt: ESCD: Don't push too many technologies simultaneously.
Chips checked out on the silicon compiler, but were too dense (100K
transistors in 1986) for the foundries to make at acceptable yield; MACH
was not ready for commercial use in 1988.

End ES-1

[ My thanks to the moderator for allowing me to contribute my
reminiscences.  Next time I work for another startup with another Great
Idea, I'll take better notes...  (I didn't know there was going to be a
test ;-) ]

Astronautics
------------
ZS-1: no vectors.  Wisconsin.

Prisma
------
Colorado Springs.  GaAs.  Convex software.

Vitesse
-------
Vitesse Electronics was a startup to do two things: GaAs chips (that
business survives as Vitesse Semiconductor), and a mini-supercomputer
which never made it.  The architecture of the Vitesse Numerical Processor
(VNP) used a very deep pipeline, and attempted to bypass the latency
problems that arose from the deep pipeline with a so-called data-driven
optimizer (DADO).  The machine did not have registers, but used
three-address instructions directly involving memory.  The DADO kept
track of the interdependencies among source and destination addresses,
and issued instructions when it could.  The intent was to allow enough
instructions in the DADO to cover two (or perhaps even more) data
dependencies in tight loops without needing to stall the pipeline.

The processor did not have any I/O, but relied on a front end to do all
of that work and run most of the UNIX operating system.  System calls
from the back end were relayed to the front end.  The initial machine was
intended to be CMOS, with a view towards a later implementation in GaAs.

The machine was intended as an MP machine, and had a very interesting
interconnect.  Software was used to establish mappings from a local
processor's address space to so-called global virtual addresses.
Similarly, global virtual addresses could be mapped to local addresses.
The net effect was that the software could establish a form of
"carbon-copy" memory: writes from one CPU to a local address would also
show up, through the mappings, in a local address in one or more other
processors.  The mappings could be, but need not be, symmetric.

The machine was designed far enough to have an assembler, a compiler, and
an OS that booted, and it even ran a [trivial] job in simulation, but the
key chips were never fabbed.
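Since the DADO is described only in outline here, the following is a toy
sketch of the data-driven issue idea (my illustration, not VNP internals;
the instruction format, window behavior, and 4-cycle latency are all
invented): hold a window of three-address, memory-to-memory instructions
and issue any whose source addresses are not the destination of a write
still in the pipeline, so independent work keeps flowing while a
dependent instruction waits out the latency.

/* Toy data-driven issue window (illustration only): an op may issue when
 * none of its source addresses is the destination of an op still in
 * flight; a fixed latency stands in for the deep pipeline. */
#include <stdio.h>

#define LATENCY 4                 /* cycles an issued write stays in flight */
#define NOPS    4

typedef struct { int dst, src1, src2; } op;

/* c = a + b;  f = d + e;  g = c + f;  j = h + i   (addresses 0..9) */
static const op prog[NOPS] = { {2,0,1}, {5,3,4}, {6,2,5}, {9,7,8} };

static int issued[NOPS];          /* 1 once the op has been issued  */
static int ttl[NOPS];             /* cycles until its write retires */

static int ready(int k) {
    for (int i = 0; i < NOPS; i++)
        if (issued[i] && ttl[i] > 0 &&
            (prog[i].dst == prog[k].src1 || prog[i].dst == prog[k].src2))
            return 0;             /* a source is still being written */
    return 1;
}

int main(void) {
    int remaining = NOPS;
    int cycle = 0;
    for (;;) {
        int busy = 0;
        for (int i = 0; i < NOPS; i++) {   /* age the in-flight writes */
            if (ttl[i] > 0)
                ttl[i]--;
            busy |= (ttl[i] > 0);
        }
        for (int k = 0; k < NOPS; k++)     /* issue every ready op */
            if (!issued[k] && ready(k)) {
                issued[k] = 1;
                ttl[k] = LATENCY;
                busy = 1;
                remaining--;
                printf("cycle %d: issue op %d (writes address %d)\n",
                       cycle, k, prog[k].dst);
            }
        if (remaining == 0 && !busy)
            break;
        cycle++;
    }
    return 0;
}

In the toy trace the op that needs c and f waits several cycles while the
unrelated j = h + i issues immediately, which is the effect the DADO was
after.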
Applied Dynamics
----------------
The Applied Dynamics AD100, an ECL-based multiprocessor with 65-bit (yes,
65, not 64) floating point, did 20 MFlops in 1981.  There are a couple of
hundred installations, or more, the majority of which are in California.
The company was a University of Michigan Aerospace Engineering department
spinoff located in Ann Arbor, Michigan, and founded by three UM profs.
Their focus was/is on real-time applications; their system had lots of
special hardware to interface to real-time equipment.  The company still
exists, although they are not selling many of these expensive machines
any more, and they have a web site at http://www.adi.com.  It had a
minimal operating system and, in addition to Fortran, supported their
in-house parallel simulation language (ADSIM), derived from CSSL, for
systems of ODEs.

Q: Is it true that supercomputer programmers spend their nights in
   flophouses?
A: Only when coming up on a deadline.

Stan Lass