Vector Processors: Historical Perspectives

From Computer Architecture: A Quantitative Approach, the book by John L. Hennessy and David A. Patterson.

The first vector machines were the CDC STAR-100 and the TI ASC, both announced in 1972. Both were memory-memory vector machines. They had relatively slow scalar units (the STAR used the same units for scalars and vectors), which made the scalar pipeline extremely deep. Both machines had high start-up overhead and worked on vectors of several hundred to several thousand elements. The crossover point between scalar and vector execution could be over 50 elements. It appears that not enough attention was paid to the role of Amdahl's Law on these two machines.
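The crossover point and the Amdahl's Law remark can be made concrete with a simple timing model. The sketch below is illustrative only: the function names and all cycle counts are assumptions chosen for the example, not measurements of the STAR-100 or the TI ASC. It treats vector time as a fixed start-up cost plus a per-element cost, and scalar time as a per-element cost alone.

```python
def crossover_length(startup_cycles, scalar_cpe, vector_cpe):
    """Smallest vector length n for which vector code is strictly faster.

    Model (an assumption, not a machine specification):
      scalar time = n * scalar_cpe
      vector time = startup_cycles + n * vector_cpe
    Vector wins when startup_cycles + n*vector_cpe < n*scalar_cpe,
    i.e. when n > startup_cycles / (scalar_cpe - vector_cpe).
    Arguments are integer cycle counts.
    """
    if scalar_cpe <= vector_cpe:
        return None  # vector mode never catches up in this model
    return startup_cycles // (scalar_cpe - vector_cpe) + 1


def amdahl_speedup(vector_fraction, vector_speedup):
    """Overall speedup when only vector_fraction of the run time is sped up."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)


# Hypothetical numbers: 200 cycles of start-up, 5 cycles per element in
# scalar mode, 1 cycle per element in vector mode.
n_cross = crossover_length(startup_cycles=200, scalar_cpe=5, vector_cpe=1)

# If half of a program vectorizes and the vector unit is 10x faster,
# the slow scalar half still dominates the total run time.
overall = amdahl_speedup(vector_fraction=0.5, vector_speedup=10.0)
```

With these assumed numbers the crossover falls just above 50 elements, and the overall speedup stays below 2 even though the vector unit is ten times faster, which is the effect a slow scalar unit has under Amdahl's Law.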

Seymour Cray, who worked on the 6600 and the 7600 at CDC, founded Cray Research and introduced the CRAY-1 in 1976. The CRAY-1 used a vector-register architecture to significantly lower start-up overhead. The machine also had efficient support for nonunit stride, and Cray invented chaining. Most importantly, the CRAY-1 was also the fastest scalar machine in the world at that time. This matching of good scalar and vector performance was probably the most significant factor in making the CRAY-1 a success. Some customers bought the machine primarily for its outstanding scalar performance. Many subsequent vector machines are based on the architecture of this first commercially successful vector machine.

In 1981, CDC started shipping the CYBER-205. The 205 had the same basic architecture as the STAR, but offered improved performance all around as well as expandability of the vector unit with up to four vector pipelines, each with multiple functional units and a wide load/store pipe that provided multiple words per clock. The peak performance of the CYBER-205 greatly exceeded the performance of the CRAY-1. However, on real programs, the performance difference was much smaller.

The CDC STAR machine and its descendant, the CYBER-205, were memory-memory vector machines. To keep the hardware simple and support the high-bandwidth requirements (up to 2 memory references per FLOP), these machines didn't efficiently handle nonunit stride. While most loops have unit stride, a nonunit stride loop had poor performance on these machines because memory-to-memory data movements were required to gather the nonadjacent vector elements.
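The cost of nonunit stride can be sketched in a few lines. The example below is a conceptual illustration, not a model of the STAR or CYBER-205 instruction sets: the helper names and the data are hypothetical. It shows why adding two columns of a row-major matrix needs an extra gathering pass when the vector operation itself can only read densely packed operands.

```python
def gather_strided(memory, base, stride, n):
    """Copy n elements starting at `base`, `stride` words apart, into a dense list."""
    return [memory[base + i * stride] for i in range(n)]


def add_columns(matrix, cols, col_a, col_b):
    """Add two columns of a row-major matrix stored as a flat list.

    Elements of one column sit `cols` words apart, so each column is
    first gathered into a dense temporary before the unit-stride add.
    """
    rows = len(matrix) // cols
    a = gather_strided(matrix, col_a, cols, rows)  # extra pass over memory
    b = gather_strided(matrix, col_b, cols, rows)  # extra pass over memory
    return [x + y for x, y in zip(a, b)]           # the actual vector add


# A 3x4 matrix in row-major order (hypothetical data).
m = [0, 1, 2, 3,
     10, 11, 12, 13,
     20, 21, 22, 23]
result = add_columns(m, cols=4, col_a=0, col_b=1)
```

In this sketch the two gather passes move as much data as the add itself reads, which is the kind of overhead the paragraph above attributes to nonunit stride loops on memory-memory machines.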

In 1983, Cray shipped the first CRAY X-MP. With an improved clock rate, better chaining support, and multiple memory pipelines, this machine maintained the Cray Research lead in supercomputers. The CRAY-2, a completely new design configurable with up to 4 processors, was introduced later. It has a much faster clock than the X-MP, but also much deeper pipelines. The CRAY-2 lacks chaining, has an enormous memory latency, and has only one memory pipe per processor. In general, it is faster than the X-MP only on problems that require its very large main memory.

In 1983, the Japanese computer vendors entered the supercomputer marketplace, starting with the Fujitsu VP100 and VP200, and later expanding to include the Hitachi S810 and the NEC SX/2. These machines proved to be close to the CRAY X-MP in performance. In general, they have much higher peak performance than the CRAY X-MP, though because of large start-up overhead, their typical performance is often lower. The CRAY X-MP favored a multiprocessor approach, first offering a two-processor version and later a four-processor machine. In contrast, the three Japanese machines had expandable vector capabilities.

In 1988, Cray Research introduced the CRAY Y-MP, a bigger and faster version of the X-MP. The Y-MP allows up to 8 processors and lowers the cycle time to 6 ns. With a full complement of 8 processors, the Y-MP is generally the fastest supercomputer, though the single-processor Japanese supercomputers may be faster than a one-processor Y-MP. In late 1989, Cray Research was split into two companies, both aimed at building high-end machines available in the early 1990s. Seymour Cray continues to head the spin-off, which is now called Cray Computer Corporation.

In the early 1980s, CDC spun out a group, called ETA, to build a new supercomputer, the ETA-10, capable of 10 GigaFLOPS. The ETA machine, delivered in the late 1980s, used low-temperature CMOS in a configuration with up to 10 processors. Each processor retained the memory-memory architecture based on the CYBER-205. Although the ETA-10 achieved enormous peak performance, its scalar speed was not comparable. In 1989, CDC, the first supercomputer vendor, closed ETA and left the supercomputer design business.

In 1986, IBM introduced the System/370 vector architecture and its first implementation in the 3090 Vector Facility. The architecture extends the System/370 architecture with 171 vector instructions. The 3090/VF is integrated into the 3090 CPU. Unlike most other vector machines, the 3090/VF routes its vectors through the cache.

The 1980s saw the arrival of smaller-scale vector machines, called mini-supercomputers. Priced at roughly one-tenth the cost of a supercomputer ($0.5-$1 million versus $5-$10 million), these machines caught on quickly. Although many companies joined the market, the two most successful have been Convex and Alliant. Convex started with a uniprocessor vector machine (C-1) and now offers a small multiprocessor (C-2); they emphasize Cray software capability. Alliant has concentrated more on the multiprocessor aspects; they build an 8-processor machine, with each processor offering vector capability.

The basis for modern vectorizing compiler technology and the notion of data dependence were developed by Kuck and his colleagues at the University of Illinois.

In the late 1980s, graphics supercomputers arrived on the market from Stellar and Ardent. The Stellar machine used a timeshared pipeline to allow high-speed vector processing and efficient multitasking. This approach was used earlier in the HEP, a machine designed by B.J. Smith and built by Denelcor in the mid-1980s. This approach doesn't yield high-speed scalar performance, as is evident in the scalar benchmarks of the Stellar machine. The Ardent machine combines a RISC processor (the MIPS R2000) with a custom vector unit. These vector machines, which cost about $100K, brought vector capabilities to a new potential market. In late 1989, Stellar and Ardent were merged to form Stardent, and the Ardent architecture is being shipped by the combined company.

From this overview we can see the progress vector machines have made. In less than 20 years they have gone from unproven, new architectures to playing a significant role in the goal of providing engineers and scientists with ever larger amounts of computing power.