
A Brief History of Supercomputing

Abstract

Since the 1960s, supercomputers have been constantly evolving and achieving ever higher peak speeds at a rapid pace. Over the years, different technological advances have allowed supercomputers to tackle challenges that involve enormous amounts of calculation, ranging from the Department of Defense modeling nuclear explosions in the 1960s to present-day climate modeling, search and rescue efforts, and scientific exploration. This paper gives a brief history of supercomputing, covering its origins, some of the major supercomputers over the years, how the technology has evolved, and an outlook on the future of the field.

Introduction

Engineers have been pushing the technological limits of what is possible for generations in order to create the fastest supercomputer possible. The CDC6600, created in 1964 and considered to be the first successful supercomputer, ran at a maximum speed of 3 megaFLOPS, which was up to 13 times faster than the next fastest computer, the IBM 7030 Stretch [1]. By today's standards, this is painfully slow. Even a small embedded computer such as the Raspberry Pi can reach speeds of a few gigaFLOPS, which outclasses the CDC6600 by a considerable amount. Today, the title of the fastest supercomputer in the world belongs to the Sunway TaihuLight, which resides in Wuxi, China, boasts a speed of up to 93 petaFLOPS, and contains over 10 million cores. The origin of supercomputing is traditionally credited to Seymour Cray, who is widely referred to as the father of supercomputing [2]. His designs helped define the architecture of supercomputers for about thirty years. Over time, having multiple processors access the same shared memory in this type of architecture and its derivatives became a limiting factor for the scalability of these supercomputers. This led to a new era of scalable multicomputer supercomputers, which use independent memories connected by high-speed networks [1]. These extremely fast supercomputers help address some of the grand challenges that we face today, such as climate modeling, and allow us to answer questions that would otherwise be impossible to tackle.

The Different Eras of Supercomputers

Early Stages of Supercomputers

The CDC6600, created in 1964, is commonly referenced as the first successful supercomputer. It provided a central processor connected to ten peripheral processors as well as a symmetrical shared storage configuration. The CDC6600, which can be seen in Fig. 1, was physically built in the shape of a plus sign. The four bays of the CDC6600 each contained four logic chassis holding 756 modules with 600,000 transistors [3]. The high speed of the CDC6600 required short wires and dense packaging that used a refrigeration compressor to cool the modules by circulating Freon through cooling plates [3]. One of the innovations of this supercomputer was the 60-bit word architecture (Fig. 2) centered around a memory of 128K words with no parity [4].

The clock of the CDC6600 had a speed of 10 MHz with four 25 ns phases. The CDC6600 used 10 peripheral processing units that were equivalent to the CDC160 peripheral microcomputer [4]. These 10 peripheral processing units were interpreted by a single hardware unit every 100 ns in a multi-threaded fashion. Although the main CPU was highly parallel due to its 10 independent functional units, the CDC6600 had only three sets of eight registers, used for memory access, indexing, and arithmetic operands. The 10 functional units that operated in parallel consisted of two floating-point multipliers, an adder, a long adder, a Boolean logic unit, a divider, a branch unit, a shifter, and two incrementers for address arithmetic. The proper sequencing of instructions was ensured by the Scoreboard unit [3].

The follow-up to the CDC6600 was introduced in 1969 as the CDC7600 and provided a performance five times greater than that of the CDC6600. The CDC7600 had a clock speed of 36 MHz and made use of pipelining. After the CDC7600, Seymour Cray began working on the CDC8600 as an extension of the CDC7600 with the ability to run its processors in a coupled mode [1]. The combination of technical difficulties related to reliability and cooling and the cash-flow problems that CDC was experiencing led to the cancellation of the CDC8600. This resulted in Cray leaving CDC to found Cray Research [2].

The Vector Architecture

The designer of the CDC6600 Scoreboard, Jim Thornton, went on to design the CDC STAR around the same time. The STAR was first delivered in 1974, four years later than expected. The STAR is known as the world's first vector computer and was expected to dramatically speed up applications [1]. However, the STAR ended up being a disappointment, since it became apparent that its general performance was considerably lower than expected. It turned out that few programs could be effectively vectorized into a series of single instructions. Nearly all calculations depend on the results of an earlier instruction, and those results had to clear the pipeline before they could be fed back in. As a result, most programs paid the high cost of setting up the vector units, and only extreme cases benefited from the promised speedup. In addition, the STAR's ability to operate on scalars was actually slower than that of the previously built CDC6600 and CDC7600 [1]. Consequently, Thornton ended up leaving CDC to form Network Systems Corporation.
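
To make the vectorization problem concrete, the following C sketch (a hypothetical illustration, not STAR code) contrasts a loop whose iterations are independent, and can therefore be streamed through a pipelined vector unit, with a loop whose iterations each depend on the previous result and must run serially. The array length of 64 is chosen only for illustration.

```c
/* A minimal sketch contrasting a loop a vector unit can process as one long
 * operation with a loop whose iterations depend on the previous result and
 * therefore cannot be vectorized. */
#include <stdio.h>

#define N 64  /* example vector length, chosen only for illustration */

int main(void) {
    double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* Vectorizable: each c[i] depends only on a[i] and b[i], so all N
     * multiplies can be streamed through a pipelined vector unit. */
    for (int i = 0; i < N; i++)
        c[i] = a[i] * b[i];

    /* Not vectorizable: each iteration needs the value produced by the
     * previous one (a loop-carried dependency), so the result must clear
     * the pipeline before the next element can start. */
    double running = 0.0;
    for (int i = 0; i < N; i++)
        running = running * 0.5 + a[i];

    printf("c[N-1] = %f, running = %f\n", c[N - 1], running);
    return 0;
}
```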

In 1976, another well-known vector processor was created by Cray Research: the Cray-1. The Cray-1 used vector instructions and registers as well as a fast CPU and main memory that were closely coupled. The architecture of the Cray-1 can be seen in Fig. 3. The Cray-1's processor is fed by a high-bandwidth memory that consists of 16 independent banks. A single store or load instruction specified a vector of up to 64 words. A vector held in the registers is operated on in parallel using the pipelined arithmetic and logic units. Chaining of arithmetic operations was also possible in order to avoid temporary storage and retrieval. The scalar performance of the Cray-1 was several times higher than that of the 7600, unlike the STAR [3]. In addition, the Cray-1 achieved a performance gain from its vector register architecture, which allowed vectors to be stored and loaded in parallel with operations on data already in the vector registers [4]. The STAR, on the other hand, had a memory-to-memory vector architecture that required the main memory to be accessed for every operation [1]. The Cray-1 remained the fastest computer in the world from its introduction in 1976 until 1981. The Cray-2, Cray-3, and Cray-4 followed as successors to the Cray-1.
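
The benefit of the vector register design and chaining can be sketched in plain C (a conceptual illustration, not Cray or STAR code): a memory-to-memory machine must write the intermediate result of a multiply back to memory and reload it for the add, while a chained register machine feeds the product straight into the adder in a single fused pass.

```c
/* A minimal sketch of the idea behind chaining: the multiply result feeds
 * the add directly instead of making a round trip through main memory, as a
 * memory-to-memory design would require. */
#include <stdio.h>

#define N 64  /* the Cray-1 vector register length */

int main(void) {
    double a[N], b[N], c[N], d[N], tmp[N];

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = i + 1.0; c[i] = 0.5 * i; }

    /* Memory-to-memory style: the intermediate product is stored to and
     * reloaded from memory, costing extra memory traffic. */
    for (int i = 0; i < N; i++) tmp[i] = a[i] * b[i];
    for (int i = 0; i < N; i++) d[i] = tmp[i] + c[i];

    /* Chained style: the product stays in registers and flows straight
     * into the add, one fused pass over the data. */
    for (int i = 0; i < N; i++)
        d[i] = a[i] * b[i] + c[i];

    printf("d[N-1] = %f\n", d[N - 1]);
    return 0;
}
```
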
Transitioning to Multicomputers

Beginning in 1982, a transition took place from shared-memory ECL vector multiprocessors to powerful, low-cost distributed CMOS multicomputers [3]. The Caltech Cosmic Cube, created in 1982, consisted of 64 interconnected Intel 8086/8087 boards that were independent of each other. This supercomputer demonstrated the utility, programmability, and excellent cost-performance of a distributed multicomputer architecture. Intel used this idea to make its own line of scalable multicomputers, known as iPSC, in 1985 and continued development for the government into the early 2000s. In addition, in 1984, Inmos introduced a scalable microprocessor called the Transputer, which used serial message-passing links that could form multicomputers by connecting to other Transputers. These were heavily used in process control and embedded applications. During this period of transition to multicomputers, Multiflow, Elxsi, and Cydrome introduced very long instruction word (VLIW) architectures.
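
The defining feature of these multicomputers is that each node has its own private memory and cooperates with other nodes only by exchanging messages over the interconnect. The sketch below illustrates that programming model with MPI, a standard that came later than the Cosmic Cube and the Transputer but embodies the same idea; the ring-passing example and the process count are purely illustrative.

```c
/* A minimal sketch of the message-passing model used by distributed
 * multicomputers, written with MPI: every process owns its own memory, and
 * data moves only through explicit send/receive messages.
 * Typical build/run: mpicc ring.c -o ring && mpirun -np 4 ./ring */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    int token;
    if (rank == 0) {
        /* Process 0 starts a token around the ring and receives it back
         * (run with at least two processes). */
        token = 0;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Token visited %d other processes\n", token);
    } else {
        /* Each other process receives from its left neighbor, adds one,
         * and forwards the token to its right neighbor. */
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        token++;
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```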

In 1993, the TOP500 list of the world's fastest computers was first issued. The list is compiled by measuring the speed of supercomputers using the Linpack benchmark. This ranking of the fastest computers was possibly the clearest mark of the transition from the Cray single shared-memory architecture to the multicomputer architecture. In 1993, Fujitsu unveiled its 140-node vector processor multicomputer, whose nodes were tightly interconnected in a system with global addressing. This supercomputer had a peak of 260 gigaFLOPS and held the record for the fastest supercomputer until June of 1996, when Hitachi delivered a 1024-node vector processor multicomputer. Intel's ASCI Red was released in 1997, benchmarked at just over one teraFLOPS while housing 7,264 Intel microprocessors, and remained at the top of the supercomputing list until November 2000, by which time it had been upgraded to house 9,632 microprocessors and peaked at 3.1 teraFLOPS. The barrier of one petaFLOPS was broken in May 2008, when IBM developed the Roadrunner, which consisted of roughly 130,000 cores and was three to four times the size of Intel's ASCI Red [3].
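
Rankings such as these rest on a simple idea: run a kernel with a known floating-point operation count, time it, and divide. The C sketch below does this for a naive dense matrix multiply (about 2n^3 operations) rather than the actual Linpack solver; the matrix size of 512 is an arbitrary choice for illustration.

```c
/* A minimal sketch of how a FLOPS rate is derived: time a kernel with a
 * known operation count and divide. This uses a naive matrix multiply,
 * not the actual Linpack solver. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 512  /* arbitrary matrix size for illustration */

static double A[N][N], B[N][N], C[N][N];

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (double)rand() / RAND_MAX;
            B[i][j] = (double)rand() / RAND_MAX;
            C[i][j] = 0.0;
        }

    clock_t start = clock();
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                C[i][j] += A[i][k] * B[k][j];
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    double flops = 2.0 * N * N * N;  /* one multiply + one add per inner step */
    printf("%.2f gigaFLOPS (%.3f s for n = %d)\n",
           flops / seconds / 1e9, seconds, N);
    return 0;
}
```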

Present Day Supercomputing

Today, the title of the fastest supercomputer in the world belongs to the Sunway TaihuLight (Fig.
4). The TaihuLight has a peak performance of 93 petaFLOPS and consists of 40,960 manycore
64-bit RISC processors based on the Sunway architecture [5]. Every one of the processor chips
houses 256 processing cores along with four auxiliary cores that are used for system
management. This brings the total number of CPU cores across the whole system to
10,649,600. The TaihuLight has been used in applications such as atmosphere modeling. One
such example of this is the refactoring and optimizing of the Community Atmosphere Model on
the Sunway TaihuLight by Haohuan Fu of Tsinghua University [5]. With the TaihuLight, he was able to simulate the entire life cycle of Hurricane Katrina and achieved close-to-observation simulation results for both the track of the hurricane and its intensity [5]. This type of simulation helps researchers better understand hurricanes and makes predicting their paths more accurate.

Future Outlook

With the petaFLOPS barrier broken, the next big challenge is the creation of an exaFLOPS supercomputer. An exaFLOPS-capable supercomputer would be about ten times faster than the current fastest computer, the Sunway TaihuLight. It was projected that an exascale supercomputer would be developed by 2018, but that ended up not being the case. An exascale computer would be a huge achievement, since it is believed to be on a similar order to the processing power of the human brain at the neural level. As of now, those currently attempting to develop an exascale supercomputer include China, the United States, Taiwan, Japan, India, and the European Union, all looking to break the exaFLOPS barrier by 2020. I personally believe that this goal will be reached by 2020 and that engineers will soon be looking for the next hurdle to overcome.

Conclusion

Supercomputers have come a long way since their origin in the 1960s. The first successful supercomputer, the CDC6600, laid the foundation for supercomputer design for three decades and helped kickstart the production of the supercomputers that followed. Different technological advances, such as vector processors and multicomputers, have paved the way for innovative supercomputing technology. With the petaFLOPS barrier broken, engineers are looking ahead to building the first supercomputer capable of exaFLOPS speeds. Although the prediction of reaching this achievement by 2018 was not met, multiple countries are working toward the goal of creating an exascale system by 2020. This type of achievement will help scientists and engineers tackle problems of unprecedented size.

References

[1] B. Li and P. Lu, "The Evolution of Supercomputer Architecture: A Historical Perspective," in W. Xu, L. Xiao, J. Li, and C. Zhang (eds.), Computer Engineering and Technology, Communications in Computer and Information Science, vol. 592, Springer, Berlin, Heidelberg, 2016.

[2] G. Strawn and C. Strawn, "The Father of Supercomputing: Seymour Cray," IT Professional, vol. 17, no. 2, pp. 58-60, Mar.-Apr. 2015. doi: 10.1109/MITP.2015.31

[3] G. Bell, "Supercomputers: The Amazing Race (A History of Supercomputing, 1960-2020)," Technical Report, San Francisco, CA, 2014.

[4] R. Segall et al., Research and Applications in Global Supercomputing, Information Science Reference, an imprint of IGI Global, 2015.

[5] H. Fu et al., "Refactoring and Optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer," SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, Salt Lake City, UT, 2016, pp. 969-980. doi: 10.1109/SC.2016.82
