This document discusses different types of parallel computers including multiprocessors and multicomputers. Multiprocessors have shared memory while multicomputers have distributed memory. Multiprocessors are suitable for general purpose and time sharing applications. Multicomputers can speed up execution of large programs. Modern multicomputers use hardware routers to pass messages. Generations of multicomputers include those using hypercube, mesh, and fine-grained architectures. Vector supercomputers and SIMD computers are introduced for vector processing and data parallelism.
• The two categories of parallel computers discussed below are distinguished by having shared common memory or unshared distributed memory.
• Suitable for general-purpose and time-sharing applications by multiple users.
• Can also speed up the execution of a single large program in time-critical applications.
Other variations – CC-NUMA
• Cache-coherent non-uniform memory access model.
• Specified with distributed shared memory and cache directories.
• All cache copies must be kept consistent.
• Modern multicomputers use hardware routers to pass messages.
Based on the interconnection network, routers, and channels used, multicomputers are divided into generations:
• 1st generation: based on board technology, using a hypercube architecture and software-controlled message switching. Eg: Caltech Cosmic
• 2nd generation: implemented with a mesh-connected architecture, hardware message routing, and a software environment for medium-grained distributed computing. Eg: Intel Paragon
• 3rd generation: fine-grained multicomputers like the MIT J-Machine.
• The network "fabric" used for data transfer varies widely.
MULTIVECTOR AND SIMD COMPUTERS
• In this section, we introduce supercomputers and parallel processors for vector processing and data parallelism.
• A vector supercomputer is built on top of a scalar processor.
Two pipelined vector supercomputer models (vector processor models)
• Register to register
  – The figure above shows the register-to-register architecture.
  – Vector registers hold vector operands, intermediate results, and final results.
  – Vector functional pipelines retrieve operands from the vector registers and write results back to them.
  – All registers are programmable in user instructions.
  – The length of each vector register is usually fixed, e.g. sixty-four 64-bit component registers per vector register in the Cray series.
  – Other machines use reconfigurable vector registers to dynamically match the register length with that of the vector operands, e.g. the Fujitsu VP 2000 series.
• There are fixed numbers of vector registers and functional pipelines in a vector processor.
• Therefore, both resources must be reserved in advance to avoid resource conflicts between different vector operations.
• Memory to memory
  – Uses a vector stream unit to replace the vector registers.
  – Vector operands and results are retrieved directly from main memory in superwords, say 512 bits as in the Cyber 205.
Eg: Operational specification of the MasPar MP-1 computer
• Listed below is a partial specification of the 5-tuple for this machine:
1) The MP-1 is an SIMD machine with N = 1024 to 16,384 PEs.
2) The CU executes scalar instructions, broadcasts decoded vector instructions to the PE array, and controls inter-PE communication.
3) Each PE is a register-based load/store RISC processor capable of executing integer operations over various data sizes and standard floating-point operations. PEs receive instructions from the CU.
4) The masking scheme is built within each PE and continuously monitored by the CU, which can set or reset the status of each PE dynamically at run time.
5) The MP-1 has an X-net mesh network plus a global multistage crossbar router for inter-CU-PE, X-net nearest 8-neighbor, and global router communications.
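The roles of the CU, the per-PE mask, and the 8-neighbor X-net in items 2), 4) and 5) can be illustrated with a small simulation. This is only a toy sketch, not the MP-1 instruction set: the names PEArray, set_mask, broadcast, xnet_fetch and XNET_OFFSETS are hypothetical, each PE is reduced to a single integer register, and wrap-around at the grid edges is an assumption of the sketch.

```python
# Toy model of a SIMD PE array (hypothetical names, not MasPar's API).

# The eight X-net directions as (row, column) offsets on the PE grid.
XNET_OFFSETS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}


class PEArray:
    """A rows x cols grid of PEs, each with one register and one mask bit."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.reg = [[0] * cols for _ in range(rows)]
        self.mask = [[True] * cols for _ in range(rows)]  # True = PE enabled

    def set_mask(self, predicate):
        """The CU sets each PE's status from predicate(row, col, value)."""
        for r in range(self.rows):
            for c in range(self.cols):
                self.mask[r][c] = bool(predicate(r, c, self.reg[r][c]))

    def broadcast(self, op):
        """The CU broadcasts one instruction; only enabled PEs execute it."""
        for r in range(self.rows):
            for c in range(self.cols):
                if self.mask[r][c]:
                    self.reg[r][c] = op(self.reg[r][c])

    def xnet_fetch(self, direction):
        """Each enabled PE copies the register of its neighbor in `direction`.

        Assumption of this sketch: the grid wraps around at the edges.
        """
        dr, dc = XNET_OFFSETS[direction]
        old = [row[:] for row in self.reg]  # snapshot: all PEs read in lockstep
        for r in range(self.rows):
            for c in range(self.cols):
                if self.mask[r][c]:
                    nr, nc = (r + dr) % self.rows, (c + dc) % self.cols
                    self.reg[r][c] = old[nr][nc]


pes = PEArray(2, 3)
pes.reg = [[0, 1, 2], [3, 4, 5]]
pes.set_mask(lambda r, c, v: v % 2 == 0)  # enable only PEs holding even values
pes.broadcast(lambda v: v + 10)           # masked-off PEs keep their old value
print(pes.reg)                            # [[10, 1, 12], [3, 14, 5]]
pes.set_mask(lambda r, c, v: True)        # re-enable every PE
pes.xnet_fetch("E")                       # read from the eastern neighbor
print(pes.reg)                            # [[1, 12, 10], [14, 5, 3]]
```

The snapshot taken in xnet_fetch stands in for lockstep execution: every PE reads its neighbor's old value, so the result does not depend on the order in which the loop visits the PEs.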