
 Transmittal Notice for Publication Number <u>HR-2000-01</u>
 Revision Level <u>C</u>
 Date <u>11-7-88</u>

| R I<br>E N<br>M S<br>O E<br>V R<br>E T | PUB.<br>NUMBERS | PAGE<br>NUMBERS                                                                                                                                                                                                                                                | TABS &<br>DIVIDERS | BINDER<br>SIZE | What is changed;<br>why is it changed;<br>how should this<br>print package be<br>used?                                                                                                                                                                                                                  | CRAY-2 Computer System Functional<br>Description Manual Change Package<br>HR-2000-01 |
|----------------------------------------|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------|----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|
| E T<br>★<br>★<br>★<br>★<br>★           | HR-2000 C       | Pages 1-1<br>through 1-7<br>Page 2-3/2-4<br>Page 3-5/3-6<br>Pages 4-1/4-2<br>Pages A-11/<br>A-12<br>Pages B-3<br>through B-5<br>Pages 1-1<br>through 1-7<br>Pages 2-3/2-4<br>Pages 3-5/3-6<br>Pages 4-1/4-2<br>Pages A-11/<br>A-12<br>Pages B-3<br>through B-5 |                    |                | Remove and shred all of<br>Remove and shred pages<br>Remove and shred pages<br>Remove and shred pages<br>Remove and shred pages<br>Remove and shred pages<br>Insert new Section 1 (pag<br>Insert new Section 2 pag<br>Insert new Section 3 pag<br>Insert new Section 4 pag<br>Insert new Appendix A pag | s 2-3/2-4.<br>s 3-5/3-6. Print on blue paper.<br>s 4-1/4-2.<br>s A-11/A-12 .         |
|                                        |                 |                                                                                                                                                                                                                                                                |                    |                |                                                                                                                                                                                                                                                                                                         |                                                                                      |

Cray Research, Inc.

#### 1. INTRODUCTION

The CRAY-2 computer system is a powerful, general-purpose computer system with extremely high processing rates. Scalar and vector capabilities in a multiprocessing environment combined with integrated foreground processing achieve these high rates.

#### 1.1 CRAY-2 COMPUTER SYSTEM FEATURES

The CRAY-2 computer system mainframe contains either two or four independent Background Processors, each more powerful than a CRAY-1 computer system processor. Featuring a clock-cycle time faster than any other computer system available, each of these processors offers exceptional scalar and vector processing capabilities. The Background Processors can operate independently on separate jobs or concurrently on a single problem. The very high speed Local Memory integral to each Background Processor is available for temporary storage of vector and scalar data.

Common Memory is one of the most important features of the CRAY-2 computer system. It consists of 256 or 512 Mwords of dynamic memory, or 64 or 128 Mwords of static memory. Each word is 64 bits long and is randomly accessible from any of the Background Processors and from any of the data channels. The memory is arranged in quadrants, with 64 or 128 interleaved banks in total. All memory access is performed automatically by the hardware. Any user may use all or part of the memory not being used by the operating system.

Control of network access equipment and the high-speed disk drives is integral to the CRAY-2 computer system mainframe hardware. A single Foreground Processor coordinates the data flow between the system's Common Memory and all the external devices across either two or four high-speed I/O channels. The synchronous operation of the Foreground Processor with the Background Processors and the external devices provides a significant increase in data throughput.

The most important CRAY-2 computer system features are:

- Extremely large directly addressable Common Memory
- Fastest cycle time available in a computer system
- Scalar, vector, and multiprocessing combined in one system
- Integral Foreground Processor
- Elegant architecture
- Extremely high reliability
- High density memory chips and extremely fast silicon logic chips
- Liquid immersion cooling

## 1.1.1 PHYSICAL CHARACTERISTICS

The CRAY-2 computer system mainframe is elegant in appearance as well as in architecture (see figure 1-1). The memory, computer logic, and DC power supplies are integrated into a compact mainframe composed of 14 vertical columns arranged in a 300° arc.

The upper part of each column contains a stack of logic modules and the lower part contains power supplies for the system. Total cabinet height, including the power supplies, is 45 in. (114.3 cm); the diameter of the mainframe is 53 in. (134.6 cm). Thus, the "footprint" of the mainframe is a mere 16 ft<sup>2</sup> (1.49 m<sup>2</sup>).

An inert fluorocarbon liquid circulates in the mainframe cabinet in direct contact with the integrated circuit packages. This liquid immersion cooling technology allows for the small size of the CRAY-2 computer system mainframe and is thus largely responsible for the high computation rates.

Significant CRAY-2 computer system physical characteristics are:

- Occupies only 16 ft<sup>2</sup> (1.49 m<sup>2</sup>) of floor space
- Stands 45 in. (114.3 cm) high; diameter is 53 in. (134.6 cm)
- Contains 14 columns arranged in a 300° arc
- Contains 3-dimensional modules
- Contains liquid immersion cooling
- Contains chilled water heat exchange




Figure 1-1. CRAY-2 Computer System Mainframe

#### 1.1.2 ARCHITECTURE AND DESIGN

In addition to the cooling technology, the extremely high processing rates are achieved by a balanced integration of scalar and vector capabilities and a large Common Memory in a multiprocessing environment.

Significant architectural components of the CRAY-2 computer system include the following:

- Two or four independent Background Processors capable of vector and scalar operation. Synchronization of the Background Processors is achieved through the Foreground Processor and semaphore flags in the Background Processors.
- 256 or 512 Mwords of dynamic Common Memory, or 64 or 128 Mwords of static Common Memory
- A foreground system that controls and monitors system operation, including:
  - A Foreground Processor for system supervision
  - Two or four high-speed synchronous communication channels
  - Up to 40 I/O devices
  - Disk controllers to control up to 36 disk storage units (DSUs)
  - Two or four Common Memory ports for data transfer
  - Two or four Background Processor ports to allow Foreground Processor control
  - External I/O controllers (from one to as many as four per channel)
  - HSX controllers (two maximum per channel)

The identical Background Processors each contain registers and functional units to perform both vector and scalar operations. The single Foreground Processor supervises the Background Processors. The large Common Memory complements the processors and provides architectural balance, thus assuring extremely high throughput rates (see figure 1-2).

Shown in figure 1-2 is the four-processor model. The two-processor versions have two high-speed synchronous communication channels. The contents of a channel are the same in each version of the system.

On-site maintenance is possible through the maintenance control console.

HR-2000-01




Figure 1-2. CRAY-2 Four Background Processor Computer System Mainframe Configuration


## 1.2 CONVENTIONS

This manual uses the following conventions:

| Convention                  | Description                                                  |
|-----------------------------|--------------------------------------------------------------|
| lowercase<br>italics        | Variable information                                         |
| X or x or X                 | An ignored value                                             |
| n                           | An unknown variable value                                    |
| (XX)                        | The contents of a register designated by the XX value        |
| Register bit<br>designators | Numbered right to left as powers of 2, starting with $2^0$ . |

Unless otherwise indicated, numbers in this manual are decimal numbers. Octal numbers are indicated with an 8 subscript. Exceptions are instruction parcels in instruction buffers and instruction forms which are given in octal without the subscript.

## 1.2.1 EXAMPLES

The following examples illustrate the above conventions.

| Example                                      | Description                                                                                                                                                |
|----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Transmit (A<i>k</i>) to S<i>i</i> | Transmit the contents of the A register specified by the <i>k</i> designator to the S register specified by the <i>i</i> designator |
| 167 <i>ixk</i>                               | Machine instruction 167 where the <i>j</i><br>register designator is not used and is<br>an ignored value                                                   |
| Read <i>n</i> words from memory              | Read an unknown variable number of<br>words from memory. You can read,<br>within the stated restrictions, as few<br>or many words from memory as you wish. |
| Bit 2 <sup>63</sup> of an S or V<br>register | Value represents the most significant bit                                                                                                                  |

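| Example                             | Description                                                                                                                                                                                      |
|-------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Bit 2<sup>31</sup> of an A register | Value represents the most significant bit                                                                                                                                                        |
| VM register element                 | The VM register contains 64 bits, each corresponding to a word element in a Vector register. Bit 2<sup>63</sup> corresponds to element 0, bit 2<sup>0</sup> corresponds to element 63.         |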
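The VM register convention above reverses the usual bit numbering: element 0 maps to the most significant bit. The following is a minimal Python sketch of that mapping; the function names are illustrative only and do not come from this manual.

```python
VM_BITS = 64  # the VM register holds one bit per Vector register element


def vm_bit_for_element(element: int) -> int:
    """Return the power-of-2 bit position that corresponds to a vector element."""
    if not 0 <= element < VM_BITS:
        raise ValueError("element must be in the range 0..63")
    # Element 0 is bit 2^63 and element 63 is bit 2^0.
    return VM_BITS - 1 - element


def vm_mask_for_elements(elements) -> int:
    """Build a 64-bit mask value with one bit set for each listed element."""
    mask = 0
    for element in elements:
        mask |= 1 << vm_bit_for_element(element)
    return mask
```

For example, `vm_mask_for_elements([0, 63])` sets only the most significant and least significant bits of the mask.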

#### 1.3 ORGANIZATION

This manual is organized into the following sections:

- Section Description
  - 1 Contains the introduction to this manual
  - 2 Describes the CRAY-2 computer system Background Processor. The registers, functional units, and algorithms used are described.
  - 3 Provides detailed information on the CAL instructions that operate on the CRAY-2 computer system. Each machine instruction can be represented symbolically in Cray Assembly Language (CAL) Version 2. The instructions are listed octally in a box format that provides the Cray Assembly Language (CAL) Version 2 syntax format, an operand if required, a brief description of each instruction, and the machine instruction.

Following the boxed information is a detailed description of the instruction and an example using the instruction.

- 4 Describes the CRAY-2 Common Memory, phased memory access, and single-error correction/double-error detection (SECDED)
- 5 Describes the CRAY-2 Foreground System, which handles the I/O
- Appendix A Lists the symbolic machine instructions by function. The octal machine code can be used as an index when referring to section 3 for a detailed description of the instruction.
- Appendix B Contains the CRAY-2 system configuration specification sheets


## Instruction buffers

Each Background Processor has a buffer with eight independent fields to allow program loops to execute without additional Common Memory references. Programs can loop within the instruction buffer using any of the branch instructions.

Each independent field contains 16 or 32 words. The total instruction buffer size is 128 or 256 words.

When the next sequential instruction or a branch target lies outside the instruction buffer, the oldest field is discarded and replaced with 16 or 32 words of new data.
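The replacement rule for the instruction buffer can be sketched as a fixed-length queue of fields. This Python model is illustrative only: identifying a field by a starting word address is an assumption made for the example, and the manual does not describe field alignment or how a hit is detected.

```python
from collections import deque

FIELDS = 8  # eight independent fields per Background Processor


class InstructionBuffer:
    """Toy model of the oldest-field replacement rule (not the hardware logic)."""

    def __init__(self, words_per_field: int = 16):
        if words_per_field not in (16, 32):
            raise ValueError("a field contains 16 or 32 words")
        self.words_per_field = words_per_field
        self.fields = deque(maxlen=FIELDS)  # oldest field sits at the left

    @property
    def capacity_words(self) -> int:
        """Total buffer size: 8 fields of 16 or 32 words."""
        return FIELDS * self.words_per_field

    def load_field(self, start_address: int):
        """Load a new field; return the discarded field's start address, or None."""
        discarded = self.fields[0] if len(self.fields) == FIELDS else None
        self.fields.append(start_address)  # a full deque drops its oldest entry
        return discarded
```

With 16-word fields the model reports a capacity of 128 words, and with 32-word fields 256 words, matching the sizes stated above.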

## Instruction issue

Background instructions are translated in several steps and are allowed to issue sequentially by an instruction issue control mechanism. The words are disassembled into 16-bit parcels that are placed in a queue where the translation occurs. The instruction issue process involves checking the reservation flags for the registers and functional unit involved in the instruction sequence. The parcel waits in issue position in the instruction queue until all required resources are free.

Instruction parcels and 16-bit constants are intermixed in the instruction queue. The constant parcels are passed through the instruction queue without test.
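As a rough illustration of the disassembly step, a 64-bit word yields four 16-bit parcels. The sketch below assumes the parcels are taken from the most significant end of the word first; this manual excerpt does not state the order, so treat that ordering as an assumption.

```python
PARCEL_BITS = 16
PARCELS_PER_WORD = 4  # 64-bit word / 16-bit parcel


def word_to_parcels(word: int) -> list[int]:
    """Split a 64-bit word into four 16-bit parcels (assumed high-order first)."""
    if not 0 <= word < 1 << 64:
        raise ValueError("word must fit in 64 bits")
    shifts = range(PARCEL_BITS * (PARCELS_PER_WORD - 1), -1, -PARCEL_BITS)
    return [(word >> shift) & 0xFFFF for shift in shifts]
```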

## 2.1.2 REAL-TIME CLOCK

Each Background Processor has a 64-bit register that counts continuously at the clock period rate. This count value determines the passage of real time to an accuracy of 1 clock period (CP). The real-time clocks in the Background Processors are synchronized at deadstart. Instruction 115 reads the real-time clock.
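Because the real-time clock advances once per clock period, a difference between two readings converts directly to elapsed time. The sketch below assumes the 4.1 ns figure listed as "Clock speed" in the Appendix B configuration sheet is the clock period; the function names are hypothetical.

```python
CLOCK_PERIOD_NS = 4.1        # assumed clock period, from the Appendix B sheet
RTC_MASK = (1 << 64) - 1     # the real-time clock is a 64-bit register


def elapsed_clock_periods(start: int, end: int) -> int:
    """Clock periods between two readings, allowing for 64-bit wraparound."""
    return (end - start) & RTC_MASK


def elapsed_seconds(start: int, end: int) -> float:
    """Elapsed real time in seconds between two real-time clock readings."""
    return elapsed_clock_periods(start, end) * CLOCK_PERIOD_NS * 1e-9
```

Under this assumption, 1000 clock periods correspond to about 4.1 microseconds.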

## 2.1.3 SEMAPHORE FLAGS

To synchronize Common Memory references, eight semaphore flags in the background system interlock Common Memory references when multiple Background Processors are executing a single job. One semaphore flag is assigned to each currently active job in the background system. A Background Processor assigned to a job is assigned a semaphore flag at the same time.

The Background Processor uses four instructions in synchronizing its Common Memory references: 004, 005, 006, and 007. A 004 or 005 instruction requests the semaphore flag when the Background Processor program is accessing a Common Memory area that can interfere with other processors assigned to the job. The branch instruction results determine when the processor has exclusive access to this Common Memory area. The program must clear the semaphore flag to release the Common Memory area to another processor assigned to the same job.

## 2.1.4 COMMON MEMORY FIELD PROTECTION

At execution time each object program has a designated field of Common Memory holding instructions and data. The foreground functions specify the field limits when the object program is loaded and initiated. Field limits are contained in the Base Address (BA) register and the Limit Address (LA) register.

All memory addresses contained in the object program code are relative to the base address beginning the defined field. An object program cannot read or alter any Common Memory location with an absolute address lower than the base address. Each object program reference to Common Memory is checked against the limit and base addresses to determine if the address is within the assigned bounds.

## Base Address register

Each Background Processor has a 32-bit BA register. The BA register defines the lower boundary of the Common Memory address field. The Foreground Processor enters data into this register while the Background Processor is in idle mode. The data remains in the register for the duration of the Background Processor computation period.

Each Common Memory reference from the Background Processor includes the addition of the BA register contents to the other parts of the memory reference base address. All Background Processor references to Common Memory are relative to the base address boundary.

## Limit Address register

Each Background Processor has a 32-bit LA register. The LA register defines the upper boundary of the Common Memory address field. The Foreground Processor enters data into this register while the Background Processor is in idle mode. The data remains in this register for the duration of the Background Processor computation period.
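The base and limit check can be sketched as follows. This is a minimal model under stated assumptions: the LA register is taken to hold an absolute upper bound that is itself out of range, and an out-of-range reference is modeled as a Python exception. The manual excerpt does not say whether the limit is inclusive or how the hardware reports a violation.

```python
class FieldProtectionError(Exception):
    """Raised by this model when a reference falls outside the assigned field."""


def absolute_address(relative: int, base_address: int, limit_address: int) -> int:
    """Relocate a program-relative address by BA and check it against LA."""
    absolute = base_address + relative  # every reference adds the BA contents
    # Assumption: valid absolute addresses satisfy BA <= address < LA.
    if relative < 0 or absolute >= limit_address:
        raise FieldProtectionError(f"address {absolute} is outside the field")
    return absolute
```

For example, with a base address of 1000 and a limit address of 2000, relative address 5 maps to absolute address 1005, while relative address 1000 is rejected by the model.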

## INSTRUCTIONS 000 - 001

| Result | Operand | Description               | Machine<br>Instruction |
|--------|---------|---------------------------|------------------------|
| err    |         | Error exit                | 000<i>x</i>00          |
| exit   |         | Normal exit               | 000<i>x</i>01          |
| exit   | exp     | Normal exit               | 000<i>xjk</i>          |
| CMR    |         | Hold issue on memory busy | 001<i>xxx</i>          |

Instructions 000 and 001 stop the current program sequence, place the Background Processor in idle mode, and set the Exit Mode and Idle Mode flags in the Background Port Status register. The 6-bit jk value is entered into the Background Port Status register.

## Examples:

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 000000         |               | err          |               |               |
| 000001         |               | exit         |               |               |
| 000004         |               | exit         | 4             |               |

## INSTRUCTION 002

| Result           | Operand        | Description                                                                 | Machine<br>Instruction |
|------------------|----------------|-----------------------------------------------------------------------------|------------------------|
| r,a<sub>i</sub>  | a<sub>k</sub>  | Register jump to (a<sub>k</sub>) with<br>return address to a<sub>i</sub>    | 002<i>ixk</i>          |
| j                | a<sub>k</sub>  | Register jump to (a<sub>k</sub>), value in a<sub>k</sub> erased             | 002<i>kxk</i>          |

Instruction 002 stops the current program sequence and begins a new sequence at a computed parcel address read from the  $A_k$  register. The parcel address for the next instruction in the current program sequence is entered into the  $A_i$  register.

Examples:

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 002102         |               | r,a1         | a2            |               |
| 002101         |               | j            | a1            |               |
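The six-digit octal codes in the examples can be read field by field. The sketch below assumes a 16-bit parcel made of a 7-bit operation code followed by three 3-bit designators (i, j, k); that split is inferred from the instruction forms shown here (such as 002ixk and 000xjk) rather than quoted from the manual.

```python
def decode_parcel(octal_text: str) -> dict:
    """Split a 16-bit instruction parcel, written in octal, into its fields."""
    value = int(octal_text, 8)
    if not 0 <= value < 1 << 16:
        raise ValueError("a parcel must fit in 16 bits")
    return {
        "opcode": format(value >> 9, "03o"),  # assumed 7-bit operation code
        "i": (value >> 6) & 0o7,              # assumed 3-bit i designator
        "j": (value >> 3) & 0o7,              # assumed 3-bit j designator
        "k": value & 0o7,                     # assumed 3-bit k designator
    }
```

Under these assumptions, `decode_parcel("002102")` gives operation code 002 with i = 1 and k = 2, which agrees with the `r,a1 a2` example above.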


#### 4. COMMON MEMORY

Common Memory contains 256 or 512 Mwords of dynamic memory, or 64 or 128 Mwords of static memory. The memory consists of either 64 or 128 banks. Each 72-bit word consists of 64 data bits and 8 error-correction bits.

Common Memory is organized into quadrants with 32 banks in each quadrant. The 64 Mword version has 16 banks per quadrant. Each memory quadrant has a data path to each of the Common Memory ports. A Background Processor and a foreground communication channel are connected to each Common Memory port. The total memory bandwidth of a four-processor system is 64 Gbits/s. The total memory capacity is now equal to 17 Gbits.
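The capacity and bank figures above can be checked with simple arithmetic. The sketch below assumes that one Mword is 2<sup>20</sup> words and that "Gbits" is a decimal unit; neither definition is stated in this section. Only the 64 data bits of each word are counted.

```python
MWORD = 2 ** 20      # assumed: 1 Mword = 1,048,576 words
DATA_BITS = 64       # data bits per word (the 8 check bits are excluded)
QUADRANTS = 4


def data_capacity_bits(mwords: int) -> int:
    """Data capacity in bits for a memory of the given size in Mwords."""
    return mwords * MWORD * DATA_BITS


def total_banks(banks_per_quadrant: int) -> int:
    """Total bank count across the four quadrants."""
    return QUADRANTS * banks_per_quadrant
```

Under these assumptions, a 256-Mword memory holds 17,179,869,184 data bits, or about 17.18 Gbits, which is consistent with the 17-Gbit figure above. Thirty-two banks per quadrant give 128 banks, and 16 banks per quadrant give 64.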

The Foreground Processor, Background Processors, external I/O devices, and disk controllers share Common Memory. Common Memory contains program code for the Background Processors, data for problem solution, and Foreground Processor system tables.

#### 4.1 MEMORY ADDRESSING

A word in memory is addressed by 32 bits. The low-order 2 bits select the quadrants and the next 5 bits select the bank. The 64-Mword system uses 4 bits for bank select. Figure 4-1 shows the format of the memory address for Common Memory.

| 2 <sup>31</sup> |              | 2 <sup>7</sup> | 2 <sup>6</sup> 2 <sup>2</sup> | 2 <sup>1</sup> 2 <sup>0</sup> |
|-----------------|--------------|----------------|-------------------------------|-------------------------------|
|                 | Bank Address |                | Bank<br>Select                | Quad<br>Select                |

Figure 4-1. Memory Address for Common Memory
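The address layout in figure 4-1 can be decoded with shifts and masks. The following Python sketch follows the field widths given above; the function name and the `bank_select_bits` parameter are illustrative and not part of the manual.

```python
def decode_address(address: int, bank_select_bits: int = 5):
    """Split a 32-bit Common Memory address into (quadrant, bank, bank address).

    Use bank_select_bits=4 for the 64-Mword system described in the text.
    """
    if not 0 <= address < 1 << 32:
        raise ValueError("a word address is 32 bits")
    if bank_select_bits not in (4, 5):
        raise ValueError("bank select uses 4 or 5 bits")
    quadrant = address & 0b11                             # low-order 2 bits
    bank = (address >> 2) & ((1 << bank_select_bits) - 1)  # next 4 or 5 bits
    bank_address = address >> (2 + bank_select_bits)       # remaining bits
    return quadrant, bank, bank_address
```

Because the quadrant is selected by the two low-order bits, consecutive word addresses fall in successive quadrants.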

## 4.2 MEMORY ACCESS

The Background Processors are locked into a phased access time scheme with the memory quadrants through the Common Memory ports. Through its Common Memory port, a Background Processor can access any given quadrant but only in the processor's own phase time, that is, every fourth clock period (CP). If a Background Processor requests a quadrant out of its phase time, the request is delayed until the correct time.

For example, assume the Background Processors are A through D, and the quadrants are 0 through 3. Also assume processor A is locked into quadrant 0 at phase time 0. If processor A references quadrant 0 at phase time 1, it must wait until the next phase time 0 (CP 4) to have access to memory in that quadrant.

Memory banks in a quadrant share a data path to each Common Memory port. Because of the phased access time between the quadrants and the Common Memory ports, however, only 1 bank accesses the path in a given 4-CP time slot. Because 2 banks never compete for the same data path in the same time slot, each bank functionally has an independent path to each of the four Common Memory ports.
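The delay in the example above can be computed with modular arithmetic. This sketch represents a processor's phase for a quadrant as a clock period number modulo 4; how phases are assigned to the other processor and quadrant pairs is not given in this section, so the phase is simply passed in as a parameter.

```python
PHASES = 4  # a processor can reach a given quadrant every fourth clock period


def next_access_cp(request_cp: int, phase: int) -> int:
    """Return the first clock period at or after request_cp in the given phase."""
    if not 0 <= phase < PHASES:
        raise ValueError("phase must be in the range 0..3")
    return request_cp + (phase - request_cp) % PHASES
```

For the case described above, `next_access_cp(1, 0)` returns 4: a reference made at phase time 1 waits until CP 4.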

## 4.3 MEMORY CONFLICTS

To prevent memory conflicts, each memory bank in the dynamic system has two Bank Busy flags. Each bank is divided logically into two or four pseudobanks. This enables quicker access to the half of the bank that is not busy. When a bank has been accessed, it sets both of its busy flags. A long count busy applies to the pseudobank that is actually busy, while a short count busy applies to the pseudobank that is not. If the bank is busy, the quadrant sends a rejected signal to the requesting memory port. The requesting port then retries the reference.

The static memory, being much faster, does not require the pseudobank arrangement. One Bank Busy flag is used per bank.

## 4.4 MEMORY BACKUP

Memory back-up occurs when too many memory references arrive at a single memory quadrant. Each Common Memory port has four quadrant buffers, one for each quadrant. Each buffer can hold two memory references for its memory quadrant. Therefore, references can continue to the memory port when the reference is not in the proper phase time. When a quadrant buffer in a memory port is filled, and another reference to that quadrant is made, the memory port begins a back-up procedure.

The memory port back-up procedure stops instruction issue for the associated Background Processor if that processor is making a memory reference. Vector streams initiated in the Background Processor and associated with a Common Memory reference are held.

After all references have been submitted for retry, stop issue is released, allowing additional references to issue. A conflict during the retry process causes the back-up procedure to begin again at the point the conflict occurred, which could be the original back-up reference or another reference buffered during back-up.
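The condition that starts a back-up can be sketched with a simple counter per quadrant buffer. This Python model covers only the buffer-full test described above; the retry sequence, stop-issue timing, and vector stream handling are not modeled, and the class and method names are hypothetical.

```python
QUADRANTS = 4
BUFFER_DEPTH = 2  # each quadrant buffer holds two memory references


class MemoryPort:
    """Toy model of one Common Memory port's four quadrant buffers."""

    def __init__(self):
        self.pending = [0] * QUADRANTS  # buffered references per quadrant

    def submit(self, quadrant: int) -> bool:
        """Buffer a reference; return False if the port would begin back-up."""
        if self.pending[quadrant] >= BUFFER_DEPTH:
            return False  # buffer already full: back-up procedure begins
        self.pending[quadrant] += 1
        return True

    def complete(self, quadrant: int) -> None:
        """Mark the oldest buffered reference for a quadrant as accepted."""
        if self.pending[quadrant] == 0:
            raise ValueError("no reference is buffered for this quadrant")
        self.pending[quadrant] -= 1
```

In this model a third reference to the same quadrant is refused while two are still buffered, but references to other quadrants continue to be accepted.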

# Common Memory writes

| Result                            | Operand        | Description                                                                      | Machine<br>Instruction                     |
|-----------------------------------|----------------|----------------------------------------------------------------------------------|--------------------------------------------|
| (exp)                             | s<sub>i</sub>  | Writes (s<sub>i</sub>) to Common Memory<br>at location <i>exp</i>                                            | 067<i>ixx m<sub>1</sub> m<sub>2</sub></i> |
| (a<sub>k</sub>)                   | s<sub>i</sub>  | Writes (s<sub>i</sub>) to Common Memory<br>at location (a<sub>k</sub>)                                       | 063<i>ixk</i>                              |
| (a<sub>k</sub>,exp)               | s<sub>i</sub>  | Writes (s<sub>i</sub>) to Common Memory<br>at location (a<sub>k</sub>)+exp                                   | 065<i>ixk m<sub>1</sub> m<sub>2</sub></i> |
| (a<sub>j</sub>,a<sub>k</sub>)     | s<sub>i</sub>  | Writes (s<sub>i</sub>) to Common Memory<br>at location (a<sub>j</sub>)+(a<sub>k</sub>)                       | 061<i>ijk</i>                              |
| (a<sub>j</sub>,a<sub>k</sub>)     | v<sub>i</sub>  | Writes (v<sub>i</sub>) to Common Memory<br>location (a<sub>j</sub>) incremented by (a<sub>k</sub>)           | 071<i>ijk</i>                              |
| (a<sub>k</sub>,v<sub>j</sub>)     | v<sub>i</sub>  | Scatters (v<sub>i</sub>) to Common Memory<br>locations (a<sub>k</sub>)+(v<sub>j</sub>)                       | 073<i>ijk</i>                              |

## A.7.2 LOADS

Several instructions can be used to load data from memory into registers.

# Local Memory reads

| Result         | Operand           | Description                                                                  | Machine<br>Instruction       |
|----------------|-------------------|------------------------------------------------------------------------------|------------------------------|
| a<sub>i</sub>  | [exp]             | Reads from location <i>exp</i> in<br>Local Memory to a<sub>i</sub>           | 044<i>ixx m<sub>l</sub></i>  |
| a<sub>i</sub>  | [a<sub>k</sub>]   | Reads from location (a<sub>k</sub>) in<br>Local Memory to a<sub>i</sub>      | 046<i>ixk</i>                |
| s<sub>i</sub>  | [exp]             | Reads from location <i>exp</i> in<br>Local Memory to s<sub>i</sub>           | 054<i>ixx m<sub>l</sub></i>  |
| s<sub>i</sub>  | [a<sub>k</sub>]   | Reads from location (a<sub>k</sub>) in<br>Local Memory to s<sub>i</sub>      | 056<i>ixk</i>                |
| v<sub>i</sub>  | [a<sub>k</sub>]   | Reads from Local Memory<br>location (a<sub>k</sub>) to v<sub>i</sub>         | 074<i>ixk</i>                |

# Complete Memory references

| Result | Operand | Description               | Machine<br>Instruction |
|--------|---------|---------------------------|------------------------|
| CMR    |         | Hold issue on memory busy | 001 <i>xxx</i>         |

| CPU Features                                | Value                    |
|---------------------------------------------|--------------------------|
| Number of CPUs                              | 4                        |
| Clock speed                                 | 4.1 ns                   |
| Common memory size                          | 512 Mwords or 256 Mwords |
| Common memory chip type                     | Dynamic MOS              |
| Number of quadrants                         | 4                        |
| Number of banks                             | 128                      |
| Number of common memory ports               | 4                        |
| Number of foreground channels               | 4                        |
| Maximum number of I/O devices               | 40                       |
| Maximum number of disk storage devices      | 36                       |
| Maximum number of HSX controllers           | 8                        |
| Maximum number of external I/O controllers  | 16                       |

| CPU Features continued | Value                                      |
|------------------------|--------------------------------------------|
| Number of columns      | 14                                         |
| Arc                    | 300°                                       |
| Floor space            | 16 ft<sup>2</sup> (1.49 m<sup>2</sup>)     |
| Weight                 | 5500 lb (2495 kg)                          |
| Functional Units (register units)<br>Available per Background Processor                                                                                                                        |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Address functional units:<br>Add/subtract (A)<br>Multiply (A)                                                                                                                                  |
| Scalar functional units:<br>Integer<br>- Add/subtract (S)<br>- Population/parity (S)<br>- Leading zero count (S)<br>Shift (S)<br>Logical (S)                                                   |
| Vector functional units:<br>Integer<br>- Add/subtract (S)<br>- Shift (S)<br>- Population/parity (S)<br>- Leading zero count (S)<br>- Compressed iota (S and V)<br>Logical (S and V)            |
| Floating-point functional units:<br>Add/subtract (S and V)<br>Multiply, reciprocal, and square root (S and V)                                                                                  |

| <b>Register Type</b><br>Available per Background Processor | Quantity | Size                                 |
|------------------------------------------------------------|----------|--------------------------------------|
| Address (A)                                                | 8        | 32 bits                              |
| Scalar (S)                                                 | 8        | 64 bits                              |
| Vector (V)                                                 | 8        | 64 elements<br>(64 bits per element) |
| Local Memory<br>(used for register save)                   | 1        | 16K 64-bit words                     |

| Support Equipment<br>Required per CRAY-2 Computer System | Number of Units Needed |
|----------------------------------------------------------|------------------------|
| Reservoir                                                | 1                      |
| M-pod                                                    | 1                      |
| S-pod                                                    | 1                      |
| Motor-generator sets                                     | 3                      |
| Maintenance Control Console                              | 1                      |


