Most of the terms in this list are defined somewhere within, and others
are available in the
Free On-line Dictionary of Computing, but here's clarification for
a few terms:
- Accumulator
- A register that is used as the implicit source and destination
of an operation (the register doesn't have to be specified separately).
The PDP-8 has the best example in this
RISC processors use a load/store architecture instead: to add a value in
memory to a register, the value must first be loaded into an intermediate register.
- Asynchronous Design
- A design which does not synchronize individual circuits using
a clock signal, as synchronous designs
do. Some other method (such as a "dummy circuit" which does nothing
but consume the same amount of time as the real circuit) is used
to generate a signal when the result is ready/valid, and the valid
signals can be used to start the next operation.
There is an asynchronous version of the ARM
architecture, and Sun is researching an asynchronous Transport-triggered
architecture with a project called FleetZero.
- Bit Slice Processor
- A CPU which is separated into a unit which performs the actual
processing, and a control unit. The processing unit has input and
output signals representing borrow and carry bits so that two or
more "slices" can be added together to process wider data words
(e.g. four 2-bit slices can operate on 8-bit data). Used in the
AMD 2900 series.
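The carry chaining described above can be sketched in a few lines of Python. This is only an illustration, not a model of the AMD 2900 parts: the function names `slice_add` and `chained_add` are made up for this sketch, and only addition is shown.

```python
def slice_add(a, b, carry_in, width=2):
    """One slice: add two `width`-bit values plus a carry-in bit."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + carry_in
    return total & mask, total >> width  # (sum bits, carry out)


def chained_add(x, y, slices=4, width=2):
    """Chain slices from least to most significant, passing the carry along."""
    mask = (1 << width) - 1
    result, carry = 0, 0
    for i in range(slices):
        shift = i * width
        part, carry = slice_add((x >> shift) & mask, (y >> shift) & mask,
                                carry, width)
        result |= part << shift
    return result, carry


# Four 2-bit slices operating on 8-bit data.
print(chained_add(0x5A, 0x37))  # (145, 0)
print(chained_add(0xFF, 0x01))  # (0, 1): the sum wraps and the carry comes out
```

The carry returned by the last slice is what a wider design would feed into the next group of slices.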
- Branch Prediction
- The general method of keeping track of which path was taken by
a particular branch instruction, and following that path the next
time the same instruction is encountered. Generally a history table
is used to indicate how often a branch at a given address is taken
or not taken.
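As a rough sketch of such a history table, the Python below keeps one small counter per table entry. The two-bit saturating counter and the table size are assumptions chosen for the example (one common scheme among several), and the names `BranchHistoryTable` and `count_correct` are hypothetical.

```python
class BranchHistoryTable:
    """History table indexed by branch address (assumed 2-bit counters)."""

    def __init__(self, entries=16):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = [1] * entries

    def _index(self, address):
        return address % self.entries

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_correct(outcomes, address=0x40):
    """Run one branch through the table and count correct predictions."""
    table = BranchHistoryTable()
    correct = 0
    for taken in outcomes:
        if table.predict(address) == taken:
            correct += 1
        table.update(address, taken)
    return correct


# A loop branch: taken nine times, then falls through once.
print(count_correct([True] * 9 + [False]))  # 8
```

Only the first iteration (before any history exists) and the final loop exit are predicted wrongly here.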
- Branch Target Cache
- The practice of saving one or more instructions which are executed
immediately after a branch instruction, so that the next time the
branch is encountered, the instructions have already been loaded.
- Cache
- You should know this term already. But if you don't, it refers
to a small amount of fast memory which holds recently accessed data
or instructions so that if they are used by the programs again,
the cache can supply them transparently faster than main memory.
Cache memory is typically organised into lines (several bytes are
loaded at once, on the assumption that nearby memory will be used
next). The lines are organised into sets, each set is mapped to
a separate group of memory addresses, and there are usually between
two and sixty-four lines per set (fewer lines per set are simpler,
but access to more addresses than cache lines in the same set can
cause data in the cache to be discarded before it can be used).
Smaller caches are faster, so often a small level 1 cache is
used, with a larger but slower level 2 cache supporting it. Level
3 caches can even be used in some cases.
Some cache controllers monitor the memory bus to detect when
a cached memory value has been modified by another CPU, or a peripheral.
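The line and set organisation can be illustrated with a toy model. The sizes below (16-byte lines, four sets, two lines per set) and the least-recently-used replacement policy are assumptions made for this sketch, and `SetAssociativeCache` is a hypothetical name; real caches differ in all of these.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache that tracks which lines are present (no data is stored)."""

    def __init__(self, line_size=16, num_sets=4, ways=2):
        self.line_size = line_size
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set, keyed by tag, least recently used first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        line_number = address // self.line_size
        set_index = line_number % self.num_sets
        tag = line_number // self.num_sets
        lines = self.sets[set_index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # discard the least recently used line
        lines[tag] = True
        return False


cache = SetAssociativeCache()
# 0 and 4 share a line; 0, 64 and 128 all map to set 0, which holds two lines.
print([cache.access(a) for a in [0, 4, 64, 128, 0]])
# [False, True, False, False, False]
```

The last access misses because three different lines competed for the two places in set 0, which is the "discarded before it can be used" case mentioned above.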
- DSP
- Digital Signal Processor, a CPU designed mainly for performing
simple, repetitious operations on a stream or buffer of data - for
example, decoding digital audio data from a CD. Generally meant
for embedded applications, leaving out features of general purpose
CPUs which aren't needed in a DSP application. There is usually
little or no interrupt support, or memory management support.
- EEPROM
- Electrically Erasable Programmable ROM.
- Endian
- The order in which a multi-byte binary number is stored in byte-addressable
memory. "Little-endian" means the least significant byte (the "little
end") is stored in the first (lowest) address, "big-endian" means
the most significant byte ("big end") has the first position in memory.
A potential source of code and communications incompatibility,
but with no significant advantages to either, making the decision
arbitrary (except for compatibility requirements). The term comes
from an equally arbitrary disagreement in Lilliputian society (from
Jonathan Swift's book "Gulliver's Travels") over which end to
break boiled eggs (the big or little end), a distinction which
caused civil wars. Swift was satirizing differences in the treatment
of Catholics in his own time - fortunately there's been no documented
case of CPU designers coming to blows over CPU endian-ness, despite
the heated discussions that once took place (but which later became
unfashionable after network endian
order was standardised in TCP/IP).
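A quick way to see the two byte orders is Python's standard `struct` module, where `<` selects little-endian, `>` big-endian and `!` network order. The value 0x12345678 is just an arbitrary example.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first
network = struct.pack("!I", value)  # network order

print(little.hex())    # 78563412
print(big.hex())       # 12345678
print(network == big)  # True

# Reading big-endian bytes as if they were little-endian gives a wrong value.
misread = struct.unpack("<I", big)[0]
print(hex(misread))    # 0x78563412
```

The last two lines show the kind of incompatibility that arises when the two ends of a link assume different orders.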
- Explicitly Parallel Instruction Computing
- The HP/Intel term for a form of VLIW with
Variable Length Instruction Groupings which uses fields in the instruction
stream or instructions themselves to group (specify instruction
dependencies), rather than using a fixed length instruction word.
Used in the TI 320C6x and the HP/Intel
Two problems are usually identified with VLIW processors (like
the Philips TriMedia). One is that if the instruction word can't
be filled, the rest of the entries need to be filled with NOP
instructions, which waste space. The other is that it prevents
future versions which may be able to execute more instructions
in parallel, or lower cost versions which execute fewer. EPIC
solves this, but requires a small semantic change that instructions
within a group must be independent - that is, act the same whether
they were executed in order or parallel. By contrast, in the MultiFlow
TRACE systems a pair of instructions such as "MOVE A, B" and "MOVE
B, A" could be in the same word because they were guaranteed to
execute in parallel, with the result that values in A and B would be exchanged.
- EPROM
- Erasable Programmable ROM (erased by exposing the EPROM to ultraviolet light).
- Harvard Architecture
- Strictly speaking, refers to a CPU with separate program and data
spaces (specifically the PIC embedded
processors), but it's often generally used to refer to separate
program and data busses (and usually caches too) for improved speed,
though the address spaces are actually shared. Originally Harvard
architecture computers were programmed using plug boards or something
similar, and data was in a writable storage area. The von Neumann
architecture introduced the idea of a stored program in the same
writable memory that data was stored in.
- Indirection Bit
- Some designs used one address bit as an indirection bit, meaning
that the value in memory is the address of the actual value. Other
designs used a separate addressing mode for indirect addressing.
- An actual programming language designed to be as evil as possible.
- Microcode
- Earlier CPUs were designed to execute instructions with the circuitry
directly decoding and executing program instructions. Microcode
was a way of simplifying CPU design by allowing simpler hardware
which executes simple microinstructions to interpret more complex
machine instructions, first used commercially in the mid and low
range IBM System/360. Microcode
is often slower and increases CPU size (compare the transistor count
of the microcoded Motorola 68000 (68,000)
with that of the hardwired Zilog Z-8000 (17,500),
and the fact that the Z-8000 was both late and buggy).
Implementations generally use either 'horizontal' or 'vertical'
microcode, which differ mainly in number of bits. Microinstructions
include a condition code and jump address (jump if condition is
true, next instruction if false), and the operation to be performed.
In horizontal microcode, each operation bit triggers an individual
control line (simple CPU controller but large microcode storage);
in vertical microcode, the operation field is decoded to produce
the control signals (smaller microcode but more complex controller).
Some CPUs used a combination.
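The difference between the two styles can be sketched as follows. The eight control line names and the four encoded operations are invented for the illustration and do not correspond to any real CPU; the condition code and jump address fields are left out.

```python
# Hypothetical control lines for a tiny datapath.
CONTROL_LINES = ["load_a", "load_b", "alu_add", "alu_sub",
                 "write_result", "mem_read", "mem_write", "pc_inc"]


def decode_horizontal(word):
    """Horizontal: each bit of the operation field is one control line."""
    return {name for bit, name in enumerate(CONTROL_LINES) if word & (1 << bit)}


# Vertical: a short opcode is decoded into the set of lines to activate.
VERTICAL_DECODE = {
    0: set(),
    1: {"load_a", "load_b", "alu_add", "write_result"},
    2: {"load_a", "load_b", "alu_sub", "write_result"},
    3: {"mem_read", "pc_inc"},
}


def decode_vertical(opcode):
    return VERTICAL_DECODE[opcode]


# The same "add" step in both encodings.
horizontal_add = 0b00010111   # 8 bits: load_a, load_b, alu_add, write_result
vertical_add = 1              # 2 bits are enough for four operations

print(decode_horizontal(horizontal_add) == decode_vertical(vertical_add))  # True
```

The horizontal word needs one bit per control line but no decoder, while the vertical opcode is much narrower and pays for that with the lookup table.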
- Multithreading
- The ability to share CPU resources among multiple threads.
'Vertical' multithreading allows a CPU to switch execution between
threads without needing to save thread state (generally using duplicated
registers, and usually used to continue execution with another thread
when one thread hits a delay due to a cache miss and must wait).
'Horizontal' multithreading allows threads to share functional units
without halting the execution of a thread (an idle functional unit
can be assigned to any thread that needs it).
A simpler variation called a "barrel processor" cycles through
threads on every clock cycle whether there is a delay or not,
so when there are enough thread "slots" to cover any expected
execution delay, it appears to the program that each instruction
takes one cycle (in addition, no hardware is required to check
for data dependencies in the pipeline).
- Network order
- Big-endian, used in TCP/IP standards.
- Out Of Order Execution
- A superscalar CPU may issue instructions in an order different
than that in the program if state conflicts can be resolved (with
renaming for example). For example:
1: add r1,r2->r8
2: sub r8,r3->r3
3: add r4,r5->r8
4: sub r8,r6->r6
Instructions 1 and 3 can be executed in parallel if r8 is renamed,
and instructions 2 and 4 can then be executed in parallel. Instruction
3 is executed before 2, out of the order in which they appear in the program.
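The four instructions above can be run through a small Python sketch of renaming followed by dependency-based scheduling. It assumes an unlimited supply of rename registers (called `p0`, `p1`, ... here), one-cycle instructions and no limit on how many instructions issue per cycle; the function names are hypothetical.

```python
# (operation, source 1, source 2, destination), as in the listing above.
program = [
    ("add", "r1", "r2", "r8"),
    ("sub", "r8", "r3", "r3"),
    ("add", "r4", "r5", "r8"),
    ("sub", "r8", "r6", "r6"),
]


def rename(program):
    """Give every destination a fresh rename register."""
    mapping = {}   # architectural register -> most recent rename register
    renamed = []
    for n, (op, src1, src2, dest) in enumerate(program):
        s1 = mapping.get(src1, src1)
        s2 = mapping.get(src2, src2)
        phys = f"p{n}"
        mapping[dest] = phys
        renamed.append((op, s1, s2, phys))
    return renamed


def schedule(renamed):
    """Put each instruction in the first cycle after its inputs are produced."""
    ready_cycle = {}   # rename register -> cycle in which it is written
    cycles = []
    for op, s1, s2, dest in renamed:
        cycle = max(ready_cycle.get(s1, -1), ready_cycle.get(s2, -1)) + 1
        ready_cycle[dest] = cycle
        cycles.append(cycle)
    return cycles


renamed = rename(program)
print(renamed[2])          # ('add', 'r4', 'r5', 'p2')
print(schedule(renamed))   # [0, 1, 0, 1]
```

With r8 renamed, instructions 1 and 3 land in cycle 0 and instructions 2 and 4 in cycle 1.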
- Predicated instructions
- Instructions which are executed only if conditions are true, usually
bits in a condition code register. This eliminates some branches,
and in a superscalar machine can allow both branches in certain
conditions to be executed in parallel, and the incorrect one discarded
with no branch penalty. Used in the ARM
and TMS320C6x, in some HP PA-RISC
instructions, and the upcoming HP/Intel IA-64.
- PROM
- Programmable ROM (not erasable).
- RAM
- If you don't know what Random Access Memory is, why are you reading
this in the first place?
- Register Renaming
- A number of extra registers can be assigned to hold the data that
would normally be written to the destination register (in other
words, the extra register is renamed as far as that particular instruction
is concerned). One use for this is for speculative
execution of branches - if the branch is eventually taken, then
data in the rename register can be written to the real register,
if not then the data is discarded. Another use is for out
of order execution: renamed registers can produce an 'image'
of the processor state which an instruction expects, while the actual
processor state has already been modified by another instruction
(known as write conflicts).
The circuitry required to keep track of renamed registers can
- Resource Renaming
- A more general form of register renaming
where resources other than registers are renamed.
- ROM
- Read Only RAM. It's really spelled ROR. Engineers know this, but
don't tell anybody so that they can laugh at everyone who says 'ROM'.
Really, this is the truth.
- Saturation Arithmetic
- When arithmetic operations produce values too large or too small
for registers, the largest or smallest value that can be represented
is substituted instead.
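A minimal sketch of saturating addition, compared with the usual wrap-around result. The 8-bit width and the names `saturating_add` and `wrapping_add` are choices made for this example only.

```python
def saturating_add(a, b, bits=8, signed=True):
    """Add and clamp to the representable range instead of wrapping."""
    if signed:
        lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lowest, highest = 0, (1 << bits) - 1
    return max(lowest, min(highest, a + b))


def wrapping_add(a, b, bits=8):
    """Ordinary two's complement addition, for comparison."""
    total = (a + b) & ((1 << bits) - 1)
    if total >= 1 << (bits - 1):
        total -= 1 << bits
    return total


print(saturating_add(100, 100))                # 127
print(wrapping_add(100, 100))                  # -56
print(saturating_add(-100, -100))              # -128
print(saturating_add(200, 100, signed=False))  # 255
```

For signals such as audio samples, clamping to 127 is usually a far smaller error than wrapping to -56.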
- Segment
- Properly, a section of memory of almost any size and at any address,
accessed through an identifier tag which includes protection bits,
particularly useful for object oriented programming. A good idea
which was missed by a painful margin with the Intel
- Speculative Execution
- In a pipelined processor, branch instructions in the execute stage
affect the instruction fetch stage - there are two possible paths
of execution, and the correct one isn't known until the conditional
branch executes. If the CPU waits until the conditional branch executes,
the stages between fetch and execute become empty, leading to a
delay before execution can resume after a branch (the time taken
for new instructions to fill the pipeline again). The alternative
is to choose an execution path, and if that is the correct one,
there is no branch delay. But if it's the wrong one, any results
from the speculative execution have to either be discarded or undone.
- Stack Frame
- A segment of a stack which holds parameters, local variables,
previous stack frame pointer and return address, created when calling
a procedure, function (procedure which returns a value), or method
(function or procedure which can access private data in an object)
in most high level languages.
- Superscalar
- Refers to a processor which executes more than one instruction
simultaneously, but more properly refers to the issuing of instructions
(the CDC 6600 issues one,
but executes many simultaneously).
- Synchronous Design
- A design which ensures that when two circuits take different amounts
of time to perform a function, further operations will wait until
a voltage signal (which switches between on and off at a specified
frequency) changes. The changing signal is called the circuit's
clock, and changes at the speed of the slowest circuit, in order
to keep the faster circuits synchronized with it.
Designs which don't use a clock signal are called asynchronous.
- Thread
- A thread is a stream or path of execution where the state is entirely
stored in the CPU, while a process includes extra state information
(mainly operating system support to protect processes from unexpected
and unwanted interferences, either from bugs or intentional attack).
Threads are sometimes called lightweight processes.
- Transport Triggered Architecture
- Also called a Transfer Triggered Architecture, or Move Machine,
a TTA is a design where operations are triggered by moving data
to the functional units which operate on it, instead of moving data
in response to the CPU operations (an Operation Triggered Architecture).
For example, a TTA would have one unit for add, one for subtract,
one for load, and so on. A number would be loaded by moving the
address to the load unit, triggering it to load. The result could
be transferred to the add unit, and another number from a register
or another unit could be transferred, triggering the unit to add
TTAs are primarily experimental, with researchers looking into using
the very regular design properties for automated custom CPU designs.
The TI MSP430 implements the multiplier
as an on-chip peripheral, and Sun is researching high-speed asynchronous
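The load-then-add example for a TTA can be mimicked in Python, where the "program" consists only of moves and a write to a unit's trigger port starts its operation. The port names `operand` and `trigger` and the classes `LoadUnit` and `AddUnit` are assumptions for this sketch and are not taken from any particular TTA design.

```python
class LoadUnit:
    """Hypothetical load unit: moving an address to it triggers the load."""

    def __init__(self, memory):
        self.memory = memory
        self.result = 0

    def write(self, port, value):
        if port != "trigger":
            raise ValueError(port)
        self.result = self.memory[value]


class AddUnit:
    """Hypothetical add unit with a plain operand port and a trigger port."""

    def __init__(self):
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "operand":
            self.operand = value                  # just latched, no operation
        elif port == "trigger":
            self.result = self.operand + value    # the move starts the add
        else:
            raise ValueError(port)


memory = {0x10: 7}
registers = {"r1": 5}
load, add = LoadUnit(memory), AddUnit()

# The whole program is a sequence of moves.
load.write("trigger", 0x10)              # move address -> load unit
add.write("operand", load.result)        # move load result -> add unit
add.write("trigger", registers["r1"])    # move register -> add unit, triggers add
registers["r2"] = add.result             # move add result -> register

print(registers["r2"])  # 12
```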
- Very Long Instruction Word (VLIW)
- An instruction which includes more than one operation, intended
to be executed concurrently - either a fixed number of operations
per instruction, or a variable number (Variable Length Instruction
Grouping or Explicitly Parallel Instruction Computing (EPIC)).
- Virtual Machine
- A software emulation of a CPU, usually including an OS environment.