Abstract:
The core of every microprocessor and digital signal processor is its data path. The heart of data-path and addressing units in turn are arithmetic units which include adders. Parallel-prefix adders offer a highly efficient solution to the binary addition problem and are well suited for VLSI implementations. This blog involves the design and comparison of high-speed, parallel-prefix adders such as Kogge-Stone, Brent-Kung, Sklansky, and Kogge-Stone Ling adders. It is found that Kogge-Stone Ling adder performs much efficiently when compared to the other adders. Here, Kogge-Stone Ling adders and ripple adders are incorporated as a part of a lattice filter in order to prove their functionalities. It is seen that the operating frequency of lattice filter increases if parallel prefix Kogge-Stone Ling adder is used instead of ripple adders since the combinational delay of Kogge-Stone Ling adder is less. Further, design and comparison of different tree adder structures are performed using both CMOS logic and transmission gate logic. Using these adders, unsigned and signed comparators are designed as an application example and compared with their performance parameters such as area, delay, and power consumed.
3. Brent-Kung Adder:
The core of every microprocessor and digital signal processor is its data path. The heart of data-path and addressing units in turn are arithmetic units which include adders. Parallel-prefix adders offer a highly efficient solution to the binary addition problem and are well suited for VLSI implementations. This blog involves the design and comparison of high-speed, parallel-prefix adders such as Kogge-Stone, Brent-Kung, Sklansky, and Kogge-Stone Ling adders. It is found that Kogge-Stone Ling adder performs much efficiently when compared to the other adders. Here, Kogge-Stone Ling adders and ripple adders are incorporated as a part of a lattice filter in order to prove their functionalities. It is seen that the operating frequency of lattice filter increases if parallel prefix Kogge-Stone Ling adder is used instead of ripple adders since the combinational delay of Kogge-Stone Ling adder is less. Further, design and comparison of different tree adder structures are performed using both CMOS logic and transmission gate logic. Using these adders, unsigned and signed comparators are designed as an application example and compared with their performance parameters such as area, delay, and power consumed.
Key Words: Full
adder, half adder, Kogge-Stone, Brent-Kung, Sklansky, Kogge-Stone Ling adders.
Introduction:
An adder is a digital logic circuit in electronics that implements
addition of numbers. In many computers and other types of processors, adders
are used to calculate addresses, similar operations and table indices in the
ALU and also in other parts of the processors. These can be built for many
numerical representations like excess-3 or binary coded decimal. Basic adders
are classified into two types: half adder and full adder. The half adder
circuit has two inputs: A and B, which add two input digits and generate a
carry and sum. The full adder circuit has three inputs: A and C, which add the
three input numbers and generate a carry and sum.
1.
Half Adder:
A Half
Adder is defined as a basic four terminal digital device which adds two
binary input bits. It outputs the sum binary bit and a carry binary bit. Again, we know that decimal 2 can be represented in
two bits binary system as 10 (One Zero). Binary two is the smallest double
digits number in the binary system. When we add binary 1 with binary 1 we will
get both sum and carry since 10 is two digits binary number.
When we add 0 to 0, 0 to 1 and 1 to 0, we get the sum 0 and 1 respectively and
both of them are one digit binary number. Hence, in these three cases there
will be no carry during addition or carry is 0 here. We can summarize this in a
truth table for the half adder.
2. Full Adder:
This adder is
difficult to implement than a half-adder. The difference between a half-adder
and a full-adder is that the full-adder has three inputs and two outputs,
whereas half adder has only two inputs and two outputs. The first two inputs
are A and B and the third input is an input carry as C-IN. When a full-adder
logic is designed, you string eight of them together to create a byte-wide
adder and cascade the carry bit from one adder to the next.
The implementation of larger logic diagrams is possible with the above full adder logic a simpler symbol is mostly used to represent the operation. Given below is a simpler schematic representation of a one-bit full adder.
The implementation of larger logic diagrams is possible with the above full adder logic a simpler symbol is mostly used to represent the operation. Given below is a simpler schematic representation of a one-bit full adder.
Need for
fast adders:
In ripple carry adders, for each adder block, the two bits that are to be added are available instantly. However, each adder block waits for the carry to arrive from its previous block. So, it is not possible to generate the sum and carry of any block until the input carry is known. The block waits for the block to produce its carry.
In ripple carry adders, for each adder block, the two bits that are to be added are available instantly. However, each adder block waits for the carry to arrive from its previous block. So, it is not possible to generate the sum and carry of any block until the input carry is known. The block waits for the block to produce its carry.
Consider the above 4-bit ripple carry adder. The sum is produced
by the corresponding full adder as soon as the input signals are applied to it.
But the carry input is not available on its final steady state value until
carry is available at its steady state value. Similarly depends
on and on. Therefore, though the carry must propagate to all the
stages in order that output and carry settle their final steady-state
value. To overcome this problem we can use fast adders like, Kogge-Stone,
Brent-Kung, Sklansky, Kogge-Stone Ling adders.
1.
Kogge-Stone
Adder:
The Kogge Stone Adder is a
Parallel Prefix form of Carry Look-Ahead Adder. It generates the carry signal in
O (Log2 N) time. This adder is widely used in the industry and considered as
the fastest adder design. Carries are generated fast by computing them in
parallel, speed of operation is very high due to the low depth of node and
operation done in parallel and main important factor is the outcome of the
adder is depend upon the initial inputs. A three step process is generally
involved in the construction of a Kogge Stone Parallel Prefix Adder.
i.
Pre-Processing
Block: This is the initial stage of the Parallel Prefix Adder, it’s used to
generate the Propagated signal and generated signal and this signal are
computed for the given inputs by using following equations.
Pi
= Ai XOR Bi
Gi
= Ai AND Bi
ii.
Carry
Generation Block: Carry generated stage is a most important block in this
adder design. It consists of two components such as Block Cell and Grey Cell.
Block Cell is used to produce the Generated signal and Propagated signal,
needed to the calculation of the next stage. Grey Cell is used to produce only
Generated signal and this signal utilized or needed in the calculation of the
Sum in next stage. Black Cell: The black cell operator receives 2 set of
generate and propagate signals (Gi, Pi) and (Gj, Pj) compute one set of
generate and propagate signals (G, P).
G = Gi OR (Pi AND Pj)
P = Pi AND Pj
Grey Cell: The Grey operator receives two set of generate
and propagate signals (Gi, Pi) and (Gj, Pj) compute one set of generate signals
(G).
G = Gi OR (Pi AND Pj)
iii.
Post
Processing Block: This is the final stage of the adder; Sum and Carry are
the final outcome of the adder.
Si = Pi XOR Ci-1
The fundamental carry operator (FCO) block lies between the
PG blocks and PPC blocks. This block processes the propagator and generator
bits in pre-processing structure.
2.
Sklansky
Adder:
16-bit Sklansky
adder has minimum depth prefix graph. The longest lateral fanning wires go from
a node to 2 n other nodes.
The structure
possesses an optimal depth given by log(n) and the number of computational nodes is given
by n/2(log(n)). The fan-out of the
Sklansky’s adder increases drastically from the inputs to outputs along the
critical path, which accounts for large amount of latency. This degrades the
performance of the structure when the number of bits of the adder becomes
large.
3. Brent-Kung Adder:
The Brent–Kung
adder is a parallel prefix adder (PPA)
form of carry-lookahead adder (CLA). Proposed
by Richard Peirce Brent and Hsiang Te
Kung in 1982 it introduced higher regularity to the adder structure and has
less wiring congestion leading to better performance and less necessary chip
area to implement compared to the Kogge–Stone adder (KSA). It is also much
quicker than ripple-carry adders (RCA).
Ripple-carry
adders were the initial multi-bit adders which were developed in the early days
and got their name from the ripple effect which the carry made while being
propagated from right to left. The time taken for addition was directly
proportional to the length of the bit being added. This is reverse in
Brent–Kung adders where the carry is calculated in parallel thus reducing the
addition time drastically. Further work has been done on Brent–Kung adders and
other parallel adders to reduce the power consumption and chip area as well as
to increase the speed thus making them suitable for low-power designs.
A Brent–Kung
adder is a parallel adder made in a regular layout with an aim of minimizing
the chip area and ease of manufacturing. The addition of n-bit number can be
performed in time O(log(n)) with a chip size of
area O(n(log(n))) thus making it
a good-choice adder with constraints on area and maximizing the performance.
Its symmetry and regular build structure reduces costs of production
effectively and enable it to be used in pipeline architectures. In parallel
adders the critical path is decided by computation of the carry from least significant bit (LSB) adder to
the most significant bit (MSB) adder,
therefore efforts are in reducing the critical path for the carry to reach the
MSB.
4.
Ling Adders:
The family of
Ling adders is a particularly fast adder and is designed using H. Ling’s
equations and generally implemented in BiCMOS. It is an upgrade to the already
existing CarryLook-Ahead Adders and is mathematically faster, as it requires
lesser steps for the computation of a sum. The circuit of a Ling adder is
particularly more complex, and is less favourable for use in VLSI systems due
to its complexity and it requires far more extra components than traditional
systems. The circuit is divided into 4 parts, which can be denoted by H. Ling’s
equations.
A.
Initial
Generation of Bits
Ling Adders require to form the bit generate and bit
propagate that are used in the regular Carry look ahead adders. It is denoted
by the 3 symbols gi, pi and di. They generate and propagate bits follow CLA so
they can be denoted as,
gi = ai · bi
pi = ai + bi
However, Ling adder requires an extra half bit term
which later on simplifies the circuit design, while increasing the overall
efficiency of the adder. This half bit generate is denoted by di and can be
mathematically shown by,
di = ai ⊕ bi
The above mentioned Generate Bit gi and Propagate Bit
pi are used further to derive the Ling Generates, which are terms that will go
on to simplify the final equation. This is particularly important because these
generates will form the base of the Ling adder circuit design. These are
denoted by G∗ i and P ∗ i
B.
The CLA
Basis of Ling’s Equations
The CLA depends upon the carry out term of the previous for the new carry
terms.
ci+i = gi + pi · ci
Which is similar to the Simple Ripple Adder which uses the carry output
of the preceding data bits for forward addition. Similarly based on the above
concept, Ling created a new theoretical carry generate which he denoted by H.
This is used later on in the Adder to generate the sum Si. The term H is given
by,
Hi = ci + ci−1
Where
ci = Hi · pi
Introduction of ling carry Hi is one of the major reasons why Ling Adder
is a fast yet complex adder. Use of Ling carry equations decreases the number
of Boolean terms during its operations, but increases the design complexity.
C.
Ling
Generate and Propagate
Ling proposed the use of Ling Propagate and Ling Generate to simplify the
operations of the Ling adder. It is very important, as this is the first step
where we can see how the terms are generated by using the i th and (i−1)th
terms. These terms can be derived by
G ∗ i = gi + gi−1
And
P ∗ i = pi · pi−1
Ling generate and propagate terms are used to calculate the Ling carry
term H. Later on in the Adder design, the sum terms are directly influenced by
the all the Ling terms.
D.
Ling Sum
Term
The final sum term for the ith pair terms of a and b are
devised by following Ling sum equations, which take in lesser number of inputs,
and hence decrease the lag in the system. If we assume that all input gates
have only two inputs, we can see that calculation of CLA carry C requires 5
logic levels, whereas that for ling carry H requires only four. Although the
computation of carry is simplified, calculation of the sum bits using Ling
carries is much more complicated. The sum bit, when calculated by using
traditional carry, is given to be,
si = di ⊕ ci−1
We note that we require to use both the carry output and half-bit term
from the first operation block.
ci = Hi · pi
Using the above term in the Ling Sum Equation,
si = di ⊕ pi−1 · Hi−1
On break down,
Adder
|
Number
of bits
|
CMOS
Logic
|
Transmission
Gate Logic
|
|||||
Area (No.
of Transistors)
|
Power in
W
|
Delay in
sec
|
Area
(No. of Transistors)
|
Power in
W
|
Delay in
sec
|
|||
Kogge-Stone
|
8
|
486
|
4.13m
|
2.18E-11
|
432
|
1.87m
|
2.33E-10
|
|
16
|
1140
|
7.694m
|
2.87E-11
|
1056
|
5.27m
|
2.38E-10
|
||
32
|
2658
|
13.648m
|
3.01E-11
|
2345
|
10.31m
|
2.70E-10
|
||
Sklansky
|
8
|
415
|
17.88m
|
8.08E-10
|
323
|
8.92m
|
7.75E-10
|
|
16
|
1047
|
36.34m
|
1.12E-09
|
769
|
18.73m
|
1.10E-09
|
||
32
|
2199
|
65.13m
|
2.20E-09
|
1659
|
40.2m
|
2.12E-09
|
||
Brent-Kung
|
8
|
598
|
1.80E-07
|
7.08E-10
|
470
|
1.30E-07
|
7.03E-10
|
|
16
|
1268
|
4.00E-07
|
9.24E-10
|
1012
|
3.00E-07
|
9.03E-10
|
||
32
|
2494
|
1.25E-05
|
1.13E-09
|
1982
|
6.14E-07
|
1.04E-09
|
||
Ling
|
8
|
742
|
3.13E-07
|
2.31E-09
|
530
|
1.39E-07
|
1.93E-09
|
|
16
|
1655
|
6.00E-07
|
3.52E-09
|
1250
|
3.10E-07
|
2.85E-09
|
||
32
|
3382
|
13.3m
|
4.28E-09
|
2690
|
4.10E-07
|
3.74E-09
|
Conclusion:
From the above work, it was seen that the clock frequency for the IIR
filter using Ling adder was more than the clock frequency for the same IIR
filter using simple ripple adder. The combinational path delay for the Ling
adder was found to be 15% lesser than that for the ripple adder. Using
transmission gates reduced the area of the adder and hence the comparator built
using the adder, as compared to the area consumed when CMOS logic was used for
implementation. Using transmission gate logic reduced the delay and power
consumption of the adder, and hence the comparator using these adders, as
compared to the delay and power consumed when CMOS logic was used for
implementation. The power consumed by the comparator using Ling adder is lesser
than the power consumed by comparator designed using other normal tree adders.
References:
I.
Koren, Computer Arithmetic Algorithms, A. K. Peters, 2002.
B.
Parhami, Computer Arithmetic—Algorithms and Hardware Designs, Oxford
University Press, 2000.
M. Ergecovac
and T. Lang, Digital Arithmetic, Morgan-Kauffman, 2003.