Fast adder design for efficient digital circuits

Abstract:
The core of every microprocessor and digital signal processor is its data path. The heart of data-path and addressing units in turn are arithmetic units which include adders. Parallel-prefix adders offer a highly efficient solution to the binary addition problem and are well suited for VLSI implementations. This blog involves the design and comparison of high-speed, parallel-prefix adders such as Kogge-Stone, Brent-Kung, Sklansky, and Kogge-Stone Ling adders. It is found that Kogge-Stone Ling adder performs much efficiently when compared to the other adders. Here, Kogge-Stone Ling adders and ripple adders are incorporated as a part of a lattice filter in order to prove their functionalities. It is seen that the operating frequency of lattice filter increases if parallel prefix Kogge-Stone Ling adder is used instead of ripple adders since the combinational delay of Kogge-Stone Ling adder is less. Further, design and comparison of different tree adder structures are performed using both CMOS logic and transmission gate logic. Using these adders, unsigned and signed comparators are designed as an application example and compared with their performance parameters such as area, delay, and power consumed.

Key Words: Full adder, half adder, Kogge-Stone, Brent-Kung, Sklansky, Kogge-Stone Ling adders.

Introduction:

An adder is a digital logic circuit in electronics that implements addition of numbers. In many computers and other types of processors, adders are used to calculate addresses, similar operations and table indices in the ALU and also in other parts of the processors. These can be built for many numerical representations like excess-3 or binary coded decimal. Basic adders are classified into two types: half adder and full adder. The half adder circuit has two inputs: A and B, which add two input digits and generate a carry and sum. The full adder circuit has three inputs: A and C, which add the three input numbers and generate a carry and sum.

1. Half Adder:

A Half Adder is defined as a basic four terminal digital device which adds two binary input bits. It outputs the sum binary bit and a carry binary bit. Again, we know that decimal 2 can be represented in two bits binary system as 10 (One Zero). Binary two is the smallest double digits number in the binary system. When we add binary 1 with binary 1 we will get both sum and carry since 10 is two digits binary number. When we add 0 to 0, 0 to 1 and 1 to 0, we get the sum 0 and 1 respectively and both of them are one digit binary number. Hence, in these three cases there will be no carry during addition or carry is 0 here. We can summarize this in a truth table for the half adder.

2. Full Adder:

This adder is difficult to implement than a half-adder. The difference between a half-adder and a full-adder is that the full-adder has three inputs and two outputs, whereas half adder has only two inputs and two outputs. The first two inputs are A and B and the third input is an input carry as C-IN. When a full-adder logic is designed, you string eight of them together to create a byte-wide adder and cascade the carry bit from one adder to the next.

The implementation of larger logic diagrams is possible with the above full adder logic a simpler symbol is mostly used to represent the operation. Given below is a simpler schematic representation of a one-bit full adder.

Need for fast adders:
In ripple carry adders, for each adder block, the two bits that are to be added are available instantly. However, each adder block waits for the carry to arrive from its previous block. So, it is not possible to generate the sum and carry of any block until the input carry is known. The block waits for the block to produce its carry.

Consider the above 4-bit ripple carry adder. The sum is produced by the corresponding full adder as soon as the input signals are applied to it. But the carry input is not available on its final steady state value until carry is available at its steady state value. Similarly depends on and on. Therefore, though the carry must propagate to all the stages in order that output and carry settle their final steady-state value. To overcome this problem we can use fast adders like, Kogge-Stone, Brent-Kung, Sklansky, Kogge-Stone Ling adders.

1. Kogge-Stone Adder:

The Kogge Stone Adder is a Parallel Prefix form of Carry Look-Ahead Adder. It generates the carry signal in O (Log2 N) time. This adder is widely used in the industry and considered as the fastest adder design. Carries are generated fast by computing them in parallel, speed of operation is very high due to the low depth of node and operation done in parallel and main important factor is the outcome of the adder is depend upon the initial inputs. A three step process is generally involved in the construction of a Kogge Stone Parallel Prefix Adder.

i. Pre-Processing Block: This is the initial stage of the Parallel Prefix Adder, it’s used to generate the Propagated signal and generated signal and this signal are computed for the given inputs by using following equations.

Pi = Ai XOR Bi

Gi = Ai AND Bi

ii. Carry Generation Block: Carry generated stage is a most important block in this adder design. It consists of two components such as Block Cell and Grey Cell. Block Cell is used to produce the Generated signal and Propagated signal, needed to the calculation of the next stage. Grey Cell is used to produce only Generated signal and this signal utilized or needed in the calculation of the Sum in next stage. Black Cell: The black cell operator receives 2 set of generate and propagate signals (Gi, Pi) and (Gj, Pj) compute one set of generate and propagate signals (G, P).

G = Gi OR (Pi AND Pj)

P = Pi AND Pj

Grey Cell: The Grey operator receives two set of generate and propagate signals (Gi, Pi) and (Gj, Pj) compute one set of generate signals (G).

G = Gi OR (Pi AND Pj)

iii. Post Processing Block: This is the final stage of the adder; Sum and Carry are the final outcome of the adder.

Si = Pi XOR Ci-1

The fundamental carry operator (FCO) block lies between the PG blocks and PPC blocks. This block processes the propagator and generator bits in pre-processing structure.

2. Sklansky Adder:

16-bit Sklansky adder has minimum depth prefix graph. The longest lateral fanning wires go from a node to 2 n other nodes.

The structure possesses an optimal depth given by log(n) and the number of computational nodes is given by n/2(log(n)). The fan-out of the Sklansky’s adder increases drastically from the inputs to outputs along the critical path, which accounts for large amount of latency. This degrades the performance of the structure when the number of bits of the adder becomes large.

3. Brent-Kung Adder:

The Brent–Kung adder is a parallel prefix adder (PPA) form of carry-lookahead adder (CLA). Proposed by Richard Peirce Brent and Hsiang Te Kung in 1982 it introduced higher regularity to the adder structure and has less wiring congestion leading to better performance and less necessary chip area to implement compared to the Kogge–Stone adder (KSA). It is also much quicker than ripple-carry adders (RCA).

Ripple-carry adders were the initial multi-bit adders which were developed in the early days and got their name from the ripple effect which the carry made while being propagated from right to left. The time taken for addition was directly proportional to the length of the bit being added. This is reverse in Brent–Kung adders where the carry is calculated in parallel thus reducing the addition time drastically. Further work has been done on Brent–Kung adders and other parallel adders to reduce the power consumption and chip area as well as to increase the speed thus making them suitable for low-power designs.

A Brent–Kung adder is a parallel adder made in a regular layout with an aim of minimizing the chip area and ease of manufacturing. The addition of n-bit number can be performed in time O(log(n)) with a chip size of area O(n(log(n))) thus making it a good-choice adder with constraints on area and maximizing the performance. Its symmetry and regular build structure reduces costs of production effectively and enable it to be used in pipeline architectures. In parallel adders the critical path is decided by computation of the carry from least significant bit (LSB) adder to the most significant bit (MSB) adder, therefore efforts are in reducing the critical path for the carry to reach the MSB.

4. Ling Adders:

The family of Ling adders is a particularly fast adder and is designed using H. Ling’s equations and generally implemented in BiCMOS. It is an upgrade to the already existing CarryLook-Ahead Adders and is mathematically faster, as it requires lesser steps for the computation of a sum. The circuit of a Ling adder is particularly more complex, and is less favourable for use in VLSI systems due to its complexity and it requires far more extra components than traditional systems. The circuit is divided into 4 parts, which can be denoted by H. Ling’s equations.

A. Initial Generation of Bits

Ling Adders require to form the bit generate and bit propagate that are used in the regular Carry look ahead adders. It is denoted by the 3 symbols gi, pi and di. They generate and propagate bits follow CLA so they can be denoted as,

gi = ai · bi

pi = ai + bi

However, Ling adder requires an extra half bit term which later on simplifies the circuit design, while increasing the overall efficiency of the adder. This half bit generate is denoted by di and can be mathematically shown by,

di = ai ⊕ bi

The above mentioned Generate Bit gi and Propagate Bit pi are used further to derive the Ling Generates, which are terms that will go on to simplify the final equation. This is particularly important because these generates will form the base of the Ling adder circuit design. These are denoted by G∗ i and P ∗ i

B. The CLA Basis of Ling’s Equations

The CLA depends upon the carry out term of the previous for the new carry terms.

ci+i = gi + pi · ci

Which is similar to the Simple Ripple Adder which uses the carry output of the preceding data bits for forward addition. Similarly based on the above concept, Ling created a new theoretical carry generate which he denoted by H. This is used later on in the Adder to generate the sum Si. The term H is given by,

Hi = ci + ci−1

Where

ci = Hi · pi

Introduction of ling carry Hi is one of the major reasons why Ling Adder is a fast yet complex adder. Use of Ling carry equations decreases the number of Boolean terms during its operations, but increases the design complexity.

C. Ling Generate and Propagate

Ling proposed the use of Ling Propagate and Ling Generate to simplify the operations of the Ling adder. It is very important, as this is the first step where we can see how the terms are generated by using the i th and (i−1)th terms. These terms can be derived by

G ∗ i = gi + gi−1

And

P ∗ i = pi · pi−1

Ling generate and propagate terms are used to calculate the Ling carry term H. Later on in the Adder design, the sum terms are directly influenced by the all the Ling terms.

D. Ling Sum Term

The final sum term for the i_thpair terms of a and b are devised by following Ling sum equations, which take in lesser number of inputs, and hence decrease the lag in the system. If we assume that all input gates have only two inputs, we can see that calculation of CLA carry C requires 5 logic levels, whereas that for ling carry H requires only four. Although the computation of carry is simplified, calculation of the sum bits using Ling carries is much more complicated. The sum bit, when calculated by using traditional carry, is given to be,

si = di ⊕ ci−1

We note that we require to use both the carry output and half-bit term from the first operation block.

ci = Hi · pi

Using the above term in the Ling Sum Equation,

si = di ⊕ pi−1 · Hi−1

On break down,

si = H0 i−1 ⊕ di + Hi−1(di ⊕ pi−1)

Hence, the output value for ai + bi is given by si and ci.

Delay, power and area consumed for different adders: a comparison.

Adder	Number of bits	CMOS Logic			Transmission Gate Logic
		Area (No. of Transistors)	Power in W	Delay in sec	Area (No. of Transistors)	Power in W	Delay in sec
		Area (No. of Transistors)	Power in W	Delay in sec	Area (No. of Transistors)	Power in W	Delay in sec

Kogge-Stone	8	486	4.13m	2.18E-11	432	1.87m	2.33E-10
	16	1140	7.694m	2.87E-11	1056	5.27m	2.38E-10
	32	2658	13.648m	3.01E-11	2345	10.31m	2.70E-10

Sklansky	8	415	17.88m	8.08E-10	323	8.92m	7.75E-10
	16	1047	36.34m	1.12E-09	769	18.73m	1.10E-09
	32	2199	65.13m	2.20E-09	1659	40.2m	2.12E-09

Brent-Kung	8	598	1.80E-07	7.08E-10	470	1.30E-07	7.03E-10
	16	1268	4.00E-07	9.24E-10	1012	3.00E-07	9.03E-10
	32	2494	1.25E-05	1.13E-09	1982	6.14E-07	1.04E-09

Ling	8	742	3.13E-07	2.31E-09	530	1.39E-07	1.93E-09
	16	1655	6.00E-07	3.52E-09	1250	3.10E-07	2.85E-09
	32	3382	13.3m	4.28E-09	2690	4.10E-07	3.74E-09

Conclusion:

From the above work, it was seen that the clock frequency for the IIR filter using Ling adder was more than the clock frequency for the same IIR filter using simple ripple adder. The combinational path delay for the Ling adder was found to be 15% lesser than that for the ripple adder. Using transmission gates reduced the area of the adder and hence the comparator built using the adder, as compared to the area consumed when CMOS logic was used for implementation. Using transmission gate logic reduced the delay and power consumption of the adder, and hence the comparator using these adders, as compared to the delay and power consumed when CMOS logic was used for implementation. The power consumed by the comparator using Ling adder is lesser than the power consumed by comparator designed using other normal tree adders.

References:

I. Koren, Computer Arithmetic Algorithms, A. K. Peters, 2002.

B. Parhami, Computer Arithmetic—Algorithms and Hardware Designs, Oxford University Press, 2000.

M. Ergecovac and T. Lang, Digital Arithmetic, Morgan-Kauffman, 2003.

Fast adder design for efficient digital circuits

Wednesday, April 15, 2020

49 comments: