

# LOW POWER BI-QUATERNARY FUSED MULTIPLIER FOR DSP APPLICATIONS

**CHEMEDA JAGADESWARI,** M.TECH (VLSI DESIGN), DEPT OF ECE, HOLY MARY INSTITUTE OF TECHNOLOGY AND SCIENCE, BOGARAM(V), KEESARA(M), MEDCHAL DIST, TELANGANA, INDIA, 501301

**Dr. E.KRISHNA HARI,** ASSOCIATE PROFESSOR, DEPT OF ECE, HOLY MARY INSTITUTE OF TECHNOLOGY AND SCIENCE, BOGARAM(V), KEESARA(M), MEDCHAL DIST, TELANGANA, INDIA, 501301

**ABSTRACT:** Electronic circuits are composed of many complex arithmetic units, and multipliers are among the most basic of these. Multipliers are essential components in most digital signal processing applications, image processing architectures and microprocessors. Area and speed are the two major concerns in multiplier design. Three operations are inherent in multiplication: partial product generation, partial product reduction and final addition. A fast adder architecture therefore greatly enhances the speed of the overall process. A quaternary logic adder architecture is proposed that works on a hybrid of the binary and quaternary number systems. A given binary string is first divided into quaternary digits of 2 bits each, which are then added in parallel, reducing the carry propagation delay. The design does not require a radix conversion module, as the sum is generated directly in binary using the novel concept of an adjusting bit. The proposed hybrid multiplier design is compared with an existing multiplier based on multi-voltage or multi-valued logic [MVL], a Wallace multiplier that incorporates a QSD adder with a conversion module for quaternary-to-binary conversion, a Wallace multiplier that uses a carry select adder, and a commonly used fast multiplication mechanism, the Booth multiplier. All these designs have been developed in Verilog HDL and synthesized with HDL Design Compiler.

#### Keywords: QSD adder, MVL, Power, Area.

#### I. INTRODUCTION:

Multiplication is an important part of real-time digital signal processing (DSP) applications ranging from digital filtering to image processing. The large area consumption and long latency of a multiplier typically limit system performance; a good multiplier design is therefore essential. The increasing need for fast multiplication has led researchers to design higher-order compressors, which speed up computation by reducing the critical path delay of the processing unit. If two or more arithmetic operations are fused together, overall system performance can be improved to a great extent. To realize a novel fused add-multiply (FAM) unit, high-speed multipliers are indispensable. The multiplication operation entails three major steps:

- (i) Generation of Partial products,
- (ii) Reduction of partial products, and
- (iii) Computation of final product.
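The three steps listed above can be sketched in a few lines of Python. This is only an illustrative model for unsigned operands, not the hardware described in this paper; the function names (`partial_products`, `compress_3_2`, `reduce_rows`, `multiply`) and the default 8-bit width are assumptions made for the example.

```python
def partial_products(x: int, y: int, width: int) -> list[int]:
    """Step (i): one shifted copy of x for every set bit of the multiplier y."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]


def compress_3_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise 3:2 compression: three rows in, a sum row and a carry row out."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def reduce_rows(rows: list[int]) -> tuple[int, int]:
    """Step (ii): compress groups of three rows until only two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3      # rows that form complete groups
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(compress_3_2(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])           # leftover rows pass through
        rows = reduced
    rows += [0] * (2 - len(rows))             # pad if fewer than two rows
    return rows[0], rows[1]


def multiply(x: int, y: int, width: int = 8) -> int:
    """Step (iii): a single carry-propagating addition forms the product."""
    sum_row, carry_row = reduce_rows(partial_products(x, y, width))
    return sum_row + carry_row
```

Carries propagate only in the final addition of step (iii); every compression level in step (ii) works on all bit positions independently, which is why the reduction stage is the natural target for optimization.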

The partial product reduction stage requires the most attention: it is the most complicated stage and determines the overall speed of the multiplier. This stage is the main contributor to overall delay and power because of its long vertical path. Adders are generally used in the reduction process to shorten this path. When adders are used, however, the reduction takes more stages, which increases the area and delay of the multiplier unit. To avoid this limitation, compressors are implemented in the multiplier design.

The main aim of this paper is to optimize the partial product reduction stage of the multiplier incorporated in the FAM unit, since speeding up this second stage is the most prominent way to achieve high multiplier performance. Fig. 1 shows the conventional FAM design using 3:2 compressors in a Wallace tree. In the conventional FAM operator, the inputs A and B are first driven to an adder, and the input X is then multiplied by the sum Y = A + B. The generated partial products are added using a Wallace tree incorporating 3:2 compressors. The sum and carry outputs of the Wallace tree are given to a carry-look-ahead (CLA) adder to form the final product Z = X·Y. The drawback of using an adder is that it inserts a significant delay in the critical path of the add-multiply (AM) operator. Because carry signals must propagate inside the adder, the critical path depends on the bit-width of the inputs. A CLA adder can be used to decrease this delay, but it increases area occupation and power dissipation. We therefore propose a new technique: directly recoding the sum of two numbers in its Modified Booth (MB) form leads to a more efficient implementation of the FAM unit than the conventional one. For this direct shaping of the MB form of the sum of two numbers (Sum to MB, S-MB), the S-MB algorithm is employed. Three different recoding techniques (S-MB1, S-MB2 and S-MB3) are utilized.
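As a point of reference, the conventional add-then-recode flow for Z = X·(A + B) can be modeled as follows. This sketch first computes Y = A + B with an ordinary addition and then applies standard radix-4 modified Booth (MB) recoding to Y; it does not implement the S-MB recoders, which avoid the explicit addition. The names `mb_digits` and `fused_add_multiply`, and the assumption of an even operand width `n`, are illustrative choices.

```python
def mb_digits(y: int, n: int) -> list[int]:
    """Radix-4 modified Booth digits (each in -2..2) of an n-bit two's
    complement integer y, least significant digit first. n must be even."""
    assert n % 2 == 0 and -(1 << (n - 1)) <= y < (1 << (n - 1))
    digits = []
    prev = 0                                  # y[-1] is defined as 0
    for j in range(n // 2):
        low = (y >> (2 * j)) & 1              # y[2j]
        high = (y >> (2 * j + 1)) & 1         # y[2j+1]
        digits.append(-2 * high + low + prev)
        prev = high
    return digits


def fused_add_multiply(x: int, a: int, b: int, n: int = 8) -> int:
    """Conventional flow: add first, recode the sum, then accumulate the
    partial products d_j * x * 4**j. a and b are n-bit signed, n is even."""
    y = a + b                                 # the carry-propagating adder
    digits = mb_digits(y, n + 2)              # n + 2 bits always hold the sum
    return sum((d * x) << (2 * j) for j, d in enumerate(digits))
```

Each MB digit selects one of 0, ±X or ±2X, so an (n + 2)-bit sum produces only (n + 2)/2 partial products instead of n + 2.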

# **OBJECTIVE:**

As the scale of integration keeps growing, more and more sophisticated signal processing systems are being implemented on a single VLSI chip. These signal processing applications not only demand great computation capacity but also consume a considerable amount of energy. While performance and area remain the two major design goals, power consumption has become a critical concern in today's VLSI system design. The need for low-power VLSI systems arises from two main forces. First, with the steady growth of operating frequency and processing capacity per chip, large currents have to be delivered, and the heat due to large power consumption must be removed by proper cooling techniques. Second, battery life in portable electronic devices is limited, and low-power design directly prolongs the operating time of these devices.

Addition is a crucial arithmetic function that strongly affects the overall performance of digital systems, and adders are among the most widely used blocks in electronic applications. They appear in multipliers and in DSP units that execute algorithms such as FFT, FIR and IIR filtering; wherever multiplication is involved, adders come into the picture. Microprocessors perform millions of instructions per second, so speed of operation is the most important constraint to consider when designing multipliers. Portable devices also call for a high degree of miniaturization and low power consumption, since devices such as mobile phones and laptops need long battery life. A VLSI designer therefore has to optimize three parameters in a design: speed, area and power. These constraints are very difficult to satisfy together, so some compromise between them has to be made depending on the demand or application. Ripple carry adders have the most compact design but are the slowest. Carry look-ahead adders are the fastest but consume more area. Carry select adders act as a compromise between the two.

# II. EXISTING METHODOLOGY

The design of area- and power-efficient high-speed data path logic systems is one of the most substantial areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially, only after the previous bit position has been summed and a carry propagated into the next position. The carry select adder (CSLA) is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum. However, the CSLA is not area efficient, because it uses multiple pairs of ripple carry adders (RCAs) to generate partial sums and carries for carry inputs cin=0 and cin=1; the final sum and carry are then selected by multiplexers (mux). The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA with cin=1 in the regular CSLA to achieve lower area and power consumption. The main advantage of the BEC logic is that it needs fewer logic gates than the n-bit full adder (FA) structure.
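The BEC substitution can be illustrated with a small bit-level model. This is a sketch under the assumption that the BEC operates on the (n+1)-bit word formed by the cin=0 sum and its carry out; the helper names `ripple_add`, `bec` and `csla_group_bec` are hypothetical and not taken from the paper.

```python
def ripple_add(a: int, b: int, cin: int, n: int) -> tuple[int, int]:
    """n-bit ripple carry adder built from full adders: (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def bec(value: int, n: int) -> int:
    """n-bit Binary to Excess-1 Converter: (value + 1) mod 2**n.
    Output bit i is input bit i XOR (AND of all lower input bits);
    bit 0 is simply inverted."""
    out = 0
    lower_all_ones = 1
    for i in range(n):
        bit = (value >> i) & 1
        out |= (bit ^ lower_all_ones) << i
        lower_all_ones &= bit
    return out


def csla_group_bec(a: int, b: int, cin: int, n: int) -> tuple[int, int]:
    """One carry select group: a single RCA (cin = 0) plus a BEC."""
    sum0, carry0 = ripple_add(a, b, 0, n)
    word0 = (carry0 << n) | sum0              # (n+1)-bit result for cin = 0
    word1 = bec(word0, n + 1)                 # same result plus one: cin = 1
    word = word1 if cin else word0            # multiplexer on the real carry
    return word & ((1 << n) - 1), word >> n
```

The BEC needs only one XOR and one AND per bit, whereas the RCA it replaces needs a full adder per bit, which is the source of the area saving described above.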

The carry select adder belongs to the category of conditional sum adders. Sum and carry are calculated for an assumed input carry of both 1 and 0 before the actual input carry arrives. When the actual carry input arrives, the corresponding precomputed values of sum and carry are selected using a multiplexer. The conventional k-bit carry select adder consists of one k/2-bit adder for the lower half of the bits, i.e. the least significant bits (LSBs), and two k/2-bit adders for the upper half, i.e. the most significant bits (MSBs). Of the two MSB adders, one assumes a carry input of one and the other a carry input of zero. The carry out of the least significant stage is used to select the actual values of output carry and sum.
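A behavioral sketch of this conventional arrangement, assuming an even width `k` and unsigned operands, is given below. The function names are illustrative, and the model captures only the select logic, not the gate-level structure of the 16-b SQRT CSLA in the figure.

```python
def ripple_add(a: int, b: int, cin: int, n: int) -> tuple[int, int]:
    """n-bit ripple carry adder built from full adders: (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def carry_select_add(a: int, b: int, k: int) -> tuple[int, int]:
    """k-bit carry select adder (k even): (sum, carry_out)."""
    half = k // 2
    mask = (1 << half) - 1

    # Lower half: one adder with a carry input of zero.
    low_sum, low_carry = ripple_add(a & mask, b & mask, 0, half)

    # Upper half: two adders working in parallel on both carry assumptions.
    high_a, high_b = (a >> half) & mask, (b >> half) & mask
    sum_if_0, carry_if_0 = ripple_add(high_a, high_b, 0, half)
    sum_if_1, carry_if_1 = ripple_add(high_a, high_b, 1, half)

    # Multiplexer: the carry out of the lower half picks the valid result.
    if low_carry:
        high_sum, carry_out = sum_if_1, carry_if_1
    else:
        high_sum, carry_out = sum_if_0, carry_if_0
    return (high_sum << half) | low_sum, carry_out
```

For example, `carry_select_add(200, 100, 8)` returns `(44, 1)`, since 200 + 100 = 300 = 256 + 44.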



Fig.2.1. Regular 16-b SQRT CSLA.

### III. PROPOSED METHODOLOGY

Multiplication can be considered as a series of repeated additions. The number to be added is the multiplicand, the number of times that it is added is the multiplier, and the result is the product. Each step of addition generates a partial product. In most computers, the operands usually contain the same number of bits. When the operands are interpreted as integers, the product is generally twice the length of the operands in order to preserve the information content. The repeated addition method suggested by the arithmetic definition is so slow that it is almost always replaced by an algorithm that makes use of positional representation. It is possible to decompose multipliers into two parts: the first is dedicated to the generation of partial products, and the second collects and adds them. The basic multiplication principle is thus twofold: evaluation of the partial products and accumulation of the shifted partial products. The accumulation is performed by successive additions of the columns of the shifted partial product matrix.
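The difference between the two approaches can be made concrete with a short sketch for unsigned n-bit operands. The function names and the choice of n = 8 in the usage note are assumptions for illustration.

```python
def repeated_addition(multiplicand: int, multiplier: int) -> int:
    """Arithmetic definition: add the multiplicand 'multiplier' times."""
    product = 0
    for _ in range(multiplier):
        product += multiplicand
    return product


def shift_and_add(multiplicand: int, multiplier: int, n: int) -> int:
    """Positional method: at most n additions of shifted partial products."""
    product = 0
    for i in range(n):
        if (multiplier >> i) & 1:             # bit i of the multiplier
            product += multiplicand << i      # partial product weighted 2**i
    return product
```

With n = 8, repeated addition may need up to 255 additions, while the positional method never needs more than 8. The largest product, 255 × 255 = 65025, occupies 16 bits, which is why the product register is twice the operand length.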

The multiplier is successively shifted and gates the appropriate bit of the multiplicand. The delayed, gated instances of the multiplicand must all be in the same column of the shifted partial product matrix. They are then added to form the product bit for that particular column. Multiplication is therefore a multi-operand operation. To extend multiplication to both signed and unsigned numbers, a convenient number system is the two's complement representation. The multiplier and accumulator (MAC) unit is used for image processing and digital signal processing (DSP) in a DSP processor. The MAC algorithm uses Booth's radix-2 algorithm and the Modified Booth multiplier; a 17-bit SPST adder improves speed and reduces power.

When the performance of circuits is compared, it is always done in terms of circuit speed, size and power. A good estimate of a circuit's size is the total number of gates used. The actual chip size of a circuit also depends on how the gates are placed on the chip, that is, on the circuit's layout. Since we do not deal with layout in this report, the only thing we can say is that regular circuits are usually smaller than non-regular ones (for the same number of gates), because regularity allows a more compact layout. The physical delay of circuits originates from the small delays in single gates and from the wiring between them. The delay of a wire depends on its length. It is therefore difficult to model the wiring delay, because this requires knowledge of the circuit's layout on the chip. The gate delay, however, can easily be modeled by saying that the output is delayed by a constant amount of time from the latest input. What we can say about wiring delay is that larger circuits have longer wires and hence more wiring delay. It follows that a circuit with a regular layout usually has shorter wires, and hence less wiring delay, than a non-regular circuit.
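The constant gate delay model described above is easy to express in code. The sketch below assumes every gate has the same delay, ignores wiring entirely, and uses a hypothetical netlist format (a dictionary mapping each node to the nodes that drive it); the full adder netlist is only an example.

```python
def arrival_times(netlist: dict[str, list[str]], gate_delay: int = 1) -> dict[str, int]:
    """Arrival time of every node: a gate output settles one gate delay
    after its latest input. Primary inputs (no fan-in) arrive at time 0."""
    times: dict[str, int] = {}

    def visit(node: str) -> int:
        if node not in times:
            fan_in = netlist[node]
            if fan_in:
                times[node] = gate_delay + max(visit(src) for src in fan_in)
            else:
                times[node] = 0
        return times[node]

    for node in netlist:
        visit(node)
    return times


# Example netlist: a full adder built from two XOR, two AND and one OR gate.
FULL_ADDER = {
    "a": [], "b": [], "cin": [],
    "xor1": ["a", "b"],
    "sum": ["xor1", "cin"],
    "and1": ["a", "b"],
    "and2": ["xor1", "cin"],
    "cout": ["and1", "and2"],
}
```

Under this model the full adder's sum is ready after two gate delays and its carry out after three, so the critical path runs through `xor1`, `and2` and `cout`. A gate count for the size estimate follows from the same structure by counting the nodes with a non-empty fan-in list.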



Fig.3.1. Proposed model.

#### IV. SIMULATION EXPLANATION

Multiplier components utilize 46% of chip area in most MAC modules. Thus, an energy-efficient multiplier design can play a significant role in low-power VLSI system design. To control the significance of errors, the higher-order multiplication operations are performed accurately, while only the least significant part of the result is approximated in the proposed approach. Based on a quality evaluation of various approximate multipliers with different numbers of approximated bits, we propose to use approximate array multipliers with 9 bits of the 16-bit result approximated in the approximate MAC units, as shown in Figure. Approximating more than 9 bits yields further area, power and delay reduction, but the quality degradation is also quite significant. The area, power and delay of the different 8-bit approximate multiplier designs are evaluated at RTL. All designs have a reduced area and power consumption, where design area is represented as the number of slice LUTs and occupied slices. The design delay (maximum combinational path delay) consists of two components: logic delay and routing delay. All approximate designs have a reduced logic delay, but AMA1 to AMA3 have a slightly longer routing delay, so their total design delay is more than that of the exact design. As shown in Figure 4, the area reduction of our approximate multipliers varies from 10.3% to 60.7% with an average of 37.6%.



Fig.4.1. Schematic diagram.

Similarly, the power reduction of the approximate multipliers ranges from 63.6% to 71.7% with an average of 68.8%, with dynamic power varying from 129.6 mW to 166.6 mW compared with 457.7 mW for the exact design. Owing to the simplicity of their design, the multipliers based on AMA4 and AMA5 have shorter critical paths, with delay reductions of 32.7% and 48.1%, respectively. The energy reduction for the approximate multipliers varies between 60.5% and 85.3% with an average of 71.8%. Notably, the designs based on AMA4 and AMA5 consistently exhibit greater approximation benefits than the others across all design metrics.
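The quoted power reduction range follows directly from the dynamic power figures. The short check below uses only the three power values stated in the text; the function name `reduction_percent` is an illustrative choice.

```python
def reduction_percent(exact: float, approximate: float) -> float:
    """Relative saving of the approximate design with respect to the exact one."""
    return 100.0 * (exact - approximate) / exact


EXACT_DYNAMIC_POWER_MW = 457.7                # exact multiplier
APPROX_DYNAMIC_POWER_MW = (166.6, 129.6)      # highest and lowest approximate values

smallest_saving, largest_saving = (
    reduction_percent(EXACT_DYNAMIC_POWER_MW, p) for p in APPROX_DYNAMIC_POWER_MW
)
print(f"{smallest_saving:.1f}% to {largest_saving:.1f}%")  # 63.6% to 71.7%
```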


Fig.4.2. Simulation results.

| Parameter | Existing 8-bit MAC | Proposed hybrid fused MAC, 8-bit | Proposed hybrid fused MAC, 16-bit | Proposed hybrid fused MAC, 32-bit |
|-----------|--------------------|----------------------------------|-----------------------------------|-----------------------------------|
| Area      | 4.56%              | 3.4%                             | 7.56%                             | 13.92                             |
| Power     | 0.368 W            | 0.233 W                          | 0.789 W                           | 1.455 W                           |
| Delay     | 14.87 ns           | 10.34 ns                         | 19.42 ns                          | 27.34 ns                          |

The existing design was implemented on a low-power Artix-7 device with an 8-bit adder and an 8-bit multiplier; the proposed design extends this adder and multiplier implementation for real-time applications. The changes to the adder and the multiplier in the proposed design produce extensive differences when each block is compared individually, but for the multiplier as a whole only limited changes are observed. With the proposed design, the adder delay is reduced by about 4 ns, an improvement that is most apparent when compared with twice the existing 8-bit design. Similarly, for the 32-bit case the delay is reduced by about 12 ns, resulting in higher speed.
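For the two 8-bit designs, the relative savings can be worked out from the table entries. The sketch below copies the 8-bit area, power and delay values from the table; the power-delay product is added here only as an illustrative energy-like figure of merit and is not a metric reported in this paper.

```python
# 8-bit values copied from the results table: (existing MAC, proposed fused MAC)
AREA_PERCENT = (4.56, 3.4)
POWER_W = (0.368, 0.233)
DELAY_NS = (14.87, 10.34)


def saving_percent(existing: float, proposed: float) -> float:
    """Relative improvement of the proposed design over the existing one."""
    return 100.0 * (existing - proposed) / existing


area_saving = saving_percent(*AREA_PERCENT)
power_saving = saving_percent(*POWER_W)
delay_saving = saving_percent(*DELAY_NS)
delay_saved_ns = DELAY_NS[0] - DELAY_NS[1]

# Power-delay product in nanojoules (W x ns), an assumed figure of merit.
pdp_existing = POWER_W[0] * DELAY_NS[0]
pdp_proposed = POWER_W[1] * DELAY_NS[1]

print(f"area {area_saving:.1f}%, power {power_saving:.1f}%, delay {delay_saving:.1f}%")
```

On these numbers the 8-bit proposed design saves roughly a quarter of the area, over a third of the power and about 4.5 ns of delay relative to the existing 8-bit MAC.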

# V. CONCLUSION

In this paper we propose a new approximate multiplier design for use in the implementation of a MAC unit, where it can enhance the performance of the MAC. The accuracy-controllable approximate multiplier designed in this paper requires much less area and has a reduced path delay in comparison with the conventional design. Its dynamic accuracy controllability is realized by means of the proposed CMA. The proposed multiplier is evaluated at both the circuit level and the application level. The experimental results show that the proposed multiplier delivers considerable power savings and speed while maintaining a considerably smaller circuit area than the conventional Wallace tree multiplier, so that the proposed MAC achieves greater improvements in both power consumption and path delay than other previously studied approximate MAC units.
