

# Design and Implementation of Dual Channel Multiplier-Controlled Frequency Synthesizer

Kolli Sharat, Department of Electronics and Communication Engineering, SRMIST, Kattankulathur, Kanchipuram, Tamilnadu, India

**Chitra E**, Department of Electronics and Communication Engineering, SRMIST, Kattankulathur, Kanchipuram, Tamilnadu, India, chitrae@srmist.edu.in

**Sarada V**, Department of Electronics and Communication Engineering, SRMIST, Kattankulathur, Kanchipuram, Tamilnadu, India, saradav@srmist.edu.in

**Abstract**- A low-power, energy-efficient dual channel multiplier (DCM) is evaluated here with an FPGA hardware implementation, with the aim of reducing the complexity of the multiplier units used for floating-point operations in most DSPs. The novel DCM architecture is designed in RTL code using a distributed architecture, and an application unit is designed to validate the flexibility of the circuit: an adaptive frequency synthesizer built around the implemented dual channel multiplier. The distributed operation of the DCM reduces the number of computation steps and logic elements. The proposed XILINX ISE based frequency synthesizer design uses low-power clock-gating techniques to reduce power consumption. A XILINX SPARTAN XC3S250E device is used for implementation. The proposed architecture is also compared with a Vedic multiplier-based synthesizer architecture in terms of power and area utilization.

Keywords: Dual channel multiplier, synthesizer design, distributed arithmetic, low power design.

#### I. INTRODUCTION

The five most common multiplier implementations are compared with the proposed DCM methodology for several operand widths. Table I and Table II show the 8-bit and 16-bit results, respectively. Columns 2-4 give the power, area, and delay of each circuit. Multiplying the power and delay values gives the power-delay product (PDP) reported in the last column. Note that the DCM takes only N clock cycles, whereas the serial multiplier takes 2N clock cycles to produce its result. Consequently, as reported in Table I, the delay of an 8-bit serial multiplier equals 16 clock periods, while the delay of the 8-bit DCM equals 8 clock periods.
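The cycle counts above translate into a delay and a power-delay product (power multiplied by delay) with a few lines of arithmetic. The following is a minimal sketch rather than anything taken from the paper: the function names are hypothetical, and the clock period and power passed in are whatever the reader chooses to assume.

```python
def dcm_cycles(n_bits: int) -> int:
    """Clock cycles for an N-bit DCM multiplication (N, as stated in the text)."""
    return n_bits


def serial_cycles(n_bits: int) -> int:
    """Clock cycles for an N-bit serial multiplication (2N, as stated in the text)."""
    return 2 * n_bits


def delay_ns(cycles: int, clock_period_ns: float) -> float:
    """Total delay in nanoseconds for a given number of clock cycles."""
    return cycles * clock_period_ns


def pdp_pj(power_mw: float, delay_in_ns: float) -> float:
    """Power-delay product in picojoules (mW multiplied by ns gives pJ)."""
    return power_mw * delay_in_ns
```

For example, with an assumed 25 ns clock period (a 40 MHz clock), `delay_ns(dcm_cycles(8), 25.0)` gives 200 ns and `delay_ns(serial_cycles(8), 25.0)` gives 400 ns.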

Compared with the serial multiplier, the DCM reduces PDP by 38%. Tables I and II also show that doubling the input size N from 8 bits to 16 bits dramatically raises the cost (power, area, delay) of the parallel multiplier. The DCM is characterized by low power consumption and a small area: compared with the fully parallel multiplier, it saves up to 37% in area and 93% in power. On the other hand, the parallel multiplier completes a multiplication in 1 clock cycle, while the 8-bit DCM requires 8 clock cycles. Even so, the proposed DCM improves PDP by at least 80%. This type of multiplier is well suited to low-energy applications such as low-power handheld devices. The DCM therefore provides a simple and energy-efficient design structure, making it an excellent choice for piecewise-polynomial (PWP) function evaluation.

A radix-8 Booth multiplier multiplies by $0, \pm 1, \pm 2, \pm 3$, or $\pm 4$ in each step instead of by 0 or 1. Extra hardware is therefore required to generate the $\pm 3$ multiple, which costs considerable energy and area, even though the reduced number of partial products makes the multiplication faster. The radix-4 and radix-8 Booth multipliers are both implemented with a Wallace tree adder to further reduce the partial products and compute the final result. Serial multipliers, by comparison, have a different structure and therefore different performance: they achieve substantial energy and area savings at the cost of the longest delay. The proposed DCM offers a better trade-off among area, power, and delay, achieving at least a 70 percent reduction in power relative to several types of conventional multipliers. The proposed DCM is used for significant power reduction in a second-order PWP architecture. The results of the hardware implementation of a second-order PWP using the DCM are shown in Section IV-B, compared with the traditional approach.
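To make the radix-8 Booth digit set concrete, the sketch below recodes a signed multiplier into digits drawn from 0, ±1, ±2, ±3, ±4. It is a textbook-style illustration, not the implementation evaluated in the paper; the function name `booth_radix8_digits` and its interface are assumptions made for this example.

```python
def booth_radix8_digits(y: int, n_bits: int) -> list:
    """Recode a signed n_bits-wide integer into radix-8 Booth digits.

    Digits are returned least significant first; each lies in -4..4 and
    sum(d * 8**i) reproduces y.
    """
    if not -(1 << (n_bits - 1)) <= y < (1 << (n_bits - 1)):
        raise ValueError("y does not fit in n_bits as a signed value")

    groups = (n_bits + 2) // 3  # number of 3-bit groups, rounded up
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for g in range(groups):
        # Python's >> on negative ints is arithmetic, so y is sign-extended.
        b0 = (y >> (3 * g)) & 1
        b1 = (y >> (3 * g + 1)) & 1
        b2 = (y >> (3 * g + 2)) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + prev)
        prev = b2
    return digits
```

Every digit selects one multiple of the multiplicand. The multiples 0, ±1, ±2 and ±4 need only shifts and negation, whereas ±3 needs a real addition, which is the extra hardware referred to above.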

The system design is implemented in ModelSim 6.3, a simple and easy-to-use interactive simulation tool from Mentor Graphics. It includes a built-in C debugger and supports simulation of VHDL, Verilog, SystemC and other HDLs. ModelSim can be used on its own or in combination with Intel Quartus Prime, Xilinx ISE or Xilinx Vivado [9].

#### II. LITERATURE SURVEY

[1] In 2009, Davide De Caro and Antonio G. M. Strollo presented a high-speed special function unit (SFU). The unit supports the single-precision IEEE-754 floating-point standard and implements faithfully rounded reciprocal, square root, reciprocal square root, logarithm, and exponential functions. The functions are approximated by means of a novel constrained piecewise quadratic interpolation technique. With this method, the lookup table size is reduced by 40% with respect to previously proposed methods, with no loss in accuracy. Error analysis and the sizing methodology are presented in the paper. The SFU has been implemented in a 0.18-µm CMOS technology. The circuit can operate at clock frequencies of up to 420 MHz, with a power dissipation of 160 mW at 420 MHz. The unit can be employed in programmable graphics accelerators and in other applications where high-performance function evaluation is required [10].

[2] In 2008, Shen-Fu Hsiao, Hou-Jen Ko, and Chia-Sheng Wen presented a new function evaluation algorithm based on a two-level approximation scheme. In the first level, a piecewise degree-one polynomial is used for the initial approximation, yielding so-called normalized difference functions that are similar in shape. A shared normalized difference function is then computed to reach the target precision in the second level of refined approximation. The authors also carry out error analysis and bit-width optimization with two different design goals: area optimization and ROM optimization. Experimental results show that the proposed ROM-optimized design, when used in a multifunction evaluator that computes several elementary arithmetic functions on the same hardware, gives a significant area saving compared with previous approaches.

[3] In 2016, Chih-Wei Liu, Shih-Hao Ou, Kuo-Chiang Chang, Tzung Ching Lin, and Shin-Kai Chen presented a low-error and cost-efficient design procedure, based on an error-flattened, non-uniform region linear approximation algorithm, for determining an optimized shift-and-add logarithmic unit (LU) that uses the smallest hardware needed to meet the desired error requirement of embedded graphics systems. The brief first establishes two objectives of the error-flattened algorithm. From these, for a given error constraint, the smallest number of approximation regions $n$, the coefficients of the $i$-th interpolation $(a_i, b_i)$, and the region end points $x_i$ are obtained. The spacing of $x_i$ is non-uniform, exploiting the nonlinear behaviour of the logarithmic function to make the error uniform across all regions. Then, by estimating the cost of the shift-and-add implementation, a low-cost candidate for the shift-and-add LU is identified through a cost search procedure that progressively increases $n$. A large number of regions gives a more accurate approximation algorithm, which can tolerate more implementation error from simple hardware, although the cost of that hardware may be worse than that of a less accurate candidate. The proposed circuits were synthesized in UMC 65RVT CMOS technology. Compared with state-of-the-art shift-and-add logarithmic converters, simulation results show that the proposed design saves approximately 12.7 to 51.1 percent of area and gains approximately 1 to 14 dB in SNR while meeting the error constraint.

[4] In 2015, Yu-Jung Chaen, Chao-Hsien Hsu, Chung-Yao Hung, Chia-Ming Chang, Shan-Yi Chuang, and Liang-Gee observed that intensive pixel shading dominates the power dissipation of the graphics pipeline as screen resolution increases. They propose a 130.3 mW mobile 16-core GPU with three pixel approximation techniques and a corresponding tile-based rendering technique. The proposed architecture can trade off power consumption against visual quality to provide power-aware capability, and is fabricated in TSMC 45 nm technology. The chip implementation verifies the feasibility and applicability of these techniques. The results show that, with acceptable visual quality, 52.32% of the shader processor's power consumption can be saved by combining an approximated-precision shader architecture with an approximated screen-space lighting technique. In addition, the approximated texturing technique saves 24.57 percent of L1 cache updates.

[5] In 2014, Shen-Fu Hsiao, Po-Han Wu, Chia-Sheng Wen, and Pramod Kumar Meher noted that table-lookup-and-addition methods provide multiplierless function evaluation using multiple lookup tables and a multi-operand adder. Despite their high-speed operation, such methods are practical only in low-precision applications because the table size grows rapidly with the precision width. The brief presents two methods for reducing the table size by decomposing the original table of initial values into several tables with fewer entries and/or reduced bit width. The proposed table decompositions do not introduce any additional rounding error, so the original table can be fully recovered. Experimental results show a significant saving in table size compared with the best of the previous multipartite designs.

[6] In 2006, H. Kim, B.-G. Nam, J.-H. Sohn, J.-H. Woo, and H.-J. Yoo proposed a 32-bit fixed-point logarithmic arithmetic unit for possible application to 3-D mobile graphics systems. The proposed logarithmic arithmetic unit completes reciprocal, square-root, reciprocal-square-root and square operations in two clock cycles, and powering in four clock cycles. It can also program its number range for precise dimensional flexibility in the 3-D graphics pipeline, and it uses an eight-region piecewise linear approximation model for logarithmic conversion to keep the operation error under 0.2 percent. The test chip is implemented in a 1-poly 6-metal 0.18-µm CMOS technology with 9-k gates. It operates at a maximum frequency of 231 MHz and consumes 2.18 mW at a 1.8 V supply.

[7] This paper proposes a dual channel multiplier (DCM) for energy-efficient second-order piecewise-polynomial function evaluation in 3-D graphics applications. The performance of the evaluation technique depends strongly on the multiplication scheme and the squaring structure. A novel hardware implementation is presented for polynomial evaluation. The proposed solution replaces the large multipliers with the DCM, which reduces the complexity of the hardware. The DCM scheme performs complex functions in a power-efficient and area-efficient manner. The multiplier reduces the computational effort of the hardware with either uniform or non-uniform segmentation in the piecewise-polynomial approximations. In addition, a multiplier-adder converter and a dedicated radix-4 unit are proposed for a wide input range. Compared with previous approaches, these units achieve the lowest power consumption for large word sizes. Comparison with a general-purpose multiplier showed a reduction in power and delay of up to 36 percent and 50 percent, respectively. The proposed approach demonstrates a power saving of up to 93 percent compared with existing conventional designs.

## **III. DESIGN METHODOLOGY**

## A. Design of DCM

Multiplication is the most complex and area-consuming operation in any digital circuit, since the number of accumulated bits grows at every step, as shown in Figure 1. The dual channel multiplier splits the input data in a distributed manner and operates on the parts separately, which reduces complexity and simplifies carry handling [11].





Figure 2. Logic diagram of Dual Channel Multiplier

The logic diagram of the dual channel multiplier is taken from the literature [8] as a reference. Figure 2 shows a simplified version of the DCM architecture, in which x is the input supplied serially to both channels within the given clock cycles, and y is the other DCM input, supplied in parallel. The partial products are generated as follows [12]:

$$PP_0 = \{\, y_0 x_0,\; y_1 x_0,\; y_0 x_1 \,\} \qquad (1)$$

The partial product $y_0 x_1$ is added to the partial product $y_1 x_0$ and the sum is passed to the output, while the partial product $y_0 x_0$ is propagated directly to the output. The least significant bits of the product ($P_0$ and $P_1$) are generated at once by the following equations.

$$P_0 = y_0 x_0 \qquad (2)$$

$$P_1 = y_0 x_1 + y_1 x_0 \qquad (3)$$
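The two low-order product terms in (2) and (3) can be checked against an ordinary multiplication. The sketch below is a generic serial-parallel (shift-and-add) model in which x is consumed one bit per clock cycle and y is applied in parallel; it is an assumption-laden illustration and does not model the two-channel structure of the DCM itself. The helper names `bit`, `low_columns` and `serial_parallel_multiply` are hypothetical.

```python
def bit(value: int, k: int) -> int:
    """Return bit k of a non-negative integer."""
    return (value >> k) & 1


def low_columns(x: int, y: int) -> tuple:
    """Column sums P0 and P1 from equations (2) and (3), before carry handling."""
    p0 = bit(y, 0) & bit(x, 0)
    p1 = (bit(y, 0) & bit(x, 1)) + (bit(y, 1) & bit(x, 0))
    return p0, p1


def serial_parallel_multiply(x: int, y: int, n_bits: int) -> int:
    """Multiply unsigned n_bits-wide operands, consuming one bit of x per cycle."""
    acc = 0
    for cycle in range(n_bits):   # N clock cycles for an N-bit operand
        if bit(x, cycle):         # serial input bit for this cycle
            acc += y << cycle     # y is available in parallel
    return acc
```

Note that $P_1$ in (3) is a column sum and can equal 2, in which case a carry moves into the next bit position; the two lowest bits of the product are `(p0 + 2 * p1) % 4`.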

Each channel of the DCM is driven by a reconfigurable clock, which in turn sets the operating frequency of the DCM. The proposed system also implements an application circuit using the DCM: a reconfigurable frequency synthesizer with a basic operating clock of 40 MHz is evaluated [13].

#### B. Algorithm flow



Figure 3. Dataflow of Algorithm

#### C. Execution of Algorithm

The proposed DCM based synthesizer design is implemented by following the steps of the dataflow described above [14]. Here the role of the DCM is to multiply the reference signal by the feedback error signal produced by the digital PLL after the frequency word in the DCO has been tuned. When the reference signal or the unknown input signal is available, the loop begins to execute, and the algorithm follows a finite state machine model. The global (master) clock controls the entire architecture, and the global reset clears the digital elements to their initial condition. The overall operations are repeated until a stable clock is found at the DCO. Figure 3 shows the dataflow of the algorithm.
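The finite state machine behaviour described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions and not the authors' RTL: the state names, the 16-bit frequency word, the halving update rule and the DCO model `f_out = f_clk * fcw / 2**FCW_BITS` are all hypothetical, and the multiplier stage is not modelled. Only the 40 MHz global clock comes from the text.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # after global reset, waiting for a reference
    COMPARE = auto()   # compare DCO word with the reference
    TUNE = auto()      # adjust the frequency control word
    LOCKED = auto()    # stable clock found at the DCO


F_CLK_HZ = 40_000_000  # global clock from the text
FCW_BITS = 16          # assumed width of the frequency control word


def dco_frequency(fcw: int) -> float:
    """Assumed DCO model: output frequency proportional to the control word."""
    return F_CLK_HZ * fcw / (1 << FCW_BITS)


def target_word(f_ref_hz: float) -> int:
    """Control word whose DCO frequency is closest to the reference."""
    return round(f_ref_hz * (1 << FCW_BITS) / F_CLK_HZ)


def run_loop(f_ref_hz, max_cycles: int = 1000):
    """Step the state machine until lock or until max_cycles clock ticks."""
    state, fcw, error = State.IDLE, 0, 0
    for _ in range(max_cycles):
        if state is State.IDLE:
            fcw = 0  # global reset clears the control word
            if f_ref_hz is not None:
                state = State.COMPARE
        elif state is State.COMPARE:
            error = target_word(f_ref_hz) - fcw
            state = State.LOCKED if error == 0 else State.TUNE
        elif state is State.TUNE:
            # Move roughly halfway towards the target, by at least one step.
            fcw += error // 2 if abs(error) > 1 else error
            state = State.COMPARE
        else:
            break  # LOCKED: nothing more to do
    return state, fcw
```

With a 1 MHz reference, `run_loop(1_000_000)` settles in the `LOCKED` state; with no reference (`None`) the model stays in `IDLE`, mirroring the condition that the loop starts only when an input signal is available.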

#### D. Vedic Multiplier

A simple digital multiplier architecture (referred to hereafter as the Vedic multiplier), based on the Urdhva Tiryakbhyam technique (vertically and crosswise multiplication), is also implemented. This technique was traditionally used in ancient India to multiply two decimal numbers in relatively little time; here we exploit the fact that it needs fewer implementation steps. For the comparative analysis, the Vedic multiplier is first implemented in RTL code, and the implemented module is then inserted into the frequency synthesizer to reduce the computation time of the signal multiplication in the execution flow discussed above.
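The vertically-and-crosswise idea can be shown in software on decimal digits. The following is a minimal sketch of the general technique, not the RTL module used in this work; the function names `to_digits` and `urdhva_tiryakbhyam` are chosen for illustration only.

```python
def to_digits(n: int, base: int) -> list:
    """Digits of a non-negative integer, least significant first."""
    if n == 0:
        return [0]
    out = []
    while n:
        out.append(n % base)
        n //= base
    return out


def urdhva_tiryakbhyam(a: int, b: int, base: int = 10) -> int:
    """Multiply two non-negative integers column by column."""
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    da, db = to_digits(a, base), to_digits(b, base)

    # Crosswise step: column k collects every digit product with i + j == k.
    columns = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            columns[i + j] += x * y

    # Carry step: resolve each column from right to left.
    result, weight, carry = 0, 1, 0
    for col in columns:
        total = col + carry
        result += (total % base) * weight
        carry = total // base
        weight *= base
    return result + carry * weight
```

For 23 × 14 the columns are 3·4 = 12, 2·4 + 3·1 = 11 and 2·1 = 2, and resolving the carries gives 322. All column products are independent of one another, which is why the method maps naturally onto parallel hardware.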

## **IV. RESULTS & DISCUSSIONS**

## A. DCM based Synthesizer



Figure 4. Simulation Result of DCM Synthesizer

Figure 4 shows the simulation result of the DCM based synthesizer implemented in QUARTUS II software. The result shows reconfigurable synthesizer clock outputs with stable, evenly divided duty cycles. The signals clk and clr are global inputs to the architecture, and the variable "sel" selects the frequency range of the input signal under test. A synthesizer is normally a PLL circuit that runs as a closed-loop system and can be controlled by digital input words.

Here the DCM combines the input signals to produce the control input of the DPLL; the operating range of the DPLL is determined by the output generated by the DCM. The signals data1 and data2 are the DCM inputs. The DCM output is accumulated and assigned to minclka, which acts as the master control for the synthesizer. The difference between an ordinary PLL and the synthesizer is that the digital synthesizer produces a repeating oscillating output signal that can be reconfigured: the output obtained when sel = "00000001" differs from the output obtained when sel = "00000010".

## B. Vedic Multiplier based Synthesizer



Figure 5. Simulation result of Vedic Multiplier Synthesizer

Figure 5 shows the simulation result of the Vedic multiplier applied as a signal multiplier with a 40 MHz global operating clock and an active-high reset. The Vedic multiplier is a divided, comparatively slow multiplication technique; when integrated with the synthesizer it introduces more clock propagation delay, which raises the power by a few mW, as displayed in Figure 6. The comparative power and utilization summary of the Vedic multiplier based synthesizer is also displayed in Figure 6.

## C. Power Comparison results

#### VEDIC MULTIPLIER BASED SYNTHESIZER

| On-Chip | Power (W) | Used | Available | Utilization (%) |
|---------|-----------|------|-----------|-----------------|
| Clocks  | 0.001     | 1    | -         | -               |
| Logic   | 0.000     | 46   | 4896      | 1               |
| Signals | 0.000     | 60   | -         | -               |
| IOs     | 0.003     | 12   | 158       | 8               |
| Leakage | 0.052     |      |           |                 |
| Total   | 0.057     |      |           |                 |

#### DCM BASED SYNTHESIZER

| On-Chip | Power (W) | Used | Available | Utilization (%) |
|---------|-----------|------|-----------|-----------------|
| Clocks  | 0.001     | 1    | -         | -               |
| Logic   | 0.000     | 46   | 4896      | 1               |
| Signals | 0.000     | 60   | -         | -               |
| IOs     | 0.000     | 12   | 158       | 8               |
| Leakage | 0.052     |      |           |                 |
| Total   | 0.053     |      |           |                 |

Figure 6. Simulation result

Figure 6 shows that the clock distribution is common and equal across all digital blocks, and that the IO configuration of the Vedic multiplier consumes more power than that of the DCM based synthesizer. XILINX ISE 12.5 is used to obtain the power utilization and area utilization summary. The proposed DCM based synthesizer can further be extended to N channels to perform signal multiplication in N channels.

| Parameters      | Logic delay (ns) | Path delay (ns) | Power (W) | Maximum frequency (MHz) |
|-----------------|------------------|-----------------|-----------|-------------------------|
| VM-Synthesizer  | 3.786            | 1.689           | 0.057     | 182.352                 |
| DCM-Synthesizer | 1.394            | 0.383           | 0.053     | 562.762                 |

Table 1. Comparison of DCM-Synthesizer and VM-Synthesizer

Table 1 compares the DCM based synthesizer (DCMS) design with the Vedic multiplier based synthesizer (VMS) design. The logic delay is 3.786 ns for the VMS and 1.394 ns for the DCMS, whereas the path delay is 1.689 ns and 0.383 ns for the VMS and DCMS, respectively. Power utilization is shown in Figure 6. The maximum operating frequency is 182.352 MHz for the VMS and 562.762 MHz for the DCMS [15].
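The relative improvements implied by Table 1 can be computed directly from the tabulated values. The sketch below uses only the numbers reported in Table 1; the dictionary layout and the function names are illustrative choices.

```python
# Values copied from Table 1 (VM = Vedic multiplier, DCM = dual channel multiplier).
TABLE_1 = {
    "VM":  {"logic_delay_ns": 3.786, "path_delay_ns": 1.689,
            "power_w": 0.057, "fmax_mhz": 182.352},
    "DCM": {"logic_delay_ns": 1.394, "path_delay_ns": 0.383,
            "power_w": 0.053, "fmax_mhz": 562.762},
}


def reduction_percent(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


def summarize(table: dict) -> dict:
    """Relative gains of the DCM synthesizer over the VM synthesizer."""
    vm, dcm = table["VM"], table["DCM"]
    return {
        "logic_delay_reduction_pct": reduction_percent(
            vm["logic_delay_ns"], dcm["logic_delay_ns"]),
        "path_delay_reduction_pct": reduction_percent(
            vm["path_delay_ns"], dcm["path_delay_ns"]),
        "power_reduction_pct": reduction_percent(
            vm["power_w"], dcm["power_w"]),
        "fmax_ratio": dcm["fmax_mhz"] / vm["fmax_mhz"],
    }


if __name__ == "__main__":
    for name, value in summarize(TABLE_1).items():
        print(f"{name}: {value:.2f}")
```

With the Table 1 values this gives roughly a 63% lower logic delay, a 77% lower path delay, a 7% lower power and a maximum frequency about 3.1 times higher for the DCM based synthesizer.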

## V. CONCLUSION

The complexity of DSP applications and their heavy processing load affect the resilience of processing elements in image processing and floating-point applications, which motivates the development of low-power, high-speed, area-reduced multipliers. The design and implementation of a dual channel multiplier, built entirely from low-power modules with reconfigurable logic structure, is evaluated here. The proposed framework achieves reduced area and works well at low power consumption. The dual channel multiplier with an integrated high-speed digital synthesizer is simulated and the results are tested. It is evident from Table 1 that the system reaches a higher operating frequency of 562.762 MHz and a reduced path delay of 0.383 ns.

The implementation can be extended further by improving the DCM through a pipelined approach, and it can be applied to high-speed medical, image processing and signal processing applications.

## References

- 1. Davide De Caro and Antonio G. M. Strollo, IEEE 754 multiplier design, 2009.
- 2. Shen-Fu Hsiao and Hou-Jen Ko, "Hardware function evaluation on correction of normalized difference function," IEEE Trans., 2012. Shih-Hao Ou, Kuo-Chiang Chang, Tzung-Ching Lin, and Shin-Kai Chen, "Design procedure for evaluating logarithms," IEEE Transactions, 2016.
- 3. Yu-Jung Chaen et al., GPU with power-aware pixel approximation, IEEE Journal, Solid State evaluation, 2015.
- 4. Shen-Fu Hsiaao, Chiaa-Shingg, and Poo-Han, Piecewise-polynomial in compressed form, IEEE Euro-Micro Conference, 2014. Logarithmic arithmetic unit 3D graphics, IEEE Journal, Publisher on Solid State Dynamics.
- 5. H. Kim, B.-G. Nam, J.-H. Sohn, J.-H. Woo, and H.-J. Yoo. S.-F. Hsiao, H.-J. Ko, and C.-S. Wen, "Two-level hardware function evaluation based on correction of normalized piecewise difference functions," IEEE Trans., May 2012.
- 6. Dina M. Ellaithy, Magdy A. El-Moursy, Amal Zaki, and Abdelhalim Zekry, DCM for Piecewise-Polynomial, IEEE Trans., 2019.
- 7. Y.-J. Chen et al., "A 130.3 mW 16-core mobile GPU with power-aware pixel approximation techniques," Sep. 2015.
- 8. I. Koren and O. Zinaty, "Evaluating elementary functions in a numerical coprocessor based on rational approximations," Aug. 1990.
- 9. S. A. Tawfik and H. A. H. Fahmy, "Algorithmic truncation of minimax polynomial coefficients," May 2006.
- 10. G. Cao, H. Du, P. Wang, Q. Du, and J. Ding, "A piecewise cubic polynomial interpolation algorithm for approximating elementary function," Aug. 2015.
- 11. A. G. M. Strollo, D. De Caro, and A. Petra, "Elementary functions hardware implementation using constrained piecewise-polynomial approximations," Mar. 2011.
- 12. M. Sadeghian, J. E. Stine, and E. G. Walters, "Optimized linear, quadratic and cubic interpolators for elementary function hardware implementations," Electronics, Apr. 2016.
- 13. S.-F. Hsiao, H.-J. Ko, Y.-L. Tseng, W.-L. Huang, S.-H. Lin, and C.-S. Wen, "Design of hardware function evaluators using low-overhead nonuniform segmentation with address remapping," May 2013.
- 14. A. E. G. Qoutb, A. M. El-Gunidy, M. F. Tolba, and M. A. El-Moursy, "High speed special function unit for graphics processing unit," Dec. 2014.
- 15. D.-U. Lee, R. C. C. Cheung, W. Luk, and J. D. Villasenor, "Hierarchical segmentation for hardware function evaluation," IEEE Trans., Jan. 2009.