# A New Approximate 4-2 Compressor using Merged Sum and Carry

Chinthalgiri Jyothi<sup>1</sup> · K. Saranya<sup>2</sup> · Bhaskara Rao Jammu<sup>3</sup> · Sreehari Veeramachaneni<sup>1</sup> · SK Noor Mahammad<sup>4</sup>

Received: 21 April 2022 / Accepted: 21 July 2022 / Published online: 16 August 2022 © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

### Abstract

Multiplication is a fundamental operation in many image processing systems and consumes substantial computational resources. As many DSP and image applications are tolerant of inaccurate results, approximate multiplication is preferred for energy efficiency. In this paper, two types of approximate compressors are proposed by exploring the relationship between the sum and carry in the truth table, and they are used to design energy-saving multipliers. The proposed compressor circuits are synthesized using a 45nm library. The proposed circuits achieve better Energy and Energy Delay Product (EDP) figures than previously presented approximate compressors. An image processing application of the proposed approximate multiplier designs is also presented. Image quality parameters such as error rate, Normalized Relative Error Distance (NRED), and Average Relative Error Distance (ARED) are evaluated. A new parameter, Power and Exactness Product (PEP), is introduced, and it shows that the proposed designs are 35% and 47% efficient in terms of structural and quality aspects.

Keywords Approximate multiplier · Approximate circuits · Energy efficiency

# 1 Introduction

As technology advances, lower delay, smaller circuit area, and lower power consumption are preferred for better hardware efficiency. Approximate computing trades output accuracy for speed, power, and area of integrated circuits, since the human eye does not tend to perceive small changes in images and videos. Building on this fact, approximate circuits have been proposed for arithmetic operations on digital signals, Artificial Intelligence (AI), and digital image applications. Arithmetic circuits are the central elementary units of these operations, so many researchers use these blocks to enhance

Responsible Editor: S. T. Chakradhar

- <sup>1</sup> Department of ECE, Gokaraju Rangaraju Institute of Engineering & Technology, Hyderabad, India
- <sup>2</sup> Dr. MCET Pollachi, Coimbatore, India
- <sup>3</sup> Department of ECE, GVP College of Engineering (A), Visakhapatnam, India
- <sup>4</sup> Department of CSE, IIITDM Kancheepuram, Chennai, India

efficiency. Multiplication is the main fundamental block of signal, image, and video processing, but it uses complex circuitry and consumes more power. Basic multipliers comprise three stages: generation of the partial products, reduction of the partial products, and addition of the partial products ([8, 13]). Approximation can be introduced in any of these three stages. Partial product reduction is the critical stage of the multiplier, and many existing works use compressors here to reduce complexity and power consumption. The main problem with these circuits is their high area and power consumption, since these compressors are XOR-rich circuits ([1, 4, 6, 7, 9, 11, 12, 14]). Also, as noted in most of the literature, the higher-order bits carry more weight, so if the approximation is applied on the MSB side, the error of the circuit will be larger. Considering this fact, approximation can be applied to the lower-order bits while the higher-order bits are kept accurate. The proposed method will significantly impact approximate image contrasting, making it suitable for many human-free image monitoring applications.

The contributions of the work are as follows:

- 1. Existing designs use separate circuits for calculating SUM and CARRY, whereas the proposed designs generate one output and obtain the other as the inversion of the generated output



Bhaskara Rao Jammu jbhaskararao@gvpce.ac.in

- 2. It eliminates the need for two output circuits and, in turn, reduces the structural cost by 50%
- 3. The outputs of the 4:2 approximate compressor are designed to have equal weight and are used in the same column during Partial Product (PP) reduction
- 4. PP reduction is faster than in the existing methodologies, as the delay for propagating an output to the preceding column is eliminated
- 5. Additionally, the partial product reduction stage is optimized so that the 4:2 approximate compressors are placed to keep the overall error, i.e., the deviation from the actual output, below 5%.

Section 2 reviews related works, including the exact compressors used in the MSB part. Section 3 presents the proposed work. Section 4 describes the experimental setup and gives the results with detailed hardware and accuracy analysis of the proposed circuit. In Sect. 5, the proposed multiplier is tested on an image processing application, and Sect. 6 gives the paper's conclusions.

# 2 Related Works

Compressors are used to count the number of ones in their inputs. An exact 4 to 2 compressor is built from two full adders. According to [2], the basic compressor has five inputs and three outputs. The function of the exact 4 to 2 compressor is given by Eq. 1,

$$A + B + C + D + Cin = 2(Carry + Cout) + Sum \tag{1}$$

The inputs are on the LHS of Eq. 1; Carry, Cout, and Sum are the compressor's outputs. As shown in Eq. 1, Sum has the same weight as the input bits, while Carry and Cout each have a weight one binary position higher. If four ones are present at the inputs, the output should be 100 (the binary equivalent of 4). In this case, the exact compressor gives the result 110, which has a weight equivalent to 4 based on Eq. (1). Figure 1 shows the exact 4 to 2 compressor design, which is implemented using two full adders. The expressions for the three outputs of the exact 4 to 2 compressor are,

$$Sum = A \oplus B \oplus C \oplus D \oplus Cin \tag{2}$$

$$Cout = (A \oplus B).C + \overline{(A \oplus B)}.A \tag{3}$$

$$Carry = (A \oplus B \oplus C \oplus D).Cin + \overline{(A \oplus B \oplus C \oplus D)}.D \tag{4}$$
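The consistency of Eqs. 2 to 4 with Eq. 1 can be checked exhaustively. The following is a minimal sketch, not the authors' implementation; it assumes that the second product term of Eqs. 3 and 4 uses the complemented XOR (the usual multiplexer form of a full-adder carry), and the function names are hypothetical.

```python
from itertools import product


def exact_compressor_4_2(a, b, c, d, cin):
    """Exact 4-2 compressor sketch; returns (sum, carry, cout) as 0/1 ints."""
    x_ab = a ^ b
    x_all = x_ab ^ c ^ d                    # A xor B xor C xor D
    s = x_all ^ cin                         # Eq. 2
    # Assumption: the second term of Eqs. 3 and 4 is gated by the
    # complemented XOR, i.e. the standard full-adder carry in mux form.
    cout = (x_ab & c) | ((1 - x_ab) & a)    # Eq. 3
    carry = (x_all & cin) | ((1 - x_all) & d)  # Eq. 4
    return s, carry, cout


def satisfies_eq1():
    """Check Eq. 1 for all 32 input combinations."""
    for a, b, c, d, cin in product((0, 1), repeat=5):
        s, carry, cout = exact_compressor_4_2(a, b, c, d, cin)
        if a + b + c + d + cin != 2 * (carry + cout) + s:
            return False
    return True


print(satisfies_eq1())                      # True
print(exact_compressor_4_2(1, 1, 1, 1, 0))  # (0, 1, 1): Carry=1, Cout=1, Sum=0
```

With four ones at the inputs and Cin = 0, the sketch returns Carry = 1, Cout = 1, Sum = 0, which is the 110 pattern discussed above.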



Fig. 1 Block diagram of accurate 4 to 2 Compressor

The 4 to 2 approximate compressor has four inputs and two outputs, Carry and Sum. Figure 2 shows the 4 to 2 approximate compressor block diagram. The input Cin and the output Cout are eliminated in the approximate compressor designs.

Table 1 gives the truth table for 4 to 2 approximate compressors with only one error combination, i.e., 1111. Taking this table as a reference, many 4 to 2 approximate compressor designs, obtained by redesigning the compressor hardware, have been proposed in the literature ([1, 4, 6, 7, 9, 11, 12, 14]). The corresponding designs and implementations are provided in Figs. 3, 4, 5, and 6, and the corresponding truth tables are shown in Tables 2, 3, 4, and 5. All the existing designs can be grouped into two types: one totals the occurrence of ones in the inputs (count based compressor), and the other gives a weighted value (weight based compressor). Many value based or count based approximate 4-2 compressors have been proposed for use in multiplier design.

**Fig. 2** Block diagram of approximate 4 to 2 compressor



Table 1 Truth table of 4 to 2 approximate compressor with one error

| Inputs |   |   |   | Acc | Approximate 4-2 compressor |     |    |
|--------|---|---|---|-----|----------------------------|-----|----|
| A      | B | C | D | S   | Carry                      | Sum | ED |
| 0      | 0 | 0 | 0 | 0   | 0         | 0                | 0  |
| 0      | 0 | 0 | 1 | 1   | 0         | 1                | 0  |
| 0      | 0 | 1 | 0 | 1   | 0         | 1                | 0  |
| 0      | 0 | 1 | 1 | 2   | 1         | 0                | 0  |
| 0      | 1 | 0 | 0 | 1   | 0         | 1                | 0  |
| 0      | 1 | 0 | 1 | 2   | 1         | 0                | 0  |
| 0      | 1 | 1 | 0 | 2   | 1         | 0                | 0  |
| 0      | 1 | 1 | 1 | 3   | 1         | 1                | 0  |
| 1      | 0 | 0 | 0 | 1   | 0         | 1                | 0  |
| 1      | 0 | 0 | 1 | 2   | 1         | 0                | 0  |
| 1      | 0 | 1 | 0 | 2   | 1         | 0                | 0  |
| 1      | 0 | 1 | 1 | 3   | 1         | 1                | 0  |
| 1      | 1 | 0 | 0 | 2   | 1         | 0                | 0  |
| 1      | 1 | 0 | 1 | 3   | 1         | 1                | 0  |
| 1      | 1 | 1 | 0 | 3   | 1         | 1                | 0  |
| 1      | 1 | 1 | 1 | 4   | 1         | 0                | -2 |
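The rows of Table 1 can be regenerated programmatically, which also makes it easy to inspect how Carry and Sum relate to each other. This is a small illustrative sketch; the helper name `table1_row` is hypothetical, and the 1111 row is hard-coded to the (Carry, Sum) = (1, 0) entry listed in the table.

```python
from itertools import product


def table1_row(a, b, c, d):
    """(Carry, Sum) as listed in Table 1 for inputs A, B, C, D."""
    ones = a + b + c + d
    if ones == 4:
        return 1, 0                 # the single erroneous row of Table 1
    return ones >> 1, ones & 1      # exact 2-bit count for 0..3 ones


rows = [(bits, table1_row(*bits)) for bits in product((0, 1), repeat=4)]

# Count the rows in which Carry and Sum are complements of each other.
complement_rows = sum(1 for _, (carry, s) in rows if carry != s)
print(complement_rows)  # 11
```

Carry and Sum are complementary in 11 of the 16 rows; the remaining five rows are 0000 and the four inputs with exactly three ones.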

In the study of [2], errors are found in 8 out of 32 possible combinations, resulting in a 25% inaccurate output. With a structural realisation that requires 6 inputs (X0-X4, C0, and C1) and an overall error distance of 8, it has a higher structural cost than the proposed designs. Strollo et al. [11] report two error cases; however, in both cases the approximate compressor consists of an AND gate, 2 OR gates, and a full adder structure. Even though the error probability is just 4/256 and the overall error distance is only 2, the approximate structure is almost identical to the exact compressor. For the generation of Sum and Carry, [3] used one AND gate, two OR gates, one XOR gate, and a MUX unit and achieved a 25% error rate.

In their four designs, [5] combined AOI22 cells with 4 NAND and 2 NOR gates. According to Kong et al., Design 1 has a total error distance of 1, Design 2 has a total error distance of 4, while Designs 3, 4, and 5 have a total error distance of 2. Although the error distance is relatively low in all four designs, Sum and Carry generation involves a longer path delay and more gates.



Fig. 3 Block diagram of approximate 4 to 2 compressor ([1])

According to contemporary literature, every design that has been implemented so far requires a separate circuit for Carry and Sum generation, which raises the structural cost. By using the same circuit for both Sum and Carry generation, the proposed designs achieve the goal of less hardware cost. The approximation implies structural simplification. Using this approach, the need for two circuits for output is eliminated and results in a reduction of the structural cost by 50%.

# 3 Proposed Work

In this work, a new 4 to 2 compressor design with less area and power is proposed. Table 1 shows the truth table of the 4 to 2 approximate compressor with one error: the 1111 input has the error, and all other combinations are correct.



Fig. 4 Block diagram of approximate 4 to 2 compressor ([12])





From Table 1, it can be observed that there is a relationship between Carry and Sum: in 11 of the 16 input combinations, Sum and Carry are complements of each other. By exploiting this relationship, a new type of compressor design with minimum gate count is proposed. Two types of compressors are offered. Type I is the Approximate Sum based Compressor (ASC), a sum based design in which the carry is derived from the sum. Type II is the Approximate Carry based Compressor (ACC), which consists of three designs: ACC-1, ACC-2, and ACC-3. Figure 7 shows the proposed Type I Approximate Sum based Compressor (ASC). Its outputs are given by Eqs. 5 and 6,

$$Sum = (\overline{A \oplus B} + \overline{C \oplus D}) \tag{5}$$

$$Carry = \overline{Sum} \tag{6}$$
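The behaviour of the ASC can be tabulated with a few lines of code. The sketch below is illustrative only and rests on two assumptions: Carry is taken as the complement of Sum (in line with the stated contribution that one output is the inversion of the other), and the approximate value is read as 2·Carry + Sum, as in Table 1. The function names are hypothetical.

```python
from itertools import product


def asc(a, b, c, d):
    """Approximate Sum based Compressor sketch; returns (carry, sum)."""
    s = (1 - (a ^ b)) | (1 - (c ^ d))   # Eq. 5: XNOR(A, B) OR XNOR(C, D)
    carry = 1 - s                       # assumed: Carry = NOT Sum
    return carry, s


def error_summary(compressor):
    """Return (number of erroneous inputs, total absolute error distance)."""
    cases, total = 0, 0
    for bits in product((0, 1), repeat=4):
        carry, s = compressor(*bits)
        # Assumption: Carry has weight 2 and Sum weight 1.
        ed = sum(bits) - (2 * carry + s)
        if ed != 0:
            cases += 1
            total += abs(ed)
    return cases, total


print(error_summary(asc))  # (8, 14)
```

Under these assumptions the ASC deviates from the exact count for 8 of the 16 input combinations, with a total absolute error distance of 14.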

Figure 8 shows the proposed Type II Approximate Carry based Compressor (ACC-1). The output equations of ACC-1 are given in Eqs. 7 and 8.



Table 2 Approximate 4 to 2 compressor truth table Ref ([1])

| Inputs |   |   |   | Acc | Approximate 4-2 compressor Ref ([1]) |     |         |    |
|--------|---|---|---|-----|--------------------------------------|-----|---------|----|
| A      | B | C | D | S   | Carry                                | Sum | App(S1) | ED |
| 0      | 0 | 0 | 0 | 0   | 1        | 1             | 3               | -3 |
| 0      | 0 | 0 | 1 | 1   | 1        | 0             | 2               | -1 |
| 0      | 0 | 1 | 0 | 1   | 1        | 0             | 2               | -1 |
| 0      | 0 | 1 | 1 | 2   | 0        | 1             | 1               | 1  |
| 0      | 1 | 0 | 0 | 1   | 1        | 0             | 2               | -1 |
| 0      | 1 | 1 | 0 | 2   | 1        | 0             | 2               | 0  |
| 0      | 1 | 1 | 1 | 3   | 0        | 0             | 0               | 3  |
| 1      | 0 | 0 | 0 | 1   | 1        | 0             | 2               | -1 |
| 1      | 0 | 0 | 1 | 2   | 1        | 0             | 2               | 0  |
| 1      | 0 | 1 | 0 | 2   | 1        | 0             | 2               | 0  |
| 1      | 0 | 1 | 1 | 3   | 0        | 0             | 0               | 3  |
| 1      | 1 | 0 | 0 | 2   | 0        | 1             | 1               | 1  |
| 1      | 1 | 0 | 1 | 3   | 0        | 0             | 0               | 3  |
| 1      | 1 | 1 | 0 | 3   | 0        | 0             | 0               | 3  |
| 1      | 1 | 1 | 1 | 4   | 0        | 1             | 1               | 3  |

\*App- Approximate Decimal Output

$$Carry = (\overline{\overline{A \oplus B} + \overline{C \oplus D}}) \tag{7}$$

$$Sum = \overline{Carry} \tag{8}$$

Figure 9 shows the proposed Type II Approximate Carry based Compressor (ACC-2). The output equations of ACC-2 are as follows,

$$Carry = (\overline{A.B} + \overline{C.D}) \tag{9}$$

$$Sum = \overline{Carry} \tag{10}$$

Figure 10 shows the proposed Type II Approximate Carry based Compressor (ACC-3). The output equations of ACC-3 are as follows,

$$Carry = (A+B).(C+D) \tag{11}$$

$$Sum = \overline{Carry} \tag{12}$$

**Table 3** Approximate 4 to 2 compressor truth table Ref ([12])

| Inputs |   |   |   | Acc | Approximate 4-2 compressor Ref ([12]) |     |         |    |
|--------|---|---|---|-----|---------------------------------------|-----|---------|----|
| A      | B | C | D | S   | Carry                                 | Sum | App(S2) | ED |
| 0      | 0 | 0 | 0 | 0   | 0        | 0             | 0                 | 0  |
| 0      | 0 | 0 | 1 | 1   | 0        | 1             | 1                 | 0  |
| 0      | 0 | 1 | 0 | 1   | 0        | 1             | 1                 | 0  |
| 0      | 0 | 1 | 1 | 2   | 1        | 0             | 2                 | 0  |
| 0      | 1 | 0 | 0 | 1   | 0        | 1             | 1                 | 0  |
| 0      | 1 | 0 | 1 | 2   | 0        | 1             | 1                 | 1  |
| 0      | 1 | 1 | 0 | 2   | 0        | 1             | 1                 | 1  |
| 0      | 1 | 1 | 1 | 3   | 1        | 1             | 3                 | 0  |
| 1      | 0 | 0 | 0 | 1   | 0        | 1             | 1                 | 0  |
| 1      | 0 | 0 | 1 | 2   | 0        | 1             | 1                 | 1  |
| 1      | 0 | 1 | 0 | 2   | 0        | 1             | 1                 | 1  |
| 1      | 0 | 1 | 1 | 3   | 1        | 1             | 3                 | 0  |
| 1      | 1 | 0 | 0 | 2   | 1        | 0             | 2                 | 0  |
| 1      | 1 | 0 | 1 | 3   | 1        | 1             | 3                 | 0  |
| 1      | 1 | 1 | 0 | 3   | 1        | 1             | 3                 | 0  |
| 1      | 1 | 1 | 1 | 4   | 1        | 1             | 3                 | 1  |

Table 4 Approximate 4 to 2 compressor truth table Ref ([7])

| Inputs |   |   |   | Acc | Approximate 4-2 compressor Ref ([7]) |     |         |    |
|--------|---|---|---|-----|--------------------------------------|-----|---------|----|
| A      | B | C | D | S   | Carry                                | Sum | App(S3) | ED |
| 0      | 0 | 0 | 0 | 0   | 0        | 1             | 1                | 1  |
| 0      | 0 | 0 | 1 | 1   | 0        | 1             | 1                | 0  |
| 0      | 0 | 1 | 0 | 1   | 0        | 1             | 1                | 0  |
| 0      | 0 | 1 | 1 | 2   | 0        | 1             | 1                | -1 |
| 0      | 1 | 0 | 0 | 1   | 0        | 1             | 1                | 0  |
| 0      | 1 | 0 | 1 | 2   | 1        | 0             | 2                | 0  |
| 0      | 1 | 1 | 0 | 2   | 1        | 0             | 2                | 0  |
| 0      | 1 | 1 | 1 | 3   | 1        | 1             | 3                | 0  |
| 1      | 0 | 0 | 0 | 1   | 0        | 1             | 1                | 0  |
| 1      | 0 | 0 | 1 | 2   | 1        | 0             | 2                | 0  |
| 1      | 0 | 1 | 0 | 2   | 1        | 0             | 2                | 0  |
| 1      | 0 | 1 | 1 | 3   | 1        | 1             | 3                | 0  |
| 1      | 1 | 0 | 0 | 2   | 0        | 1             | 1                | -1 |
| 1      | 1 | 0 | 1 | 3   | 1        | 1             | 3                | 0  |
| 1      | 1 | 1 | 0 | 3   | 1        | 1             | 3                | 0  |
| 1      | 1 | 1 | 1 | 4   | 1        | 1             | 3                | -1 |

Table 6 shows the truth table and P(E) of the proposed designs. ASC and ACC-3 have a nonzero error distance in 8 of the 16 cases, but their P(E) is only 112/256. ACC-1 has a nonzero error distance in 7 of the 16 cases, with the same P(E) of 112/256. Similarly, ACC-2 has a nonzero error distance in 10 of the 16 cases, and its P(E) is 130/256.

The probability parameters are given in Eqs. 13–16. PErr is calculated by taking into account the chance of that particular combination occurring in the multiplier's partial product block. Equation 13 gives PErr for n-bit input compressors, where $n_z$ is the number of zeros in the $i^{th}$ n-bit input. The mean error $E_{mean}$ is obtained using Eq. 14. Here, $PErr_i$ is the probability of having an error in the $i^{th}$ partial product combination, and $Err_i$ is the difference between the accurate and approximate compressor outputs, as given in Eq. 16. The error probability $P(E)$ is the total of the $PErr_i$ values and is shown in Eq. 15.

**Table 5** Approximate 4 to 2 compressor truth table Ref ([15])

| Inputs |   |   |   | Acc | Approximate 4-2 compressor Ref ([15]) |     |         |        |
|--------|---|---|---|-----|---------------------------------------|-----|---------|--------|
| A      | B | C | D | S   | Carry                                 | Sum | App(S3) | ED     |
| 0      | 0 | 0 | 0 | 0   | 0                                     | 0   | 0       | 0      |
| 0      | 0 | 0 | 1 | 1   | 0                                     | 1   | 1       | 0      |
| 0      | 0 | 1 | 0 | 1   | 0                                     | 1   | 1       | 0      |
| 0      | 0 | 1 | 1 | 2   | 1                                     | 0   | 2       | 0      |
| 0      | 1 | 0 | 0 | 1   | 0                                     | 1   | 1       | 0      |
| 0      | 1 | 0 | 1 | 2   | 0                                     | 1   | 1       | 1      |
| 0      | 1 | 1 | 0 | 2   | 0                                     | 1   | 1       | 1      |
| 0      | 1 | 1 | 1 | 3   | 1                                     | 1   | 3       | 0      |
| 1      | 0 | 0 | 0 | 1   | 0                                     | 1   | 1       | 0      |
| 1      | 0 | 0 | 1 | 2   | 0                                     | 1   | 1       | 1      |
| 1      | 0 | 1 | 0 | 2   | 0                                     | 1   | 1       | 1      |
| 1      | 0 | 1 | 1 | 3   | 1                                     | 1   | 3       | 0      |
| 1      | 1 | 0 | 0 | 2   | 1                                     | 0   | 2       | 0      |
| 1      | 1 | 0 | 1 | 3   | 1                                     | 1   | 3       | 0      |
| 1      | 1 | 1 | 0 | 3   | 1                                     | 1   | 3       | 0      |
| 1      | 1 | 1 | 1 | 4   | 1                                     | 1   | 3       | 2      |
| Total probability of error |   |   |   |   |                              |     |         | 37/256 |



Fig. 7 Proposed ASC

$$PErr = \frac{3^{n_z}}{2^{2n}} \tag{13}$$

$$E_{mean} = \sum_{i} (PErr_i)Err_i \tag{14}$$

$$P(E) = \sum_{i} (PErr_i) \tag{15}$$

$$Err_i = S - S_{App} \tag{16}$$
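Eqs. 13 to 16 can be evaluated directly. The sketch below is a hedged illustration rather than the authors' tooling. It reads Eq. 13 as the probability of an input pattern when each partial product bit is the AND of two independent, uniformly random operand bits (so a bit is 0 with probability 3/4 and 1 with probability 1/4), which is an assumption consistent with the $3^{n_z}/2^{2n}$ form. The ASC model assumes Carry is the complement of Sum and an output value of 2·Carry + Sum. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def p_err(bits):
    """Eq. 13: probability of one n-bit input pattern, 3^nz / 2^(2n)."""
    n = len(bits)
    nz = bits.count(0)
    return Fraction(3 ** nz, 2 ** (2 * n))


def asc_value(a, b, c, d):
    """Value of the ASC output, assuming Carry = NOT Sum, weight 2 on Carry."""
    s = (1 - (a ^ b)) | (1 - (c ^ d))   # Eq. 5
    carry = 1 - s
    return 2 * carry + s


def error_metrics(approx_value, n=4):
    """Return (P(E), E_mean) per Eqs. 15 and 14 for an n-input compressor."""
    p_e = Fraction(0)
    e_mean = Fraction(0)
    for bits in product((0, 1), repeat=n):
        err = sum(bits) - approx_value(*bits)   # Eq. 16: S - S_App
        if err != 0:
            p_e += p_err(bits)                  # Eq. 15, erroneous rows only
        e_mean += p_err(bits) * err             # Eq. 14
    return p_e, e_mean


p_e, e_mean = error_metrics(asc_value)
print(p_e == Fraction(112, 256))  # True
print(e_mean)                     # -9/64, i.e. -36/256
```

For the ASC model, the erroneous rows contribute 81 + 2·9 + 4·3 + 1 = 112 out of 256, matching the 112/256 listed in Table 6.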

Table 6 shows that ASC and ACC-3 have the same error probability irrespective of the circuit design, and that ACC-1 has fewer cases with a nonzero error distance but the same error probability, even though the circuit designs and functional equations of ASC, ACC-3, and ACC-1 differ. The analysis of Table 6 also shows that the multiplier's overall result depends on many factors, such as the error distance given in Table 7, the probability of error in quality aspects, and how these designs are implemented across the three stages of multiplication. Although the number of erroneous cases and the probability of error are high in the proposed designs, this paper aims to show that, when implementing the same multiplier, the proposed designs are equivalent to the existing ones in output quality.



Fig. 9 Proposed ACC-2

### 3.1 Multiplication

Here the approximate multipliers using the proposed compressors have been implemented. The exact multiplication process uses three stages,

- 1. Partial Product Production
- 2. Partial Products Reduction
- 3. Carry Propagate Addition

Stage two, partial product reduction, plays a vital role in determining most of the area, power, and delay. Many works in the literature use different compressors in various combinations in this stage to reduce complexity. The approximate multiplier uses the approximate 4 to 2 compressors. Figure 11 shows the approximate multiplier, with dotted lines dividing it into two subparts. The MSB side uses the accurate compressors shown in Fig. 1, and the LSB side uses the proposed ASC, ACC-1, ACC-2, and ACC-3 type compressors.

Table 8 shows the error comparison of existing and proposed compressor based multiplier designs to emphasize how close the new designs are to the existing ones. Out of 10,000 input combinations, 5 corner cases are shown for analysis. Here, % Error is calculated using Eq. 17.



Fig. 8 Proposed ACC-1

Fig. 10 Proposed ACC-3

Table 6 Truth table for proposed approximate 4 to 2 compressors

| Inputs |   |   |   | Acc | Design #1 |     |         | Design #2 |     |         | Design #3 |     |         | Design #4 |     |         |
|--------|---|---|---|-----|-----------|-----|---------|-----------|-----|---------|-----------|-----|---------|-----------|-----|---------|
| A      | B | C | D | S   | Carry     | Sum | P(E)    | Carry     | Sum | P(E)    | Carry     | Sum | P(E)    | Carry     | Sum | P(E)    |
| 0      | 0 | 0 | 0 | 0   | 0         | 1×  | 81/256  | 0         | 1×  | 81/256  | 0         | 1×  | 81/256  | 0         | 1×  | 81/256  |
| 0      | 0 | 0 | 1 | 1   | 0         | 1   |         | 0         | 1   |         | 0         | 1   |         | 0         | 1   |         |
| 0      | 0 | 1 | 0 | 1   | 0         | 1   |         | 0         | 1   |         | 0         | 1   |         | 0         | 1   |         |
| 0      | 0 | 1 | 1 | 2   | 0×        | 1×  | 9/256   | 0×        | 1×  | 9/256   | 1         | 0   |         | 1         | 0   |         |
| 0      | 1 | 0 | 0 | 1   | 0         | 1   |         | 0         | 1   |         | 0         | 1   |         | 0         | 1   |         |
| 0      | 1 | 0 | 1 | 2   | 1         | 0   |         | 1         | 0   |         | 0×        | 1×  | 9/256   | 0×        | 1×  | 9/256   |
| 0      | 1 | 1 | 0 | 2   | 1         | 0   |         | 1         | 0   |         | 0×        | 1×  | 9/256   | 1         | 0   |         |
| 0      | 1 | 1 | 1 | 3   | 0×        | 1   | 3/256   | 1         | 0×  | 3/256   | 1         | 0×  | 3/256   | 1         | 0×  | 3/256   |
| 1      | 0 | 0 | 0 | 1   | 0         | 1   |         | 0         | 1   |         | 0         | 1   |         | 0         | 1   |         |
| 1      | 0 | 0 | 1 | 2   | 1         | 0   |         | 1         | 0   |         | 0×        | 1×  | 9/256   | 1         | 0   |         |
| 1      | 0 | 1 | 0 | 2   | 1         | 0   |         | 1         | 0   |         | 0×        | 1×  | 9/256   | 0×        | 1×  | 9/256   |
| 1      | 0 | 1 | 1 | 3   | 0×        | 1   | 3/256   | 1         | 0×  | 3/256   | 1         | 0×  | 3/256   | 1         | 0×  | 3/256   |
| 1      | 1 | 0 | 0 | 2   | 0×        | 1×  | 9/256   | 0×        | 1×  | 9/256   | 1         | 0   |         | 1         | 0   |         |
| 1      | 1 | 0 | 1 | 3   | 0×        | 1   | 3/256   | 1         | 0×  | 3/256   | 1         | 0×  | 3/256   | 1         | 0×  | 3/256   |
| 1      | 1 | 1 | 0 | 3   | 0×        | 1   | 3/256   | 1         | 0×  | 3/256   | 1         | 0×  | 3/256   | 1         | 0×  | 3/256   |
| 1      | 1 | 1 | 1 | 4   | 0         | 1   | 1/256   | 1×        | 0   | 1/256   | 1×        | 0   | 1/256   | 1×        | 0   | 1/256   |
| Total probability of error |   |   |   |   |     |     | 112/256 |           |     | 112/256 |           |     | 130/256 |           |     | 112/256 |

$$\%\,Error = \frac{P_{acc} - P_{app}}{P_{acc}} \times 100 \tag{17}$$
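As a worked instance of Eq. 17, the following sketch recomputes the % error for one operand pair listed in Table 8. It is illustrative only; the function name is hypothetical, and taking the absolute value of the difference is an assumption made here because Table 8 reports positive values even where the approximate product exceeds the accurate one.

```python
def percent_error(p_acc, p_app):
    """Eq. 17, with the magnitude of the difference (assumed)."""
    return abs(p_acc - p_app) / p_acc * 100


# Input1 of Table 8: 11101000 x 00111011
p_acc = 0b11101000 * 0b00111011
print(p_acc)                          # 13688

# Approximate product of 13560 reported for this input
print(percent_error(p_acc, 13560))    # about 0.935 (listed as 0.93)

# Overestimate case from Input3: accurate 8673, approximate 8745
print(percent_error(8673, 8745))      # about 0.830 (listed as 0.83)
```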

# 4 Experimental Setup & Results

It is inferred that the % error obtained is less than 2% for almost all the cases considered. The critical parameters of integrated circuit design on which semiconductor technologies mainly depend are area, power, and delay. All the design architectures are described using structural modeling in Verilog HDL and synthesized using Cadence EDA tools with a 45nm CMOS technology standard library. The power analysis is performed for all the designs at a frequency of 500 MHz. All the parameters are measured and analyzed for both the existing and the new designs. The estimation and validation of the work is performed using 1. Cadence EDA tools with a 45nm CMOS technology standard library to prove that structural metrics improve by 50 percent in area, power, and delay

Table 7 Error distance of existing and proposed compressors

| Inputs |   |   |   | Outputs | Error Distance |           |           |           |     |      |     |      |
|--------|---|---|---|---------|----------------|-----------|-----------|-----------|-----|------|-----|------|
| A      | B | C | D | Acc     | Design #1      | Design #2 | Design #3 | Design #4 | [7] | [15] | [1] | [12] |
| 0      | 0 | 0 | 0 | 0       | -1             | -1        | -1        | -1        | -1  |      |     | -1   |
| 0      | 0 | 0 | 1 | 1       |                |           |           |           |     |      |     |      |
| 0      | 0 | 1 | 0 | 1       |                |           |           |           |     |      |     |      |
| 0      | 0 | 1 | 1 | 2       | -1             | -1        |           |           | 1   |      |     | 1    |
| 0      | 1 | 0 | 0 | 1       |                |           |           |           |     |      |     |      |
| 0      | 1 | 0 | 1 | 2       |                |           | -1        | -1        |     | 1    | 1   |      |
| 0      | 1 | 1 | 0 | 2       |                |           | -1        | 0         |     | 1    | 1   |      |
| 0      | 1 | 1 | 1 | 3       | -2             | -1        | -1        |           |     |      |     |      |
| 1      | 0 | 0 | 0 | 1       |                |           |           |           |     |      |     |      |
| 1      | 0 | 0 | 1 | 2       |                |           | -1        |           |     | 1    | 1   |      |
| 1      | 0 | 1 | 0 | 2       |                |           | -1        | -1        |     | 1    | 1   |      |
| 1      | 0 | 1 | 1 | 3       | -2             | -1        | -1        | -1        |     |      |     |      |
| 1      | 1 | 0 | 0 | 2       | -1             | -1        |           |           | 1   |      |     | 1    |
| 1      | 1 | 0 | 1 | 3       | -2             | -1        | -1        | -1        |     |      |     |      |
| 1      | 1 | 1 | 0 | 3       | -2             | -1        | -1        | -1        |     |      |     |      |
| 1      | 1 | 1 | 1 | 4       | -3             | -2        | -2        | -2        | 1   | 2    | 1   | 1    |
| Total Error Distance |   |   |   |   | 14      | 9         | 11        | 9         | 4   | 6    | 5   | 4    |

 Table 8
 % Error Comparison of Existing and Proposed Compressor Based Multiplier Design

| TYPE    | Input1 (11101000 <sup>*</sup><br>00111011) Pacc= 13688 |          | · ·   | 11110010 <sup>*</sup><br>0) Pacc= 54692 | 1    | (10110001 <sup>*</sup><br>01) Pacc=8673 | 1 (   |              | Input5 (10101001 <sup>*</sup><br>11010101)<br>Pacc=35997 |         |
|---------|--------------------------------------------------------|----------|-------|-----------------------------------------|------|-----------------------------------------|-------|--------------|----------------------------------------------------------|---------|
|         | Papp                                                   | %Error   | Papp  | %Error                                  | Papp | %Error                                  | Papp  | %Error       | Papp                                                     | %Error  |
| [7]     | 13560                                                  | 0.93     | 54524 | 0.3                                     | 8745 | 0.83                                    | 35576 | 0.18         | 36221                                                    | 0.62    |
| [15]    | 13560                                                  | 0.93     | 54692 | 0                                       | 8625 | 0.55                                    | 35512 | 0            | 35709                                                    | 0.8     |
| [1]     | 13699                                                  | 0.08036  | 54923 | 0.42237                                 | 8701 | 0.32284                                 | 35802 | 0.81663      | 36241                                                    | 0.67783 |
| [12]    | 13602                                                  | 0.708081 | 54890 | 0.060084                                | 8625 | 0.873463                                | 35552 | 0.698285     | 36321                                                    | 0.22074 |
| Design1 | 13304                                                  | 2.8      | 56524 | 3.34                                    | 8745 | 0.83                                    | 35576 | 0.18         | 36093                                                    | 0.26    |
| Design2 | 13432                                                  | 1.8      | 54524 | 0.30                                    | 8745 | 0.83                                    | 35576 | 0.18         | 36157                                                    | 0.44    |
| Design3 | 13560                                                  | 0.93     | 54716 | 0.04                                    | 8825 | 1.75                                    | 35704 | 0.54         | 35709                                                    | 0.80    |
| Design4 | 13432                                                  | 1.8      | 54524 | 0.3                                     | 8745 | 0.83                                    | 35576 | 0.18         | 36157                                                    | 0.44    |

\*Papp- approximate product when implemented in the approximate multiplier structure

\*\*Pacc- accurate product of multiplication
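The %Error column can be checked against Papp and Pacc with a short sketch. It assumes the usual relative-error definition, |Papp − Pacc| / Pacc × 100, which the paper does not spell out; the function name `percent_error` is illustrative only.

```python
def percent_error(p_app: int, p_acc: int) -> float:
    """Relative error of an approximate product in percent (assumed definition)."""
    return abs(p_app - p_acc) / p_acc * 100.0


# Input1 from Table 8: 11101000 * 00111011
a, b = 0b11101000, 0b00111011
p_acc = a * b  # exact product, 13688

# Papp reported for Ref [7] on Input1 is 13560
err = percent_error(13560, p_acc)
print(p_acc, round(err, 3))  # about 0.935; Table 8 lists 0.93
```

Under this assumed definition the result agrees with the tabulated value to the two decimals shown in Table 8.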

**Table 9** Structural performance of existing and proposed approximate 4-2 compressors

| Type         | Delay (ps) | Area ( $\mu m^2$ ) | Power (nW) | Energy (zJ)  | EDP (zJ.ps)           | ED | PDA × ED      |
|--------------|------------|--------------------|------------|--------------|-----------------------|----|---------------|
| Exact        | 564        | 24                 | 696.14     | 392623.5     | $2.21 \times 10^{8}$  | 0  | 0             |
| [12]         | 456        | 21                 | 592.6      | 270225.6     | $1.23 \times 10^{8}$  | 10 | 567,47,376    |
| [1]          | 457        | 23.5               | 623.66     | 285012.6     | $1.30 \times 10^{8}$  | 5  | 334,88,982.85 |
| [7]          | 493        | 19.5               | 584.15     | 287985.9     | $1.41 \times 10^{8}$  | 2  | 112,31,452.05 |
| [15]         | 461        | 22.5               | 598.9      | 276092.9     | $1.27 \times 10^{8}$  | 5  | 310,60,451.25 |
| ASC          | 350        | 16.5               | 403.82     | 141339.8     | $0.4 \times 10^{8}$   | 14 | 326,48,847    |
| ACC-1        | 92         | 5.8                | 461.46     | 42455.14     | $0.039 \times 10^{8}$ | 9  | 22,16,115.504 |
| ACC-2        | 91         | 6.4                | 405.63     | 36912.78     | $0.033 \times 10^{8}$ | 11 | 25,98,628.032 |
| ACC-3        | 92         | 10.5               | 461.46     | 42455.14     | $0.039 \times 10^{8}$ | 9  | 40,11,933.24  |

- 2. Image processing simulation using MATLAB, to show that the quality metrics are equivalent to those of existing methods in terms of image quality parameters
- 3. Analysis of the parameters of the existing and proposed methods, to justify the competence of the new method
- 4. Comparison of the results with Ref ([7]) and Ref ([15]), as they are the basic, hardware-efficient models from which the other designs ([11]) evolved to achieve more accuracy.

Table 9 compares the proposed approximate 4-2 compressors with existing approximate 4-2 compressors, taking the exact 4-2 compressor as the reference. The proposed designs consume the least energy and have the lowest EDP among the approximate 4-2 compressors compared. Table 9 also shows that the proposed approximate compressors have lower power, area, and delay than the other hardware-efficient approximate 4-2 compressors.
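The derived columns of Table 9 can be reproduced from the measured delay, area, and power. The sketch below assumes energy = power × delay, EDP = energy × delay, and a last column equal to power × delay × area × ED; these relations are inferred from the tabulated numbers rather than stated in the text, and the function names are illustrative.

```python
def energy_zj(power_nw: float, delay_ps: float) -> float:
    """Energy in zJ, assuming energy = power * delay (1 nW * 1 ps = 1 zJ)."""
    return power_nw * delay_ps


def edp(power_nw: float, delay_ps: float) -> float:
    """Energy-delay product in zJ.ps, assuming EDP = energy * delay."""
    return energy_zj(power_nw, delay_ps) * delay_ps


def pda_ed(power_nw: float, delay_ps: float, area_um2: float, ed: int) -> float:
    """Power * delay * area * error distance (assumed meaning of the last column)."""
    return power_nw * delay_ps * area_um2 * ed


# Delay (ps), area (um^2), power (nW), and ED taken from Table 9
exact = {"delay": 564, "area": 24, "power": 696.14, "ed": 0}
acc2 = {"delay": 91, "area": 6.4, "power": 405.63, "ed": 11}

ratio = edp(acc2["power"], acc2["delay"]) / edp(exact["power"], exact["delay"])
print(f"ACC-2 EDP relative to exact: {ratio:.2%}")
print(pda_ed(acc2["power"], acc2["delay"], acc2["area"], acc2["ed"]))
```

With these assumptions the ACC-2 figure for the last column matches the tabulated 25,98,628.032, and the ACC-2 EDP comes out at roughly 1.5% of the exact compressor's EDP.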

Table 10 compares approximate multipliers built from the different approximate 4-2 compressors with an exact multiplier. Table 10 shows that the proposed multiplier designs have lower power and a shorter path delay than the exact multiplier.

## 5 Application

An approximate circuit ([10]) can provide a better solution in many error-tolerant areas. Many ideas for approximate circuits have been proposed over the last two decades, and as applications keep growing, research on arithmetic circuits continues. In this section, the Lena image is loaded from a MATLAB file and image contrasting is performed using the proposed concept to showcase its quality and structural cost efficiency. In image contrasting, and in almost all image processing areas where arithmetic operations are performed on images, the proposed method can be used to extend battery life by reducing power and area utilization.

Approximate image multiplication is a widely used mathematical operation in approximate computing. It aims to provide high performance and low power in arithmetic devices. Power- and area-efficient digital logic circuits for approximate multiplication can be realized with an approximate 4-2 compressor. The proposed designs are nearly accurate, require less hardware, and consume less power than previously available approximate multiplier designs based on 4-2 compressors. To study the efficacy of the proposed multipliers in an image-conditioning application, image multiplication is implemented and simulated using MATLAB. The proposed multiplier designs provide excellent quality parameters for the multiplication of images.

**Table 10** Structural performance of existing and proposed approximate 8 Bit Multipliers

| Type  | Delay (ps) | Area ( $\mu m^2$ ) | Power (nW) | Energy (zJ) | EDP (zJ.ps)            |
|-------|------------|--------------------|------------|-------------|------------------------|
| Exact | 3280       | 834                | 26650.932  | 87415056.96 | $2.86 \times 10^{11}$  |
| [7]   | 3094       | 820.5              | 21759.65   | 67324357.1  | $2.083 \times 10^{11}$ |
| [15]  | 3141       | 793.5              | 23495.781  | 73800248.12 | $2.318 \times 10^{11}$ |
| [1]   | 3231       | 529                | 177653     | 57398715    | $1.854 \times 10^{11}$ |
| [12]  | 3199       | 921                | 167845     | 53692016    | $1.717 \times 10^{11}$ |
| ASC   | 3272       | 766.5              | 20217.431  | 66151434.23 | $2.164 \times 10^{11}$ |
| ACC-1 | 3093       | 672                | 17509.962  | 54158312.47 | $1.675 \times 10^{11}$ |
| ACC-2 | 3093       | 672                | 17558.541  | 54308567.31 | $1.679 \times 10^{11}$ |
| ACC-3 | 3093       | 712.5              | 17509.962  | 54158312.47 | $1.675 \times 10^{11}$ |

Fig. 12 Image Processing Application. (a) Original image using accurate multiplier. (b) Contrast image using accurate multiplier. (c) Momeni et al. (2014). (e) Akbari et al. (2017). (f) Venkatachalam and Ko (2017). (g) to (j) Contrast images using various existing and proposed multipliers, Designs 1 to 4.

The approach can also be extended to signal and video approximation. The average normalized error distance is low, and a desirable peak signal-to-noise ratio is achieved. This section applies the proposed approximate 4-2 designs to contrasting an input image, using the MATLAB platform to process the images. The application uses a 256\*256 Lena image. The input image is resized, and the input for the contrast image is  $256 \times 256 \times 3$  with 256 rows and 256 columns. Because the image carries RGB values, the row-by-column count is multiplied by 3 to obtain the number of pixel values in the image. First, each input pixel is converted to 8 bits, as our multipliers are  $8 \times 8$  bit; a uint8 pixel in MATLAB takes values in the range 0 to 255. Second, each value must be converted to uint16, because the proposed multipliers have a 16-bit output and keeping the result in uint8 would truncate 8 bits. Once this is done, the image pixel values fit between 0 and 65535 at the output stage of image contrast.
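The bit-width argument above can be illustrated with plain integers. This is a minimal sketch, not the MATLAB flow used in the paper: the pixel value and contrast factor are hypothetical, and the 8-bit container is modelled simply by keeping the low 8 bits, which is an assumption for illustration and not a statement about how MATLAB's integer types behave.

```python
def multiply_pixels(a: int, b: int) -> int:
    """Exact product of two unsigned 8-bit values; the result needs up to 16 bits."""
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("operands must be 8-bit unsigned values")
    return a * b


pixel, gain = 200, 3  # hypothetical pixel value and contrast factor
product = multiply_pixels(pixel, gain)

print(product, product.bit_length())  # 600 needs 10 bits
print(product & 0xFF)                 # only the low 8 bits survive: 88
print(multiply_pixels(255, 255))      # largest product, 65025, still fits in 16 bits
```

The largest possible product, 255 × 255 = 65025, stays below 65535, which is why a 16-bit output range is sufficient for an 8 × 8 bit multiplier.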

Figure 12a shows the contrast image obtained using the accurate multiplier. Figure 12b shows the MATLAB output after contrasting the input 256\*256 Lena image using various approximate multipliers. Image contrasting is applied to the multiplied image because contrast, the difference in color that makes an object distinguishable from other objects within the same field of view, is thereby made clear. In low-contrast images it is harder to identify the segments present than in high-contrast images. In Fig. 12, image contrasting is performed using various existing and proposed multipliers, and the differences between the output images are visible. The precision factors of the approximate multiplier designs are investigated. The quality of the image is measured using the parameters shown in Table 11. It is observed from Table 11 that the proposed designs exhibit the same quality as the existing designs.

The incorrect outputs of the approximate compressor and the approximate multiplier are measured as the error rate. The error rate is very low: it is below 0.2% in all the proposed designs.

Ref ([7]) has a very low error rate of 0.094%, but it uses more hardware than Ref ([15]) and the proposed designs. From Table 11 it can be easily inferred that ACC-2 and ACC-3 have a very low error rate and also use less hardware.

The Average Relative Error Distance (ARED) is the mean of the relative error distances over all input combinations for the given image pixel values. The error distance (ED) is the difference between the exact sum and carry outputs and the approximate sum and carry outputs for the following input combinations based on an image.

$$ARED = \frac{1}{2^{2n}} \sum_{i=1}^{2^{2n}} \frac{ErrorDistance_i}{ExactOutput_i}$$
(18)
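Equation (18) can be evaluated exhaustively for small operand widths. The sketch below is only an illustration: `approx_multiply` is a hypothetical stand-in that clears the two low product bits and is not one of the paper's compressor-based designs, and operand pairs whose exact product is zero are skipped, which is an assumption because the equation leaves that case undefined. The `error_rate` helper is likewise an assumed definition (fraction of operand pairs with a wrong product).

```python
from itertools import product as all_pairs


def approx_multiply(a: int, b: int) -> int:
    """Hypothetical stand-in: clears the two low product bits.
    NOT one of the paper's compressor-based multipliers."""
    return (a * b) & ~0b11


def ared(multiplier, n_bits: int) -> float:
    """Average relative error distance over all 2^(2n) operand pairs (Eq. 18).
    Pairs with a zero exact product are skipped (assumption)."""
    size = 1 << n_bits
    total = 0.0
    for a, b in all_pairs(range(size), repeat=2):
        exact = a * b
        if exact == 0:
            continue
        total += abs(exact - multiplier(a, b)) / exact
    return total / (size * size)


def error_rate(multiplier, n_bits: int) -> float:
    """Fraction of operand pairs for which the product is wrong (assumed definition)."""
    size = 1 << n_bits
    wrong = sum(multiplier(a, b) != a * b
                for a, b in all_pairs(range(size), repeat=2))
    return wrong / (size * size)


print(ared(approx_multiply, 8))
print(error_rate(approx_multiply, 8))
```

Passing an exact multiplier, for example `lambda a, b: a * b`, gives an ARED and error rate of zero, which is a convenient sanity check before plugging in a model of an approximate design.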

The Average Relative Error Distance of ASC, ACC-1, ACC-2, and ACC-3 is 21%, 17%, 32%, and 40% less than that of Ref ([15]), and 7%, 3%, 17%, and 24% less than that of Ref ([7]), respectively. The Normalized Relative Error Distance (NRED) metric is the average error distance normalized by the maximum value that an accurate multiplication can produce. In NRED, n represents the bit size.

$$NRED = \frac{1}{L(2^n - 1)^2} \sum_{i=1}^{2^{2n}} \frac{ErrorDistance_i}{ExactOutput_i}$$
(19)

The NRED values of the approximate image multipliers based on Ref ([7]), Ref ([15]), ASC, ACC-1, ACC-2, and ACC-3 are shown in Table 11. Many error-tolerant image processing applications prefer a multiplier with a lower NRED. Our multipliers provide much better logic-circuit-level performance than exact multipliers, with NRED values in the range of 0.006 to 0.008. Our MATLAB simulation results show that the proposed designs provide a PSNR of around 30dB in image multiplication. From the literature, it can be inferred that a PSNR of 30dB is adequate for most error-tolerant image applications. The PSNR figures for the contrasted images are presented in Table 12. According to the results, the proposed approximate multiplier designs provide acceptable PSNRs together with noticeably improved energy parameters.
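For reference, PSNR can be computed from the mean squared error between the exactly and approximately processed images. The sketch below uses the standard definition, PSNR = 10 log10(peak² / MSE); the pixel lists are hypothetical, and the peak value of 255 assumes 8-bit data, since the paper does not state which peak was used.

```python
import math


def psnr(reference, test, peak: int = 255) -> float:
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical flattened pixel values, not data from the paper
exact_pixels = [52, 120, 200, 255, 0, 90]
approx_pixels = [50, 121, 196, 255, 0, 92]
print(round(psnr(exact_pixels, approx_pixels), 2))
```

If the 16-bit multiplier output is compared directly, `peak` would be set to 65535 instead.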

**Table 11** Image Quality Metrics

| Design | Error Rate % | ARED   | NRED    |
|--------|--------------|--------|---------|
| [7]    | 0.094        | 0.0498 | 0.00658 |
| [15]   | 0.18         | 0.0564 | 0.00765 |
| [1]    | 0.19         | 0.048  | 0.00799 |
| [12]   | 0.0189       | 0.05   | 0.00555 |
| ASC    | 0.18         | 0.0465 | 0.00792 |
| ACC-1  | 0.18         | 0.0480 | 0.00899 |
| ACC-2  | 0.12         | 0.0425 | 0.00772 |
| ACC-3  | 0.10         | 0.0401 | 0.00634 |

**Table 12** PSNR of Approximate Multipliers

| Design | PSNR (dB) |
|--------|-----------|
| [7]    | 28.44     |
| [15]   | 31.32     |
| [1]    | 32.76     |
| [12]   | 34.99     |
| ASC    | 32.56     |
| ACC-1  | 35.53     |
| ACC-2  | 31.4      |
| ACC-3  | 26.13     |

**Table 13** PEP of Approximate Multipliers

| Design | PEP      |
|--------|----------|
| [7]    | 1083.631 |
| [15]   | 1325.162 |
| [1]    | 1419.447 |
| [12]   | 931.5398 |
| ASC    | 940.1105 |
| ACC-1  | 840.4782 |
| ACC-2  | 746.238  |
| ACC-3  | 702.1495 |

This paper suggests a new metric, the Power & Exactness Product (PEP), which is the product of power and MRED and is listed in Table 13. PEP represents the relationship between structural and quality indicators, which is useful here because the proposed designs surpass the previous designs in both structural characteristics and image quality. The image quality parameters evaluated in these tables make it clear that the suggested designs outperform existing methods.
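The PEP values can be reproduced from the other tables. The sketch below assumes that the MRED in the PEP definition corresponds to the ARED column of Table 11 and that the power is the multiplier power from Table 10; the paper does not state this mapping explicitly, but it reproduces the Table 13 entries for the designs shown.

```python
def pep(power_nw: float, relative_error: float) -> float:
    """Power & Exactness Product: power times a relative-error metric."""
    return power_nw * relative_error


# (multiplier power in nW from Table 10, ARED from Table 11)
designs = {
    "[7]": (21759.65, 0.0498),
    "ACC-2": (17558.541, 0.0425),
    "ACC-3": (17509.962, 0.0401),
}

for name, (power, err) in designs.items():
    print(name, round(pep(power, err), 3))
```

A lower PEP indicates a design that combines low power with low relative error.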

## 6 Conclusion

A multiplier design with four different approximate 4-2 compressors is implemented by exploring the relation between sum and carry. Relative to the exact compressor, the proposed compressor designs use 36% (ASC), 10.82% (ACC-1), 9.5% (ACC-2), and 10.82% (ACC-3) of the energy. Of the four designs, ACC-2 has the lowest EDP, only 1.52% of that of the exact compressor. To give a descriptive assessment of ASC, ACC-1, ACC-2, and ACC-3 in approximate image multipliers, both quality-based parameters and circuit-level performance metrics are taken into consideration. The results of the MATLAB simulation and of CADENCE synthesis with a 45nm library clearly show that the new designs are equivalent to the current designs in terms of quality and circuit-level efficiency. The image contrasting application and the metrics evaluated in the image processing section demonstrate the suggested method for arithmetic operations on images. The designs can be used in battery-powered gadgets, since they save power and area while preserving the image quality of existing designs.

**Data Availability** Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

### Declarations

**Conflicts of Interest** The authors declare that they have no conflict of interest.

### References

- 1. Akbari O, Kamal M, Afzali-Kusha A, Pedram M (2017) Dual-quality 4:2 compressors for utilizing in dynamic accuracy configurable multipliers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25(4):1352–1361
- 2. Chang YJ, Cheng YC, Lin YF, Liao SC, Lai CH, Wu TC (2019) Imprecise 4–2 compressor design used in image processing applications. IET Circuits Devices Syst 13(6):848–856
- 3. Edavoor PJ, Raveendran S, Rahulkar AD (2020) Approximate multiplier design using novel dual-stage 4:2 compressors. IEEE Access 8:48337–48351
- 4. Ha M, Lee S (2017) Multipliers with approximate 4–2 compressors and error recovery modules. IEEE Embed Syst Lett 10(1):6–9
- 5. Kong T, Li S (2021) Design and analysis of approximate 4–2 compressors for high-accuracy multipliers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 29(10):1771–1781
- 6. Lin CH, Lin C (2013) High accuracy approximate multiplier with error correction. In: Proc IEEE 31st International Conference on Computer Design (ICCD). IEEE, pp 33–38
- 7. Momeni A, Han J, Montuschi P, Lombardi F (2014) Design and analysis of approximate compressors for multiplication. IEEE Trans Comput 64(4):984–994
- 8. Rabaey JM, Chandrakasan AP, Nikolić B (2003) Digital integrated circuits: a design perspective, vol 7. Pearson Education, Upper Saddle River, NJ
- 9. Sabetzadeh F, Moaiyeri MH, Ahmadinejad M (2019) A majority-based imprecise multiplier for ultra-efficient approximate image multiplication. IEEE Trans Circuits Syst I Regul Pap 66(11):4200–4208
- 10. Shirinabadi Farahani S, Reshadinezhad MR (2019) A new twelve-transistor approximate 4:2 compressor in CNTFET technology. Int J Electron 106(5):691–706
- 11. Strollo AGM, Napoli E, De Caro D, Petra N, Di Meo G (2020) Comparison and extension of approximate 4–2 compressors for low-power approximate multipliers. IEEE Trans Circuits Syst I Regul Pap 67(9):3021–3034
- 12. Venkatachalam S, Ko SB (2017) Design of power and area efficient approximate multipliers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25(5):1782–1786
- 13. Weste NH, Harris D (2015) CMOS VLSI design: a circuits and systems perspective. Pearson Education India
- 14. Yang Z, Han J, Lombardi F (2015) Approximate compressors for error-resilient multiplier design. In: Proc IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS). IEEE, pp 183–186
- 15. Yi X, Pei H, Zhang Z, Zhou H, He Y (2019) Design of an energy-efficient approximate compressor for error-resilient multiplications. In: Proc IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, pp 1–5


**Jyoti Chintalgiri** completed her M.Tech. in the Department of Electronics and Communication Engineering, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India. Her research interests are Approximate Computing, Image Processing, and Low Power VLSI Architectures.

**K. Saranya** is an Assistant Professor in the Department of EEE at Dr. Mahalingam College of Engineering & Technology, Pollachi, Tamil Nadu, India. She completed her B.E. in the Department of EEE at SKCET, Coimbatore, under Anna University, and her M.E. in the Department of Applied Electronics at Government College of Technology, Coimbatore. Her research interests are VLSI Design, Analog Circuits, Reversible Circuits, and ASIC Implementation.

**Bhaskara Rao Jammu** received his B.Tech. from Andhra University. He received his master's degree from MNNIT, Allahabad, and his PhD from NIT Rourkela. He has authored more than 10 research papers at the national and international level. He worked as an ASIC design engineer at Aricent Technologies from 2006 to 2009. Currently he is working as an associate professor in the Department of ECE, GVP COE(A). He has filed two Indian patents in the field of electronics. His research interests are Approximate Computing, Evolvable Hardware, Biomedical Image Processing, Artificial Intelligence, and FPGA Implementations.

**Sreehari Veeramachaneni** is a faculty member in the Department of Electronics and Communication Engineering at Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India. He received his PhD degree from the International Institute of Information Technology Hyderabad. He is a reviewer for several IEEE, Springer, and Elsevier journals. His research interests include Arithmetic Circuits, Approximate Computing, Hardware Security, Memory Design, Data Converters, Analog VLSI Design, and Low Power VLSI.

**Sk Noor Mahammad** is currently working as faculty in the Indian Institute of Information Technology Design and Manufacturing (IIITDM) Kancheepuram, Chennai, India. He received a Ph.D. in Computer Science and Engineering, Indian Institute of Technology Madras. His research interests are reconfigurable computing, computer architecture, Software for VLSI Design, and Network System Design.