# Efficient Design of Rounding-Based Approximate Multiplier Using Modified Karatsuba Algorithm

E. Jagadeeswara Rao<sup>1</sup> · K. Tarakeswara Rao<sup>2</sup> · K. Sudha Ramya<sup>3</sup> · D. Ajaykumar<sup>4</sup> · R. Trinadh<sup>4</sup>

Received: 12 June 2022 / Accepted: 3 October 2022 / Published online: 17 October 2022 © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

## Abstract

Arithmetic operations play a substantial role in many applications, such as image processing, where multiplication is among the most frequently used operations. Recent Approximate Multiplier (AM) designs improve the design metrics at the cost of the error metrics, or vice versa. To balance the error and design metrics as the width of the input operands increases, a Rounding-based AM (RAM) using a modified Karatsuba algorithm is proposed, in which the number of multiplier blocks is reduced. Small multipliers are used together with shifting and rounding operations to reduce power consumption, delay, and area. Both the prior and proposed AMs are coded in Verilog HDL and synthesized using the Cadence RTL Compiler. The results show that, compared with the prior AMs, the 8-bit proposed RAM reduces delay and area by an average of 61.8% and 52.6%, with an improvement in power-delay product (PDP) of 53.8%, and the 16-bit proposed RAM reduces delay, area, and power by an average of 53.2%, 59.7%, and 25%. The proposed RAM is demonstrated using an image smoothening application: the SSIM and PSNR of the ISFA incorporating the proposed RAM improve by 1.44% to 84.47% and 0.28% to 24.4%, respectively, over the ISFA incorporating the existing AMs.

Keywords Arithmetic operations · Wallace tree multiplier · Karatsuba multipliers

Responsible Editor: J. Han

E. Jagadeeswara Rao emandi.jagadeesh@gmail.com

K. Tarakeswara Rao tkota@gitam.edu

K. Sudha Ramya ramyakarris@gmail.com

D. Ajaykumar ajaybabuji@gmail.com

R. Trinadh thrinath.vrajs@gmail.com

<sup>1</sup> Department of Electronics and Communication Engineering, Vignan's Institute of Engineering for Women, Visakhapatnam, Andhra Pradesh, India

- <sup>2</sup> Department of Electronics and Communication Engineering, GITAM (Deemed to be University), Andhra Pradesh, Visakhapatnam, India
- <sup>3</sup> Department of Electronics and Instrumentation Engineering, Government Polytechnic for Women, Srikakulam, Andhra Pradesh, India
- <sup>4</sup> Department of ECE, West Godavari District, Sir C.R. Reddy College of Engineering, AP 534007 Eluru, India

# 1 Introduction

Arithmetic operations such as multiplication and division play a significant role in many applications related to machine learning and data mining [1]. Approximate versions of these operations introduce some computational errors when used in certain signal processing applications [2]. At the same time, energy minimization is a dominant requirement in any electronic system [3], and many techniques at different levels have been proposed to reduce power or energy consumption [4-11]. Among these techniques, the most in-demand is approximate computing [4], which targets error-resilient applications, that is, applications that can tolerate some loss of accuracy in exchange for lower power consumption [5]. The error resilience arises from the redundancy of input data, iterative computations, and the absence of a single golden output. Approximate designs therefore exploit this resilience to implement such applications efficiently [5].

Recently, energy-efficient AMs have been developed because multipliers affect a processing core's performance [5-19]. The prior AMs provide excellent performance for image processing applications, but they sacrifice accuracy to reduce area, power consumption, or delay as the size of the AMs increases. To address this, an advanced Wallace tree multiplier based on a bit-width-aware multiplication algorithm has been proposed, through which the power consumption and area can be reduced. With this multiplier, power consumption is reduced by up to 39% and area by up to 30% compared to previous AMs [6–20]. Furthermore, these multipliers are also built using counters to maximize the speed, but at the cost of an exponential increase in the use of hardware components. To reduce the hardware usage, another multiplier, the Karatsuba multiplier, is suggested, in which the number of multipliers is reduced [1].

Hence, this paper uses a modified Karatsuba algorithm to design a Rounding-based AM (RAM) efficiently. In the proposed RAM, the higher-order and lower-order n/2 bits are first taken from each of the two n-bit input operands, and the chosen values are then rounded to the nearest power of two. The rounded values are given as input to the shifters and adders to obtain the final multiplication product. In addition, error metrics for accuracy analysis are calculated in terms of Error Distance (ED), Max. Error, Normalized ED (NED), Mean ED (MED), and Mean Relative ED (MRED) [14, 16] for both the proposed RAM and prior AMs. Additionally, an Image Smoothening Filter Application (ISFA) [21] with the proposed RAM and prior AMs is implemented and verified with three standard test images [22], in order to study the corresponding quality metrics in terms of Structural Similarity Index Metric (SSIM) and Peak Signal to Noise Ratio (PSNR) [23].

The main contributions of this paper are enumerated as follows:

- An effective design of rounding-based AM using a modified Karatsuba algorithm, with improved design and error metrics, is proposed.
- The proposed RAM is combined with rounding approaches for enhancing the design metrics.
- Gate-level design for the proposed RAM is provided to estimate the design and error metrics.
- ISFA with prior AMs and proposed RAM are given to calculate the quality metrics.
- The proposed RAM offers improved design and error metrics compared to prior AMs.
- The proposed RAM also improves the image quality of the ISFA compared with the ISFA incorporating prior AMs.

The rest of the paper is organized as follows. Section 2 reviews the previous AMs. Section 3 explains the proposed RAM. Section 4 presents the design metrics, accuracy, and performance of the ISFA. Finally, Sect. 5 concludes the paper.

# 2 Literature Survey

The preceding AMs are reviewed in this section. Jain et al. [1] suggested an error-efficient AM that uses a reduced number of multiplier blocks in the existing Karatsuba algorithm. The suggested AM provides outstanding quality for error-resilient applications, with a small increase in area. Next, in the static-type AM [7], half of each input operand is truncated, and the truncated values are fed into a precise multiplier. As a result, the final approximate product is less than the precise multiplier result. Moreover, the MRED increases with the truncation size of the static-type AM. In the dynamic-type AM, the input operands are truncated to j bits depending on the leading-one-bit position, and the truncated values are applied to a precise multiplier. Since the approximate output is smaller than the precise output, the MRED is negative, which is an undesirable property.

In the energy-efficient static-type AM [7], half of each input operand is truncated using the static-segment method and multiplexers, and the truncated values are fed into the precise multiplier. The energy-efficient static-type AM offers better error and design metrics than the static-type and dynamic-type AMs, although its error increases with the truncation size. Subsequently, in the low-energy truncation-based AM [4], the input operands are truncated and some Partial Products (PPs) are omitted, depending on the k-bit truncation size. As a result, the accuracy of the truncation-based AM is improved compared to the static-type, dynamic-type, and dynamic-range unbiased AMs. Nevertheless, the accuracy and design metrics of the truncation-based AM depend on the truncation size.

After that, the complexity is reduced in the Rounding-Based AM [3], in which the input operands are first rounded to the nearest power of two. The final approximate product is then obtained using adders and shifters; however, the accuracy of the Rounding-Based AM depends on the rounded values of the input operands.
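The rounding idea described for the Rounding-Based AM [3] can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware design: it assumes the commonly cited formulation $A \times B \approx A_r \times B + A \times B_r - A_r \times B_r$, where $A_r$ and $B_r$ are the operands rounded to the nearest power of two, and it assumes that ties are rounded up. The function names `round_to_pow2` and `rounding_multiply` are hypothetical.

```python
def round_to_pow2(x: int) -> int:
    """Round a non-negative integer to the nearest power of two.

    Assumption for this sketch: ties (e.g. 3, 6, 12) round up.
    """
    if x <= 0:
        return 0
    lower = 1 << (x.bit_length() - 1)  # largest power of two <= x
    upper = lower << 1                 # next power of two
    return lower if (x - lower) < (upper - x) else upper


def rounding_multiply(a: int, b: int) -> int:
    """Approximate a*b as Ar*b + a*Br - Ar*Br using only shifts and adds."""
    ar, br = round_to_pow2(a), round_to_pow2(b)
    if ar == 0 or br == 0:
        return 0
    sa = ar.bit_length() - 1  # Ar = 2**sa
    sb = br.bit_length() - 1  # Br = 2**sb
    # Multiplying by a power of two is a left shift, so no multiplier is needed.
    return (b << sa) + (a << sb) - (1 << (sa + sb))


if __name__ == "__main__":
    for a, b in [(5, 7), (4, 9), (12, 13)]:
        print(a, b, "exact:", a * b, "approx:", rounding_multiply(a, b))
```

In this model the term that is dropped is $(A - A_r)(B - B_r)$, so the product is exact whenever either operand is already a power of two.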

The Truncation and Approximate-based Scalable AM [13] approximates and truncates the input operands based on the leading-one-bit position. The final approximate product is obtained using adders and shifters. However, the accuracy of this AM depends on the approximated and truncated values. Bharat Garg et al. [14] suggested a leading-one-bit-based AM, which produces the final approximate product by picking j bits from each n-bit input operand. The accuracy of this AM depends on the leading-one-bit position.

Moreover, a modified static-type AM is designed using a significance-estimator logic circuit. It reduces the design metrics by eliminating the LSBs of the input operands [17]. Nonetheless, its accuracy depends on the input operand size of the AM.

Further, reconfigurable rounding-based AMs round the input operands to the adjacent power of two. The accuracy of this AM depends on the rounded values of the input operands [18]. Shravani Chandaka et al. [19] recommended a performance-enhanced AM that modifies the Wallace-tree multiplier with error-minimized 4-2 approximate compressors for image processing applications, thereby enhancing the design metrics of the suggested AM as the input bit size increases. Honglan Jiang et al. [20] presented a comprehensive survey and a comparative evaluation of recently developed AMs, noting that recent AMs improve the quality of their applications by sacrificing accuracy. From the literature survey, it is observed that the recent AMs provide better accuracy but increase the design metrics. Therefore, the proposed RAM is designed to reduce the design metrics with improved accuracy compared to the prior AMs, and it is described in the following section.

# 3 Proposed Rounding-based AM Using Modified Karatsuba Algorithm

The main objective of the proposed RAM is to reduce the number of multiplier blocks compared to the Karatsuba algorithm [1] and to reduce the circuit complexity compared to the reconfigurable rounding-based AM [18]. The proposed RAM also improves the design metrics as the AM's size increases. Accordingly, the proposed RAM is built from two n/2-bit multiplier units and four adder units, where each n/2-bit multiplier is implemented using a rounding approach. The rounding approach gives better design metrics in compact AMs [3]. The mathematical expressions of the proposed RAM are derived as follows:

Let us outline the multiplication operation of two operands in the binary scientific representation to illustrate its operation.

Two n-bit operands, A (multiplier) and B (multiplicand), can be written as;

$$A = \sum_{i=0}^{n-1} A_i 2^i \tag{1}$$

$$B = \sum_{i=0}^{n-1} B_i 2^i \tag{2}$$

First, let $A_H$ and $A_L$ represent the most and least significant n/2 bits of A, and $B_H$ and $B_L$ represent the most and least significant n/2 bits of B. The values of $A_H$, $B_H$, $A_L$, and $B_L$ in terms of the bits of A and B are determined as follows:

$$A_{H} = \sum_{i=\frac{n}{2}}^{n-1} A_{i} 2^{i-\frac{n}{2}} \tag{3}$$

$$B_H = \sum_{i=\frac{n}{2}}^{n-1} B_i 2^{i-\frac{n}{2}} \tag{4}$$

$$A_L = \sum_{i=0}^{\frac{n}{2}-1} A_i 2^{i} \tag{5}$$

$$B_L = \sum_{i=0}^{\frac{n}{2}-1} B_i 2^{i} \tag{6}$$

Next, the values of X and Y in terms of  $A_L$ ,  $A_H$ ,  $B_L$ , and  $B_H$  are determined as follows:

$$X = A_L + A_H \tag{7}$$

$$Y = B_L + B_H \tag{8}$$
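As a worked illustration of the operand split and of Eqs. (7) and (8), take n = 8 with the arbitrarily chosen operands A = 180 and B = 100. These values are assumptions for illustration only and do not come from the paper's experiments; $A_H$, $A_L$, $B_H$, and $B_L$ are read as the upper and lower four bits of each operand.

$$A = 10110100_2 = 180: \quad A_H = 1011_2 = 11, \quad A_L = 0100_2 = 4, \quad X = A_L + A_H = 15$$

$$B = 01100100_2 = 100: \quad B_H = 0110_2 = 6, \quad B_L = 0100_2 = 4, \quad Y = B_L + B_H = 10$$

Each operand is recovered from its halves as $A = 2^{4} A_H + A_L = 176 + 4$ and $B = 2^{4} B_H + B_L = 96 + 4$.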

$A_H$ and $B_H$ are multiplied using the n/2-bit multiplier, which is a rounding-based multiplier, giving

$$C \simeq A_H \times B_H \tag{9}$$

The outputs of the adders A1 and A2 are multiplied using the n/2-bit multiplier, which is also a rounding-based multiplier, giving

$$D \simeq (A_L + A_H) \cdot (B_L + B_H) \tag{10}$$

Equation (9) is subtracted from Eq. (10). The output is given as

$$E \simeq (A_L + A_H) \cdot (B_L + B_H) - A_H \times B_H \tag{11}$$

The outputs C and E are shifted and given to the final adder to get the final product.

$$\text{Final Product} \simeq 2^{n-1}C + 2^{1}E \tag{12}$$

The proposed RAM architecture is illustrated in Fig. 1. Initially, A and B are divided into $A_H$, $A_L$, $B_H$, and $B_L$ using four n/2-bit separators. Then, $A_H$ and $A_L$ are added using adder A1, and $B_H$ and $B_L$ are added using another adder A2. Next, $A_H$ and $B_H$ are multiplied using Arithmetic Block 1 (AB-1), and $A_H + A_L$ and $B_H + B_L$ are multiplied using AB-2 [3]. The AB-1 output is then subtracted from the AB-2 output using the n-bit subtractor. In the shifter unit, the approximate products C and E are left-shifted by n-1 bits and 1 bit, respectively. Finally, the output of the n-bit proposed RAM is obtained from the final 2n-bit adder, which adds the shifter outputs.

One more contribution of this paper is that the shifters balance the proposed RAM's accuracy. The implementation process of the proposed RAM is discussed in the algorithm form and is given below.

| Algorithm RAM using modified Karatsuba algorithm                                         |
|------------------------------------------------------------------------------------------|
| <b>Input</b> A, B of size n                                                              |
| <b>Output</b> R of size 2n                                                               |
| Require:                                                                                 |
| Wire X, Y of size $n/2$                                                                  |
| Wire $C$, $D$, $E$ of size $n$                                                           |
| $X = A_H + A_L$                                                                          |
| $Y = B_H + B_L$                                                                          |
| $C \simeq A_H \times B_H$ (Multiplication done using rounding approach)                  |
| $D \simeq (A_L + A_H) \cdot (B_L + B_H)$ (Multiplication done using rounding approach)   |
| $E \simeq D - C$                                                                         |
| $R \simeq 2^{n-1}C + 2^{1}E$                                                             |
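The data flow of the algorithm listing can be mirrored step by step in a short Python model. This is a behavioural sketch under stated assumptions, not the authors' Verilog: the function name `ram_multiply` and the pluggable `mul` argument are hypothetical, `mul` stands in for the n/2-bit rounding-based multiplier block (the exact product is used as a placeholder default), and the shift amounts are taken verbatim from the listing.

```python
from operator import mul as exact_mul
from typing import Callable


def ram_multiply(a: int, b: int, n: int = 8,
                 mul: Callable[[int, int], int] = exact_mul) -> int:
    """Follow the listing: split, add, two block multiplications, subtract, shift, add.

    `mul` models the n/2-bit multiplier block. The default is the exact
    product, used here only as a placeholder for a rounding-based block.
    """
    if n % 2 or not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values with n even")
    half = n // 2
    mask = (1 << half) - 1
    a_h, a_l = a >> half, a & mask  # upper and lower n/2 bits of A
    b_h, b_l = b >> half, b & mask  # upper and lower n/2 bits of B
    x = a_h + a_l                   # adder A1
    y = b_h + b_l                   # adder A2
    c = mul(a_h, b_h)               # AB-1
    d = mul(x, y)                   # AB-2
    e = d - c                       # subtractor
    return (c << (n - 1)) + (e << 1)  # R = 2^(n-1) * C + 2^1 * E


if __name__ == "__main__":
    print(ram_multiply(180, 100, n=8))
```

Passing a rounding-based function as `mul` replaces both block multiplications at once, which is how the two arithmetic blocks of the listing would be approximated in this model.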

# 4 Comparison of Simulation Results

The architectures of the proposed RAM and the prior AMs [1-4, 6, 13, 18], with input operand sizes ranging from 8 to 32 bits, are considered. They are first coded in Verilog HDL and then synthesized with the Cadence RTL Compiler using 90 nm CMOS technology, with a supply voltage of 1 V and an operating frequency of 1 GHz, for design metrics analysis. The structures and features of the n-bit prior AMs are summarized in Table 1, where the design features are given in terms of truncation length (TL) and rounding length (RL). The values of TL and RL are common to all prior AMs with input widths from 8 to 32 bits.

Furthermore, this section presents the performance analysis in terms of delay, Power-Delay Product (PDP), area, and power. The accuracy of the proposed RAM is then analyzed using error metrics such as WCE, MRED, NED, ED, and MED. The proposed RAM and the prior AMs are also compared in terms of the error and design metrics. Finally, the quality metrics SSIM and PSNR are examined by including the proposed RAM and prior AMs in the ISFA to demonstrate the suitability of this filter for image processing applications.

## 4.1 Performance Analysis

The performance analysis is done in terms of design metrics. The synthesized results of the 8-bit proposed and existing AMs in terms of delay, area, PDP, and power are shown in Table 2.

Table 1 Structures and Features of Prior AM Designs

| Structures                                        | Features: TL | Features: RL | Notation for prior AMs |
|---------------------------------------------------|--------------|--------------|------------------------|
| Existing Karatsuba algorithm [1]                  |              |              | AM1                    |
| Dynamic range unbiased AM [2]                     | 6            |              | AM2                    |
| Rounding-Based AM [3]                             |              | N            | AM3                    |
| Low Energy Truncation-based AM [4]                | 3            |              | AM4                    |
| Dynamic-Type AM [6]                               | n/2          |              | AM5                    |
| Truncation and Approximate-based Scalable AM [13] | 7            | 3            | AM6                    |
| Reconfigurable rounding-based AM [18]             |              | N            | AM7                    |

Table 2 Design metrics evaluation of 8-bit existing and proposed AM's

| AM Design | Area (µm<sup>2</sup>) | Delay (ns) | Power (mW) | PDP (fJ) |
|-----------|-----------------------|------------|------------|----------|
| AM1[1]    | 1330                  | 4.03       | 0.041      | 165      |
| AM2[2]    | 2017                  | 4.21       | 0.021      | 88       |
| AM3[3]    | 1461                  | 6.54       | 0.092      | 602      |
| AM4[4]    | 1047                  | 5.24       | 0.022      | 115      |
| AM5[6]    | 1686                  | 4.42       | 0.041      | 181      |
| AM6[13]   | 741                   | 6.89       | 0.082      | 565      |
| AM7[18]   | 2011                  | 6.52       | 0.115      | 750      |
| Proposed  | 624                   | 1.98       | 0.056      | 111      |

From Table 2, it is found that:

- Delay and area are reduced on average by 61.8% and 52.6% for the 8-bit proposed RAM compared to the existing AMs.
- Power is improved by approximately 40.8% for the proposed 8-bit RAM in comparison with the existing AM3, AM6, and AM7.
- PDP is improved on average by 53.8% compared to AM1 and AM3-AM7.
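The averages quoted above can be cross-checked directly from the Table 2 values. The sketch below assumes that each average is the arithmetic mean of the per-design percentage reductions, which the text does not state explicitly; the helper name `avg_reduction` is hypothetical. Under that assumption the results land within a few tenths of a percentage point of the quoted figures.

```python
# Table 2 (8-bit): design -> (area um^2, delay ns, power mW, PDP fJ)
TABLE2 = {
    "AM1": (1330, 4.03, 0.041, 165),
    "AM2": (2017, 4.21, 0.021, 88),
    "AM3": (1461, 6.54, 0.092, 602),
    "AM4": (1047, 5.24, 0.022, 115),
    "AM5": (1686, 4.42, 0.041, 181),
    "AM6": (741, 6.89, 0.082, 565),
    "AM7": (2011, 6.52, 0.115, 750),
}
PROPOSED = (624, 1.98, 0.056, 111)
AREA, DELAY, POWER, PDP = range(4)


def avg_reduction(metric: int, baselines) -> float:
    """Mean of the per-design percentage reductions achieved by the proposed RAM.

    Assumption: the reported averages are plain means over the listed designs.
    """
    cuts = [100.0 * (1.0 - PROPOSED[metric] / TABLE2[name][metric])
            for name in baselines]
    return sum(cuts) / len(cuts)


if __name__ == "__main__":
    every_am = list(TABLE2)
    print(f"delay: {avg_reduction(DELAY, every_am):.1f}%")
    print(f"area : {avg_reduction(AREA, every_am):.1f}%")
    print(f"power: {avg_reduction(POWER, ['AM3', 'AM6', 'AM7']):.1f}%")
    print(f"PDP  : {avg_reduction(PDP, ['AM1', 'AM3', 'AM4', 'AM5', 'AM6', 'AM7']):.1f}%")
```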

Furthermore, the synthesized results of the 16-bit proposed and existing AMs in terms of delay, area, PDP, and power are shown in Table 3.

From Table 3, it is observed that:

- Area and delay are reduced on average by 59.7% and 53.2% for the 16-bit proposed RAM compared to the existing AMs.
- The PDP of the 16-bit proposed RAM is improved by approximately 67.2% compared to that of the existing AM1, AM3, AM4, AM6, and AM7.
- Power is reduced on average by 25% for the 16-bit proposed RAM compared to the existing AM1, AM3, AM6, and AM7.

Table 3 Design metrics evaluation of 16-bit existing and proposed AM's

From the performance analysis, the proposed RAM achieves reduced area, delay, and power compared to AM1.

## 4.2 Accuracy Analysis

Error metrics are used to estimate the accuracy of the proposed RAM, and the error metrics of the proposed RAM are compared with those of the existing AMs. For this, 10 million random input patterns are used to simulate the AMs in Verilog HDL, and the error metrics are computed in MATLAB. The accuracy of the AMs is measured using WCE, ED, MRED, NED, and MED [14, 16]. The WCE is the maximum error at the AM output over the 10 million applied sample values, which is relevant when multiplying larger values. The error metrics for the 8-bit and 16-bit existing and proposed AMs are presented in Tables 4 and 5.

From Tables 4 and 5, it is found that:

- The ED of the 8-bit and 16-bit proposed RAM is decreased by nearly 85.89% to 30.32%, respectively, compared with the existing AMs.
- The accuracy metrics MED, WCE, NED, and MRED obtained with the 8-bit and 16-bit proposed RAM are improved by approximately 74.25% to 23.69%, 82.71% to 2.52%, 23.25% to 15.25%, and 75.23% to 24.44%, respectively, compared with those of the existing AMs.
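The error metrics named above can be computed with a small Python routine. This is a sketch using commonly used definitions, which are assumptions here because the text does not spell them out: ED is the absolute difference between exact and approximate products, MED is its mean, WCE is its maximum, MRED is the mean of ED divided by the exact product (pairs with a zero product are skipped), and NED is MED divided by the largest possible n-bit product. The function name `error_metrics` is hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple


def error_metrics(approx_mul: Callable[[int, int], int],
                  pairs: Iterable[Tuple[int, int]],
                  n: int) -> Dict[str, float]:
    """Return WCE, MED, NED and MRED of `approx_mul` over the given operand pairs."""
    max_product = ((1 << n) - 1) ** 2  # assumed normalizer for NED
    eds, reds = [], []
    for a, b in pairs:
        exact = a * b
        ed = abs(exact - approx_mul(a, b))  # error distance for this pair
        eds.append(ed)
        if exact != 0:                      # relative error is undefined for 0
            reds.append(ed / exact)
    if not eds:
        raise ValueError("at least one operand pair is required")
    med = sum(eds) / len(eds)
    return {
        "WCE": max(eds),
        "MED": med,
        "NED": med / max_product,
        "MRED": sum(reds) / len(reds) if reds else 0.0,
    }


if __name__ == "__main__":
    # Toy approximate multiplier (illustrative only): drop the LSB of a.
    toy = lambda a, b: (a & ~1) * b
    all_pairs = [(a, b) for a in range(16) for b in range(16)]
    print(error_metrics(toy, all_pairs, n=4))
```

For 8-bit designs all 65,536 operand pairs can be enumerated exhaustively; for wider operands a random sample of pairs, as in the 10 million patterns mentioned above, would be passed instead.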

## 4.3 Quality Analysis of ISFA

Further, the AMs are incorporated into the ISFA and verified with standard images, and the quality metrics SSIM and PSNR are examined. In the ISFA, a convolution between a sub-matrix of the input image and the standard smoothing kernel [22] produces the smoothed pixel. The AMs are used for the multiplications in the convolution process, while the additions are performed using exact adders. PSNR and SSIM [23] are used to evaluate the performance of the ISFA incorporating the AMs.

| AM Design | Area (µm<sup>2</sup>) | Delay (ns) | Power (mW) | PDP (fJ) |
|-----------|-----------------------|------------|------------|----------|
| AM1[1]    | 3788                  | 6.16       | 0.138      | 850      |
| AM2[2]    | 3605                  | 5.65       | 0.022      | 124      |
| AM3[3]    | 6739                  | 8.54       | 0.152      | 1298     |
| AM4[4]    | 1446                  | 5.73       | 0.024      | 138      |
| AM5[6]    | 2278                  | 7.06       | 0.031      | 219      |
| AM6[13]   | 2092                  | 6.89       | 0.191      | 1316     |
| AM7[18]   | 4106                  | 9.22       | 0.314      | 2895     |
| Proposed  | 1111                  | 3.19       | 0.135      | 431      |

Table 4 Error metrics evaluation between 8-bit existing and proposed AMs

| AM Design | ED        | WCE   | NED    | MED      | MRED     |
|-----------|-----------|-------|--------|----------|----------|
| AM1[1]    | 994394846 | 50124 | 0.3107 | 1.53E+04 | 3.12E-06 |
| AM2[2]    | 443952900 | 63098 | 0.1083 | 6.83E+03 | 8.01E-06 |
| AM3[3]    | 823636708 | 62158 | 0.2038 | 1.27E+04 | 2.37E-05 |
| AM4[11]   | 443952900 | 63098 | 0.1083 | 6.83E+03 | 8.02E-06 |
| AM5[7]    | 319955568 | 62106 | 0.0793 | 4.92E+03 | 5.04E-06 |
| AM6[19]   | 325889758 | 62950 | 0.0797 | 5.01E+03 | 5.16E-06 |
| AM7[13]   | 458567992 | 16376 | 0.4315 | 7.05E+08 | 5.01E-06 |
| Proposed  | 403015756 | 49154 | 0.1484 | 1.25E+03 | 2.47E-06 |
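The filtering step described above (approximate multiplications, exact additions) and the PSNR measure can be sketched as follows. Several details are assumptions for illustration: the 3×3 kernel with weights summing to 16 is a common Gaussian approximation and is not necessarily the kernel of [22], border pixels are simply copied, images are plain lists of 8-bit grayscale rows, and the names `smooth` and `psnr` are hypothetical. SSIM is omitted for brevity.

```python
import math
from operator import mul as exact_mul

# Assumed 3x3 Gaussian-like kernel; the weights sum to 16.
KERNEL = ((1, 2, 1),
          (2, 4, 2),
          (1, 2, 1))


def smooth(image, mul=exact_mul):
    """Smooth an 8-bit grayscale image; `mul` models the (approximate) multiplier."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels are copied unchanged
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            acc = 0
            for dr in range(3):
                for dc in range(3):
                    # The multiplication may be approximate; the addition is exact.
                    acc += mul(image[r + dr - 1][c + dc - 1], KERNEL[dr][dc])
            out[r][c] = min(255, max(0, acc >> 4))  # divide by 16 and clamp
    return out


def psnr(reference, test):
    """Peak signal-to-noise ratio in dB for 8-bit images of equal size."""
    count = len(reference) * len(reference[0])
    mse = sum((p - q) ** 2
              for ref_row, test_row in zip(reference, test)
              for p, q in zip(ref_row, test_row)) / count
    return math.inf if mse == 0 else 10.0 * math.log10(255 ** 2 / mse)


if __name__ == "__main__":
    img = [[(17 * r + 31 * c) % 256 for c in range(8)] for r in range(8)]
    exact_out = smooth(img)
    toy_out = smooth(img, mul=lambda p, w: (p & ~1) * w)  # toy approximate multiplier
    print("PSNR vs exact filter:", psnr(exact_out, toy_out))
```

Comparing the output of the filter with an approximate `mul` against the output with the exact multiplier is one way to obtain PSNR figures of the kind reported for the ISFA.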

Moreover, the performance of the ISFA incorporating the proposed RAM and the existing AMs is examined with the quality metrics. Three standard images [22] are used for this evaluation, and Table 6 lists the computed quality metrics.

Table 6 shows that the SSIM and PSNR of the ISFA incorporating the proposed RAM are improved in the range of 1.44% to 84.47% and 0.28% to 24.4% over the ISFA incorporating the existing AMs. Table 6 also provides the performance analysis based on PSNR and SSIM values for the different filtered images.

Finally, the Gaussian smoothening operation is performed using all AMs for the multiplications, while the addition and subtraction operations are exact. Figure 2 shows the ISFA images obtained using the proposed RAM and existing AMs for three standard images of $256 \times 256$ size. The evaluation of design, error, and quality metrics shows that the proposed RAM's performance depends on the rounding value of the input operand.

Table 5 Error metrics evaluation between 16-bit existing and proposed AMs

| AM Design | ED         | WCE      | NED      | MED      | MRED     |
|-----------|------------|----------|----------|----------|----------|
| AM1[1]    | 9282551686 | 47498    | 0.3005   | 2.42E+04 | 3.67E-05 |
| AM2[2]    | 6269131273 | 41293    | 0.1849   | 2.39E+02 | 6.08E-08 |
| AM3[3]    | 1.33E+11   | 4.29E+09 | 1.18E-04 | 5.08E+05 | 3.71E-06 |
| AM4[11]   | 718644558  | 22418    | 0.1223   | 2.75E+03 | 7.86E-07 |
| AM5[7]    | 626913473  | 41293    | 0.1849   | 2.39E+02 | 6.08E-08 |
| AM6[19]   | 1.72E+12   | 3.76E+09 | 0.0018   | 6.57E+06 | 3.81E-06 |
| AM7[13]   | 7.03E+14   | 9.82E+04 | 1        | 1.08E+04 | 1.54E-05 |
| Proposed  | 845672378  | 40254    | 0.1408   | 2.27E+03 | 3.47E-06 |

Table 6 Performance analysis of RAM using proposed and existing designs for ISFA

| Image    | Cameraman |      | Lena  |      | Girl  |      |
|----------|-----------|------|-------|------|-------|------|
| Metrics  | SSIM      | PSNR | SSIM  | PSNR | SSIM  | PSNR |
| Proposed | 0.812     | 36.1 | 0.801 | 35.8 | 0.882 | 35.2 |
| AM1[1]   | 0.799     | 35.4 | 0.781 | 34.1 | 0.802 | 34.3 |
| AM2[2]   | 0.649     | 34.2 | 0.592 | 33.1 | 0.839 | 34.5 |
| AM3[3]   | 0.798     | 35.8 | 0.808 | 35.5 | 0.872 | 34.5 |
| AM4[4]   | 0.756     | 35.9 | 0.799 | 36.0 | 0.862 | 34.5 |
| AM5[6]   | 0.648     | 33.9 | 0.618 | 33.3 | 0.677 | 30.5 |
| AM6[13]  | 0.472     | 32.2 | 0.472 | 33.2 | 0.776 | 33   |
| AM7[18]  | 0.752     | 30.4 | 0.795 | 30.3 | 0.799 | 31.1 |

**Fig. 2** ISFA with proposed RAM for **a** Cameraman.jpg, **b** Leena.jpg, and **c** Girl.jpg

The proposed RAM provides the best performance with only a small width of the rounding value.

# 5 Conclusion

This paper presents the design of a rounding-based AM (RAM) using a modified Karatsuba algorithm. The accuracy of the proposed RAM is high compared to existing AMs, and it is also efficient in terms of power, area, and delay. The proposed RAM attains better MED, WCE, MRED, NED, and ED values. The 8-bit RAM reduces area, delay, and PDP by approximately 52.6%, 61.8%, and 53.8% compared to prior AMs, and the 16-bit RAM reduces area, delay, and PDP by nearly 59.7%, 53.2%, and 67.2%, respectively. The proposed RAM is also analyzed for the ISFA. Moreover, the proposed RAM architecture can be extended to real-time ISFA.

**Data Availability** Data sharing does not apply to this article as no datasets were generated or analyzed during the study of the proposed and existing AMs.

#### Declarations

**Conflict of Interest** I certify that this article has no actual or potential conflict of interest.

**Competing Interests** The authors declare that they have no known competing financial interests or personal relationships that could influence the work reported in this paper.

### References

1. Jain R, Pandey N (2021) Approximate Karatsuba multiplier for error-resilient applications. AEU-Int J Electron Commun 130:153–579
2. Hashemi S, Bahar RI, Reda S (2015) DRUM: A dynamic range unbiased multiplier for approximate applications. In: Proc. of IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, pp. 418–25
3. Zendegani R, Kamal M, Bahadori M, Afzali-Kusha A, Pedram M (2016) RoBA multiplier: A rounding-based approximate multiplier for high-speed yet energy-efficient digital signal processing. IEEE Transact Very Large Scale Integr (VLSI) Syst 25(2):393–401
4. Akhlaghi V, Gao S, Gupta RK (2018) Lemax: learning-based energy consumption minimization in approximate computing with quality guarantee. In: Proc. of the 55th Annual Design Automation Conference. IEEE, pp. 1–6
5. Babič Z, Avramovič A, Bulič P (2008) An iterative Mitchell's algorithm based multiplier. In: Proc. of IEEE International Symposium on Signal Processing and Information Technology. IEEE, pp. 303–308
6. Bhardwaj K, Mane PS, Henkel J (2014) Power-and area-efficient approximate Wallace tree multiplier for error-resilient systems. In: Proc. of Fifteenth International Symposium on Quality Electronic Design. IEEE, pp. 263–269
7. Garg B, Sharma GK (2017) ACM: An energy-efficient accuracy configurable multiplier for error-resilient applications. J Electron Test 33(4):479–489
8. Kundi DES, Bian S, Khalid A, Wang C, O'Neill M, Liu W (2020) AxMM: Area and power efficient approximate modular multiplier for R-LWE cryptosystem. In: Proc. of IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, pp. 1–5
9. Leon V, Zervakis G, Soudris D, Pekmestzi K (2017) Approximate hybrid high radix encoding for energy-efficient inexact multipliers. IEEE Transact Very Large Scale Integr (VLSI) Syst 26(3):421–430
10. Pandey D, Singh S, Mishra V, Satapathy S, Banerjee DS (2021) SAM: A Segmentation based Approximate Multiplier for Error Tolerant Applications. In: Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, pp. 1–5
11. Vahdat S, Kamal M, Afzali-Kusha A, Pedram M (2017) LETAM: A low energy truncation-based approximate multiplier. Comput Electr Eng 63:1–17
12. Chandaka S, Narayanam B (2022) Hardware Efficient Approximate Multiplier Architecture for Image Processing Applications. J Electron Test 38:217–230
13. Garg B, Patel S (2021) Reconfigurable Rounding Based Approximate Multiplier for Energy-Efficient Multimedia Applications. Wireless Personal Commun 118:919–931
14. Garg B, Patel SK, Dutt S (2020) Loba: A leading one bit based imprecise multiplier for efficient image processing. J Electron Test 36(3):429–437
15. Gorantla A, Deepa P (2019) Design of approximate adders and multipliers for error tolerant image processing. Microprocess Microsyst 72:1–7
16. Jothin R, Vasanthanayaki C (2018) High-Performance Modified Static Segment Approximate Multiplier based on Significance Probability. J Electron Test 34:607–614
17. Liang J, Han J, Lombardi F (2013) New metrics for the reliability of approximate and probabilistic adders. IEEE Transact Computer 62(9):1760–1771
18. Moons B, Verhelst M (2015) Dvas: Dynamic voltage accuracy scaling for increased energy-efficiency in approximate computing. In: Proc. of IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). IEEE, pp. 237–242
19. Vahdat S, Kamal M, Afzali-Kusha A, Pedram M (2019) TOSAM: An energy-efficient truncation-and rounding-based scalable approximate multiplier. IEEE Transact Very Large Scale Integr (VLSI) Systems 27(5):1161–1173
20. Jiang H, Santiago FJH, Mo H, Liu L, Han J (2020) Approximate Arithmetic Circuits: A Survey, Characterization, and Recent Applications. In: Proc. of the IEEE, 108(12):2108–2135
21. Myler HR, Weeks AR (2009) The Pocket Handbook of Image Processing Algorithms in C. Englewood Cliffs, NJ, and USA: Prentice-Hall
22. Garg B, Sharma G (2016) A quality-aware energy-scalable Gaussian smoothing filter for image processing applications. Microprocess Microsyst 45:1–9
23. Wang Z, Bovik A, Sheikh H, Simoncelli E (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612

**Publisher's Note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

**E. Jagadeeswara Rao** received his B. Tech. and M. Tech. degrees in Electronics and Communication Engineering from JNTU University, Kakinada, AU, Visakhapatnam, India, in 2010 and 2015. He is a part-time Research Scholar at Pondicherry University, Pondicherry, India and an Assistant professor at Vignan's Institute of Engineering for Women in the Department of Electronics and Communication Engineering, Visakhapatnam, Andhra Pradesh, India. His areas of research include Approximate Multipliers, and Efficient Design of Arithmetic Elements. He has 9 years of teaching and 1 year of Industrial experience. He has supervised and guided various projects at undergraduate and graduate levels.

**K. Tarakeswara Rao** received his B. Tech. degree in Electronics and Communications Engineering from JNTUH, India, in 2005 and M.E degree in Electronics and Communication Engineering from Andhra University Visakhapatnam, India, in 2009. He is a faculty and Research scholar at GITAM (Deemed to be University). His areas of research include VLSI and biomedical signal processing. He has 13 years of teaching experience. He has supervised and guided various projects at undergraduate and graduate levels.

**K. Sudha Ramya** received her master's in Electronics and Instrumentation from Andhra University in 2009. She submitted her PhD (part-time) thesis to Andhra University, Instrument Technology Department, in 2021. In 2012, she was appointed a lecturer in EIE, through APPSC, at Government Polytechnic for Women, Srikakulam and cleared UGC-NET in Electronic Science in 2020. She has 17 years of experience teaching and guiding projects at the undergraduate level.

**D. Ajaykumar** is working as Assistant Professor at Sir C. R. Reddy College of Engineering in the Department of Electronics and Communication Engineering, West Godavari, Andhra Pradesh, India. His areas of research include Approximate Multipliers, and Efficient Design of Arithmetic Elements. He has supervised and guided various projects at undergraduate and graduate levels.

**R. Trinadh** is working as Assistant Professor at Sir C. R. Reddy College of Engineering in the Department of Electronics and Communication Engineering, West Godavari, Andhra Pradesh, India. His areas of research include Approximate Multipliers, and Efficient Design of Arithmetic Elements. He has supervised and guided various projects at undergraduate and graduate levels.