## True Minimum Energy Design Using Dual Below-Threshold Supply Voltages

Kyungseok Kim and Vishwani D. Agrawal Department of ECE, Auburn University, Auburn, AL 36849, USA kyungkim@auburn.edu, vagrawal@eng.auburn.edu

Abstract—This paper investigates subthreshold voltage operation of digital circuits. The minimum energy per cycle operating point with a single voltage for this mode is known. We further lower the energy per cycle below that point by using dual subthreshold supplies. We call this the true minimum. Special considerations are used in the design for eliminating level converters. We give new mixed integer linear programs (MILP) that automatically and optimally assign gate voltages, avoid the use of level converters, and determine and hold the minimum critical path delay, while minimizing the total energy per cycle. Using examples of a 16-bit ripple-carry adder and a  $4 \times 4$  multiplier we show energy savings of 23% and 5%, respectively. The latter is a worst case example because most paths are critical. Alternatively, for the same energy as that of single below-threshold supply, an optimized dual voltage design can operate at 3 to 4 times higher clock rate. The MILP optimization with special consideration for level converters is general and applicable to any supply voltage range.

### I. INTRODUCTION

Ultra-low power applications such as micro-sensor networks, pacemakers, and many portable devices operate under severe energy constraints to achieve long battery lifetime. Subthreshold operation suits such energy-constrained applications because it offers very low energy consumption at low to medium clock frequencies [6], [13], [14], [15].

As the power supply voltage  $(V_{dd})$  is scaled below the device threshold voltage  $(V_{th})$ , the subthreshold leakage currents charge and discharge load capacitances according to the logic function of the circuit. This weak driving current inherently limits application only to low performance systems. Dynamic voltage scaling (DVS) can provide useful system applications by switching between a highly energy efficient subthreshold  $V_{dd}$  mode and a normal above-threshold  $V_{dd}$  mode. The normal or subthreshold mode may be chosen according to the workload of the system [1]. To exploit the time slack on non-critical paths, some designs use dual voltages within a mode. Although dual voltage operation for above threshold  $V_{dd}$  has been studied [12], below-threshold dual voltages have not been examined until the work presented here.

A small energy increase from the absolute minimum energy point of a subthreshold circuit can notably improve performance [3]. Utilizing the time slack for dual $V_{dd}$ assignments can give valuable energy saving with small extra cost in physical design. To the best of our knowledge, this paper is the first to present a dual $V_{dd}$ scheme for subthreshold logic circuits that reaches the true minimum energy point, an improvement over the known minimum energy operating point. Our contribution provides a framework for finding the optimal dual $V_{dd}$ assignments in subthreshold circuits with a given speed requirement. The design procedure formulates mixed integer linear programs (MILP) that, given today's computing capabilities, can deal with moderately large circuit complexity [4].

In a dual voltage circuit, signal level converters are considered essential. Level converters insert delays and consume power [7], [16]. In the absence of level converters, certain interfaces become unsatisfactory. Especially, driving a high  $V_{dd}$  gate with a low voltage signal presents problems of high leakage and long delay. We characterize the multi-level interfaces and our MILP contains constraints to eliminate voltage converters.

The paper is organized as follows. Section II introduces properties of subthreshold operating circuits with key terms. In Section III, we extend the existing dual $V_{dd}$ techniques of above-threshold operation, clustered voltage scaling (CVS) [11] and extended-CVS (ECVS) [12], to the subthreshold regime. New MILP solutions are presented in Section IV. Section V reports SPICE simulation results to validate the MILP solutions. Finally, Section VI concludes the paper.

### II. SUBTHRESHOLD CIRCUITS

Before optimizing the minimum energy of subthreshold circuits by dual- $V_{dd}$  assignments, we briefly summarize the properties of subthreshold circuits in terms of functional operation and failure, performance, and energy in this section.

### A. Minimum Operating Voltage

For the correct functional operation of a subthreshold logic circuit, the supply voltage  $V_{dd}$  should be higher than a certain minimum voltage  $(V_{min})$ . The theoretical  $V_{min}$  is given as [17],

$$V_{min} = 2 \cdot V_T \cdot \ln \left( 1 + \frac{S}{\ln 10 \cdot V_T} \right) \tag{1}$$

where  $V_T=kT/q$  is the thermal voltage,  $k=1.381 \times 10^{-23}$  J/K is Boltzmann's constant, T is absolute temperature in Kelvin,  $q=1.602\times 10^{-19}$  C is electronic charge and S is the subthreshold swing. From [5], S is degraded with the downscaling trend of the CMOS technology, which means that the reduced ratio of on-current  $I_{on}$  at  $V_{gs}=V_{ds}=V_{dd}$  to off-current  $I_{off}$  at  $V_{gs}=0$  and  $V_{ds}=V_{dd}$  in subthreshold region ( $V_{dd}< V_{th}$ ) causes smaller noise margins and possible functional logic failures at or below  $V_{min}$ .
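As a numerical illustration of (1), the following minimal Python sketch evaluates $V_{min}$ at 300 K. The physical constants are those quoted above; the swing values and the function names are hypothetical and chosen only for illustration.

```python
import math

K_BOLTZMANN = 1.381e-23  # J/K, value quoted in the text
Q_ELECTRON = 1.602e-19   # C, value quoted in the text


def thermal_voltage(temp_kelvin):
    """Thermal voltage V_T = kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def v_min(swing, temp_kelvin=300.0):
    """Equation (1): swing S in volts per decade, result in volts."""
    v_t = thermal_voltage(temp_kelvin)
    return 2.0 * v_t * math.log(1.0 + swing / (math.log(10.0) * v_t))


if __name__ == "__main__":
    # Hypothetical subthreshold swing values in mV/decade.
    for s_mv in (60, 80, 100):
        print(f"S = {s_mv} mV/dec -> V_min = {1e3 * v_min(s_mv * 1e-3):.1f} mV")
```

For an ideal swing $S = \ln 10 \cdot V_T$, (1) reduces to $V_{min} = 2 V_T \ln 2$, roughly 36 mV at 300 K; a degraded (larger) $S$ raises this bound.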

### B. Delay

The delay of a gate in a subthreshold circuit can be simply formulated from the gate delay equation [5],

$$t_d = \frac{K \cdot C_L \cdot V_{dd}}{I_{on}} \tag{2}$$

where K is a fitting parameter and $C_L$ is the load capacitance of the gate. In the subthreshold region, $I_{on}$ is replaced by the subthreshold drain current $(I_{sub})$ [14],

$$I_{sub} = I_o \cdot 10^{\left(\frac{V_{gs} - V_{th} + \eta V_{ds}}{S}\right)} \cdot \left(1 - e^{\frac{-V_{ds}}{V_T}}\right) \tag{3}$$

where  $\eta$  is drain-induced barrier lowering (DIBL) coefficient and  $I_o$  is drain current at  $V_{gs}=V_{th}$ . When  $V_{gs}=V_{ds}=V_{dd}\gg V_T$ , we get gate delay as,

$$t_d = \frac{K \cdot C_L \cdot V_{dd}}{I_o \cdot 10^{\left(\frac{(\eta+1)V_{dd} - V_{th}}{S}\right)}}. \tag{4}$$

Thus,  $t_d$  is exponentially dependent on  $V_{dd}$ ,  $V_{th}$ ,  $\eta$ , and S.
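To make the exponential sensitivity concrete, the sketch below codes (4) and the delay ratio between two supply voltages that follows from it. In the ratio, $K$, $C_L$, $I_o$ and $V_{th}$ cancel, leaving only $\eta$ and $S$. All parameter values are hypothetical placeholders, not characterized data from this paper.

```python
def gate_delay(v_dd, c_load, k_fit, i_o, v_th, eta, swing):
    """Equation (4): gate delay in seconds, valid for V_dd well above V_T."""
    exponent = ((eta + 1.0) * v_dd - v_th) / swing
    return k_fit * c_load * v_dd / (i_o * 10.0 ** exponent)


def delay_ratio(v_low, v_high, eta, swing):
    """t_d(v_low) / t_d(v_high) from (4); K, C_L, I_o and V_th cancel."""
    return (v_low / v_high) * 10.0 ** ((eta + 1.0) * (v_high - v_low) / swing)


if __name__ == "__main__":
    # Hypothetical device parameters, for illustration only.
    params = dict(c_load=1e-15, k_fit=1.0, i_o=1e-7, v_th=0.29, eta=0.1, swing=0.09)
    slow = gate_delay(0.20, **params)
    fast = gate_delay(0.25, **params)
    print(f"t_d(0.20 V) / t_d(0.25 V) = {slow / fast:.2f}")
    print(f"closed-form ratio         = {delay_ratio(0.20, 0.25, 0.1, 0.09):.2f}")
```

The two printed numbers agree by construction; the point of the closed form is that a fixed voltage step changes the delay by a fixed multiplicative factor.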

### C. Energy

Energy per cycle of a circuit is a key parameter for energy efficiency in ultra-low power applications. Because computing workload is characterized in terms of clock cycles, this measure directly relates energy consumption to the workload. Before considering the energy consumed by a circuit, we start by examining the total energy per cycle  $(E_{tot})$  of a single gate, which is composed of dynamic energy  $(E_{dyn})$  and leakage energy  $(E_{leak})$ :

$$E_{dyn} = \alpha_{0 \to 1} \cdot C_L \cdot V_{dd}^2 \tag{5a}$$

$$\begin{aligned} E_{leak} &= P_{leak} \cdot t_d \\ &= I_{off} \cdot V_{dd} \cdot t_d \\ &= K \cdot C_L \cdot V_{dd}^2 \cdot 10^{\frac{-V_{dd}}{S}} \end{aligned} \tag{5b}$$

$$\begin{aligned} E_{tot} &= E_{dyn} + E_{leak} \\ &= \left(\alpha_{0 \to 1} + K \cdot 10^{\frac{-V_{dd}}{S}}\right) \cdot C_L \cdot V_{dd}^2 \end{aligned} \tag{5c}$$

where  $\alpha_{0\to 1}$  is the low to high transition activity for the gate output node and  $P_{leak}$  is static leakage power.
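The trade-off between (5a) and (5b) can be explored with a short sweep. The sketch below evaluates (5c) on a voltage grid and reports the grid point with the lowest energy. The activity factor 0.21 is the adder value used later in the paper; $K$, $C_L$, $S$ and the sweep range are hypothetical, chosen only so that the leakage term is large enough to produce an interior minimum, and they are not fitted to any circuit in this paper.

```python
def e_dyn(v_dd, alpha, c_load):
    """Equation (5a): dynamic energy per cycle."""
    return alpha * c_load * v_dd ** 2


def e_leak(v_dd, k_fit, c_load, swing):
    """Equation (5b): leakage energy per cycle."""
    return k_fit * c_load * v_dd ** 2 * 10.0 ** (-v_dd / swing)


def e_tot(v_dd, alpha, k_fit, c_load, swing):
    """Equation (5c): total energy per cycle."""
    return e_dyn(v_dd, alpha, c_load) + e_leak(v_dd, k_fit, c_load, swing)


ALPHA = 0.21     # activity factor of the adder example
K_FIT = 50.0     # hypothetical fitting parameter
C_LOAD = 1e-15   # hypothetical load capacitance in farads
SWING = 0.09     # hypothetical subthreshold swing in V/decade

# Sweep 0.100 V to 0.400 V in 5 mV steps (integer mV avoids float drift).
GRID = [mv / 1000.0 for mv in range(100, 401, 5)]
V_BEST = min(GRID, key=lambda v: e_tot(v, ALPHA, K_FIT, C_LOAD, SWING))

if __name__ == "__main__":
    print(f"lowest-energy grid point: {V_BEST:.3f} V")
```

Below the reported point the leakage term dominates because the cycle stretches exponentially; above it the quadratic dynamic term dominates.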

### III. DUAL- $V_{dd}$ SCHEME FOR SUBTHRESHOLD OPERATION

Scaling $V_{dd}$ down reduces both dynamic power and static leakage power, but it also reduces performance. To reduce power consumption without degrading performance, a multi-$V_{dd}$ technique exploits time slack by assigning a lower voltage $V_{DDL}$ to gates on non-critical paths.

As shown in Figure 1(a), the clustered voltage scaling (CVS) algorithm [11] does not allow $V_{DDL}$ cells to feed directly into $V_{DDH}$ cells, so level conversion is implemented inside a level-converting flip-flop (LCFF). This topological limitation prevents full use of the time slack that exists in a circuit. The extended clustered voltage scaling (ECVS) in Figure 1(b) eliminates this constraint by inserting a level converter (LC) at each $V_{DDL}$ cell feeding into a $V_{DDH}$ cell. ECVS gives better power saving than CVS, but the LC adds power and delay overheads.

(a) Clustered voltage scaling (CVS).

(b) Extended clustered voltage scaling (ECVS).

(c) Level converter (LC).

Figure 1. Dual $V_{dd}$ schemes and level converter schematic [11], [12].

Figure 2. A two-inverter chain without level converter.

Without a level converter, the low to high output transition delay of the second stage inverter in Figure 2 is not affected by the input voltage swing $V_{DDL}$ from the previous stage, because the delay of the pull-up PMOS depends only on its own power supply $V_{DDH}$ [10]. During the high to low output transition of the second inverter, the pull-down NMOS delay is affected by both the input swing $V_{DDL}$ and the power supply $V_{DDH}$. Therefore, a lower input swing reduces the discharge current through the NMOS, which increases the pull-down delay. In addition, because the pull-up PMOS in the inverter cannot be shut off completely by the lower input swing level, a severe DC current from the power supply $V_{DDH}$ causes higher static leakage power consumption.

In subthreshold operation, the lower input swing exponentially increases the delay (4) of the driven gate. We investigate the delay and leakage power penalty from lower input swing voltage. For simplicity, in this paper, we use only four types of cells, namely, INV, NAND2, NAND3 and NOR2, to synthesize example circuits. For cell characterization, all simulation results are from SPICE using the Predictive Technology Model (PTM) for 90 nm CMOS [18]. CMOS device threshold voltages are $V_{th,PMOS} = 0.21V$ and $V_{th,NMOS} = 0.29V$ at nominal $V_{dd} = 1.2V$ and room temperature (300K).

Table I. Measurement of a gate delay with a single INV load and static leakage power in Figure 3 configurations at $V_{DDH}=250mV$ and $V_{DDL}=200mV$ through SPICE simulation for PTM 90 nm CMOS.

| Gate  | $t_d$ (ns), (a) LL | $t_d$ (ns), (b) HH | $t_d$ (ns), (c) HL | $t_d$ (ns), (d) LH | $t_d$ (ns), (e) L-LC-H | $P_{leak}$ (pW), (a) LL | $P_{leak}$ (pW), (b) HH | $P_{leak}$ (pW), (c) HL | $P_{leak}$ (pW), (d) LH | $P_{leak}$ (pW), (e) L-LC-H |
|-------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| INV   | 2.81   | 0.83   | 2.98   | 2.70   | 255.04 | 30.9   | 46.2   | 22.8   | 126.2  | 260.8  |
| NAND2 | 6.82   | 2.10   | 5.31   | 7.92   | 260.32 | 31.1   | 45.3   | 26.2   | 101.5  | 259.9  |
| NAND3 | 9.72   | 3.04   | 7.31   | 11.17  | 264.16 | 53.1   | 75.6   | 49.0   | 135.5  | 290.2  |
| NOR2  | 8.33   | 2.54   | 8.91   | 5.73   | 262.27 | 32.6   | 48.4   | 20.8   | 156.6  | 263.0  |

Table II. Comparison of conventional LC (Figure 1(c)) delays normalized to INV(FO=4) delay ($V_{DD}=V_{DDH}$) for normal and subthreshold operations in PTM 90 nm CMOS.

| Gate delay           | Normal ($V_{DDH} = 1.2V$, $V_{DDL} = 0.8V$) | Subthreshold ($V_{DDH} = 300mV$, $V_{DDL} = 250mV$) |
|----------------------|---------------------------------------------|-----------------------------------------------------|
| INV(FO=4)            | 23.64 ps                                    | 1.52 ns                                             |
| LC                   | 112.33 ps                                   | 121.86 ns                                           |
| LC norm. to INV(FO4) | 4.8                                         | 80.2                                                |

Various input and output configurations interfacing gates in dual  $V_{dd}$  assignments are shown in Figure 3. Table I summarizes the delay and static leakage power for each case where  $V_{DDH}=250mV$  and  $V_{DDL}=200mV$  such that the entire operation is in subthreshold region. The difference between LL and HH delays shows that gate delay (4) is exponentially sensitive to the power supply voltage, while  $P_{leak}$  has a smaller change.

In Table I, as expected, HL delays for NAND2 and NAND3 gates are lower than those for the LL configuration because of smaller discharging time constants. However, that is not the case for INV and NOR2 gates, which are faster in the LL configuration. This speed increase is due to a higher logic 0 level for the LL configuration in charging time. In the case of leakage power for HL, all gates suppress the leakage current through the pull-up PMOS ($V_{gs}>0$) from the power supply. The severe increases in delay and power in dual $V_{dd}$ schemes come from LH, which is prohibited in the CVS methodology and is allowed in ECVS with an LC. However, the common above-threshold LC of Figure 1(c) cannot be used here because of its unacceptable delay overhead, in addition to its power overhead.

From Table II, the LC delay penalty in subthreshold operation is around 80 fanout-of-four (FO4) inverter delays, which exceeds a clock cycle time of pipelined microprocessor (13-15 FO4 delays) or ASIC processor (44 FO4 delays) [2]. A new LC design suitable for subthreshold circuits may be needed but is out of the scope of the present work. In the next section, we include additional constraints in the MILP that will not allow the LH configuration (similar to CVS) for energy optimization.

### IV. MILP FOR $V_{DDL}$ ASSIGNMENT

In this section, we design minimum energy circuits with dual $V_{dd}$ assignments using mixed integer linear programming (MILP) [4]. First, the optimal (i.e., minimum energy per cycle) supply voltage $(V_{opt})$ for a single $V_{dd}$ operation is determined. The critical path delay (or clock cycle time) of this design is used as the timing requirement for the dual voltage design. Thus, the MILP automatically applies the higher supply voltage $V_{DDH} = V_{opt}$ to gates on critical paths to maintain the performance, and finds an optimal lower supply voltage $V_{DDL}$ for gates on non-critical paths to reduce the total energy consumption by a global optimization considering all possible $V_{DDL}$. This differs from the backward traversal CVS heuristic algorithms that tend to be non-optimal. Note that more paths now may have delays that are either equal or close to the critical path delay.

(a) LL: Low input swing driving a low $V_{dd}$ gate.

(b) HH: High input swing driving a high $V_{dd}$ gate.

(c) HL: High input swing driving a low $V_{dd}$ gate.

(d) LH: Low input swing driving a high $V_{dd}$ gate.

(e) L-LC-H: Low input swing driving a high $V_{dd}$ gate through a level converter.

Figure 3. Driven gates and input swing levels.

Figure 4. Topological constraints.

Let  $X_i$  be an integer variable that is 0 for  $V_{DDH}$  or 1 for  $V_{DDL}$  for the power supply assignment of gate i. Let  $T_c$  be a predetermined critical path delay for the circuit. The optimal minimum energy voltage assignment problem is formulated as an MILP model:

$$\text{Minimize } \sum_{i \in \text{ all gates}} \left[ E_{tot, V_{DDL}, i} \cdot X_i + E_{tot, V_{DDH}, i} \cdot (1 - X_i) \right] \tag{6}$$

where $E_{tot,i}$ for $V_{DDL}$ and $V_{DDH}$ follows from (5a) and (5b), with the leakage energy accumulated over the clock period $T_c$:

$$E_{tot,i} = \alpha_i \cdot C_{L,i} \cdot V_{dd,i}^2 + P_{leak,V_{dd},i} \cdot T_c \tag{7}$$

Subject to timing constraints:

$$t_{d,i} = t_{d,V_{DDL},i} \cdot X_i + t_{d,V_{DDH},i} \cdot (1 - X_i) \qquad \forall i \in \text{all gates} \tag{8}$$

$$T_i \ge T_j + t_{d,i} \qquad \forall j \in \text{all fanin gates of gate } i \tag{9}$$

$$T_i \le T_c \qquad \forall i \in \text{all primary output gates} \tag{10}$$

Subject to topological constraints:

$$X_i - X_j \ge 0 \qquad \forall j \in \text{all fanin gates of gate } i \tag{11}$$

In the above constraints, $T_i$ is the latest arrival time at the output of gate i corresponding to a primary input event [8]. As mentioned in Section III, the unacceptable delay penalty of an asynchronous LC prohibits its use in a dual $V_{dd}$ scheme in the subthreshold region. The MILP model does not allow a $V_{DDL}$ cell to drive a $V_{DDH}$ cell as its fanout gate on account of topological constraint (11), as shown in Figure 4. Thus, the LH configuration of Figure 3(d) never occurs in the optimized circuit. Within the given timing constraint $T_c$, originally obtained for the best energy per cycle for single subthreshold $V_{DDH}$ operation, the MILP searches for the best $V_{DDL}$ such that the energy per cycle is further reduced to a true minimum.
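The structure of (6) and (8) to (11) can be illustrated on a toy netlist. The sketch below does not call an MILP solver; it swaps in exhaustive enumeration of the binary variables $X_i$, which returns the same optimum for a handful of gates but does not scale. The six-gate netlist, the per-gate delays and energies, and all names are hypothetical, and the sketch fixes a single candidate $V_{DDL}$ rather than searching over voltages.

```python
from itertools import product

# Hypothetical netlist: gate -> fanin gates (primary inputs are omitted).
FANINS = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"], "e": [], "f": ["e"]}
PRIMARY_OUTPUTS = ["c", "d", "f"]
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order of the gates

# Hypothetical characterization, identical for every gate:
# delay in ns and total energy per cycle in fJ at each supply.
DELAY = {"high": 1.0, "low": 2.0}
ENERGY = {"high": 1.0, "low": 0.6}


def arrival_times(x):
    """Smallest T_i satisfying (9), with gate delays selected as in (8)."""
    t = {}
    for g in ORDER:
        t_d = DELAY["low"] if x[g] else DELAY["high"]
        t[g] = max((t[j] for j in FANINS[g]), default=0.0) + t_d
    return t


def feasible(x, t_c):
    """Check topological constraint (11) and timing constraint (10)."""
    if any(x[g] < x[j] for g in ORDER for j in FANINS[g]):
        return False  # a V_DDL gate would drive a V_DDH gate (LH)
    t = arrival_times(x)
    return all(t[po] <= t_c + 1e-9 for po in PRIMARY_OUTPUTS)


def total_energy(x):
    """Objective (6) for assignment x (1 = V_DDL, 0 = V_DDH)."""
    return sum(ENERGY["low"] if x[g] else ENERGY["high"] for g in ORDER)


def optimize(t_c):
    """Enumerate all assignments; return (energy, assignment) of the best."""
    best = None
    for bits in product((0, 1), repeat=len(ORDER)):
        x = dict(zip(ORDER, bits))
        if feasible(x, t_c):
            energy = total_energy(x)
            if best is None or energy < best[0]:
                best = (energy, x)
    return best


# T_c is the critical path delay of the all-V_DDH (single supply) design.
T_C = max(arrival_times({g: 0 for g in ORDER})[po] for po in PRIMARY_OUTPUTS)
BEST_ENERGY, BEST_X = optimize(T_C)

if __name__ == "__main__":
    print("T_c =", T_C, "ns; V_DDL gates:", [g for g in ORDER if BEST_X[g]])
```

In this toy case the critical path a-b-c stays at $V_{DDH}$, while d and f move to $V_{DDL}$. Gate e has slack but cannot be lowered without either violating (10) through f or creating an LH interface, which mirrors how (11) limits the usable slack.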

### V. RESULTS

As mentioned before, we use only four basic cells (INV, NAND2, NAND3 and NOR2) to synthesize two example circuits, a 16-bit ripple carry adder and a $4 \times 4$ multiplier, in PTM 90nm CMOS technology. As shown in Figure 5, our example circuit, embedded in a test bench, is driven by flip-flops that supply randomly generated inputs at the high voltage swing. Two subthreshold voltages may be provided by a DC to DC voltage converter [9], [15]. The energy per cycle measurement is for the combinational circuit excluding flip-flops. From Figure 6(a), the minimum energy point for a 16-bit ripple carry adder with an activity factor $\alpha=0.21$ is $9.65\,\mathrm{fJ}$ at $V_{dd}=0.21V$. The clock frequency was found to be $2.15\,\mathrm{MHz}$. With dual $V_{dd}$ assignments, the optimized circuit with $V_{DDH}=0.21V$ and $V_{DDL}=0.14V$ reduces the energy per cycle by 23.6% while retaining the same performance. This energy reduction is shown by the downward arrow in Figure 6(b).

Figure 5. Simulation setup.

(a) Energy per cycle for single $V_{dd}$.

(b) Energy per cycle for single and dual subthreshold supply voltages.

Figure 6. Energy per cycle for a 16-bit ripple carry adder for single $V_{dd}$ and dual $V_{dd}$ in subthreshold region, activity factor $\alpha=0.21$, PTM 90nm CMOS.



Figure 7. Output signal waveforms of s1 and s1q in a 16-bit ripple carry adder at minimum operating voltage, $V_{DDL}=0.09V$, in SPICE simulation, PTM 90nm CMOS.

Consider again the minimum energy per cycle $(9.65\,\mathrm{fJ})$ operation of the 16-bit ripple-carry adder circuit with a single subthreshold voltage of $0.21\,\mathrm{V}$ and a clock frequency of $2.15\,\mathrm{MHz}$. In an alternative design, we may hold the minimum energy constant and improve the performance. From the MILP results in Table III, we find that operation with two voltages, $0.27\,\mathrm{V}$ $(V_{DDH})$ and $0.19\,\mathrm{V}$ $(V_{DDL})$, consumes $9.42\,\mathrm{fJ}$, which is just under the minimum energy, but runs at a clock frequency of $8.41\,\mathrm{MHz}$. This, as shown by the right arrow in Figure 6(b), is about a 4X speed improvement.

Table III summarizes SPICE simulation giving the total energy per cycle for the single voltage  $V_{dd} = V_{DDH}$  reference and the optimized dual voltage  $V_{dd} = \{V_{DDH}, V_{DDL}\}$  circuits. Voltages vary from 0.1V to 0.3V. Both single and dual  $V_{dd}$  circuits have the same speed because all gates on critical paths have the same  $V_{DDH}$  for either circuit.
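The two headline figures for the adder can be rechecked from the Table III entries. The sketch below copies the two relevant adder rows from the table and recomputes the energy reduction at equal speed and the speed-up at roughly equal energy; the variable and function names are illustrative only.

```python
# Adder rows copied from Table III:
# V_DDH (V) -> (E_tot,single in fJ, E_tot,dual in fJ, frequency in MHz)
ADDER_ROWS = {
    "0.21": (9.65, 7.37, 2.15),
    "0.27": (11.87, 9.42, 8.41),
}


def reduction_percent(e_single, e_dual):
    """Energy reduction of the dual-voltage design relative to single V_dd."""
    return 100.0 * (e_single - e_dual) / e_single


# Same speed, lower energy: single 0.21 V versus dual {0.21 V, 0.14 V}.
E_MIN_SINGLE, E_DUAL_SAME_SPEED, F_MIN_SINGLE = ADDER_ROWS["0.21"]
SAVING = reduction_percent(E_MIN_SINGLE, E_DUAL_SAME_SPEED)

# Similar energy, higher speed: dual {0.27 V, 0.19 V} versus single 0.21 V.
_, E_DUAL_FAST, F_DUAL_FAST = ADDER_ROWS["0.27"]
SPEEDUP = F_DUAL_FAST / F_MIN_SINGLE

if __name__ == "__main__":
    print(f"energy reduction at 2.15 MHz: {SAVING:.1f} %")
    print(f"speed-up at {E_DUAL_FAST} fJ (vs {E_MIN_SINGLE} fJ): {SPEEDUP:.2f}x")
```

The recomputed values match the 23.6% reduction and the roughly 4X speed-up quoted in the text.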

Logic function failure occurs at 0.08V in NAND3, so the possible lowest  $V_{DDL}$  assignment in MILP optimization is 0.09V. This minimum operating voltage guarantees 10% to 90% output voltage swing for all four cells in the full range of operational voltages used. Figure 7 shows sample signal waveforms from an optimized 16-bit ripple carry adder circuit for  $V_{DDH} = 0.11V$  and  $V_{DDL} = 0.09V$ . This has  $V_{DDL}$  assigned to cells on a non-critical path that leads to the least significant sum bit (s1). The output flip-flop (s1q) holds correct signal values at the minimum operating voltage on positive clock edges.

When  $V_{DDH}$  is 100mV, it is approaching the lower end of its range beyond which the circuit would fail to operate. The MILP now has limited choices for a solution and gives a  $V_{DDL}$  that provides smaller energy saving. The 16-bit ripple carry adder has better energy reduction because it can utilize more time slack from non-critical paths compared to the  $4\times 4$  multiplier with more balanced paths. The gate delay in subthreshold operation increases exponentially with reducing supply voltage, which forces the optimal  $V_{DDL}$  close to  $V_{DDH}$ .

Even though the MILP model only allows HL configuration and eliminates the use of LC for a dual  $V_{dd}$  circuit block, level conversion may be needed at outputs to match signal levels across block to block connections of a system.



Figure 8.  $V_{DDL}$  bound for given  $V_{DDH}$  with LH configured cells.

The differential cascode voltage switch (DCVS) based level converter of a normal standard cell library in Figure 1(c) is not suitable for dual subthreshold design due to its huge delay penalty. Because the design of an LC for ultra-low voltage is an open problem, our design refrains from using level converters and accounts for the resulting penalty in energy saving. For level conversion, we always assign $V_{DDH}$ to primary output (PO) gates before the output flip-flops at multiple voltage boundaries between circuit blocks. The PO gates driven by $V_{DDL}$ cells are found to correctly execute their logic functions if, for a given $V_{DDH}$, $V_{DDL}$ is bounded as shown in Figure 8. This lowest possible $V_{DDL}$ raises the minimum operating voltage for the dual voltage optimized circuit block. The optimal $V_{DDL}$ in the MILP model can be higher than its true optimal value to suppress DC leakage power of the LH configured PO gates. With this restriction, the two small example circuits, a 16-bit ripple-carry adder and a $4 \times 4$ multiplier, show reduced average energy savings of 11.9% and 2.6%, respectively. The penalty of energy saving from level converting may be negligible for a large system in which most blocks would operate at $V_{DDL}$ and only a few need $V_{DDH}$.

### VI. CONCLUSION

In this paper, we investigate the validity of dual $V_{dd}$ assignment in bulk CMOS subthreshold circuits. Some applications in the market may need minimum energy consumption without a performance concern. This work could provide a framework for solving those design problems. For a wide range of speed requirements, the MILP determines globally minimum energy optimized circuit configurations by assigning an extra supply voltage $V_{DDL}$ to gates on non-critical paths. A 16-bit ripple carry adder shows on average 20.5% reduced energy consumption, while maintaining the same performance as the original single $V_{dd}$ circuit. The worst case example of a $4 \times 4$ multiplier still gives on average 4.9% reduction. Further, allowing a small increase in the energy consumption can significantly speed up the subthreshold operation of a logic circuit. The methodology of dual $V_{dd}$ assignment is valid for substantial speed-up without energy increase, as well as for energy reduction below the minimum achievable in a single voltage circuit.

The MILP techniques of this paper are not restricted to subthreshold operation alone. When a higher performance, impossible to achieve in the subthreshold region, is required, we would then obtain two above-threshold voltages that satisfy the performance criteria and minimize the energy per cycle. There may be potential for greater energy saving as circuit size increases due to larger critical path delay leading to greater slack for many gates. The process variation of the device threshold voltage $(V_{th})$ can seriously affect a subthreshold voltage design and this needs to be studied especially for nanometer technologies. Higher leakage technologies may display higher speed in the subthreshold region because the logic operation relies on leakage currents. These aspects of dual $V_{dd}$ design in subthreshold region are worth exploring in the future.

Table III. Total energy per cycle with optimal $V_{DDL}$ for given $V_{DDH}$ and maximum corresponding speed. Adder: 16-bit ripple carry adder ($\alpha = 0.21$, total gates = 176). Mult.: $4 \times 4$ multiplier ($\alpha = 0.32$, total gates = 140).

| $V_{DDH}$ (V) | Adder $V_{DDL}$ (V) | Adder $V_{DDL}$ gate # | Adder $E_{tot,single}$ (fJ) | Adder $E_{tot,dual}$ (fJ) | Adder reduction (%) | Adder Freq. (MHz) | Mult. $V_{DDL}$ (V) | Mult. $V_{DDL}$ gate # | Mult. $E_{tot,single}$ (fJ) | Mult. $E_{tot,dual}$ (fJ) | Mult. reduction (%) | Mult. Freq. (MHz) |
|---------|------|-----|-------|-------|------|-------|------|----|-------|-------|-----|-------|
| 0.10    | 0.09 | 108 | 19.40 | 17.52 | 9.7  | 0.13  | 0.09 | 18 | 13.78 | 13.35 | 3.1 | 0.16  |
| 0.11    | 0.09 | 106 | 17.55 | 14.64 | 16.6 | 0.17  | 0.09 | 18 | 12.44 | 11.80 | 5.1 | 0.21  |
| 0.12    | 0.10 | 106 | 15.83 | 13.38 | 15.5 | 0.22  | 0.10 | 18 | 11.41 | 10.85 | 4.9 | 0.27  |
| 0.13    | 0.10 | 101 | 14.31 | 11.51 | 19.6 | 0.28  | 0.10 | 15 | 10.61 | 10.08 | 5.0 | 0.35  |
| 0.14    | 0.11 | 101 | 13.00 | 10.58 | 18.6 | 0.37  | 0.11 | 15 | 10.04 | 9.56  | 4.8 | 0.46  |
| 0.15    | 0.11 | 99  | 11.92 | 9.27  | 22.3 | 0.48  | 0.11 | 15 | 9.69  | 9.13  | 5.8 | 0.60  |
| 0.16    | 0.12 | 99  | 11.14 | 8.73  | 21.6 | 0.62  | 0.12 | 15 | 9.51  | 8.98  | 5.6 | 0.78  |
| 0.17    | 0.12 | 95  | 10.52 | 7.99  | 24.0 | 0.80  | 0.12 | 13 | 9.48  | 8.99  | 5.2 | 1.00  |
| 0.18    | 0.13 | 95  | 10.04 | 7.73  | 23.0 | 1.02  | 0.13 | 13 | 9.59  | 9.11  | 5.0 | 1.30  |
| 0.19    | 0.13 | 88  | 9.72  | 7.42  | 23.6 | 1.32  | 0.13 | 13 | 9.74  | 9.19  | 5.6 | 1.67  |
| 0.20    | 0.14 | 88  | 9.66  | 7.45  | 22.9 | 1.68  | 0.14 | 13 | 10.21 | 9.65  | 5.5 | 2.14  |
| 0.21    | 0.14 | 84  | 9.65  | 7.37  | 23.6 | 2.15  | 0.15 | 13 | 10.66 | 10.08 | 5.4 | 2.73  |
| 0.22    | 0.15 | 84  | 9.73  | 7.49  | 23.1 | 2.72  | 0.15 | 12 | 11.06 | 10.60 | 4.2 | 3.46  |
| 0.23    | 0.16 | 84  | 10.06 | 7.80  | 22.5 | 3.44  | 0.16 | 12 | 11.83 | 11.24 | 5.0 | 4.37  |
| 0.24    | 0.17 | 84  | 10.40 | 8.14  | 21.8 | 4.33  | 0.17 | 12 | 12.53 | 11.93 | 4.8 | 5.50  |
| 0.25    | 0.18 | 84  | 10.78 | 8.48  | 21.3 | 5.43  | 0.18 | 13 | 13.28 | 12.61 | 5.0 | 6.87  |
| 0.26    | 0.18 | 78  | 11.31 | 8.91  | 21.2 | 6.77  | 0.19 | 13 | 14.14 | 13.43 | 5.0 | 8.55  |
| 0.27    | 0.19 | 78  | 11.87 | 9.42  | 20.7 | 8.41  | 0.19 | 12 | 15.03 | 14.30 | 4.9 | 10.60 |
| 0.28    | 0.20 | 78  | 12.49 | 9.97  | 20.2 | 10.39 | 0.20 | 12 | 15.98 | 15.22 | 4.8 | 13.06 |
| 0.29    | 0.22 | 88  | 13.16 | 10.52 | 20.1 | 12.79 | 0.21 | 12 | 16.98 | 16.19 | 4.7 | 16.02 |
| 0.30    | 0.23 | 88  | 13.88 | 11.16 | 19.6 | 15.65 | 0.22 | 12 | 18.03 | 17.21 | 4.5 | 19.54 |
| Average |      |     |       |       | 20.5 |       |      |    |       |       | 4.9 |       |

**Acknowledgment** - This research was supported by the Wireless Engineering Research and Education Center at Auburn University.

### REFERENCES

- [1] B. H. Calhoun and A. P. Chandrakasan, "Ultra-Dynamic Voltage Scaling (UDVS) Using Sub-Threshold Operation and Local Voltage Dithering," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 1, pp. 238–245, 2006.
- [2] D. G. Chinnery and K. Keutzer, "Closing the Gap Between ASIC and Custom: an ASIC Perspective," in *Proc. 37th Design Automation Conference*, 2000, pp. 637–642.
- [3] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits," *Proc. IEEE*, vol. 98, no. 2, pp. 253–266, Feb. 2010.
- [4] R. Fourer, D. M. Gay, and B. W. Kernighan, *AMPL: A Mathematical Programming Language*. Brooks/Cole-Thomson Learning, 2003.
- [5] S. Hanson, M. Seok, D. Sylvester, and D. Blaauw, "Nanometer Device Scaling in Subthreshold Logic and SRAM," *IEEE Trans. on Electron Devices*, vol. 55, no. 1, pp. 175–185, 2008.
- [6] C. H. I. Kim, H. Soeleman, and K. Roy, "Ultra-Low-Power DLMS Adaptive Filter for Hearing Aid Applications," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 11, no. 6, pp. 1058–1067, 2003.

- [7] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits*. Prentice-Hall, second edition, 2003.
- [8] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program," in *Proceedings of 16th International Conference on VLSI Design*, Jan. 2003, pp. 527–532.
- [9] Y. Ramadass and A. Chandrakasan, "Voltage scalable switched capacitor dc-dc converter for ultra-low-power on-chip applications," in *Power Electronics Specialists Conference, 2007. PESC 2007. IEEE*, 2007, pp. 2353–2359.
- [10] K. Roy, L. Wei, and Z. Chen, "Multiple-Vdd Multiple-Vth CMOS (MVCMOS) for Low Power Applications," in *IEEE Int. Symp. on Circuits and Systems*, 1999, pp. 366–370.
- [11] K. Usami and M. Horowitz, "Clustered Voltage Scaling Technique for Low-Power Design," in *Proceedings of International Symposium on Low Power Design*, 1995, pp. 3–8.
- [12] K. Usami, M. Igarashi, F. Minami, T. Ishikawa, M. Kanzawa, M. Ichida, and K. Nogami, "Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 3, pp. 463–472, 1998.
- [13] R. Vaddi, S. Dasgupta, and R. P. Agarwal, "Device and Circuit Design Challenges in the Digital Subthreshold Region for Ultralow-Power Applications," *VLSI Design*, vol. 2009, pp. 1–14, Jan. 2009.
- [14] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, *Sub-Threshold Design for Ultra Low-Power Systems*. Springer, 2006.
- [15] A. Wang and A. Chandrakasan, "A 180mV FFT Processor Using Subthreshold Circuit Techniques," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, 2004, pp. 292–529.
- [16] N. H. E. Weste and D. M. Harris, *CMOS VLSI Design*. Boston: Addison-Wesley, fourth edition, 2009.
- [17] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, "Theoretical and Practical Limits of Dynamic Voltage Scaling," in *Proc. 41st Design Automation Conf.*, 2004, p. 873.
- [18] W. Zhao and Y. Cao, "New Generation of Predictive Technology Model for Sub-45 nm Early Design Exploration," *IEEE Trans. Electron Devices*, vol. 53, no. 11, pp. 2816–2823, 2006.