# Semi-Analytical Current Source Modeling of Near-Threshold Operating Logic Cells Considering Process Variations

Qing Xie, Tiansong Cui, Yanzhi Wang, Shahin Nazarian, and Massoud Pedram University of Southern California Department of Electrical Engineering Los Angeles, California, United States, 90089 {xqing, tcui, yanzhiwa, snazaria, pedram}@usc.edu

Abstract – Operating circuits in the ultra-low voltage regime results in significantly lower power consumption but can also degrade the circuit performance. In addition, it leads to higher sensitivity to various sources of variability in VLSI circuits. This paper extends the current source modeling (CSM) technique, which has successfully been applied to VLSI circuits to achieve very high accuracy in timing analysis, to the near-threshold voltage regime. In particular, it shows how to combine non-linear analytical models and low-dimensionality CSM lookup tables to simultaneously achieve modeling accuracy and space and time efficiency when performing CSM-based timing analysis of VLSI circuits operating in the near-threshold regime and subject to process variability effects.

*Index terms* – near-threshold computing, statistical timing analysis, current-source modeling, process variation

## I. INTRODUCTION

Operation in the near-threshold (NT) regime has emerged as a particularly effective technique for reducing circuit power consumption [1]. According to [2], voltage scaling from the super-threshold regime (e.g., 1.1V) down to the near-threshold regime (e.g., 0.5V) yields an energy reduction on the order of 10X at the expense of approximately 10X performance degradation. These data underline the fact that, when the performance targets are low, NT operation can result in a significant enhancement in circuit power efficiency. However, circuits operating in the NT regime are quite sensitive to the process-induced variations that arise from manufacturing imperfections. It has been reported in [3] that, in a 90nm CMOS technology, the $3\sigma$ delay variation of a combinational logic block operating at 0.5V is 2.5 times higher than the $3\sigma$ delay variation of the same circuit operating at 1V. Therefore, the variability of important parameters, such as the threshold voltage $V_{th}$ and the effective gate length $L_{eff}$, should be carefully accounted for during the timing analysis of circuits operating in the NT regime.

Statistical static timing analysis (SSTA) is a well-known method for verifying the timing of circuits. Considerable effort on SSTA with process variation has been invested in developing statistical gate delay models [4]-[11]. Among these models, *current-source-based logic cell modeling* (CSM) has been introduced to calculate the exact shape of the output signal waveform. The CSM method builds an equivalent circuit model of the logic gate using independent current sources and several equivalent capacitances. Values of the current sources and capacitances are pre-characterized and recorded in the standard CSM look-up tables (LUTs), in which the terminal voltages are used as index keys. The output waveforms are calculated in a discrete-time manner from the pre-characterized LUTs for the given input waveforms. The CSM method achieves very high accuracy in producing output waveforms and calculating delays. In addition, during the evaluation phase, the CSM method obtains the values of currents and capacitances by accessing the pre-characterized LUTs. It is therefore much faster than a circuit simulator such as HSPICE, which solves for this information iteratively at every time step. Thanks to these capabilities, CSM methods are used in timing analysis and effectively reduce the errors in delay calculation [8].

We consider the two most important sources of process variation: Random Dopant Fluctuation (RDF), which affects the threshold voltage $V_{th}$, and Line-Edge Roughness (LER), which results in a variable effective channel length $L_e$. We derive accurate current-based cell models for standard logic cells operating in the NT regime subject to process variations. We adopt the current-based logic cell model that was previously developed in [9] for logic cells in the super-threshold regime. For each output parameter of interest (e.g., the cell's output current), we derive analytical equations relating it to the terminal voltages and to the nominal values and variations of the process parameters. We perform regression over the characterization data and store the coefficients of the analytical equations in LUTs. We demonstrate that, in the NT regime, the proposed method captures the driving current under process variations more accurately than conventional methods. We also compare output waveforms calculated using the proposed CSM method with HSPICE results, considering input noise and process variation. The waveforms obtained using the proposed CSM method for standard cells in the Synopsys 32/28nm technology [13] and for simple circuits operating in the NT regime achieve very high accuracy.

## II. CSM IN THE NEAR-THRESHOLD REGIME

### A. Equivalent Circuit Models of the Standard Logic Cells

We start by building the equivalent circuit model for the logic cells in the standard cell library. Without loss of generality, we consider three types of standard cells: inverter, two-input NAND gate (NAND2), and two-input NOR gate (NOR2). Figure 1 shows the equivalent circuit model for NAND2 (NOR2) under the single input switching assumption. In addition to the input voltage level $V_i$ and the output voltage level $V_o$, $V_X$ denotes the voltage level at the internal node between the two series-connected transistors in the pull-down network of the NAND2 or the pull-up network of the NOR2. We include $V_X$ as an index key; therefore, each component in Figure 1 is recorded in LUTs with three index keys $(V_i, V_o, V_X)$.

This research is sponsored in part by grants from the Defense Advanced Research Projects Agency and the National Science Foundation.



Figure 1. Equivalent circuit model for a NAND2 (NOR2) gate. Each component has three voltage dependencies:  $V_i$ ,  $V_o$  and  $V_X$ .

Timing analysis using CSM generally consists of two phases: a characterization phase and an evaluation phase. In the characterization phase, an equivalent circuit model is proposed for each logic cell in the standard cell library, and an accurate circuit simulator (e.g., HSPICE) is used to obtain the component values at different samples of the input and output voltages. In the evaluation phase, the output waveforms are calculated using the pre-characterized driving currents and equivalent capacitances, as well as the values of the input voltages. At each time step, we calculate the change in the terminal voltages by solving the differential equations of the equivalent circuit. The accuracy of CSM methods depends on the sampling precision of the input and output voltages, i.e., the step size of $V_i$, $V_o$, and $V_X$ in the characterization phase.
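The evaluation phase can be illustrated with a minimal sketch for an inverter. It assumes a forward-Euler update of the output-node charge balance, $(C_o + C_M + C_L)\,dV_o = I_{net}\,dt + C_M\,dV_i$, where $I_{net}$ is the net current charging the output node; this sign convention, the load capacitance `c_load`, the helper names, and the synthetic linear LUT contents are all assumptions made for illustration and are not taken from the characterization data.

```python
import numpy as np

def bilinear(grid_vi, grid_vo, table, vi, vo):
    """Bilinear interpolation into a 2D LUT indexed by (V_i, V_o)."""
    vi = np.clip(vi, grid_vi[0], grid_vi[-1])
    vo = np.clip(vo, grid_vo[0], grid_vo[-1])
    i = np.clip(np.searchsorted(grid_vi, vi) - 1, 0, len(grid_vi) - 2)
    j = np.clip(np.searchsorted(grid_vo, vo) - 1, 0, len(grid_vo) - 2)
    ti = (vi - grid_vi[i]) / (grid_vi[i + 1] - grid_vi[i])
    tj = (vo - grid_vo[j]) / (grid_vo[j + 1] - grid_vo[j])
    return ((1 - ti) * (1 - tj) * table[i, j] + ti * (1 - tj) * table[i + 1, j]
            + (1 - ti) * tj * table[i, j + 1] + ti * tj * table[i + 1, j + 1])

def simulate_output(t, v_in, v_out0, grid_vi, grid_vo,
                    i_net_lut, c_o_lut, c_m_lut, c_load):
    """Forward-Euler evaluation of the inverter output waveform."""
    v_out = np.empty_like(t)
    v_out[0] = v_out0
    for n in range(len(t) - 1):
        dt = t[n + 1] - t[n]
        dvi = v_in[n + 1] - v_in[n]
        # Look up the model components at the present terminal voltages.
        i_net = bilinear(grid_vi, grid_vo, i_net_lut, v_in[n], v_out[n])
        c_o = bilinear(grid_vi, grid_vo, c_o_lut, v_in[n], v_out[n])
        c_m = bilinear(grid_vi, grid_vo, c_m_lut, v_in[n], v_out[n])
        # Charge balance at the output node (assumed sign convention).
        v_out[n + 1] = v_out[n] + (i_net * dt + c_m * dvi) / (c_o + c_m + c_load)
    return v_out

# Synthetic LUTs (placeholders, not characterized data): a linear "inverter"
# whose output relaxes toward VDD - V_i through a conductance G.
VDD, G = 0.5, 1.0e-6
grid = np.linspace(-0.2, 0.7, 91)               # -200 mV .. 700 mV, 10 mV step
VI, VO = np.meshgrid(grid, grid, indexing="ij")
i_net_lut = G * ((VDD - VI) - VO)
c_o_lut = np.full_like(VI, 1.0e-15)
c_m_lut = np.full_like(VI, 0.2e-15)

t = np.linspace(0.0, 10e-9, 2001)               # 5 ps time step
v_in = np.clip(t / 0.1e-9, 0.0, 1.0) * VDD      # 0.1 ns rising ramp
v_out = simulate_output(t, v_in, VDD, grid, grid,
                        i_net_lut, c_o_lut, c_m_lut, c_load=1.0e-15)
print(f"peak V_o = {v_out.max() * 1e3:.1f} mV, final V_o = {v_out[-1] * 1e3:.2f} mV")
```

With these placeholder tables the output briefly overshoots the supply because of the Miller coupling term and then discharges, which is the qualitative behavior the equivalent circuit is meant to capture. A finer voltage grid or a smaller time step trades memory and run time for accuracy.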

To extend the CSM-based method into the NT regime, a key requirement is to appropriately capture variations of the components in the equivalent model due to process variation. A small change in the threshold voltage results in a large change in the driving current (and hence the gate delay). We propose an efficient way to construct the process variability-aware semi-analytical CSM for the standard logic cells based on the physical relations among currents, terminal voltages, and other process parameters. We perform the characterization of the driving currents and equivalent capacitances for different voltage levels at all samples of process variation, e.g., $(-\Delta L^K, -\Delta L^{K-1}, \ldots, 0, \ldots, \Delta L^K)$ and $(-\Delta V_{th0}^S, -\Delta V_{th0}^{S-1}, \ldots, 0, \ldots, \Delta V_{th0}^S)$, where the maximum levels of variation, $\Delta L^K$ and $\Delta V_{th0}^S$, are determined by the process corner, and *K* and *S* are determined by the desired sampling precision.

### B. Modeling the Driving Current

We take the characterization of the inverter as an example, which has two voltage dependencies, $V_i$ and $V_o$. We characterize the driving currents for all possible combinations of $V_i$ and $V_o$ at every sample of process variation. At each $(V_i, V_o)$, we obtain the driving current of the NMOS, $I_{nmos}(V_i, V_o, \Delta L^k, \Delta V_{th0}^s)$, and that of the PMOS, $I_{pmos}(V_i, V_o, \Delta L^k, \Delta V_{th0}^s)$, as functions of the process parameters $\Delta L^k$ and $\Delta V_{th0}^s$. Note that in (1), $V_{ds}$ and $V_{gs}$ are fixed and determined by $V_i$ and $V_o$. Therefore, based on (1), we fit $I_{nmos}$ and $I_{pmos}$ with respect to $\Delta L^k$ and $\Delta V_{th0}^s$ using the following form presented in [12],

$$I(V_i, V_o, \Delta L^k, \Delta V_{th0}^s) = \frac{C(V_i, V_o)}{L_e} \cdot \exp(A(V_i, V_o) \cdot V_{th}^2 + B(V_i, V_o) \cdot V_{th}) \tag{2}$$

where  $A(V_i, V_o)$ ,  $B(V_i, V_o)$ , and  $C(V_i, V_o)$  are fitting coefficients. The dependencies of driving current on  $V_{ds}$  and  $V_{gs}$  in (1) are absorbed into these fitting coefficients. Equation (2) shows a current equation which involves the effective channel length  $L_e$  and threshold voltage  $V_{th}$ .
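As an illustration of how the coefficients of (2) might be extracted at a single $(V_i, V_o)$ point, note that taking the logarithm gives $\ln(I \cdot L_e) = \ln C + A V_{th}^2 + B V_{th}$, which is linear in $(A, B, \ln C)$. The sketch below uses ordinary least squares in the log domain; this is one possible regression choice (it minimizes relative rather than absolute error) and is not necessarily the fitting procedure used for the reported results. The function names and the synthetic "golden" data, including the 30 nm nominal length, are hypothetical.

```python
import numpy as np

def model_current(l_e, v_th, a, b, c):
    """Current magnitude according to the form of Eq. (2)."""
    return c / l_e * np.exp(a * v_th**2 + b * v_th)

def fit_current_coefficients(l_e, v_th, current):
    """Fit (A, B, C) at one (V_i, V_o) point by log-domain least squares.

    `current` must hold positive magnitudes so that the logarithm exists.
    """
    l_e, v_th, current = (np.asarray(x, dtype=float) for x in (l_e, v_th, current))
    y = np.log(current * l_e)                       # ln(I * L_e)
    design = np.column_stack([v_th**2, v_th, np.ones_like(v_th)])
    (a, b, ln_c), *_ = np.linalg.lstsq(design, y, rcond=None)
    return a, b, np.exp(ln_c)

# Synthetic stand-in for characterization data (hypothetical values).
A_TRUE, B_TRUE, C_TRUE = 5.0, -30.0, 1.0e-12
l_nom, vth_nom = 30e-9, 0.44
dl = np.linspace(-0.1, 0.1, 11) * l_nom            # +/-10% length samples
dvth = np.linspace(-0.1, 0.1, 11) * vth_nom        # +/-10% threshold samples
DL, DVTH = np.meshgrid(dl, dvth, indexing="ij")
l_e = (l_nom + DL).ravel()
v_th = (vth_nom + DVTH).ravel()
i_golden = model_current(l_e, v_th, A_TRUE, B_TRUE, C_TRUE)

a, b, c = fit_current_coefficients(l_e, v_th, i_golden)
print(f"A = {a:.4f}, B = {b:.4f}, C = {c:.4e}")
```

Repeating this fit at every $(V_i, V_o)$ grid point yields the three coefficient tables $A$, $B$, and $C$.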

#### 1) Impact of the Line-Edge Roughness Effect

The LER effect causes variation in the channel length, which in turn causes variation in the driving currents for two reasons: first, the current is inversely proportional to the channel length ($\sim 1/L_e$); and second, $V_{th}$ also depends on the channel length through the drain-induced barrier lowering (DIBL) effect. This effect is described as,

$$L_e = L_0 + \Delta L$$

$$dV_{th}^{DIBL} = -\theta(L_e) \cdot (\Phi(L_e) + V_{ds}) \tag{3}$$

where $L_0$ is the intrinsic channel length, $\Delta L$ is the channel length variation, and $\Phi(L_e)$ and $\theta(L_e)$ are fitting parameters that depend strongly on the channel length. To limit memory complexity, we perform a linear curve fitting of $\Phi(L_e)$ and $\theta(L_e)$ versus the channel length $L_e$.

#### 2) Impact of the Random Dopant Fluctuation Effect

RDF is another important variation source, which causes variability in the threshold voltage. Although the threshold voltage variation induced by RDF is proportional to $(WL_e)^{-1/2}$, the variation of the channel length is typically small (~10%), and thus the dependency of the RDF distribution on the channel length is negligible. To differentiate the RDF effect from LER, we denote the threshold voltage variation caused by RDF as $\Delta V_{th0}$. Thus, the threshold voltage is given by,

$$V_{th} = V_{th0} + dV_{th}^{DIBL} + \Delta V_{th0} \tag{4}$$

where  $V_{th0}$  is the original threshold voltage.
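The chain from process samples to the threshold voltage in (3) and (4) can be written as a short worked example. The linear-fit coefficients for $\theta(L_e)$ and $\Phi(L_e)$ below, the 30 nm intrinsic length, and the function name are hypothetical placeholders chosen only to make the arithmetic easy to follow.

```python
import math

def effective_vth(v_th0, delta_l, delta_vth0, v_ds, l0, theta_fit, phi_fit):
    """Return (L_e, V_th) following Eqs. (3) and (4).

    theta_fit and phi_fit are (slope, intercept) pairs of the linear fits
    theta(L_e) = slope * L_e + intercept and Phi(L_e) = slope * L_e + intercept.
    """
    l_e = l0 + delta_l                               # LER: L_e = L_0 + dL
    theta = theta_fit[0] * l_e + theta_fit[1]
    phi = phi_fit[0] * l_e + phi_fit[1]
    dvth_dibl = -theta * (phi + v_ds)                # Eq. (3)
    return l_e, v_th0 + dvth_dibl + delta_vth0       # Eq. (4)

# Hypothetical fit coefficients: theta = 0.10 at L_e = 30 nm, Phi = 0.2 V.
THETA_FIT = (-2.0e6, 0.16)
PHI_FIT = (0.0, 0.2)

_, vth_nominal = effective_vth(0.44, 0.0, 0.01, 0.5, 30e-9, THETA_FIT, PHI_FIT)
_, vth_short = effective_vth(0.44, -3e-9, 0.01, 0.5, 30e-9, THETA_FIT, PHI_FIT)
print(f"nominal length: V_th = {vth_nominal:.4f} V")
print(f"3 nm shorter:   V_th = {vth_short:.4f} V")
```

With these placeholder numbers the nominal case evaluates to $0.44 - 0.10 \cdot (0.2 + 0.5) + 0.01 = 0.38$ V, and the shorter channel gives a lower threshold because $\theta$ grows as $L_e$ shrinks under the assumed negative slope.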

### C. Parasitic Capacitances Modeling

The equivalent CSM shown in Figure 1 consists of several non-linear voltage-dependent capacitances. Among them,  $C_i(V_i, V_o)$  and  $C_o(V_i, V_o)$  model the parasitic effects at the input and output nodes of the cell, while the Miller capacitance,  $C_M(V_i, V_o)$  models the Miller effect between these two nodes. Both of the process variation sources affect all equivalent capacitances. The LER effect affects the physical capacitances as these capacitances are functions of the dimension of the transistors. For the RDF effect, the HSPICE simulation results show that equivalent capacitances at different  $\Delta V_{th0}$  are different. We perform curve fitting to relate the equivalent capacitances to both of the process parameters for each  $(V_i, V_o)$  combination using,

$$C_{i}(V_{i}, V_{o}) = C_{i0}(V_{i}, V_{o}) + a_{P}(V_{i}, V_{o})\Delta L_{e,P} + a_{N}(V_{i}, V_{o})\Delta L_{e,N} + b_{P}(V_{i}, V_{o})\Delta V_{th0,P} + b_{N}(V_{i}, V_{o})\Delta V_{th0,N} \tag{5}$$

where $C_{i0}(V_i, V_o)$ is the nominal input capacitance in CSM, $\Delta L_{e,P}$ and $\Delta L_{e,N}$ are the channel length variations of the PMOS and NMOS transistors, $\Delta V_{th0,P}$ and $\Delta V_{th0,N}$ are the corresponding RDF-induced threshold voltage variations, and $a_P$, $a_N$, $b_P$, and $b_N$ are fitting coefficients. Similar fittings are performed for $C_o$ and $C_M$.
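Because (5) is linear in its five coefficients, the fit at each $(V_i, V_o)$ point reduces to an ordinary least-squares problem. The sketch below assumes the variations are expressed in nm and mV and the capacitance in fF so that the design matrix is reasonably scaled; the function name, the units, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_capacitance(dl_p, dl_n, dvth_p, dvth_n, cap):
    """Least-squares fit of Eq. (5); returns [C_i0, a_P, a_N, b_P, b_N]."""
    design = np.column_stack([np.ones_like(cap), dl_p, dl_n, dvth_p, dvth_n])
    coeffs, *_ = np.linalg.lstsq(design, cap, rcond=None)
    return coeffs

# Synthetic process samples (hypothetical ranges and coefficients).
rng = np.random.default_rng(0)
n_samples = 200
dl_p, dl_n = rng.uniform(-3.0, 3.0, (2, n_samples))        # nm
dvth_p, dvth_n = rng.uniform(-40.0, 40.0, (2, n_samples))  # mV
true_coeffs = np.array([1.0, 0.02, 0.03, -0.001, 0.002])   # fF, fF/nm, fF/nm, fF/mV, fF/mV
cap = (true_coeffs[0] + true_coeffs[1] * dl_p + true_coeffs[2] * dl_n
       + true_coeffs[3] * dvth_p + true_coeffs[4] * dvth_n)

coeffs = fit_capacitance(dl_p, dl_n, dvth_p, dvth_n, cap)
print(np.round(coeffs, 6))
```

The same routine would apply unchanged to $C_o$ and $C_M$ by passing the corresponding characterized capacitance samples.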

### D. CSM Look-up Table Construction

After the characterization phase, we perform the curve fittings and record the coefficients in LUTs indexed by the voltage levels of interest. In the evaluation phase, for a specific level of process variation, e.g., given $\Delta L$ and $\Delta V_{th0}$, we use the coefficient LUTs to reconstruct the standard CSM LUTs such as the ones in [6]. The standard CSM LUTs are used to calculate the output voltage waveform based on the given input voltage waveform, as shown in Figure 2. The standard 3D CSM LUTs are generated for NAND2 and NOR2 in a similar way.
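The reconstruction step can be sketched for the NMOS current of an inverter as follows. The coefficient tables here are placeholders rather than fitted values, $\theta$ and $\Phi$ are held constant for brevity, and the sketch relies on the fact that the inverter NMOS has its source at ground, so $V_{ds} = V_o$; all names are hypothetical.

```python
import numpy as np

def reconstruct_current_lut(a_lut, b_lut, c_lut, l_e, v_th):
    """Evaluate the form of Eq. (2) at every (V_i, V_o) grid point.

    v_th may be a scalar or an array with the same shape as the LUTs.
    """
    return c_lut / l_e * np.exp(a_lut * v_th**2 + b_lut * v_th)

grid = np.linspace(-0.2, 0.7, 91)                  # -200 mV .. 700 mV, 10 mV step
VI, VO = np.meshgrid(grid, grid, indexing="ij")

# Placeholder coefficient LUTs (not fitted data).
a_lut = np.zeros_like(VI)
b_lut = np.full_like(VI, -30.0)
c_lut = 1.0e-12 * (1.0 + VI - grid[0])

L0, V_TH0 = 30e-9, 0.44                            # hypothetical nominal values
THETA, PHI = 0.10, 0.20                            # held constant for brevity

def nmos_vth_grid(delta_vth0):
    """Eqs. (3)-(4) on the voltage grid, with V_ds = V_o for the inverter NMOS."""
    return V_TH0 - THETA * (PHI + VO) + delta_vth0

delta_l = -1.5e-9                                  # one LER sample
lut_nominal = reconstruct_current_lut(a_lut, b_lut, c_lut, L0 + delta_l,
                                      nmos_vth_grid(0.0))
lut_low_vth = reconstruct_current_lut(a_lut, b_lut, c_lut, L0 + delta_l,
                                      nmos_vth_grid(-0.03))
print(lut_nominal.shape, bool(np.all(lut_low_vth > lut_nominal)))
```

Only the coefficient tables are stored; a standard current LUT for any particular $(\Delta L, \Delta V_{th0})$ sample is produced on demand by one vectorized evaluation.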

The proposed semi-analytical CSM method produces low-dimensional LUTs and thereby significantly reduces the memory complexity compared to recording standard CSM LUTs at all samples of process variation. In practice, the variations of the process parameters are random and are normally described by probability distributions. In this case, the distributions of the process parameters such as $\Delta L$ and $\Delta V_{th0}$ can replace the deterministic values of $\Delta L$ and $\Delta V_{th0}$ in this flow to generate accurate distributions of the standard CSM LUTs. With these distributions of the LUTs, we can calculate the output waveforms statistically. We plan to extend the proposed CSM method to perform SSTA in the future.



Figure 2. Flow of the proposed semi-analytical CSM method.

## III. EXPERIMENTAL RESULTS

We adopt the Synopsys 32/28nm technology [13], in which the threshold voltage of the standard NMOS is 0.44V and that of the standard PMOS is -0.26V. We set the supply voltage to 0.5V so that the circuits operate in the NT regime. To ensure that the voltage characterization covers the range of the noise, we sweep the input and output voltages from -200mV to +700mV in steps of 10mV. We consider 10% variation in the process parameters $\Delta L$ and $\Delta V_{th0}$, and perform characterization on more than 200 different samples of $\Delta L$ and $\Delta V_{th0}$. The characterization is based on HSPICE, and the entire process for all logic cells takes about one hour on a Debian 7 machine with 16 Intel E7-8837 2.66 GHz CPUs and 64 GB memory.

We compare our work with two baseline process variation handling methods: (i) CSM with first-order correction, as in [8], [9], and (ii) CSM with second-order correction of the process parameters, as in [10]. We compare the proposed method and the baseline methods against golden results generated using the HSPICE simulator. We first show the capability of the proposed semi-analytical CSM method in capturing the driving currents at different samples of process variation for the standard cells. We then show the improvement of the proposed method in determining the output waveform under noisy inputs. Finally, we demonstrate the accuracy of the proposed CSM method in the delay calculation of a simple circuit under process variation.

### A. Driving currents under the process variation

Accurately capturing the driving currents is the key step in the CSM-based method. We first perform a complete characterization of the standard logic cells for all terminal voltages of interest and all samples of process variation. We then apply the proposed semi-analytical CSM method and generate the LUTs of fitting coefficients: 2D coefficient LUTs with the index keys $V_i$ and $V_o$ for the inverter, and 3D coefficient LUTs with the index keys $V_i$, $V_o$, and $V_X$ for the NAND2 and NOR2.

We compare the average value and maximum value of the error function of the driving currents captured by the proposed method and two baseline methods, as a percentage of the golden results for these three logic cells in Table 1, Table 2, and Table 3, respectively.
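The exact error function is not spelled out in the text; one plausible reading, assumed here, is the absolute current error normalized by the golden (HSPICE) value, averaged or maximized over all samples. A minimal sketch with made-up numbers:

```python
import numpy as np

def percent_error_stats(model, golden):
    """Average and maximum absolute error as a percentage of the golden value."""
    model = np.asarray(model, dtype=float)
    golden = np.asarray(golden, dtype=float)
    err = 100.0 * np.abs(model - golden) / np.abs(golden)
    return err.mean(), err.max()

# Made-up currents for illustration only (arbitrary units).
golden = np.array([1.0, 1.0, 2.0])
model = np.array([1.1, 0.9, 2.0])
avg_err, max_err = percent_error_stats(model, golden)
print(f"avg = {avg_err:.2f}%, max = {max_err:.2f}%")
```

In practice, grid points where the golden current is essentially zero would need to be masked out before normalizing.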

Table 1. Comparison of the driving currents captured by the proposed methods and baseline methods for the inverter.

| Cell     | Current           | Proposed Avg (%) | Proposed Max (%) | First-order Avg (%) | First-order Max (%) | Second-order Avg (%) | Second-order Max (%) |
|----------|-------------------|------------------|------------------|---------------------|---------------------|----------------------|----------------------|
| Inverter | I<sub>nmos</sub>  | 4.8              | 7.6              | 30.3                | 51.6                | 6.03                 | 13.1                 |
|          | I<sub>pmos</sub>  | 4.2              | 6.6              | 41.8                | 60.5                | 9.88                 | 16.1                 |

Table 2. Comparison of the driving currents captured by the proposed methods and baseline methods for the NAND2, single input switching on A (near output) and B (near ground), respectively.

| Cell    | Current             | Proposed Avg (%) | Proposed Max (%) | First-order Avg (%) | First-order Max (%) | Second-order Avg (%) | Second-order Max (%) |
|---------|---------------------|------------------|------------------|---------------------|---------------------|----------------------|----------------------|
| NAND2-A | I<sub>nmos,A</sub>  | 1.7              | 6.8              | 10.4                | 55.9                | 2.1                  | 14.3                 |
|         | I<sub>nmos,B</sub>  | 0.2              | 4.6              | 7.7                 | 10.7                | 0.3                  | 1.0                  |
|         | I<sub>pmos,A</sub>  | 2.7              | 10.2             | 14.2                | 57.1                | 3.0                  | 14.8                 |
|         | I<sub>pmos,B</sub>  | 0                | 0                | 0                   | 0                   | 0                    | 0                    |
| NAND2-B | I<sub>nmos,A</sub>  | 2.5              | 6.4              | 12.4                | 52.5                | 2.2                  | 13.2                 |
|         | I<sub>nmos,B</sub>  | 0.2              | 8.8              | 12.3                | 49.7                | 2.4                  | 12.4                 |
|         | I<sub>pmos,A</sub>  | 0                | 0                | 0                   | 0                   | 0                    | 0                    |
|         | I<sub>pmos,B</sub>  | 0.1              | 10.2             | 14.2                | 57.1                | 3.0                  | 14.8                 |

Table 3. Comparison of the driving currents captured by the proposed methods and baseline methods for the NOR2, single input switching on A (near output) and B (near  $V_{dd}$ ), respectively.

| Cell   | Current             | Proposed Avg (%) | Proposed Max (%) | First-order Avg (%) | First-order Max (%) | Second-order Avg (%) | Second-order Max (%) |
|--------|---------------------|------------------|------------------|---------------------|---------------------|----------------------|----------------------|
| NOR2-A | I<sub>nmos,A</sub>  | 1.6              | 7.6              | 7.8                 | 45.0                | 1.3                  | 10.4                 |
|        | I<sub>nmos,B</sub>  | 0                | 0                | 0                   | 0                   | 0                    | 0                    |
|        | I<sub>pmos,A</sub>  | 2.6              | 15.6             | 12.7                | 65.9                | 2.8                  | 17.7                 |
|        | I<sub>pmos,B</sub>  | 0.1              | 2.3              | 16.7                | 24.1                | 2.1                  | 3.3                  |
| NOR2-B | I<sub>nmos,A</sub>  | 0                | 0                | 0                   | 0                   | 0                    | 0                    |
|        | I<sub>nmos,B</sub>  | 0.1              | 7.6              | 7.8                 | 45.0                | 1.3                  | 10.4                 |
|        | I<sub>pmos,A</sub>  | 3.6              | 10.7             | 16.5                | 63.5                | 3.4                  | 16.9                 |
|        | I<sub>pmos,B</sub>  | 0.1              | 7.4              | 16.0                | 58.1                | 3.4                  | 15.2                 |

One can observe from these tables that in most cases the proposed CSM method outperforms the first-order and second-order variation handling methods. Although the second-order correction method generally gives a good average error, it does not accurately capture the dependency of the driving currents on the process parameters under extreme variation conditions, i.e., its maximum error is much larger than that of the proposed method. Regarding memory complexity, the proposed CSM method uses three coefficients (i.e., A, B, and C in (2)), which is the same as the first-order correction and 50% fewer than the six coefficients used in the second-order correction.
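As a rough worked example of the memory argument for one inverter current, assume exactly $N_P = 200$ process samples (the text states only "more than 200"), one 2D table per stored quantity, and no shared parameters such as the linear fits of $\Phi$ and $\theta$. The symbols $N_V$, $N_P$, and $N_{(\cdot)}$ are introduced here only for this illustration.

$$
\begin{aligned}
N_V &= \frac{700 - (-200)}{10} + 1 = 91 \quad \text{voltage samples per axis}, \\
N_{\text{per-sample}} &= N_V^2 \cdot N_P = 8281 \cdot 200 = 1{,}656{,}200 \quad \text{entries}, \\
N_{\text{proposed}} &= N_{\text{first-order}} = 3 \cdot N_V^2 = 24{,}843 \quad \text{entries}, \\
N_{\text{second-order}} &= 6 \cdot N_V^2 = 49{,}686 \quad \text{entries}.
\end{aligned}
$$

Under these assumptions, storing coefficients instead of one standard LUT per process sample reduces the table size by a factor of $200/3 \approx 67$.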

### B. Output waveform under the noisy inputs

We generate the standard CSM LUTs from the pre-characterized coefficient LUTs and use them to calculate the output waveforms for two example input profiles. Figure 3 and Figure 4 compare the calculated output waveforms with the waveforms obtained using HSPICE simulation, under a step input and a noisy input, respectively. In Figure 3, we simulate three $\Delta V_{th0}$ variation levels: -10mV, -30mV, and -50mV. As $\Delta V_{th0}$ decreases, the falling delay decreases, since the NMOS transistor, whose threshold voltage is positive, is discharging the output during the transition. This is opposite to the situation in Figure 4. One can observe that for both input profiles, the proposed CSM method consistently outperforms the baseline methods and reproduces the output waveform with very high accuracy. Although the baseline methods occasionally perform well (e.g., at 50mV in Figure 4), their fitting error is uncontrolled, so the baseline curves may over- or under-estimate the delays. In these two examples, the proposed CSM method achieves up to 15.0% and 12.2% error reduction in delay calculation, respectively, compared to the baseline methods.



Figure 3. Output waveforms for different CSM variation handling techniques under a step rising input at different threshold voltage variation levels: -10mV (solid), -30mV (dashed), and -50mV (dotted).



Figure 4. Output waveforms for different CSM variation handling techniques under a noisy falling input at different threshold voltage variation levels: 10mV (dotted), 30mV (dashed), and 50mV (solid).

We use the proposed CSM method to analyze the timing behavior of a 10-stage inverter chain operating in the NT regime, as shown in Figure 5. The variations of the process parameters $\Delta L$ and $\Delta V_{th0}$ are assigned randomly to each inverter, with a maximum variation level of 10%. One can observe that mismatches between the HSPICE results and the baseline waveforms accumulate with the number of stages. The delay errors of the two baseline methods are 5.0% and 10.7%, respectively. In contrast, the error of the waveform calculated using the proposed CSM method barely increases as the signal propagates: the proposed method achieves a delay error of only 0.2% compared to the HSPICE results, which shows that it maintains waveform accuracy even after many circuit stages.
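The delay definition behind these percentages is not stated explicitly. A common convention, assumed in the sketch below, is the time between the 50% $V_{dd}$ crossings of the input and output waveforms, with the delay error taken relative to the reference (e.g., HSPICE) delay. The function names and the toy waveforms are hypothetical.

```python
import numpy as np

def crossing_time(t, v, level):
    """Time of the first crossing of `level`, linearly interpolated between samples."""
    s = np.sign(v - level)
    idx = np.nonzero(s[:-1] * s[1:] <= 0)[0]
    if idx.size == 0:
        raise ValueError("waveform never crosses the requested level")
    k = idx[0]
    if v[k + 1] == v[k]:
        return t[k]
    frac = (level - v[k]) / (v[k + 1] - v[k])
    return t[k] + frac * (t[k + 1] - t[k])

def propagation_delay(t, v_in, v_out, vdd):
    """Delay between the 50% crossings of the input and output waveforms."""
    return crossing_time(t, v_out, vdd / 2) - crossing_time(t, v_in, vdd / 2)

def delay_error_percent(delay_model, delay_reference):
    """Delay error of a model relative to a reference delay, in percent."""
    return 100.0 * abs(delay_model - delay_reference) / abs(delay_reference)

# Toy piecewise-linear waveforms (arbitrary time units, Vdd = 0.5 V).
t = np.array([0.0, 1.0, 2.0, 3.0])
v_in = np.array([0.0, 0.5, 0.5, 0.5])     # rising input, crosses 0.25 V at t = 0.5
v_out = np.array([0.5, 0.5, 0.5, 0.0])    # falling output, crosses 0.25 V at t = 2.5
delay = propagation_delay(t, v_in, v_out, vdd=0.5)
print(delay, delay_error_percent(2.1, delay))
```

For a multi-stage chain, the same measurement would be applied between the chain input and the final stage output.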



Figure 5. Output waveforms for a 10-stage inverter chain under local process variation.

## IV. CONCLUSION

Circuits operating in the near-threshold (NT) regime are strongly affected by process variation. We proposed a semi-analytical current source model (CSM) method to analyze circuits operating in the NT regime considering process variation. We considered two variation sources: channel length and threshold voltage. We characterized the driving currents and equivalent capacitances in the equivalent current-based circuit models under different variation conditions for all voltage combinations. We analyzed the driving currents under process variation in the NT regime, and performed non-linear curve fitting to relate the driving currents to the variation parameters. We stored the fitting parameters in look-up tables (LUTs) and reconstructed the LUTs of driving currents and equivalent capacitances according to the specific variation condition. Experimental results demonstrated very high accuracy in capturing the driving currents of the standard logic cells and significant error reductions in the delay calculation of circuits in the NT regime.

## REFERENCES

- [1] H. Kaul, M. Anders, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, "Near-threshold voltage (NTV) design: Opportunities and challenges," *Design Automation Conference*, 2012.
- [2] R.G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits," *Proceedings of the IEEE*, vol. 98, no. 2, pp. 253-266, Feb. 2010.
- [3] S. Seo, R.G. Dreslinski, M. Woh, P. Yongjun, C. Chakrabarti, S. Mahlke, D. Blaauw, and T. Mudge, "Process variation in near-threshold wide SIMD architectures," *Design Automation Conference* (DAC), pp. 980-987, 3-7 June 2012.
- [4] C. Visweswariah, K. Ravindran, K. Kalafala, S.G. Walker, S. Narayan, D.K. Beece, J. Piaget, N. Venkateswaran, and J.G. Hemmett, "First-Order Incremental Block-Based Statistical Timing Analysis," *IEEE Transactions on CAD*, Oct. 2006.
- [5] D. Blaauw, V. Zolotov, and S. Sundareswaran, "Slope propagation in static timing analysis," *Trans. Computer Aided-Design of Integrated Circuits & Systems*, pp. 1180-1195, 2002.
- [6] J.F. Croix, and D.F. Wong, "Blade and razor: cell and interconnect delay analysis using current-based models," *Design Automation Conference* (DAC), pp. 386-389, 2003.
- [7] I. Keller, K. Tseng, and N. Verghese, "A robust cell-level crosstalk delay change analysis," *Proc. of Int'l Conf. on Computer Aided Design* (ICCAD), pp. 147-154, 2004.
- [8] V. Veetil, D. Sylvester, and D. Blaauw, "Fast and Accurate Waveform Analysis with Current Source Models," *Proc. of Int'l Symp. On Quality Electronic Design*, 2008.
- [9] H. Fatemi, S. Nazarian, and M. Pedram, "Statistical logic cell delay analysis using a current-based model," *Design Automation Conference* (DAC), pp. 253-256, 2006.
- [10] A. Goel, and S. Vrudhula, "Statistical waveform and current source based standard cell models for accurate timing analysis," *Design Automation Conference* (DAC), 2008.
- [11] C. Kashyap, C. Amin, N. Menezes, and E. Chiprout, "A Nonlinear Cell Macromodel for Digital Applications", in *ICCAD*, 2007.
- [12] D.M. Harris, B. Keller, J. Karl, and S. Keller, "A transregional model for near-threshold circuits with application to minimum-energy operation," *Microelectronics International Conference* (ICM), pp. 64-67, 19-22 Dec. 2010.
- [13] Synopsys, Inc. http://www.synopsys.com/Community/UniversityProgram/Pages/32-28nm-generic-library.aspx.