Gate Delay Calculation Considering the Crosstalk Capacitances
Soroush Abbaspour and Massoud Pedram
University of Southern California
{soroush, massoud}@sahand.usc.edu

Abstract
In this paper, we present a new technique for calculating the output waveform of CMOS drivers for cross-coupled RC loads. The proposed technique is based on an effective capacitance calculation for each driver and an efficient, provably convergent iteration scheme between the coupled drivers. Our technique can easily handle different input arrival times, transition times, and polarities, and can be extended to multiple cross-coupled drivers in a straightforward manner. Experimental results show that the new technique exhibits high accuracy (less than 4% error on average).

1. Introduction
As we continue to exploit deep submicron (DSM) technologies to design faster and smaller circuits, we must revisit the problem of calculating the gate propagation delay. This is an important design problem that is becoming more involved because of the highly nonlinear behavior of CMOS logic gates in the DSM region. Since interconnect modeling and RC model order reductions have advanced significantly over the past several years, it is reasonable to assume that we can accurately extract and model the various R and C parasitics as well as the capacitive coupling between interconnect lines. However, since we are stuck in a pattern of computing gate delays only when grounded linear capacitances load the gate, we are immediately faced with the problem of load modeling for the purpose of gate delay calculation [15].

Modeling the coupling capacitances is the most difficult step in computing the load. Sometimes this problem is addressed by modeling the coupling capacitors as elements to ground, with modified capacitance values. For example, when two identical coupled lines switch in opposite directions at the same instant, the coupling capacitance can be accurately modeled as a capacitance to ground of twice its value. Although such an approximation tends to yield pessimistic delay values, in general it does not provide an upper bound on the delay for many realistic coupling scenarios [15,21].

An example of VLSI routing is depicted in Figure 1. If there are significant coupling capacitances among these lines, then transitions on some subset of lines (aggressors) can affect the output behavior of the remaining lines (victims). Furthermore, because of different signal arrival times, slew rates, and switching directions (falling or rising) of transitions on the aggressor lines, the output waveforms of the victim lines may vary considerably [15,21].

![Figure 1: An example VLSI routing scenario](image)

To analyze noise due to capacitive coupling, one can start with a reduced-order coupled interconnect model and calculate the outputs on a quiet victim line by superimposing the coupled signals from all other lines. Such a model, while inexact, can provide a reasonable approximation because the nonlinear CMOS gate of the non-switching victim line behaves like a transistor in its linear region of operation and is therefore modeled fairly well by a linear resistor. However, when the victim line itself is also switching, the problem becomes much more complex. As the victim line switches, the impedance of its driving gate changes by orders of magnitude, thereby influencing the amount of coupling voltage. Such effects can be modeled accurately in SPICE, but because of the large circuit sizes, it is desirable to perform such analyses at the highest possible level of abstraction.

1.1 Gate Delay Calculation for Capacitive Loads
The gate propagation delay is divided into two terms: the intrinsic gate delay and the (external) gate load delay. The intrinsic gate delay is due to the native characteristics of the CMOS devices (e.g., transistors) in the gates/cells. More precisely, it is equal to gate propagation delay under zero load condition. The load delay captures the timing effect of the load on the gate propagation delay.

Figure 2(a) depicts a CMOS gate, which drives a purely capacitive load \( C_L \), where one of its inputs switches with a signal transition time of \( T_{in} \) causing the output of the gate to change. The gate propagation delay is a function of the input transition time and the output load. In commercial ASIC cell libraries, it is possible to characterize various output transition times (e.g. 10%, 50%, and 90%) as a function of the input transition time and output capacitance, i.e.,

\[
t_{\alpha} = f_{\alpha}(T_{in}, C_L)
\]

where \( \alpha \) denotes the percentage of the output transition, \( t_{\alpha} \) is the output delay with respect to the 50% point of the input signal, and \( f_{\alpha} \) is the corresponding delay function. The delay description function can be obtained in various ways. Two common approaches for the gate propagation delay computation are based on (1) the use of a Thevenin equivalent circuit for the driver, which is in turn composed of a voltage source and a series resistance, and (2) the delay tables. The first approach is difficult to deal with and is not as accurate as the second approach, which is currently in wide use, especially in the ASIC design flow.

The algorithm for finding the output waveform of a gate is reviewed next. Given: 1) the capacitive load, \( C_L \), 2) the input transition time, \( T_{in} \), 3) the 50% transition point of the input waveform, \( \delta \), the output waveform is obtained as follows:

1. **Draw_Output_Waveform** \((\delta, T_{in}, C_L)\)
   1. For \( \alpha = 10\%, 50\% \), and \( 90\% \) do
      \[
      t_{\alpha} = \text{Calc\_Delay}(\delta, T_{in}, C_L, \text{Table}(T_{in}, C_L, \alpha))
      \]
   2. Draw the output waveform according to the above data

2. **Calc_Delay** \((\delta, T_{in}, C_L, \text{Table}(T_{in}, C_L, \alpha))\)
   1. From Table\( (T_{in}, C_L, \alpha) \), find the 50% input to \( \alpha\% \) output propagation delay for the given \( T_{in} \) and \( C_L \), add \( \delta \) to this value, and call the result \( t_{\alpha} \)
   2. Return \( t_{\alpha} \)
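The table lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grid `T_AXIS`/`C_AXIS`, the placeholder delay data produced by `make_table`, and the use of bilinear interpolation between table entries are all hypothetical choices made for this sketch, not details given in the paper.

```python
from bisect import bisect_right

# Hypothetical characterization grid: input transition times in ps, loads in fF.
T_AXIS = [50.0, 100.0, 200.0]
C_AXIS = [10.0, 20.0, 40.0]

def make_table(scale):
    # Placeholder delay data (ps); a real table would come from cell characterization.
    return [[scale * (5.0 + 0.1 * t + 1.0 * c) for c in C_AXIS] for t in T_AXIS]

# One table per output percentile point (10%, 50%, 90%).
TABLES = {10: make_table(0.5), 50: make_table(1.0), 90: make_table(1.8)}

def _bracket(axis, x):
    # Pick the grid cell containing x, clamped to the outermost cell so that
    # out-of-range points are extrapolated linearly from that cell.
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    frac = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, frac

def table_lookup(table, t_in, c_load):
    # Bilinear interpolation in a 2-D table indexed by (T_in, C_L).
    i, u = _bracket(T_AXIS, t_in)
    j, v = _bracket(C_AXIS, c_load)
    return ((1 - u) * (1 - v) * table[i][j] + u * (1 - v) * table[i + 1][j]
            + (1 - u) * v * table[i][j + 1] + u * v * table[i + 1][j + 1])

def calc_delay(delta, t_in, c_load, table):
    # 50% input to alpha% output delay from the table, shifted by the
    # 50% transition point of the input waveform (delta).
    return delta + table_lookup(table, t_in, c_load)

def draw_output_waveform(delta, t_in, c_load, tables=TABLES):
    # Absolute times at which the output crosses 10%, 50%, and 90%.
    return {pct: calc_delay(delta, t_in, c_load, tables[pct]) for pct in (10, 50, 90)}
```

Calling `draw_output_waveform(20.0, 75.0, 30.0)` returns the three crossing times for an input whose 50% point is at 20 ps; the waveform can then be drawn through these points.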

In VDSM technologies, we cannot neglect the effect of interconnect resistances of the load. Using the sum of all load capacitances as the capacitive load is a simple, yet quite pessimistic, approximation [7]. A more accurate approximation for an \( n \)th order load seen by the gate/cell (i.e., a load with \( n \) distributed capacitances to ground) is to use a second order \( RC-\pi \) model [3, 5]. Equating the first, second, and third moments of the admittance of the real load with the first, second, and third moments of the \( RC-\pi \)-load [19], we can find \( C_L \), \( R_L \) and \( C_2 \) as shown in Figure 3. It follows that for accurate gate delay calculation, we can use a four-dimensional delay table, where the dimensions are \( T_{in}, C_L, R_L \), and \( C_2 \). However, this is costly in terms of storage and computational requirements. Therefore, the “effective capacitance” approach has been proposed [4, 8] whereby the \( RC-\pi \) load is approximated by an equivalent capacitance. Consequently, it is possible to continue to use a two-dimensional table lookup to calculate the propagation delay to the output quantile point (i.e., 10%, 50%, and 90%).
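As an illustration of the moment-matching step, the following Python sketch recovers the three elements of an RC-π load from the first three moments of a driving-point admittance \( Y(s) = y_1 s + y_2 s^2 + y_3 s^3 + \dots \). The closed-form expressions come from expanding the admittance of the π circuit itself, \( sC_{near} + sC_{far}/(1 + sRC_{far}) \); the function and variable names are chosen for this sketch only.

```python
def moments_of_pi(c_near, r, c_far):
    # First three admittance moments of a pi load:
    # Y(s) = s*c_near + s*c_far / (1 + s*r*c_far)
    #      = (c_near + c_far)*s - r*c_far**2 * s**2 + r**2 * c_far**3 * s**3 - ...
    return c_near + c_far, -r * c_far ** 2, r ** 2 * c_far ** 3

def pi_model_from_moments(y1, y2, y3):
    # Invert the three relations above for the pi-model elements.
    c_far = y2 * y2 / y3
    c_near = y1 - c_far
    r = -(y3 * y3) / (y2 ** 3)
    return c_near, r, c_far

# Round trip on an example load: 10 fF near-end, 200 ohm, 30 fF far-end.
y1, y2, y3 = moments_of_pi(10e-15, 200.0, 30e-15)
c_near, r, c_far = pi_model_from_moments(y1, y2, y3)
```

In practice the moments would come from the extracted load rather than from a known π circuit; the round trip here only checks that the inversion is consistent.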

The crosstalk capacitances affect the output transition time of each node of the load. Indeed, this effect has become more apparent in VDSM technologies, where the coupling capacitances between interconnect lines have become quite important in terms of their relative magnitude compared with the area plus fringing capacitances of these lines. It is thus quite important to directly model the effect of the coupling capacitances on the gate output waveforms. An even more accurate approximation for this case is to consider n drivers where each of them drives an RC-π load. Obviously, there may be coupling capacitances between the near-end points and far-end points of the RC-π load as well as capacitive couplings between the near-end output terminals of each driver and the input terminals of the other drivers. The general load model for two drivers is shown in Figure 4 [3, 5, 16, 17].

![Figure 3: A gate/cell, which drives an RC-π calculated load](image)

Given a complicated load with these two types of capacitive couplings (load-to-load and input-to-load couplings), one can use moment matching techniques to model the load and parasitic networks as the RC network depicted in Figure 4. In other words, given an extracted netlist, it is possible to calculate the precise values of all the R and C elements in Figure 4, where the element values could be positive or negative. This calculation is not the focus of our paper, which starts by assuming that the various R and C values of the model of Figure 4 are already known. For accurate gate delay calculation, we may start from a multi-dimensional delay table, where the dimensions are the input transition times, the difference between the 50% transition points of the input waveforms, the values of the capacitances to ground, the coupling capacitances, the shielding resistances, and so on. However, this is extremely costly (if practical at all) in terms of storage and computational requirements. Therefore, we ought to develop an alternative approach that allows us to avoid constructing and relying on such tables.

![Figure 4: Two cross coupled CMOS gates each driving an RC-π load and exhibiting 1) interconnect coupling and 2) input-output coupling](image)

In this paper, we present an algorithm that calculates the output waveform of each driver driving a general load, considering coupling capacitances in deep submicron technologies as shown in Figure 4. Using the algorithm, we can find the output waveform of each driver when the input transitions are falling or rising, overlap in time, and so on. We show that our algorithm provides accurate results that closely match some important points of the output waveform of the CMOS drivers (e.g., 10%, 50%, and 90%).

The remainder of this paper is organized as follows. Section 2 reviews existing effective capacitance calculation approaches and presents a new effective capacitance calculation algorithm. Section 3 addresses crosstalk for coupled capacitive loads, and Section 4 addresses crosstalk for coupled RC loads. Section 5 reports experimental results, and Section 6 concludes the paper.

Kahng and Muddu [11, 13] propose a number of effective capacitance algorithms. In their latest approach [13], they state that by using the voltage of output pin of the gate/cell, they can find a non-iterative and fast method for calculating the effective capacitance that accurately matches the output waveforms in a range from 0.3Vdd to 0.6Vdd. However, finding the output transition time (from the complex set of equations that the authors present) can be very costly. Furthermore, in reality, the driver resistance in their model is a function of the output load and input transition time and can thus vary greatly. However, the authors use a single value for the driver resistance corresponding to the case that the driver sees the total capacitances of a load.

In [19], the authors calculate an effective capacitance that approximately matches both the 50% propagation delay and the output transition time with reasonable accuracy. Their approach is analytical and has good performance. However, their analytical expressions apply only to low-order circuits and are not suitable for high-order circuits (with more storage elements).

Furthermore, the previous papers do not prove the convergence of their iterative algorithms. In this paper, we propose a lookup-table based effective capacitance algorithm that targets both speed and accuracy. Using the resulting effective capacitance, we can approximate the voltage waveform at the output terminal of the gate, and the result is less sensitive to the lookup table entries.

### 2.2 A New Effective Capacitance Calculation Algorithm

**Problem Statement:** Given a CMOS driver whose input rise time is \( T_{\text{rise}} \) and an output load modeled by the RC-π circuit shown in Figure 6, find the output waveform of the driver.

Due to the shielding effect of the resistance, the effective capacitance can be written as \( C_{\text{eff}} = C_1 + kC_2 \) where \( 0 \leq k \leq 1 \). Our algorithm gives an iterative approach for calculating \( k \). Consider a unit step voltage source that drives the RC circuit in Figure 5(a). The current flowing into the RC circuit in the Laplace domain is calculated as [18]:

\[
I(s) = V(s)Y(s) = \frac{C}{sRC + 1}
\]

![Figure 5: Effective capacitance concepts](image)

The total charge induced into the capacitance up to time \( T \) is equal to the area shown in Figure 5(b). In this case, it can be calculated as:

\[
Q(T) = \int_0^T i(t)\, dt = \int_0^T \frac{1}{R} e^{-\frac{t}{RC}} dt = C(1 - e^{-\frac{T}{RC}}) \qquad (4)
\]

By using a table of circuit simulation results and a pair of two-dimensional delay tables, Macys et al. [12] calculated a value for the effective capacitance. In their work, the effective capacitance is a function of the total capacitance in the RC-π model \( (C_{\text{Tot}}) \), the gate output slew rate, and the Elmore delay [1] of the load. The authors approximate the RC-π load with an effective capacitance such that the output voltage waveforms of the driving cell pass through some critical voltages (e.g., 0 and 0.75Vdd) at the same instants in time. They also normalize the four model parameters (output slew time and three π model parameters) to two parameters and use a table of circuit simulation results to find the effective capacitance by an iteration-based procedure. However, Macys' approach is not based on any analytical derivation and is very sensitive to the simulation table entries.

Using a two-piece output waveform, Qian et al. propose an effective capacitance calculation approach that approximates the output waveform of a single-stage gate [10]. The authors calculate the effective capacitance by equating the charges at the gate output when using the driving-point admittance of the load and using a single effective capacitance as the load. Average charges for both loads models are equated until the gate output voltage reaches the 50% threshold. Qian’s effective capacitance approach is costly in terms of its computational calculations and requires a large number of iterations (e.g., 5 to 10 iterations). It also involves empirical equations that assume fast input transitions.


We can replace the RC load with a single effective capacitance and calculate the amount of charge dumped into this capacitance for the same unit step input as shown in Figure 5(c). By matching the charge dumped into this load with Equation (4), we have:

$$C_{\text{eff}} = C(1 - e^{-\frac{T}{RC}})$$  \hspace{1cm} (5)

According to Equation (5), $C_{\text{eff}}$ depends on the time up to which the charge is matched as well as the $R$ and $C$ values. The same observation holds for a ramp input [20].
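The charge-matching argument can be checked numerically. The sketch below, written for illustration only, integrates the unit-step current \( i(t) = \frac{1}{R} e^{-t/RC} \) up to a chosen time and compares the result with the closed form \( C(1 - e^{-T/RC}) \); the function names and the midpoint integration rule are choices made for this sketch.

```python
import math

def charge_into_rc(r, c, t_end, steps=10_000):
    # Midpoint-rule integration of the unit-step current i(t) = (1/R) exp(-t/RC).
    dt = t_end / steps
    return sum(math.exp(-(n + 0.5) * dt / (r * c)) / r * dt for n in range(steps))

def c_eff_step(r, c, t_end):
    # Capacitance that stores the same charge under a unit step up to t_end.
    return c * (1.0 - math.exp(-t_end / (r * c)))

R, C = 1e3, 100e-15          # example values: 1 kOhm, 100 fF (RC = 100 ps)
q_numeric = charge_into_rc(R, C, 100e-12)
c_eff_one_tau = c_eff_step(R, C, 100e-12)
c_eff_late = c_eff_step(R, C, 50 * R * C)
```

Matching the charge after one time constant gives an effective capacitance of roughly 63% of \( C \), while matching over a very long window recovers essentially all of \( C \), which illustrates the dependence on the matching time noted above.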

According to the above discussion, the effective capacitance for an RC-$\pi$ model load (depicted in Figure 6) can be written as:

$$C_{\text{eff}} = C_1 + \left(1 - e^{-\frac{k\, t_{\text{out}}}{R_{\pi} C_2}}\right)C_2$$  \hspace{1cm} (6)

where $k$ is a dimensionless constant and $t_{\text{out}}$ is the gate output transition time. Macys et al. [12] showed that the effective capacitance calculation is a function of only three parameters $\alpha$, $\beta$, and $\gamma$, where:

$$\alpha = \frac{C_1}{C_1 + C_2} \quad \beta = \frac{t_{\text{out}}}{R_{\pi} C_2} \quad \gamma = \frac{C_{\text{eff}}}{C_1 + C_2}$$  \hspace{1cm} (7)

In addition, Macys et al. showed that a table relating these three variables can be built for each technology. Because the table is independent of the input transition time and the gate configuration, it can be used for any combination of load parameters; therefore:

$$C_{\text{eff}} = C_1 + \left(1 - e^{-\frac{k\, t_{\text{out}}}{R_{\pi} C_2}}\right)C_2 = (C_1 + C_2) \left[ \alpha + \left(1 - e^{-\frac{k\, t_{\text{out}}}{R_{\pi} C_2}}\right)\left(1 - \alpha\right) \right]$$  \hspace{1cm} (8)

which results in

$$\frac{C_{\text{eff}}}{C_1 + C_2} = \gamma = \alpha + \left(1 - e^{-\frac{k\, t_{\text{out}}}{R_{\pi} C_2}}\right)\left(1 - \alpha\right)$$ \hspace{1cm} (9)

If we replace the output transition time ($t_{\text{out}}$) by the 50% propagation delay, we can rewrite the equation as:

$$\frac{C_{\text{eff}}}{C_1 + C_2} = \gamma = \alpha + \left(1 - e^{-k\beta}\right)\left(1 - \alpha\right)$$  \hspace{1cm} (10)

where $\beta$ is the ratio between the 50% propagation delay and the $R_{\pi} C_2$ product, and $k$ is a fixed value which can be obtained from a look-up table (compiled from circuit simulation results) and which is constant for the calculated $\alpha$ and $\beta$. Figure 7 reports the $k$ values for different $\alpha$ and $\beta$ values in a 0.1 $\mu$m CMOS technology. The figure shows that the table of $k$ values is a function of $\alpha$ and $\beta$ only and remains the same for three different configurations of the gate and input waveform. It should be noted that $k$ decreases with $\alpha$ and $\beta$ and reaches its maximum at the minimum values of $\alpha$ and $\beta$, as seen in Figure 7. This approach has two advantages over Macys' approach. First, we have an equation that helps us understand the behavior of the effective capacitance. Second, by writing the sensitivity function of the output transition time with respect to $k$ (for our approach) and $\gamma$ (the table entry in Macys' approach), it can be shown that our approach results in a more stable effective capacitance estimation. More precisely,

$$H_{k} = \frac{k}{t_{\text{out}}} \frac{\partial t_{\text{out}}}{\partial k} = \frac{k}{t_{\text{out}}} \frac{\partial t_{\text{out}}}{\partial C_{\text{eff}}} \frac{\partial C_{\text{eff}}}{\partial k}$$  \hspace{1cm} (11)

$$H_{\gamma} = \frac{\gamma}{t_{\text{out}}} \frac{\partial t_{\text{out}}}{\partial \gamma} = \frac{\gamma}{t_{\text{out}}} \frac{\partial t_{\text{out}}}{\partial C_{\text{eff}}} \left(C_1 + C_2\right)$$  \hspace{1cm} (12)

so that $H_{k} = \text{coeff} \cdot H_{\gamma}$, where:

$$\text{coeff} = \frac{k}{\gamma} \frac{\partial \gamma}{\partial k} = \frac{k \beta\, e^{-k\beta} \left(1 - \alpha\right)}{\alpha + \left(1 - e^{-k\beta}\right)\left(1 - \alpha\right)}$$  \hspace{1cm} (13)

Experimental results show that the coefficient “coeff” is strictly smaller than one. In particular, the results for 30 different cases in 0.1 $\mu$m technology are tabulated in Figure 8, which shows that this coefficient is always less than one. We conclude that our approach is less sensitive than Macys' approach. To make the equations hold for the 10% output transition point, we need to derive a new lookup table for $k$. The same holds for the 50%, 70%, and 90% output percentile points.
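The normalized form of Equation (10) is easy to evaluate directly. The Python sketch below computes $\alpha$, $\beta$, and $C_{\text{eff}}$ for a π load and also evaluates a relative sensitivity ratio $(k/\gamma)\,\partial\gamma/\partial k$ obtained by differentiating Equation (10) with respect to $k$. Reading “coeff” as this ratio is an assumption of the sketch, and the function names and numerical values are illustrative only.

```python
import math

def normalized_c_eff(alpha, beta, k):
    # gamma = C_eff / (C1 + C2) from Equation (10).
    return alpha + (1.0 - math.exp(-k * beta)) * (1.0 - alpha)

def c_eff_pi(c1, r_pi, c2, t50, k):
    # Effective capacitance of a pi load, with beta built from the 50% delay.
    alpha = c1 / (c1 + c2)
    beta = t50 / (r_pi * c2)
    return (c1 + c2) * normalized_c_eff(alpha, beta, k)

def sensitivity_ratio(alpha, beta, k):
    # (k / gamma) * d(gamma)/dk, with d(gamma)/dk = beta * exp(-k*beta) * (1 - alpha).
    x = k * beta
    return x * math.exp(-x) * (1.0 - alpha) / normalized_c_eff(alpha, beta, k)

# Example: C1 = 10 fF, R_pi = 1 kOhm, C2 = 40 fF, 50% delay of 60 ps, k = 0.6.
c_eff_example = c_eff_pi(10e-15, 1e3, 40e-15, 60e-12, 0.6)
ratio_example = sensitivity_ratio(0.2, 1.5, 0.6)
```

Because $x e^{-x} \le 1 - e^{-x}$ for $x \ge 0$, this particular ratio stays below one for any $\alpha > 0$, which is in line with the experimental observation reported above.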

In order to solve the problem posed in Section 4, we need the voltage waveform of the far-end capacitance of the RC-$\pi$ load. If we apply a ramp input with rise time $T_r$ to an RC load, the output waveform is:

$$V_{\text{out}}(t) = \begin{cases} \dfrac{V_{dd}}{T_r} \left[ t - RC \left(1 - e^{-\frac{t}{RC}}\right) \right] & 0 \le t \le T_r \\ V_{dd} \left[ 1 - \dfrac{RC}{T_r} \left(e^{\frac{T_r}{RC}} - 1\right) e^{-\frac{t}{RC}} \right] & t > T_r \end{cases}$$  \hspace{1cm} (14)

where $t$ is the time variable and $V_{dd}$ is the final value of the input voltage waveform, as shown in Figure 9. Therefore, to find the time it takes for the output waveform to reach the $\alpha$ percentile point ($t_{\alpha}$), we need to solve the following nonlinear equation:

$$V_{\text{out}}(t_{\alpha}) = \alpha V_{dd}$$  \hspace{1cm} (15)

Instead, according to Equation (14), if the $T_r/RC$ values of two different circuit configurations are equal, their $t_{\alpha}/RC$ values are also equal. Therefore, instead of solving the nonlinear Equation (15), we can build a table of delays that stores $t_{\alpha}/RC$ for each $T_r/RC$ and each $\alpha$ percentile point. For example, suppose we have a circuit whose input transition time is $T_r$ and which drives an RC load, and we need to find the time at which the output waveform reaches 50% of $V_{dd}$. From the corresponding table in Figure 9, for the given $T_r/RC$, we find $t_{50}/RC$ and thus $t_{50}$.
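The normalization argument can be illustrated with a short Python sketch that works entirely in units of $RC$. It uses the standard response of a single RC stage to a saturated ramp and finds the crossing time by bisection; the function names and the choice of bisection are assumptions of this sketch rather than the method used to build the table in Figure 9.

```python
import math

def ramp_response(t_norm, tr_norm):
    # V_out / V_dd of an RC stage driven by a saturated ramp.
    # Both arguments are normalized by RC: t_norm = t/RC, tr_norm = T_r/RC.
    def unbounded(x):
        # Response to a ramp that never saturates; zero before it starts.
        return (x + math.expm1(-x)) / tr_norm if x > 0 else 0.0
    # A saturated ramp is an unbounded ramp minus the same ramp delayed by T_r.
    return unbounded(t_norm) - unbounded(t_norm - tr_norm)

def crossing_time(level, tr_norm, tol=1e-12):
    # Bisection for t/RC at which the output reaches `level` (fraction of V_dd).
    lo, hi = 0.0, tr_norm + 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ramp_response(mid, tr_norm) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# One row of a normalized table: T_r/RC = 2, percentile points 10%, 50%, 90%.
row = {pct: crossing_time(pct / 100.0, 2.0) for pct in (10, 50, 90)}

# De-normalizing for a specific circuit with RC = 100 ps and T_r = 200 ps.
t50_seconds = row[50] * 100e-12
```

Any other circuit with the same $T_r/RC$ ratio reuses the same row, which is the property that makes a small normalized table sufficient.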

Our algorithm for calculating the output waveform is as follows. We are given the following information for a particular timing path of a cell: the input slew time, $T_{\text{slew}}$; the $\pi$-load model parameters, $(C_1, R_{\pi}, C_2)$; the table of gate propagation delays from the 50% transition point of the input waveform to the x% transition point of the output waveform, Table(50%-x%); and the $k$ table, Table(k). We then perform the following steps:

- For the given cell, determine the input slew time, $T_{\text{slew}}$, and the $\pi$-load model parameters, $(C_1, R_{\pi}, C_2)$.
- Using Table(50%-x%) and the current estimate of the effective capacitance, look up the delay to the x% output point (for example, $t_{50}$).
- Using this delay and the $k$ table, update the effective capacitance.
- Repeat the previous two steps until the delay converges, and report it as the output delay.

Draw_for_RC-π_Load \((T_{\text{in}}, \text{Load Parameters})\)

1. For \(x = 10, 50, \text{ and } 90\)
   a. Find_Transition_Point \((T_{\text{in}}, C_1, R_{\pi}, C_2, \text{Table}(50\%-x\%), \text{Table}(k))\)
2. Draw the output waveform according to the results

Find_Transition_Point \((T_{\text{in}}, C_1, R_{\pi}, C_2, \text{Table}(50\%-x\%), \text{Table}(k))\)

1. Guess an initial value for \(C_{\text{eff}}\)
2. Compute \(\alpha\) from the load values
3. Obtain \(t_x\) from Table\((50\%-x\%)\) based on the values of \(C_{\text{eff}}\) and \(T_{\text{in}}\)
4. Compute \(\beta\) from \(t_x\) and the load elements
5. Find \(k\) from Table\((k)\) according to \(\alpha\) and \(\beta\)
6. Calculate \(C_{\text{eff}}\) from Equation (10)
7. Find the new value of \(t_x\) for the obtained \(C_{\text{eff}}\) from Table\((50\%-x\%)\)
8. Compare the new \(t_x\) with the old \(t_x\)
9. If the difference is not within the acceptable tolerance, return to step 4 and repeat until \(t_x\) converges
10. Return \(t_x\)
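The steps above can be sketched as a fixed-point iteration in Python. Everything that stands in for characterization data is hypothetical: `delay_of` replaces Table(50%-x%) with a simple linear delay model, and `k_of` replaces Table(k) with a constant. The sketch is meant to show the control flow of the iteration, not to reproduce the paper's tables.

```python
import math

def find_transition_point(t_in, c1, r_pi, c2, delay_of, k_of,
                          c_init=None, tol=1e-15, max_iter=100):
    alpha = c1 / (c1 + c2)                           # step 2
    c_eff = c1 + c2 if c_init is None else c_init    # step 1: initial guess
    t_x = delay_of(t_in, c_eff)                      # step 3
    for _ in range(max_iter):
        beta = t_x / (r_pi * c2)                     # step 4
        k = k_of(alpha, beta)                        # step 5
        # Step 6: Equation (10), de-normalized by (C1 + C2).
        c_eff = (c1 + c2) * (alpha + (1.0 - math.exp(-k * beta)) * (1.0 - alpha))
        t_new = delay_of(t_in, c_eff)                # step 7
        if abs(t_new - t_x) <= tol:                  # steps 8 and 9
            return t_new, c_eff                      # step 10
        t_x = t_new
    raise RuntimeError("delay did not converge")

# Hypothetical stand-ins for the two lookup tables.
R_DRIVER = 2e3                                       # assumed driver resistance (ohm)

def delay_of(t_in, c_load):
    # Placeholder for Table(50%-50%): linear in input slew and load.
    return 0.2 * t_in + 0.69 * R_DRIVER * c_load

def k_of(alpha, beta):
    # Placeholder for Table(k): a single constant value.
    return 0.6

t50, c_eff = find_transition_point(50e-12, 10e-15, 1e3, 40e-15, delay_of, k_of)
```

With these placeholder models, starting from different initial guesses for `c_init` leads to the same delay, which is the behavior that Theorem 1 below formalizes for the actual tables.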

Experimental results demonstrate that this algorithm gives accurate results with fast convergence. Next, we prove that this algorithm converges independently of the initial guess.

Theorem 1: Iterative Equation (10) always converges independently of the initial guess. Furthermore, its solution is unique.

Proof: Per reference [23], the iteration for finding the solution to \(x=f(x)\) converges for any initial input, and its solution is unique, if

\[
\left|\frac{d}{dx}f(x)\right| < 1 \qquad (16)
\]

In this case, to prove the convergence of the equation, we prove:

\[
\left|\frac{d}{dC_{\text{eff}}}\left(C_1 + (1 - e^{-k\beta})C_2\right)\right| < 1 \qquad (17)
\]

Because \(\beta = t_{50}/(R_{\pi} C_2)\) depends on \(C_{\text{eff}}\) only through the table delay \(t_{50}\), and \(k\) is held fixed for the looked-up \(\alpha\) and \(\beta\), the chain rule gives

\[
\left|\frac{d}{dC_{\text{eff}}}\left(C_1 + (1 - e^{-k\beta})C_2\right)\right| = \left|\frac{k\, e^{-k\beta}}{R_{\pi}} \frac{\partial t_{50}}{\partial C_{\text{eff}}}\right| < 1 \qquad (18)
\]

Therefore, the proposed iterative effective capacitance equation always converges to its unique solution, independently of the initial value of \(C_{\text{eff}}\).

3. Crosstalk for Coupled Capacitive Loads

Problem Statement: Two CMOS drivers, a and b, are given. Their input transition times are \(t_{\text{in}(a)}\) and \(t_{\text{in}(b)}\), and there is a delay \(\Delta_{\text{in}}\) between their input waveforms, measured between \(\delta_a\) and \(\delta_b\), the 50% transition points of the input waveforms of drivers a and b, respectively. The corresponding capacitive loads are \(C_a\) and \(C_b\), and there is a capacitive coupling between the two output loads with value \(C_c\). The output waveforms of drivers a and b are \(t_{\text{out}(a)}\) and \(t_{\text{out}(b)}\), respectively. The objective is to find the output waveforms of the two drivers. In fact, we must solve a nonlinear equation:

\[
S(t_{\text{in}}, C_a, C_b, t_{\text{out}}) = 0
\]

where

\[
t_{\text{in}} \leftarrow t_{\text{in}(a)} + t_{\text{in}(b)} \quad t_{\text{out}} \leftarrow t_{\text{out}(a)} + t_{\text{out}(b)} \quad C_a \leftarrow C_a \quad C_b \leftarrow C_b
\]

This is, however, a difficult undertaking. Thus, we look for a better solution. Many different scenarios could arise under this problem statement. For example, the input voltages may independently have a falling or rising transition; there could be a non-zero positive or negative skew between their 50% input transition times, the slew rates of the two inputs can widely differ, etc. According to circuit theory, we can model the coupling capacitance by a Miller capacitance to ground [21] for each scenario. A simple approximation can be obtained as follows. Taking the circuit in Figure 10 as a two-port network, in order to model the coupling capacitance as an equivalent capacitance to the ground, we suppose that the equivalent circuit has the same current sink and the same voltage waveforms at the output terminals of the drivers. Using this assumption, we have [22]:

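\[
I_c = C_c \frac{\partial (V_a - V_b)}{\partial t} = C_{\text{eff},a} \frac{\partial V_a}{\partial t}, \qquad -I_c = C_c \frac{\partial (V_b - V_a)}{\partial t} = C_{\text{eff},b} \frac{\partial V_b}{\partial t} \quad (21)
\]

where \(I_c\) is the current flowing through the coupling capacitance from the output of driver a to the output of driver b, and \(C_{\text{eff},a}\) and \(C_{\text{eff},b}\) are the equivalent capacitances to ground seen by drivers a and b.

To calculate the effective capacitance to ground for driver a, we integrate the current over the period from the time the output of driver a starts its transition (taken as \(t = 0\)) to the time it reaches the switching threshold point, \(t_{\text{th}}\):

\[
\int_0^{t_{\text{th}}} C_{\text{eff},a} \frac{\partial V_a}{\partial t} \, dt = \int_0^{t_{\text{th}}} C_c \left( \frac{\partial V_a}{\partial t} - \frac{\partial V_b}{\partial t} \right) dt \quad (22)
\]

\[
\Rightarrow C_{\text{eff},a} = C_c \left( 1 - \frac{\Delta V_b}{\Delta V_a} \right), \quad \Delta V_a = V_a(t_{\text{th}}) - V_a(0), \quad \Delta V_b = V_b(t_{\text{th}}) - V_b(0) \quad (23)
\]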
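A minimal sketch of the charge-matching idea, assuming the equivalent capacitance to ground takes the form \(C_c (1 - \Delta V_b / \Delta V_a)\), where \(\Delta V_a\) and \(\Delta V_b\) are the voltage changes of the two outputs over the same time window. The function name and the example voltages (normalized to \(V_{dd} = 1\)) are illustrative choices.

```python
def equivalent_ground_cap(c_c, dv_self, dv_other):
    # Capacitance to ground that draws the same charge from the driver as the
    # coupling capacitor c_c does while the driver's own output moves by
    # dv_self and the neighbouring output moves by dv_other.
    if dv_self == 0:
        raise ValueError("the driver's own output must move over the window")
    return c_c * (1.0 - dv_other / dv_self)

C_C = 10e-15  # example coupling capacitance: 10 fF

# Driver a rises from 0 to its 50% threshold (dv_self = +0.5) in every case.
quiet_neighbour = equivalent_ground_cap(C_C, 0.5, 0.0)      # neighbour does not move
opposite_switch = equivalent_ground_cap(C_C, 0.5, -0.5)     # neighbour falls in step
same_direction = equivalent_ground_cap(C_C, 0.5, 0.5)       # neighbour rises in step
fast_opposite = equivalent_ground_cap(C_C, 0.5, -1.0)       # neighbour falls rail to rail
fast_same = equivalent_ground_cap(C_C, 0.5, 1.0)            # neighbour rises rail to rail
```

Under this form the multiplier of \(C_c\) ranges from -1 to 3 when driver a is observed at its 50% point, which matches the Miller factor range quoted from [21] later in this section.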

Therefore, the algorithm is as follows:

Find_Waveforms_Capacitive_Cross_Coupled \((t_{\text{in}(a)}, t_{\text{in}(b)}, \delta_a, \delta_b, C_a, C_b, C_c)\)

1. Guess initial values for the output capacitive loads \(C_{\text{L}(a)}\) and \(C_{\text{L}(b)}\) (for example, \(C_{\text{L}(a)} = C_a + C_c\))
2. \(t_{\text{out}(a)} = \text{Draw\_Output\_Waveform}(\delta_a, t_{\text{in}(a)}, C_{\text{L}(a)})\)
3. \(t_{\text{out}(b)} = \text{Draw\_Output\_Waveform}(\delta_b, t_{\text{in}(b)}, C_{\text{L}(b)})\)
4. Repeat until the output waveforms converge
   a. For \((V_{\text{th},a}, V_{\text{th},b})\) in \{(50%,50%), (50%,90%), (90%,50%), (50%,10%), (10%,50%)\} do
      i. Find_Output \((t_{\text{out}(a)}, t_{\text{out}(b)}, V_{\text{th},a}, V_{\text{th},b}, C_{\text{L}(a)}, C_{\text{L}(b)}, C_c)\)
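The iteration between the two drivers can be illustrated with a deliberately simplified Python sketch. It is not the paper's procedure: each driver output is modeled as a saturated linear ramp whose duration is `2 * r * C_L` (a hypothetical driver model replacing the delay tables), only the 50% threshold is used, and the equivalent load is updated with the assumed form \(C_L = C_{gnd} + C_c (1 - \Delta V_{other} / \Delta V_{self})\) evaluated from the start of each driver's own transition to its 50% point.

```python
def ramp(t, start, duration, rising, vdd=1.0):
    # Saturated linear ramp used as a stand-in for a driver output waveform.
    frac = min(max((t - start) / duration, 0.0), 1.0)
    return vdd * frac if rising else vdd * (1.0 - frac)

def coupled_loads(drv_a, drv_b, c_c, tol=1e-21, max_iter=100):
    # Each driver is a dict with keys: start (s), r (ohm), c_gnd (F), rising (bool).
    drivers = (drv_a, drv_b)
    loads = [d["c_gnd"] + c_c for d in drivers]      # initial guess: Miller factor 1
    for _ in range(max_iter):
        # Hypothetical driver model: the output transition lasts 2 * R * C_L.
        dur = [2.0 * d["r"] * c for d, c in zip(drivers, loads)]
        new_loads = []
        for me, other in ((0, 1), (1, 0)):
            t0 = drivers[me]["start"]
            t50 = t0 + 0.5 * dur[me]
            # Voltage change of this driver and of its neighbour over [t0, t50].
            dv_me = (ramp(t50, t0, dur[me], drivers[me]["rising"])
                     - ramp(t0, t0, dur[me], drivers[me]["rising"]))
            dv_ot = (ramp(t50, drivers[other]["start"], dur[other], drivers[other]["rising"])
                     - ramp(t0, drivers[other]["start"], dur[other], drivers[other]["rising"]))
            new_loads.append(drivers[me]["c_gnd"] + c_c * (1.0 - dv_ot / dv_me))
        if max(abs(n - o) for n, o in zip(new_loads, loads)) <= tol:
            return new_loads
        loads = new_loads
    raise RuntimeError("loads did not converge")

# Example drivers: identical strength and ground capacitance (30 fF), C_c = 10 fF.
DRV_A = {"start": 0.0, "r": 1e3, "c_gnd": 30e-15, "rising": True}
DRV_B_OPPOSITE = {"start": 0.0, "r": 1e3, "c_gnd": 30e-15, "rising": False}
DRV_B_SAME = dict(DRV_B_OPPOSITE, rising=True)
DRV_B_LATE = dict(DRV_B_SAME, start=1e-6)

loads_opposite = coupled_loads(DRV_A, DRV_B_OPPOSITE, 10e-15)
loads_same = coupled_loads(DRV_A, DRV_B_SAME, 10e-15)
loads_late = coupled_loads(DRV_A, DRV_B_LATE, 10e-15)
```

In this toy model, simultaneous opposite switching doubles the coupling contribution, simultaneous same-direction switching removes it, and a neighbour that switches much later leaves it unchanged. Convergence of this simplified loop is not guaranteed for arbitrary skews and driver strengths; it only illustrates the alternation between waveform estimation and load update.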

This algorithm can easily be extended to handle a collection of \(N\) cross-coupled drivers. In practice, we may encounter cases where the output waveform behaves like the voltage waveform shown in Figure 12. In such a case, we need to generate delay estimates for more percentile points of the output waveform. As shown in Figure 12, if we apply curve fitting to the upper points and the lower points, we can predict the output waveform of the gate. This technique works well as long as the distortion is small enough that the waveform does not cross any of the horizontal threshold lines drawn in Figure 12 at more than one point. At this point we need to prove that the proposed algorithm converges to a unique solution independently of the initial guess for the effective capacitance to ground.

Figure 10: Two gates driving capacitive load with capacitive crosstalk

Figure 11: Definition of voltages for Equations (22) and (23)

**Theorem 2:** The “Find Waveforms Capacitive Cross Coupled” algorithm converges to its unique solution independently of the initial guess for the value of the effective capacitance to ground.

**Proof:** In this algorithm, instead of solving the rather complicated Equation 19, we solve the problem by an iterative technique, which can be described as follows:

\[
\begin{align*}
T_{out} &= F(T_{in}, C_L + C_{EFF}) \\
C_{EFF} &= G(C_L, T_{out})
\end{align*}
\]

To prove that the above equation converges, we use Equation 16. More precisely, we show that [23]:

\[
\left| \frac{\partial}{\partial T_{out}} \left( F(T_{in}, C_L + G(C_L, T_{out})) \right) \right| < 1 \qquad (25)
\]

In addition, we assume that the $F$ and $G$ functions have the following local behavior:

\[
\frac{\partial F}{\partial C_{EFF}} = k_{1i}, \qquad \frac{\partial G}{\partial T_{out}} = k_{2i}\, C_c \qquad (26)
\]

where $k_{1i}$ (i=1,2) is the rate of output transition time change to output load change for the $i$th driver, $k_{2i}$ (i=1,2) is the ratio of Miller factor changes to output transition time changes, and $C_c$ is the coupling capacitance. Therefore,

\[
\frac{\partial}{\partial T_{out}} \left( F(T_{in}, C_L + G(C_L, T_{out})) \right) = \frac{\partial F}{\partial C_{EFF}} \frac{\partial G}{\partial T_{out}} = k_{1i}\, k_{2i}\, C_c \qquad (27)
\]

The worst-case value of $k_{1i}$ (i=1,2) occurs when the drivers are weak, where in 0.1μm technology it is in the order of $10^{-3}/\sqrt{F}$. Also, according to [21], the Miller factor could vary from -1 to 3; therefore, for the worst case in 0.1μm technology, $k_{2i}$ (i=1,2) is in the order of $10^3/\sqrt{F}$ and the coupling capacitance is in the order of $10^{-3}/\sqrt{F}$ in the worst case. Therefore, the condition in Equation (27) always holds and the product is always less than 1, which proves the convergence of the iterative algorithm.

4. Crosstalk for Coupled RC Loads

**Problem Statement:** The problem statement is the same as the one in Section 3 except that the load is now the one that is depicted in Figure 13. We are interested in determining the output waveforms at the near ends.

![Figure 13: A general format of two gates driving resistive and capacitive loads considering crosstalk](image)

Empirical gate/cell level models remain popular for timing analysis, even for full custom designs. In [14], a gate/cell level modeling methodology was developed which achieves compatibility with RC interconnect loading through an “effective capacitance” approximation. Dutta and Pileggi in [15] extended this waveform-based gate model to consider the problem of calculating the delay (and response waveform) when there is a significant amount of coupling. In particular, they present algorithms for obtaining the best and the worst gate delays in the presence of coupling capacitances. In their paper, the authors use a Norton equivalent model for the gates to do the analysis. Our approach to this problem is to partition the circuit with the two cross-coupled drivers into two separate sub-circuits, each with a single driver loaded by an RC-$\pi$ load. By applying the “Find Output” algorithm of Section 3, we first decouple the drivers; next, by applying the “Draw for RC Load” algorithm of Section 2, we estimate the output waveforms of the two drivers. Finally, we go through a number of iterations to determine the output waveform of each sub-circuit. The algorithm to find the output waveforms is as follows:

1. Update Voltage Waveforms (Voltage Waveforms, $V_{th,a}$, $V_{th,b}$, Load Parameters)
2. Draw for RC Load ($V_{th,a}$, Load Parameters)
3. Draw for RC Load ($V_{th,b}$, Load Parameters)
4. Repeat
   a. For ($V_{th,a}$, $V_{th,b}$) in $\{(50\%, 50\%), (50\%, 90\%), (90\%, 50\%), (50\%, 10\%), (10\%, 50\%)\}$ do
      1. Update Voltage Waveforms (Voltage Waveforms, $V_{th,a}$, $V_{th,b}$, Load Parameters)
5. Until the output waveforms converge

The proof of convergence for this iterative approach is similar to the proof in Section 3. It is omitted here due to space limitations.
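The control flow of steps 4 and 5 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: `iterate_until_converged`, `max_change`, `toy_update`, `BASE`, and `COUPLING` are hypothetical names introduced here, and `toy_update` is a deliberately simple stand-in for “Update Voltage Waveforms”, not the paper's procedure. Each waveform is represented only by its crossing times at the 10%, 50%, and 90% points, in arbitrary units.

```python
from typing import Callable, Dict, List, Tuple

# Each driver ("a" or "b") maps a threshold (fraction of Vdd) to a crossing time.
Waveforms = Dict[str, Dict[float, float]]

# Threshold pairs (V_th,a, V_th,b) visited in every pass, as in step 4a.
THRESHOLD_PAIRS: List[Tuple[float, float]] = [
    (0.5, 0.5), (0.5, 0.9), (0.9, 0.5), (0.5, 0.1), (0.1, 0.5),
]


def max_change(new: Waveforms, old: Waveforms) -> float:
    """Largest absolute change in any crossing time between two estimates."""
    return max(abs(new[d][v] - old[d][v]) for d in new for v in new[d])


def iterate_until_converged(
    waveforms: Waveforms,
    update: Callable[[Waveforms, float, float], None],
    tol: float = 1e-6,
    max_iters: int = 50,
) -> Tuple[Waveforms, int]:
    """Repeat the update over all threshold pairs until the waveforms settle."""
    current = {d: dict(points) for d, points in waveforms.items()}  # work on a copy
    for iteration in range(1, max_iters + 1):
        previous = {d: dict(points) for d, points in current.items()}
        for vth_a, vth_b in THRESHOLD_PAIRS:
            update(current, vth_a, vth_b)  # modifies `current` in place
        if max_change(current, previous) < tol:
            return current, iteration
    raise RuntimeError("waveforms did not converge")


# ASSUMPTION: toy numbers and a toy coupling rule, for illustration only.
BASE: Waveforms = {
    "a": {0.1: 10.0, 0.5: 30.0, 0.9: 70.0},
    "b": {0.1: 12.0, 0.5: 35.0, 0.9: 80.0},
}
COUPLING = 0.1  # weak coupling (< 1), so the toy iteration contracts


def toy_update(w: Waveforms, vth_a: float, vth_b: float) -> None:
    """Hypothetical stand-in: each crossing time shifts with the other driver's."""
    w["a"][vth_a] = BASE["a"][vth_a] + COUPLING * w["b"][vth_b]
    w["b"][vth_b] = BASE["b"][vth_b] + COUPLING * w["a"][vth_a]


initial = {d: dict(points) for d, points in BASE.items()}
result, passes = iterate_until_converged(initial, toy_update)
print(f"converged after {passes} passes")
```

The convergence check compares crossing times between successive passes. A real implementation would replace `toy_update` with the waveform update built on the effective-capacitance calculation and might use a relative tolerance instead.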

5. Experimental Results

We performed a large number of simulations on different circuits in 0.1μm CMOS technology and report the results here. We considered different ranges of coupling capacitances, driver sizes, and loads. In Table 1, we compare the results of the algorithm proposed in Section 2, “Draw for RC Load”, with those obtained from Hspice simulations for three different percentile points of the output transition time. The increments for $\alpha$ and $\beta$ of the $k$ table were taken as 0.1 and 1, respectively. Table 1 shows that our algorithm comes within 1% of Hspice. In Table 2, we compare the results of the algorithm proposed in Section 3, “Find Waveforms Capacitive Cross Coupled”, with the results obtained by Hspice. The error is about 3% for the different cases. In Table 3, we applied the algorithm proposed in Section 4 to the complex load configuration of Figure 13 and compared its results with Hspice. Again, we observed a small error (about 6% on average). These results are reported after 3 iterations. Also, in the experiments, the $k=0.6$ was chosen in the range from 0.15 to 0.90.

6. Conclusion

As we move toward VDSM technologies, the effects of interconnect resistance and coupling capacitance must be carefully taken into account. The interconnect resistance reduces the cell delay by shielding the far-end capacitances, whereas the coupling capacitances increase the gate propagation delay. Gate delay calculation requires accuracy, and delay tables are essential for accurate delay calculation for a given capacitive load and input transition time. The gate delay can vary widely as a function of the input transition times, the driver strengths, the skew between the transitions, and the output load configurations. In this paper, we presented three efficient iterative algorithms with provable convergence, which have low computational complexity and produce highly accurate results. To use the delay tables, we approximated the load with an effective capacitance that is equivalent to the real load in terms of its propagation delay (at the 10%, 50%, and 90% points). The proposed algorithms for calculating the gate propagation delay achieve high accuracy, with 6% error on average.

7. References

Table 1: Simulation results for output waveform evaluation algorithm for RC load (proposed in Section 2) for 0.1µm technologies (3 iterations).

<table>
<thead>
<tr>
<th>Driver and Load Parameters</th>
<th>10% propagation delay (from 50% of input to 10% of output)</th>
<th>50% propagation delay (from 50% of input to 50% of output)</th>
<th>90% propagation delay (from 50% of input to 90% of output)</th>
</tr>
</thead>
<tbody>
<tr>
<td>400</td>
<td>100/50</td>
<td>500</td>
<td>100</td>
</tr>
<tr>
<td>200</td>
<td>150/75</td>
<td>1500</td>
<td>200</td>
</tr>
<tr>
<td>100</td>
<td>50/25</td>
<td>500</td>
<td>1000</td>
</tr>
<tr>
<td>100</td>
<td>50/25</td>
<td>500</td>
<td>1000</td>
</tr>
<tr>
<td>600</td>
<td>80/40</td>
<td>350</td>
<td>150</td>
</tr>
<tr>
<td>300</td>
<td>100/100</td>
<td>1000</td>
<td>450</td>
</tr>
<tr>
<td>50</td>
<td>100/100</td>
<td>150</td>
<td>650</td>
</tr>
<tr>
<td>250</td>
<td>20/100</td>
<td>450</td>
<td>350</td>
</tr>
<tr>
<td>150</td>
<td>160/80</td>
<td>850</td>
<td>1000</td>
</tr>
<tr>
<td>550</td>
<td>150/75</td>
<td>1300</td>
<td>400</td>
</tr>
<tr>
<td>350</td>
<td>120/60</td>
<td>1600</td>
<td>500</td>
</tr>
<tr>
<td>450</td>
<td>35/30</td>
<td>500</td>
<td>100</td>
</tr>
<tr>
<td>Avg</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

***All inputs are rising.

Table 2: Simulation results for capacitive load considering crosstalk (cf. Section 3) for 0.1µm technologies (3 iterations).