DESIGN OF CMOS DYNAMIC CIRCUITS WITH IMPROVED NOISE TOLERANCE

Fernando Mendoza-Hernández, Mónico Linares and Víctor H. Champac

Dept. of Electronic Engineering, National Institute for Astrophysics, Optics and Electronics-INAOE
P.O. Box 51 and 216 72000 Puebla, Pue., MEXICO
E-mail: [fer, mlinares, champac]@inaoep.mx

ABSTRACT

In this paper we present a new noise-tolerant dynamic circuit technique suitable for pipelined dynamic digital circuits. The effectiveness of the technique is demonstrated by means of HSPICE simulations for two kinds of gates, CMOS AND and OR gates, both TSPC and Domino. To quantify the improvement in noise immunity, we compare its performance with previous works. Simulation results show that the proposed technique improves the noise tolerance and the ANTE-delay quotient over conventional dynamic logic and previous noise-tolerant dynamic circuit techniques, with a slight increase in delay and power.

1. INTRODUCTION

Technology scaling has led to higher clock frequencies, greater chip density and power savings. Nevertheless, the aggressive scaling of devices and interconnections has brought noise issues to the forefront. This is because the coupling capacitance between adjacent interconnections is becoming an important part of the total interconnect capacitance.

The increasing use of dynamic logic in high-performance CMOS VLSI circuits aggravates the noise effects. These circuits are affected by charge sharing and leakage currents. Moreover, the noise margin of dynamic gates is lower than that of static ones. Thus, noise appearing at the inputs of dynamic gates (crosstalk) can degrade the logic levels at the output and can also affect the delay of the gates. If the maximum noise voltage is greater than the noise threshold of the dynamic gate, an undesirable logic transition may occur. Furthermore, the glitches generated at the output of the dynamic gates increase the power consumption.

There are two ways to address these noise issues: (1) reducing the peak crosstalk generated in the interconnections by means of interconnect optimization via repeater insertion and wire sizing, and (2) designing noise-tolerant gates that withstand larger crosstalk pulses appearing at their inputs. Here we focus on the second approach and consider the case in which crosstalk appears at the inputs of a dynamic gate.

In this paper a new noise-tolerant dynamic circuit technique based on the topology described in [4] is presented. This technique can be applied to pipelined dynamic circuits like TSPC [5] and can be extended to other logic styles. Simulation results for OR and AND gates show that the proposed technique achieves better noise tolerance than existing ones, with only a slight degradation in power consumption and delay.

In the next section we analyze deep submicron noise effects on the performance of pipelined dynamic circuits. In section 3 existing noise-tolerant dynamic circuit techniques [6, 7] are reviewed. In section 4 the proposed technique is introduced. Section 5 presents simulation results that compare the performance of the different noise-tolerant techniques with each other and with conventional dynamic logic. Conclusions and future work are presented in section 6.

2. NOISE EFFECTS IN SUBMICRON TECHNOLOGIES

Digital noise is any disturbance that causes the voltage level of a dynamic evaluation node to deviate from the power supply or ground rails when it should otherwise have a stable logic level. In precharged dynamic circuits like TSPC and Domino, dynamic evaluation nodes are susceptible to crosstalk noise, especially during the part of normal system operation in which they are disconnected from power supply and ground. Pipelined TSPC circuits suffer from this problem twice. First, the output nodes of the N-block or P-block in memory phase (M) are not connected to the power rails (See Fig. 1). Second, the internal dynamic nodes of the N-block or P-block are also not connected to the power rails in evaluate phase (E) if the logic function is not in the ON state.

While the clock remains low, N-blocks are in memory phase and their inputs take a new value or remain unchanged. N-blocks are in evaluate phase when the clock rises. Assume that all inputs of the N-block remain high except the upper one (See Fig. 1), which goes low. When a noise pulse is generated at the top input of the N-block (See Fig. 1), the internal dynamic precharge node \( P_1 \) can have a direct path to ground if the noise amplitude is greater than \( V_{th} \), the threshold voltage of the NMOS transistor controlled by the noisy input in the N-logic. Consequently, \( P_1 \) is discharged for as long as the noise amplitude is greater than \( V_{th} \). Fig. 2 shows how, in a TSPC AND gate, the noisy input discharges the dynamic precharge node \( P_1 \). Because of this, the output node undergoes an undesirable logic transition from low to high. Furthermore, the glitches generated at the output of the dynamic gate increase the power consumption. In this way, digital noise may degrade the performance of the circuit or even cause it to work incorrectly.

3. PREVIOUS NOISE TOLERANT TECHNIQUES

Raising the noise threshold voltage \( V_{th} \) of the gate is one effective way to increase the noise tolerance of digital gates, where \( V_{th} \) is defined as the minimum input voltage required to cause a logic transition at the output. Some noise-tolerant techniques based on this strategy have recently been published [6, 7]. All of them raise \( V_{th} \) by precharging the N-logic internal nodes using the input data, the internal dynamic node \( P_1 \) or the clock signal.

Fig. 3(a) shows Bobba’s technique [6] for a 2-input AND gate. In this technique more transistors are added to the N-logic to improve the noise tolerance. One NMOS transistor is added in the N-logic for each NMOS transistor in the original N-logic, and a PMOS transistor raises the nodes $N_1$ and $N_2$ to $V_{DD}$ while the gate inputs are low. If all inputs go high during the evaluate phase, a voltage divider is formed by the added PMOS and NMOS transistors at nodes $N_1$ and $N_2$. The improved noise tolerance is obtained in two ways: (a) the $V_{th}$ of the AND gate equals the $V_{th}$ of the static inverters that operate as voltage dividers, which can be adjusted by modifying the transistor width-to-length ratios; and (b) raising the source node voltage of the top NMOS in the N-logic avoids sub-threshold leakage current from drain to source. One drawback of this technique is the significant delay penalty for AND gates due to the duplicated N-logic and the increased capacitance at the gate inputs. Power consumption also increases because two transistors are added for each transistor in the N-logic and because, if internal nodes are discharged, the precharge raises them from ground to $V_{DD}$.
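
As a rough illustration of point (a), the following sketch evaluates the textbook switching threshold of a static CMOS inverter under the long-channel square-law model, $V_M = (V_{DD} - |V_{tp}| + r V_{tn}) / (1 + r)$ with $r = \sqrt{\beta_n / \beta_p}$. It is not taken from [6]; the function name and all device values except $V_{DD} = 3.3\,V$ are assumptions chosen only to show how the width-to-length ratios move the threshold.

```python
import math


def inverter_switching_threshold(vdd, vtn, vtp_abs, beta_n, beta_p):
    """Switching threshold of a static CMOS inverter (long-channel square-law model).

    beta_n and beta_p are the transconductance factors, proportional to W/L.
    """
    r = math.sqrt(beta_n / beta_p)
    return (vdd - vtp_abs + r * vtn) / (1.0 + r)


# Hypothetical device thresholds (assumed, not from the paper); VDD = 3.3 V.
VDD, VTN, VTP_ABS = 3.3, 0.5, 0.5

for ratio in (0.25, 1.0, 4.0):
    vm = inverter_switching_threshold(VDD, VTN, VTP_ABS, ratio, 1.0)
    print(f"beta_n/beta_p = {ratio:4.2f} -> V_M = {vm:.3f} V")
```

Under this simplified model, a stronger PMOS (smaller $\beta_n / \beta_p$) moves the threshold towards $V_{DD}$, which is the direction that raises the noise threshold of the gate.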

The Twin-Transistor technique [7] (See Fig. 3(b)) raises the voltage of the N-logic internal nodes via additional transistors ($M_{TT}$). Due to the body effect, the noise threshold voltage of the N-logic transistors increases. Hence, the noise tolerance of the gate improves. One drawback of this technique is that using the gate inputs to raise the voltage of the N-logic internal nodes adds load capacitance to the drivers of the gate inputs. Furthermore, this technique cannot be applied to pipelined logic like TSPC because N-block inputs are floating in evaluate phase. Consider the case in which this technique is applied to the N-blocks of a pipelined circuit (See Fig. 1). If all inputs of an N-block are high at the beginning of the evaluate phase, the voltage level of the upper input may degrade.
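
The body-effect mechanism can be sketched with the standard first-order expression $V_t = V_{t0} + \gamma (\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F})$. The function name and the parameter values below are hypothetical and are not device data from [7] or from the technology used in this paper; the sketch only shows that a higher source voltage at an internal node raises the transistor threshold.

```python
import math


def threshold_with_body_effect(vt0, gamma, two_phi_f, vsb):
    """First-order NMOS threshold voltage for a source-to-body voltage vsb (volts)."""
    return vt0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


# Hypothetical process parameters (assumed for illustration only).
VT0, GAMMA, TWO_PHI_F = 0.5, 0.5, 0.7

for vsb in (0.0, 0.5, 1.0, 2.0):
    vt = threshold_with_body_effect(VT0, GAMMA, TWO_PHI_F, vsb)
    print(f"V_SB = {vsb:.1f} V -> V_t = {vt:.3f} V")
```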

All noise-tolerant techniques described in this section have been applied only to Domino logic, and the Twin-transistor technique presents problems when it is applied to pipelined systems.

4. THE NEW NOISE TOLERANT TECHNIQUE

Fig. 4(a) shows the general schematic of the proposed noise-tolerant dynamic circuit technique, which is based on the topology presented in [4]. A two-input AND gate (See Fig. 4(b)) is formed by replacing the N-logic block of Fig. 4(a) with two series transistors. OR gates are implemented in a similar way by substituting parallel transistors for the N-logic block.

The proposed technique adds one NMOS transistor $M_N$ at the top of the N-logic and a delay circuitry between the clock signal and the gate of $M_N$. A PMOS transistor $M_P$ (See Fig. 4(b)) controlled by the clock is added to speed up the charging of node $P_2$. The delay circuitry output rises to $V_{DD}$ approximately in the middle of the memory phase, turning on the transistor $M_N$. Node $P_2$ rises to $V_{DD}$ through $M_N$ and $M_P$. Approximately in the middle of the evaluate phase the NMOS transistor $M_N$ is turned off. In this way the node $P_1$ is virtually isolated during the rest of the evaluate phase. Hence, noise at the gate inputs is not reflected at the node $P_1$. Furthermore, the noise tolerance before the transistor $M_N$ is turned off also increases due to the presence of the added transistor in the N-logic and the raised voltage at node $P_2$. The delay circuitry should be designed to turn off the $M_N$ transistor late enough that the node $P_1$ evaluates correctly in the absence of digital noise. In this section node $P_2$ has been chosen for precharge. The performance of the AND gate when other nodes are precharged is reported in section 5.

The delay circuitry can be shared by several dynamic gates. Hence, our proposal is expected to have small area and power penalties for a given circuit application. Furthermore, gate inputs are not used to improve the noise tolerance. This implies less parasitic capacitance in the data path and at the N-logic internal nodes. The delay and power consumption added by the additional circuitry are acceptable considering the improved noise tolerance, as can be seen in the next section. In addition, layout techniques should be used to prevent node \( P_1 \) from being affected by coupling capacitances.
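
The timing described above can be summarized with a small behavioral sketch. It assumes an ideal 1 GHz clock with 50% duty cycle, a delay of exactly one quarter period, and an inverting delay circuitry; these three choices, as well as the function names, are assumptions made for illustration and do not describe the actual delay circuitry of Fig. 4.

```python
PERIOD_PS = 1000            # 1 GHz clock, times in integer picoseconds
DELAY_PS = PERIOD_PS // 4   # assumed quarter-period delay


def clock_high(t_ps, period_ps=PERIOD_PS):
    """Ideal clock: low in the first half period, high in the second half."""
    return (t_ps % period_ps) >= period_ps // 2


def phase(t_ps):
    """N-block phase: memory while the clock is low, evaluate while it is high."""
    return "evaluate" if clock_high(t_ps) else "memory"


def mn_on(t_ps, delay_ps=DELAY_PS):
    """Gate of M_N modeled as the inverted clock delayed by delay_ps (assumption)."""
    return not clock_high(t_ps - delay_ps)


def p1_isolated(t_ps):
    """P_1 is cut off from the N-logic once M_N turns off during evaluation."""
    return phase(t_ps) == "evaluate" and not mn_on(t_ps)


for t in range(0, PERIOD_PS, 125):
    print(f"t = {t:4d} ps  phase = {phase(t):8s}  "
          f"M_N on = {mn_on(t)!s:5s}  P_1 isolated = {p1_isolated(t)}")
```

In this idealized model $M_N$ conducts from the middle of the memory phase to the middle of the evaluate phase, so the gate has half of the evaluate phase to discharge $P_1$ before the node is isolated.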

5. SIMULATION RESULTS AND CMOS GATES COMPARISONS

In this section we present simulation results for a two-input TSPC AND gate, a six-input TSPC OR gate and a 4-bit carry look-ahead full adder using the proposed technique. Comparisons with Bobba’s and Twin-transistor techniques are also presented.

5.1. Simulation Results for AND and OR Gates

AND gates using Bobba’s and Twin-transistor techniques are shown in Fig. 3(a) and Fig. 3(b), respectively. Although Twin-transistor and Bobba’s techniques have been applied in Domino circuits, they are used in TSPC circuits for comparison purposes in this section. Fig. 4(b) shows the AND gate implemented with the proposed technique.

In the simulations we use a 0.35 \( \mu m \) CMOS AMS technology. The size of the transistors was chosen to reach a clock frequency of 1.0 \( GHz \) with a P-type TSPC latch as load at the output and a power supply of \( V_{DD} = 3.3 V \). The noise pulse injected at the gate inputs is characterized by its width and amplitude. All inputs are driven by static inverters. Fig. 5 and Fig. 7 show the noise immunity curves [8] of the AND and OR gate implementations, respectively. As can be seen, the proposed technique has better noise immunity than the existing ones. In Table 1 and Table 2 we can observe that the proposed technique has the highest \( ANTE \) metric [7]. In Fig. 6 and Fig. 8 the ANTE-delay tradeoff for 2-input AND and 6-input OR gates using the noise tolerant techniques is shown, respectively.
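
The following sketch shows how an ANTE value could be computed from sampled points of a noise immunity curve. It assumes the usual definition of the metric as the mean of $V_{noise}^2 \cdot T_{noise}$ over the curve, which should be checked against [7]; the function name and the sample points are hypothetical and are not simulation results from this paper.

```python
def ante(points):
    """Average noise threshold energy: mean of V^2 * T over curve samples.

    points: iterable of (amplitude_in_volts, width_in_ps) pairs taken from a
    noise immunity curve. The result is in V^2 * ps.
    """
    points = list(points)
    if not points:
        raise ValueError("at least one curve sample is required")
    return sum(v * v * t for v, t in points) / len(points)


# Hypothetical samples (amplitude in V, width in ps), for illustration only.
curve = [(3.0, 100), (2.0, 200), (1.5, 400)]
print(f"ANTE = {ante(curve):.1f} V^2*ps")
```

With the amplitude in volts and the width in picoseconds, the result is in $V^2 \cdot ps$, so dividing it by a delay in picoseconds gives a quotient in $V^2$.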

Moreover, delay measurements indicate a delay penalty of 21.5% and 25.7% for the proposed technique against 73.8% and 23.3% for Bobba’s technique, for AND and OR gates respectively (See Tables 1 and 2). The Twin-Transistor technique has a delay penalty of 10.5% for the AND gate and a delay improvement for the OR gate.

All the noise-tolerant techniques increase the \( ANTE \) at the expense of an increased delay. The quotient between \( ANTE \) and delay in Table 1 and Table 2 indicates how efficiently each technique increases the noise immunity relative to its delay penalty. The proposed technique has the highest \( ANTE \)-Delay quotient for the AND gate. For the OR gate, the proposed technique is better than Bobba’s technique and conventional dynamic logic but worse than the Twin-transistor technique. For wide noise pulses the noise amplitude tolerated by the proposed technique is higher than for the other techniques (See Fig. 5). This fact can be further exploited in future technologies because the peak noise value scales and the noise pulse width increases with voltage scaling [7].

Table 1. Performance for 2-input TSPC AND gate.

<table>
<thead>
<tr>
<th>TECHNIQUE</th>
<th>Power (mW)</th>
<th>ANTE (\( V^2 \cdot ps \))</th>
<th>Delay (ps)</th>
<th>\( \frac{ANTE}{Delay} \) (\( V^2 \))</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conv. dynamic</td>
<td>1.32</td>
<td>656</td>
<td>184.4</td>
<td>3.55</td>
</tr>
<tr>
<td>Twin-transistor</td>
<td>1.40</td>
<td>1003</td>
<td>203.8</td>
<td>4.92</td>
</tr>
<tr>
<td>Bobba’s</td>
<td>1.63</td>
<td>1858</td>
<td>320.6</td>
<td>5.79</td>
</tr>
<tr>
<td>This work</td>
<td>1.63</td>
<td>2227</td>
<td>224.2</td>
<td>9.93</td>
</tr>
</tbody>
</table>
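
The delay penalties and ANTE/Delay quotients for the AND gate can be recomputed from the ANTE and delay columns of Table 1, as in the sketch below. The dictionary layout and function names are illustrative choices. The recomputed penalties agree with the figures quoted in the text to within about 0.1 percentage point, a difference attributable to rounding.

```python
# (ANTE, delay in ps) for the 2-input TSPC AND gate, copied from Table 1.
TABLE_1 = {
    "Conv. dynamic": (656, 184.4),
    "Twin-transistor": (1003, 203.8),
    "Bobba's": (1858, 320.6),
    "This work": (2227, 224.2),
}


def delay_penalty_percent(delay_ps, reference_ps):
    """Relative delay increase over the reference gate, in percent."""
    return 100.0 * (delay_ps - reference_ps) / reference_ps


def ante_delay_quotient(ante, delay_ps):
    """ANTE divided by delay, the efficiency figure used in the tables."""
    return ante / delay_ps


reference_delay = TABLE_1["Conv. dynamic"][1]
for name, (ante, delay) in TABLE_1.items():
    penalty = delay_penalty_percent(delay, reference_delay)
    quotient = ante_delay_quotient(ante, delay)
    print(f"{name:16s} penalty = {penalty:5.1f} %  ANTE/Delay = {quotient:5.2f}")
```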

Table 2. Performance for 6-input TSPC OR gate.

<table>
<thead>
<tr>
<th>TECHNIQUE</th>
<th>Power (mW)</th>
<th>ANTE (\( V^2 \cdot ps \))</th>
<th>Delay (ps)</th>
<th>\( \frac{ANTE}{Delay} \) (\( V^2 \))</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conv. dynamic</td>
<td>1.94</td>
<td>1288</td>
<td>168.6</td>
<td>7.83</td>
</tr>
<tr>
<td>Twin-transistor</td>
<td>2.14</td>
<td>1518</td>
<td>103.9</td>
<td>14.61</td>
</tr>
<tr>
<td>Bobba’s</td>
<td>3.73</td>
<td>1938</td>
<td>208</td>
<td>9.31</td>
</tr>
<tr>
<td>This work</td>
<td>3.54</td>
<td>2350</td>
<td>212</td>
<td>11.08</td>
</tr>
</tbody>
</table>

The delay of the proposed technique increases with the width of the \( M_p \) transistor \( (W_{M_p}) \), as can be seen in Table 3(a). The noise immunity also improves when \( W_{M_p} \) increases. Table 3(b) shows that the best delay is reached when node \( P_2 \) is precharged (Fig. 4(b)). Nevertheless, the best \( ANTE/\text{Delay} \) quotient is reached when node \( P_3 \) is precharged (See Table 3(b)).

Table 3. Delay dependence of an AND gate with the proposed technique (a) on \( M_p \) size and (b) on the precharged node, \( (W_{M_p} = 0.6 \mu m) \).

<table>
<thead>
<tr>
<th>\( W_{M_p} \) (\( \mu m \))</th>
<th>Delay (ps)</th>
<th>Node</th>
<th>Delay (ps)</th>
<th>\( \frac{ANTE}{Delay} \) (\( V^2 \))</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.6</td>
<td>224.2</td>
<td>\( P_2 \)</td>
<td>224.2</td>
<td>7.7</td>
</tr>
<tr>
<td>0.9</td>
<td>226.3</td>
<td>\( P_3 \)</td>
<td>231.5</td>
<td>8.2</td>
</tr>
<tr>
<td>1.2</td>
<td>229.1</td>
<td>\( P_3 \)</td>
<td>227.3</td>
<td>5.9</td>
</tr>
<tr>
<td></td>
<td></td>
<td>\( P_3 \) and \( P_4 \)</td>
<td>242.6</td>
<td>6.7</td>
</tr>
</tbody>
</table>

To verify the effectiveness of the proposed technique experimentally, a test structure is needed that allows comparison with conventional dynamic logic. Fig. 9 presents the layout of a test chip containing two datapaths for this comparison. One datapath is implemented with conventional TSPC dynamic logic and the other with the proposed technique. Each datapath is mainly composed of a register bank and a carry look-ahead full adder.

6. CONCLUSIONS

We have presented a new noise-tolerant dynamic circuit technique suitable for pipelined digital systems. Noise immunity curves and the \( ANTE/\text{Delay} \) quotient show that this technique improves the noise immunity with less performance degradation than existing ones for AND gates and with an acceptable performance degradation for OR gates. The power consumption of the proposed technique can be reduced if two or more blocks share the same delay circuitry. Sharing also reduces the area penalty.

Further work is directed at minimizing the delay penalty of the proposed technique. This can be done by pre-discharging the N-logic internal nodes before the evaluate phase begins. Another future task is to verify experimentally the improved performance of the proposal.

7. REFERENCES


