# **Soft Error Rate Determination for Nanometer CMOS VLSI Logic**

Fan Wang and Vishwani D. Agrawal

Department of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849, USA

wangfan@auburn.edu, vagrawal@eng.auburn.edu

### Abstract

Nanometer CMOS VLSI circuits are highly sensitive to soft errors due to environmental causes such as cosmic radiation and charged particles. These phenomena, also known as single-event upsets (SEUs), induce current pulses at random times and random locations in a digital circuit. In this paper we model neutron-induced soft errors using two parameters, namely, frequency and intensity. Our soft error rate (SER) estimation method propagates through the circuit both the frequency, expressed as a probability, and the intensity, expressed as probability density functions of single event transient (SET) pulse widths. With this model we are able to accurately model electrical masking factors in logic circuits. Also, the error pulse-width density information at the primary outputs of the logic circuit allows evaluation of SER reduction schemes such as time or space redundancy.

### 1 Introduction

Continuous downscaling of CMOS technologies has resulted in clock frequencies reaching the multi-GHz range, supply voltages decreasing below one volt, and load capacitances of circuit nodes dropping to femtofarads. Consequently, microelectronic systems are more vulnerable to noise sources in the working environment. Nanotechnology therefore makes meeting reliability requirements more challenging.

With advances in design and manufacturing technology, non-environmental conditions may affect the reliability of sub-micron semiconductors less. However, errors caused by cosmic rays and alpha particles will remain the dominant causes of errors in electronic systems. Alpha particles come from package impurities [14]. Galactic cosmic rays traverse the earth's atmosphere, where they collide with atomic nuclei to create cascades of reactions producing neutrons. Some of these neutrons reach the ground and become a major source of single event upsets in microelectronics at ground level. While alpha particles can be greatly reduced by removing radioactive impurities from the package material, it is harder to shield circuits from high-energy neutrons. As Mason points out [6], soft fails caused by neutrons will be the dominant failure mechanism in SRAM-based programmable logic.

The single event upset phenomenon is a complex process; for a broad tutorial on this subject, one may refer to a recent paper [14]. When neutrons strike silicon, any of more than 100 different nuclear reactions can be generated [9]. Accurate measurement of the neutron flux and its energy distribution is the first consideration in estimating neutron-induced error rates. In this paper, we only consider soft errors caused by neutrons and neglect the effect of alpha particles.

Analytical methods are widely used to model soft errors probabilistically. Asadi et al. [1] presented a soft error rate estimation technique based on error probability propagation. Rejimon and Bhanja [12] gave a single event fault model based on probabilistic Bayesian networks, which capture spatial dependencies. Hayes et al. [4] presented a framework for modeling transient-error tolerance in logic circuits. However, these approaches do not take into account electrical masking in the circuit or the characteristics of transient pulses, such as their widths. An improvement was provided by Zhao et al. [15], who proposed a constraint-aware robustness insertion methodology that protects the sequential elements in digital circuits against various noise effects. In their work, a noise probability density function, determined by a probability matrix mapping, represents the distribution of noise that has survived circuit masking effects at internal nodes to reach the flip-flops. However, the authors did not include environmental factors such as the error frequency. Moreover, their propagation method required tabulating all the pulse width and height data for each logic gate, which would take enormous memory for large logic circuits.

In Section 2, we present an environment-dependent soft error model for logic circuits based on both the error frequency, represented as a probability, and the soft error density, represented as a transient-width distribution. In Section 3, we develop a probability propagation scheme to propagate both the soft error frequency and the pulse-width distribution through the logic circuit. In Section 4, we use our model to calculate soft error rates for benchmark circuits. Because we propagate both error frequencies and pulse-width densities, the pulse-width information at the primary outputs can be used to analyze the efficiency of time (or space) redundancy-based error reduction.

### 2 An Environment-Based Probabilistic Soft Error Model

Unlike in memories, a single event effect (SEE) in a logic circuit appears as a single event transient (SET) pulse. An SET has unique characteristics such as polarity, waveform, amplitude, and duration, which depend on the particle impact location, particle energy, device technology, supply voltage, and output load. A single event upset (SEU) does not occur unless the SET survives the circuit masking effects and is captured by a clock edge into a sequential element. An SET can be eliminated by electrical masking, logic masking, or temporal masking [9, 10].

Environmental neutrons come from cascaded interactions when galactic cosmic rays traverse the earth's atmosphere. These neutrons reach the ground with finite probabilities. The neutron flux is usually expressed in units of $N/cm^2$-s, where N is the number of neutron particles. The intensity of the cosmic-ray-induced neutron flux in the atmosphere varies with altitude, location in the geomagnetic field, and solar magnetic activity. Flux data are available from observations accumulated over decades [8, 16], and the JEDEC standard [5] is often cited.

Each neutron has a unique energy when it arrives at the ground. The particle does not induce an error by itself; it is its interaction with the electronic material that causes the error. Neutron energy is one of the key properties here; we neglect the effect of the angle of incidence of the particle strike. Not every particle hit on a sensitive silicon area induces an error: an SEU occurs with a certain probability for each high-energy particle hit. This probability can be obtained from existing computer programs, for example, IBM's SEMM (Soft Error Monte-Carlo Modeling) program [13]. Figure 1 shows the result when a CMOS SRAM chip was simulated for 30-MeV neutron hits. The probability of SEU is a function of the particle energy and the critical charge. In the circuit design process, once a circuit is laid out, the critical charge of each cell is defined. Although we did not use the SEMM program in our experiments on logic circuits, we mention it to illustrate how the error probability can be derived.

To account for all energy components in our soft error model, we average the error probability over different energies and assign each circuit node a unique error probability value. The particle energy distribution at specific locations for specific technology nodes can be obtained from experimental results. For example, cosmic particle strikes were simulated using a heavy ion beam at the Twin Tandem Van de Graaff accelerator at Brookhaven National Laboratory, and the results suggest that in the natural environment of space the probability distribution of high-energy particles falls rapidly with increasing linear energy transfer (LET). For both $0.5\mu m$ and $0.35\mu m$ CMOS technology processes at the ground level, the largest population has an LET of $20\ MeV\text{-}cm^2/mg$ or less, and particles with LET greater than $30\ MeV\text{-}cm^2/mg$ are exceedingly rare [3]. The LET of a striking particle multiplied by a characteristic length of the material gives the charge accumulated due to the strike. These results are used in our experiments in Section 4.

Figure 1. Probability of soft error for each collision of a 30-MeV neutron as a function of the average critical charge for an SRAM chip (from IBM SEMM program [13]).

In addition, from the statistical energy distribution we are able to model the statistical SET widths in logic circuits by applying the LET values to the commonly used double-exponential transient current model [7]:

$$\begin{cases} I(t) = \frac{Q_{coll}}{\tau_{\alpha} - \tau_{\beta}} \left( e^{-t/\tau_{\alpha}} - e^{-t/\tau_{\beta}} \right) & \text{(a)} \\ Q_{coll} = 10.8 \times L \times LET & \text{(b)} \end{cases} \tag{1}$$

where $Q_{coll}$ is the collected charge in the sensitive region, $\tau_{\alpha}$ is the collection time constant, which is a process-dependent property of the junction, and $\tau_{\beta}$ is the ion-track establishment time constant, which is relatively independent of the technology. In bulk silicon, a typical charge collection depth $L$ is $2\mu m$, and an ionizing particle deposits about 10.8 fC of charge along each micron of its track for every 1 $MeV\text{-}cm^2/mg$ of LET. Typical values are approximately $1.64 \times 10^{-10}$ s for $\tau_{\alpha}$ and $5 \times 10^{-11}$ s for $\tau_{\beta}$ [2, 15].
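As a quick numerical illustration of Equation (1), the sketch below computes $Q_{coll}$ for an LET of 20 $MeV\text{-}cm^2/mg$ (the upper end of the common ground-level population cited above) with $L = 2\mu m$, evaluates the double-exponential current, and checks that the area under $I(t)$ recovers $Q_{coll}$. The function names, the integration step, and the choice of LET are illustrative assumptions, not part of the authors' simulator.

```python
import math

TAU_ALPHA = 1.64e-10  # collection time constant (s), typical value from the text
TAU_BETA = 5e-11      # ion-track establishment time constant (s), typical value

def collected_charge_fc(let_mev_cm2_mg, depth_um=2.0):
    """Equation (1b): Q_coll (fC) = 10.8 fC/um per unit LET * L (um) * LET."""
    return 10.8 * depth_um * let_mev_cm2_mg

def set_current(t, q_coll_c, tau_a=TAU_ALPHA, tau_b=TAU_BETA):
    """Equation (1a): double-exponential SET current (A) at time t (s)."""
    return q_coll_c / (tau_a - tau_b) * (math.exp(-t / tau_a) - math.exp(-t / tau_b))

q_fc = collected_charge_fc(20.0)  # hypothetical LET of 20 MeV-cm^2/mg
q_c = q_fc * 1e-15                # convert fC to C

# Time of peak current, from dI/dt = 0.
t_peak = TAU_ALPHA * TAU_BETA * math.log(TAU_ALPHA / TAU_BETA) / (TAU_ALPHA - TAU_BETA)

# Riemann sum of I(t) over 20 collection time constants; should be close to Q_coll.
dt = 1e-13
steps = round(20 * TAU_ALPHA / dt)
charge = sum(set_current(k * dt, q_c) for k in range(steps)) * dt

print(f"Q_coll = {q_fc:.1f} fC, integrated = {charge * 1e15:.3f} fC")
print(f"peak at {t_peak * 1e12:.1f} ps, I_peak = {set_current(t_peak, q_c) * 1e3:.3f} mA")
```

Because $\tau_{\alpha} > \tau_{\beta}$, the current is positive for $t > 0$ and its integral from 0 to $\infty$ equals $Q_{coll}$ exactly, which is why the normalization factor $1/(\tau_{\alpha} - \tau_{\beta})$ appears in Equation (1a).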

From Equation (1), the transient current pulse created by a particle strike can be calculated for each given LET. By charging and discharging the circuit node capacitance, this current pulse is converted into a transient voltage pulse, as illustrated in Figure 2. Following the preceding discussion, Figure 3 gives a neutron-induced soft error model for logic circuits. Because the error rate combines the probability of SEU per hit with the neutron flux, which is location dependent, we can easily obtain the circuit SER in units of *FIT* for different locations if the corresponding neutron flux data are available.
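The flux-to-FIT conversion can be sketched as follows, under the assumption that the raw upset rate of one node is simply flux × sensitive area × probability of SEU per hit, before any electrical or logic masking. The example inputs are the ground-level assumptions used in Section 4 (56.5 $m^{-2}s^{-1}$, 1000 $\mu m^2$, $10^{-4}$); the function name is hypothetical, and the result is not claimed to reproduce the masked SER values reported in the tables.

```python
UM2_PER_M2 = 1e12          # 1 m^2 = 10^12 um^2
SECONDS_PER_HOUR = 3600.0
HOURS_PER_FIT = 1e9        # 1 FIT = one failure per 10^9 hours of operation

def node_upset_rate_fit(flux_per_m2_s, area_um2, p_seu_per_hit):
    """Hypothetical raw (unmasked) SEU rate of a single node, in FIT."""
    hits_per_s = flux_per_m2_s * area_um2 / UM2_PER_M2
    upsets_per_s = hits_per_s * p_seu_per_hit
    return upsets_per_s * SECONDS_PER_HOUR * HOURS_PER_FIT

sea_level = node_upset_rate_fit(56.5, 1000.0, 1e-4)
print(f"raw node upset rate at sea level: {sea_level:.2f} FIT")
```

Since the rate is linear in the flux, moving to a location with a different neutron flux only rescales this number, which is what makes the model location dependent.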

In summary, this probabilistic soft error model is based on two considerations: (1) the occurrence of SEUs, represented as soft error frequencies, and (2) once an SEU occurs, its presence in the logic circuit as SETs whose pulse widths are described by probability density functions. Note that the pulse width is not measured between the half peak-to-peak points of the pulse; rather, it is measured at half of the power supply voltage of the logic circuit.

Figure 2. Transforming statistical neutron energy spectrum to SET width statistics.



Figure 3. Proposed probabilistic neutron induced soft error model for logic.

### 3 Gate-Level SET Propagation

Having discussed the modeling of soft errors by two factors (frequency and density), we will now discuss the propagation of errors through a logic gate.

#### 3.1 Pulse-Width Probability Density Propagation

Assume that the input SET width is a random variable $X$ with probability density function $f_X(x)$, and that the output SET width is a random variable $Y$ with probability density function $f_Y(y)$, which we want to calculate at the output of the gate. Suppose the function $g$ expresses the relationship between $X$ and $Y$: $Y = g(X)$. Propagation is thus modeled mathematically as a function of a random variable. The pulse-width density propagation for each individual gate is obtained as follows:

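- $X$, $Y$: random variables; $X$ is the input pulse width and $Y$ is the output pulse width
- $f_X(x)$: probability density function of $X$; $f_Y(y)$: probability density function of $Y$
- $g$: the propagation function, $Y = g(X;\ (W/L)_p,\ (W/L)_n,\ C_{load},\ \text{technology})$, whose form depends on the pMOS and nMOS $W/L$ ratios, the load capacitance, and the technology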

Assume $g$ is differentiable and strictly increasing, so that $g'$ and $g^{-1}$ exist. For $y = g(x)$ and $y + \Delta y = g(x + \Delta x)$,

$$\int_{x}^{x+\Delta x} f_{X}(s)\,ds = \int_{y}^{y+\Delta y} f_{Y}(t)\,dt \implies f_{X}(x)\,\Delta x \approx f_{Y}(y)\,\Delta y$$

i.e.,

$$f_{Y}(y) = \lim_{\Delta x \to 0} f_{X}(x)\frac{\Delta x}{\Delta y} = \lim_{\Delta x \to 0} \frac{f_{X}(x)}{\Delta y/\Delta x} = \frac{f_{X}(x)}{g'(x)}$$

$$\implies \mathbf{f_{Y}(y)} = \frac{\mathbf{f_{X}(g^{-1}(y))}}{g'(g^{-1}(y))}$$
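The change-of-variables result can be checked numerically. The sketch below assumes a hypothetical input width distribution (exponential with a 150 ps mean) and a hypothetical, mildly nonlinear, strictly increasing propagation function $g(x) = x + x^2/400$; neither comes from the paper's HSPICE data. It integrates the transformed density $f_Y$ and compares the result with $P(X \le g^{-1}(y_0))$, which must agree because $g$ preserves order.

```python
import math

MEAN_X = 150.0  # hypothetical mean input pulse width (ps)

def f_x(x):
    """Hypothetical exponential density of the input width X."""
    return math.exp(-x / MEAN_X) / MEAN_X if x >= 0 else 0.0

def F_x(x):
    """Cumulative distribution function of X."""
    return 1.0 - math.exp(-x / MEAN_X) if x >= 0 else 0.0

def g(x):
    """Hypothetical strictly increasing propagation function (ps -> ps)."""
    return x + x * x / 400.0

def g_prime(x):
    return 1.0 + x / 200.0

def g_inv(y):
    """Inverse of g on x >= 0 (positive root of x^2 + 400x - 400y = 0)."""
    return -200.0 + 200.0 * math.sqrt(1.0 + y / 100.0)

def f_y(y):
    """Output density by change of variables: f_X(x) / g'(x), x = g^-1(y)."""
    x = g_inv(y)
    return f_x(x) / g_prime(x)

def integrate(f, a, b, n=20000):
    """Composite trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

y0 = 300.0
lhs = integrate(f_y, 0.0, y0)  # P(Y <= y0) from the transformed density
rhs = F_x(g_inv(y0))           # P(X <= g^-1(y0))
print(f"P(Y <= {y0}) = {lhs:.6f}, P(X <= g^-1(y0)) = {rhs:.6f}")
```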

Pulse-width propagation depends on the wire load capacitance. Moreover, a soft error pulse induced at the input of a gate will propagate only if the affected node is on a sensitized path of the circuit. From HSPICE simulation we find that the function $g$ is a nonlinear transfer function. However, a piecewise-linear "3-interval" propagation model gives a good approximation. Given a sensitized path through a generic gate, three intervals of possible input glitch durations can be identified, depending on the input pulse width and the gate input-output delay [11]. Thus, for a generic logic gate, the pulse-width propagation model is:

1. Propagation with no attenuation, if $D_{in} \geq 2\tau_p$.
2. Propagation with attenuation, if $\tau_p < D_{in} < 2\tau_p$.
3. Non-propagation, if $D_{in} \leq \tau_p$.

where

- $D_{in}$: input pulse width
- $D_{out}$: output pulse width
- $\tau_p$: gate input-output delay

We validated this propagation model by simulating a CMOS inverter with HSPICE; the results are shown in Figure 4. The inverter is in TSMC035 technology with nMOS W/L ratio = $0.6\mu m/0.24\mu m$ and pMOS W/L ratio = $1.08\mu m/0.24\mu m$. For a load capacitance of 10 fF, the rising gate delay was 41.5 ps and the falling gate delay was 30.8 ps. We use an average gate delay of 36.0 ps in the proposed propagation model, whose mathematical expression is given in Equation (2). In Figure 4, the X axis is the input pulse width and the Y axis is the output pulse width. We observe that when the input pulse width is greater than 72 ps, the HSPICE output pulse width can be either greater or smaller than the input pulse width, depending on the input pulse type. These differences are caused by the unequal rising and falling delays. Apart from them, the proposed model is a good approximation to HSPICE.

$$D_{out} = \begin{cases} 0 & \text{if } D_{in} \le 36.0\,ps \\ (D_{in} - 36.0) \times \frac{72.0}{36.0} & \text{if } 36.0\,ps < D_{in} < 72.0\,ps \\ D_{in} & \text{if } D_{in} \ge 72.0\,ps \end{cases} \tag{2}$$
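Equation (2) is the 3-interval model instantiated with $\tau_p = 36.0$ ps. A minimal sketch of it as a function of a general gate delay $\tau_p$ follows; the function name is hypothetical, and the slope $2\tau_p/\tau_p = 2$ in the attenuation region makes the model continuous at both $D_{in} = \tau_p$ and $D_{in} = 2\tau_p$.

```python
def three_interval_width(d_in, tau_p=36.0):
    """Output SET width (ps) for input width d_in (ps) under the 3-interval model.

    tau_p is the averaged gate delay; 36.0 ps is the inverter value from the text.
    """
    if d_in <= tau_p:             # region 1: filtered by electrical masking
        return 0.0
    if d_in < 2.0 * tau_p:        # region 2: attenuated, slope 2*tau_p/tau_p = 2
        return (d_in - tau_p) * (2.0 * tau_p) / tau_p
    return d_in                   # region 3: propagates without attenuation

for w in (20.0, 36.0, 54.0, 72.0, 150.0):
    print(w, three_interval_width(w))
```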



Figure 4. Comparison of proposed model and HSPICE simulation for CMOS inverter with 10fF load capacitance.

For this CMOS inverter with an output load capacitance of 10 fF, the monotonic mapping of the probability density to $f_Y(y)$ is illustrated in Figure 5. Input pulses whose widths fall in regions 1, 2, and 3 of this figure are, respectively, filtered, attenuated, or passed without attenuation. Filtered pulses map to zero output width, so their probability mass takes the shape of a delta function at the origin. Similarly, we simulated all gates with HSPICE to extract the gate delays and build the propagation model $g$. Agreement similar to that in Figure 4 was observed for all other logic gates.

#### 3.2 Logic SEU Probability Propagation

Because all pulse widths are greater than or equal to 0, we have

$$\int_0^\infty f_Y(y)dy = \int_0^\infty f_X(x)dx = 1 \tag{3}$$

In the conversion from $f_X(x)$ to $f_Y(y)$, a fraction of the pulses is filtered out or attenuated by electrical masking. We define the electrical masking ratio (EMR) as the fraction of pulses that survive propagation, i.e., that emerge with a nonzero width, in Equation (4):

$$EMR = \frac{\int\limits_{y>0} f_Y(y)dy}{\int\limits_{x\geq 0} f_X(x)dx} \tag{4}$$
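The mapping of Figure 5 and the EMR of Equation (4) can be sketched on a discretized density. This sketch assumes the input widths follow the normal distribution with $\mu = 150$ and $\sigma = 50$ used in Section 4, truncated to a hypothetical grid on [0, 600] ps, and pushes each bin through the inverter's 3-interval model. Filtered bins accumulate at width 0 (the delta function), so total mass is preserved as in Equation (3), and EMR is the mass that ends up at nonzero width. All names and the grid resolution are illustrative choices.

```python
import math

def d_out(d_in, tau_p=36.0):
    """3-interval pulse-width model of Equation (2), with gate delay tau_p (ps)."""
    if d_in <= tau_p:
        return 0.0
    if d_in < 2.0 * tau_p:
        return 2.0 * (d_in - tau_p)
    return d_in

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def propagate_pdf(widths, masses, tau_p=36.0):
    """Map discretized input mass to {output_width: mass}; filtered pulses go to 0."""
    out = {}
    for w, m in zip(widths, masses):
        y = d_out(w, tau_p)
        out[y] = out.get(y, 0.0) + m
    return out

def emr(out_masses, in_total):
    """Equation (4): mass with nonzero output width over total input mass."""
    survived = sum(m for y, m in out_masses.items() if y > 0.0)
    return survived / in_total

# Midpoint-rule discretization of N(150, 50) on [0, 600] ps.
n_bins = 12000
dx = 600.0 / n_bins
widths = [(k + 0.5) * dx for k in range(n_bins)]
masses = [normal_pdf(w, 150.0, 50.0) * dx for w in widths]

out = propagate_pdf(widths, masses)
ratio = emr(out, sum(masses))
print(f"EMR = {ratio:.4f}, filtered mass at width 0 = {out.get(0.0, 0.0):.4f}")
```

Under the 3-interval model a pulse survives exactly when $D_{in} > \tau_p$, so this EMR should match $P(X > \tau_p)/P(X \ge 0)$ on the chosen grid; the bin-by-bin mapping is shown because it also yields the full output density needed at the next gate.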

If an SEU occurs on input 1 of logic gate *j* in Figure 6, then the output soft error probability is calculated by Equation (5):

$$P_{SEU}(o) = P_{SEU}(1) \cdot \underbrace{EMR_{j}}_{\text{Electrical Masking}} \cdot \underbrace{\prod_{i \neq 1} P_{non\text{-}controlling}(i)}_{\text{Logic Masking}} \tag{5}$$
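Equation (5) can be sketched as below for simple gates. The sketch assumes that side-input signal probabilities $P(i = 1)$ are known and independent, and it uses the usual non-controlling values (1 for AND/NAND, 0 for OR/NOR); the gate table, function names, and example numbers are hypothetical. Gates without a controlling value, such as XOR, are not covered by this table.

```python
from math import prod

# Hypothetical table of non-controlling input values for common gates.
NON_CONTROLLING = {"AND": 1, "NAND": 1, "OR": 0, "NOR": 0}

def p_non_controlling(gate_type, signal_prob_one):
    """Probability that a side input sits at the gate's non-controlling value,
    given its signal probability P(input = 1)."""
    if NON_CONTROLLING[gate_type] == 1:
        return signal_prob_one
    return 1.0 - signal_prob_one

def output_seu_probability(p_seu_in, emr, gate_type, side_input_probs):
    """Equation (5): P_SEU(o) = P_SEU(1) * EMR_j * prod_{i != 1} P_non-controlling(i).
    side_input_probs lists P(input = 1) for every input other than the struck one."""
    logic = prod(p_non_controlling(gate_type, p) for p in side_input_probs)
    return p_seu_in * emr * logic

p_out = output_seu_probability(1e-4, 0.99, "NAND", [0.5, 0.5])
print(f"P_SEU(o) = {p_out:.4e}")
```

For an inverter or buffer there are no side inputs, the product is empty (equal to 1), and only electrical masking reduces the error probability, which is consistent with the inverter-chain behavior discussed in Section 4.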



Figure 5. Pulse width density propagation through a CMOS inverter with  $10\,fF$  load.



Figure 6. A generic gate with particle strike on node 1.

### 4 Experimental Results

We simulated the ISCAS85 benchmark circuits and inverter chains of varying lengths using a simulator written in the C programming language. For simplicity, we assume that all circuits operate at ground level and that the probability of SEU per particle hit is $10^{-4}$. For ground level, we use the neutron energy statistics discussed in Section 2. We assume that the SET width density at each circuit node follows the *normal* distribution with mean $\mu = 150$ and standard deviation $\sigma = 50$. These assumptions are justified for a relatively small particle flux and a small chip area. From [17], the total neutron flux at sea level is $56.5\ m^{-2}s^{-1}$. For a CMOS circuit in TSMC035 technology, we assume a relatively large sensitive region ($1000\mu m^2$) for each circuit node. For a circuit with $n$ primary outputs and $m$ nodes, the SER is $\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m}\sum_{j=1}^{m}SER_{i\_caused\_by\_j}\right)$, where $SER_{i\_caused\_by\_j}$ is the SER at primary output $i$ due to particle strikes on node $j$. The unit for SER is FIT, which means failures in $10^9$ hours of operation [14]. Table 2 shows that SER increases almost linearly with the length of the inverter chain. This is because an inverter chain has no logic masking, and under the current environmental conditions a portion of the SEUs will always survive through the inverters no matter how long the chain is. For the logic circuits in Table 1, however, the SER does not increase with the number of gates; the logic masking in these circuits appears to increase with the number of gates. Field test data for logic circuits are largely unavailable, but neutron experiments on a test chip would help validate our analysis in the future. The CPU times for these results are for a Sun Fire 280R workstation.

Table 1. Estimated error rates for ISCAS85 benchmark circuits.

| Circuit | # PIs | # POs | # Gates | CPU (s) | SER (FITs) |
|---------|-------|-------|---------|---------|------------|
| c17     | 5     | 2     | 6       | 0.01    | 36.79      |
| c432    | 36    | 7     | 160     | 0.04    | 105.63     |
| c499    | 41    | 32    | 202     | 0.14    | 21.88      |
| c880    | 60    | 26    | 383     | 0.08    | 38.82      |
| c1908   | 33    | 25    | 880     | 1.14    | 74.27      |
| c2670   | 233   | 140   | 1193    | 0.77    | 28.82      |
| c5315   | 178   | 123   | 2307    | 2.78    | 55.72      |
| c7552   | 207   | 108   | 3512    | 10.82   | 66.52      |

Table 2. Estimated error rates for inverter chains.

| Circuit | # PIs | # POs | # Gates | CPU (s) | SER (FITs) |
|---------|-------|-------|---------|---------|------------|
| inv2    | 1     | 1     | 2       | 0.00    | 28.19      |
| inv5    | 1     | 1     | 5       | 0.00    | 53.88      |
| inv10   | 1     | 1     | 10      | 0.00    | 96.54      |
| inv20   | 1     | 1     | 20      | 0.00    | 181.85     |
| inv50   | 1     | 1     | 50      | 0.00    | 437.80     |
| inv100  | 1     | 1     | 100     | 0.04    | 864.73     |
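The circuit-level SER formula averages, for each primary output, the contributions of all $m$ nodes, and then averages over the $n$ outputs. A minimal sketch follows. It assumes, hypothetically, that each contribution $SER_{i\_caused\_by\_j}$ is the raw upset rate of node $j$ in FIT multiplied by the probability that an SET at node $j$ reaches output $i$ (electrical and logic masking combined); the paper does not spell out this decomposition, and the example numbers are made up for illustration only.

```python
def contribution_matrix(raw_node_fit, prop_prob):
    """Hypothetical: SER_{i caused by j} = raw node rate (FIT) * P(SET at j reaches output i).
    prop_prob[i][j] is that propagation probability for output i and node j."""
    return [[raw_node_fit * p for p in row] for row in prop_prob]

def circuit_ser(ser_matrix):
    """SER = (1/n) * sum_i [ (1/m) * sum_j SER_{i caused by j} ], in FIT.
    ser_matrix has one row per primary output and one column per node."""
    n = len(ser_matrix)
    return sum(sum(row) / len(row) for row in ser_matrix) / n

# Made-up example: 2 primary outputs, 3 nodes, raw node rate of 20 FIT.
matrix = contribution_matrix(20.0, [[1.0, 0.5, 0.0],
                                    [0.25, 0.25, 0.25]])
print(f"circuit SER = {circuit_ser(matrix):.2f} FIT")
```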

### 5 Conclusion

In this paper we presented an environment-dependent soft error model for logic circuits based on both the error frequency and the SET density. We developed an error propagation scheme through logic gates that takes electrical masking into account. The SEU pulse-width information at the primary outputs can help analyze time and space redundancy schemes. However, our error rates may be pessimistic because ours is a static approach, in which signal probabilities are used instead of actual logic values. In real cases, depending on the actual signal values, some paths may not be activated, further increasing the masking. Different types of circuits with different topologies will have significantly different SERs, and studying these differences should provide useful insight.

### 6 Acknowledgment

The authors express thanks to colleague Jins Alexander for his help and to anonymous reviewers for useful comments.

### References

- [1] G. Asadi and M. B. Tahoori, "An Accurate SER Estimation Method Based on Propagation Probability," *Proc. Design Automation and Test in Europe Conf.*, pp. 306–307, 2005.
- [2] V. Carreno, G. Choi, and R. K. Iyer, "Analog-digital simulation of transient-induced logic errors and upset susceptibility of an advanced control system," in NASA Technical Memo 4241, 1990.
- [3] K. J. Hass and J. W. Ambles, "Single Event Transients in Deep Submicron CMOS," *Circuits and Systems, 42nd Midwest Symposium on*, vol. 1, 1999.
- [4] J. P. Hayes, I. Polian, and B. Becker, "An Analysis Framework for Transient-Error Tolerance," in VLSI Test Symposium, 25th IEEE, 2007, pp. 249–255.
- [5] JEDEC, "Measurements and Reporting of Alpha Particles and Terrestrial Cosmic Ray-Induced Soft Errors in Semiconductor Devices," JESD89, August 2001.
- [6] M. Mason, "Automotive Failures from Space?—Neutron and Alpha Particle SEU Failures in SRAM Technologies," Technical report, Actel Corporation, Feb., 2006.
- [7] G. C. Messenger, "Collection of Charge on Junction Nodes from Ion Tracks," *IEEE Trans. Nuclear Science*, vol. 29, no. 6, pp. 2024–2031, 1982.
- [8] G. C. Messenger and M. Ash, *Single Event Phenomena*. Chapman & Hall, 1997.
- [9] S. S. Mitra, N. Kee, and S. Kim, "Robust System Design with Built-In Soft-Error Resilience," *IEEE Design & Test Computers*, vol. 38, no. 2, pp. 43–52, 2005.
- [10] H. T. Nguyen and Y. Yagil, "A Systematic Approach to SER Estimation and Solutions," in *Proc. 41st Annual IEEE International Reliability Physics Symposium*, 2003, pp. 60–70.
- [11] M. Omana, G. Papasso, D. Rossi, and C. Metra, "A Model for Transient Fault Propagation in Combinatorial Logic," in *Proc. 9th IEEE On-Line Testing Symp.*, 2003, pp. 111–115.
- [12] T. Rejimon and S. Bhanja, "An Accurate Probabilistic Model for Error Detection," in VLSI Design, 2005. 18th International Conference on, 2005, pp. 717–722.
- [13] G. R. Srinivasan, "Modelling the Cosmic Ray-Induced Soft-Error Rate in Integrated Circuits: An Overview," *Microelectronics Reliability*, vol. 37, no. 4, pp. 691–691, 1997.
- [14] F. Wang and V. D. Agrawal, "Single event upset: An embedded tutorial," in *Proc. 21st International Conference on VLSI Design (held jointly with the 7th International Conference on Embedded Systems)*, 2008, pp. 429–434.
- [15] C. Zhao and S. Dey, "Evaluating and Improving Transient Error Tolerance of CMOS Digital VLSI Circuits," in *Test Conference, ITC '06. IEEE International*, 2006, pp. 1–10.
- [16] J. F. Ziegler, "IBM Experience in Soft Fails in Computer Electronics (1978-1994)," *IBM Journal of Research and Development*, vol. 40, no. 1, pp. 3–18, 1996.
- [17] J. F. Ziegler, "Terrestrial cosmic rays," *IBM Journal of Research and Development*, vol. 40, no. 1, pp. 19–39, 1996.