From: DEFECT AND FAULT TOLERANCE IN VLSI SYSTEMS, edited by Israel Koren (Plenum Publishing Corporation, 1989)

# ON THE PROBABILITY OF FAULT OCCURRENCE

Sharad C. Seth and Vishwani D. Agrawal\*

University of Nebraska, Department of Computer Science, Lincoln, NE 68588

\*AT&T Bell Laboratories, Computer Systems Research Department, Murray Hill, NJ 07974

### INTRODUCTION

In this paper we introduce the probability of fault occurrence. For a modeled fault (e.g., a stuck fault) this is the probability with which the fault will occur on a chip. The occurrence of a fault is observable only as a fault indication by a test capable of detecting it. We determine the probability of fault occurrence from chip test data. No attempt is made to find these probabilities for individual faults; they are determined only as a distribution over all faults. Combined with detection probabilities (related to the conventional fault coverage), fault occurrence probabilities provide a revised relation between the coverage requirement and chip quality.

Previous work [1,2,3] attempted to find a relation between fault coverage and product quality. Product quality is measured as the *reject ratio*, i.e., the fraction of faulty chips among those passing the tests. The reject ratio is found by combining the actual chip (or wafer) data with fault coverage data. For a required reject ratio, it is possible to determine the necessary fault coverage of tests. This coverage, in general, depends on certain processing parameters that also determine the yield.

The work in the papers cited above has been used by several workers to analyze experimental data [4,5,6]. However, there are difficulties with these analyses. For generally acceptable reject ratios, e.g., 1 in 10,000, most computed fault coverage requirements turn out to be almost 100 percent. Designers of large VLSI chips do not consider such coverage achievable. There are several reasons for this.
First, large circuits have some redundant faults, and algorithms to find redundancies are very complex. Second, the alternative approach of exhaustive testing (where redundant faults may be ignored) is applicable only to the special class of circuits that can be partitioned into small combinational blocks with access to inputs and outputs. A third reason is the arbitrariness in measuring fault coverage. Different fault simulators make different assumptions about race faults, oscillation faults, hyperactive faults, etc.

The detection probability of a fault is really a conditional probability. It is the probability of detection by an input vector, given that the fault is present. We have studied this probability in a recent paper [7]. In the present work, we also consider the fault occurrence probability. It is defined for each fault as the probability of its being found on a chip. Just as all faults are not equally detectable, they also do not occur with equal probability. The product of these two probabilities is the absolute probability that a chip fails on a test vector. Consequently, a more realistic coverage requirement can be obtained if the faults are weighted by their occurrence probabilities. For example, a fault that never occurs has zero weight and need not be covered.

We use wafer test data to illustrate our analysis. The data on the measured fraction of failing chips versus the number of vectors is used to empirically determine a failure probability density function. From this, the true yield and the reject ratio are easily estimated without any fault coverage analysis.
### NOTATION

- $c$: Total number of chips tested
- $N$: Total number of test vectors applied
- $c_i$: Number of chips that fail exactly at vector number $i$
- $y$: True yield, i.e., fraction of good chips
- $y_n$: Estimated yield of chips after application of $n$ vectors
- $r_n$: Reject ratio after application of $n$ vectors, $(y_n - y)/y_n$

### THEORY

We assume that a chip failure on an applied test vector is a random event with the associated density function f(x). Thus, f(x)dx is the fraction of chips that fail on a vector with a probability between x and x+dx. Since only a fraction 1-y of the total chips can fail, we have

$$f(x) = y\delta(x) + p(x)$$

where $\delta(x)$ is the Dirac delta function and p(x) is a partial density function. Clearly,

$$x = \text{Prob}(\text{a fault has occurred and is tested by the vector})$$

Since f(x) is a density function,

$$\int_{0}^{1} f(x)\,dx = y + \int_{0}^{1} p(x)\,dx = 1$$

Therefore,

$$\int_{0}^{1} p(x)\,dx = 1 - y$$

Now suppose that after application of n test vectors, a certain fraction of the chips has not failed. The expected value of this fraction is the yield of chips after n vectors and is denoted by $y_n$. Clearly,

$$y_n = \int_0^1 (1-x)^n f(x)\,dx = y + \int_0^1 (1-x)^n p(x)\,dx \tag{1}$$

It is easily verified that $y_0 = 1$ and $y_{\infty} = y$; that is, before the testing starts the yield is 100%, and if testing continues indefinitely, only the good chips contribute to the yield.

Our aim is to estimate f(x) from the chip failure data obtained while testing a sample of c chips by a test sequence of N vectors. As these vectors are applied, we record the number of chips that fail for the first time at each vector. For vector number i we denote the number of such chips by $c_i$.
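Equation (1) and its two limits can be illustrated numerically. The following Python is a minimal sketch that assumes a discrete stand-in for p(x): the function name `yield_after` and the population values in `faulty` are invented for illustration and are not taken from the wafer data.

```python
def yield_after(n, y, faulty):
    """Equation (1) for a discrete p(x): y_n = y + sum(w * (1 - x)**n).

    y      -- true yield (weight of the delta function at x = 0)
    faulty -- list of (x, w) pairs: a fraction w of all chips fails each
              vector with probability x; the weights must sum to 1 - y
    """
    return y + sum(w * (1.0 - x) ** n for x, w in faulty)


# Hypothetical population (assumed values): 50% good chips, 30% faulty chips
# that are easy to detect, 20% faulty chips that are hard to detect.
y_true = 0.5
faulty = [(0.2, 0.3), (0.001, 0.2)]

y0 = yield_after(0, y_true, faulty)             # before any vector is applied
y_mid = yield_after(100, y_true, faulty)        # hard-to-detect chips still pass
y_large = yield_after(100_000, y_true, faulty)  # effectively unlimited testing
```

Here `y0` reproduces $y_0 = 1$ and `y_large` approaches the true yield, while `y_mid` remains above it because chips with a small x survive many vectors.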
Using Bayes' theorem [8] with a uniform a priori failure probability, we have

$$f(x) = y\delta(x) + \left(1 - y - \frac{1}{c} \sum_{i=1}^{N} c_i\right) \frac{(1-x)^N}{\int_0^1 (1-x)^N\,dx} + \frac{1}{c} \sum_{i=1}^{N} c_i \frac{x(1-x)^{i-1}}{\int_0^1 x(1-x)^{i-1}\,dx}$$

The derivation of the above equation is similar to the one described in a recent paper [7]. The three terms on the right-hand side represent, respectively, the contributions to f(x) due to the fault-free chips, the faulty chips not rejected by any of the N vectors, and the remaining chips, grouped according to the vectors at which they failed. On evaluating the integrals on the right-hand side, we can rewrite the above equation as

$$f(x) = y\delta(x) + \left(1 - y - \frac{1}{c}\sum_{i=1}^{N}c_{i}\right)(N+1)(1-x)^{N} + \frac{1}{c}\sum_{i=1}^{N}c_{i}\,i(i+1)\,x(1-x)^{i-1} \tag{2}$$

Substituting this expression for f(x) in Equation (1), we find the yield after n vectors as

$$y_n = y + \left(1 - y - \frac{1}{c} \sum_{i=1}^{N} c_i\right) \frac{N+1}{N+n+1} + \frac{1}{c} \sum_{i=1}^{N} c_i \frac{i(i+1)}{(n+i)(n+i+1)} \tag{3}$$

In the above expressions, the yield y is still an unknown parameter. It can be evaluated by equating the estimated yield after N vectors to the measured yield. Thus

$$y_N = 1 - \frac{1}{c} \sum_{i=1}^{N} c_i$$

Substituting for $y_N$ from Equation (3) and solving for $y$, we get

$$y = 1 - \frac{1}{c} \sum_{i=1}^{N} c_i - \frac{2N+1}{N} \cdot \frac{1}{c} \sum_{i=1}^{N} c_i \frac{i(i+1)}{(N+i)(N+i+1)} \tag{4}$$

From Equations (3) and (4) it is now possible to compute the reject ratio as follows:

$$r_n = \frac{y_n - y}{y_n} \tag{5}$$

### EXPERIMENT

To illustrate the application of the above analysis, we use the experimental test data of a VLSI chip from a previous paper [2]. The measured yield of this chip for a test sequence of 3062 vectors is 0.469062. Of the 1002 tested chips, 470 did not fail after all vectors had been applied.
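Equations (3) to (5) translate directly into a short computation. The Python below is a minimal sketch under stated assumptions: the function names are hypothetical, and the failure counts are invented for illustration. They are not the wafer data of [2], whose per-vector counts are not reproduced here.

```python
def true_yield(counts, c):
    """Equation (4). counts[i-1] holds c_i; c is the number of chips tested."""
    N = len(counts)
    rejected = sum(counts) / c  # fraction of chips rejected by the N vectors
    weighted = sum(ci * i * (i + 1) / ((N + i) * (N + i + 1))
                   for i, ci in enumerate(counts, start=1)) / c
    return 1.0 - rejected - (2 * N + 1) / N * weighted


def estimated_yield(n, counts, c, y):
    """Equation (3): estimated yield after n vectors for a true yield y."""
    N = len(counts)
    rejected = sum(counts) / c
    escapes = 1.0 - y - rejected  # faulty chips that passed all N vectors
    detected = sum(ci * i * (i + 1) / ((n + i) * (n + i + 1))
                   for i, ci in enumerate(counts, start=1)) / c
    return y + escapes * (N + 1) / (N + n + 1) + detected


def reject_ratio(n, counts, c, y):
    """Equation (5)."""
    y_n = estimated_yield(n, counts, c, y)
    return (y_n - y) / y_n


# Invented example (assumption): 100 chips, 10 vectors, c_i chips failing
# for the first time at vector i.
counts = [30, 10, 5, 3, 1, 1, 0, 0, 0, 0]
c = 100
y = true_yield(counts, c)
r_N = reject_ratio(len(counts), counts, c, y)
```

By construction, `estimated_yield(len(counts), counts, c, y)` equals the measured yield `1 - sum(counts) / c`, and `estimated_yield(0, counts, c, y)` equals 1, which gives a quick consistency check on any implementation of these formulas.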
The measured yield as a function of vectors is shown in Fig. 1. The true yield, as computed from Equation (4), was 0.46082. Figure 1 also shows the estimated yield $y_n$, as computed from Equation (3), for three different values of true yield. Only for the yield given by Equation (4) does $y_n$ closely match the experimental data. This shows the sensitivity of our expression to the value of the true yield y and hence its suitability for estimating yield.

Reject ratio, as computed from Equation (5), is shown in Fig. 2. At the end of 3062 vectors, the computed reject ratio is about 0.017, or 17 faulty chips per thousand. The fault coverage of these vectors was about 90 percent [2]. Figure 2 predicts a reject ratio of 0.0001, or one faulty chip per 10,000, for about 500,000 vectors. Notice that this computation does not use fault coverage; it involves only the probabilities of fault occurrence and detection that are obtained from the actual measured data.

Fig. 1 Yield determination from measured test data.

Fig. 2 Computed reject ratio.

Fig. 3 Failure probability density function.

Figure 3 shows the failure probability density function f(x) as computed from Equation (2). The delta function at the origin corresponds to the estimated true yield, i.e., the good chips. The remaining part, p(x), is the density of faulty chips. As explained earlier, this is the joint probability of a fault occurring and of its being detected by a vector. Evidently, p(x) has two parts. One is a sharp peak near the origin and the other is a function of relatively lower magnitude. The peak corresponds to those faulty chips that survived all 3062 vectors. This portion is mainly responsible for field rejects. It can be conjectured that the second portion of p(x) corresponds to failure modes similar to stuck-type faults. Since the vectors were generated for such faults, the detection probabilities are higher.
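The two parts of p(x) in Equation (2) can be evaluated separately. The Python below is a minimal sketch that assumes invented failure counts and an assumed true yield of 0.46 (neither is from the wafer data); the helper names `p_terms` and `integrate_terms` are hypothetical. It integrates each part with a simple midpoint rule to confirm how the area 1 - y is divided between the two parts.

```python
def p_terms(x, counts, c, y):
    """Return the two parts of p(x) in Equation (2) at a point x.

    counts[i-1] holds c_i; c is the number of chips tested; y is the true yield.
    """
    N = len(counts)
    rejected = sum(counts) / c  # fraction of chips rejected by the N vectors
    escape = (1.0 - y - rejected) * (N + 1) * (1.0 - x) ** N
    detected = sum(ci * i * (i + 1) * x * (1.0 - x) ** (i - 1)
                   for i, ci in enumerate(counts, start=1)) / c
    return escape, detected


def integrate_terms(counts, c, y, steps=20000):
    """Midpoint-rule integrals of both parts of p(x) over [0, 1]."""
    h = 1.0 / steps
    escape_area = detected_area = 0.0
    for k in range(steps):
        e, d = p_terms((k + 0.5) * h, counts, c, y)
        escape_area += e * h
        detected_area += d * h
    return escape_area, detected_area


# Invented data (assumption): 100 chips, 10 vectors, assumed true yield 0.46.
counts = [30, 10, 5, 3, 1, 1, 0, 0, 0, 0]
c, y = 100, 0.46
escape_area, detected_area = integrate_terms(counts, c, y)
at_origin = p_terms(0.0, counts, c, y)
```

The escape part integrates to the fraction of faulty chips that passed every vector, and the detected part integrates to the fraction of rejected chips. At x = 0 only the escape part is non-zero, which is consistent with attributing the peak near the origin to faulty chips that survived the whole test sequence.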
The faults in the peak near the origin might correspond to other failure modes, such as delay faults and stuck-open faults, for which the tests were not specifically targeted. Further investigation should lead to better insight.

### CONCLUSION

We have given an entirely new method of assessing VLSI test quality. The tested product quality is directly evaluated from test data. This eliminates the need for complex fault models and fault simulation. Results are more realistic than coverage evaluation because the probability of fault occurrence is implicit in the analysis. Further investigation may lead to better methods of test generation. For example, tests should target faults that have a low detection probability but a non-zero probability of occurrence, rather than stuck-type faults that may never occur.

### REFERENCES

- [1] V. D. Agrawal, S. C. Seth, and P. Agrawal, "Fault Coverage Requirement in Production Testing of LSI Circuits," *IEEE Journal of Solid-State Circuits*, Vol. SC-17, pp. 57-61, Feb. 1982.
- [2] S. C. Seth and V. D. Agrawal, "Characterizing the LSI Yield from Wafer Test Data," *IEEE Trans. on CAD*, Vol. CAD-3, pp. 123-126, April 1984.
- [3] T. W. Williams and N. C. Brown, "Defect Level as a Function of Fault Coverage," *IEEE Trans. Computers*, Vol. C-30, pp. 987-988, December 1981.
- [4] W. R. Mann, Private communication.
- [5] P. T. Wagner, Private communication.
- [6] B. W. Woodhall, B. D. Newman, and A. G. Sammuli, "Empirical Results on Undetected CMOS Stuck-Open Failures," *Proc. Int. Test Conf.*, Washington, D.C., pp. 166-170, September 1987.
- [7] V. D. Agrawal, H. Farhat, and S. Seth, "Test Generation by Fault Sampling," *Proc. Int. Conf. Comp. Des. (ICCD-88)*, Rye Brook, NY, October 1988.
- [8] A. Papoulis, *Probability, Random Variables, and Stochastic Processes*, McGraw-Hill, New York, 1965.