Zhao 2018
Email: szuhuat.goh@globalfoundries.com
Abstract– It is well known that fail dies exhibiting obvious static
power supply leakage current have a higher chance of a defect
being found and, hence, a higher likelihood of being selected for
failure analysis. When confronted with choices, fail dies with
supply current similar to the reference will be omitted, and
valuable defect learnings are lost as a result. In this work, an
artificial neural network is developed to perform the task of
predicting the failure analysis success confidence level of fail
dies. The input parameters are dynamic currents measured from
multiple power supplies, and an output accuracy of 80% on
average is achieved. Besides the enhanced productivity gained
from automating the die selection process, more importantly, fail
dies which would otherwise be neglected can be identified. This
first demonstration is a new milestone for the application of
machine learning to the failure analysis domain.
Keywords – Artificial Neural Network, ATE testing, Failure
Analysis Success Rate, Dynamic Photon Emission Microscopy.
To achieve an efficient and effective product yield ramp is more than striving for 100% physical failure analysis (FA) success. The fact is: finding the right defect that really matters to yield is the key. Conventional approaches that use curve tracing and DC supply current variation to identify fail dies for analysis are no longer adequate to reveal defects that do not exhibit abnormal observable traits. This is common for subtle shorts, opens in interconnects, functional failures and quiescent current (IDDQ) failures, to name a few. An example is shown in Table 1. Bad die 2 has a static DC current similar to the reference dies and would be omitted in favor of bad die 1. However, the dynamic current (current extracted during the test vector run) shown in Figure 1 clearly shows that bad die 2 has a lower dynamic current than the reference dies, indicating some anomaly. Valuable learnings are lost as a result of the cherry-picking nature of DC-based die selection.

TABLE 1. Static DC current of some reference and bad dies.

Beyond AC current, a previous report suggests characterizing failing dies at the wafer level according to failing pins/cycles [1] in an effort to seek dies with an abnormal AC current response. However, the electrical fault isolation (EFI) success rate is still low; based on experience, it is not more than 30%. In other words, more FA resources are required to achieve a sufficient amount of FA learning. This increases cost and lowers productivity, which is not desirable.

With increasing demands to shorten the yield ramp cycle in sub-20nm technologies, it is crucial, as part of the fault isolation process, not just to identify suitable fail dies faster (productivity, efficiency) and ensure a high success rate (effectiveness), but equally important to expand the scope of diagnosis (holistic analysis). In the current context, this implies that failing dies that do not exhibit a strong AC anomaly should not be omitted from analysis, in order to minimize missing valuable defect discoveries for critical process feedback.

Presently, these challenges, especially the aspect of holistic learning, are not well addressed by research reports, which essentially center on advancing the resolution and sensitivity of FI techniques. We propose machine learning to bridge this gap. In recent years, there has been considerable renewed interest in machine learning, as industries evolve to incorporate big data analytics to enhance productivity, derive new insights, or make better and timely decisions [2]. Artificial Neural Networks (ANN) is one class of machine learning. It is capable
to handle large amounts of data with complex and non-linear relationships. ANN has been proven to be a universal approximator, even with a single hidden layer, if there are sufficient hidden nodes [3]. Figure 2 depicts a typical multi-layer ANN architecture. It consists of the input layer, which holds the input parameters fed into the system for processing; the hidden layers, which consist of nodes with appropriate activation functions and are inaccessible to the developer during processing; and the output layer, which provides the result, depending on the purpose of the system. In general, the number of hidden layers is not limited to 2 as shown in the figure.

The choice of inputs is device-specific: a different device will have different criteria for input selection. The hidden nodes take in the sum of the products of the weights $w_{ji}$ and input data $x_i$, and execute a sigmoid function to obtain the input for the output node:

$y_j = f\big(\textstyle\sum_i w_{ji}\, x_i\big)$, (1)

where

$f(u) = 1/(1 + e^{-u})$. (2)

The output node is obtained in the same way from the hidden-node outputs,

$o = f\big(\textstyle\sum_j w_j\, y_j\big)$, (3)

where

$f(u) = 1/(1 + e^{-u})$. (4)

During training, the weights are adjusted to minimize the mean squared error (MSE)

$E = \frac{1}{N}\sum_{n=1}^{N}\big(d_n - o_n\big)^2$, (5)

where d is the desired known outcome and N is the total number of training datasets, in this case, the total number of dies used for training. Next, by differentiating the MSE with respect to the weights, the amount of adjustment in the weights can be calculated:

$\Delta w = -\eta\, \frac{\partial E}{\partial w}$, (6)

where w is the weight, E is the MSE as defined in Eq. (5) and η is the learning rate, which refers to the step size of the changes.

B. Data Sets
In general, there are three types of datasets, namely the training, validation and testing sets. Both the training and validation sets have desired outcomes for supervised learning. A total of 78 fail dies that have successfully revealed defects (of various types, i.e. opens, bridges, active defects, metal defects, etc.) after dynamic EFA are involved in training the ANN. In this work, we leverage failures from memory built-in self-test to demonstrate the value of the ANN. Dynamic photon emission microscopy is employed for FI. Figure 5 shows the results on a fail die. Figures 5(a) and (b) show the EFA signal overlaid on the optical image and the SEM image of the bridge defect, respectively. Approximately 65% of the known bad dies are designated as the training set, while 15% are reserved as the validation set, which is necessary to avoid system over-fitting. The rest of the dies form the test set used to evaluate system performance. Reference dies (bin 1) are also included in the datasets. Table 2 summarizes the datasets. The choice of dies designated for training and validation depends on the purpose of the ANN.

C. Optimization
During the learning process, it is important not to under-fit or over-fit the system. The system is under-fitted when the training iterations are insufficient; as a result, the trained solution is not optimal. Over-fitting refers to a system which is trained so well on the training dataset that, although the training accuracy can be very high, the trained model also learns the details and noise of the training dataset, which are not applicable to an unfamiliar testing dataset. This has a negative impact on the training result. In this work, the plot of the average MSE of the training set and validation set against the number of iterations is adopted to assess the minimum number of iterations required. According to the validation-based early-stopping technique, the optimal point is believed to be at the minimum MSE of the validation set [10]. Figure 6 shows the proposed ANN learning outcome. The number of iterations required is close to 6500. Although the MSE for the training set continues to reduce beyond this point, that is expected but not ideal, because too many training iterations will lead to over-fitting.
The learning rate LR is another key parameter; it affects the number of iterations. The current learning rate is set at 0.008.
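To make the learning rule concrete, the forward pass of Eqs. (1)–(4), the MSE of Eq. (5) and the gradient-descent update of Eq. (6) can be sketched as a minimal single-hidden-layer network in NumPy. This is an illustrative sketch only, not the system described above: the toy "dynamic current" data, the network size, the weight initialization and the hyperparameters are assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    # Eqs. (2)/(4): f(u) = 1 / (1 + exp(-u))
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, W_h, w_o):
    y = sigmoid(W_h @ x)   # Eq. (1): hidden outputs y_j = f(sum_i w_ji x_i)
    o = sigmoid(w_o @ y)   # Eq. (3): single output node
    return o, y

def train(X, d, n_hidden=4, lr=0.5, iterations=20000):
    # small random initial weights (hypothetical initialization)
    W_h = rng.normal(scale=0.5, size=(n_hidden, X.shape[1]))
    w_o = rng.normal(scale=0.5, size=n_hidden)
    N = len(d)
    for _ in range(iterations):
        Y = sigmoid(X @ W_h.T)                       # hidden outputs, (N, n_hidden)
        O = sigmoid(Y @ w_o)                         # network outputs, (N,)
        # dE/dO for E = (1/N) sum_n (d_n - O_n)^2, Eq. (5), times sigmoid slope
        delta_o = (2.0 / N) * (O - d) * O * (1.0 - O)
        grad_wo = delta_o @ Y
        delta_h = np.outer(delta_o, w_o) * Y * (1.0 - Y)
        grad_Wh = delta_h.T @ X
        W_h -= lr * grad_Wh                          # Eq. (6): w <- w - eta dE/dw
        w_o -= lr * grad_wo
    return W_h, w_o

# Toy data: two "dynamic current" features per die; label 1 = FA success likely.
X = np.array([[0.9, 0.8], [0.85, 0.9], [0.1, 0.2], [0.15, 0.1]])
d = np.array([1.0, 1.0, 0.0, 0.0])
W_h, w_o = train(X, d)
preds = [forward(x, W_h, w_o)[0] for x in X]
```

After training on the separable toy set, the sigmoid output rounds to the desired label for each die, showing that the update of Eq. (6) drives the MSE of Eq. (5) toward zero.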
Fig. 7. MSE against number of iterations (a) at different LR and (b) zoom-in on selected number of iterations for LR=0.25.
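The validation-based early-stopping rule described above reduces to picking the iteration at which the validation-set MSE is minimal. The sketch below uses synthetic curves assumed only for illustration (they are not the measured data behind Figures 6 and 7; the constant 6500 merely echoes the iteration count reported above).

```python
import numpy as np

def early_stopping_point(val_mse):
    # Validation-based early stopping: the optimal number of iterations
    # is where the validation MSE reaches its minimum.
    return int(np.argmin(val_mse)) + 1  # iterations are 1-indexed

# Synthetic illustration: training MSE falls monotonically, while
# validation MSE bottoms out and then rises again as over-fitting sets in.
iters = np.arange(1, 10001)
train_mse = 1.0 / np.sqrt(iters)              # monotone decrease
val_mse = 0.05 + 1e-9 * (iters - 6500) ** 2   # minimum at iteration 6500

stop_at = early_stopping_point(val_mse)
```

Note that the training MSE alone would suggest training indefinitely; only the validation curve reveals the stopping point.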
REFERENCES