Zhao 2018
Email: szuhuat.goh@globalfoundries.com
Abstract– It is well known that fail dies exhibiting obvious static
power supply leakage current have a higher chance of a defect
being found and, hence, a higher likelihood of being selected for
failure analysis. When confronted with choices, fail dies with
supply current similar to the reference will be omitted, and
valuable defect learnings are lost as a result. In this work, an
artificial neural network is developed to perform the task of
predicting the failure analysis success confidence level of fail
dies. The input parameters are dynamic currents measured from
multiple power supplies, and an output accuracy of 80% on
average is achieved. Besides the enhanced productivity gained
from automating the die selection process, more importantly, fail
dies which would otherwise be neglected can be identified. This
first demonstration is a new milestone for the application of
machine learning to the failure analysis domain.
Keywords – Artificial Neural Network, ATE testing, Failure
Analysis Success Rate, Dynamic Photon Emission Microscopy.
To achieve an efficient and effective product yield ramp is more than striving for 100% physical failure analysis (FA) success. The fact is: finding the right defect that really matters to yield is the key. Conventional approaches that use curve tracing and DC supply current variation to identify fail dies for analysis are no longer adequate to reveal defects that do not exhibit abnormal observable traits. This is common for subtle shorts, opens in interconnects, functional failures and quiescent current (IDDQ) failures, to name a few. An example is shown in Table 1. Bad die 2 has a static DC current similar to the reference dies and would be omitted in favor of bad die 1. However, the dynamic current (current extracted during the test vector run) shown in Figure 1 clearly shows that bad die 2 has a lower dynamic current than the reference dies, indicating some anomaly. Valuable learnings are lost as a result of the cherry-picking nature of DC-based die selection.

TABLE 1. Static DC current of some reference and bad dies.

Beyond AC current, a previous report suggests characterizing failing dies at the wafer level according to failing pins/cycles [1] in an effort to seek dies with an abnormal AC current response. However, the electrical fault isolation (EFI) success rate is still low; based on experience, it is not more than 30%. In other words, more FA resources are required to achieve a sufficient amount of FA learning. This increases cost and lowers productivity, which is not desirable.

With increasing demands to shorten the yield ramp cycle in sub-20nm technologies, it is crucial, as part of the fault isolation process, not just to identify suitable fail dies faster (productivity, efficiency) and ensure a high success rate (effectiveness), but equally important to expand the scope of diagnosis (holistic analysis). In the current context, this implies that failing dies that do not exhibit a strong AC anomaly should not be omitted from analysis, in order to minimize missing valuable defect discoveries for critical process feedback.

Presently, these challenges, especially the aspect of holistic learning, are not well addressed by research reports, which essentially center on advancing the resolution and sensitivity of FI techniques. We propose machine learning to bridge this gap. In recent years, there has been considerable renewed interest in machine learning, as industries evolve to incorporate big data analytics to enhance productivity, derive new insights, or make better and timely decisions [2]. Artificial Neural Networks (ANN) is one class of machine learning. It is capable
to handle large amounts of data with complex and non-linear relationships. ANN has been proven to be a universal approximator, even with a single hidden layer, if there are sufficient hidden nodes [3]. Figure 2 depicts a typical multi-layer ANN architecture. It consists of the input layer, which holds the input parameters fed into the system for processing; the hidden layers, which consist of nodes with appropriate activation functions and are inaccessible to the developer during processing; and the output layer, which provides the result, depending on the purpose of the system. In general, the number of hidden layers is not limited to 2 as shown in the figure.

The choice of inputs is device-specific: a different device will have different criteria for input selection. The hidden nodes take in the sum of the products of the weights $w_{ji}$ and input data $x_i$, and execute a sigmoid function to obtain the input for the output node:

$y_j = f\big(\textstyle\sum_i w_{ji}\, x_i\big)$, (1)

where

$f(u) = 1/(1 + e^{-u})$. (2)

The output node is obtained in the same way from the hidden-node outputs,

$o = f\big(\textstyle\sum_j w_j\, y_j\big)$, (3)

where

$f(u) = 1/(1 + e^{-u})$. (4)

During training, the weights are adjusted to minimize the mean squared error (MSE)

$E = \frac{1}{N}\sum_{n=1}^{N}\big(d_n - o_n\big)^2$, (5)

where d is the desired known outcome and N is the total number of training datasets, in this case, the total number of dies used for training. Next, by differentiating the MSE with respect to the weights, the amount of adjustment in the weights can be calculated:

$\Delta w = -\eta\, \frac{\partial E}{\partial w}$, (6)

where w is the weight, E is the MSE as defined in Eq. (5) and η is the learning rate, which refers to the step size of the changes.

B. Data Sets
In general, there are three types of datasets, namely the training, validation and testing sets. Both the training and validation sets have desired outcomes for supervised learning. A total of 78 fail dies that have successfully revealed defects (of various types, i.e. opens, bridges, active defects, metal defects, etc.) after dynamic EFA are involved in training the ANN. In this work, we leverage failures from memory built-in self-test to demonstrate the value of the ANN. Dynamic photon emission microscopy is employed for FI. Figure 5 shows the results on a fail die. Figures 5(a) and (b) show the EFA signal overlaid on the optical image and the SEM image of the bridge defect, respectively. Approximately 65% of the known bad dies are designated as the training set, while 15% are reserved as the validation set, which is necessary to avoid system over-fitting. The rest of the dies form the test set used to evaluate system performance. Reference dies (bin 1) are also included in the datasets. Table 2 summarizes the datasets. The choice of dies designated for training and validation depends on the purpose of the ANN.

C. Optimization
During the learning process, it is important not to under-fit or over-fit the system. The system is under-fitted when the training iterations are insufficient; as a result, the trained solution is not optimal. Over-fitting refers to a system which is trained so well on the training dataset that, although the training accuracy can be very high, the trained model also learns the details and noise of the training dataset, which are not applicable to an unfamiliar testing dataset. This has a negative impact on the training result. In this work, the plot of the average MSE of the training set and validation set against the number of iterations is adopted to assess the minimum number of iterations required. According to the validation-based early-stopping technique, the optimal point is believed to be at the minimum MSE of the validation set [10]. Figure 6 shows the proposed ANN learning outcome. The number of iterations required is close to 6500. Although the MSE for the training set continues to reduce beyond this point, that is expected but not ideal, because too many training iterations will lead to over-fitting.
The learning rate LR is another key parameter; it affects the number of iterations. The current learning rate is set at 0.008.
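To make the learning rule concrete, the forward pass of Eqs. (1)–(4), the MSE of Eq. (5) and the gradient-descent update of Eq. (6) can be sketched as a minimal single-hidden-layer network in NumPy. This is an illustrative sketch only, not the system described above: the toy "dynamic current" data, the network size, the weight initialization and the hyperparameters are assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    # Eqs. (2)/(4): f(u) = 1 / (1 + exp(-u))
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, W_h, w_o):
    y = sigmoid(W_h @ x)   # Eq. (1): hidden outputs y_j = f(sum_i w_ji x_i)
    o = sigmoid(w_o @ y)   # Eq. (3): single output node
    return o, y

def train(X, d, n_hidden=4, lr=0.5, iterations=20000):
    # small random initial weights (hypothetical initialization)
    W_h = rng.normal(scale=0.5, size=(n_hidden, X.shape[1]))
    w_o = rng.normal(scale=0.5, size=n_hidden)
    N = len(d)
    for _ in range(iterations):
        Y = sigmoid(X @ W_h.T)                       # hidden outputs, (N, n_hidden)
        O = sigmoid(Y @ w_o)                         # network outputs, (N,)
        # dE/dO for E = (1/N) sum_n (d_n - O_n)^2, Eq. (5), times sigmoid slope
        delta_o = (2.0 / N) * (O - d) * O * (1.0 - O)
        grad_wo = delta_o @ Y
        delta_h = np.outer(delta_o, w_o) * Y * (1.0 - Y)
        grad_Wh = delta_h.T @ X
        W_h -= lr * grad_Wh                          # Eq. (6): w <- w - eta dE/dw
        w_o -= lr * grad_wo
    return W_h, w_o

# Toy data: two "dynamic current" features per die; label 1 = FA success likely.
X = np.array([[0.9, 0.8], [0.85, 0.9], [0.1, 0.2], [0.15, 0.1]])
d = np.array([1.0, 1.0, 0.0, 0.0])
W_h, w_o = train(X, d)
preds = [forward(x, W_h, w_o)[0] for x in X]
```

After training on the separable toy set, the sigmoid output rounds to the desired label for each die, showing that the update of Eq. (6) drives the MSE of Eq. (5) toward zero.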
Fig. 7. MSE against number of iterations (a) at different LR and (b) zoom-in on selected number of iterations for LR=0.25.
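The validation-based early-stopping rule described above reduces to picking the iteration at which the validation-set MSE is minimal. The sketch below uses synthetic curves assumed only for illustration (they are not the measured data behind Figures 6 and 7; the constant 6500 merely echoes the iteration count reported above).

```python
import numpy as np

def early_stopping_point(val_mse):
    # Validation-based early stopping: the optimal number of iterations
    # is where the validation MSE reaches its minimum.
    return int(np.argmin(val_mse)) + 1  # iterations are 1-indexed

# Synthetic illustration: training MSE falls monotonically, while
# validation MSE bottoms out and then rises again as over-fitting sets in.
iters = np.arange(1, 10001)
train_mse = 1.0 / np.sqrt(iters)              # monotone decrease
val_mse = 0.05 + 1e-9 * (iters - 6500) ** 2   # minimum at iteration 6500

stop_at = early_stopping_point(val_mse)
```

Note that the training MSE alone would suggest training indefinitely; only the validation curve reveals the stopping point.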
REFERENCES