An Unsupervised Neural Network Approach For Automatic Semiconductor Wafer Defect Inspection
Expert Systems
with Applications
Expert Systems with Applications 36 (2009) 950–958
www.elsevier.com/locate/eswa
Abstract
Semiconductor wafer defect inspection is an important process before die packaging. The defective regions are usually identified through visual judgment with the aid of a scanning electron microscope. Dozens of people visually check wafers and hand-mark their defective regions. Consequently, potential misjudgment may be introduced due to human fatigue. In addition, the process can incur significant personnel costs. Prior work has proposed automated visual wafer defect inspection that is based on supervised neural networks. Since it requires learned patterns specific to each application, its disadvantage is the lack of product flexibility. Self-organizing neural networks (SONNs) have been proven to have the capabilities of unsupervised auto-clustering. In this paper, an automatic wafer inspection system based on a self-organizing neural network is proposed. Based on real-world data, experimental results show, with good performance, that the proposed method successfully identifies the defective regions on wafers.
© 2007 Elsevier Ltd. All rights reserved.
0957-4174/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2007.10.033
C.-Y. Chang et al. / Expert Systems with Applications 36 (2009) 950–958 951
Fig. 1. The conceptual diagram of the automatic semiconductor wafer inspection system.
of this approach are limited to boundary defect problems such as die cracks. In addition, the inspection time of Zang’s approach was about 30 min per wafer. Su et al. (2002) proposed a neural network approach for semiconductor wafer visual inspection. Three neural learning methods, including backpropagation (BP), radial basis function (RBF), and learning vector quantization (LVQ) networks, were proposed and tested. However, all these neural networks are based on supervised-learning. That is, to use these neural networks, we require learned patterns, e.g. specific defect patterns, predetermined regions, and die boundaries. Consequently, the product flexibility is low. Meanwhile, to achieve an efficient inspection, a minor border cut and complex orientation adjusting processes were applied. These processes are quite complex and time consuming.
The self-organizing neural network (SONN) is an unsupervised clustering network with a competitive learning capability and SONNs have been proven to have the capability of auto-clustering (Ham & Kostanic, 2001; Haykin, 1999). Wang, Krishnan, Kugean, and Tjoa (2001) proposed a self-organizing neural network method for the classification of endoscopic images.
Therefore, to remove the limitations caused by supervised neural networks, a self-organizing neural network-based automatic wafer defect inspection system (AWDIS) is proposed in this paper. Based on color variance and sharp irregularity, the AWDIS classifies the wafer images into four classes. Finally, a heuristic defect detection algorithm is applied to determine which class contains defective regions. It is noted that the proposed AWDIS is zoomed-independent, case-independent and orientation-independent, i.e. the proposed AWDIS is a high-level inspection system. The inspection does not require the zoom parameters and patterns of the good dies. In addition, the proposed AWDIS does not need to adjust image positions and correct circuit templates. Experimental results show the effectiveness and efficiency of the proposed AWDIS.
Fig. 1 shows the conceptual diagram of the automatic semiconductor wafer inspection system, which consists of several pieces of equipment including an SEM, a conveyor, a dicing saw, and a die picker. The operation process is described as follows: The wafer was cut into dies by the dicing saw and then repositioned on a thin-film as a testing wafer. The conveyor transfers testing wafers for inspection. A die image is acquired from the SEM and quantized into a full-color image. The die image is then inspected by the proposed AWDIS. A wafer map file is created to record the inspection result. This process continues until all dies are inspected. After the inspection process, the die picker selects the good dies and discards the defected dies from the testing wafer, according to the wafer map. The good dies are placed in a chip tray for the packaging process. Then another wafer is fed in, and a new inspection cycle commences.
The remainder of this paper is organized as follows. In Section 2, the overall algorithm of the automatic wafer inspection system is presented. Section 3 discusses the experimental results. Finally, conclusions are drawn in Section 4.
2. Automatic wafer defect inspection system (AWDIS)
Fig. 2 shows the overall process of the proposed automatic wafer defect inspecting system (AWDIS). There are four major steps of the AWDIS. First, a median filter is used to remove noise. Second, the blocking process divides the wafer image into blocks to obtain the contextual and spatial information of wafers. Thirdly, according to the contextual and spatial information of the blocks, the wafer image is classified into four clusters. Finally, a proposed
heuristic defect detecting algorithm is used to determine which cluster represents defective regions.
Fig. 2. The overall process of the proposed automatic wafer defect inspection system, AWDIS. (Diagram labels: Image, Self-Organizing Neural Network, Defect Detecting, Inspection Result.)
…
regions are irregular. Therefore, by comparing the normal regions with the defective regions, we found that the defective regions reveal large color variances and irregular shapes. Nevertheless, the color of the regular pad area is similar to the defective regions. Therefore, it is difficult to separate the defective regions from the wafer image directly by a histogram-based image processing scheme, such as thresholding.
2.2. Median filtering
similar to the defective regions. Thus, the segmentation performance of the histogram-based dynamic thresholding method (Cheriet, Said, & Suen, 1998) is poor. In order to obtain the appropriate clustering results, we have to consider the color information of the image and the spatial information of the neighboring pixels at the same time (Wang et al., 2001). Accordingly, the wafer images are divided into n blocks of size m × m to obtain the spatial information. Then the spatial information of each block is applied to the input of the proposed SONN.
2.4. Self-organizing neural network
Fig. 4 shows the architecture of the proposed self-organizing neural network. It is a fully connected network. The wafer image is divided into n blocks of size m × m. Hence,
the sharper the structural variations in the image are. Entropy is a measure of the information content of C(a, b). Large spaces have little information content, whereas cluttered areas have large information content. It has a minimum value of 0 when all pixels have the same intensity. Additionally, a small value means that the large values of C(a, b) lie near the principal diagonal, in homogeneity. Energy gives a measure of the homogeneity of an image. It is a suitable measure for detection of disorders in textures. For homogeneous textures value of energy turns out to be small compared to non-homogeneous ones. The color information is represented by the average value of the RGB components. The average value of the RGB components which represent the color information of the ith block is determined as
$$\frac{1}{\cdots}\sum^{m}\sum^{m}\cdots$$
Fig. 4. The architecture of the self-organizing neural network. The inputs are fully connected to each output neuron; however, only a few connections are shown. (Diagram labels: input $F_i$ with components $x_i^R, x_i^G, x_i^B, x_i^C, x_i^{Ee}, x_i^{Et}, x_i^H$, $i = 1, \ldots, n$; weights $w_j^R, w_j^G, w_j^B, w_j^C, w_j^{Ee}, w_j^{Et}, w_j^H$, $j = 1, \ldots, p$.)
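As an illustration of the blocking step and of the per-block color feature described above, the following is a minimal sketch in plain Python. It assumes non-overlapping m × m blocks taken in row-major order and drops partial blocks at the image edges; the names `block_mean_rgb`, `Pixel` and `Image` are hypothetical and do not come from the paper, and the contextual (co-occurrence) features are not covered here.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), hypothetical type alias
Image = List[List[Pixel]]      # rows of pixels, hypothetical type alias


def block_mean_rgb(image: Image, m: int) -> List[Tuple[float, float, float]]:
    """Split `image` into non-overlapping m x m blocks (row-major order)
    and return the mean (R, G, B) of each block.

    Assumption: partial blocks at the right and bottom edges are dropped.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    means = []
    for top in range(0, height - m + 1, m):
        for left in range(0, width - m + 1, m):
            sums = [0, 0, 0]
            for y in range(top, top + m):
                for x in range(left, left + m):
                    for c in range(3):
                        sums[c] += image[y][x][c]
            means.append(tuple(s / (m * m) for s in sums))
    return means


# A 2 x 4 toy image split into two 2 x 2 blocks.
toy = [
    [(10, 0, 0), (20, 0, 0), (0, 0, 100), (0, 0, 100)],
    [(30, 0, 0), (40, 0, 0), (0, 0, 100), (0, 0, 100)],
]
print(block_mean_rgb(toy, 2))  # [(25.0, 0.0, 0.0), (0.0, 0.0, 100.0)]
```

Under these assumptions, a 640 × 480 image with m = 8 would yield 80 × 60 = 4800 blocks.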
ity), respectively. The synaptic weight vector of neuron j in the two dimensional array is given as
$$W_j = \left[w_j^R, w_j^G, w_j^B, w_j^C, w_j^{Ee}, w_j^{Et}, w_j^H\right]^T, \quad j = 1, 2, \ldots, p \tag{11}$$
where p is the total number of output neurons.
The weight vector is also composed of the color components and the contextual features. The best match of the ith input vector $F_i$ with the synaptic weight vector $W_j$ is determined from
$$q(F_i) = \min_{\forall i,j} \lVert F_i - W_j \rVert_2, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p \tag{12}$$
where $q(F_i)$ is the index into the output neuron array that specifically identifies the winning neuron, and $\lVert\cdot\rVert_2$ is the Euclidean norm.
After the winning neuron is identified, the synaptic weight vector associated with the winning neuron and the neurons within a defined neighborhood of the winning neuron is given by
$$W_j(k+1) = W_j(k) + \mu(k)\,h_{qj}(k)\,[F_i(k) - W_j(k)] \tag{13}$$
where the learning rate parameter, $\mu(k)$, should be time varying. It should start at an initial value $\mu_0$, and then decrease gradually with increasing time k. This can be defined as
$$\mu(k) = \mu_0 \exp\left(-\frac{k}{\tau_0}\right) \tag{14}$$
where $\tau_0$ is a time constant; $h_{qj}(k)$ is the neighborhood function centered around the winning neuron $q(F_i)$ at the discrete time index k, which is defined as:
$$h_{qj}(k) = \exp\left(-\frac{d_{qj}^2}{2\sigma^2(k)}\right), \tag{15}$$
where the lateral distance $d_{qj}$ and $\sigma(k)$ are defined by
$$d_{qj}^2 = \lVert r_j - r_q \rVert^2, \tag{16}$$
and
$$\sigma(k) = \sigma_0 \exp\left(-\frac{k}{\tau_1}\right), \tag{17}$$
respectively. The $r_j$ and $r_q$ represent the coordinates of the neuron j and q, respectively. $\sigma_0$ is the initialized value and $\tau_1$ is a time constant. As time k increases, the width $\sigma(k)$ decreases at an exponential rate. Accordingly, the size of the topological neighborhood shrinks with time. In practice, both the neighborhood $h_{qj}(k)$ and learning rate parameter
$\mu(k)$ are relatively large in the beginning of the training and then decrease monotonically with time.
Fig. 5. The inspected results of defective wafer images, Fig. 3a, by the proposed AWDIS, the histogram-based dynamic thresholding method, k-means clustering method, and fuzzy c-means clustering method are presented in (a), (b), (c), and (d), respectively.
The algorithm of the proposed SONN is summarized as follows:
(Step 1) Initialization: Randomly assign the initial synaptic weights. Initialize learning rate parameter. Define the topological neighborhood function.
(Step 2) Similarity matching: Use Eq. (12) to find the best-matching neuron $q(F_i)$ at time step k.
(Step 3) Updating: Adjust the synaptic weight vectors of all neurons by using update formulas Eq. (13).
(Step 4) According to Eqs. (14)–(17), update the learning rate and neighborhood function.
(Step 5) Repeat Step 2 through Step 4 until no noticeable changes occur in the synaptic weights.
2.5. Defect detection
According to the contextual and spatial information of the blocks, the SONN classifies the wafer image into four classes. Then the proposed heuristic distinguishing algorithm is used to determine which class contains defective region. As mentioned in Section 2.1, the intensity of defective regions tends to be dark. Therefore, we select the class which has the lowest average intensity as the suspicious defective class. The average intensity of each class can be represented as follows:
$$I(u) = \frac{1}{s}\sum_{i \in u}\left(x_i^R + x_i^G + x_i^B\right), \tag{18}$$
where s is the number of blocks of the u-th class, while $x_i^R$, $x_i^G$ and $x_i^B$ represent the average value of red, green and blue components of ith block in uth class, respectively. Accordingly, the suspicious defective class is defined as the class whose I(u) is minimal. To determine whether the suspicious defective class represents defective regions, a threshold is used for testing:
$$I_{\text{defect}} = \min_{\forall u} I(u), \quad I(u) \le T, \tag{19}$$
where $I_{\text{defect}}$ is the class number of the defective class and T is a predefined threshold. The heuristic defect detecting algorithm is summarized as:
Input: the classified results by SONN.
Output: the class number represents defective regions.
(Step 1) Calculate the number of blocks for each class.
(Step 2) Ignore the class whose number of blocks is zero.
(Step 3) Use Eq. (18) to calculate the average intensity of blocks for each class.
(Step 4) Use Eq. (19) to determine the defective class.
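The four steps of the heuristic defect detecting algorithm can be sketched as follows. This is only an illustration under stated assumptions: the function name `find_defective_class`, the use of `None` to signal that no class passes the threshold test of Eq. (19), and the threshold values in the example are all hypothetical, since the paper does not give a value for T in this section.

```python
from typing import Dict, List, Optional, Tuple


def find_defective_class(
    labels: List[int],
    mean_rgb: List[Tuple[float, float, float]],
    threshold: float,
) -> Optional[int]:
    """Return the class whose average intensity I(u) is minimal and <= threshold.

    labels[i]   : class assigned to block i by the clustering stage
    mean_rgb[i] : average (R, G, B) of block i
    Returns None when the darkest class is brighter than the threshold
    (an assumed convention for "no defective class").
    """
    intensity_sum: Dict[int, float] = {}
    count: Dict[int, int] = {}
    # Step 1: count blocks per class while accumulating R + G + B.
    for label, (r, g, b) in zip(labels, mean_rgb):
        intensity_sum[label] = intensity_sum.get(label, 0.0) + r + g + b
        count[label] = count.get(label, 0) + 1
    # Step 2: classes with zero blocks never enter the dictionaries.
    if not count:
        return None
    # Step 3: Eq. (18), average intensity of each non-empty class.
    average = {u: intensity_sum[u] / count[u] for u in count}
    # Step 4: Eq. (19), darkest class, accepted only if I(u) <= T.
    suspect = min(average, key=average.get)
    return suspect if average[suspect] <= threshold else None


labels = [0, 0, 1, 3]
means = [(200, 200, 200), (180, 190, 200), (20, 30, 40), (120, 110, 100)]
print(find_defective_class(labels, means, threshold=150.0))  # 1
print(find_defective_class(labels, means, threshold=50.0))   # None
```

In the example, class 1 has I(u) = 90, which is the minimum; it is reported as defective only when the assumed threshold is at least 90.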
Fig. 6. The inspected results of defective wafer images, Fig. 3b, by the proposed AWDIS, the histogram-based dynamic thresholding method, k-means
clustering method, and fuzzy c-means clustering method are presented in (a), (b), (c), and (d), respectively.
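For reference, the SONN training procedure of Section 2.4 (Steps 1 to 5, Eqs. (12)–(17)) can be sketched in plain Python as below. This is a sketch under assumptions, not the authors' implementation: the 2 × 2 output array (giving four output neurons), the parameter defaults, the use of one time step k per pass over all inputs, the numerical floor on the neighborhood width, and the stopping tolerance are all hypothetical choices, and the names `winner` and `train_sonn` are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def winner(weights: Sequence[Sequence[float]], f: Sequence[float]) -> int:
    """Eq. (12): index of the neuron whose weight vector is closest to f."""
    return min(
        range(len(weights)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(f, weights[j])),
    )


def train_sonn(
    inputs: Sequence[Sequence[float]],
    grid: Tuple[int, int] = (2, 2),   # assumed 2 x 2 output array (p = 4)
    mu0: float = 0.5,                 # assumed initial learning rate
    tau0: float = 50.0,               # assumed time constant of Eq. (14)
    sigma0: float = 1.0,              # assumed initial neighborhood width
    tau1: float = 50.0,               # assumed time constant of Eq. (17)
    max_steps: int = 200,
    tol: float = 1e-4,
    seed: int = 0,
) -> List[List[float]]:
    rng = random.Random(seed)
    dim = len(inputs[0])
    coords = [(r, c) for r in range(grid[0]) for c in range(grid[1])]
    # Step 1: random initial synaptic weights.
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    for k in range(max_steps):
        # Step 4: learning rate and neighborhood width for this time step.
        mu = mu0 * math.exp(-k / tau0)                    # Eq. (14)
        sigma = max(sigma0 * math.exp(-k / tau1), 1e-9)   # Eq. (17), floored
        largest_change = 0.0
        for f in inputs:
            q = winner(weights, f)                        # Step 2
            for j, w in enumerate(weights):
                d2 = ((coords[j][0] - coords[q][0]) ** 2
                      + (coords[j][1] - coords[q][1]) ** 2)   # Eq. (16)
                h = math.exp(-d2 / (2.0 * sigma * sigma))     # Eq. (15)
                for t in range(dim):                          # Step 3
                    delta = mu * h * (f[t] - w[t])            # Eq. (13)
                    w[t] += delta
                    largest_change = max(largest_change, abs(delta))
        # Step 5: stop once the weights no longer change noticeably.
        if largest_change < tol:
            break
    return weights


# Usage: cluster a few toy feature vectors scaled to [0, 1].
blocks = [[0.1, 0.1, 0.1], [0.9, 0.8, 0.9], [0.15, 0.1, 0.05]]
trained = train_sonn(blocks)
labels = [winner(trained, f) for f in blocks]
```

Each block is labeled with the index of its winning neuron, which plays the role of the class number passed to the defect detection stage. Scaling the features to a common range before training is a design choice of this sketch, made so that no single component dominates the Euclidean distance.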
3. Experimental results
Two types of zoomed wafer images (50× and 100×) acquired from SEM were used for the test in this section. The image sizes used for testing are 640 × 480 pixels, and each pixel consists of 24 bits (full color). The experiments were conducted using Delphi programming on a Pentium 4 2.8 GHz platform with 256 MB of memory.
To illustrate that the proposed AWDIS has the capacity to detect defective regions, the proposed system was compared with a histogram-based dynamic thresholding method (Wang et al., 2001) which does not consider a pixel’s spatial information, a k-means clustering method (Dhawan, 2003; Kulkarni, 2001) which is a popular partitioning method assuming the number of clusters is known, and a fuzzy c-means method (Karayiannis et al., 1999) which is a powerful method for clustering similar samples in a class.
In the histogram-based dynamic thresholding method, the wafer image shown in Fig. 3 was transformed into gray scale images with 256 gray levels. Figs. 5–8 show the inspection results of Fig. 3 by using the proposed AWDIS, the dynamic thresholding method, a k-means clustering method, and a fuzzy c-means method. With the block size set at 8 × 8, and the cluster number equal to four, the proposed AWDIS with mean feature outlined the defective regions with rectangles as shown in Fig. 5a, Fig. 6a, Fig. 7a, and Fig. 8a. Fig. 5a shows the scattered contaminated regions correctly detected by the proposed AWDIS. Fig. 5b shows the inspection results of the dynamic thresholding method with the threshold equal to 125. Since the color of the bounding pad area is similar to the contaminations, the histogram-based dynamic thresholding method could not detect the defective regions correctly. Fig. 5c shows the results of the k-means clustering method with 5 clusters, which also suffers from the problem caused by similar colors. As can be seen, the k-means method failed to separate the defective regions from the bounding pad area. Fig. 5d shows the fuzzy c-means clustering method performing better than both the dynamic thresholding and k-means methods, but there are still many portions of the bounding pad area in the defective cluster.
Fig. 6a shows that the scattered contaminations were also detected correctly. However, the histogram-based dynamic thresholding method, with the threshold value of 120, fails to detect the defective regions as shown in Fig. 6b. The k-means method, with k = 5, performs better than the histogram-based dynamic thresholding method, but many of the bounding pad areas are classified in the defective class as shown in Fig. 6c. Similarly, the fuzzy c-means clustering method, with c = 5, performs better than the k-means method, with a smaller portion of the bounding pad area in the defective class, as shown in Fig. 6d.
Fig. 7. The inspected results of defective wafer images, Fig. 3c, by the proposed AWDIS, the histogram-based dynamic thresholding method, k-means
clustering method, and fuzzy c-means clustering method are presented in (a), (b), (c), and (d), respectively.
AWDIS precisely outlines the defective region with rectangles as shown in the center of Fig. 7a. Fig. 7b–d, respectively, show the inspection results of the dynamic thresholding method, the k-means clustering method with two clusters, and fuzzy c-means method, also with two clusters. They reveal that if the colors of the defective regions and normal regions were sufficiently different, all these methods, including the AWDIS, could detect the contamination correctly.
Fig. 8a shows another result of the AWDIS. There were 11 defective blocks detected, losing several tiny contaminated regions. Nevertheless, this does not adversely affect the final inspection result. No matter how many defective regions there are, we can always ascertain that a die is damaged as long as we find one defect on it. A lot of dark patterns are misdetected in the inspection results obtained by the dynamic thresholding method, with a threshold value of 135, as shown in Fig. 8b. Even the fuzzy c-means method, shown as Fig. 8d, performs better than the k-means clustering method, shown as Fig. 8c. Both the k-means clustering and the fuzzy c-means methods fail to separate the bounding pad area from the defective class.
To evaluate the performance of the proposed AWDIS, we adopt the criteria suggested in Penedo, Carreria, Mosquera, and Cabello (1998) to define the sensitivity and specificity for performance evaluation. Let $N_p$ be the total block number of positive ROIs, and $N_n$ denote the total block number of negative ROIs. We also define $N_{tp}$ to be the block number of detected ROIs which contains defective regions and are actually detected. And, finally, let $N_{fp}$ be the block number of ROIs which contain no defective regions but are falsely detected. Similarly, the true negative number ($N_{tn}$) and false negative number ($N_{fn}$) can be defined by $N_{tn} = N_n - N_{fp}$ and $N_{fn} = N_p - N_{tp}$, respectively. According to Penedo et al. (1998), we can further define the sensitivity and specificity by:
$$\text{Sensitivity} = \frac{N_{tp}}{N_p} \tag{20}$$
$$\text{Specificity} = \frac{N_{tn}}{N_n} \tag{21}$$
In our evaluation, the positive ROIs were outlined by an experienced testing engineer manually, while the detected ROIs were detected by the proposed AWDIS. Thus, the regions excluding the defective ROIs were negative ROIs. Accordingly, the region that is truly positive is the intersection of the positive ROIs and the detected ROIs. The false negative regions are the regions of regions intersection that excluded the defective region by the testing engineer and the proposed AWDIS.
In order to obtain the most appropriate block size and number of clusters for AWDIS, experiments for various
Fig. 8. The inspected results of defective wafer images, Fig. 3d, by the proposed AWDIS, the histogram-based dynamic thresholding method, k-means
clustering method, and fuzzy c-means clustering method are presented in (a), (b), (c), and (d), respectively.
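The block-level sensitivity and specificity of Eqs. (20) and (21) can be computed as in the following sketch. It assumes that the manually outlined positive ROIs and the detected ROIs are both available as sets of block indices; the function name `sensitivity_specificity` and the example sets are hypothetical and are not measurements from the paper.

```python
from typing import Set, Tuple


def sensitivity_specificity(
    positive: Set[int], detected: Set[int], total_blocks: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from block index sets.

    positive : blocks marked defective by the human inspector
    detected : blocks marked defective by the automatic system
    """
    n_p = len(positive)                 # N_p
    n_n = total_blocks - n_p            # N_n: every block that is not positive
    if n_p == 0 or n_n == 0:
        raise ValueError("both positive and negative blocks are required")
    n_tp = len(positive & detected)     # N_tp: correctly detected blocks
    n_fp = len(detected - positive)     # N_fp: falsely detected blocks
    n_tn = n_n - n_fp                   # N_tn = N_n - N_fp
    return n_tp / n_p, n_tn / n_n       # Eqs. (20) and (21)


# Toy example on an 80 x 60 grid of blocks (640 x 480 pixels, 8 x 8 blocks).
positive = set(range(0, 20))
detected = set(range(5, 25))
sens, spec = sensitivity_specificity(positive, detected, total_blocks=80 * 60)
print(sens)  # 0.75, since 15 of the 20 positive blocks are detected
```

With these toy sets, 5 blocks are falsely detected, so the specificity is 4775/4780.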