Self-organizing Map Implementation

Peter Leow, 25 Jul 2014 (CPOL)
Rate: 5.00 (6 votes)

Get real with an implementation of a mini SOM project.

Download source - 122.6 KB

Click on the following image to view a demo video on YouTube.

Introduction

In my previous article, Self-organizing Map Demystified, you learned the concept, architecture, and algorithm of the self-organizing map (SOM). From here on, you will embark on a journey of designing and implementing a mini SOM for clustering handwritten digits. For training the SOM, I have obtained the training dataset from the Machine Learning Repository of the Center for Machine Learning and Intelligent Systems. MATLAB will be used for programming the SOM. Some of the design considerations are generic and can be applied to other types of machine learning projects. Let's get started.

Preparing the Ingredients

The original training dataset file is called "optdigits-orig.tra.Z". It consists of a total of 1934 handwritten digits from 0 to 9 collected from a total of 30 people. Each digit sample has been normalized to a 32x32 binary image that has 0 or 1 for each pixel. The distribution of the training dataset is considered well balanced, as shown in Table 1. It is important to have a balanced training dataset, where the number of samples in every class is more or less equal, so as to prevent biases in training: classes with an overwhelmingly large number of samples tend to get chosen more often than those in the minority, thus affecting the accuracy and reliability of the training result.
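Such a balance check is easy to script. As an illustrative sketch (Python here, not the article's MATLAB; the label list is rebuilt from the per-class counts in Table 1 rather than loaded from the dataset file), one might verify balance like this:

```python
from collections import Counter

# Rebuild a label vector from the per-class counts in Table 1;
# with the real dataset this would be the 1934 class labels (digits 0-9)
# extracted from optdigits-orig.tra
train_label = ([0] * 189 + [1] * 198 + [2] * 195 + [3] * 199 + [4] * 186
               + [5] * 187 + [6] * 195 + [7] * 201 + [8] * 180 + [9] * 204)

counts = Counter(train_label)
total = sum(counts.values())
for digit in sorted(counts):
    share = counts[digit] / total
    print(f"digit {digit}: {counts[digit]:4d} samples ({share:.1%})")

# A simple imbalance measure: ratio of the largest to the smallest class.
# A value close to 1 indicates a well-balanced dataset.
ratio = max(counts.values()) / min(counts.values())
print(f"max/min class ratio: {ratio:.2f}")
```

For this dataset the ratio is about 1.13, which supports the claim that the classes are well balanced.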
Table 1: Distribution of Classes

Class    Number of Samples
0        189
1        198
2        195
3        199
4        186
5        187
6        195
7        201
8        180
9        204

Figure 1 shows some of the binary images contained in the training dataset:

[Figure 1: Some of the Binary Images in the Training Dataset]

In the original data file, each block of 32x32 binary image is followed by a class label that indicates the digit that the sample belongs to. For example, the "8" bitmap block (in Figure 1) is followed by its class label of 8, and so on. To facilitate processing by MATLAB, I have further pre-processed the data as follows:

1. Separate the class labels from their bitmaps in the original data file into two different files.
2. Make the class labels, a total of 1934 digits, into a 1x1934 vector called "train_label" and save it as a MATLAB data file called "training_label.mat".
3. Make the bitmap data, from its original form of 61888(rows)x32(columns) bits after removing the class labels, into a 1024x1934 matrix called "train_data", and save it as "training_data.mat".

In the train_data matrix, each column represents a training sample and each sample has 1024 bits, i.e. each training sample has been transformed from its original 32x32 bitmap into a 1024x1 vector.

These two files - "training_label.mat" and "training_data.mat" - are available for download. Unzip and place them in a folder, say "som_experiment". If you are curious to see what these digits look like, fire up MATLAB, set the Current Folder to "som_experiment", enter the following code inside the Command Window, and run it.
% View a digit from the train_data
clear
clc
load 'training_data';
img = reshape(train_data(:,10), 32, 32)';
imshow(double(img))

This code will:

* load the "training_data.mat" file, which contains the 1024x1934 train_data matrix, into memory;
* train_data(:, n) will extract the n-th column vector (1024x1) from the train_data matrix, where n coincides with the position of the digit in the dataset. In this code, n is 10, so it will extract the tenth digit; you can change it to any number up to the total number of digits in the dataset, i.e. 1934;
* reshape(train_data(:,10), 32, 32)' will reshape the column vector into a 32x32 matrix (MATLAB fills column-wise) and then transpose it, effectively reverting it back to its original shape like those shown in Figure 1;
* imshow(double(img)) will display the digit as a binary image where pixels with the value 0 are shown as black and 1 as white, such as the one in Figure 2.

[Figure 2: A Binary Image on Screen]

Setting the Parameters

Let's set the parameters for subsequent development.

* The size of the SOM map is 10x10.
* The total number of iterations is set at 1000. In this experiment, we will only attempt the first sub-phase of the adaptation phase.
* The neighborhood function:

  $h_{j,c(\mathbf{x})}(n) = \exp\left(-\frac{d_{j,c(\mathbf{x})}^{2}}{2\sigma^{2}(n)}\right)$

  where $d_{j,c(\mathbf{x})}$ is the Euclidean distance from neuron $j$ to the winning neuron $c(\mathbf{x})$, and $\sigma(n)$ is the effective width of the topological neighborhood at the n-th iteration:

  $\sigma(n) = \sigma_{0}\exp\left(-\frac{n}{\tau_{1}}\right), \quad n = 0, 1, 2, \ldots, \qquad \tau_{1} = \frac{N}{\log \sigma_{0}}$

  where $\sigma_{0}$ is the initial effective width, which is set to 5, i.e. the radius of the 10x10 map,
and $\tau_{1}$ is the time constant.

* The weight updating equation:

  $w_{j}(n+1) = w_{j}(n) + \eta(n)\,h_{j,c(\mathbf{x})}(n)\,(\mathbf{x} - w_{j}(n))$

  where $\eta(n)$ is the time-varying learning rate at the n-th iteration and is computed as:

  $\eta(n) = \eta_{0}\exp\left(-\frac{n}{\tau_{2}}\right)$

  where $\eta_{0}$ is the initial learning rate, which is set to 0.1, and $\tau_{2}$ is a time constant, which is set to $N$.

We have designed the parameters for the mini SOM. It's time to make things happen.

Ready to Cook (Code)

The MATLAB script implementing the mini SOM is saved in a file called "training_som.m" and is available for download. I have created this script as a "proof of concept" for the sole purpose of reinforcing the learning of the SOM algorithm. You are free to implement it in any programming language. After all, programming languages are just media for implementing problem-solving techniques on a computer. Unzip and place it in the "som_experiment" folder. Open the "training_som.m" script in MATLAB's Editor, and you will see the recipe as shown below. The code has been painstakingly commented and is therefore sufficiently self-explanatory. Nevertheless, I will still round up those parts of the code that correspond to the various phases of the SOM algorithm so that you can understand and relate them better.

% Self-organizing Map Clustering of Handwritten Digits
% training_som.m
% Peter Leow
% 10 July 2014

% clean up the previous act
close all;
clear;   % delete all memory
clc;     % clear command window screen
clf;     % clear figure screen
shg;     % put figure screen on top of all windows

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Ground breaking!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Load training_data.mat that consists of train_data,
% a 1024x1934 matrix:
% a total of 1934 input samples, where each sample
% possesses 1024 attributes (dimensions)
load training_data;

% dataRow = number of attributes (dimensions) of each sample, i.e. 1024
% dataCol = total number of training samples, i.e. 1934
[dataRow, dataCol] = size(train_data);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SOM Architecture
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Determine the number of rows and columns of the som map
somRow = 10;
somCol = 10;

% Initialize 10x10x1024 som matrix
% This is the SOM map of 10x10 neurons
% Each neuron carries a weight vector of 1024 elements
som = zeros(somRow, somCol, dataRow);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Parameter Settings
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Max number of iterations
N = 1000;

% Initial effective width
sigmaInitial = 5;

% Time constant for sigma
t1 = N / log(sigmaInitial);

% Initialize 10x10 matrix to store Euclidean distances
% of each neuron on the map
euclidean = zeros(somRow, somCol);

% Initialize 10x10 matrix to store neighbourhood functions
% of each neuron on the map
neighbourhoodF = zeros(somRow, somCol);

% Initial learning rate
etaInitial = 0.1;

% Time constant for eta
t2 = N;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Initialization
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Generate random weight vectors [dataRow x 1]
% and assign them to the third dimension of som
for r = 1:somRow
    for c = 1:somCol
        som(r, c, :) = rand(dataRow, 1);
    end
end

% Initialize iteration count to one
n = 1;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Start of Iterative Training
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Start of one iterative loop
while n <= N
    sigma = sigmaInitial * exp(-n/t1);
    variance = sigma^2;
    eta = etaInitial * exp(-n/t2);
    % Prevent eta from falling below 0.01
    if (eta < 0.01)
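The excerpt above stops just inside the training loop. To make the remaining steps concrete, here is an illustrative NumPy sketch of what one iteration does under the same parameter settings: the exponential decay of sigma and eta, the best-matching-unit search, the Gaussian neighborhood, and the weight update from "Setting the Parameters". This is a hedged port, not the author's MATLAB script, and the random input vector stands in for a real training sample:

```python
import numpy as np

rng = np.random.default_rng(0)

som_rows, som_cols, dims = 10, 10, 1024
N = 1000                       # max number of iterations
sigma0, eta0 = 5.0, 0.1        # initial effective width and learning rate
t1 = N / np.log(sigma0)        # time constant for sigma
t2 = N                         # time constant for eta

som = rng.random((som_rows, som_cols, dims))  # random initial weight vectors
x = rng.random(dims)                          # stand-in for one training sample

n = 1                                         # current iteration
sigma = sigma0 * np.exp(-n / t1)              # sigma(n) = sigma0 * exp(-n/t1)
eta = max(eta0 * np.exp(-n / t2), 0.01)       # eta(n), floored at 0.01 as in the script

# Best-matching unit: the neuron whose weight vector is closest to x
dist = np.linalg.norm(som - x, axis=2)        # Euclidean distance per neuron
winner = np.unravel_index(np.argmin(dist), dist.shape)

# Gaussian neighborhood around the winner, measured on the 10x10 grid
rows, cols = np.indices((som_rows, som_cols))
grid_d2 = (rows - winner[0]) ** 2 + (cols - winner[1]) ** 2
h = np.exp(-grid_d2 / (2 * sigma ** 2))

# Weight update: w_j(n+1) = w_j(n) + eta * h_j * (x - w_j(n))
som += eta * h[:, :, None] * (x - som)
```

Note that the neighborhood distance is measured between neuron positions on the map lattice, not between their weight vectors; only the BMU search works in the 1024-dimensional input space.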
