SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-Y
The first step is to create a table that holds information on customers' mobile phone usage habits, with the following
structure:
"DAY_TIME_CALLS" DOUBLE, --> Percentage of Calls made during day time hours (9 a.m. - 6 p.m.)
"WEEK_DAY_CALLS" DOUBLE, --> Percentage of Calls made during week days (Monday thru Friday)
After executing this code, a new procedure called PAL_KMEANS_TELCO should appear in the _SYS_AFL schema.
With the K-Means procedure generated, I now need to write the code that executes it:
OK, my code is ready, but I’m missing a very important part: I still don’t know what value of K (the number of clusters) to pass as the
input parameter (well, I do know, because I created the sample data, but let’s pretend I don’t). There are several
techniques for finding the number of groups that produces the best clustering; in this case I will use the Elbow Criterion. The
elbow criterion is a common rule of thumb that says to choose the number of clusters at which adding another
cluster no longer adds much information. I will run the code above with different numbers of clusters and, for each run,
measure the total intra-cluster distance. When that distance stops decreasing much from one run to the next, I will
have found the number of groups to use. I built the chart below with the results:
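To make the elbow criterion concrete outside of HANA, here is a minimal Python sketch of the same loop: run k-means for several values of K and record the total intra-cluster distance each time. This is not the PAL procedure. It is a plain Lloyd's k-means with a deterministic farthest-point seeding that I picked for repeatability, and the sample points are synthetic values invented for illustration (three groups of two percentages, loosely mirroring columns like DAY_TIME_CALLS and WEEK_DAY_CALLS). The function names are hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from the centroids
    chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: nearest(p, centroids)[1]))
    return centroids


def kmeans_total_distance(points, k, max_iter=100):
    """Run Lloyd's k-means and return the total intra-cluster distance,
    i.e. the sum of Euclidean distances from each point to its centroid."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)[0]].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(nearest(p, centroids)[1] for p in points)


# Synthetic sample data: three invented customer groups, 30 points each.
rng = random.Random(42)
centers = [(80.0, 80.0), (20.0, 80.0), (50.0, 20.0)]
points = [
    (rng.gauss(cx, 3.0), rng.gauss(cy, 3.0))
    for cx, cy in centers
    for _ in range(30)
]

# One run per candidate K, recording the total intra-cluster distance.
totals = {k: kmeans_total_distance(points, k) for k in range(1, 7)}
for k, total in totals.items():
    print(f"k={k}: total intra-cluster distance = {total:.1f}")
```

With data like this, the total should drop sharply up to K = 3 and then flatten out, and that bend is the "elbow". In the HANA version, the same numbers would come from re-running the procedure call with a different cluster count each time.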