University of Queensland, School of Information Technology and Electrical Engineering

Programming with Mahout


Outline:
1, Target and introduction
2, How to install Mahout under Linux
3, Using Mahout to show KMeans display
4, Reference

Target and Introduction


For students

Write a simple example of KMeans using Mahout
Review the result

Write a simple example of KMeans using Mahout

1, Set up the environment


To install Mahout, please read the previous tutorial. This tutorial assumes that Hadoop is up and running and that Eclipse is installed.

1.1, Install Maven

Step1. Update the package sources and the installed packages


sudo apt-get update
sudo apt-get upgrade

Step2. Search for the Maven package in Ubuntu with the apt-cache command


sudo apt-cache search maven

Step3. Install maven


sudo apt-get install maven2

Step4. Check installation


mvn -version

Step5. Add the Maven repository path to the Eclipse workspace


mvn -Declipse.workspace=<path-to-eclipse-workspace> eclipse:add-maven-repo

1.2. Install SVN


sudo apt-get install subversion

1.2.1 Check out from Subversion (the files are stored in a trunk folder under the current directory, which is the home folder in a new terminal)

svn co http://svn.apache.org/repos/asf/mahout/trunk

1.2.2 Change the checked-out project folder name to MAHOUT (or any name you like)

mv trunk MAHOUT

1.2.3 Go to the MAHOUT directory and run a clean install (this will take a while)


cd MAHOUT
mvn clean install -DskipTests=true

1.3. Setting up in Eclipse


1.3.1 Install the m2eclipse plug-in
In Eclipse: Help->Install New Software->Add http://m2eclipse.sonatype.org/sites/m2e

1.3.2 Import the Maven project (that is, the MAHOUT folder you just checked out)
File->Import->Maven->Existing Maven Projects->Next
Select the MAHOUT directory as the root directory, as shown in Figure 1:

Figure 1

1.4, Mahout Display

Mahout allows us to run iterative MapReduce jobs during processing. This is especially useful for data mining problems such as KMeans. Here we use some built-in examples to show how Mahout displays clusters of random sample data.
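
To make the word "iterative" concrete, the following is a minimal plain-Java sketch of the KMeans loop: assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until nothing changes. It is an illustration only and does not use Mahout; the class name KMeansSketch and its methods are hypothetical. In a MapReduce setting, each pass of the outer loop would roughly correspond to one job, with the assignment step playing the role of the map phase and the update step the reduce phase.

```java
import java.util.Arrays;

/** Plain-Java sketch of the KMeans loop; NOT Mahout's implementation. */
final class KMeansSketch {

    /** Returns the index of the centroid closest to p (squared Euclidean distance). */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0.0;
            for (int i = 0; i < p.length; i++) {
                double diff = p[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Runs at most maxIterations assign/update passes and returns the final centroids. */
    static double[][] cluster(double[][] points, double[][] initial, int maxIterations) {
        int k = initial.length;
        int dim = initial[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initial[c].clone(); // do not modify the caller's array
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: add each point to the running sum of its nearest cluster.
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Update step: move each non-empty cluster's centroid to the mean of its points.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int i = 0; i < dim; i++) {
                    double mean = sums[c][i] / counts[c];
                    if (mean != centroids[c][i]) {
                        changed = true;
                    }
                    centroids[c][i] = mean;
                }
            }
            if (!changed) {
                break; // converged: no centroid moved in this pass
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 2}, {9, 9}, {10, 10}};
        double[][] start = {{0, 0}, {5, 5}};
        System.out.println(Arrays.deepToString(cluster(points, start, 10)));
    }
}
```

The sample points and starting centroids in main are made up for illustration; the Mahout display examples generate their own random sample data.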

In this tutorial, we will find the example under mahout-examples:

Now we have the Mahout project in our Eclipse workspace, normally at the latest version of Mahout. This makes it convenient to study and develop Mahout applications. The source code for this tutorial can be downloaded from our website.

File Name              Description
ClustersFilter.java    Implements the PathFilter class of Mahout 0.6
Display.java           Shows the results
Graphic.java           Initializes all the methods needed in Display.java
ReadData.java          Reads the CSV file
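
ReadData.java itself is not reproduced in this tutorial. As a rough idea of what reading CSV data into points might involve, here is a minimal plain-Java sketch that parses comma-separated numeric lines into double arrays. The class name CsvPointsSketch, the method parsePoints, and the assumed input layout (one point per line, one number per field, no header row) are all hypothetical and may differ from the downloadable code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical CSV parsing sketch; not the tutorial's ReadData.java. */
final class CsvPointsSketch {

    /** Parses CSV text into points: one point per non-empty line, one coordinate per field. */
    static List<double[]> parsePoints(String csvText) {
        List<double[]> points = new ArrayList<>();
        for (String line : csvText.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split(",");
            double[] point = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                // Throws NumberFormatException if a field is not numeric.
                point[i] = Double.parseDouble(fields[i].trim());
            }
            points.add(point);
        }
        return points;
    }

    public static void main(String[] args) {
        List<double[]> points = parsePoints("1.0,2.0\n3.5, 4.5\n");
        System.out.println(points.size() + " points read");
    }
}
```

To use the sketch with a file, the file contents would first be read into a string; to use the result with Mahout, each array would still need to be wrapped in one of Mahout's vector types.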

To write your own code, you should add all Mahout 0.6 libraries. You can find them under the folder

Reference: File Description

File Name                    Description
DisplayCanopy.java           https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering
DisplayDirichlet.java        https://cwiki.apache.org/confluence/display/MAHOUT/Dirichlet+Process+Clustering
DisplayFuzzyKMeans.java      https://cwiki.apache.org/confluence/display/MAHOUT/Fuzzy+KMeans
DisplayKMeans.java           https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering
DisplayMeansShift.java       https://cwiki.apache.org/confluence/display/MAHOUT/Mean+Shift+Clustering
DisplaySpectralKMeans.java   https://cwiki.apache.org/confluence/display/MAHOUT/Spectral+Clustering