
Mahout – Comprehensive Workshop

Training Contents

COURSE OVERVIEW
Apache Mahout is an Apache project to produce free implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on collaborative filtering, clustering and classification, often leveraging, but not limited to, the Hadoop platform.

The Apache Mahout project aims to make building intelligent applications easier and faster. Mahout co-founder Grant Ingersoll introduces the basic concepts of machine learning and then demonstrates how to use Mahout to cluster documents, make recommendations, and organize content.

DELIVERY METHOD
Classroom | ILT

COURSE DURATION
67 hours

TARGET AUDIENCE
This course is designed for anyone interested in learning Big Data technologies and writing intelligent applications using Apache Mahout.

PREREQUISITES
Familiarity with the Hadoop framework and other ecosystem components is a prerequisite for learning Apache Mahout. A mathematical background and beginner-level Java development knowledge are an added advantage. Basic Java and Hadoop knowledge is recommended but not mandatory, as these concepts are also covered during the course.

LAB SETUP

Windows users:
Windows 7, 64-bit OS, min. 4 GB RAM
VMware Player 5.0.0
Linux VM: Ubuntu 12.04 LTS (Unicom will provide the virtual machine)
Eclipse 3.6+
PuTTY, for opening Telnet sessions to the Linux VM
WinSCP, for transferring files between Windows and the Linux VM

Linux/Mac users (preferably a 64-bit machine):
Min. 4 GB RAM
Eclipse 3.6+
JDK 1.6 or higher installed on your machine
SSH installed

COURSE CONTENTS
• Intro to Machine Learning
• Intro to Apache Mahout
• Recommendation Engine
• Clustering
• Classification
• Intro to recommendation systems
• Content Based
o Collaborative filtering
o User based
o Nearest N Users
o Threshold
o Item based
• Mahout Optimizations
• An overview of a recommendation platform
o Similarity measures
o Manhattan distance
o Euclidean distance
o Cosine Similarity
o Pearson's Correlation Similarity
o Log-likelihood Similarity


o Tanimoto
• Evaluating Recommendation engines
o Online
o Offline
• Intro to Clustering
o Common Clustering Algorithms
o K-means
o Fuzzy K-means, Mean Shift, etc.
o Representing data
o Feature Selection
o Vectorization
o Representing Vectors
• Intro to Classification
o Examples
o Basics
o Common Algorithms
• Mahout on Hadoop
• Apache Mahout & Myrrix

SYSINNOVA Infotech Pvt. Ltd. mail:info@sysinnova.com
