Week 2 Seminar
Hadoop
Hugo Lam
Today's Agenda
Source: http://www.mssqltips.com
2. Use Pig to Perform Word Count
➢ Pig Latin
A dataflow language: allows you to define a data stream and a
series of transformations that are applied to the data as it flows
through your application.
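The dataflow idea can be sketched outside Pig as a chain of small transformations, each consuming the output of the previous one. The following is a minimal Python analogue, not Pig Latin itself; the function names and the sample lines are assumptions made for illustration.

```python
from collections import Counter

def load(lines):
    """Source step: yield raw lines (stands in for a LOAD statement)."""
    for line in lines:
        yield line.rstrip("\n")

def tokenize(lines):
    """Transformation: split each line into words on whitespace."""
    for line in lines:
        for word in line.split():
            yield word

def count_words(words):
    """Transformation: group identical words and count each group."""
    return Counter(words)

# Hypothetical sample data; in Pig this would come from a file in HDFS.
sample = ["the quick brown fox", "the lazy dog", "the end"]

# Each step feeds the next, much like one relation feeding another in a Pig script.
counts = count_words(tokenize(load(sample)))
print(counts.most_common(1))  # [('the', 3)]
```

Because the first two steps are generators, no data moves until the final step asks for it, which loosely mirrors how a dataflow script only runs when an output is requested.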
➢ Pig Latin
Sample code illustrating the data flow sequence:
A = LOAD 'data_file.txt';
...
B = GROUP ... ;
...
C = FILTER ...;
...
DUMP C;
...
STORE C INTO 'Results';
➢ Go to /user/maria_dev
➢ View Results
3. Use Hive to Perform Word Count
Source: http://www.mssqltips.com
HiveQL
➢ Hive’s query language.
➢ A SQL-like declarative language.
➢ Enables users familiar with SQL to query the data in
Hadoop without learning Java.
➢ Allows custom MapReduce scripts to be plugged
into queries.
➢ Hive Language Manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual
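The point about plugging custom scripts into queries refers to Hive streaming rows through an external program, with columns passed as tab-separated text on standard input and read back from standard output. Below is a minimal sketch of such a mapper in Python. The helper names and the in-memory test stream are assumptions for illustration; a real script would read from `sys.stdin`.

```python
import io
import sys

def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"

def run(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write mapped records to stdout."""
    for record in map_lines(stdin):
        stdout.write(record + "\n")

# Local check with in-memory streams; a streaming script would simply call run().
out = io.StringIO()
run(io.StringIO("Hello world\nhello Hadoop\n"), out)
print(out.getvalue(), end="")
```

Saved under a hypothetical name such as `word_mapper.py`, a script like this could be registered with `ADD FILE` and invoked from a query along the lines of `SELECT TRANSFORM(line) USING 'python word_mapper.py' AS (word, cnt) FROM myinput;`. The Hive Language Manual has the exact syntax.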
HiveQL
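-- Expose the raw text files in HDFS as a one-column table.
CREATE EXTERNAL TABLE myinput (line STRING)
LOCATION '/user/maria_dev/wordcount/input/';

-- Strip punctuation and control characters, lowercase each line,
-- split it on spaces, and count the occurrences of each word.
CREATE TABLE wordcount AS
SELECT word, count(1) AS count
FROM (
  SELECT EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(line, '[\\p{Punct},\\p{Cntrl}]', '')), ' ')) AS word
  FROM myinput
) words
GROUP BY word
ORDER BY word ASC, count DESC;

-- Display the result.
SELECT * FROM wordcount;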
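To see what the query computes, the same steps can be traced on a few lines in plain Python. This is a rough in-memory analogue, not how Hive executes the query. The character class is approximated with ASCII punctuation and ASCII control characters, and the input rows are made up for illustration.

```python
import re
import string
from collections import Counter

# Approximates the character class [\p{Punct},\p{Cntrl}] from the query:
# ASCII punctuation plus ASCII control characters.
STRIP = re.compile("[" + re.escape(string.punctuation) + r"\x00-\x1f\x7f]")

def word_count(lines):
    """Return (word, count) pairs sorted by word, like the wordcount table."""
    counts = Counter()
    for line in lines:
        cleaned = STRIP.sub("", line).lower()   # REGEXP_REPLACE + LCASE
        counts.update(cleaned.split(" "))       # SPLIT + EXPLODE, then GROUP BY
    return sorted(counts.items())               # ORDER BY word ASC

# Hypothetical input lines standing in for the rows of myinput.
rows = ["Hello, Hadoop!", "hello hive"]
for word, count in word_count(rows):
    print(word, count)
```

After grouping, each word appears exactly once, so sorting by word alone already fixes the order. The secondary key `count DESC` in the query never has a tie to break.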
➢ View Results