The document describes using Hive SQL to create a table called word_counts that counts the occurrences of words from a docs table. The query uses an explode operation to split each line into individual words, then groups and counts them to produce a table with two columns, word and count. Hive launches two MapReduce jobs to carry this out and prints per-job statistics such as map/reduce task counts, cumulative CPU time, and HDFS read/write bytes.
Original Description:
Sample Hive SQL (HQL) output, to show what the output of Hive looks like.
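For context, the query reads from a docs table holding one line of text per row in a single STRING column named line (implied by the split(line, '\s') call in the transcript). A minimal setup could look like the following sketch; the schema follows the query, but the load path is a made-up placeholder:

    CREATE TABLE docs (line STRING);
    LOAD DATA INPATH '/user/hadmin/docs' OVERWRITE INTO TABLE docs;  -- path is hypothetical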
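The captured transcript begins mid-statement, at Hive's ">" continuation prompts. Based on the description above, the full statement was presumably the classic word-count CTAS, roughly as follows; only the subquery onward appears in the captured output, so the first two lines are a hedged reconstruction:

    CREATE TABLE word_counts AS           -- reconstructed; not in the captured output
    SELECT word, count(1) AS count FROM   -- reconstructed; not in the captured output
    (SELECT explode(split(line, '\s')) AS word FROM docs) w
    GROUP BY word
    ORDER BY word;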
> (SELECT explode(split(line, '\s')) AS word FROM docs) w
> GROUP BY word
> ORDER BY word;
Query ID = hadmin_20150831192538_f452244a-5068-418b-9c33-9abe58ddf1df
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1441021809222_0001, Tracking URL = http://namenode:8088/proxy/application_1441021809222_0001/
Kill Command = /home/hadmin/hadoop-2.7.1/bin/hadoop job -kill job_1441021809222_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-08-31 19:25:56,226 Stage-1 map = 0%, reduce = 0%
2015-08-31 19:26:12,115 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 4.57 sec
2015-08-31 19:26:13,184 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.83 sec
2015-08-31 19:26:28,061 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 9.91 sec
MapReduce Total cumulative CPU time: 9 seconds 910 msec
Ended Job = job_1441021809222_0001
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1441021809222_0002, Tracking URL = http://namenode:8088/proxy/application_1441021809222_0002/
Kill Command = /home/hadmin/hadoop-2.7.1/bin/hadoop job -kill job_1441021809222_0002
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2015-08-31 19:26:44,805 Stage-2 map = 0%, reduce = 0%
2015-08-31 19:26:56,350 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.0 sec
2015-08-31 19:27:08,542 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 7.8 sec
MapReduce Total cumulative CPU time: 7 seconds 800 msec
Ended Job = job_1441021809222_0002
Moving data to: hdfs://namenode:9000/user/hive/warehouse/word_counts
Table default.word_counts stats: [numFiles=1, numRows=129528, totalSize=3078781, rawDataSize=2949253]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 9.91 sec  HDFS Read: 3298482  HDFS Write: 5201068  SUCCESS
Stage-Stage-2: Map: 1  Reduce: 1  Cumulative CPU: 7.8 sec  HDFS Read: 5205329  HDFS Write: 3078865  SUCCESS
Total MapReduce CPU Time Spent: 17 seconds 710 msec
OK
Time taken: 92.226 seconds
hive>
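Two follow-ups suggested by the output itself. First, the reducer hints that Hive prints can be acted on with set commands issued before the statement; the values below are illustrative only, not taken from the original session:

    set hive.exec.reducers.bytes.per.reducer=256000000;  -- average bytes per reducer
    set hive.exec.reducers.max=4;                        -- cap on reducer count
    set mapreduce.job.reduces=1;                         -- or pin a constant count

Second, once the jobs finish, the resulting table can be spot-checked; the column names follow the description above, with count backquoted since it collides with the built-in function name:

    SELECT word, `count` FROM word_counts ORDER BY `count` DESC LIMIT 10;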