
Hadoop – Avro Data Formats

Hive

1. Create a Table
CREATE TABLE departments(departmentID int, departmentName string) ROW
FORMAT DELIMITED FIELDS TERMINATED BY ',';
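With ROW FORMAT DELIMITED, Hive splits each text line on the field terminator and maps the pieces to columns by position. The sketch below approximates that mapping for this table. It is an illustration rather than Hive's actual SerDe code, and the helper name parse_delimited_row is hypothetical. It assumes Hive's usual text-table behaviour:

- missing trailing fields read as NULL;
- extra fields are ignored;
- a value that does not fit the int column reads as NULL;
- the default NULL marker is \N.

```python
import re
from typing import Optional, Tuple

HIVE_NULL = "\\N"  # Hive's default text representation of NULL

def parse_delimited_row(line: str, sep: str = ",") -> Tuple[Optional[int], Optional[str]]:
    """Approximate how one line maps onto (departmentID int, departmentName string)."""
    # Split on the field terminator; the delimited format has no quoting of sep.
    fields = line.rstrip("\n").split(sep)
    raw_id = fields[0]
    # Missing trailing fields become NULL (None); extra fields are ignored.
    raw_name = fields[1] if len(fields) > 1 else None
    if raw_name == HIVE_NULL:
        raw_name = None
    # A value that is not a 32-bit integer is read as NULL instead of failing.
    dept_id: Optional[int] = None
    if re.fullmatch(r"[+-]?[0-9]+", raw_id):
        value = int(raw_id)
        if -2**31 <= value < 2**31:
            dept_id = value
    return dept_id, raw_name
```

A consequence worth noting: a department name that itself contains a comma would be cut at the comma, because the rest of the line spills into an ignored extra field.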

2. Load data from the dataset


LOAD DATA LOCAL INPATH '/tmp/departments.txt' INTO TABLE departments;
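LOAD DATA copies the file into the table's storage location as-is. It does not parse or validate the rows, so the file must already be in the comma-delimited layout the table expects. The sketch below is a minimal way to produce such a file. The function name format_departments is hypothetical.

```python
from typing import Iterable, Tuple

def format_departments(rows: Iterable[Tuple[int, str]]) -> str:
    """Render (departmentID, departmentName) pairs as '<id>,<name>' lines, no header row."""
    lines = []
    for dept_id, name in rows:
        # The delimited format has no quoting, so a comma or newline inside a
        # name would shift or split the row; reject such values up front.
        if "," in name or "\n" in name:
            raise ValueError(f"department name {name!r} contains a delimiter")
        lines.append(f"{dept_id},{name}\n")
    return "".join(lines)
```

For example, `Path("/tmp/departments.txt").write_text(format_departments(rows), encoding="utf-8")` would write the local file that LOAD DATA LOCAL picks up. Here `rows` stands for your own data. A header row is deliberately omitted, since Hive would read it as an ordinary data row.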

3. Create a table using an Avro schema file (.avsc)


CREATE TABLE departments_avro ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION
'/tmp/departments/' TBLPROPERTIES('avro.schema.url'='/tmp/departments.avsc');
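The contents of /tmp/departments.avsc are not shown here, so the sketch below is only one plausible schema, written under stated assumptions. Its fields match the text table's two columns, in the same order. The record name and namespace are placeholders.

The ["null", ...] unions are an assumption too. They allow NULL values, such as an ID that failed to parse, to be written to the Avro table.

```python
import json

def departments_avro_schema() -> dict:
    """Avro record schema matching departments(departmentID int, departmentName string)."""
    return {
        "type": "record",
        "name": "departments",        # hypothetical record name
        "namespace": "example.hive",  # hypothetical namespace
        "fields": [
            # Same order as the text table: INSERT ... SELECT * maps columns by position.
            # The "null" branch comes first so that a null default is valid Avro.
            {"name": "departmentID", "type": ["null", "int"], "default": None},
            {"name": "departmentName", "type": ["null", "string"], "default": None},
        ],
    }

def departments_avsc_text() -> str:
    # json.dumps turns the Python None defaults into JSON null.
    return json.dumps(departments_avro_schema(), indent=2)
```

Save the text as departments.avsc. On a typical cluster, a schema URL without a scheme is resolved against the default filesystem (usually HDFS). In that case the file probably needs to be copied there, for example with `hdfs dfs -put departments.avsc /tmp/departments.avsc`.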

4. Populate the Avro table from the existing table


INSERT OVERWRITE TABLE departments_avro SELECT * FROM departments;
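INSERT ... SELECT * passes columns to the target table by position, not by name. The Avro schema's field order therefore has to line up with the source table's columns. The sketch below models that positional mapping with a basic type check. It is not Hive's implementation, the helper rows_to_records is hypothetical, and only the int and string types used here are handled.

A NULL sent to a non-nullable Avro field would typically make the write fail, which the sketch reports as a ValueError.

```python
from typing import Any, Dict, List, Mapping, Sequence

# Only the two primitive types used by the departments tables are handled here.
AVRO_TO_PY = {"int": int, "string": str}

def rows_to_records(rows: Sequence[Sequence[Any]], schema: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Map SELECT * rows onto Avro record fields by position and check basic types."""
    fields = schema["fields"]
    records: List[Dict[str, Any]] = []
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"row has {len(row)} columns, schema has {len(fields)} fields")
        record: Dict[str, Any] = {}
        # The i-th column goes to the i-th field, whatever the column was called.
        for value, field in zip(row, fields):
            ftype = field["type"]
            allowed = ftype if isinstance(ftype, list) else [ftype]
            if value is None:
                if "null" not in allowed:
                    raise ValueError(f"NULL is not allowed in non-nullable field {field['name']}")
            elif not any(isinstance(value, AVRO_TO_PY[t]) for t in allowed if t in AVRO_TO_PY):
                raise TypeError(f"{value!r} does not match {field['name']} type {ftype!r}")
            record[field["name"]] = value
        records.append(record)
    return records
```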

5. Check the description of the table


DESCRIBE FORMATTED departments_avro;

Intellipaat Software Services Pvt. Ltd. Page 1



Pig

1. Copy an Avro data file into HDFS at /tmp/departments.avro
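Before copying, it can help to confirm that the file really is an Avro object container and to see which schema it was written with. Per the Avro specification, a container file starts with:

- the magic bytes "Obj" followed by 0x01;
- a metadata map that normally holds avro.schema and avro.codec;
- a 16-byte sync marker.

The stdlib-only sketch below reads just that header. The function names are hypothetical, and it is no substitute for the official Avro libraries.

```python
import io
from typing import BinaryIO, Dict, Tuple

MAGIC = b"Obj\x01"  # first 4 bytes of every Avro object container file

def _read_long(stream: BinaryIO) -> int:
    # Avro longs are zig-zag encoded, then written as a base-128 varint (low bits first).
    shift = 0
    acc = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("truncated varint")
        acc |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)  # undo the zig-zag encoding

def _read_bytes(stream: BinaryIO) -> bytes:
    # Strings and bytes are a long length followed by that many raw bytes.
    n = _read_long(stream)
    if n < 0:
        raise ValueError("negative length")
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated bytes")
    return data

def read_avro_header(stream: BinaryIO) -> Tuple[Dict[str, bytes], bytes]:
    """Return (metadata map, 16-byte sync marker) from a container file header."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta: Dict[str, bytes] = {}
    while True:  # the metadata map is a series of blocks ending with a count of 0
        count = _read_long(stream)
        if count == 0:
            break
        if count < 0:  # a negative count is followed by the block size in bytes
            count = -count
            _read_long(stream)
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            meta[key] = _read_bytes(stream)
    sync = stream.read(16)
    if len(sync) != 16:
        raise EOFError("truncated sync marker")
    return meta, sync

def read_avro_header_bytes(data: bytes) -> Tuple[Dict[str, bytes], bytes]:
    # Convenience wrapper for a header that has already been read into memory.
    return read_avro_header(io.BytesIO(data))
```

For example, `with open("departments.avro", "rb") as f: meta, _ = read_avro_header(f)` followed by `print(meta["avro.schema"].decode())` shows the writer schema. The file can then be copied with `hdfs dfs -put departments.avro /tmp/departments.avro`.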

2. Register all required JARs


REGISTER /usr/lib/pig/piggybank.jar;
REGISTER /usr/lib/pig/lib/avro-1.7.4.jar;
REGISTER /usr/lib/pig/lib/jackson-core-asl-1.8.8.jar;
REGISTER /usr/lib/pig/lib/jackson-mapper-asl-1.8.8.jar;
REGISTER /usr/lib/pig/lib/json-simple-1.1.jar;
REGISTER /usr/lib/pig/lib/jython-standalone-2.5.2.jar;
REGISTER snappy-java-1.0.4.1.jar;

3. Load the data using the Avro schema file


deps = LOAD '/tmp/departments.avro' USING
org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check', 'schema_file',
'/tmp/departments.avsc');
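Passing 'no_schema_check' relaxes AvroStorage's own schema checks. A mismatch between the data file and departments.avsc may therefore not be reported at load time. Avro's schema resolution matches record fields by name.

A rough pre-flight comparison by field name could look like the sketch below. It is not piggybank's actual logic, and the function name schema_differences is hypothetical. The first argument is the writer schema embedded in the data file, for example as printed by the Avro tools' getschema command. The second is the .avsc text.

```python
import json
from typing import Any, Dict, List

def _field_types(schema_json: str) -> Dict[str, Any]:
    # Map each top-level record field name to its declared Avro type.
    schema = json.loads(schema_json)
    return {f["name"]: f["type"] for f in schema.get("fields", [])}

def schema_differences(file_schema_json: str, avsc_json: str) -> List[str]:
    """List field-level differences between a file's writer schema and a .avsc."""
    file_fields = _field_types(file_schema_json)
    avsc_fields = _field_types(avsc_json)
    problems: List[str] = []
    for name, avsc_type in avsc_fields.items():
        if name not in file_fields:
            problems.append(f"{name}: missing from data file")
        elif file_fields[name] != avsc_type:
            problems.append(f"{name}: file has {file_fields[name]!r}, .avsc has {avsc_type!r}")
    for name in file_fields:
        if name not in avsc_fields:
            problems.append(f"{name}: not in .avsc")
    return problems
```

The comparison is deliberately strict: it flags any textual type difference, even where Avro's resolution rules would still allow the read (for example, int promoted to long).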

