Tuesday, 24 July 2018

HCatalog Loader and Storer – Usage & Example


1. HCatalog Loader and Storer

In our last HCatalog tutorial, we discussed HCatalog commands. Today, we will look at the HCatalog loader and storer, along with examples of each. The HCatLoader and HCatStorer interfaces are used with Pig scripts to read and write data in HCatalog-managed tables.
So, let's learn both in detail.

2. HCatLoader

In order to read data from HCatalog-managed tables, we use HCatLoader with Pig scripts.

a. Usage

We access HCatLoader via a Pig load statement:
  A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();

b. Assumptions

The table name must be specified in single quotes: LOAD 'tablename'. If we are using a non-default database, we must specify the input as 'dbname.tablename'. If we are using Pig 0.9.2 or earlier, we must create the database and table before running the Pig script. Beginning with Pig 0.10, we can issue these create commands from within Pig using the sql command.
The Hive metastore lets us create tables without specifying a database. Tables created this way belong to the database named 'default', so the database name can be left out of the load statement.
If the table is partitioned, we can indicate which partitions to scan by immediately following the load statement with a partition filter statement.
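
The points above can be sketched as a short script. This is only an illustration under stated assumptions: the database sales_db, the table web_logs, and its string partition column datestamp are hypothetical names, and the script is assumed to run with the HCatalog jars available (for example, pig -useHCatalog script.pig).

```pig
-- Hypothetical names: database sales_db, table web_logs, partition column datestamp.

-- Non-default database: qualify the table as 'dbname.tablename', in single quotes.
logs = LOAD 'sales_db.web_logs' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Partition filter placed immediately after the load statement,
-- so that only the matching partitions need to be scanned.
recent = FILTER logs BY datestamp >= '20110924';

DUMP recent;
```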

c. HCatLoader Data Types

HCatLoader can read only the following Hive data types.
Types in Hive 0.12.0 and Earlier
1. boolean
2. int
3. long
4. float
5. double
6. string
7. binary
Some complex data types:
8. map (the key type must be string)
9. array<any type>
10. struct<any type fields>
Types in Hive 0.13.0 and Later
11. tinyint
12. smallint
13. date
14. timestamp
15. decimal
16. char(x)
17. varchar(x)
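
On the Pig side, these Hive types show up as Pig types: for instance, string is read as chararray, binary as bytearray, an array as a bag, and a struct as a tuple. The sketch below assumes a hypothetical Hive table named users with the columns listed in the comment; DESCRIBE prints the schema that Pig derives from the table definition.

```pig
-- Hypothetical Hive table 'users' with columns:
--   name string, tags array<string>, address struct<city:string, zip:string>
users = LOAD 'users' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Show the Pig schema derived from the Hive column types.
DESCRIBE users;

-- Project one field of the struct (a Pig tuple) and flatten the array (a Pig bag).
cities = FOREACH users GENERATE name, address.city AS city, FLATTEN(tags) AS tag;
```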

3. HCatStorer

In order to write data to HCatalog-managed tables, we use HCatStorer with Pig scripts.

a. Usage

We access HCatStorer via a Pig store statement:
  A = LOAD ...
  B = FOREACH A ...
  ...
  ...
  my_processed_data = ...
  STORE my_processed_data INTO 'tablename'
      USING org.apache.hive.hcatalog.pig.HCatStorer();

b. Assumptions

As with HCatLoader, the table name must be specified in single quotes: INTO 'tablename'. Both the database and the table must exist before the Pig script runs. If we are using a non-default database, we must specify the output as 'dbname.tablename'. If we are using Pig 0.9.2 or earlier, we must create the database and table beforehand; beginning with Pig 0.10, we can issue these create commands from within Pig using the sql command.
The Hive metastore also lets us create tables without specifying a database. Tables created this way belong to the database named 'default', and there is no need to specify the database name in the store statement.
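
As a rough sketch of these assumptions, the script below creates the target table with the sql command and then stores into a non-default database. All names (sales_db, web_logs, page_counts, url, hits) are hypothetical. It assumes Pig 0.10 or later with the hcat.bin property configured so that sql can reach HCatalog, and that the database sales_db already exists.

```pig
-- Hypothetical names throughout; sales_db is assumed to exist already.

-- Pig 0.10+: issue the HCatalog DDL from within Pig.
sql create table sales_db.page_counts (url string, hits bigint);

logs = LOAD 'sales_db.web_logs' USING org.apache.hive.hcatalog.pig.HCatLoader();
grouped = GROUP logs BY url;
-- COUNT returns a Pig long, which matches the bigint column created above.
counts = FOREACH grouped GENERATE group AS url, COUNT(logs) AS hits;

-- Non-default database: qualify the target as 'dbname.tablename'.
STORE counts INTO 'sales_db.page_counts' USING org.apache.hive.hcatalog.pig.HCatStorer();
```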

c. Store Examples

We can write to a non-partitioned table simply by using HCatStorer. The contents of the table will be overwritten:
  store z into 'web_data' using org.apache.hive.hcatalog.pig.HCatStorer();
To add one new partition to a partitioned table, specify the partition value in the store function. Pay attention to the quoting: the whole string must be single-quoted, with the key and value separated by an equals sign:
  store z into 'web_data' using org.apache.hive.hcatalog.pig.HCatStorer('datestamp=20110924');
To write into multiple partitions at once, make sure that the partition column is present in the data, then call HCatStorer with no argument:
  store z into 'web_data' using org.apache.hive.hcatalog.pig.HCatStorer();
  -- datestamp must be a field in the relation z
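
For the multiple-partition case, one way the relation z might be prepared is sketched below. The input path, the field names url, hits, and ts, and the layout of ts (a string beginning with yyyyMMdd) are all assumptions made for illustration; web_data is assumed to be partitioned by a string column named datestamp.

```pig
-- Hypothetical tab-separated input; ts is assumed to start with yyyyMMdd.
raw = LOAD '/data/web/raw' USING PigStorage('\t')
      AS (url:chararray, hits:int, ts:chararray);

-- Derive the partition column so that it is a field in the relation z.
z = FOREACH raw GENERATE url, hits, SUBSTRING(ts, 0, 8) AS datestamp;

-- No argument: each record goes to the partition named by its datestamp value.
STORE z INTO 'web_data' USING org.apache.hive.hcatalog.pig.HCatStorer();
```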

d. HCatStorer Data Types

HCatStorer can write the following Pig data types.
Types in Hive 0.12.0 and Earlier
1. boolean
2. int
3. long
4. float
5. double
6. chararray
7. bytearray
Some complex data types:
8. map
9. bag
10. tuple
Types in Hive 0.13.0 and Later
11. short
12. datetime
13. bigdecimal
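
Because HCatStorer writes Pig types, data loaded without a schema (which Pig treats as bytearray) usually needs explicit casts before it is stored. The sketch below assumes a hypothetical input file and a hypothetical table pig_test with columns name string, age int, and gpa double; the field names in the relation are chosen to match the table's column names.

```pig
-- Hypothetical input path and table; without an AS clause every field is a bytearray.
raw = LOAD '/data/students.tsv' USING PigStorage('\t');

-- Cast each field to the Pig type that corresponds to the target column.
typed = FOREACH raw GENERATE (chararray)$0 AS name, (int)$1 AS age, (double)$2 AS gpa;

STORE typed INTO 'pig_test' USING org.apache.hive.hcatalog.pig.HCatStorer();
```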

So, this was all about the HCatalog loader and storer. We hope you liked our explanation.

4. Conclusion: HCatalog Loader and Storer

Hence, we have seen how HCatLoader and HCatStorer are used with Pig scripts to read from and write to HCatalog-managed tables, including their assumptions, supported data types, and store examples. Still, if you have any doubts, ask in the comments.
