Tuesday, 24 July 2018

Best 30 HCatalog Interview Questions and Answers


1. Top HCatalog Interview Questions and Answers

So, you have completed our HCatalog tutorial; now it's time to test yourself. Here we are providing some important HCatalog interview questions with their answers. These questions will help you strengthen your knowledge of HCatalog, and they are designed for both freshers and experienced candidates. If you want to crack your HCatalog interview, go through all of them.
So, let's get started with these HCatalog interview questions.

2. Frequently Asked HCatalog Interview Questions

So, here is a list of frequently asked HCatalog Interview Questions along with their answers:
Que 1. Explain the term HCatalog?
Ans. In simple words, HCatalog is a table and storage management tool for Hadoop. Its main function is to expose the tabular data of the Hive metastore to other Hadoop applications. It enables users of different data processing tools (Pig, MapReduce) to easily read and write data on the grid, without having to worry about where or in what format the data is stored.
So, basically, HCatalog is a key component of Hive that enables users to store their data in any format and with any structure.
Que 2. What are the general Prerequisites to learn HCatalog?
Ans. To learn HCatalog, an individual must have basic knowledge of Core Java along with the database concepts of SQL. In addition, one must be familiar with the Hadoop file system and any flavor of the Linux operating system.
Que 3. Who is the intended audience for learning HCatalog?
Ans. Professionals aspiring to make a career in Big Data analytics using the Hadoop framework should learn HCatalog. Apart from them, ETL developers and professionals working in analytics in general can also learn it to good effect.
Que 4. Why HCatalog?
Ans. Some specific reasons for using HCatalog are:
  • Enabling the right tool for the right job
  • Capturing processing states to enable sharing
  • Integrating Hadoop with everything
Que 5. Explain how HCatalog enables the right tool for the right job.
Ans. The Hadoop ecosystem has different tools for data processing, such as Hive, Pig, and MapReduce. Although these tools do not require metadata, they can still benefit from it when it is present. Sharing a metadata store also enables users across tools to share data more easily. A workflow in which data is loaded and normalized using Pig or MapReduce and then analyzed via Hive is very common. If all these tools share one metastore, users of each tool have immediate access to data created with another tool, with no loading or transfer steps required.
Que 6. How does HCatalog help capture processing states to enable sharing?
Ans. We can publish our analytics results via HCatalog, so other programmers can access our analytics platform through its REST interface. The schemas we publish are also useful to other data scientists, who can use our discoveries as inputs into a subsequent discovery.
Que 7. HCatalog helps to integrate Hadoop with everything. Explain.
Ans. Hadoop opens up a lot of opportunities for the enterprise, but it must work with and augment existing tools in order to fuel adoption. HCatalog's REST services open up the platform to the enterprise with a familiar API and SQL-like language. In addition, enterprise data management systems use HCatalog to integrate more deeply with the Hadoop platform.
Que 8. Explain HCatalog Architecture in Brief?
Ans. Basically, HCatalog supports reading and writing files in any format for which a SerDe (serializer-deserializer) can be written. Formats such as RCFile, CSV, JSON, SequenceFile, and ORC are supported by default. To use a custom format, we must provide the InputFormat, OutputFormat, and SerDe.
HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. It offers read and write interfaces for Pig and MapReduce, and it uses Hive's command line interface for issuing data definition and metadata exploration commands.
Que 9. How do we invoke the Command Line Interface?
Ans. The HCatalog Command Line Interface (CLI) can be invoked from the command $HIVE_HOME/HCatalog/bin/hcat, where $HIVE_HOME is the home directory of Hive. The hcat command is used to start the HCatalog CLI.
Commands to start the HCatalog command line:
  1. cd $HCAT_HOME/bin
  2. ./hcat
Que 10. State some command line options.
Ans. Options supported by the HCatalog CLI are:
  1. -g
Usage: hcat -g mygroup …
The table to be created must have the group "mygroup".
  2. -p
Usage: hcat -p rwxr-xr-x …
The table to be created must have read, write, and execute permissions as specified.
  3. -f
Usage: hcat -f myscript.HCatalog …
myscript.HCatalog is a script file containing the DDL commands to execute.
  4. -e
Usage: hcat -e 'create table mytable(a int);' …
Treats the following string as a DDL command and executes it.
  5. -D
Usage: hcat -Dkey=value …
Passes the key-value pair to HCatalog as a Java system property.
  6. hcat
Running hcat without any option prints a usage message.
HCatalog Interview Questions for Freshers – Q. 1,2,3,4,5
HCatalog Interview Questions for Experienced – Q. 6,7,9,10
Que 11. State some DDL commands supported by HCatalog.
Ans. Some DDL commands are:
  1. CREATE TABLE
  2. ALTER TABLE
  3. DROP TABLE
  4. CREATE/ALTER/DROP VIEW
  5. SHOW TABLES
  6. SHOW PARTITIONS
  7. CREATE/DROP INDEX
  8. DESCRIBE
Que 12. Explain the HCatalog CREATE TABLE statement along with its syntax.
Ans. In HCatalog, we use the CREATE TABLE statement to create a table in the Hive metastore. A hypothetical usage example follows the syntax below.
Syntax- 
  1. CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
  2. [(col_name data_type [COMMENT col_comment], ...)]
  3. [COMMENT table_comment]
  4. [ROW FORMAT row_format]
  5. [STORED AS file_format]
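For instance, a minimal example run through the HCatalog CLI might look like the following; the employee table and its columns are purely illustrative:
  1. ./hcat -e "CREATE TABLE IF NOT EXISTS employee (eid INT, name STRING, salary STRING)
  2. COMMENT 'Employee details'
  3. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  4. STORED AS TEXTFILE;"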
Que 13. Which command do we use to insert data in HCatalog?
Ans. In SQL, we generally insert data with the INSERT statement just after creating a table. In HCatalog, however, we use the LOAD DATA statement to insert data.
LOAD DATA is the better option for loading bulk records. Data can be loaded in two ways: from the local file system or from the Hadoop file system. An example follows the syntax below.
Syntax-
  1. LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
  2. [PARTITION (partcol1=val1, partcol2=val2 ...)]
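For example, assuming a local delimited file sample.txt and the hypothetical employee table from the previous question:
  1. ./hcat -e "LOAD DATA LOCAL INPATH 'sample.txt' OVERWRITE INTO TABLE employee;"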
Que 14. Explain Alter Table Statement in HCatalog.
Ans. In order to alter a table, we can use the ALTER TABLE statement.
Syntax-
There are several forms of the syntax; we use whichever matches the attribute we want to modify (a hypothetical example follows the list):
  1. ALTER TABLE name RENAME TO new_name
  2. ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
  3. ALTER TABLE name DROP [COLUMN] column_name
  4. ALTER TABLE name CHANGE column_name new_name new_type
  5. ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
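For instance, renaming a hypothetical employee table and then adding a column to it:
  1. ./hcat -e "ALTER TABLE employee RENAME TO emp;"
  2. ./hcat -e "ALTER TABLE emp ADD COLUMNS (dept STRING COMMENT 'Department name');"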
Que 15. How can we drop a table in HCatalog?
Ans. Dropping a table removes the table's data and its metadata. The table can be either a normal (managed) table, whose data is kept in the warehouse directory managed through the metastore, or an external table, whose data is kept at a location outside the warehouse. Note that dropping an external table removes only the metadata, not the underlying data; the DROP TABLE syntax itself is the same for both types.
Syntax −
  1. DROP TABLE [IF EXISTS] table_name;
Que 16. How to create and manage a view in HCatalog?
Ans. The CREATE VIEW statement creates a view with the given name. If a table or view with the same name already exists, an error is thrown; we can skip the error by using the IF NOT EXISTS option. An illustrative example follows the syntax below.
Syntax – 
  1. CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ]
  2. [COMMENT view_comment]
  3. [TBLPROPERTIES (property_name = property_value, ...)]
  4. AS SELECT ...;
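For illustration, a view over a hypothetical employee table that keeps only well-paid employees could be created as:
  1. ./hcat -e "CREATE VIEW IF NOT EXISTS emp_30000 AS SELECT * FROM employee WHERE salary > 30000;"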
Que 17. Explain Drop View Statement along with syntax.
Ans. The DROP VIEW statement in HCatalog removes the metadata for the specified view. Note that no warning is given when dropping a view that is referenced by other views.
Syntax-
  1. DROP VIEW [IF EXISTS] view_name;
Que 18. Which command is used to list all the tables in a database or list all the columns in a table?
Ans. The SHOW TABLES statement displays the names of all tables. By default, it lists tables from the current database; with the IN clause, it lists tables from a specified database.
Syntax of SHOW TABLES is−
  1. SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
A query that displays a list of tables:
./hcat -e "SHOW TABLES;"
Que 19. Which command is used to list the partitions of a table in HCatalog?
Ans. To see the partitions that exist in a particular table, we use the SHOW PARTITIONS command.
Syntax −
  1. SHOW PARTITIONS table_name;
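For example, to list the partitions of a hypothetical employee table:
./hcat -e "SHOW PARTITIONS employee;"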
Que 20. State syntax of the command that is used to drop a partition.
Ans. In order to drop a partition, the syntax is −
  1. ./hcat -e "ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec,
  2. PARTITION partition_spec, ...;"
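For example, dropping the year=2012 partition of a hypothetical employee table:
./hcat -e "ALTER TABLE employee DROP IF EXISTS PARTITION (year=2012);"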
HCatalog Interview Questions for Freshers – Q.11,13,14,15,16,17,18
HCatalog Interview Questions for Experienced – Q. 12,19,20
Que 21. Explain Creating an Index.
Ans. In simple words, an index is a pointer on a particular column of a table, so creating an index simply means creating a pointer on that column of the table. A hypothetical example follows the syntax below.
Syntax:
  1. CREATE INDEX index_name
  2. ON TABLE base_table_name (col_name, ...)
  3. AS 'index.handler.class.name'
  4. [WITH DEFERRED REBUILD]
  5. [IDXPROPERTIES (property_name = property_value, ...)]
  6. [IN TABLE index_table_name]
  7. [PARTITIONED BY (col_name, ...)]
  8. [
  9. [ ROW FORMAT ...] STORED AS ...
  10. | STORED BY ...
  11. ]
  12. [LOCATION hdfs_path]
  13. [TBLPROPERTIES (...)]
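For example, an index on the salary column of a hypothetical employee table, using Hive's built-in compact index handler:
  1. ./hcat -e "CREATE INDEX index_salary ON TABLE employee (salary)
  2. AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
  3. WITH DEFERRED REBUILD;"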
Que 22. State syntax of the command to drop an index.
Ans. Syntax to drop an index is−
  1. DROP INDEX <index_name> ON <table_name>
Que 23. What is the role of data transfer API in HCatalog?
Ans. HCatalog provides a data transfer API for parallel input and output without using MapReduce. It uses a basic storage abstraction of tables and rows for reading data from, and writing data into, a Hadoop cluster.
Que 24. What are the main classes of Data Transfer API?
Ans. The data transfer API contains three main classes:
HCatReader − reads data from a Hadoop cluster.
HCatWriter − writes data into a Hadoop cluster.
DataTransferFactory − generates reader and writer instances.
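A rough master-side sketch of how these classes fit together is shown below. It is only a sketch based on the HCatalog data transfer API; the package path (org.apache.hive.hcatalog.data.transfer), the ReadEntity.Builder methods, and the database/table names are assumptions that may differ between HCatalog versions.

import java.util.HashMap;
import java.util.Map;

import org.apache.hive.hcatalog.data.transfer.DataTransferFactory;
import org.apache.hive.hcatalog.data.transfer.HCatReader;
import org.apache.hive.hcatalog.data.transfer.ReadEntity;
import org.apache.hive.hcatalog.data.transfer.ReaderContext;

public class HCatReadSketch {
  public static void main(String[] args) throws Exception {
    // Describe what to read: a hypothetical table "mytbl" in database "mydb".
    ReadEntity entity = new ReadEntity.Builder()
        .withDatabase("mydb")
        .withTable("mytbl")
        .build();

    // Extra configuration (for example, the metastore URI) goes in this map.
    Map<String, String> config = new HashMap<String, String>();

    // Never instantiate HCatReader directly; obtain it from DataTransferFactory.
    HCatReader masterReader = DataTransferFactory.getHCatReader(entity, config);

    // prepareRead() runs on the master node; the returned ReaderContext is then
    // serialized and shipped to slave nodes, which obtain their own readers via
    // DataTransferFactory.getHCatReader(context, slaveNumber) and iterate over
    // HCatRecord instances in parallel.
    ReaderContext context = masterReader.prepareRead();
  }
}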
Que 25. Explain HCatReader.
Ans. HCatReader is an abstract class internal to HCatalog. It abstracts away the complexities of the underlying system from which the records are retrieved.
Que 26. Explain HCatWriter.
Ans. HCatWriter is likewise an abstraction internal to HCatalog; it facilitates writing to HCatalog from external systems. It should not be instantiated directly; instead, obtain an instance via DataTransferFactory.
Que 27. Explain HCatInputFormat and HCatOutputFormat.
Ans. HCatInputFormat −
We use HCatInputFormat with MapReduce jobs to read data from HCatalog-managed tables. It exposes a Hadoop 0.20 MapReduce API for reading data as if it had been published to a table.

HCatOutputFormat −
Similarly, we use HCatOutputFormat with MapReduce jobs to write data to HCatalog-managed tables. It exposes a Hadoop 0.20 MapReduce API for writing data to a table. A driver-side sketch of how both classes are wired into a job follows.
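The sketch below is purely illustrative, not an official example: the database and table names ("default", "mytable", "outtable") are placeholders, the mapper and reducer classes are omitted, and the exact method signatures (for example HCatInputFormat.setInput and HCatOutputFormat.getTableSchema) vary slightly between HCatalog versions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class HCatMRDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "hcat-mr-sketch");
    job.setJarByClass(HCatMRDriver.class);

    // Read rows of the HCatalog-managed table "mytable" as HCatRecord values.
    HCatInputFormat.setInput(job, "default", "mytable");
    job.setInputFormatClass(HCatInputFormat.class);

    // job.setMapperClass(...) and job.setReducerClass(...) would go here;
    // the mapper receives HCatRecord values from the input table.

    // Write the job's output records into "outtable" (no static partition),
    // reusing the output table's schema from the metastore.
    HCatOutputFormat.setOutput(job, OutputJobInfo.create("default", "outtable", null));
    HCatOutputFormat.setSchema(job, HCatOutputFormat.getTableSchema(job.getConfiguration()));
    job.setOutputFormatClass(HCatOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}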
Que 28. Explain HCatLoader and HCatStorer APIs.
Ans. HCatLoader
We use HCatLoader with Pig scripts to read data from HCatalog-managed tables.
Syntax:
  1. A = LOAD 'tablename' USING org.apache.hcatalog.pig.HCatLoader();
HCatStorer
Whereas to write data to HCatalog-managed tables, we can use HCatStorer along with Pig scripts.

Syntax:
  1. A = LOAD ...
  2. B = FOREACH A ...
  3. ...
  4. ...
  5. my_processed_data = ...
  6. STORE my_processed_data INTO 'tablename' USING org.apache.hcatalog.pig.HCatStorer();
Que 29. Name all HCatalog Features.
Ans. Here is a list of key HCatalog features:
  1. Table and storage management layer
  2. Table abstraction layer
  3. Any Format
  4. Shared schema and data type
  5. Integration with other tools
  6. Expose the information
  7. Binary format
  8. Authentication
  9. Adding columns to partitions
  10. Support Hive tables
Que 30. Name Applications and Use Cases of HCatalog.
Ans. Some key uses could be:
  • Enabling the right tool for the right job
  • Capturing processing states to enable sharing
  • Integrating Hadoop with everything
Some applications of HCatalog:
  • SQL interface for Hadoop? HCatalog as an enabler…
  • Hadoop developer productivity and HCatalog
  • Good for the ecosystem is good for you
HCatalog Interview Questions for Freshers – Q. 21,23,24,25,26,29,30
HCatalog Interview Questions for Experienced – Q. 22,27,28
So, this was all about HCatalog interview questions and answers.

3. Conclusion

Hence, we have covered the most frequently asked HCatalog interview questions along with their answers. If you have any query related to these HCatalog interview questions and answers, do let us know by leaving a comment; we will be happy to resolve it. We hope these HCatalog interview questions help you.
