Apache Hive Data Types
1. Objective
In our Previous blog, we have discussed the Apache Hive introduction and Hive Architecture in detail. Now in this blog, we are going to cover Apache Hive Data Types with examples. We will discuss different types of data types in Hive: Hive Primitive Data type, Hive Complex Data types, Hive Literals, Hive Column Datatypes etc. We will also cover the data types that fall into these categories.
2. Apache Hive Data Types
Hive Data types are used for specifying the column/field type in Hive tables.
3. Types of Data Types in Hive
Mainly Hive Data Types are classified into 5 major categories, let’s discuss them one by one:
a. Primitive Data Types in Hive
Primitive Data Types also divide into 4 types which are as follows:
- Numeric Data Type
- Date/Time Data Type
- String Data Type
- Miscellaneous Data Type
Let us now discuss these Hive Primitive data types one by one-
i. Numeric Data Type
The Hive Numeric Data types also classified into two types-
- Integral Data Types
- Floating Data Types
* Integral Data Types
Integral Hive data types are as follows-
- TINYINT (1-byte (8 bit) signed integer, from -128 to 127)
- SMALLINT (2-byte (16 bit) signed integer, from -32, 768 to 32, 767)
- INT (4-byte (32-bit) signed integer, from –2,147,483,648to 2,147,483,647)
- BIGINT (8-byte (64-bit) signed integer, from –9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
* Floating Data Types
Floating Hive data types are as follows-
- FLOAT (4-byte (32-bit) single-precision floating-point number)
- DOUBLE (8-byte (64-bit) double-precision floating-point number)
- DECIMAL (Arbitrary-precision signed decimal number)
ii. Date/Time Data Type
The second category of Apache Hive primitive data type is Date/Time data types. The following Hive data types comes into this category-
- TIMESTAMP (Timestamp with nanosecond precision)
- DATE (date)
- INTERVAL
iii. String Data Type
String data types are the third category under Hive data types. Below are the data types that come into this-
- STRING (Unbounded variable-length character string)
- VARCHAR (Variable-length character string)
- CHAR (Fixed-length character string)
iv. Miscellaneous Data Type
Miscellaneous data types has two types of Hive data types-
- BOOLEAN (True/false value)
- BINARY (Byte array)
b. Complex Data Types in Hive
In this category of Hive data types following data types are come-
- Array
- MAP
- STRUCT
- UNION
let’s now discuss the Hive Complex data types with the example-
i. ARRAY
An ordered collection of fields. The fields must all be of the same type.
Syntax: ARRAY<data_type>
E.g. array (1, 2)
ii. MAP
An unordered collection of key-value pairs. Keys must be primitives; values may be any type. For a particular map, the keys must be the same type, and the values must be the same type.
Syntax: MAP<primitive_type, data_type>
E.g. map(‘a’, 1, ‘b’, 2).
iii. STRUCT
A collection of named fields. The fields may be of different types.
Syntax: STRUCT<col_name : data_type [COMMENT col_comment],…..>
E.g. struct(‘a’, 1 1.0),[b] named_struct(‘col1’, ‘a’, ‘col2’, 1, ‘col3’, 1.0)
iv. UNION
A value that may be one of a number of defined data. The value is tagged with an integer (zero-indexed) representing its data type in the union.
Syntax: UNIONTYPE<data_type, data_type, …>
E.g. create_union(1, ‘a’, 63)
c. Column Data Types in Hive
Column Hive data types are furthermore divide into 6 categories:
- Integral Type
- Strings
- Timestamp
- Dates
- Decimals
- Union Types
Let us discuss these Hive Column data types one by one-
i. Integral type
In this category of Hive data types following 4 data types are come-
- TINYINT
- SMALLINT
- INT/INTEGER
- BIGINT
By default, Integral literals are assumed to be INT. When the data range exceeds the range of INT, we need to use BIGINT. If the data range is smaller than the INT, we uses SMALLINT. And TINYINT is smaller than SMALLINT.
Table | Postfix | Example |
TINYINT | Y | 100Y |
SMALLINT | S | 100S |
BIGINT | L | 100L |
ii. Strings
The string data types in Hive, can be specified with either single quotes (‘) or double quotes (“). Apache Hive use C-style escaping within the strings.
Data Type | Length |
VARCHAR | 1 to 65355 |
CHAR | 255 |
* VARCHAR
Varchar- Hive data types are created with a length specifier (between 1 and 65355). It defines the maximum number of characters allowed in the character string.
* Char
Char – Hive data types are similar to VARCHAR. But they are fixed-length meaning that values shorter than the specified length value are padded with spaces but trailing spaces are not important during comparisons. 255 is the maximum fixed length.
iii. Timestamp
Hive supports traditional UNIX timestamp with operational nanosecond precision. Timestamps in text files use format ”YYYY-MM-DD HH:MM:SS.fffffffff” and “yyyy-mm-dd hh:mm:ss.ffffffffff”.
iv. Dates
DATE values are described in particular year/month/day (YYYY-MM-DD) format.E.g. DATE ‘2017-01-01’. These types don’t have a time of day component. This type supports range of values for 0000-01-01 to 9999-12-31.
v. Decimals
In Hive DECIMAL type is similar to Big Decimal format of java. This represents immutable arbitrary precision. The syntax and example are below:
Apache Hive 0.11 and 0.12 has the precision of the DECIMAL type fixed. And it’s limited to 38 digits.
Apache Hive 0.13 users can specify scale and precision when creating tables with the DECIMAL data type using DECIMAL (precision, scale) syntax. If the scale is not specified, then it defaults to 0 (no fractional digits). If no precision is specified, then it defaults to 10.
- CREATE TABLE foo (
- a DECIMAL, -- Defaults to decimal(10,0)b DECIMAL(9, 7)
- b DECIMAL(9, 7)
- )
vi. Union Types
Union Hive data types are the collection of heterogeneous data types. By using create union we can create an instance. The syntax and example are below:
- CREATE TABLE union_test(foo UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>);
- SELECT foo FROM union_test;
- {0:1}
- {1:2.0}
- {2:["three","four"]}
- {3:{"a":5,"b":"five"}}
- {2:["six","seven"]}
- {3:{"a":8,"b":"eight"}}
- {0:9}
- {1:10.0}
d. Literals Data Types in Hive
In Hive data types following literals are used:
- Floating Point Types
- Decimal Type
i. Floating Point Types
These are nothing but numbers with decimal points. This type of data is composed of the DOUBLE data types in Hive.
ii. Decimal Type
This type is nothing but floating point value with the higher range than the DOUBLE data type. The decimal type range is approximate -10-308 to 10308.
e. Null Value Data Types in Hive
In this Hive Data Types, missing values are represented by the special value NULL.
3. Conclusion
Hence, in this Apache Hive data type tutorial, we have discussed all the data types in Hive in detail. Hope this blog will help you to understand all the data types in hive easily. In the next section, we will discuss the Hive Operators in detail.
0 comments:
Post a Comment