What is Hive - Step by Step Part1
It uses SQL Like language called HiveQL (Open Source) and it
is for STRUCTURED DATA only, Generates MapReduce jobs that run on the Hadoop
cluster. Originally developed by Facebook.
Why Hive?
o
More productive than writing MapReduce directly
§
Five lines of HiveQL might be equivalent to
100's of lines of Java code.
o
Brings large-scale data analysis to a broader
audience
§
Leverage existing knowledge of SQL
o
Offers interoperability with other systems
§
Extensible through Java and external scripts
§
Many BI tools support Hive
Hive is associated
with metastore.
Metasore is the internal data store of the HIVE. I.e. all
the tabular metadata info will get stored in metastore. They are
i)
table name
ii)
schema definition
iii)
column info
iv)
partition key if any
NOTE: Default database of Hive is Derby DB
How to configure metastore in Hive?
Modify the file in "hive/conf/hive-site.xml"
Modify the file in "hive/conf/hive-site.xml"
i)
Connection URL details
ii)
Driver class name details
HiveQL Datatype’s:-
TinyInt, SmallInt, Int, BigInt, floatdouble,
String
Collection types:-
Map, array, struct
NOTE: Every table in Hive is created as a Directory
Differences between
SQL and HiveQL
How Hive loads and
Stores Data?
-
Hive’s queries operate on tables, just like
RDBMS
o
A table is simply an HDFS directory containing
one or more files
o
Default path:
/user'lhive/warehouse/<tab1e_name>
o
Hive supports many formats for data storage and
retrieval
-
How does Hive knows the structure and location
of tables?
o
These are specified when tables are created
o
This metadata is stored in Hive’s metastore
§
Contained in an RDBMS such as MySQL
-
Hive consults the metastore to determine data
format and location
o
The query itself operates on data stored on a
filesystem (typically HDFS)
Hive Tables:
1)
Managed Tables (Internal Tables)
2)
External Tables
Part 2: Click Here
@SsaiK
@SsaiK