Managed Tables/Internal Tables:
1. When we create a table in Hive without the EXTERNAL keyword, Hive manages both the table's metadata and its data.
2. This means the Hive engine moves the data into its warehouse directory:
CREATE TABLE managed_table (name STRING);
LOAD DATA INPATH '/user/file1.txt' INTO TABLE managed_table;
Here file1.txt is moved (not copied) from /user/file1.txt into the table's directory under the warehouse directory, which is set by the hive.metastore.warehouse.dir configuration property.
3. If the table is dropped, whether by mistake or on purpose, the data associated with it is removed as well, so file1.txt is deleted along with the table.
To avoid this kind of data loss, use an external table.
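A minimal HiveQL sketch of the managed-table lifecycle, run from the Hive CLI after the statements above. It assumes the default database and the default warehouse root /user/hive/warehouse; the dfs path is an assumption and should be adjusted if hive.metastore.warehouse.dir is set differently.

```sql
-- Print the configured warehouse root (default: /user/hive/warehouse)
SET hive.metastore.warehouse.dir;

-- Output includes "Table Type: MANAGED_TABLE" and the table's Location
DESCRIBE FORMATTED managed_table;

-- The loaded file now sits in the table's warehouse subdirectory
-- (path assumes the default database and the default warehouse root)
dfs -ls /user/hive/warehouse/managed_table;

-- Dropping a managed table deletes both the metadata and this directory
DROP TABLE managed_table;
```

If HDFS trash is enabled and PURGE is not specified, DROP TABLE moves the data to the user's .Trash/Current directory instead of deleting it immediately. That can serve as a last-resort recovery path, but it is not a substitute for an external table.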
External Tables:
1. For an external table, Hive refers to the data at its existing location. If we drop the table, only the table's metadata (the reference) is removed; the data itself stays where it is.
2. To create an external table, add the EXTERNAL keyword when creating the table:
hive> CREATE EXTERNAL TABLE sample (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/home/user';
3. Here /home/user/file1.txt is not moved to the warehouse directory; the sample table simply references the /home/user directory that contains it. LOCATION must name a directory rather than a single file, and Hive reads every file in that directory as table data.
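4. External tables are the safer choice when the data must survive a DROP TABLE or is shared with other tools, because dropping the table never deletes the underlying files.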
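The sketch below, again in HiveQL from the Hive CLI, shows what happens on DROP TABLE for the external sample table. It assumes file1.txt sits in the /home/user directory on the cluster's default filesystem.

```sql
-- Output includes "Table Type: EXTERNAL_TABLE"
DESCRIBE FORMATTED sample;

-- Removes only the metastore entry for the table
DROP TABLE sample;

-- file1.txt is still listed: Hive did not delete the files
dfs -ls /home/user;
```

Running the same CREATE EXTERNAL TABLE statement again re-attaches a table to the existing files, so the data is queryable immediately without reloading it.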