Showing posts with label Hive.

Monday, October 9, 2017

SQOOP-SqlManager-Error reading from database-java.sql.SQLException-set com.mysql.jdbc.RowDataDynamic-3c2d5cfb

 Chitchatiq     10/09/2017 06:37:00 PM     Hive, Problems&Solutions, sqoop     No comments   



Problem:

sqoop export --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" --username hadoop --password hadoop --table weather --export-dir /user/hortonworks/weather/  --fields-terminated-by ',';


Sometimes when we run a Sqoop command like the one above, we get the following error:

“ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3c2d5cfb is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.

java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3c2d5cfb is still active. No statements may be issued when any streaming result sets are open and in use on a given connection.”

Solution:
A simple fix is to add the --driver parameter, as shown below:
--driver com.mysql.jdbc.Driver

With an explicit --driver, Sqoop falls back to its generic JDBC connection manager instead of the MySQL-specific manager that keeps a streaming result set open, which is what triggers this error.


sqoop export --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" --username hadoop --password hadoop --table weather --export-dir /user/hortonworks/weather/  --fields-terminated-by ',' --driver com.mysql.jdbc.Driver

Thursday, July 13, 2017

Hive Order by Vs Sort by

 Chitchatiq     7/13/2017 04:50:00 PM     Hive     No comments   

Today we will discuss how and where to use the ORDER BY and SORT BY clauses in Hive.


ORDER BY:
  • Forces all the data through a single reducer node; by doing this, ORDER BY ensures that the entire dataset is totally ordered.
  • Uses a single reducer to guarantee total order in the output.
Drawbacks:
  • A single reducer can take a long time to sort very large outputs.

SORT BY:
  • Sorts the rows on the given columns within each reducer. If there is more than one reducer, each reducer's output is sorted on its own.
Drawbacks:
  • With more than one reducer, the overall output is not guaranteed to be sorted.

Let's take a simple example. Currently the dept table has the following data:


First we will run the ORDER BY query with the reducer count set to 2:


As the screenshot above shows, all the data is sorted on the deptno column in ascending order.

Now we will run the SORT BY version of the query.

Here we can clearly see that the results are sorted within each individual reducer, but not across the complete dataset.
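For reference, the two queries behind those screenshots look roughly like the sketch below; the dept table and deptno column come from this example, and the property name for the reducer count may differ by version (older releases use mapred.reduce.tasks):

SET mapreduce.job.reduces=2;    -- request two reducers

-- ORDER BY: Hive funnels everything through a single reducer, so the full result is ordered
SELECT * FROM dept ORDER BY deptno;

-- SORT BY: each reducer sorts its own slice, so the overall result is not totally ordered
SELECT * FROM dept SORT BY deptno;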


However, sometimes we do not require total ordering. For example, suppose you have a table called user_action_table where each row has user_id, action, and time. If the goal is only to order the rows by time per user_id, the SORT BY clause is sufficient.
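A sketch of such a query, using the column names from the example above; DISTRIBUTE BY routes all rows for a given user_id to the same reducer, and SORT BY then orders them within that reducer:

SELECT user_id, action, time
FROM user_action_table
DISTRIBUTE BY user_id
SORT BY user_id, time;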




Friday, March 10, 2017

Getting Error while accessing Hive from command line interface

 Chitchatiq     3/10/2017 01:05:00 PM     Hive, Problems&Solutions     No comments   

Sometimes we see the error below while launching Hive from the command line.


Error:

Logging initialized using configuration in file:/etc/hive/2.5.0.0-1245/0/hive-log4j.properties

Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x

The issue here is that Hive is being launched as a non-HDFS user, typically the root account, which does not yet have a home directory in HDFS that it can write to.

Below are the steps to solve the problem:

  1. Create an HDFS home directory for root (or for whichever user is launching Hive):
sudo -u hdfs hdfs dfs -mkdir /user/<<root>>

  2. Change the ownership of that directory from hdfs to the required user:

sudo -u hdfs hdfs dfs -chown -R root:hdfs /user/root

Friday, August 12, 2016

Managed Tables/Internal Tables and External Tables

 Chitchatiq     8/12/2016 07:40:00 PM     Big Data, BigData&Hadoop, external tables, Hive, Internal tables, Managed tables, SQL Server     No comments   



Managed Tables/Internal Tables:

     1.     When we create a table in Hive, by default Hive takes care of the data itself.
     2.     It means that the Hive engine moves the data into its warehouse directory.

CREATE TABLE managed_table (name STRING);
LOAD DATA INPATH '/user/file1.txt' INTO TABLE managed_table;

So here, file1.txt is moved into the default warehouse directory, which we specify with the hive.metastore.warehouse.dir configuration property.

    3.     Here, if we mistakenly or intentionally drop the table, the data associated with that table is also removed and we cannot get that file back.

To avoid this data loss, we can use an external table instead.
External Table:
    1.     With an external table, Hive refers to the existing file location; if we drop the table, only the reference (metadata) is removed, not the data.
    2.     To create an external table, we simply add the EXTERNAL keyword when creating the table:
hive> CREATE EXTERNAL TABLE sample (id INT, name STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/home/user/';
    3.     Here /home/user/file1.txt is not moved into the warehouse directory; the sample table simply references the files under the LOCATION directory (note that LOCATION points to a directory, not to a single file).

   4.     External tables are generally preferred when the underlying data must survive a DROP TABLE or is shared with other tools.
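As a quick way to verify this behaviour, here is a minimal sketch using the two tables from this post; DESCRIBE FORMATTED reports the table type as MANAGED_TABLE or EXTERNAL_TABLE:

-- Check whether a table is managed or external (see the "Table Type" row)
DESCRIBE FORMATTED sample;

-- Dropping a managed table removes the metadata and deletes the data files
DROP TABLE managed_table;

-- Dropping an external table removes only the metadata; the files under LOCATION remain in HDFS
DROP TABLE sample;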