Showing posts with label Hive.

Monday, October 9, 2017

SQOOP-SqlManager-Error reading from database-java.sql.SQLException-set com.mysql.jdbc.RowDataDynamic-3c2d5cfb

 Chitchatiq     10/09/2017 06:37:00 PM     Hive, Problems&Solutions, sqoop     No comments   



Problem:

sqoop export --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" --username hadoop --password hadoop --table weather --export-dir /user/hortonworks/weather/  --fields-terminated-by ',';


Sometimes when we run a Sqoop command like the one above, we get the following error:

“ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3c2d5cfb is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.

java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3c2d5cfb is still active. No statements may be issued when any streaming result sets are open and in use on a given connection.”

Solution:
A simple fix is to add the --driver parameter, as shown below:
--driver com.mysql.jdbc.Driver

With an explicit --driver, Sqoop falls back to its generic JDBC connection manager instead of the MySQL-specific manager that keeps a streaming result set open, which is what triggers this error.


sqoop export --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" --username hadoop --password hadoop --table weather --export-dir /user/hortonworks/weather/  --fields-terminated-by ',' --driver com.mysql.jdbc.Driver

Thursday, July 13, 2017

Hive Order by Vs Sort by

 Chitchatiq     7/13/2017 04:50:00 PM     Hive     No comments   

Today we will discuss how and where to use the ORDER BY and SORT BY clauses in Hive.


ORDER BY:
  • Forces all the data through a single reducer node; by doing this, ORDER BY ensures that the entire dataset is totally ordered.
  • Uses a single reducer to guarantee total order in the output.
Drawbacks:
  • A single reducer can take a long time to sort very large outputs.

SORT BY:
  • Sorts the rows on the given columns within each reducer. If there is more than one reducer, each reducer's output is sorted on its own.
Drawbacks:
  • With more than one reducer, the overall output is not guaranteed to be sorted.

Let's take a simple example. Currently the dept table has the following data:


First we will run the ORDER BY query with the reducer count set to 2:


As the screenshot above shows, all the data is sorted on the deptno column in ascending order.

Now we will run the SORT BY version of the query.

Here we can clearly see that the results are sorted within each individual reducer, but not across the complete dataset.
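For reference, the two queries behind those screenshots look roughly like the sketch below; the dept table and deptno column come from this example, and the property name for the reducer count may differ by version (older releases use mapred.reduce.tasks):

SET mapreduce.job.reduces=2;    -- request two reducers

-- ORDER BY: Hive funnels everything through a single reducer, so the full result is ordered
SELECT * FROM dept ORDER BY deptno;

-- SORT BY: each reducer sorts its own slice, so the overall result is not totally ordered
SELECT * FROM dept SORT BY deptno;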


However, sometimes we do not require total ordering. For example, suppose you have a table called user_action_table where each row has user_id, action, and time. If the goal is only to order the rows by time per user_id, the SORT BY clause is sufficient.
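A sketch of such a query, using the column names from the example above; DISTRIBUTE BY routes all rows for a given user_id to the same reducer, and SORT BY then orders them within that reducer:

SELECT user_id, action, time
FROM user_action_table
DISTRIBUTE BY user_id
SORT BY user_id, time;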




Friday, March 10, 2017

Getting Error while accessing Hive from command line interface

 Chitchatiq     3/10/2017 01:05:00 PM     Hive, Problems&Solutions     No comments   

Sometimes we see the error below while launching Hive from the command line.


Error:

Logging initialized using configuration in file:/etc/hive/2.5.0.0-1245/0/hive-log4j.properties

Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x

The issue here is that Hive is being launched as a non-HDFS user, typically the root account, which does not yet have a home directory in HDFS that it can write to.

Below are the steps to solve the problem:

  1. Create an HDFS home directory for root (or for whichever user is launching Hive):
sudo -u hdfs hdfs dfs -mkdir /user/<<root>>

  2. Change the ownership of that directory from hdfs to the required user:

sudo -u hdfs hdfs dfs -chown -R root:hdfs /user/root

Friday, August 12, 2016

Managed Tables/Internal Tables and External Tables

 Chitchatiq     8/12/2016 07:40:00 PM     Big Data, BigData&Hadoop, external tables, Hive, Internal tables, Managed tables, SQL Server     No comments   



Managed Tables/Internal Tables:

     1.     When we create a table in Hive, by default Hive takes care of the data itself.
     2.     It means that the Hive engine moves the data into its warehouse directory.

CREATE TABLE managed_table (name STRING);
LOAD DATA INPATH '/user/file1.txt' INTO TABLE managed_table;

So here, file1.txt is moved into the default warehouse directory, which we specify with the hive.metastore.warehouse.dir configuration property.

    3.     Here, if we mistakenly or intentionally drop the table, the data associated with that table is also removed and we cannot get that file back.

To avoid this data loss, we can use an external table instead.
External Table:
    1.     With an external table, Hive refers to the existing file location; if we drop the table, only the reference (metadata) is removed, not the data.
    2.     To create an external table, we simply add the EXTERNAL keyword when creating the table:
hive> CREATE EXTERNAL TABLE sample (id INT, name STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/home/user/';
    3.     Here /home/user/file1.txt is not moved into the warehouse directory; the sample table simply references the files under the LOCATION directory (note that LOCATION points to a directory, not to a single file).

   4.     External tables are generally preferred when the underlying data must survive a DROP TABLE or is shared with other tools.
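As a quick way to verify this behaviour, here is a minimal sketch using the two tables from this post; DESCRIBE FORMATTED reports the table type as MANAGED_TABLE or EXTERNAL_TABLE:

-- Check whether a table is managed or external (see the "Table Type" row)
DESCRIBE FORMATTED sample;

-- Dropping a managed table removes the metadata and deletes the data files
DROP TABLE managed_table;

-- Dropping an external table removes only the metadata; the files under LOCATION remain in HDFS
DROP TABLE sample;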