TECHTalksPro

Friday, August 12, 2016

Managed Tables/Internal tables and External Tables

Chitchatiq    8/12/2016 07:40:00 PM    Big Data, BigData&Hadoop, external tables, Hive, Internal tables, Managed tables, SQL Server



Managed Tables/Internal tables:

     1.     When we create a table in Hive, Hive takes care of the data by default.
     2.     That is, the Hive engine moves the data into its warehouse directory.

CREATE TABLE managed_table (name STRING);
LOAD DATA INPATH '/user/file1.txt' INTO TABLE managed_table;

So here file1.txt will be moved into the default warehouse directory, which is specified by the
hive.metastore.warehouse.dir configuration property.

    3.     If we drop the table, whether mistakenly or intentionally, the data associated with that table is removed as well, and we can’t get that file back.

To avoid this data loss, we have to go for an external table.
External Table:
    1.     With an external table, Hive refers to the existing file location; if we drop the table, only the reference is removed, not the data.
    2.     To create an external table, we simply add the EXTERNAL keyword when creating the table:
hive> CREATE EXTERNAL TABLE sample (id INT, name STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/home/user/';   -- LOCATION must point to a directory (here, the one containing file1.txt), not to a single file
    3.     Here the file /home/user/file1.txt will not be moved into the warehouse directory; the files under /home/user/ are simply referenced by the sample table.

    4.     External tables are generally recommended when the data must survive a DROP TABLE or is shared with other tools, as the sketch below illustrates.
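
For example, a minimal sketch of the difference when the two tables above are dropped:

DROP TABLE managed_table;   -- removes the metadata AND the data files that were moved into the warehouse directory
DROP TABLE sample;          -- removes only the metadata; the files under /home/user/ stay in place

This is why dropping a managed table by mistake is unrecoverable, while an external table can simply be re-created over the same location.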

Thursday, August 11, 2016

FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.

Chitchatiq    8/11/2016 04:16:00 PM    Error 10294, Problems&Solutions, SQL Server, transaction manager





Error:

FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.

Resolution:
        1.     Check that your table's storage format is ORC.
        2.     Check that your table is partitioned and bucketed; if not, recreate it with buckets (bucketing is required for transactional tables).
        3.     Check all of the configurations below in Ambari or hive-site.xml:
hive.support.concurrency = true
hive.enforce.bucketing = true
hive.exec.dynamic.partition.mode = nonstrict
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on = true
hive.compactor.worker.threads = 1

        4.     After applying these settings, try the UPDATE or DELETE again (a sketch follows below).
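
For reference, a minimal sketch of a table definition that supports UPDATE/DELETE once the settings above are applied (the table and column names are only an illustration):

-- transactional tables must be stored as ORC, bucketed, and marked transactional
CREATE TABLE employee_acid (id INT, name STRING, salary DOUBLE)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- with the configuration above in place, the update goes through the DbTxnManager and succeeds
UPDATE employee_acid SET salary = 50000 WHERE id = 1;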

Error: Failed with exception Unable to move source hdfs: to destination dfs://hadoop1.dev.com/apps/ FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Chitchatiq    8/11/2016 03:55:00 PM    Hadoop, org.apache.hadoop.hive.ql.exec.MoveTask, Problems&Solutions, return code 1, SQL Server


 Failed with exception Unable to move source hdf

Resolution:
1.    Generally, the move task moves files from the source location to the destination location. If the executing user does not have access to the source path, Hive throws the above error.
2.    First, check whether the current executing user has execute/write permission on the source folder. If not, grant it using the chmod command (a quick check is sketched below).
3.    Additionally, check the parameters below in Ambari or hive-site.xml:
          hive.metastore.client.setugi = true
          hive.metastore.server.setugi = true
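
For example, a quick check from the Hive CLI (the /user/data path below is only a placeholder; substitute your actual source folder):

-- check the owner and permissions of the source folder
dfs -ls -d /user/data;
-- grant write/execute on the folder to the group if the executing user lacks access
dfs -chmod 775 /user/data;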

Tuesday, August 9, 2016

BIG_DATA: Hadoop History and Overview

Chitchatiq    8/09/2016 08:57:00 PM    Hadoop, SQL Server, What is Hadoop


Apache™ Hadoop® is a distributed, highly scalable framework for storing and processing very large data sets across hundreds to thousands of computing nodes that operate in parallel.

Hadoop was created by Doug Cutting, who named it after his son's toy elephant, which is also where the elephant logo comes from.
1. The two main components of Hadoop are the Hadoop Distributed File System (HDFS) and the MapReduce processing engine.
2. HDFS provides the storage, whereas MapReduce executes the processing.

BENEFITS
Organizations use Hadoop for its ability to store, manage, and analyze vast amounts of structured and unstructured data quickly, reliably, flexibly, and at low cost:
  • Scalability and Performance – distributed processing of data local to each node in a cluster enables Hadoop to store, manage, process, and analyze data at petabyte scale.
  • Reliability – large computing clusters are prone to failure of individual nodes. Hadoop is fundamentally resilient: when a node fails, processing is redirected to the remaining nodes in the cluster, and data is automatically re-replicated in preparation for future node failures.
  • Flexibility – unlike traditional relational database management systems, you don't have to create structured schemas before storing data. You can store data in any format, including semi-structured or unstructured formats, and then parse and apply a schema to the data when it is read.
  • Low Cost – unlike proprietary software, Hadoop is open source and runs on low-cost commodity hardware.
Some of the differences between an RDBMS and Hadoop:

    RDBMS                                              Hadoop
 1  Need to model your data                            No need to model your data
 2  Schema on write                                    Schema on read
 3  Suited for OLTP                                    Suited for batch processing jobs (OLAP)
 4  Structured data                                    All varieties of data
 5  Downtime is needed for any maintenance on          No downtime is required to add storage
    storage or data files
 6  In a standalone, non-virtualized RDBMS such as     Hadoop clusters consist of independent nodes
    DB2, Oracle, or SQL Server, adding processing      that can be added on an as-needed basis
    power (more CPU or physical memory) requires
    downtime
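
To illustrate the "schema on read" row above, a minimal Hive sketch (the path and column names are hypothetical): the raw files are stored as-is, and a schema is applied only when a table is defined over them and queried.

-- the comma-delimited files already sit in HDFS; no schema was needed to store them
CREATE EXTERNAL TABLE web_logs (ip STRING, ts STRING, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/web_logs/';

-- the schema is applied only now, at read time
SELECT url, COUNT(*) AS hits FROM web_logs GROUP BY url;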

Copyright © TECHTalksPro
Designed by Vasu