TECHTalksPro
  • Home
  • Business
    • Internet
    • Market
    • Stock
  • Parent Category
    • Child Category 1
      • Sub Child Category 1
      • Sub Child Category 2
      • Sub Child Category 3
    • Child Category 2
    • Child Category 3
    • Child Category 4
  • Featured
  • Health
    • Childcare
    • Doctors
  • Home
  • SQL Server
    • SQL Server 2012
    • SQL Server 2014
    • SQL Server 2016
  • Downloads
    • PowerShell Scripts
    • Database Scripts
  • Big Data
    • Hadoop
      • Hive
      • Pig
      • HDFS
    • MPP
  • Certifications
    • Microsoft SQL Server -70-461
    • Hadoop-HDPCD
  • Problems/Solutions
  • Interview Questions

Thursday, July 13, 2017

Hive Order by Vs Sort by

 Chitchatiq     7/13/2017 04:50:00 PM     Hive     No comments   

Today we will discuss about how and where we can use Order by and Sort by clause in Hive


ORDER BY:
è Forces all the data to go into the same reducer node, by doing this, Order by ensure that entire dataset is totally ordered
è Uses a single reducer to guarantee total order in output
Drawbacks:
è Single reducer will take a long time to sort very large outputs

Sort By:
è Sort the rows based on the given columns per reducer. If there are more than one reducer, then the output per reducer will be sorted
Drawbacks:
If we have more than one reducer, then order of total output is not guaranteed to be sorted.

Let’s take one simple example. Currently Dept. table has following data


First will try to run the Order by query by setting reducer count as 2


If you see above screenshot all the data got sorted based on deptno column in Ascending order.

Now will try to run Sort by command.

We can clearly see that individual reducer level results are sorted but not at complete data set level.


However, sometimes we do not require total ordering. For example, suppose you have a table called user_action_table where each row has user_id, action, and time. Your goal is to order them by time per user_id and in this situation, we can use Sort By clause



  • Share This:  
  •  Facebook
  •  Twitter
  •  Google+
  •  Stumble
  •  Digg
Email ThisBlogThis!Share to XShare to Facebook
Newer Post Older Post Home

0 comments:

Post a Comment

Popular Posts

  • FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
    Error: Resolution:         1.       Check if Your table storage format is ORC         2.      Check if yo...
  • Greenplum Best Practises
    Best Practices: A distribution key should not have more than 2 columns, recommended is 1 column. While modeling a database,...
  • Greenplum System-Useful Queries
    Useful Queries: Query to verify the list of segments in a Greenplum system select * from gp_segment_configuration; R...
  • How to fix ERROR 1045 : Access denied for user 'root'@'localhost' (using password: YES)
    Please follow below steps to fix HDP Sandbox -Mysql root password issue: Enter mysql -u root UPDATE mysql.user SET Password=PASSWOR...
  • HADOOP - HDFS OPERATIONS
    Starting HDFS To format the configured HDFS file system, execute the following command in namenode HDFS server, $ hadoop namenode ...

Facebook

Categories

Best Practices (1) Big Data (5) BigData&Hadoop (6) DAG (1) Error 10294 (1) external tables (1) File Formats in Hive (1) Greenplum (3) Hadoop (5) Hadoop Commands (1) Hive (4) Internal tables (1) interview Questions (1) Managed tables (1) MySQL Installation (1) ORCFILE (1) org.apache.hadoop.hive.ql.exec.MoveTask (1) Powershell (1) Problems&Solutions (15) RCFILE (1) return code 1 (1) SEQUENCEFILE (1) Service 'userhome' (1) Service 'userhome' check failed: java.io.FileNotFoundException (1) SQL Server (27) sqoop (2) SSIS (1) TEXTFILE (1) Tez (1) transaction manager (1) Views (1) What is Hadoop (1)

Blog Archive

  • December (1)
  • November (1)
  • October (2)
  • September (6)
  • August (1)
  • July (3)
  • March (1)
  • February (8)
  • January (4)
  • December (9)
  • August (4)
  • July (1)

Popular Tags

  • Best Practices
  • Big Data
  • BigData&Hadoop
  • DAG
  • Error 10294
  • external tables
  • File Formats in Hive
  • Greenplum
  • Hadoop
  • Hadoop Commands
  • Hive
  • Internal tables
  • interview Questions
  • Managed tables
  • MySQL Installation
  • ORCFILE
  • org.apache.hadoop.hive.ql.exec.MoveTask
  • Powershell
  • Problems&Solutions
  • RCFILE
  • return code 1
  • SEQUENCEFILE
  • Service 'userhome'
  • Service 'userhome' check failed: java.io.FileNotFoundException
  • SQL Server
  • sqoop
  • SSIS
  • TEXTFILE
  • Tez
  • transaction manager
  • Views
  • What is Hadoop

Featured Post

TOP 100 SQL SERVER INTERVIEW QUESTIONS

SQL SERVER INTERVIEW QUESTIONS 1.       What is the Complex task that you handled in your project 2.       What are the diffe...

Pages

  • Home
  • SQL SERVER
  • Greenplum
  • Hadoop Tutorials
  • Contact US
  • Disclaimer
  • Privacy Policy

Popular Posts

  • FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
    Error: Resolution:         1.       Check if Your table storage format is ORC         2.      Check if yo...
  • Greenplum System-Useful Queries
    Useful Queries: Query to verify the list of segments in a Greenplum system select * from gp_segment_configuration; R...

Copyright © TECHTalksPro
Designed by Vasu