TECHTalksPro
  • Home
  • Business
    • Internet
    • Market
    • Stock
  • Parent Category
    • Child Category 1
      • Sub Child Category 1
      • Sub Child Category 2
      • Sub Child Category 3
    • Child Category 2
    • Child Category 3
    • Child Category 4
  • Featured
  • Health
    • Childcare
    • Doctors
  • Home
  • SQL Server
    • SQL Server 2012
    • SQL Server 2014
    • SQL Server 2016
  • Downloads
    • PowerShell Scripts
    • Database Scripts
  • Big Data
    • Hadoop
      • Hive
      • Pig
      • HDFS
    • MPP
  • Certifications
    • Microsoft SQL Server -70-461
    • Hadoop-HDPCD
  • Problems/Solutions
  • Interview Questions

Monday, September 11, 2017

Greenplum Best Practises

 Chitchatiq     9/11/2017 08:33:00 PM     Best Practices, Greenplum     No comments   




Best Practices:

  1. A distribution key should not have more than 2 columns, recommended is 1 column.
  2. While modeling a database, we need to make sure all the tables that gets involved in longer query execution are distributed using the joining column.
  3. Distribution keys should be domain key but not surrogate keys.
  4. Explicitly define a column or random distribution for all tables. Do not use the default as it may cause skew on segments.
  5. Do not distribute on columns that will be used in the WHERE clause of a query. You should partition the data on the column that is used in WHERE clause
  6. The distribution key column data should contain unique values
  7. Do not distribute on dates or timestamps. Distributing data on date or timestamp may cause poor system performance
  8. There are 2 questions to ask while deciding the distribution key of a table:
    1. Is the Distribution Key used in joins?
      1. Makes sure, no redistribution is happening during query execution
    2. Does the key 'Equi-distributes'  the data across all the segments?
      1. Makes sure all the segments get approximately the same amount of data, hence the same amount of work during query execution.
      2. We need to make sure data skewing is minimum for every table.

SELECT * FROM gp_toolkit.gp_skew_coefficients
  • Share This:  
  •  Facebook
  •  Twitter
  •  Google+
  •  Stumble
  •  Digg
Email ThisBlogThis!Share to XShare to Facebook
Newer Post Older Post Home

0 comments:

Post a Comment

Popular Posts

  • TEZ
    YARN - Giant leap in hadoop. Has provided facility to use “App Master” to control the process flow. So how can we leverage this?? ...
  • SQL Server: Row Count Of All The Tables In Particular Database
    Some times we want to find out the row counts for all the tables in a particular database in SQL Server. We can do it in different ways. I...
  • How to check if any view has metadata binding errors after DB refresh
    Scenario:  In my recent project after Source Database refresh we need to push to target server. We have views created in separate data...
  • File Formats in Hive
    As Hive uses HDFS to store the structured data in files … multiple file formats are supported by hive with various use cases. 1.   ...
  • SELECT TOP COUNT dynamically
    Scenario: Some times we want to make TOP Count as Dynamically then passing parameter to TOP command will not work. We have to add paramete...

Facebook

Categories

Best Practices (1) Big Data (5) BigData&Hadoop (6) DAG (1) Error 10294 (1) external tables (1) File Formats in Hive (1) Greenplum (3) Hadoop (5) Hadoop Commands (1) Hive (4) Internal tables (1) interview Questions (1) Managed tables (1) MySQL Installation (1) ORCFILE (1) org.apache.hadoop.hive.ql.exec.MoveTask (1) Powershell (1) Problems&Solutions (15) RCFILE (1) return code 1 (1) SEQUENCEFILE (1) Service 'userhome' (1) Service 'userhome' check failed: java.io.FileNotFoundException (1) SQL Server (27) sqoop (2) SSIS (1) TEXTFILE (1) Tez (1) transaction manager (1) Views (1) What is Hadoop (1)

Blog Archive

  • December (1)
  • November (1)
  • October (2)
  • September (6)
  • August (1)
  • July (3)
  • March (1)
  • February (8)
  • January (4)
  • December (9)
  • August (4)
  • July (1)

Popular Tags

  • Best Practices
  • Big Data
  • BigData&Hadoop
  • DAG
  • Error 10294
  • external tables
  • File Formats in Hive
  • Greenplum
  • Hadoop
  • Hadoop Commands
  • Hive
  • Internal tables
  • interview Questions
  • Managed tables
  • MySQL Installation
  • ORCFILE
  • org.apache.hadoop.hive.ql.exec.MoveTask
  • Powershell
  • Problems&Solutions
  • RCFILE
  • return code 1
  • SEQUENCEFILE
  • Service 'userhome'
  • Service 'userhome' check failed: java.io.FileNotFoundException
  • SQL Server
  • sqoop
  • SSIS
  • TEXTFILE
  • Tez
  • transaction manager
  • Views
  • What is Hadoop

Featured Post

TOP 100 SQL SERVER INTERVIEW QUESTIONS

SQL SERVER INTERVIEW QUESTIONS 1.       What is the Complex task that you handled in your project 2.       What are the diffe...

Pages

  • Home
  • SQL SERVER
  • Greenplum
  • Hadoop Tutorials
  • Contact US
  • Disclaimer
  • Privacy Policy

Popular Posts

  • How to Install MySQL on CentOS 7
    Steps to Install MySQL on Linux-CentOS 7 Step 1: Download MYSQL repositories wget http : //dev.mysql.com/get/mysql57-communit...
  • Powershell Script for SSRS Deployment
    Scenarios: Deploy SSRS reports using automation #Set variables with configure values $Environment = "DEV" $rep...

Copyright © TECHTalksPro
Designed by Vasu