TECHTalksPro
  • Home
  • Business
    • Internet
    • Market
    • Stock
  • Parent Category
    • Child Category 1
      • Sub Child Category 1
      • Sub Child Category 2
      • Sub Child Category 3
    • Child Category 2
    • Child Category 3
    • Child Category 4
  • Featured
  • Health
    • Childcare
    • Doctors
  • Home
  • SQL Server
    • SQL Server 2012
    • SQL Server 2014
    • SQL Server 2016
  • Downloads
    • PowerShell Scripts
    • Database Scripts
  • Big Data
    • Hadoop
      • Hive
      • Pig
      • HDFS
    • MPP
  • Certifications
    • Microsoft SQL Server -70-461
    • Hadoop-HDPCD
  • Problems/Solutions
  • Interview Questions
Showing posts with label Best Practices. Show all posts
Showing posts with label Best Practices. Show all posts

Monday, September 11, 2017

Greenplum Best Practises

 Chitchatiq     9/11/2017 08:33:00 PM     Best Practices, Greenplum     No comments   




Best Practices:

  1. A distribution key should not have more than 2 columns, recommended is 1 column.
  2. While modeling a database, we need to make sure all the tables that gets involved in longer query execution are distributed using the joining column.
  3. Distribution keys should be domain key but not surrogate keys.
  4. Explicitly define a column or random distribution for all tables. Do not use the default as it may cause skew on segments.
  5. Do not distribute on columns that will be used in the WHERE clause of a query. You should partition the data on the column that is used in WHERE clause
  6. The distribution key column data should contain unique values
  7. Do not distribute on dates or timestamps. Distributing data on date or timestamp may cause poor system performance
  8. There are 2 questions to ask while deciding the distribution key of a table:
    1. Is the Distribution Key used in joins?
      1. Makes sure, no redistribution is happening during query execution
    2. Does the key 'Equi-distributes'  the data across all the segments?
      1. Makes sure all the segments get approximately the same amount of data, hence the same amount of work during query execution.
      2. We need to make sure data skewing is minimum for every table.

SELECT * FROM gp_toolkit.gp_skew_coefficients
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Google+
  •  Stumble
  •  Digg
Older Posts Home

Popular Posts

  • Powershell Script for SSRS Deployment
    Scenarios: Deploy SSRS reports using automation #Set variables with configure values $Environment = "DEV" $rep...
  • SQOOP-SqlManager-Error reading from database-java.sql.SQLException-set com.mysql.jdbc.RowDataDynamic-3c2d5cfb
    Problem: sqoop export --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" --username hadoop --password had...
  • Greenplum Architecture
    Parallel Processing:  Taking all the rows from table and spreading them among many parallel processing units. In greenplum these...
  • How to Install MySQL on CentOS 7
    Steps to Install MySQL on Linux-CentOS 7 Step 1: Download MYSQL repositories wget http : //dev.mysql.com/get/mysql57-communit...
  • Greenplum System-Useful Queries
    Useful Queries: Query to verify the list of segments in a Greenplum system select * from gp_segment_configuration; R...

Facebook

Categories

Best Practices (1) Big Data (5) BigData&Hadoop (6) DAG (1) Error 10294 (1) external tables (1) File Formats in Hive (1) Greenplum (3) Hadoop (5) Hadoop Commands (1) Hive (4) Internal tables (1) interview Questions (1) Managed tables (1) MySQL Installation (1) ORCFILE (1) org.apache.hadoop.hive.ql.exec.MoveTask (1) Powershell (1) Problems&Solutions (15) RCFILE (1) return code 1 (1) SEQUENCEFILE (1) Service 'userhome' (1) Service 'userhome' check failed: java.io.FileNotFoundException (1) SQL Server (27) sqoop (2) SSIS (1) TEXTFILE (1) Tez (1) transaction manager (1) Views (1) What is Hadoop (1)

Blog Archive

  • December (1)
  • November (1)
  • October (2)
  • September (6)
  • August (1)
  • July (3)
  • March (1)
  • February (8)
  • January (4)
  • December (9)
  • August (4)
  • July (1)

Popular Tags

  • Best Practices
  • Big Data
  • BigData&Hadoop
  • DAG
  • Error 10294
  • external tables
  • File Formats in Hive
  • Greenplum
  • Hadoop
  • Hadoop Commands
  • Hive
  • Internal tables
  • interview Questions
  • Managed tables
  • MySQL Installation
  • ORCFILE
  • org.apache.hadoop.hive.ql.exec.MoveTask
  • Powershell
  • Problems&Solutions
  • RCFILE
  • return code 1
  • SEQUENCEFILE
  • Service 'userhome'
  • Service 'userhome' check failed: java.io.FileNotFoundException
  • SQL Server
  • sqoop
  • SSIS
  • TEXTFILE
  • Tez
  • transaction manager
  • Views
  • What is Hadoop

Featured Post

TOP 100 SQL SERVER INTERVIEW QUESTIONS

SQL SERVER INTERVIEW QUESTIONS 1.       What is the Complex task that you handled in your project 2.       What are the diffe...

Pages

  • Home
  • SQL SERVER
  • Greenplum
  • Hadoop Tutorials
  • Contact US
  • Disclaimer
  • Privacy Policy

Popular Posts

  • FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
    Error: Resolution:         1.       Check if Your table storage format is ORC         2.      Check if yo...
  • Greenplum System-Useful Queries
    Useful Queries: Query to verify the list of segments in a Greenplum system select * from gp_segment_configuration; R...

Copyright © TECHTalksPro
Designed by Vasu