TECHTalksPro
  • Home
  • Business
    • Internet
    • Market
    • Stock
  • Parent Category
    • Child Category 1
      • Sub Child Category 1
      • Sub Child Category 2
      • Sub Child Category 3
    • Child Category 2
    • Child Category 3
    • Child Category 4
  • Featured
  • Health
    • Childcare
    • Doctors
  • Home
  • SQL Server
    • SQL Server 2012
    • SQL Server 2014
    • SQL Server 2016
  • Downloads
    • PowerShell Scripts
    • Database Scripts
  • Big Data
    • Hadoop
      • Hive
      • Pig
      • HDFS
    • MPP
  • Certifications
    • Microsoft SQL Server -70-461
    • Hadoop-HDPCD
  • Problems/Solutions
  • Interview Questions

Monday, September 11, 2017

Greenplum Architecture

 Chitchatiq     9/11/2017 08:30:00 PM     Greenplum     No comments   




Parallel Processing:
 Taking all the rows from table and spreading them among many parallel processing units. In greenplum these parallel  processing units are called segments.

SMP(symmetric multi processing system):
  A smp system has multiple processors but share a single OS,memory pool,and share access to disks.SMP is a single server which is referred as node.A single SMP system is called as a Segment Host ,multiple Segment Hosts are connected together to create a Massively Parallel Processing (MPP) system.the number of segments per each segment host is defined by number of CPU’s the segment Host contains.

Commodity Hardware:
  Greenplum provides incredible speeds with commodity hardware,this allows to create one segment per CPU. If you have two dual-core CPU processors in a server you should build four segments.

Roles of segment Host and Segment:
  The host is the boss and the segments are the workers.Users login to the host and never communicate directly with the segments. The host builds a plan for the segments to follow that is delivered in plan slices. Each slice instructs the segments what to do. When the segments have done their work they return it to the host.
  The Host is the brains behind the entire operation.

Segment Host Roles:
  1.When a user logs into Greenplum, the host will log them in and be responsible for their session.
  2.The host checks the SQL syntax, creates the EXPLAIN plan, checks the security, and builds a plan for the segments to follow.
  3.The host uses system statistics and statistics from the ANALYZE command to build the best plan.
  4.The host doesn't hold user data, but instead holds the Global System Catalog.
  5.The host always delivers the final answer set to the user.
Segment Roles:
  1.Segments are responsible for storing and retrieving rows from their assigned disk (Virtual disk).
  2.Segments lock the tables and rows.
  3.Segments sort rows and do all aggregation.
  4.Segments handle all the join processing.
  5.Segments handle all space management and space accounting.
Master Host will broadcast the plan to each segment simultaneously, but if you use the distribution key in the WHERE clause of your SQL with an equality statement, then only a single segment will be contacted to return the row.
For example: If the system contains 3 parallel processing units and if Employee_table contains 9 rows then each parallel processing unit holds 3 rows
      
select * from employee_table
              where empno=2000;
 If above query is written without using distribution key in where clause then for fetching the result all of the 3 segments are visited,but in above query empno is used in where clause with equality so only the segment which contain that employee number is visited.
  • Share This:  
  •  Facebook
  •  Twitter
  •  Google+
  •  Stumble
  •  Digg
Email ThisBlogThis!Share to XShare to Facebook
Newer Post Older Post Home

0 comments:

Post a Comment

Popular Posts

  • FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
    Error: Resolution:         1.       Check if Your table storage format is ORC         2.      Check if yo...
  • Greenplum Best Practises
    Best Practices: A distribution key should not have more than 2 columns, recommended is 1 column. While modeling a database,...
  • Greenplum System-Useful Queries
    Useful Queries: Query to verify the list of segments in a Greenplum system select * from gp_segment_configuration; R...
  • How to fix ERROR 1045 : Access denied for user 'root'@'localhost' (using password: YES)
    Please follow below steps to fix HDP Sandbox -Mysql root password issue: Enter mysql -u root UPDATE mysql.user SET Password=PASSWOR...
  • HADOOP - HDFS OPERATIONS
    Starting HDFS To format the configured HDFS file system, execute the following command in namenode HDFS server, $ hadoop namenode ...

Facebook

Categories

Best Practices (1) Big Data (5) BigData&Hadoop (6) DAG (1) Error 10294 (1) external tables (1) File Formats in Hive (1) Greenplum (3) Hadoop (5) Hadoop Commands (1) Hive (4) Internal tables (1) interview Questions (1) Managed tables (1) MySQL Installation (1) ORCFILE (1) org.apache.hadoop.hive.ql.exec.MoveTask (1) Powershell (1) Problems&Solutions (15) RCFILE (1) return code 1 (1) SEQUENCEFILE (1) Service 'userhome' (1) Service 'userhome' check failed: java.io.FileNotFoundException (1) SQL Server (27) sqoop (2) SSIS (1) TEXTFILE (1) Tez (1) transaction manager (1) Views (1) What is Hadoop (1)

Blog Archive

  • December (1)
  • November (1)
  • October (2)
  • September (6)
  • August (1)
  • July (3)
  • March (1)
  • February (8)
  • January (4)
  • December (9)
  • August (4)
  • July (1)

Popular Tags

  • Best Practices
  • Big Data
  • BigData&Hadoop
  • DAG
  • Error 10294
  • external tables
  • File Formats in Hive
  • Greenplum
  • Hadoop
  • Hadoop Commands
  • Hive
  • Internal tables
  • interview Questions
  • Managed tables
  • MySQL Installation
  • ORCFILE
  • org.apache.hadoop.hive.ql.exec.MoveTask
  • Powershell
  • Problems&Solutions
  • RCFILE
  • return code 1
  • SEQUENCEFILE
  • Service 'userhome'
  • Service 'userhome' check failed: java.io.FileNotFoundException
  • SQL Server
  • sqoop
  • SSIS
  • TEXTFILE
  • Tez
  • transaction manager
  • Views
  • What is Hadoop

Featured Post

TOP 100 SQL SERVER INTERVIEW QUESTIONS

SQL SERVER INTERVIEW QUESTIONS 1.       What is the Complex task that you handled in your project 2.       What are the diffe...

Pages

  • Home
  • SQL SERVER
  • Greenplum
  • Hadoop Tutorials
  • Contact US
  • Disclaimer
  • Privacy Policy

Popular Posts

  • FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
    Error: Resolution:         1.       Check if Your table storage format is ORC         2.      Check if yo...
  • Greenplum System-Useful Queries
    Useful Queries: Query to verify the list of segments in a Greenplum system select * from gp_segment_configuration; R...

Copyright © TECHTalksPro
Designed by Vasu