Parallel Processing:
Parallel processing means taking all the rows of a table and spreading them
across many parallel processing units. In Greenplum, these parallel processing
units are called segments.
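To make the idea concrete, here is a minimal Python sketch of spreading rows across segments. It is a toy model, not Greenplum's actual code: the function name distribute_rows, the sample empno values, and the use of a simple modulo in place of a real hash function are all assumptions made for illustration.

```python
from collections import defaultdict


def distribute_rows(rows, num_segments, key):
    """Assign each row to a segment based on its key column (toy model)."""
    segments = defaultdict(list)
    for row in rows:
        # Assumption: modulo stands in for whatever hash the database uses.
        segment_id = row[key] % num_segments
        segments[segment_id].append(row)
    return dict(segments)


# Hypothetical sample data: eight rows spread over four segments.
rows = [{"empno": n} for n in range(1, 9)]
layout = distribute_rows(rows, num_segments=4, key="empno")

for segment_id in sorted(layout):
    print("segment", segment_id, [r["empno"] for r in layout[segment_id]])
```

With evenly spread key values, every segment ends up holding the same number of rows, so each one has an equal share of the work.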
SMP (Symmetric Multiprocessing) System:
An SMP system has multiple processors that share a single OS, a single memory
pool, and access to the same disks. An SMP system is a single server, which is
referred to as a node. In Greenplum, a single SMP system is called a Segment
Host, and multiple Segment Hosts are connected together to create a Massively
Parallel Processing (MPP) system. The number of segments on each Segment Host
is determined by the number of CPUs that Segment Host contains.
Commodity Hardware:
Greenplum is designed to deliver high speed on commodity hardware, which lets
you create one segment per CPU core. If a server has two dual-core CPUs, you
should build four segments.
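The arithmetic behind that rule of thumb can be written out as a short Python sketch. The function names and the four-host cluster are hypothetical, and the one-segment-per-core rule is simply the guideline stated above, not a sizing recommendation for any particular system.

```python
def segments_per_host(cpus, cores_per_cpu):
    """Rule of thumb from the text: one segment per CPU core."""
    return cpus * cores_per_cpu


def total_segments(num_hosts, cpus, cores_per_cpu):
    """Segments across a cluster of identical segment hosts."""
    return num_hosts * segments_per_host(cpus, cores_per_cpu)


# Two dual-core CPUs in one server give four segments.
print(segments_per_host(cpus=2, cores_per_cpu=2))

# Hypothetical cluster: four such servers give sixteen segments.
print(total_segments(num_hosts=4, cpus=2, cores_per_cpu=2))
```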
Roles of the Master Host and the Segments:
The master host is the boss and the segments are the workers. Users log in
to the master host and never communicate directly with the segments. The master
host builds a plan for the segments to follow, which is delivered in plan
slices. Each slice tells the segments what to do. When the segments have
finished their work, they return the results to the master host.
The master host is the brains behind the entire operation.
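The flow described above (send a slice to every segment, let each segment work on its own rows, then collect the results) can be sketched in Python. This is only an illustration under stated assumptions: the names SEGMENT_ROWS, run_slice, and host_execute, the salary column, and the sample rows are all hypothetical, and a thread pool stands in for the real segment processes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list holds the rows stored on one segment.
SEGMENT_ROWS = [
    [{"empno": 1002, "salary": 50}, {"empno": 1005, "salary": 70}],
    [{"empno": 1000, "salary": 40}, {"empno": 1003, "salary": 90}],
    [{"empno": 1001, "salary": 60}, {"empno": 1004, "salary": 30}],
]


def run_slice(segment_rows, min_salary):
    """Work done on one segment: scan only its local rows, keep the matches."""
    return [row for row in segment_rows if row["salary"] >= min_salary]


def host_execute(min_salary):
    """Host side: hand the same slice to every segment, then gather results."""
    with ThreadPoolExecutor(max_workers=len(SEGMENT_ROWS)) as pool:
        # map() returns results in segment order, one list per segment.
        partials = list(
            pool.map(lambda rows: run_slice(rows, min_salary), SEGMENT_ROWS)
        )
    # The host combines the partial results into the final answer set.
    return [row for part in partials for row in part]


result = host_execute(min_salary=60)
print([row["empno"] for row in result])
```

Note that the host function never filters rows itself. It only hands out work and assembles the answer, which mirrors the boss and worker split described above.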
Master Host Roles:
1. When a user logs into Greenplum, the master host logs them in and is responsible for their session.
2. The master host checks the SQL syntax, creates the EXPLAIN plan, checks the security, and builds
a plan for the segments to follow.
3. The master host uses system statistics and statistics from the ANALYZE command to
build the best plan.
4. The master host doesn't hold user data, but instead holds the Global System Catalog.
5. The master host always delivers the final answer set to the user.
Segment Roles:
1. Segments are responsible for storing and retrieving rows from
their assigned disk (virtual disk).
2. Segments lock the tables and rows.
3. Segments sort rows and do all aggregation.
4. Segments handle all the join processing.
5. Segments handle all space management and space accounting.
The master host broadcasts the plan to every segment simultaneously. However,
if the WHERE clause of your SQL uses the distribution key with an equality
condition, only a single segment is contacted to return the row.
For example, if the system contains 3 parallel processing units and
employee_table contains 9 rows that are spread evenly, then each parallel
processing unit holds 3 rows.
select * from employee_table
where empno = 2000;
If the above query were written without the distribution key in the WHERE
clause, all 3 segments would have to be visited to fetch the result. Because
the query uses empno, the distribution key, in the WHERE clause with an
equality condition, only the segment that contains that employee number is
visited.
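The routing decision can be illustrated with a small Python sketch. It is a simplified model, not how Greenplum is implemented: the function names, the deptno column, and the modulo placement rule (standing in for the real hash of the distribution key) are assumptions made for the example.

```python
NUM_SEGMENTS = 3
DISTRIBUTION_KEY = "empno"


def segment_for(key_value):
    """Toy placement rule (assumption): modulo stands in for the real hash."""
    return key_value % NUM_SEGMENTS


def segments_to_visit(where_column, value):
    """Return the ids of the segments an equality query has to contact."""
    if where_column == DISTRIBUTION_KEY:
        # Equality on the distribution key: the row can live on one segment only.
        return [segment_for(value)]
    # Any other column: the row could be anywhere, so contact every segment.
    return list(range(NUM_SEGMENTS))


# where empno = 2000 -> a single segment is contacted
print(segments_to_visit("empno", 2000))

# where deptno = 10 (hypothetical non-key column) -> all segments are contacted
print(segments_to_visit("deptno", 10))
```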