TECHTalksPro

Thursday, September 21, 2017

How to create a table column that throws an error on SELECT * or on selecting that column

Chitchatiq | 9/21/2017 06:45:00 PM | Problems&Solutions


Problem:
Create a column in a table that raises an error whenever it is queried, either through SELECT * or by selecting the column directly.

Solution:
Friends, don’t overthink this one. Here is a simple solution.

Use a computed column whose expression is syntactically valid but fails at run time (e.g. 1/0, or 1 + 'srinivas').

Sample Script:
CREATE TABLE #test
(
    id1  INT,
    id2  INT,
    Col1 AS id1 + 'srinu'
);


SELECT * FROM #test;

Running this query returns:
Conversion failed when converting the varchar value 'srinu' to data type int.
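To see that only queries touching the computed column fail, here is a quick sketch against the #test table created above (the inserted values are illustrative):

```sql
INSERT INTO #test (id1, id2) VALUES (1, 2);

-- Touching only the real columns succeeds
SELECT id1, id2 FROM #test;

-- SELECT * (or selecting Col1 directly) evaluates the computed column and fails:
-- Conversion failed when converting the varchar value 'srinu' to data type int.
SELECT * FROM #test;
```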


That’s all, friends!

Thanks for reading this article, and please post a comment if you have any questions or thoughts.













Monday, September 11, 2017

Greenplum System-Useful Queries

Chitchatiq | 9/11/2017 08:35:00 PM | Greenplum



Useful Queries:

  1. Query to list the segments in a Greenplum system:

select * from gp_segment_configuration;

Result:
  Content = -1     → master node
  Content = 0 to n → segment (worker) instances

  2. Query to check data skew in a database:

select * from gp_toolkit.gp_skew_coefficients;

  3. Query to count the rows held by each segment of a table:

select gp_segment_id, count(*)
from zc.zctimesheet1
group by gp_segment_id
order by gp_segment_id;

  4. Query to check the size of a database or a relation (table):

select pg_size_pretty(pg_database_size('DistributionTest'));
select pg_size_pretty(pg_relation_size('zc.zctimesheet1'));
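Along the same lines, here is a sketch for finding the largest tables in the current database. It uses the standard PostgreSQL/Greenplum catalog view pg_tables; mixed-case object names may need quote_ident():

```sql
-- Top 10 tables by on-disk size, user schemas only
SELECT schemaname, tablename,
       pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS table_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;
```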



Greenplum Best Practices

Chitchatiq | 9/11/2017 08:33:00 PM | Best Practices, Greenplum




Best Practices:

  1. A distribution key should contain at most 2 columns; a single column is recommended.
  2. When modeling a database, make sure all the tables involved in long-running queries are distributed on their join column.
  3. Distribution keys should be domain keys, not surrogate keys.
  4. Explicitly define a column or random distribution for every table. Do not rely on the default, as it may cause skew across segments.
  5. Do not distribute on columns that will be used in the WHERE clause of a query; partition the data on those columns instead.
  6. The distribution key column data should contain unique values.
  7. Do not distribute on dates or timestamps; doing so may cause poor system performance.
  8. There are 2 questions to ask when deciding the distribution key of a table:
    1. Is the distribution key used in joins?
      - This ensures no redistribution happens during query execution.
    2. Does the key 'equi-distribute' the data across all the segments?
      - This ensures every segment gets approximately the same amount of data, and therefore the same amount of work, during query execution.
      - Data skew should be minimal for every table; to check it:

SELECT * FROM gp_toolkit.gp_skew_coefficients;
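The practices above can be sketched in DDL (the table and column names here are purely illustrative):

```sql
-- Single-column distribution key, declared explicitly, on the join column
CREATE TABLE sales_fact (
    customer_id INT,
    sale_date   DATE,
    amount      NUMERIC(12,2)
)
DISTRIBUTED BY (customer_id);

-- For tables with no suitable key, declare random distribution explicitly
-- rather than relying on the default
CREATE TABLE audit_log (event_time TIMESTAMP, message TEXT)
DISTRIBUTED RANDOMLY;
```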

Greenplum Architecture

Chitchatiq | 9/11/2017 08:30:00 PM | Greenplum




Parallel Processing:
  Parallel processing means taking all the rows from a table and spreading them among many parallel processing units. In Greenplum, these parallel processing units are called segments.

SMP (symmetric multiprocessing) system:
  An SMP system has multiple processors that share a single OS, memory pool, and disk access. SMP is a single server, referred to as a node. A single SMP system is called a segment host; multiple segment hosts are connected together to create a massively parallel processing (MPP) system. The number of segments per segment host is determined by the number of CPUs the segment host contains.

Commodity Hardware:
  Greenplum delivers high performance on commodity hardware, which allows you to create one segment per CPU core. If a server has two dual-core CPU processors, you would build four segments.

Roles of the Segment Host and Segments:
  The host is the boss and the segments are the workers. Users log in to the host and never communicate directly with the segments. The host builds a plan for the segments to follow, delivered as plan slices; each slice instructs the segments what to do. When the segments have done their work, they return the results to the host.
  The host is the brains behind the entire operation.

Segment Host Roles:
  1. When a user logs in to Greenplum, the host logs them in and is responsible for their session.
  2. The host checks the SQL syntax, checks security, creates the EXPLAIN plan, and builds a plan for the segments to follow.
  3. The host uses system statistics and statistics from the ANALYZE command to build the best plan.
  4. The host doesn't hold user data; instead, it holds the Global System Catalog.
  5. The host always delivers the final answer set to the user.
Segment Roles:
  1. Segments are responsible for storing and retrieving rows from their assigned (virtual) disk.
  2. Segments lock tables and rows.
  3. Segments sort rows and do all aggregation.
  4. Segments handle all join processing.
  5. Segments handle all space management and space accounting.
The master host broadcasts the plan to every segment simultaneously, but if your SQL uses the distribution key in the WHERE clause with an equality predicate, only a single segment is contacted to return the row.
For example: if the system contains 3 parallel processing units and employee_table contains 9 rows, each unit holds 3 of the rows.

select * from employee_table
where empno = 2000;

If this query were written without the distribution key in the WHERE clause, all 3 segments would be visited to fetch the result. Because empno is used with equality in the WHERE clause, only the segment that holds that employee number is visited.
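A sketch of the setup this example assumes, with employee_table distributed on empno (table and column names are illustrative):

```sql
CREATE TABLE employee_table (
    empno INT,
    ename TEXT
)
DISTRIBUTED BY (empno);

-- Equality on the distribution key: only one segment is visited
SELECT * FROM employee_table WHERE empno = 2000;

-- Predicate on a non-distribution column: all segments are scanned
SELECT * FROM employee_table WHERE ename = 'Smith';
```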

Wednesday, September 6, 2017

How to find and empty the top 100 largest files in Linux/Unix

Chitchatiq | 9/06/2017 06:15:00 PM | Problems&Solutions


Steps to find and empty the top 100 largest files in Linux/Unix

Command to find the files consuming the most disk space (files over 100 MB, largest first):
find / -xdev -type f -size +100M -exec du -sh {} ';' | sort -rh | head -n100

Command to empty a file's contents without removing the file:
> <<filename>>
E.g.
> sample.txt
Here, > truncates the file to zero length while leaving the file in place.
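The truncation trick can be verified with a throwaway file (sample.txt is just an example name). This works in bash/sh; csh-family shells may need `: > file` instead:

```shell
# create a file with some content
echo "some log data" > sample.txt

# truncate it in place: the file keeps its inode, ownership, and permissions,
# which matters for log files a running process still has open
> sample.txt

# confirm the file still exists but is now empty
wc -c < sample.txt
```
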

Thanks for reading this article and please do subscribe to get more updates 



Saturday, September 2, 2017

How to Install MySQL on CentOS 7

Chitchatiq | 9/02/2017 02:43:00 PM | MySQL Installation, Problems&Solutions


Steps to Install MySQL on Linux-CentOS 7


Step 1: Download the MySQL repository package

wget http://dev.mysql.com/get/mysql57-community-release-el7-9.noarch.rpm

Step 2: Install the repository package so yum can find MySQL

sudo rpm -Uvh mysql57-community-release-el7-9.noarch.rpm

Step 3: Install MySQL on your server

sudo yum install mysql-server

During the installation, you will be asked to confirm with Y or N; type Y. When you see the Complete! message, the installation finished successfully.


Step 4: Start the MySQL service

sudo systemctl start mysqld

Step 5: Check the service status

sudo systemctl status mysqld

Thanks for reading the article. If it solved your problem, please like, comment, and share!

Copyright © TECHTalksPro
Designed by Vasu