TECHTalksPro

Thursday, September 21, 2017

How to create a table column that throws an error on SELECT * or on selecting that column

Chitchatiq | 9/21/2017 06:45:00 PM | Problems&Solutions


Problem:
Create a column in a table that raises an error whenever it is queried, either through SELECT * or by selecting the column directly.

Solution:
Friends, don’t overthink this one. Here is a simple solution.

Use a computed column whose expression is syntactically valid but fails at run time (e.g. 1/0, or 1 + 'srinivas').

Sample Script:
CREATE TABLE #test
(
    id1  INT,
    id2  INT,
    Col1 AS id1 + 'srinu'
);


SELECT * FROM #test;

Running this query returns:
Conversion failed when converting the varchar value 'srinu' to data type int.
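To see that only queries touching the computed column fail, here is a quick sketch against the #test table created above (the inserted values are illustrative):

```sql
INSERT INTO #test (id1, id2) VALUES (1, 2);

-- Touching only the real columns succeeds
SELECT id1, id2 FROM #test;

-- SELECT * (or selecting Col1 directly) evaluates the computed column and fails:
-- Conversion failed when converting the varchar value 'srinu' to data type int.
SELECT * FROM #test;
```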


That’s all, friends!

Thanks for reading this article, and please post a comment if you have any questions or thoughts.













Monday, September 11, 2017

Greenplum System-Useful Queries

Chitchatiq | 9/11/2017 08:35:00 PM | Greenplum



Useful Queries:

  1. Query to list the segments in a Greenplum system:

select * from gp_segment_configuration;

Result:
  Content = -1     → master node
  Content = 0 to n → segment (worker) instances

  2. Query to check data skew in a database:

select * from gp_toolkit.gp_skew_coefficients;

  3. Query to count the rows held by each segment of a table:

select gp_segment_id, count(*)
from zc.zctimesheet1
group by gp_segment_id
order by gp_segment_id;

  4. Query to check the size of a database or a relation (table):

select pg_size_pretty(pg_database_size('DistributionTest'));
select pg_size_pretty(pg_relation_size('zc.zctimesheet1'));
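Along the same lines, here is a sketch for finding the largest tables in the current database. It uses the standard PostgreSQL/Greenplum catalog view pg_tables; mixed-case object names may need quote_ident():

```sql
-- Top 10 tables by on-disk size, user schemas only
SELECT schemaname, tablename,
       pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS table_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;
```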



Greenplum Best Practices

Chitchatiq | 9/11/2017 08:33:00 PM | Best Practices, Greenplum




Best Practices:

  1. A distribution key should contain at most 2 columns; a single column is recommended.
  2. When modeling a database, make sure all the tables involved in long-running queries are distributed on their join column.
  3. Distribution keys should be domain keys, not surrogate keys.
  4. Explicitly define a column or random distribution for every table. Do not rely on the default, as it may cause skew across segments.
  5. Do not distribute on columns that will be used in the WHERE clause of a query; partition the data on those columns instead.
  6. The distribution key column data should contain unique values.
  7. Do not distribute on dates or timestamps; doing so may cause poor system performance.
  8. There are 2 questions to ask when deciding the distribution key of a table:
    1. Is the distribution key used in joins?
      - This ensures no redistribution happens during query execution.
    2. Does the key 'equi-distribute' the data across all the segments?
      - This ensures every segment gets approximately the same amount of data, and therefore the same amount of work, during query execution.
      - Data skew should be minimal for every table; to check it:

SELECT * FROM gp_toolkit.gp_skew_coefficients;
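The practices above can be sketched in DDL (the table and column names here are purely illustrative):

```sql
-- Single-column distribution key, declared explicitly, on the join column
CREATE TABLE sales_fact (
    customer_id INT,
    sale_date   DATE,
    amount      NUMERIC(12,2)
)
DISTRIBUTED BY (customer_id);

-- For tables with no suitable key, declare random distribution explicitly
-- rather than relying on the default
CREATE TABLE audit_log (event_time TIMESTAMP, message TEXT)
DISTRIBUTED RANDOMLY;
```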

Greenplum Architecture

Chitchatiq | 9/11/2017 08:30:00 PM | Greenplum




Parallel Processing:
  Parallel processing means taking all the rows from a table and spreading them among many parallel processing units. In Greenplum, these parallel processing units are called segments.

SMP (symmetric multiprocessing) system:
  An SMP system has multiple processors that share a single OS, memory pool, and disk access. SMP is a single server, referred to as a node. A single SMP system is called a segment host; multiple segment hosts are connected together to create a massively parallel processing (MPP) system. The number of segments per segment host is determined by the number of CPUs the segment host contains.

Commodity Hardware:
  Greenplum delivers high performance on commodity hardware, which allows you to create one segment per CPU core. If a server has two dual-core CPU processors, you would build four segments.

Roles of the Segment Host and Segments:
  The host is the boss and the segments are the workers. Users log in to the host and never communicate directly with the segments. The host builds a plan for the segments to follow, delivered as plan slices; each slice instructs the segments what to do. When the segments have done their work, they return the results to the host.
  The host is the brains behind the entire operation.

Segment Host Roles:
  1. When a user logs in to Greenplum, the host logs them in and is responsible for their session.
  2. The host checks the SQL syntax, checks security, creates the EXPLAIN plan, and builds a plan for the segments to follow.
  3. The host uses system statistics and statistics from the ANALYZE command to build the best plan.
  4. The host doesn't hold user data; instead, it holds the Global System Catalog.
  5. The host always delivers the final answer set to the user.
Segment Roles:
  1. Segments are responsible for storing and retrieving rows from their assigned (virtual) disk.
  2. Segments lock tables and rows.
  3. Segments sort rows and do all aggregation.
  4. Segments handle all join processing.
  5. Segments handle all space management and space accounting.
The master host broadcasts the plan to every segment simultaneously, but if your SQL uses the distribution key in the WHERE clause with an equality predicate, only a single segment is contacted to return the row.
For example: if the system contains 3 parallel processing units and employee_table contains 9 rows, each unit holds 3 of the rows.

select * from employee_table
where empno = 2000;

If this query were written without the distribution key in the WHERE clause, all 3 segments would be visited to fetch the result. Because empno is used with equality in the WHERE clause, only the segment that holds that employee number is visited.
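A sketch of the setup this example assumes, with employee_table distributed on empno (table and column names are illustrative):

```sql
CREATE TABLE employee_table (
    empno INT,
    ename TEXT
)
DISTRIBUTED BY (empno);

-- Equality on the distribution key: only one segment is visited
SELECT * FROM employee_table WHERE empno = 2000;

-- Predicate on a non-distribution column: all segments are scanned
SELECT * FROM employee_table WHERE ename = 'Smith';
```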

Wednesday, September 6, 2017

How to find and empty the top 100 largest files in Linux/Unix

Chitchatiq | 9/06/2017 06:15:00 PM | Problems&Solutions


Steps to find and empty the top 100 largest files in Linux/Unix

Command to find the files consuming the most disk space (files over 100 MB, largest first):
find / -xdev -type f -size +100M -exec du -sh {} ';' | sort -rh | head -n100

Command to empty a file's contents without removing the file:
> <<filename>>
E.g.
> sample.txt
Here, > truncates the file to zero length while leaving the file in place.
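The truncation trick can be verified with a throwaway file (sample.txt is just an example name). This works in bash/sh; csh-family shells may need `: > file` instead:

```shell
# create a file with some content
echo "some log data" > sample.txt

# truncate it in place: the file keeps its inode, ownership, and permissions,
# which matters for log files a running process still has open
> sample.txt

# confirm the file still exists but is now empty
wc -c < sample.txt
```
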

Thanks for reading this article and please do subscribe to get more updates 



Saturday, September 2, 2017

How to Install MySQL on CentOS 7

Chitchatiq | 9/02/2017 02:43:00 PM | MySQL Installation, Problems&Solutions


Steps to Install MySQL on Linux-CentOS 7


Step 1: Download the MySQL repository package

wget http://dev.mysql.com/get/mysql57-community-release-el7-9.noarch.rpm

Step 2: Install the repository package so yum can find MySQL

sudo rpm -Uvh mysql57-community-release-el7-9.noarch.rpm

Step 3: Install MySQL on your server

sudo yum install mysql-server

During the installation, you will be asked to confirm with Y or N; type Y. When you see the Complete! message, the installation finished successfully.


Step 4: Start the MySQL service

sudo systemctl start mysqld

Step 5: Check the service status

sudo systemctl status mysqld

Thanks for reading the article. If it solved your problem, please like, comment, and share!

Copyright © TECHTalksPro
Designed by Vasu