Sunday, November 26, 2017
How to Prepare for and Clear the Hortonworks Data Platform Certified Developer Exam (HDPCD - English)
Hortonworks University offers the HDPCD certification, an exclusively hands-on, performance-based exam that requires you to complete a set of tasks on an actual Hadoop cluster instead of just guessing at multiple-choice questions.
The HDPCD exam objectives fall into three categories:
1. Data ingestion
2. Data transformation
3. Data analysis
As listed on the Hortonworks website, here are all the tasks, grouped by category; after each category I have added a short example sketch.
Data Ingestion
- Import data from a table in a relational database into HDFS
- Import the results of a query from a relational database into HDFS
- Import a table from a relational database into a new or existing Hive table
- Insert or update data from HDFS into a table in a relational database
- Given a Flume configuration file, start a Flume agent
- Given a configured sink and source, configure a Flume memory channel with a specified capacity
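To give a flavour of these tasks, here is a minimal Sqoop import sketch, reusing the sandbox MySQL connection and the weather table that appear elsewhere on this blog; the target directory and mapper count are only assumptions:
sqoop import \
  --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" \
  --username hadoop --password hadoop \
  --table weather \
  --target-dir /user/hortonworks/weather_import \
  -m 1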
Data Transformation
- Write and execute a Pig script
- Load data into a Pig relation without a schema
- Load data into a Pig relation with a schema
- Load data from a Hive table into a Pig relation
- Use Pig to transform data into a specified format
- Transform data to match a given Hive schema
- Group the data of one or more Pig relations
- Use Pig to remove records with null values from a relation
- Store the data from a Pig relation into a folder in HDFS
- Store the data from a Pig relation into a Hive table
- Sort the output of a Pig relation
- Remove the duplicate tuples of a Pig relation
- Specify the number of reduce tasks for a Pig MapReduce job
- Join two datasets using Pig
- Perform a replicated join using Pig
- Run a Pig job using Tez
- Within a Pig script, register a JAR file of User Defined Functions
- Within a Pig script, define an alias for a User Defined Function
- Within a Pig script, invoke a User Defined Function
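A minimal Pig sketch covering a few of these tasks (load with a schema, remove null records, sort, and store back into HDFS); the input path and field names are assumptions for illustration only:
-- load with an explicit schema
weather = LOAD '/user/hortonworks/weather' USING PigStorage(',')
          AS (station:chararray, obs_date:chararray, temp:int);
-- remove records with a null temp value
no_nulls = FILTER weather BY temp IS NOT NULL;
-- sort the output of the relation
by_temp = ORDER no_nulls BY temp DESC;
-- store the relation into a folder in HDFS
STORE by_temp INTO '/user/hortonworks/weather_sorted' USING PigStorage(',');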
Data Analysis
- Write and execute a Hive query
- Define a Hive-managed table
- Define a Hive external table
- Define a partitioned Hive table
- Define a bucketed Hive table
- Define a Hive table from a select query
- Define a Hive table that uses the ORCFile format
- Create a new ORCFile table from the data in an existing non-ORCFile Hive table
- Specify the storage format of a Hive table
- Specify the delimiter of a Hive table
- Load data into a Hive table from a local directory
- Load data into a Hive table from an HDFS directory
- Load data into a Hive table as the result of a query
- Load a compressed data file into a Hive table
- Update a row in a Hive table
- Delete a row from a Hive table
- Insert a new row into a Hive table
- Join two Hive tables
- Run a Hive query using Tez
- Run a Hive query using vectorization
- Output the execution plan for a Hive query
- Use a subquery within a Hive query
- Output data from a Hive query that is totally ordered across multiple reducers
- Set a Hadoop or Hive configuration property from within a Hive query
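And a minimal Hive sketch for two of these tasks (define an ORCFile table and populate it from an existing non-ORCFile table); the table and column names are assumptions:
-- define a Hive table that uses the ORCFile format
CREATE TABLE weather_orc (
  station  STRING,
  obs_date STRING,
  temp     INT
)
STORED AS ORC;
-- load it from an existing non-ORCFile table
INSERT INTO TABLE weather_orc
SELECT station, obs_date, temp FROM weather_text;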
Reference:
https://hortonworks.com/services/training/certification/exam-objectives/#hdpcd
Monday, October 30, 2017
How to use Password file with Sqoop
How to use a local password file with Sqoop's --password-file parameter
Problem: Sometimes we need a Sqoop command to read the password from a file instead of taking it on the command line.
Solution: Passing a password file to a Sqoop command is not a big deal. Just follow the steps below:
1. Create the password file with echo -n (the -n matters: a trailing newline would be treated as part of the password):
echo -n "<<password>>" > <<passwordfilename>>
E.g.
echo -n "Srinivas" > passwordfile   (the file name can include a path)
2. Pass the file to the Sqoop command with --password-file:
sqoop list-tables \
  --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" \
  --username hadoop \
  --password-file file:///usr/Srinivas/passwordfile
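The same --password-file option works with any Sqoop tool, for example an import. A minimal sketch, reusing the weather table from the export example further down this page; the target directory is only an assumption, and restricting the file's permissions first is good practice:
chmod 400 /usr/Srinivas/passwordfile
sqoop import \
  --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" \
  --username hadoop \
  --password-file file:///usr/Srinivas/passwordfile \
  --table weather \
  --target-dir /user/hortonworks/weather_pwfile \
  -m 1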
Monday, October 9, 2017
SQOOP-SqlManager-Error reading from database-java.sql.SQLException-set com.mysql.jdbc.RowDataDynamic-3c2d5cfb
Problem:
sqoop export \
  --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" \
  --username hadoop --password hadoop \
  --table weather \
  --export-dir /user/hortonworks/weather/ \
  --fields-terminated-by ',';
Sometimes, when we run a Sqoop command like the one above, we get the following error:
“ERROR manager.SqlManager: Error reading from
database: java.sql.SQLException: Streaming result set
com.mysql.jdbc.RowDataDynamic@3c2d5cfb is still active. No statements may be
issued when any streaming result sets are open and in use on a given connection.
Ensure that you have called .close() on any active streaming result sets before
attempting more queries.
java.sql.SQLException: Streaming result set
com.mysql.jdbc.RowDataDynamic@3c2d5cfb is still active. No statements may be
issued when any streaming result sets are open and in use on a given
connection.”
Solution:
A simple fix is to add the --driver parameter. Specifying the driver explicitly makes Sqoop fall back to its generic JDBC manager instead of the MySQL-specific manager whose streaming result set causes this error:
--driver com.mysql.jdbc.Driver
sqoop export \
  --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" \
  --username hadoop --password hadoop \
  --table weather \
  --export-dir /user/hortonworks/weather/ \
  --fields-terminated-by ',' \
  --driver com.mysql.jdbc.Driver
Thursday, September 21, 2017
How to create a column in a table that throws an error when we run SELECT * or SELECT col
Problem:
Create a column in a table that gives an error whenever we run SELECT * or SELECT that column.
Solution:
Friends, don't think too hard about how to solve this problem. Here is the simple solution I came up with: use a computed column that has no syntax errors but does contain a logical error (e.g. 1/0 or 1 + 'srinivas').
Sample script:
create table #test (id1 int, id2 int, Col1 as id1 + 'srinu')
select * from #test
Here we get:
Conversion failed when converting the varchar value 'srinu' to data type int.
That's all, friends! Thanks for reading this article, and please post a comment if you have any questions or thoughts.
Monday, September 11, 2017
Greenplum System-Useful Queries
Useful Queries:
- Query to verify the list of segments in a Greenplum system
select * from gp_segment_configuration;
Result:
Column Value       | Description
Content = -1       | Master node
Content = 0 to n   | Slave segments
- Query to verify Skewing in a database
select * from gp_toolkit.gp_skew_coefficients;
- Query to verify the number of records in each segment of a table
select gp_segment_id, count(*)
from zc.zctimesheet1
group by gp_segment_id
order by gp_segment_id;
- Query to verify the size of a database/ relation (table)
select
pg_size_pretty(pg_database_size('DistributionTest'));
select
pg_size_pretty(pg_relation_size('zc.zctimesheet1'));
Greenplum Best Practices
Best Practices:
- A distribution key should not have more than 2 columns; 1 column is recommended.
- While modeling a database, make sure all the tables involved in long-running queries are distributed on the joining column.
- Distribution keys should be domain keys, not surrogate keys.
- Explicitly define a distribution column or random distribution for every table; do not rely on the default, as it may cause skew on the segments (see the example table definitions after this list).
- Do not distribute on columns that will be used in the WHERE clause of a query; instead, partition the data on the column used in the WHERE clause.
- The distribution key column data should contain unique values.
- Do not distribute on dates or timestamps; doing so may cause poor system performance.
- There are 2 questions to ask while deciding the distribution key of a table:
  - Is the distribution key used in joins? This makes sure no redistribution happens during query execution.
  - Does the key equi-distribute the data across all the segments? This makes sure all segments get approximately the same amount of data, and hence the same amount of work, during query execution.
- Make sure data skew is minimal for every table; it can be checked with:
SELECT * FROM gp_toolkit.gp_skew_coefficients;
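As an illustration of the explicit-distribution rule above, here is a minimal sketch; the schema, table, and column names are hypothetical:
-- hash-distribute on the domain key that is used in joins
CREATE TABLE sales.orders (
    order_id   bigint,
    order_date date,
    amount     numeric
)
DISTRIBUTED BY (order_id);
-- no suitable key: spread rows evenly instead of relying on the default
CREATE TABLE sales.audit_log (
    log_ts  timestamp,
    message text
)
DISTRIBUTED RANDOMLY;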
Greenplum Architecture
Parallel Processing:
Parallel processing means taking all the rows from a table and spreading them among many parallel processing units. In Greenplum, these parallel processing units are called segments.
SMP (symmetric multiprocessing) system:
An SMP system has multiple processors that share a single OS, a single memory pool, and access to the same disks. An SMP system is a single server, referred to as a node. A single SMP system is called a segment host; multiple segment hosts are connected together to create a massively parallel processing (MPP) system. The number of segments per segment host is determined by the number of CPUs the segment host contains.
Commodity Hardware:
Greenplum provides impressive speed on commodity hardware, which allows you to create one segment per CPU core. If you have two dual-core CPUs in a server, you would build four segments.
Roles of the Master Host and the Segments:
The master host is the boss and the segments are the workers. Users log in to the master host and never communicate directly with the segments. The master host builds a plan for the segments to follow, delivered in plan slices; each slice instructs the segments what to do. When the segments have done their work, they return the results to the master host.
The master host is the brains behind the entire operation.
Master Host Roles:
1. When a user logs into Greenplum, the master host logs them in and is responsible for their session.
2. The master host checks the SQL syntax, creates the EXPLAIN plan, checks security, and builds a plan for the segments to follow.
3. The master host uses system statistics and statistics from the ANALYZE command to build the best plan.
4. The master host doesn't hold user data; instead it holds the global system catalog.
5. The master host always delivers the final answer set to the user.
Segment Roles:
1. Segments are responsible for storing and retrieving rows from their assigned disks (virtual disks).
2. Segments lock the tables and rows.
3. Segments sort rows and do all aggregation.
4. Segments handle all the join processing.
5. Segments handle all space management and space accounting.
The master host broadcasts the plan to each segment simultaneously, but if you use the distribution key in the WHERE clause of your SQL with an equality condition, then only a single segment is contacted to return the row.
For example: if the system contains 3 parallel processing units and employee_table contains 9 rows, then each parallel processing unit holds 3 rows.
select * from employee_table where empno = 2000;
If the query above were written without the distribution key in the WHERE clause, all 3 segments would be visited to fetch the result; but because empno is used in the WHERE clause with an equality condition, only the segment that contains that employee number is visited.
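A quick way to see this behaviour is to look at the query plan. A small sketch, assuming employee_table is distributed by empno:
EXPLAIN select * from employee_table where empno = 2000;
-- The motion node in the plan reports how many segments take part: with the
-- distribution key and an equality filter it is typically a single segment,
-- whereas a filter on a non-distribution column involves every segment.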
Wednesday, September 6, 2017
How to find and clean up the top 100 largest files in Linux/Unix
Steps to find and clean up the top 100 largest files in Linux/Unix
Command to find the files using the most disk space:
find / -xdev -type f -size +100M -exec du -sh {} ';' | sort -rh | head -n100
Command to empty a file's contents without removing the file:
> <<filename>>
E.g.
> sample.txt
Here, > truncates the file to zero bytes.
Thanks for reading this article and please do subscribe to get more updates
Saturday, September 2, 2017
How to Install MySQL on CentOS 7
Steps to Install MySQL on Linux-CentOS 7
Step 1: Download the MySQL repository package
wget http://dev.mysql.com/get/mysql57-community-release-el7-9.noarch.rpm
Step 2: Prepare the repository so that MySQL can be installed from it
sudo rpm -Uvh mysql57-community-release-el7-9.noarch.rpm
Step 3: Install MySQL on your server
sudo yum install mysql-server
During the installation you will be asked to confirm with Y or N; just type Y. If you see the "Complete!" message, the installation finished successfully.
Step 4: Start the MySQL service
sudo systemctl start mysqld
Step 5: Check the service status
sudo systemctl status mysqld
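Step 6 (optional): MySQL 5.7 generates a temporary root password when the service starts for the first time. Assuming the default log location, you can retrieve it and log in like this:
sudo grep 'temporary password' /var/log/mysqld.log
mysql -u root -p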
Thanks for reading the article. If you feel it solved your problem, please like, comment, and share!