TECHTalksPro

Monday, December 4, 2017

How to fix ERROR 1045: Access denied for user 'root'@'localhost' (using password: YES)

 Chitchatiq     12/04/2017 10:01:00 PM


Follow the steps below to fix the MySQL root password issue on the HDP Sandbox:

1. Connect to MySQL as root (no password is needed for this initial login):

mysql -u root

2. Reset the root password (set here to 'hadoop'):

UPDATE mysql.user SET Password=PASSWORD('hadoop') WHERE User='root';

3. Reload the grant tables so the new password takes effect:

FLUSH PRIVILEGES;
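
Note: the UPDATE statement above works on MySQL 5.6 and earlier, where mysql.user still has a Password column. If your installation runs MySQL 5.7 or later, that column was replaced by authentication_string, and a hedged alternative (run from the same mysql -u root session) is:

-- MySQL 5.7+ only: set the root password via ALTER USER instead
ALTER USER 'root'@'localhost' IDENTIFIED BY 'hadoop';
FLUSH PRIVILEGES;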


Sunday, November 26, 2017

How to Prepare for and Clear the Hortonworks Data Platform Certified Developer Exam (HDPCD - English)

 Chitchatiq     11/26/2017 09:06:00 PM

 Hortonworks University offers the HDPCD certification. It is an exclusively hands-on, performance-based exam that requires you to complete a set of tasks on an actual Hadoop cluster, rather than answering multiple-choice questions.

The HDPCD exam objectives fall into three categories:

1. Data ingestion
2. Data transformation
3. Data analysis

As listed on the Hortonworks website, here are all the tasks:

Data Ingestion
Import data from a table in a relational database into HDFS
Import the results of a query from a relational database into HDFS
Import a table from a relational database into a new or existing Hive table
Insert or update data from HDFS into a table in a relational database
Given a Flume configuration file, start a Flume agent
Given a configured sink and source, configure a Flume memory channel with a specified capacity

Data Transformation
Write and execute a Pig script
Load data into a Pig relation without a schema
Load data into a Pig relation with a schema
Load data from a Hive table into a Pig relation
Use Pig to transform data into a specified format
Transform data to match a given Hive schema
Group the data of one or more Pig relations
Use Pig to remove records with null values from a relation
Store the data from a Pig relation into a folder in HDFS
Store the data from a Pig relation into a Hive table
Sort the output of a Pig relation
Remove the duplicate tuples of a Pig relation
Specify the number of reduce tasks for a Pig MapReduce job
Join two datasets using Pig
Perform a replicated join using Pig
Run a Pig job using Tez
Within a Pig script, register a JAR file of User Defined Functions
Within a Pig script, define an alias for a User Defined Function
Within a Pig script, invoke a User Defined Function

Data Analysis
Write and execute a Hive query
Define a Hive-managed table
Define a Hive external table
Define a partitioned Hive table
Define a bucketed Hive table
Define a Hive table from a select query
Define a Hive table that uses the ORCFile format
Create a new ORCFile table from the data in an existing non-ORCFile Hive table
Specify the storage format of a Hive table
Specify the delimiter of a Hive table
Load data into a Hive table from a local directory
Load data into a Hive table from an HDFS directory
Load data into a Hive table as the result of a query
Load a compressed data file into a Hive table
Update a row in a Hive table
Delete a row from a Hive table
Insert a new row into a Hive table
Join two Hive tables
Run a Hive query using Tez
Run a Hive query using vectorization
Output the execution plan for a Hive query
Use a subquery within a Hive query
Output data from a Hive query that is totally ordered across multiple reducers
Set a Hadoop or Hive configuration property from within a Hive query
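
To give a flavour of the hands-on format, here is a hedged sketch of the very first ingestion objective (importing a table from a relational database into HDFS). It reuses the sandbox connection details that appear in the Sqoop posts below; the host, database, credentials, and target directory are assumptions you would adapt to your own cluster.

sqoop import \
  --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" \
  --username hadoop --password hadoop \
  --table weather \
  --target-dir /user/hortonworks/weather_import \
  -m 1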

Reference:
https://hortonworks.com/services/training/certification/exam-objectives/#hdpcd








Monday, October 30, 2017

How to use Password file with Sqoop

 Chitchatiq     10/30/2017 11:21:00 AM     BigData&Hadoop, Problems&Solutions, sqoop


How to use the local --password-file parameter with Sqoop

Problem: Sometimes a Sqoop command has to read the database password from a file instead of taking it on the command line.
Solution: Passing a password file to a Sqoop command is not a big deal. Just follow the steps below:

1.    Create the password file with echo -n "<<password>>" > <<passwordfilename>>
E.g.
echo -n "Srinivas" > passwordfile    (the file name can include a full path)

Sqoop command:

sqoop list-tables --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" --username hadoop --password-file file:///usr/Srinivas/passwordfile
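
A couple of hedged notes of my own: Sqoop expects the password file to contain only the password with no trailing newline (which is why echo -n is used above), and it is sensible to restrict the file's permissions. The file can also be kept in HDFS, in which case a path without a file:// prefix is resolved against HDFS; the /user/Srinivas path below is just an illustrative assumption.

# Restrict the local password file to its owner (suggested precaution)
chmod 400 /usr/Srinivas/passwordfile

# Alternative: store the password file in HDFS and reference the HDFS path
hdfs dfs -put /usr/Srinivas/passwordfile /user/Srinivas/passwordfile
sqoop list-tables \
  --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" \
  --username hadoop \
  --password-file /user/Srinivas/passwordfile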



Monday, October 9, 2017

SQOOP-SqlManager-Error reading from database-java.sql.SQLException-set com.mysql.jdbc.RowDataDynamic-3c2d5cfb

 Chitchatiq     10/09/2017 06:37:00 PM     Hive, Problems&Solutions, sqoop



Problem:

sqoop export --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" --username hadoop --password hadoop --table weather --export-dir /user/hortonworks/weather/  --fields-terminated-by ',';


Sometimes when we run Sqoop commands like the one above, we get the following error:

“ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3c2d5cfb is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.

java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3c2d5cfb is still active. No statements may be issued when any streaming result sets are open and in use on a given connection.”

Solution:
A simple fix is to add the --driver parameter, which makes Sqoop fall back to its generic JDBC manager instead of the MySQL-specific one that keeps the streaming result set open:

--driver com.mysql.jdbc.Driver


sqoop export --connect "jdbc:mysql://sandbox.hortonworks.com:3306/hdpcdpractise" --username hadoop --password hadoop --table weather --export-dir /user/hortonworks/weather/  --fields-terminated-by ',' --driver com.mysql.jdbc.Driver

Thursday, September 21, 2017

How to create a table column that throws an error on SELECT * or SELECT of that column

 Chitchatiq     9/21/2017 06:45:00 PM     Problems&Solutions


Problem:
Create a column in a table that raises an error whenever we run SELECT * or SELECT that column.

Solution:
Friends, don't think too much about how to solve this problem. Here is the simple solution I came up with: use a computed column whose expression has no syntax errors but contains a logical error (e.g. 1/0, 1 + 'srinivas').

Sample script:
create table #test
(id1 int, id2 int, Col1 as id1 + 'srinu')


select * from #test

Here we will get:
Conversion failed when converting the varchar value 'srinu' to data type int.
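
As a quick check (my own addition, using the same #test table), the error only appears when the computed column is actually referenced, either directly or through *:

insert into #test (id1, id2) values (1, 2)

select id1, id2 from #test  -- works: the computed column is never evaluated
select Col1 from #test      -- fails with the same conversion error
select * from #test         -- fails as well, because * includes Col1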


That’s all friends……!

Thanks for reading this article and please post a comment if you have any questions/thoughts













Monday, September 11, 2017

Greenplum System-Useful Queries

 Chitchatiq     9/11/2017 08:35:00 PM     Greenplum



Useful Queries:

  1. Query to verify the list of segments in a Greenplum system

select * from gp_segment_configuration;

Result:

  Column value       Description
  content = -1       Master node
  content = 0 - n    Slave segments

  2. Query to verify data skew in a database

select * from gp_toolkit.gp_skew_coefficients;

  3. Query to verify the number of records in each segment of a table

select gp_segment_id, count(*)
from zc.zctimesheet1
group by gp_segment_id
order by gp_segment_id;

  4. Query to verify the size of a database / relation (table)

select pg_size_pretty(pg_database_size('DistributionTest'));
select pg_size_pretty(pg_relation_size('zc.zctimesheet1'));
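
A small extension of query 4 (my own addition, using only standard PostgreSQL catalog functions that Greenplum inherits): list the size of every database in the system at once.

select datname,
       pg_size_pretty(pg_database_size(datname)) as database_size
from pg_database
order by pg_database_size(datname) desc;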



Greenplum Best Practices

 Chitchatiq     9/11/2017 08:33:00 PM     Best Practices, Greenplum




Best Practices:

  1. A distribution key should not have more than 2 columns; a single column is recommended.
  2. While modeling a database, make sure all the tables involved in long-running queries are distributed on the joining column.
  3. Distribution keys should be domain keys, not surrogate keys.
  4. Explicitly define a column or random distribution for all tables (see the sketch after this list). Do not use the default, as it may cause skew on segments.
  5. Do not distribute on columns that will be used in the WHERE clause of a query; instead, partition the data on those columns.
  6. The distribution key column should contain unique values.
  7. Do not distribute on dates or timestamps; doing so may cause poor system performance.
  8. There are 2 questions to ask while deciding the distribution key of a table:
    1. Is the distribution key used in joins?
      1. This makes sure no redistribution happens during query execution.
    2. Does the key equi-distribute the data across all the segments?
      1. This makes sure all the segments get approximately the same amount of data, and hence the same amount of work, during query execution.
      2. We need to make sure data skew is minimal for every table; the query below reports the skew coefficients.

SELECT * FROM gp_toolkit.gp_skew_coefficients
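
To illustrate point 4 above, here is a minimal sketch (table and column names are my own, hypothetical examples) of defining the distribution explicitly instead of relying on the default:

-- Hash-distribute a large fact table on the column used to join it
create table zc.timesheet_fact (
    emp_id     integer,
    work_date  date,
    hours      numeric(5,2)
) distributed by (emp_id);

-- For a table with no suitable domain key, state random distribution explicitly
create table zc.holiday_calendar (
    holiday_date date,
    holiday_name text
) distributed randomly;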

Greenplum Architecture

 Chitchatiq     9/11/2017 08:30:00 PM     Greenplum




Parallel Processing:
  Parallel processing means taking all the rows from a table and spreading them among many parallel processing units. In Greenplum these parallel processing units are called segments.

SMP (Symmetric Multiprocessing) System:
  An SMP system has multiple processors that share a single OS, memory pool, and access to disks. An SMP system is a single server, also referred to as a node. A single SMP system is called a segment host; multiple segment hosts are connected together to create a massively parallel processing (MPP) system. The number of segments per segment host is determined by the number of CPUs the segment host contains.

Commodity Hardware:
  Greenplum provides impressive speed on commodity hardware, which allows you to create one segment per CPU core. If you have two dual-core CPUs in a server, you would build four segments.

Roles of the Master Host and Segments:
  The master host is the boss and the segments are the workers. Users log in to the master host and never communicate directly with the segments. The master builds a plan for the segments to follow, delivered in plan slices; each slice instructs the segments what to do. When the segments have done their work, they return the results to the master.
  The master host is the brains behind the entire operation.

Master Host Roles:
  1. When a user logs into Greenplum, the master host logs them in and is responsible for their session.
  2. The master host checks the SQL syntax, checks security, creates the EXPLAIN plan, and builds a plan for the segments to follow.
  3. The master host uses system statistics and statistics from the ANALYZE command to build the best plan.
  4. The master host doesn't hold user data; instead it holds the global system catalog.
  5. The master host always delivers the final answer set to the user.
Segment Roles:
  1. Segments are responsible for storing and retrieving rows from their assigned (virtual) disk.
  2. Segments lock the tables and rows.
  3. Segments sort rows and do all aggregation.
  4. Segments handle all the join processing.
  5. Segments handle all space management and space accounting.
The master host broadcasts the plan to every segment simultaneously, but if you use the distribution key in the WHERE clause of your SQL with an equality predicate, only a single segment is contacted to return the row.
For example, if the system contains 3 segments and employee_table contains 9 rows, each segment holds 3 of those rows.

select * from employee_table
where empno = 2000;

If the query above were written without the distribution key in the WHERE clause, all 3 segments would be visited to fetch the result; but because empno (the distribution key) is used with an equality predicate, only the segment that contains that employee number is visited.
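
A hedged way to observe this yourself (assuming employee_table is distributed by empno; dept_id is a hypothetical non-key column) is to compare the query plans. Depending on the Greenplum version, the first plan may show direct dispatch to a single segment, while the second scans every segment:

-- Filter on the distribution key with equality
explain select * from employee_table where empno = 2000;

-- Filter on a column that is not the distribution key
explain select * from employee_table where dept_id = 10;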

Wednesday, September 6, 2017

How to find and empty the 100 largest files in Linux/Unix

 Chitchatiq     9/06/2017 06:15:00 PM     Problems&Solutions


Steps to find the 100 largest files and empty them in Linux/Unix

Command to find the files using the most disk space (top 100):
find / -xdev -type f -size +100M -exec du -sh {} ';' | sort -rh | head -n100

Command to delete the contents of a file without removing the file itself:
> <<filename>>
E.g.
> sample.txt
Here > truncates the file to zero bytes (empties it).
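
If you want to combine the two steps, here is a hedged sketch of my own. It irreversibly empties files, so narrow the path and raise the size threshold before trying it, ideally against test data first:

# Empty (truncate to zero bytes) every regular file larger than 1 GB under /var/log
find /var/log -xdev -type f -size +1G -exec sh -c ': > "$1"' _ {} \;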

Thanks for reading this article and please do subscribe to get more updates 



Saturday, September 2, 2017

How to Install MySQL on CentOS 7

 Chitchatiq     9/02/2017 02:43:00 PM     MySQL Installation, Problems&Solutions


Steps to Install MySQL on Linux-CentOS 7


Step 1: Download the MySQL repository package

wget http://dev.mysql.com/get/mysql57-community-release-el7-9.noarch.rpm

Step 2: Prepare the repository to install MySQL

sudo rpm -Uvh mysql57-community-release-el7-9.noarch.rpm

Step 3: Install MySQL on your server

sudo yum install mysql-server

During the installation you will be asked to confirm with Y or N; just type Y. If you see the "Complete!" message, the installation finished successfully.


Step 4: Start the MySQL service

sudo systemctl start mysqld

Step 5: Check the service status

sudo systemctl status mysqld
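
One extra step is usually needed with the MySQL 5.7 community packages installed above (my own addition): the first start generates a temporary root password in the MySQL log, which you look up and then replace while hardening the installation.

Step 6: Retrieve the temporary root password and secure the installation

sudo grep 'temporary password' /var/log/mysqld.log

sudo mysql_secure_installation

mysql_secure_installation prompts you to set a new root password and remove the insecure defaults.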

Thanks for reading the article. If it solved your problem, please like, comment, and share!