TECHTalksPro
Showing posts with label Hadoop.

Wednesday, August 30, 2017

Hortonworks: Service 'userhome' check failed: File does not exist: /user/admin

Chitchatiq | 8/30/2017 06:08 PM | Labels: BigData&Hadoop, Hadoop, Problems&Solutions


Problem: Service 'userhome' check failed: File does not exist: /user/admin


Solution:
# create the missing HDFS home directory for the admin user
sudo -u hdfs hadoop fs -mkdir /user/admin

# make the admin user the owner of the new directory
sudo -u hdfs hdfs dfs -chown -R admin:hdfs /user/admin
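
As a quick sanity check (a minimal sketch, assuming the default hdfs superuser account), confirm that the directory now exists with the expected owner:

$ sudo -u hdfs hdfs dfs -ls /user
# /user/admin should be listed with owner admin and group hdfs

Re-running the 'userhome' service check from the Ambari view should then pass.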





Similar issues:

  • service 'ats' check failed: server error
  • could not write file /user/admin/hive/jobs/hive-job
  • service 'userhome' check failed: authentication required
  • hadoop mkdir permission denied
  • failed to get cluster information associated with this view instance
  • java.io.FileNotFoundException: File does not exist (HDFS)



Monday, December 19, 2016

HADOOP - HDFS OPERATIONS

Chitchatiq | 12/19/2016 04:08 PM | Labels: Big Data, BigData&Hadoop, Hadoop, Hadoop Commands, SQL Server

Starting HDFS
To format the configured HDFS file system, run the following command on the NameNode server:
$ hadoop namenode -format
After formatting HDFS, start the distributed file system. The following command starts the NameNode as well as the DataNodes as a cluster:
$ start-dfs.sh
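
To confirm the daemons actually came up, one quick check (a minimal sketch, assuming a standard JDK is on the PATH) is to run jps on the node and look for the HDFS processes:

$ jps
# expect NameNode and DataNode (and typically SecondaryNameNode) in the output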
Listing Files in HDFS
Once data has been loaded into the cluster, you can list the files in a directory or check the status of a file using 'ls'. Below is the syntax of ls; you can pass a directory or a file name as the argument.
$ $HADOOP_HOME/bin/hadoop fs -ls <args>
Inserting Data into HDFS

Suppose we have data in a file called file.txt on the local file system that we want to save into the Hadoop file system (HDFS). Just follow these steps:

Step 1

Make a directory in HDFS (with -mkdir) if you want your file in a new directory:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input

Step 2

"-put" your file there.
$ $HADOOP_HOME/bin/hadoop fs -put /home/file.txt /user/input

Step 3

Check your file by taking a look at the "-ls" listing:
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input
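
If the -put succeeded, the listing shows the file; the output below is only illustrative (replication factor, owner, size and timestamp will differ on your cluster):

Found 1 items
-rw-r--r--   3 hadoop supergroup         24 2016-12-19 16:05 /user/input/file.txt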

Retrieving Data from HDFS

After loading the data, you also need to know how to view the data in those files.
To simply view a file's contents, call "-cat"; it gets the job done:
$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile

Want the file on your local file system instead?

Don't worry, just "-get" it from hadoop fs:

$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/
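
To confirm the copy landed, view it with regular local tools; since the whole /user/output directory is pulled down, the file ends up under /home/hadoop_tp/output (same example paths as above):

$ cat /home/hadoop_tp/output/outfile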


Shutting Down the HDFS

All done? Wrap up by shutting down HDFS:

$ stop-dfs.sh


Thursday, August 11, 2016

Error: Failed with exception Unable to move source hdfs: to destination dfs://hadoop1.dev.com/apps/ FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Chitchatiq | 8/11/2016 03:55 PM | Labels: Hadoop, org.apache.hadoop.hive.ql.exec.MoveTask, Problems&Solutions, return code 1, SQL Server


Problem: Failed with exception "Unable to move source ... to destination" (return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask)

Resolution:
1.    Generally, the Hive MoveTask moves files from the source location to the destination location. If the executing user doesn't have access to the source path, Hive throws the above error.
2.    First, check whether the current executing user has execute/write permission on the source folder. If not, grant it using the chmod command.
3.    Additionally, check the following parameters in Ambari or in the hive-site.xml file:
          hive.metastore.client.setugi=true and hive.metastore.server.setugi=true
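
A minimal sketch of step 2 (the path below is only a placeholder; substitute the actual source directory from your failing query), checking who owns the source folder and opening it up for the executing user:

# inspect ownership and permissions of the source directory (placeholder path)
$ hdfs dfs -ls -d /tmp/hive/staging_dir
# grant the executing user write/execute access, for example:
$ sudo -u hdfs hdfs dfs -chmod -R 775 /tmp/hive/staging_dir

The two setugi properties tell the metastore to perform file operations with the connecting user's identity rather than the metastore service account, which is why they matter for this error.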

Tuesday, August 9, 2016

BIG_DATA: Hadoop History and Overview

Chitchatiq | 8/09/2016 08:57 PM | Labels: Hadoop, SQL Server, What is Hadoop


Apache™ Hadoop® is a distributed, highly scalable storage and processing framework for very large data sets spread across hundreds to thousands of computing nodes that operate in parallel.

Hadoop was created by Doug Cutting, who named it after his son's toy elephant; that is also why the project's symbol is an elephant.
1. The two main components of Hadoop are the Hadoop Distributed File System (HDFS) and the MapReduce processing engine.
2. HDFS provides the storage, whereas MapReduce executes the programs.

BENEFITS
Organizations use Hadoop for its ability to store, manage and analyze vast amounts of structured and unstructured data quickly, reliably, flexibly and at low cost:
  • Scalability and Performance: distributed processing of data local to each node in a cluster enables Hadoop to store, manage, process and analyze data at petabyte scale.
  • Reliability: large computing clusters are prone to failures of individual nodes. Hadoop is fundamentally resilient; when a node fails, processing is redirected to the remaining nodes in the cluster, and data is automatically re-replicated in preparation for future node failures.
  • Flexibility: unlike traditional relational database management systems, you don't have to create structured schemas before storing data. You can store data in any format, including semi-structured or unstructured formats, and then parse and apply a schema to the data when it is read.
  • Low Cost: unlike proprietary software, Hadoop is open source and runs on low-cost commodity hardware.
Here are some of the key differences between an RDBMS and Hadoop:

  1. RDBMS: you need to model your data up front. Hadoop: no need to model your data.
  2. RDBMS: schema on write. Hadoop: schema on read.
  3. RDBMS: suits OLTP workloads. Hadoop: suits batch processing (OLAP) jobs.
  4. RDBMS: structured data. Hadoop: all varieties of data.
  5. RDBMS: downtime is needed for any maintenance on storage or data files. Hadoop: no downtime is required to add storage.
  6. RDBMS: in standalone database systems such as DB2, Oracle, and SQL Server, adding processing power such as CPU or physical memory in a non-virtualized environment requires downtime. Hadoop: clusters are made of independent nodes that can be added on an as-needed basis.

Wednesday, July 13, 2016

What is Big Data

Chitchatiq | 7/13/2016 03:27 PM | Labels: Big Data, Hadoop, SQL Server

                       
1.    Big data is a term that describes the large volumes of structured, semi-structured and unstructured data generated by our day-to-day lives.




2.    Big data technologies are not only about storing huge amounts of data; they also cover how the data is processed and how the business uses it to make decisions.





3.    Roughly 70% of today's data was generated in the last 4-5 years alone.
E.g.
1.    The New York Stock Exchange generates about one terabyte of new trade data per day.
2.    Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.

4.    Computer memory/storage sizes:
            Computer memory sizes:

8 Bits               =         1 Byte
1024 Bytes       =         1 KB (Kilobyte)
1024 KB           =         1 MB (Megabyte)
1024 MB          =         1 GB (Gigabyte)
1024 GB           =         1 TB (Terabyte)
1024 TB           =         1 PB (Petabyte)
1024 PB           =         1 EB (Exabyte)
1024 EB           =         1 ZB (Zettabyte); worldwide data volumes are now measured in zettabytes
1024 ZB           =         1 YB (Yottabyte)
1024 YB           =         1 BB (Brontobyte)
1024 BB           =         1 Geopbyte
The geopbyte is the largest of these units as of today.
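
To put these in perspective, a quick worked example: 1 GB = 1024 MB = 1024 × 1024 × 1024 bytes = 1,073,741,824 bytes, and 1 PB = 1024 TB = 1024^5 bytes ≈ 1.13 × 10^15 bytes.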


5.    Big data is commonly defined using the 5 V's:

  1. Volume
  2. Velocity
  3. Variety
  4. Veracity
  5. Value


Volume: the sheer amount of data (MB -> GB -> TB -> PB -> EB -> ZB ...)

E.g., WhatsApp: the chat history coming from users used to amount to some GB or TB; nowadays data arrives at EB/ZB scale.

Velocity: the rate at which data arrives and is processed (velocity = volume/time).
E.g., Facebook: users upload more than 500 million photos and videos a day. Velocity measures how fast that data comes in and how fast it is processed.
Facebook has to handle a tsunami of photos and videos every day: it has to ingest it all, process it, file it, and somehow, later, be able to retrieve it.

Variety: structured, semi-structured, and unstructured data.

Structured: data that can be stored in a SQL database table with rows and columns.

Semi-structured: data that doesn't reside in a relational database but has some organizational properties that make it easier to analyze.
E.g. XML, JSON

Unstructured: around 80% of today's data is unstructured. It often includes text and multimedia content, such as e-mail messages, word-processing documents, videos, photos, audio files, presentations, Twitter tweets, Facebook posts, and radar data.

Veracity: the messiness of the data. With many forms of big data, quality and accuracy are less controllable (e.g. Twitter posts with hashtags, abbreviations and typos), but big data and analytics technologies now allow us to work with these types of data.

Value: turn raw data into value

Why Big Data
Big data is not about how much data you have, but what you do with it. You can take any kind of data from any source and analyze it to find answers at lower cost and in less time. With an existing data warehouse system, we take only a portion of the data (such as the most recent 2-3 years of sales data) and analyze its trends to make business decisions. With big data technologies, we can analyze the whole dataset and make better business decisions.
When we combine big data with analytics technologies (machine learning, R), we can accomplish business tasks such as:

  • Fraud detection in banking systems
  • Identifying the customers most likely to churn in telecommunications
  • etc...
Who uses Big Data
Almost every industry uses big data:
  • Banking
  • Education
  • Government
  • Health Care
  • Manufacturing
  • Retail
How Big Data Works
With big data technologies (Hadoop), we can store, process, and analyze large amounts of data with:
  • lower hardware cost
  • faster processing
Storage + Processing + Analysis = Big Data
Hadoop is specially designed to handle big data. We will discuss Hadoop in the next session.
Session 2
Copyright © TECHTalksPro
Designed by Vasu