Hortonworks University offers the HDPCD certification. It is an exclusively
hands-on, performance-based exam that requires you to complete a set of tasks
on an actual Hadoop cluster instead of answering multiple-choice questions.
The HDPCD exam objectives are grouped into three categories:
1. Data ingestion
2. Data transformation
3. Data analysis
As listed on the Hortonworks website, these are the tasks in each category:
Data Ingestion
Import data from a table in a relational database into HDFS
Import the results of a query from a relational database into HDFS
Import a table from a relational database into a new or existing Hive table
Insert or update data from HDFS into a table in a relational database
Given a Flume configuration file, start a Flume agent
Given a configured sink and source, configure a Flume memory channel with a specified capacity
Data Transformation
Write and execute a Pig script
Load data into a Pig relation without a schema
Load data into a Pig relation with a schema
Load data from a Hive table into a Pig relation
Use Pig to transform data into a specified format
Transform data to match a given Hive schema
Group the data of one or more Pig relations
Use Pig to remove records with null values from a relation
Store the data from a Pig relation into a folder in HDFS
Store the data from a Pig relation into a Hive table
Sort the output of a Pig relation
Remove the duplicate tuples of a Pig relation
Specify the number of reduce tasks for a Pig MapReduce job
Join two datasets using Pig
Perform a replicated join using Pig
Run a Pig job using Tez
Within a Pig script, register a JAR file of User Defined Functions
Within a Pig script, define an alias for a User Defined Function
Within a Pig script, invoke a User Defined Function
Data Analysis
Write and execute a Hive query
Define a Hive-managed table
Define a Hive external table
Define a partitioned Hive table
Define a bucketed Hive table
Define a Hive table from a select query
Define a Hive table that uses the ORCFile format
Create a new ORCFile table from the data in an existing non-ORCFile Hive table
Specify the storage format of a Hive table
Specify the delimiter of a Hive table
Load data into a Hive table from a local directory
Load data into a Hive table from an HDFS directory
Load data into a Hive table as the result of a query
Load a compressed data file into a Hive table
Update a row in a Hive table
Delete a row from a Hive table
Insert a new row into a Hive table
Join two Hive tables
Run a Hive query using Tez
Run a Hive query using vectorization
Output the execution plan for a Hive query
Use a subquery within a Hive query
Output data from a Hive query that is totally ordered across multiple reducers
Set a Hadoop or Hive configuration property from within a Hive query
Reference:
https://hortonworks.com/services/training/certification/exam-objectives/#hdpcd