Apache Hive External Tables
- The data already resides in HDFS; the external table is created on top of that HDFS data.
- This is often described as applying a schema on top of existing data (schema on read).
- When the table is dropped, only the schema (metadata) is removed; the data remains in HDFS as before.
- External tables make it possible to define multiple schemas for the same data stored in HDFS, instead of deleting and reloading the data every time the schema changes.
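For illustration, a minimal sketch of how an external table could be declared over existing HDFS data (the table name, columns, and HDFS path below are assumptions, not taken from the original post):

CREATE EXTERNAL TABLE IF NOT EXISTS employees_ext (
  emp_id INT,
  emp_name STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/cloudera/employee_data';

-- Dropping the table removes only the schema; the files under /user/cloudera/employee_data remain in HDFS.
DROP TABLE employees_ext;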
Apache Hive Complex Data Types (Collections)
Complex Data Types
- ARRAY: an ordered collection of elements of the same type
- MAP: a collection of key-value pairs
- STRUCT: a record with a fixed set of named fields, possibly of different types
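As an illustrative sketch (the column names and delimiters are assumptions), a table combining all three collection types and a query showing how individual elements could be accessed:

CREATE TABLE employee_details (
  name STRING,
  skills ARRAY<STRING>,
  phone MAP<STRING, STRING>,
  address STRUCT<city:STRING, pin:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':';

-- Accessing collection elements:
SELECT name, skills[0], phone['office'], address.city FROM employee_details;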
How to test Python MapReduce Jobs in Hadoop
Example: Count the number of words in a text file (word count)
1) Create the Python scripts mapper.py & reducer.py
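The post does not list the script contents. A minimal sketch of what mapper.py and reducer.py could look like for a streaming word count (assuming whitespace-separated words and tab-separated key/value pairs):

#!/usr/bin/env python
# mapper.py - emit (word, 1) for every word read from standard input
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%s" % (word, 1))

#!/usr/bin/env python
# reducer.py - sum the counts for each word (input arrives sorted by key)
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.strip().split("\t", 1)
    count = int(count)
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word = word
        current_count = count

# emit the last word
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))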
2) Test the mapper.py and reducer.py scripts locally before using them in a MapReduce job.
Test 1:
[cloudera@quickstart training]$ echo "abc xyz abc abc abc xyz pqr" | python /home/cloudera/training/wordcount-python/mapper.py | sort -k1,1 | python /home/cloudera/training/wordcount-python/reducer.py
abc 4
pqr 1
xyz 2
Test 2:
[cloudera@quickstart training]$ cat wordcount.txt | python /home/cloudera/training/wordcount-python/mapper.py | sort -k1,1 | python /home/cloudera/training/wordcount-python/reducer.py
are 3
how 2
is 1
welcome 1
where 1
you 4
3) Create a ‘wordcountinput’ directory in HDFS, then copy wordcount.txt to HDFS.
[cloudera@quickstart training]$ hdfs dfs -mkdir /wordcountinput
[cloudera@quickstart training]$ hdfs dfs -put wordcount.txt /wordcountinput
4) Execute the MapReduce job using the Hadoop streaming jar file.
Location: /usr/lib/hadoop-0.20-mapreduce/contrib/streaming
[cloudera@quickstart training]$ hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar -Dmapred.reduce.tasks=1 -file /home/cloudera/training/wordcount-python/mapper.py -file /home/cloudera/training/wordcount-python/reducer.py -mapper "python mapper.py" -reducer "python reducer.py" -input /wordcountinput/wordcount.txt -output /wordcountoutput
5) Check the output.
[cloudera@quickstart training]$ hdfs dfs -ls /wordcountoutput
Found 2 items
-rw-r--r-- 1 cloudera supergroup 0 2018-05-24 00:40 /wordcountoutput/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 41 2018-05-24 00:40 /wordcountoutput/part-00000
[cloudera@quickstart training]$ hdfs dfs -cat /wordcountoutput/part-00000
are 3
how 2
is 1
welcome 1
where 1
you 4
How to test Java MapReduce Jobs in Hadoop
Example-1 (wordcount)
Developer activities:
Step1: Develop the MapReduce code
Step2: Unit test the MapReduce code using the MRUnit framework
Step3: Create a jar file for the MapReduce code
Testing activities:
Step1: Create a new directory in HDFS, then copy the data file from the local file system to the HDFS directory.
[cloudera@quickstart training]$ hdfs dfs -mkdir /mapreduceinput
[cloudera@quickstart training]$ hdfs dfs -put wordcount.txt /mapreduceinput
Step2: Run the jar file, providing the data file as input.
[cloudera@quickstart training]$ hadoop jar wordcount.jar WordCount /mapreduceinput/wordcount.txt /mapreduceoutput/
Step3: Check the output file created on HDFS.
[cloudera@quickstart training]$ hdfs dfs -ls /mapreduceoutput
Found 2 items
-rw-r--r-- 1 cloudera supergroup 0 2018-05-23 22:00 /mapreduceoutput/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 41 2018-05-23 22:00 /mapreduceoutput/part-00000
[cloudera@quickstart training]$ hdfs dfs -cat /mapreduceoutput/part-00000
are 3
how 2
is 1
welcome 1
where 1
you 4
Example-2 (Find out the number of products sold in each country)
Step1: Create a new directory in HDFS, then copy the data file from the local file system to the HDFS directory.
[cloudera@quickstart training]$ hdfs dfs -mkdir /productsalesinput
[cloudera@quickstart training]$ hdfs dfs -put SalesJan2009.csv /productsalesinput
Step2: Run the jar file, providing the data file as input.
[cloudera@quickstart training]$ hadoop jar ProductSalesperCountry.jar SalesCountry.SalesCountryDriver /productsalesinput/SalesJan2009.csv /productsalesoutput
Step3: Check the output file created on HDFS.
[cloudera@quickstart training]$ hdfs dfs -ls /productsalesoutput
Found 2 items
-rw-r--r-- 1 cloudera supergroup 0 2018-05-23 23:52 /productsalesoutput/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 661 2018-05-23 23:52 /productsalesoutput/part-00000
[cloudera@quickstart training]$ hdfs dfs -cat /productsalesoutput/part-00000
Example-3 (MapReduce Join – Multiple Input Files)
Step1: Create a new directory in HDFS, then copy the data files from the local file system to the HDFS directory.
[cloudera@quickstart training]$ hdfs dfs -mkdir /multipleinputs
[cloudera@quickstart training]$ hdfs dfs -put customer.txt /multipleinputs
[cloudera@quickstart training]$ hdfs dfs -put delivery.txt /multipleinputs
Step2: Run the jar file, providing the data files as input.
[cloudera@quickstart training]$ hadoop jar MultipleInput.jar /multipleinputs/customer.txt /multipleinputs/delivery.txt /multipleoutput
Step3: Check the output file created on HDFS.
[cloudera@quickstart training]$ hdfs dfs -ls /multipleoutput
Found 2 items
-rw-r--r-- 1 cloudera supergroup 0 2018-05-26 23:14 /multipleoutput/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 22 2018-05-26 23:14 /multipleoutput/part-r-00000
[cloudera@quickstart training]$ hdfs dfs -cat /multipleoutput/part-r-00000
mani 0
vijay 1
ravi 1
MRUnit test case for the wordcount example
Pre-requisites:
Download the latest version of the MRUnit jar from the Apache website: https://repository.apache.org/content/repositories/releases/org/apache/mrunit/mrunit/ (for example, mrunit-0.5.0-incubating.jar).
If you are using a Maven project, add the MRUnit dependency to pom.xml instead.
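The dependency would look roughly like this (coordinates inferred from the repository path above; verify the exact version against the Apache repository):

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>0.5.0-incubating</version>
    <scope>test</scope>
</dependency>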
Step1: Create a new Java project in Eclipse, then add the JUnit library.
Step2: Add the external jars that are required to run the JUnit test case:
/usr/lib/hadoop
/usr/lib/hadoop-0.20-mapreduce
/home/cloudera/training/MRUnit/mrunit-0.5.0-incubating.jar
In addition, we also need to add wordcount.jar (the classes from wordcount.jar are used in the JUnit test case).
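The wordcount classes themselves are not shown in the post. A minimal sketch of what WordMapper and SumReducer might look like, assuming the classic org.apache.hadoop.mapred API (which matches the org.apache.hadoop.mrunit drivers imported in the test below); in practice each would be a public class in its own source file inside wordcount.jar:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Mapper: emits (word, 1) for every word in the input line.
class WordMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}

// Reducer: sums the counts emitted for each word.
class SumReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}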
Step3: Create the JUnit test case.
Word count MRUnit test case:
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.MapDriver;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class TestWordCount {

    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

    @Before
    public void setUp() {
        // Wire the wordcount mapper and reducer (from wordcount.jar) into the MRUnit drivers.
        WordMapper mapper = new WordMapper();
        SumReducer reducer = new SumReducer();
        mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
        mapDriver.setMapper(mapper);
        reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
        reduceDriver.setReducer(reducer);
        mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>();
        mapReduceDriver.setMapper(mapper);
        mapReduceDriver.setReducer(reducer);
    }

    @Test
    public void testMapper() {
        // The mapper should emit (word, 1) for every word in the input line.
        mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
        mapDriver.withOutput(new Text("cat"), new IntWritable(1));
        mapDriver.withOutput(new Text("cat"), new IntWritable(1));
        mapDriver.withOutput(new Text("dog"), new IntWritable(1));
        mapDriver.runTest();
    }

    @Test
    public void testReducer() {
        // The reducer should sum the counts for a single key.
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("cat"), values);
        reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
        reduceDriver.runTest();
    }

    @Test
    public void testMapReduce() {
        // End-to-end check: map then reduce on a single input record.
        mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
        mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
        mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
        mapReduceDriver.runTest();
    }
}
Step4: Run the JUnit test case.
Step5: All the test cases should pass.
Hadoop HDFS Commands
HDFS Commands
- jps
Command to list the Hadoop daemon processes (JVMs) that are currently running.
[root@quickstart Desktop]# jps
- fsck
HDFS command to check the health of the Hadoop file system.
[cloudera@quickstart training]$ hdfs fsck /
- ls
HDFS command to display the list of files and directories in HDFS.
[cloudera@quickstart training]$ hdfs dfs -ls /
- mkdir
HDFS command to create a directory in HDFS.
[cloudera@quickstart training]$ hdfs dfs -mkdir /bigdatatesting
[cloudera@quickstart training]$ hdfs dfs -ls /
drwxr-xr-x - cloudera supergroup 0 2018-05-23 00:46 /bigdatatesting
Note: Here we are creating a directory named “bigdatatesting” in HDFS.
- touchz
HDFS command to create a file in HDFS with a file size of 0 bytes.
[cloudera@quickstart training]$ hdfs dfs -touchz /bigdatatesting/test.dat
[cloudera@quickstart training]$ hdfs dfs -ls /bigdatatesting/
Found 1 items
-rw-r--r-- 1 cloudera supergroup 0 2018-05-23 00:48 /bigdatatesting/test.dat
Note: Here we are creating a file named “test.dat” in the directory “bigdatatesting” of HDFS with a file size of 0 bytes.
- du
HDFS command to check the file size.
[cloudera@quickstart training]$ hdfs dfs -du -s /bigdatatesting/test.dat
0  0  /bigdatatesting/test.dat
- appendToFile
HDFS command that appends content to the given destination file on HDFS. The destination file will be created if it does not exist.
[cloudera@quickstart training]$ hdfs dfs -appendToFile - /bigdatatesting/test.dat
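In the command above, the trailing dash tells appendToFile to read the data to append from standard input (end the input with Ctrl+D). A local file can be given instead of the dash; for example, assuming a local file named localdata.txt exists in the current directory:
[cloudera@quickstart training]$ hdfs dfs -appendToFile localdata.txt /bigdatatesting/test.dat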
- cat
HDFS command that reads a file on HDFS and prints its content to standard output.
[cloudera@quickstart training]$ hdfs dfs -cat /bigdatatesting/test.dat
- copyFromLocal
HDFS command to copy a file from the local file system to HDFS.
Step1: Create a file in the local file system.
[cloudera@quickstart training]$ cat >> test1.dat
[cloudera@quickstart training]$ ls test1.dat
test1.dat
Step2: Copy the file from the local file system to HDFS.
[cloudera@quickstart training]$ hdfs dfs -copyFromLocal test1.dat /bigdatatesting/
Note: Here test1.dat is a file present in the local directory /home/cloudera/training; after the command executes, test1.dat is copied to the /bigdatatesting directory of HDFS.
- copyToLocal
HDFS command to copy a file from HDFS to the local file system.
Step1: Check whether the test.dat file is present in the local file system.
[cloudera@quickstart training]$ ls test.dat
ls: cannot access test.dat: No such file or directory
Step2: Copy the test.dat file from HDFS to the local file system.
[cloudera@quickstart training]$ hdfs dfs -copyToLocal /bigdatatesting/test.dat /home/cloudera/training
Step3: Check again whether test.dat is present in the local file system.
[cloudera@quickstart training]$ ls test.dat
test.dat
Note: Here test.dat is a file present in the bigdatatesting directory of HDFS; after the command executes, test.dat is copied to the local directory /home/cloudera/training.
- put
HDFS command to copy single or multiple sources from the local file system to the destination file system.
Step1: Create a file in the local file system.
[cloudera@quickstart training]$ cat >> test2.dat
[cloudera@quickstart training]$ ls test2.dat
test2.dat
Step2: Copy the file from the local file system to HDFS.
[cloudera@quickstart training]$ hdfs dfs -put test2.dat /bigdatatesting/
Note: Here test2.dat is a file present in the local directory /home/cloudera/training; after the command executes, test2.dat is copied to the /bigdatatesting directory of HDFS.
Note: The put command is similar to the copyFromLocal command.
- get
HDFS command to copy files from HDFS to the local file system.
Step1: Create a new file test3.dat on HDFS.
[cloudera@quickstart training]$ hdfs dfs -touchz /bigdatatesting/test3.dat
Step2: Copy the test3.dat file from HDFS to the local file system.
[cloudera@quickstart training]$ hdfs dfs -get /bigdatatesting/test3.dat /home/cloudera/training
Step3: Check whether test3.dat is present in the local file system.
[cloudera@quickstart training]$ ls test3.dat
test3.dat
Note1: Here test3.dat is a file present in the bigdatatesting directory of HDFS; after the command executes, test3.dat is copied to the local directory /home/cloudera/training.
Note2: The get command is similar to the copyToLocal command.
- cp
HDFS command to copy files from source to destination. This command also allows multiple sources, in which case the destination must be a directory.
[cloudera@quickstart training]$ hdfs dfs -mkdir /hadooptesting/
[cloudera@quickstart training]$ hdfs dfs -cp /bigdatatesting/test.dat /hadooptesting
- mv
HDFS command to move files from source to destination. This command also allows multiple sources, in which case the destination needs to be a directory.
[cloudera@quickstart training]$ hdfs dfs -mv /bigdatatesting/test1.dat /hadooptesting/
- rm
HDFS command to remove a file from HDFS.
[cloudera@quickstart training]$ hdfs dfs -rm /bigdatatesting/test2.dat
Deleted /bigdatatesting/test2.dat
- rm -r
HDFS command to remove an entire directory and all of its content from HDFS.
[cloudera@quickstart training]$ hdfs dfs -rm -r /hadooptesting
Deleted /hadooptesting
- rmdir
HDFS command to remove a directory if it is empty.
[cloudera@quickstart training]$ hdfs dfs -rmdir /bigdatatesting
- usage
HDFS command that returns the help for an individual command.
[cloudera@quickstart training]$ hdfs dfs -usage mkdir
Note: By using the usage command you can get information about any command.
- help
HDFS command that displays help for a given command, or for all commands if none is specified.
[cloudera@quickstart training]$ hdfs dfs -help