Example-1 (WordCount)
Developer activities:
Step1: Develop the MapReduce code
Step2: Unit test the MapReduce code using the MRUnit framework
Step3: Create a jar file for the MapReduce code
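The WordCount MapReduce code itself is not reproduced in this handout. As a rough guide, the logic it implements can be sketched in plain Java with no Hadoop dependency; the class and method names below are illustrative only, not the actual job classes:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountLogic {

    // Map phase: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(reduce(map("cat cat dog"))); // prints {cat=2, dog=1}
    }
}
```

In the real job, the shuffle/sort between map and reduce is performed by the Hadoop framework; here it is folded into reduce for brevity.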
Testing activities:
Step1: Create a new directory in HDFS, then copy the data file from the local file system to the HDFS directory
[cloudera@quickstart training]$ hdfs dfs -mkdir /mapreduceinput
[cloudera@quickstart training]$ hdfs dfs -put wordcount.txt /mapreduceinput
Step2: Run the jar file, providing the data file as input
[cloudera@quickstart training]$ hadoop jar wordcount.jar WordCount /mapreduceinput/wordcount.txt /mapreduceoutput/
Step3: Check the output files created on HDFS.
[cloudera@quickstart training]$ hdfs dfs -ls /mapreduceoutput
Found 2 items
-rw-r--r--   1 cloudera supergroup    0 2018-05-23 22:00 /mapreduceoutput/_SUCCESS
-rw-r--r--   1 cloudera supergroup   41 2018-05-23 22:00 /mapreduceoutput/part-00000
[cloudera@quickstart training]$ hdfs dfs -cat /mapreduceoutput/part-00000
are 3
how 2
is 1
welcome 1
where 1
you 4
Example-2 (Find out Number of Products Sold in Each Country)
Step1: Create a new directory in HDFS, then copy the data file from the local file system to the HDFS directory
[cloudera@quickstart training]$ hdfs dfs -mkdir /productsalesinput
[cloudera@quickstart training]$ hdfs dfs -put SalesJan2009.csv /productsalesinput
Step2: Run the jar file, providing the data file as input
[cloudera@quickstart training]$ hadoop jar ProductSalesperCountry.jar SalesCountry.SalesCountryDriver /productsalesinput/SalesJan2009.csv /productsalesoutput
Step3: Check the output files created on HDFS.
[cloudera@quickstart training]$ hdfs dfs -ls /productsalesoutput
Found 2 items
-rw-r--r--   1 cloudera supergroup    0 2018-05-23 23:52 /productsalesoutput/_SUCCESS
-rw-r--r--   1 cloudera supergroup  661 2018-05-23 23:52 /productsalesoutput/part-00000
[cloudera@quickstart training]$ hdfs dfs -cat /productsalesoutput/part-00000
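The SalesCountry mapper and reducer classes are not listed here, but the per-country counting they perform can be sketched in plain Java. The country column index below is an assumption for illustration; check it against the actual layout of SalesJan2009.csv:

```java
import java.util.Map;
import java.util.TreeMap;

public class CountrySalesLogic {

    // Hypothetical column index of the country field in the CSV file;
    // adjust to match the real SalesJan2009.csv layout.
    static final int COUNTRY_COLUMN = 7;

    // Map phase: extract the country from each CSV record and emit (country, 1);
    // reduce phase: sum the 1s per country. Both are folded into one pass here.
    static Map<String, Integer> salesPerCountry(String[] csvLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : csvLines) {
            String[] fields = line.split(",");
            if (fields.length > COUNTRY_COLUMN) {
                counts.merge(fields[COUNTRY_COLUMN].trim(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up records: only the country field (index 7) is meaningful here.
        String[] sample = {
            "a,b,c,d,e,f,g,India,h",
            "a,b,c,d,e,f,g,India,h",
            "a,b,c,d,e,f,g,Norway,h"
        };
        System.out.println(salesPerCountry(sample)); // prints {India=2, Norway=1}
    }
}
```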
Example-3 (MapReduce Join – Multiple Input Files)
Step1: Create a new directory in HDFS, then copy the data files from the local file system to the HDFS directory
[cloudera@quickstart training]$ hdfs dfs -mkdir /multipleinputs
[cloudera@quickstart training]$ hdfs dfs -put customer.txt /multipleinputs
[cloudera@quickstart training]$ hdfs dfs -put delivery.txt /multipleinputs
Step2: Run the jar file, providing the data files as input
[cloudera@quickstart training]$ hadoop jar MultipleInput.jar /multipleinputs/customer.txt /multipleinputs/delivery.txt /multipleoutput
Step3: Check the output files created on HDFS.
[cloudera@quickstart training]$ hdfs dfs -ls /multipleoutput
Found 2 items
-rw-r--r--   1 cloudera supergroup    0 2018-05-26 23:14 /multipleoutput/_SUCCESS
-rw-r--r--   1 cloudera supergroup   22 2018-05-26 23:14 /multipleoutput/part-r-00000
[cloudera@quickstart training]$ hdfs dfs -cat /multipleoutput/part-r-00000
mani 0
vijay 1
ravi 1
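The join code inside MultipleInput.jar is not shown, but a reduce-side join works by tagging each record with its source file in the map phase and combining the tagged values per key in the reduce phase. A plain-Java sketch follows; the file contents and field meanings are assumptions inferred from the output above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceSideJoinLogic {

    // A value tagged with the input file it came from, as the two mappers
    // in a MapReduce join would emit.
    static class Tagged {
        final String source;
        final String value;
        Tagged(String source, String value) { this.source = source; this.value = value; }
    }

    // "Map" both inputs to (customerId, tagged value), group by key,
    // then "reduce" each group to (customerName, deliveryValue).
    static Map<String, String> join(Map<String, String> customers,
                                    Map<String, String> deliveries) {
        Map<String, List<Tagged>> grouped = new HashMap<>();
        customers.forEach((id, name) ->
                grouped.computeIfAbsent(id, k -> new ArrayList<>()).add(new Tagged("customer", name)));
        deliveries.forEach((id, value) ->
                grouped.computeIfAbsent(id, k -> new ArrayList<>()).add(new Tagged("delivery", value)));

        Map<String, String> joined = new HashMap<>();
        for (List<Tagged> group : grouped.values()) {
            String name = null, value = null;
            for (Tagged t : group) {
                if (t.source.equals("customer")) name = t.value;
                else value = t.value;
            }
            if (name != null && value != null) joined.put(name, value);
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> customers = new HashMap<>();
        customers.put("1", "mani");
        customers.put("2", "vijay");
        Map<String, String> deliveries = new HashMap<>();
        deliveries.put("1", "0");
        deliveries.put("2", "1");
        System.out.println(join(customers, deliveries).get("mani")); // prints 0
    }
}
```

In the real job, the grouping by key is done by Hadoop's shuffle; only the tagging (in the mappers) and the per-group combination (in the reducer) are user code.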
MRUnit test case for the WordCount example
Pre-Requisites
Download the latest version of the MRUnit jar from the Apache repository: https://repository.apache.org/content/repositories/releases/org/apache/mrunit/mrunit/ (this example uses mrunit-0.5.0-incubating.jar).
Maven pom.xml dependency (if you are using a Maven project):
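The handout names the Maven dependency but does not show it. The coordinates below are believed to match the jar version used above; verify them against the Apache repository before use:

```xml
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>0.5.0-incubating</version>
  <scope>test</scope>
</dependency>
```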
Step1: Create a new Java project in Eclipse, then add the JUnit library.
Step2: Add the external jars required to run the JUnit test case:
/usr/lib/hadoop
/usr/lib/hadoop-0.20-mapreduce
/home/cloudera/training/MRUnit/mrunit-0.5.0-incubating.jar
In addition, add wordcount.jar (its classes are used in the JUnit test case).
Step3: Create the JUnit test case.
WordCount MRUnit test case:
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.MapDriver;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class TestWordCount {

    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setUp() {
        // WordMapper and SumReducer come from wordcount.jar (added in Step2).
        WordMapper mapper = new WordMapper();
        SumReducer reducer = new SumReducer();
        mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
        mapDriver.setMapper(mapper);
        reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
        reduceDriver.setReducer(reducer);
        mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>();
        mapReduceDriver.setMapper(mapper);
        mapReduceDriver.setReducer(reducer);
    }

    // Mapper alone: one input line in, one (word, 1) pair per word out.
    @Test
    public void testMapper() {
        mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
        mapDriver.withOutput(new Text("cat"), new IntWritable(1));
        mapDriver.withOutput(new Text("cat"), new IntWritable(1));
        mapDriver.withOutput(new Text("dog"), new IntWritable(1));
        mapDriver.runTest();
    }

    // Reducer alone: (word, [1, 1]) in, (word, 2) out.
    @Test
    public void testReducer() {
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("cat"), values);
        reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
        reduceDriver.runTest();
    }

    // Full pipeline: map, shuffle/sort, and reduce together.
    @Test
    public void testMapReduce() {
        mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
        mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
        mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
        mapReduceDriver.runTest();
    }
}
Step4: Run the JUnit test case.
Step5: Verify that all test cases pass.