How to test Java MapReduce Jobs in Hadoop


Example-1 (wordcount)

Developer activities:
Step1: Develop MapReduce Code
Step2: Unit test the MapReduce code using the MRUnit framework
Step3: Create Jar file for MapReduce code
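
Before packaging the jar, it can help to pin down what the job is expected to compute. The following is a minimal plain-Java sketch of the word count logic with no Hadoop dependency. The class name LocalWordCount is hypothetical, and the whitespace tokenization is an assumption, since the actual mapper code is not shown in this post.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the word count logic (hypothetical class, no Hadoop).
// Assumption: words are separated by whitespace and compared case-sensitively.
final class LocalWordCount {

    // "Map" phase: emit one (word, 1) pair per token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" and "reduce" phases: group the pairs by key in sorted order
    // and sum the values for each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(reduce(map("cat cat dog"))); // prints {cat=2, dog=1}
    }
}
```

The same input line, "cat cat dog", is used by the MRUnit test case later in this post, so the expected values there can be cross-checked against this sketch.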

Testing activities:
Step1: Create a new directory in HDFS, then copy the data file from the local file system into it
[cloudera@quickstart training]$ hdfs dfs -mkdir /mapreduceinput
[cloudera@quickstart training]$ hdfs dfs -put wordcount.txt /mapreduceinput
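Step2: Run the jar file, passing the data file as input and an output directory that does not exist yet (the job fails if the output directory already exists)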
[cloudera@quickstart training]$ hadoop jar wordcount.jar WordCount /mapreduceinput/wordcount.txt /mapreduceoutput/
Step3: Check the output files created in HDFS.
[cloudera@quickstart training]$ hdfs dfs -ls /mapreduceoutput
Found 2 items
-rw-r--r--   1 cloudera supergroup          0 2018-05-23 22:00 /mapreduceoutput/_SUCCESS
-rw-r--r--   1 cloudera supergroup         41 2018-05-23 22:00 /mapreduceoutput/part-00000
[cloudera@quickstart training]$ hdfs dfs -cat /mapreduceoutput/part-00000
are	3
how	2
is	1
welcome	1
where	1
you	4
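
Comparing this output with the expected counts by eye does not scale, so the check can be automated. The sketch below parses the text of a part file into a map and compares it with expected values. The class name PartFileParser is hypothetical, and it assumes the default key<TAB>value line layout and that the file content has already been read into a string (for example after copying it to the local file system).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch (hypothetical helper): turn the text of a part file into a map so a
// test can compare it with expected counts.
// Assumption: each line has the layout "key<TAB>value".
final class PartFileParser {

    static Map<String, Integer> parse(String partFileText) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : partFileText.split("\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            counts.put(fields[0], Integer.parseInt(fields[1].trim()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Text as printed by "hdfs dfs -cat /mapreduceoutput/part-00000" above.
        String actual = "are\t3\nhow\t2\nis\t1\nwelcome\t1\nwhere\t1\nyou\t4\n";

        Map<String, Integer> expected = new LinkedHashMap<>();
        expected.put("are", 3);
        expected.put("how", 2);
        expected.put("is", 1);
        expected.put("welcome", 1);
        expected.put("where", 1);
        expected.put("you", 4);

        System.out.println(parse(actual).equals(expected) ? "PASS" : "FAIL");
    }
}
```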


Example-2 (Find out Number of Products Sold in Each Country)

Step1: Create a new directory in HDFS, then copy the data file from the local file system into it
[cloudera@quickstart training]$ hdfs dfs -mkdir /productsalesinput
[cloudera@quickstart training]$ hdfs dfs -put SalesJan2009.csv /productsalesinput

Step2: Run the jar file, passing the data file as input and a new output directory
[cloudera@quickstart training]$ hadoop jar ProductSalesperCountry.jar SalesCountry.SalesCountryDriver /productsalesinput/SalesJan2009.csv /productsalesoutput

Step3: Check the output files created in HDFS.
[cloudera@quickstart training]$ hdfs dfs -ls /productsalesoutput
Found 2 items
-rw-r--r--   1 cloudera supergroup          0 2018-05-23 23:52 /productsalesoutput/_SUCCESS
-rw-r--r--   1 cloudera supergroup        661 2018-05-23 23:52 /productsalesoutput/part-00000
[cloudera@quickstart training]$ hdfs dfs -cat /productsalesoutput/part-00000
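
To decide whether the per-country totals printed by the command above are right, a tester needs an independent expected result. The following is a rough plain-Java sketch of the same count. The class name LocalSalesPerCountry is hypothetical, the country column index is passed in by the caller because the layout of SalesJan2009.csv is not shown in this post, and the sample rows are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the per-country count with no Hadoop dependency (hypothetical class).
// Assumptions: comma-separated rows with no quoted commas, and a caller-supplied
// country column index.
final class LocalSalesPerCountry {

    static Map<String, Integer> countByCountry(Iterable<String> rows, int countryColumn) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String row : rows) {
            String[] fields = row.split(",");
            if (fields.length <= countryColumn) {
                continue; // short or malformed row: skip it rather than fail
            }
            totals.merge(fields[countryColumn].trim(), 1, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up rows (product, country), not taken from SalesJan2009.csv.
        List<String> rows = Arrays.asList(
                "Product1,India", "Product2,France", "Product1,India");
        System.out.println(countByCountry(rows, 1)); // prints {France=1, India=2}
    }
}
```

If the real file has a header row, it would need to be skipped before counting, otherwise the header's country label is counted as a country.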

Example-3 (MapReduce Join – Multiple Input Files)

Step1: Create a new directory in HDFS, then copy both data files from the local file system into it
[cloudera@quickstart training]$ hdfs dfs -mkdir /multipleinputs
[cloudera@quickstart training]$ hdfs dfs -put customer.txt /multipleinputs
[cloudera@quickstart training]$ hdfs dfs -put delivery.txt /multipleinputs

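Step2: Run the jar file, passing both data files as input and a new output directory. No driver class is named on the command line, so the jar's manifest must declare its main class.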
[cloudera@quickstart training]$ hadoop jar MultipleInput.jar /multipleinputs/customer.txt /multipleinputs/delivery.txt /multipleoutput
Step3: Check the output files created in HDFS.
[cloudera@quickstart training]$ hdfs dfs -ls /multipleoutput
Found 2 items
-rw-r--r--   1 cloudera supergroup          0 2018-05-26 23:14 /multipleoutput/_SUCCESS
-rw-r--r--   1 cloudera supergroup         22 2018-05-26 23:14 /multipleoutput/part-r-00000
[cloudera@quickstart training]$ hdfs dfs -cat /multipleoutput/part-r-00000
mani	0
vijay	1
ravi	1
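
An expected result for a join can be produced locally in the same spirit. The sketch below is an in-memory hash join in plain Java, not the MapReduce job itself, and it only illustrates the kind of result a two-file join yields. The class name LocalJoin, the "id,value" record layout, and the sample records are all hypothetical, because the contents of customer.txt and delivery.txt are not shown in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a two-file join done in memory (hypothetical class, no Hadoop).
// Assumption: both inputs hold "id,value" lines that share the same id.
final class LocalJoin {

    static List<String> join(List<String> customers, List<String> deliveries) {
        // Index the delivery side by id; in a MapReduce job the shuffle
        // phase performs this grouping by key.
        Map<String, String> statusById = new LinkedHashMap<>();
        for (String line : deliveries) {
            String[] fields = line.split(",", 2);
            if (fields.length == 2) {
                statusById.put(fields[0].trim(), fields[1].trim());
            }
        }

        List<String> joined = new ArrayList<>();
        for (String line : customers) {
            String[] fields = line.split(",", 2);
            if (fields.length != 2) {
                continue; // malformed customer record
            }
            String status = statusById.get(fields[0].trim());
            if (status != null) { // inner join: drop customers without a delivery
                joined.add(fields[1].trim() + "\t" + status);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Made-up records for illustration only.
        List<String> customers = Arrays.asList("1,anna", "2,bob");
        List<String> deliveries = Arrays.asList("1,0", "2,1");
        for (String row : join(customers, deliveries)) {
            System.out.println(row);
        }
    }
}
```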

MRUnit test case for wordcount example

Prerequisites
Download the MRUnit jar from the Apache repository: https://repository.apache.org/content/repositories/releases/org/apache/mrunit/mrunit/
This walkthrough uses mrunit-0.5.0-incubating.jar.

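Maven pom.xml dependency (if you are using a Maven project):

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>0.9.0-incubating</version>
    <classifier>hadoop1</classifier>
</dependency>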


Step1: Create a new Java project in Eclipse, then add the JUnit library.

Step2: Add the external jars required to run the JUnit test case, taken from these locations:
/usr/lib/hadoop
/usr/lib/hadoop-0.20-mapreduce
/home/cloudera/training/MRUnit/mrunit-0.5.0-incubating.jar
Also add wordcount.jar, because the JUnit test case uses the classes it contains.

Step3: Create the JUnit test case

Word count MRUnit test case:
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.MapDriver;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

// WordMapper and SumReducer come from wordcount.jar. The type parameters below
// assume the mapper reads (LongWritable, Text) pairs and that both classes
// emit (Text, IntWritable) pairs, which matches the inputs used in the tests.
public class TestWordCount {
    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

    @Before
    public void setUp() {
        WordMapper mapper = new WordMapper();
        SumReducer reducer = new SumReducer();
        mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
        mapDriver.setMapper(mapper);
        reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
        reduceDriver.setReducer(reducer);
        mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>();
        mapReduceDriver.setMapper(mapper);
        mapReduceDriver.setReducer(reducer);
    }

    // The mapper should emit one (word, 1) pair per word, in input order.
    @Test
    public void testMapper() {
        mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
        mapDriver.withOutput(new Text("cat"), new IntWritable(1));
        mapDriver.withOutput(new Text("cat"), new IntWritable(1));
        mapDriver.withOutput(new Text("dog"), new IntWritable(1));
        mapDriver.runTest();
    }

    // The reducer should sum all the values received for one key.
    @Test
    public void testReducer() {
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("cat"), values);
        reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
        reduceDriver.runTest();
    }

    // Mapper and reducer together: one total per word, sorted by key.
    @Test
    public void testMapReduce() {
        mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
        mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
        mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
        mapReduceDriver.runTest();
    }
}

Step4: Run the JUnit test case

Step5: All three tests should pass.

