Prerequisites: 

Java and Eclipse are installed

An environment where Hadoop and its daemons are running


Use case: How to debug a MapReduce program in Eclipse.


Please refer to the recording below, in which this is discussed.

https://edureka.wistia.com/medias/mjn5o15j1x


In this approach, we use files located on your local file system as input and run the MapReduce program to check whether it produces the correct results.


If it gives correct results, you can package the MapReduce program as a jar file and execute it on your Hadoop cluster.


If it gives incorrect results, add System.out.println statements wherever necessary to trace where it went wrong, and debug the program.


This approach avoids wasting storage space in HDFS and reduces the time taken to find the issue.
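
As a rough illustration of the System.out.println tracing mentioned above, here is a minimal sketch. It assumes a word-count style job, which the recording may or may not use. The class MapLogicDebug and the method mapLine are hypothetical names, and the code deliberately uses no Hadoop classes: it only shows the kind of trace lines you might put inside your own map() method so they appear in the Eclipse console.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for the body of a word-count style map() method.
// No Hadoop classes are used, so it runs as a plain Java application in Eclipse.
class MapLogicDebug {

    // Splits one input line into lowercase words and returns the keys
    // that a real mapper would emit, printing a trace line for each step.
    static List<String> mapLine(long offset, String line) {
        List<String> emittedKeys = new ArrayList<String>();

        // Trace the raw input so bad records are easy to spot.
        System.out.println("DEBUG map input: offset=" + offset + " line=[" + line + "]");

        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken().toLowerCase();

            // Trace every key/value pair the mapper would write to the context.
            System.out.println("DEBUG map output: key=" + word + " value=1");
            emittedKeys.add(word);
        }
        return emittedKeys;
    }

    public static void main(String[] args) {
        // Sample line is made up for illustration; use a line from your local input file.
        mapLine(0L, "Hello Hadoop hello");
    }
}
```

In a real job, the same println calls would go inside the map() and reduce() methods, and you can also set Eclipse breakpoints on those lines instead of printing. Remember to remove the trace statements before building the jar for the cluster.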


Below are the jar files needed to run MapReduce programs in Eclipse, with their locations:


  • commons-cli-1.2.jar           -   /usr/lib/hadoop-0.20/lib
  • commons-httpclient-3.0.1.jar  -   /usr/lib/hadoop-0.20/lib
  • commons-logging-1.1.1.jar     -   /usr/lib/hadoop-0.20/lib
  • hadoop-0.20.2-cdh3u0core.jar  -   /usr/lib/hadoop-0.20/lib
  • jackson-core-asl-1.8.8.jar    -   /usr/lib/hadoop-0.20/lib
  • jackson-mapper-asl-1.8.8.jar  -   /usr/lib/hadoop-0.20/lib
  • log4j-1.2.15.jar              -   /usr/lib/hadoop-0.20/lib


Note: The paths mentioned above apply to Cloudera CDH3. If you are using your own Hadoop cluster, you can find the jars in the lib folder of your Hadoop installation directory.


Please feel free to reply if you need any further help.