Dear Learner,

I hope you are doing well.

The seqinput.txt file must list the HDFS path of each image file you have uploaded.
 
Please find the steps below to run the Sequence File program.
 
  • Place all the images in a directory on the VM. Let's assume I have placed all the images in /home/edureka/images/
  • Create a directory in HDFS using the following command:
Command: hadoop dfs -mkdir /<dir name>
E.g: hadoop dfs -mkdir /sequenceimages
 
  • Transfer the images into that directory using the command below:
Command: hadoop dfs -put <path of the image> /<dir name>
E.g: hadoop dfs -put /home/edureka/images/* /sequenceimages/
 
  • Now create a text file, seqinput.txt, that lists the HDFS path of every image, one path per line, and then transfer it to HDFS.
For example, in my case the HDFS directory is sequenceimages and one of the images is blur1.jpg, so the text file must contain the line /sequenceimages/blur1.jpg. Add a line in the same way (directory name followed by file name) for each of the remaining images.
 
Command: vi seqinput.txt

In my case, I added the following lines:

/sequenceimages/auto_loan.png
/sequenceimages/blur1.jpg
/sequenceimages/credit_card_loan.bmp
/sequenceimages/flower1.jpg
/sequenceimages/flower2.jpg
/sequenceimages/flower.jpg
/sequenceimages/glow_blur.jpg
/sequenceimages/home_loan.jpg
/sequenceimages/personal_loan.gif
/sequenceimages/sleigh.jpg
/sequenceimages/threshold_blur.jpg
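
If you have many images, typing each path by hand is error-prone. As a rough sketch only (the function name make_seqinput is my own and is not part of the course code), the list could be generated from the local image directory. It assumes the files keep the same names in HDFS as on the VM, which is what the -put command above does, and that the list file is written outside the image directory.

```shell
# Hypothetical helper: writes one HDFS path per local image file.
#   $1 = local image directory, $2 = HDFS directory, $3 = list file to create
make_seqinput() {
    local_dir=$1
    hdfs_dir=$2
    out_file=$3
    : > "$out_file"                    # start with an empty list file
    for f in "$local_dir"/*; do
        [ -f "$f" ] || continue        # skip subdirectories and an unmatched glob
        # Join the HDFS directory and the bare file name, one path per line
        printf '%s/%s\n' "$hdfs_dir" "$(basename "$f")" >> "$out_file"
    done
}
```

With the paths used in this mail, the call would be: make_seqinput /home/edureka/images /sequenceimages seqinput.txt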


Then transfer seqinput.txt into HDFS:
Command: hadoop dfs -put seqinput.txt /
 
  • Now create a jar file containing only the BinaryFilesToHadoopSequenceFile program and run the command below:
Command: hadoop jar <jar name> <name of text file> <output file>
E.g: hadoop jar binarytosequence.jar /seqinput.txt /seqoutput1

The output you get will be in binary format, so it will not be readable as plain text.
 
  • Now import ImageDriver.java, ImageDuplicatesMapper.java and ImageDupsReducer.java into Eclipse and create a jar file from them. Use the output of the BinaryFilesToHadoopSequenceFile program as the input, and run the command below:
Command: hadoop jar <jar name> <output of BinaryFilesToHadoopSequenceFile program> <output file>
E.g: hadoop jar sequence.jar /seqoutput1 /seqoutput2
 
Now go to the output location in HDFS to view the result. In my case that is /seqoutput2.
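
For convenience, the commands from this mail can be collected into one small script. This is only a sketch under my example names (sequenceimages, binarytosequence.jar, sequence.jar, seqoutput1, seqoutput2), so please replace them with your own. The function name run_sequence_pipeline is hypothetical, and I have written hadoop fs here, which accepts the same -mkdir, -put and -ls subcommands as hadoop dfs.

```shell
# Hypothetical wrapper around the steps in this mail; nothing runs until the function is called.
# Each step runs only if the previous one succeeded (for example, -mkdir fails
# if the directory already exists, and the chain then stops there).
run_sequence_pipeline() {
    hadoop fs -mkdir /sequenceimages &&
    hadoop fs -put /home/edureka/images/* /sequenceimages/ &&
    hadoop fs -put seqinput.txt / &&
    hadoop jar binarytosequence.jar /seqinput.txt /seqoutput1 &&
    hadoop jar sequence.jar /seqoutput1 /seqoutput2 &&
    hadoop fs -ls /seqoutput2          # list the files written by the second job
}
```

Call run_sequence_pipeline from the directory that contains seqinput.txt and both jar files. The last command only lists the output directory; you can then view any listed file with hadoop fs -cat.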

Please try it and let me know if this helps you.

Waiting for your response.

Please feel free to get back to us if you need any further help; we will be glad to assist you.