About

Edit photo

Friday, February 12, 2016

Hadoop Word Count - step by step execution


Step1: Open eclipse -> File -> New -> Others



Select Maven Project 


The following picture is perfect and do the same


use the filer: org.apache.maven click next


Use Group Id and Artifact Id as the below pic, or you can write anything else


Well, Application project is created successfully, now write the program in step2:

Step2:
pom.xml is the important file, it contains the configuration files of the maven. add the hadoop dependency to the program by editing the pom.xml file.


Copy the following dependency code and use it in pom.xml file.

<dependency>
 <groupId>org.apache.hadoop</groupId>
 <artifactId>hadoop-core</artifactId>
 <version>1.2.1</version>
</dependency>


Step3:
Add new class file to the project as below



give the class name, here used wc and click finish.




Step4: The following is the basic Word Count program is used to find the words along with the repeated number.

Copy this code and use it in eclipse.

package npu.edu.hadoop;
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
public static class TokenizerMapper
 extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
 ) throws IOException, InterruptedException {
 StringTokenizer itr = new StringTokenizer(value.toString());
 while (itr.hasMoreTokens()) {
 word.set(itr.nextToken());
 context.write(word, one);
 }
 }
}
public static class IntSumReducer
 extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
 Context context
 ) throws IOException, InterruptedException {
 int sum = 0;
 for (IntWritable val : values) {
 sum += val.get();
 }
 result.set(sum);
 context.write(key, result);
 }
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf,
args).getRemainingArgs();
if (otherArgs.length != 2) {
 System.err.println("Usage: WordCount <in> <out>");
 System.exit(2);
}
Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
 }
}


(Optional) Run -> Run configuration -> Arguments -> 
add (input output)



Step5: Run the code and you'll get the error like this, because we use eclipse is only for writing the program, and remaing task is to be done with Hadoop.

Right click on the project select Run As -> Maven Install .. you'll get Build Success, 
Build Failed? just try couple of times  of Maven Install, otherwise clear errors.


You'll get a *.jar file comes with the Build Success, find it in Target Directory.



How to Run the program in Hadoop? Click Here

If you like my Material, Like and Share and Please subscribe my Youtube Channel.

1 comment:

  1. Your blog is very informative. Eating mindfully has been very hard for people these days. It's all because of their busy schedules, work or lack of focus on themselves. As a student I must admit that I have not been eating mindfully but because of this I will start now. It could help me enjoy my food and time alone. Eating mindfully may help me be aware of healthy food and appreciating food. Duplicate word counter

    ReplyDelete