What is a combiner in Hadoop?
What is a combiner?
A combiner is a mini reducer that performs a local reduce on each mapper node. It does the same kind of work as the reducer, and it can reuse the reducer class or use custom code. It receives its input from the mapper on a particular node and sends its output on to the reducer.
Why use a combiner?
If the combiner does the same work as the reducer, do we still need it? Yes. Combiners make MapReduce more efficient by reducing the amount of data that has to be sent to the reducers.
How does it make the data transfer more efficient?
The combiner does the reducer's work on the mapper side and sends the partially reduced data to the reducer. The reducer then combines the data from all the mappers and produces the final result in the form of <key, value> pairs.
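To make that flow concrete, here is a minimal plain-Java sketch of a word count going through mapper, combiner, and reducer. It does not use the Hadoop API at all; the class CombinerFlow and its methods map, sumByKey, and run are hypothetical names invented for this illustration, and each input string stands in for one mapper's split.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of mapper -> combiner -> reducer for a word count.
// No Hadoop classes are used; CombinerFlow and its methods are hypothetical names.
class CombinerFlow {

    // "Mapper": emit <word, 1> for every word in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.split(" ")) {
            out.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
        }
        return out;
    }

    // "Reducer" logic: sum the values of each key. Reused here as the combiner.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Run the whole job over several splits, with or without the local combine step.
    static Map<String, Integer> run(boolean useCombiner, String... splits) {
        List<Map.Entry<String, Integer>> sentToReducer = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapped = map(split);
            if (useCombiner) {
                // Local reduce on the mapper side: one partial sum per key.
                sentToReducer.addAll(sumByKey(mapped).entrySet());
            } else {
                sentToReducer.addAll(mapped);
            }
        }
        // Final reduce over everything that arrived from all mappers.
        return sumByKey(sentToReducer);
    }

    public static void main(String[] args) {
        System.out.println(run(false, "hadoop map hadoop", "map hadoop reduce"));
        System.out.println(run(true, "hadoop map hadoop", "map hadoop reduce"));
    }
}
```

Both calls in main print the same totals, because summing partial sums gives the same answer as summing the raw <word, 1> pairs.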
For example, the mapper produces its output as <key, value> pairs, and this data has to be sent to the reducer over the network.
Before the reducer there is another phase, called shuffle & sort, which moves the map output to the reducers and sorts it by key in ascending order. Assume the map output is 100 MB and takes 60 sec to reach the reducer over the network.
When a combiner comes into the picture, the data may shrink, depending on the number of repeated keys. If there are repeated keys, the combiner runs the reducer program on each mapper's output, so the mapper output gets somewhat smaller, say 80 MB, and it then takes about 40 sec to reach the reducer. Performance improves because less data has to cross the network.
example:
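A small plain-Java sketch of the effect, counting records instead of megabytes. It assumes a word-count style job in which the mapper emits one <word, 1> pair per word and the combiner sums them, so one record per distinct key leaves each mapper. ShuffleVolume and its methods are hypothetical names, not part of Hadoop.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Counts how many records one mapper would ship to the reducers.
// Hypothetical helper for illustration only; it does not use the Hadoop API.
class ShuffleVolume {

    // Without a combiner, every <word, 1> pair emitted by the mapper is shipped.
    static int recordsWithoutCombiner(List<String> words) {
        return words.size();
    }

    // With a summing combiner, one <word, partialCount> pair per distinct key is shipped.
    static int recordsWithCombiner(List<String> words) {
        return new HashSet<>(words).size();
    }

    public static void main(String[] args) {
        List<String> repeated = Arrays.asList("a", "a", "a", "b", "b");
        List<String> unique = Arrays.asList("a", "b", "c");

        // Many repeated keys: 5 records shrink to 2.
        System.out.println(recordsWithoutCombiner(repeated) + " -> " + recordsWithCombiner(repeated));
        // No repeated keys: 3 records stay 3, so the combiner saves nothing.
        System.out.println(recordsWithoutCombiner(unique) + " -> " + recordsWithCombiner(unique));
    }
}
```

The saving depends entirely on how many keys repeat within a single mapper's output. With no repeated keys the combiner ships exactly as many records as before.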
How to use a combiner in a program?
It's very simple: just add job.setCombinerClass(<reducercode>.class); to the driver code. You can pass the reducer class or any custom class of your choice, as long as it takes and emits the mapper's output key and value types.
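job.setJarByClass(lengthofword.class);     // driver class, used to locate the job jar
job.setMapperClass(mapperProg.class);      // mapper
job.setCombinerClass(reducerProg.class);   // reducer class reused as the combiner
job.setReducerClass(reducerProg.class);    // reducer
job.setOutputKeyClass(Text.class);         // type of the output key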
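NOTE:
The combiner is not guaranteed to run. Hadoop treats it as an optimization and may call it zero, one, or several times on a mapper's output, so the job must produce the correct result even if the combiner never runs.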
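Because the combiner may or may not run, the job has to give the same answer either way. The plain-Java sketch below illustrates this under simple assumptions: a sum survives the combiner running zero, one, or two times, while an average reused as its own combiner does not. CombinerSafety and its methods are hypothetical names, and no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: checks whether a reduce function is safe to reuse as a combiner.
class CombinerSafety {

    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    static double average(List<Double> values) {
        double total = 0;
        for (double v : values) total += v;
        return total / values.size();
    }

    // Final sum when a summing combiner runs `runs` times on each mapper's output.
    static int totalWithSumCombiner(int runs, List<List<Integer>> mapperOutputs) {
        List<Integer> atReducer = new ArrayList<>();
        for (List<Integer> output : mapperOutputs) {
            List<Integer> current = output;
            for (int i = 0; i < runs; i++) {
                current = Collections.singletonList(sum(current));
            }
            atReducer.addAll(current);
        }
        return sum(atReducer);
    }

    // Final average when an averaging "combiner" is, or is not, applied once per mapper.
    static double averageWithAvgCombiner(boolean combinerRuns, List<List<Double>> mapperOutputs) {
        List<Double> atReducer = new ArrayList<>();
        for (List<Double> output : mapperOutputs) {
            if (combinerRuns) {
                atReducer.add(average(output));
            } else {
                atReducer.addAll(output);
            }
        }
        return average(atReducer);
    }

    public static void main(String[] args) {
        List<List<Integer>> ints = Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(10));
        for (int runs = 0; runs <= 2; runs++) {
            System.out.println("sum, combiner runs " + runs + " time(s): " + totalWithSumCombiner(runs, ints));
        }
        List<List<Double>> doubles = Arrays.asList(Arrays.asList(1.0, 2.0, 3.0), Arrays.asList(10.0));
        System.out.println("average, no combiner:   " + averageWithAvgCombiner(false, doubles));
        System.out.println("average, with combiner: " + averageWithAvgCombiner(true, doubles));
    }
}
```

The sum stays at 16 however often the combiner runs. The average is 4.0 without the combiner but becomes 6.0 with it, because an average of averages ignores how many values each mapper contributed. Reusing the reducer as the combiner is therefore only safe when the operation is associative and commutative, as a sum or a count is.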
If you like this post, please share it and give it a boost. Post by @SsaiK