hadoop - Controlling number of map and reduce jobs spawned?
I am trying to understand how many map and reduce tasks get started for a job, and how to control the number of them.
Say I have a 1 TB file in HDFS and the block size is 128 MB. If I run an MR job on this 1 TB file and specify an input split size of 256 MB, how many map and reduce tasks get started? My understanding is that the number of map tasks depends on the split size, i.e. number of map tasks = total size of file / split size, which in this case works out to 1024 * 1024 MB / 256 MB = 4096. So Hadoop would start 4096 map tasks.
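That arithmetic can be sketched as follows (a minimal standalone sketch of the split-counting rule, not actual Hadoop code; the class and method names are made up for illustration):

```java
// Hypothetical sketch: one map task per input split, with the last
// split possibly smaller than the rest (hence the ceiling division).
public class MapperCountSketch {
    static long numMappers(long fileSizeBytes, long splitSizeBytes) {
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long oneTb = 1024L * 1024 * 1024 * 1024; // 1 TB in bytes
        long split = 256L * 1024 * 1024;         // 256 MB split size
        System.out.println(numMappers(oneTb, split)); // prints 4096
    }
}
```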
1) Is that right?
2) If I think that is an inappropriate number, can I tell Hadoop to start fewer (or more) tasks? If yes, how?
And what about the number of reduce tasks spawned? I think that is totally controlled by the user.
3) How and where should I specify the number of reduce tasks required?
1. Yes, you're right. Number of mappers = (size of data) / (input split size). So in your case, 4096.
As per my understanding, before Hadoop 2.7 you could only hint to the system how many mappers to create with conf.setNumMapTasks(int num); the mappers were still created at the framework's discretion. From Hadoop 2.7 you can limit the number of concurrently running mappers with mapreduce.job.running.map.limit. See the JIRA ticket.
By default the number of reducers is 1. You can change it with job.setNumReduceTasks(int num);
You can also provide this parameter from the CLI: -Dmapred.reduce.tasks=&lt;num reduce tasks&gt;
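For example, a full invocation might look like this (the jar name, class, and paths are placeholders; note that the -D generic option only works if the driver goes through ToolRunner/GenericOptionsParser, and on newer Hadoop versions the property is named mapreduce.job.reduces):

```
# Placeholder jar/class/paths; sets the reducer count from the command line.
hadoop jar wordcount.jar WordCount \
    -Dmapred.reduce.tasks=10 \
    /input/path /output/path
```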