hadoop - Controlling number of map and reduce jobs spawned?


I am trying to understand how many map and reduce tasks get started for a job, and how to control the number of MR tasks.

Say I have a 1 TB file in HDFS and the block size is 128 MB. If I run an MR job on the 1 TB file and specify an input split size of 256 MB, how many map and reduce tasks get started? My understanding is that it depends on the split size, i.e. number of map tasks = total size of file / split size, which in this case works out to 1024 * 1024 MB / 256 MB = 4096. So Hadoop would start 4096 map tasks.

1) Is that right?

2) If I think that is an inappropriate number, can I tell Hadoop to start fewer (or more) tasks? If yes, how?

And how is the number of reduce tasks spawned determined? I think that is entirely controlled by the user.

3) How and where should I specify the number of reduce tasks required?

1. Yes, you're right. Number of mappers = (size of data) / (input split size). So in your case, 4096.
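The arithmetic above can be checked with a trivial sketch (plain Java, no Hadoop dependency; the sizes are the ones from the question):

```java
// Back-of-the-envelope check: mappers = (total data size) / (input split size).
public class MapperCount {
    public static void main(String[] args) {
        long fileSizeMb = 1024L * 1024L; // 1 TB expressed in MB
        long splitSizeMb = 256L;         // input split size in MB
        long mappers = fileSizeMb / splitSizeMb;
        System.out.println(mappers);     // prints 4096
    }
}
```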

2. As per my understanding, before Hadoop 2.7 you could only hint to the system how many mappers to create via conf.setNumMapTasks(int num); the framework decided the actual number of mappers on its own. From Hadoop 2.7 you can limit the number of concurrently running mappers with mapreduce.job.running.map.limit. See the JIRA ticket.
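As a configuration sketch (the value 100 is an arbitrary example, not a recommendation), the Hadoop 2.7+ property can be set per job or in mapred-site.xml:

```xml
<!-- Cap the number of map tasks allowed to run simultaneously for a job
     (Hadoop 2.7+). This limits concurrency; it does not change how many
     map tasks are created, which is still driven by the input splits. -->
<property>
  <name>mapreduce.job.running.map.limit</name>
  <value>100</value>
</property>
```

Note that to change the number of map tasks *created* (rather than merely running at once), you would instead adjust the input split size.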

3. By default the number of reducers is 1. You can change it with job.setNumReduceTasks(integer_number);

You can also provide it as a parameter on the CLI: -Dmapred.reduce.tasks=&lt;num reduce tasks&gt;
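A sketch of such an invocation (the jar name, driver class, and paths below are placeholders; mapred.reduce.tasks is the older, deprecated property name, and mapreduce.job.reduces is its current equivalent):

```shell
# Run a MapReduce job with 8 reducers set from the command line.
# Requires the driver to use ToolRunner/GenericOptionsParser so -D is honored.
hadoop jar my-job.jar com.example.MyDriver \
  -D mapreduce.job.reduces=8 \
  /input/path /output/path
```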
