Java - MapReduce reading ORC files - ClassNotFoundException: org.apache.hadoop.hive.common.io.DiskRange
I am new to Hadoop MapReduce and Hive.
Right now I need to read records from several ORC files (Hive tables), and I hope to read them directly, the way a normal MapReduce program reads text files; that is, I do not want to go through Hive inside the MapReduce job.
Hadoop version: Hadoop 2.6.0-cdh5.9.0; Hive version: Hive 1.1.0-cdh5.9.0
When I ran the program, I got the error below:
    Error: java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.io.DiskRange
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at org.apache.orc.OrcFile.createReader(OrcFile.java:251)
        at org.apache.orc.mapreduce.OrcInputFormat.createRecordReader(OrcInputFormat.java:64)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:515)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:758)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
I am curious why the error message says "org.apache.hadoop.hive.common.io.DiskRange ClassNotFoundException", and how can I fix the problem so that the content of the ORC files is read successfully?
Currently, my Java code is shown below.
Imports:
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.orc.mapred.OrcStruct;
    import org.apache.orc.mapreduce.OrcInputFormat;
Configuration in main:
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "test count");
        job.setJarByClass(DoPersona.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setInputFormatClass(OrcInputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
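For context: org.apache.hadoop.hive.common.io.DiskRange is not part of the orc-mapreduce jar itself but ships in Hive's own jars (hive-storage-api / hive-common), so the stack trace suggests the task JVMs cannot see that jar. One common way to ship extra jars to the tasks (a sketch under that assumption, not a confirmed fix for this cluster) is to submit the job through ToolRunner so that the generic -libjars option is honored:

    // In addition to the imports listed above:
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class DoPersona extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // Same job wiring as main() above, but using the injected configuration
            Job job = Job.getInstance(getConf(), "test count");
            job.setJarByClass(DoPersona.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setInputFormatClass(OrcInputFormat.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner strips generic options such as -libjars before run() sees args,
            // and ships the listed jars to every task's classpath.
            System.exit(ToolRunner.run(new Configuration(), new DoPersona(), args));
        }
    }

The job would then be invoked with something like "hadoop jar mycount.jar DoPersona -libjars <path-to-hive-storage-api-or-hive-common-jar> <input> <output>" (the jar path is a placeholder; building a fat jar that bundles the Hive/ORC dependencies would achieve the same thing).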
Mapper

The mapper is modified from the official word count example:
    public static class TokenizerMapper extends Mapper<NullWritable, OrcStruct, Text, Text> {
        public void map(NullWritable key, OrcStruct value, Context context)
                throws IOException, InterruptedException {
            if (value.getNumFields() == 4) {
                Text rid = (Text) value.getFieldValue(0);
                Text mac = (Text) value.getFieldValue(1);
                Text dm = (Text) value.getFieldValue(2);
                Text mdm = (Text) value.getFieldValue(3);
                context.write(mac, mdm);
            }
        }
    }
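As an aside, getFieldValue can return null for null columns, and the cast to Text only holds when the underlying Hive column is a string. A slightly more defensive variant of the same map body (a sketch, assuming the same four-column table) would be:

    public void map(NullWritable key, OrcStruct value, Context context)
            throws IOException, InterruptedException {
        if (value.getNumFields() != 4) {
            return; // skip rows that do not match the expected schema
        }
        Object mac = value.getFieldValue(1);
        Object mdm = value.getFieldValue(3);
        // Null columns come back as null, and non-string columns are not Text,
        // so convert via toString() instead of a hard cast.
        if (mac != null && mdm != null) {
            context.write(new Text(mac.toString()), new Text(mdm.toString()));
        }
    }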
Reducer

The reducer only needs to print out the records:
    public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text val : values) {
                context.write(key, val);
            }
        }
    }
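Since this reducer just echoes every value, the same records could also be produced by a map-only job (a design alternative, not part of the original code), which avoids the shuffle entirely:

    // Instead of job.setReducerClass(IntSumReducer.class):
    job.setNumReduceTasks(0); // mapper output is written straight to the output files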