hadoop - How to save a text file to Hive using the table of contents as schema


January 15, 2014

I have many project reports in text format (Word, PDF). These files contain data I want to extract, such as references, keywords, and names mentioned.

I want to process these files with Apache Spark and save the result to Hive, using the power of DataFrames (with the table of contents as the schema). Is this possible?

Could you share some ideas on how to process these files?

As far as I understand, you need to parse the files using Tika and manually create a custom schema, as described here.
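A rough sketch of that idea in PySpark, in case it helps. It is only an illustration under several assumptions: the section names in `SECTIONS`, the helper `split_sections`, the function `save_reports_to_hive`, and the table name `reports` are all made up for this example; the `tika` Python package and a Spark build with Hive support are assumed to be installed; and the reports are assumed to have heading lines that match the table of contents entries exactly. Parsing happens on the driver here, which is probably fine for a modest number of files but not for a large corpus.

```python
# Hypothetical section headings taken from the reports' table of contents;
# each one becomes a nullable string column in the Hive table.
SECTIONS = ["keywords", "references"]


def split_sections(text, sections=SECTIONS):
    """Split plain report text into {section_name: body} using heading lines."""
    row = {name: None for name in sections}
    current, buffer = None, []
    for line in text.splitlines():
        heading = line.strip().lower().rstrip(":")
        if heading in row:
            # A new known heading closes the section collected so far.
            if current is not None:
                row[current] = "\n".join(buffer).strip()
            current, buffer = heading, []
        elif current is not None:
            buffer.append(line)
    if current is not None:
        row[current] = "\n".join(buffer).strip()
    return row


def save_reports_to_hive(paths, table="reports"):
    """Extract text with Tika, build one row per file, append to a Hive table."""
    # Imported lazily so split_sections can be used without Spark or Tika.
    from tika import parser  # tika-python package (assumed installed)
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Custom schema built by hand: the file path plus one column per section.
    schema = StructType(
        [StructField("path", StringType(), False)]
        + [StructField(name, StringType(), True) for name in SECTIONS]
    )

    rows = []
    for path in paths:
        # Tika returns a dict; "content" holds the extracted text (may be None).
        text = parser.from_file(path).get("content") or ""
        sections = split_sections(text)
        rows.append((path, *[sections[name] for name in SECTIONS]))

    spark.createDataFrame(rows, schema).write.mode("append").saveAsTable(table)
```

Calling something like `save_reports_to_hive(["report1.pdf", "report2.docx"])` (file names are placeholders) would then append one row per report. Sections missing from a file end up as NULL in that row, so the schema stays fixed even when reports differ.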

Let me know if this helps. Cheers.
