hadoop - How to save a text file to Hive using the table of contents as schema
I have many project reports in text format (Word, PDF). These files contain data that I want to extract, such as references, keywords, names mentioned, ...
I want to process these files with Apache Spark and save the result to Hive, using the power of DataFrames (with the table of contents as the schema). Is this possible?
Could you share some ideas on how to process these files?
As far as I understand, you need to parse the files using Tika and manually create a custom schema as described here.
Let me know if this helps. Cheers.
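A rough sketch of what that could look like, in case it is useful. It assumes the text has already been extracted from each report (for example with Tika) and that every table-of-contents heading appears on its own line in that text. The function names `column_name`, `split_by_headings` and `save_to_hive` are made up for illustration, and all columns are stored as plain strings, so treat this as a starting point rather than a finished solution.

```python
import re


def column_name(heading):
    """Turn a table-of-contents heading into a Hive-friendly column name."""
    return re.sub(r"[^0-9a-z]+", "_", heading.lower()).strip("_")


def split_by_headings(text, headings):
    """Split extracted text into one field per heading.

    Assumption: each heading sits alone on its own line in the text.
    Lines before the first known heading are ignored.
    """
    wanted = {h.strip().lower(): column_name(h) for h in headings}
    sections = {col: [] for col in wanted.values()}
    current = None
    for line in text.splitlines():
        key = line.strip().lower()
        if key in wanted:
            current = wanted[key]          # a new section starts here
        elif current is not None:
            sections[current].append(line)  # body line of the current section
    return {col: "\n".join(lines).strip() for col, lines in sections.items()}


def save_to_hive(rows, columns, table_name):
    """Write parsed rows to a Hive table.

    Needs pyspark and a Spark installation with Hive support;
    the imports are local so the parsing code above works without them.
    """
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    # Custom schema built by hand from the headings, one string column each.
    schema = StructType([StructField(c, StringType(), True) for c in columns])
    data = [tuple(row[c] for c in columns) for row in rows]
    spark.createDataFrame(data, schema).write.mode("append").saveAsTable(table_name)
```

You would call `split_by_headings` once per report, collect the resulting dictionaries in a list, and pass that list to `save_to_hive` together with the column names and a table name of your choice.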