hadoop - How to save a text file to Hive using the table of contents as schema -


I have many project reports in document formats (Word, PDF). These files contain data that I want to extract, such as references, keywords, and names mentioned.

I want to process these files with Apache Spark and save the result to Hive, so that I can use the power of DataFrames (with the table of contents as the schema). Is this possible?

Could you share some ideas on how to process these files?

As far as I understand, you need to parse the files using Tika and manually create a custom schema, as described here.
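A minimal sketch of that idea in Python, under several assumptions: the table-of-contents headings are already known and each appears on its own line in the extracted text, the third-party `tika` package (tika-python) and PySpark are installed, and every section is stored as a plain string column. The helper names (`to_column_name`, `split_by_headings`, `extract_text`, `save_rows_to_hive`) are hypothetical and only illustrate the flow.

```python
import re


def to_column_name(heading):
    """Turn a heading such as 'Names Mentioned' into a column name."""
    return re.sub(r"[^0-9a-z]+", "_", heading.lower()).strip("_")


def split_by_headings(text, headings):
    """Map each heading to the text that follows it, up to the next heading.

    Assumes each heading sits alone on its own line (case-insensitive match).
    Sections that never appear in the text stay None.
    """
    row = {to_column_name(h): None for h in headings}
    current = None
    buffer = []
    for line in text.splitlines():
        stripped = line.strip().lower()
        match = next((h for h in headings if stripped == h.lower()), None)
        if match is not None:
            # Close the previous section before starting a new one.
            if current is not None:
                row[current] = "\n".join(buffer).strip()
            current = to_column_name(match)
            buffer = []
        elif current is not None:
            buffer.append(line)
    if current is not None:
        row[current] = "\n".join(buffer).strip()
    return row


def extract_text(path):
    """Extract plain text from a Word or PDF file with Apache Tika.

    Assumes the tika-python package and a Java runtime are available.
    """
    from tika import parser  # imported lazily so the rest runs without Tika

    parsed = parser.from_file(path)
    return parsed.get("content") or ""


def save_rows_to_hive(rows, headings, table_name):
    """Build a custom string schema from the headings and append to Hive."""
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType, StructField, StructType

    columns = [to_column_name(h) for h in headings]
    schema = StructType([StructField(c, StringType(), True) for c in columns])
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    # Convert dict rows to tuples in schema order.
    data = [tuple(row.get(c) for c in columns) for row in rows]
    spark.createDataFrame(data, schema).write.mode("append").saveAsTable(table_name)
```

For example, `split_by_headings(extract_text(path), ["References", "Keywords"])` would produce one row per report, and a list of such rows could then be passed to `save_rows_to_hive` together with the same headings and a table name of your choice. Real reports will likely need a more tolerant heading match (numbering, page numbers in the table of contents itself), so treat the matching rule as a starting point.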

Let me know if this helps. Cheers.

