How to read a JSON file in Spark using Scala?


I want to read a JSON file in the format below:

{
  "titlename": "periodic",
  "atom": [
    {
      "usage": "neutron",
      "dailydata": [
        {
          "utcacquisitiontime": "2017-03-27t22:00:00z",
          "datatimezone": "+02:00",
          "intervalvalue": 28128,
          "intervaltime": 15
        },
        {
          "utcacquisitiontime": "2017-03-27t22:15:00z",
          "datatimezone": "+02:00",
          "intervalvalue": 25687,
          "intervaltime": 15
        }
      ]
    }
  ]
}

I am reading it with this line:

sqlContext.read.json("user/files_fold/testing-data.json").printSchema

but I am not getting the desired result:

root
 |-- _corrupt_record: string (nullable = true)

Please help me with this.

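By default, Spark's JSON reader expects one complete JSON object per line, so a document spread over several lines is parsed as corrupt records. I suggest using wholeTextFiles to read the whole file, then applying a few string functions to convert it to single-line JSON.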

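// wholeTextFiles returns (path, content) pairs; keep the content and drop the newlines
val json = sc.wholeTextFiles("/user/files_fold/testing-data.json").
  map(tuple => tuple._2.replace("\n", "").trim)

// read the RDD of single-line JSON strings
val df = sqlContext.read.json(json)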
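Stripping newlines this way should not change any values, because a JSON string cannot contain a raw newline character; a newline inside a value is written as the two characters backslash and n, which replace("\n", "") does not match. Here is a minimal sketch of that step in plain Scala, without Spark. The object name FlattenJson, the helper toSingleLine and the "note" field are made up for illustration.

```scala
object FlattenJson {
  // Hypothetical helper: the same replace-and-trim used inside the map call.
  def toSingleLine(raw: String): String =
    raw.replace("\n", "").trim

  def main(args: Array[String]): Unit = {
    // A pretty-printed document. Triple-quoted strings do not process
    // escapes, so the value of "note" holds a backslash followed by n,
    // exactly as it would appear in a JSON file on disk.
    val pretty =
      """{
        |  "titlename": "periodic",
        |  "note": "line1\nline2"
        |}
        |""".stripMargin

    val flat = toSingleLine(pretty)

    // Real line breaks are gone; the escaped one inside the value is kept.
    println(flat)
  }
}
```

In the Spark snippet, the same replace-and-trim runs on the contents of each file inside map.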

You should end up with a valid DataFrame:

+--------------------------------------------------------------------------------------------------------+---------+
|atom                                                                                                    |titlename|
+--------------------------------------------------------------------------------------------------------+---------+
|[[WrappedArray([+02:00,15,28128,2017-03-27t22:00:00z], [+02:00,15,25687,2017-03-27t22:15:00z]),neutron]]|periodic |
+--------------------------------------------------------------------------------------------------------+---------+

and a valid schema:

root
 |-- atom: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- dailydata: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- datatimezone: string (nullable = true)
 |    |    |    |    |-- intervaltime: long (nullable = true)
 |    |    |    |    |-- intervalvalue: long (nullable = true)
 |    |    |    |    |-- utcacquisitiontime: string (nullable = true)
 |    |    |-- usage: string (nullable = true)
 |-- titlename: string (nullable = true)
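If you are on Spark 2.2 or later, the JSON reader also has a multiLine option that accepts a document spanning several lines, so the manual flattening may not be needed. Below is a minimal sketch, not tested against your cluster. It assumes a SparkSession is available; the object name, the readMultiLine helper, the app name and the local master are placeholders.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadMultiLineJson {
  // Hypothetical helper: read JSON where one document may span several lines.
  // Assumes Spark 2.2+, where the "multiLine" reader option is available.
  def readMultiLine(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("multiLine", true)
      .json(path)

  def main(args: Array[String]): Unit = {
    // App name and local master are placeholders; in spark-shell the
    // predefined `spark` session can be used instead.
    val spark = SparkSession.builder()
      .appName("read-multiline-json")
      .master("local[*]")
      .getOrCreate()

    val df = readMultiLine(spark, "/user/files_fold/testing-data.json")

    df.printSchema()   // schema inferred from the nested document
    df.show(false)     // false = do not truncate long cell values

    spark.stop()
  }
}
```

On older versions that lack this option, the wholeTextFiles approach is the way to go.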
