How to read a JSON file in Spark using Scala?
I want to read a JSON file in the format below:
    {
      "titlename": "periodic",
      "atom": [
        {
          "usage": "neutron",
          "dailydata": [
            {
              "utcacquisitiontime": "2017-03-27t22:00:00z",
              "datatimezone": "+02:00",
              "intervalvalue": 28128,
              "intervaltime": 15
            },
            {
              "utcacquisitiontime": "2017-03-27t22:15:00z",
              "datatimezone": "+02:00",
              "intervalvalue": 25687,
              "intervaltime": 15
            }
          ]
        }
      ]
    }

I am reading it with this line:
    sqlContext.read.json("user/files_fold/testing-data.json").printSchema

but I am not getting the desired result:
    root
     |-- _corrupt_record: string (nullable = true)

Please help me with this.
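I suggest using wholeTextFiles to read the file and then applying a few functions to collapse it into single-line JSON. By default, Spark's JSON reader expects one complete JSON object per line, so a document spread over several lines ends up in _corrupt_record.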
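    // wholeTextFiles yields (path, content) pairs; keep the content and drop the newlines
    val json = sc.wholeTextFiles("/user/files_fold/testing-data.json").
      map(tuple => tuple._2.replace("\n", "").trim)

    // each RDD element is now one single-line JSON document
    val df = sqlContext.read.json(json)

You should end up with a valid DataFrame: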
    +--------------------------------------------------------------------------------------------------------+---------+
    |atom                                                                                                    |titlename|
    +--------------------------------------------------------------------------------------------------------+---------+
    |[[WrappedArray([+02:00,15,28128,2017-03-27t22:00:00z], [+02:00,15,25687,2017-03-27t22:15:00z]),neutron]]|periodic |
    +--------------------------------------------------------------------------------------------------------+---------+

and a valid schema:
    root
     |-- atom: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- dailydata: array (nullable = true)
     |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |-- datatimezone: string (nullable = true)
     |    |    |    |    |-- intervaltime: long (nullable = true)
     |    |    |    |    |-- intervalvalue: long (nullable = true)
     |    |    |    |    |-- utcacquisitiontime: string (nullable = true)
     |    |    |-- usage: string (nullable = true)
     |-- titlename: string (nullable = true)
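
As a possible alternative, if you are on Spark 2.2 or later, the JSON reader has a multiLine option that should parse a multi-line document directly, without going through wholeTextFiles. The following is only a minimal sketch under that assumption: the object name MultiLineJsonExample, the helper readMultiLineJson, and the local[*] master are mine for illustration and do not come from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiLineJsonExample {

  // Hypothetical helper: reads a JSON document that spans several lines.
  // Assumes Spark 2.2+, where the "multiLine" reader option is available.
  def readMultiLineJson(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("multiLine", true) // parse each file as one JSON document
      .json(path)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; on a cluster the
    // master is normally supplied by spark-submit instead.
    val spark = SparkSession.builder()
      .appName("read-multiline-json")
      .master("local[*]")
      .getOrCreate()

    val df = readMultiLineJson(spark, "/user/files_fold/testing-data.json")
    df.printSchema()
    df.show(false) // false = do not truncate long cell values

    spark.stop()
  }
}
```

On older versions (sqlContext without this option) the wholeTextFiles approach above is still the way to go.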