json - Python: json_normalize a pandas series gives TypeError -

January 15, 2011

i have tens of thousands rows of json snippets in pandas series df["json"]

[{     'ids': [{         'lotid': '1',         'id': '123456'     }],     'date': '2009-04-17',     'bidscount': 2, }, {     'ids': [{         'lotid': '2',         'id': '123456'     }],     'date': '2009-04-17',     'bidscount': 4, }, {     'ids': [{          'lotid': '3',          'id': '123456'     }],     'date': '2009-04-17',     'bidscount': 8, }]

sample of original file:

{"type": "open","title": "rainbow","json": [{"ids": [{"lotid": "1","id": "123456"}],"date": "2009-04-17","bidscount": 2,}, {"ids": [{"lotid": "2","id": "123456"}],"date": "2009-04-17","bidscount": 4,}, {"ids": [{"lotid": "3","id": "123456"}],"date": "2009-04-17","bidscount": 8,}]} {"type": "closed","title": "clouds","json": [{"ids": [{"lotid": "1","id": "23345"}],"date": "2009-05-17","bidscount": 2,}, {"ids": [{"lotid": "2","id": "23345"}],"date": "2009-05-17","bidscount": 4,}, {"ids": [{"lotid": "3","id": "23345"}],"date": "2009-05-17","bidscount": 8,}]}   df = pd.read_json("file.json", lines=true)

i trying make them data frame, like

id      lotid      bidscount    date 123456  1          2            2009-04-17 123456  2          4            2009-04-17 123456  3          8            2009-04-17

by using

json_normalize(df["json"])

however get

attributeerror: 'list' object has no attribute 'values'

i guess json snippet seen list, can not figure out how make work otherwise. appreciated!

i think df['json'] nested list. can use loop , concatenate dataframe big dataframe i.e

data:

{"type": "open","title": "rainbow","json": [{"ids": [{"lotid": "1","id": "123456"}],"date": "2009-04-17","bidscount": 2,}, {"ids": [{"lotid": "2","id": "123456"}],"date": "2009-04-17","bidscount": 4,}, {"ids": [{"lotid": "3","id": "123456"}],"date": "2009-04-17","bidscount": 8,}]} {"type": "closed","title": "clouds","json": [{"ids": [{"lotid": "1","id": "23345"}],"date": "2009-05-17","bidscount": 2,}, {"ids": [{"lotid": "2","id": "23345"}],"date": "2009-05-17","bidscount": 4,}, {"ids": [{"lotid": "3","id": "23345"}],"date": "2009-05-17","bidscount": 8,}]}  df = pd.read_json("file.json", lines=true)

dataframe:

new_df = pd.concat([pd.dataframe(json_normalize(x)) x in df['json']],ignore_index=true)

output:

                                 ids  bidscount        date 0  [{'id': '123456', 'lotid': '1'}]          2  2009-04-17 1  [{'id': '123456', 'lotid': '2'}]          4  2009-04-17 2  [{'id': '123456', 'lotid': '3'}]          8  2009-04-17 3   [{'id': '23345', 'lotid': '1'}]          2  2009-05-17 4   [{'id': '23345', 'lotid': '2'}]          4  2009-05-17 5   [{'id': '23345', 'lotid': '3'}]          8  2009-05-17

if want keys of ids columns use

new_df['lotid'] = [x[0]['lotid'] x in new_df['ids']] new_df['ids'] = [x[0]['id'] x in new_df['ids']]

       ids  bidscount        date lotid 0  123456          2  2009-04-17     1 1  123456          4  2009-04-17     2 2  123456          8  2009-04-17     3 3   23345          2  2009-05-17     1 4   23345          4  2009-05-17     2 5   23345          8  2009-05-17     3

Search This Blog

RT

json - Python: json_normalize a pandas series gives TypeError -

Comments

Post a Comment

Popular posts from this blog

html - How to custom Bootstrap grid height? -

javascript - pass values from mssql to views in node -

ruby - unknown property method: 'wait' on EC2 windows server Instance -