python - Pandas to HDF5 index entropy/quality, converting a Pandas DataFrame to a PyTables object
We have a pandas DataFrame with a few million rows and 1000 columns, and we are interested in storing it in PyTables HDF5 format in order to enjoy its fast read/write capabilities and to query the data. Since our table is big (a few million rows, 1000 columns), there are 2 columns we want to index for querying, which means what matters to us is the time it takes to read the file.
Pandas has a method for this - https://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.to_hdf.html -
which we can use as follows, where 'col_a' and 'col_b' should be indexed.
Example: df.to_hdf('filename', key='dataset', format='table', append=True, data_columns=['col_a', 'col_b'])
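To make the example above concrete, here is a minimal, runnable sketch of the kind of write and query we do (the file name 'filename.h5', the key 'dataset' and the toy DataFrame are just placeholders for our real data):

    import numpy as np
    import pandas as pd

    # Toy stand-in for our real DataFrame (a few million rows, 1000 columns).
    df = pd.DataFrame(np.random.randn(10_000, 4),
                      columns=['col_a', 'col_b', 'col_c', 'col_d'])

    # format='table' makes the dataset queryable; data_columns stores col_a and
    # col_b as separate, indexed columns so they can be used in 'where' clauses.
    df.to_hdf('filename.h5', key='dataset', format='table', append=True,
              data_columns=['col_a', 'col_b'])

    # Reading back only the rows that match a condition on the indexed columns:
    subset = pd.read_hdf('filename.h5', 'dataset', where='col_a > 0 & col_b < 0')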
The thing is, the performance is not good enough for us, and we read the following in the PyTables documentation -
http://www.pytables.org/usersguide/libref/structured_storage.html#tables.column.create_index - quote:
"The optimization level for building the index. The levels range from 0 (no optimization) up to 9 (maximum optimization). Higher levels of optimization mean better chances of reducing the entropy of the index at the price of using more CPU, memory and I/O resources for creating the index."
So, our questions are:
1) What optimization level does pandas use when it writes an HDF5 table using PyTables?
2) If the optimization is not maximal, how can we set it? We can think of converting the pandas DataFrame object into a PyTables object and configuring that PyTables object with the desired index performance, as sketched below, but we didn't see documentation on how to do this other than saving the file with pandas and reading it back with PyTables.
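For illustration, the sketch below is roughly the kind of control we are hoping for: appending through pandas' HDFStore with index=False and then calling create_table_index, which seems to forward optlevel and kind down to PyTables (the file name, key and column names are placeholders); we are not sure whether this is the intended route, or whether it gives the same result as configuring the PyTables object directly.

    import numpy as np
    import pandas as pd

    # Placeholder DataFrame standing in for our real data.
    df = pd.DataFrame(np.random.randn(10_000, 4),
                      columns=['col_a', 'col_b', 'col_c', 'col_d'])

    with pd.HDFStore('filename.h5') as store:
        # index=False skips building the PyTables index during the write itself.
        store.append('dataset', df, format='table',
                     data_columns=['col_a', 'col_b'], index=False)

        # create_table_index passes optlevel/kind through to PyTables' create_index.
        store.create_table_index('dataset', columns=['col_a', 'col_b'],
                                 optlevel=9, kind='full')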
Thank you.