python - Pandas to HDF5 index entropy/quality, converting a Pandas DataFrame to a PyTables object -


We have a pandas DataFrame with a few million rows and 1000 columns. We are interested in storing the DataFrame in PyTables HDF5 format in order to enjoy its fast read/write capabilities and to query the data. Because our table is big (a few million rows, 1000 columns), there are 2 columns we want indexed in order to query on them, and that indexing determines how long it takes to read from the file.

So there is the pandas method DataFrame.to_hdf - https://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.to_hdf.html -

which we can use as follows, where 'col_a' and 'col_b' should be indexed:

example: df.to_hdf('filename', key='dataset', format='table', append=True, data_columns=['col_a', 'col_b'])
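For reference, a minimal, self-contained sketch of that workflow; the file name 'filename.h5', the toy data and the where clause are only placeholders for illustration, not part of our real setup:

    import numpy as np
    import pandas as pd

    # toy DataFrame standing in for the real table with millions of rows and 1000 columns
    df = pd.DataFrame({
        'col_a': np.random.randint(0, 100, size=10_000),
        'col_b': np.random.randint(0, 100, size=10_000),
        'value': np.random.randn(10_000),
    })

    # table format + data_columns makes col_a / col_b individually queryable (and indexed)
    df.to_hdf('filename.h5', key='dataset', format='table', append=True,
              data_columns=['col_a', 'col_b'])

    # read back only the matching rows instead of the whole file
    subset = pd.read_hdf('filename.h5', key='dataset', where='col_a > 50 & col_b < 10')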

The thing is, this performance is not good enough for us, and we read in the PyTables documentation -

http://www.pytables.org/usersguide/libref/structured_storage.html#tables.column.create_index - the following quote:

"the optimization level building index. levels ranges 0 (no optimization) 9 (maximum optimization). higher levels of optimization mean better chances reducing entropy of index @ price of using more cpu, memory , i/o resources creating index."

So, our questions are:

1) What optimization level does pandas use when writing an HDF5 table via PyTables?

2) If the optimization is not maximal, how can we set it? We can think of converting the pandas DataFrame object to a PyTables object and configuring that object with the desired index performance, but we didn't see any documentation on how to do that besides saving the file and reading it back with PyTables.
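One direction we are considering (only a sketch under our assumptions, not a confirmed recipe) is to stay in pandas: skip index creation while appending by passing index=False, then build the indexes once at the end through HDFStore.create_table_index, which forwards optlevel and kind to PyTables:

    import numpy as np
    import pandas as pd

    # toy stand-in for the real DataFrame
    df = pd.DataFrame({'col_a': np.random.randint(0, 100, size=10_000),
                       'col_b': np.random.randint(0, 100, size=10_000),
                       'value': np.random.randn(10_000)})

    # append without building the column indexes on every write
    df.to_hdf('filename.h5', key='dataset', format='table', append=True,
              index=False, data_columns=['col_a', 'col_b'])

    # build the indexes once, at the optimization level we want
    with pd.HDFStore('filename.h5') as store:
        store.create_table_index('dataset', columns=['col_a', 'col_b'],
                                 optlevel=9, kind='full')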

Thank you.

