python - pandas read_csv usecols and names out of sync

June 15, 2013

When trying to read columns by index from a tabular file, pandas read_csv seems to have usecols and names out of sync with each other.

For example, given the file test.csv (tab-separated, no header row):

    foo  a  -46450.494736   0.0728830817231
    foo  a  -46339.7126846  0.0695018062805
    foo  a  -46322.4942905  0.0866205763556
    foo  b  -46473.3117983  0.0481618121947
    foo  b  -46537.6827055  0.0436893868921
    foo  b  -46467.2102205  0.0485001911304
    bar  c  -33424.1224914  6.7981041851
    bar  c  -33461.4101485  7.40607068177
    bar  c  -33404.6396495  4.72117502707

and trying to read 3 columns by index without preserving the original order:

    import pandas as pd

    cols = [1, 2, 0]
    names = ['x', 'y', 'z']

    df = pd.read_csv('test.csv', sep='\t',
                     header=None,
                     index_col=None,
                     usecols=cols, names=names)

I'm getting the following DataFrame:

         x  y             z
    0  foo  a -46450.494736
    1  foo  a -46339.712685
    2  foo  a -46322.494290
    3  foo  b -46473.311798
    4  foo  b -46537.682706
    5  foo  b -46467.210220
    6  bar  c -33424.122491
    7  bar  c -33461.410148
    8  bar  c -33404.639650

whereas I would expect column z to have foo and bar, like this:

         z  x             y
    0  foo  a -46450.494736
    1  foo  a -46339.712685
    2  foo  a -46322.494290
    3  foo  b -46473.311798
    4  foo  b -46537.682706
    5  foo  b -46467.210220
    6  bar  c -33424.122491
    7  bar  c -33461.410148
    8  bar  c -33404.639650

I know pandas stores DataFrames as a dictionary, so the order of the columns may be different from the one requested with usecols, but the problem here is that using usecols with indices together with names doesn't make sense.

I need to read the columns by their indices and then assign names to them. Is there any workaround for this?

The documentation could be clearer on this (feel free to open an issue, or better yet submit a pull request!), but usecols is set-like: it does not define the order of the columns, it is only tested against for membership.

    from io import StringIO

    pd.read_csv(StringIO("""a,b,c
    1,2,3
    4,5,6"""), usecols=[0, 1, 2])

    Out[31]:
       a  b  c
    0  1  2  3
    1  4  5  6

    pd.read_csv(StringIO("""a,b,c
    1,2,3
    4,5,6"""), usecols=[2, 1, 0])

    Out[32]:
       a  b  c
    0  1  2  3
    1  4  5  6

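names, on the other hand, is ordered. So in this case, the answer is to specify the names in the order you want them, that is, in the order the selected columns appear in the file: names=['z', 'x', 'y'].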
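If the indices and names have to stay paired as in the question (cols[i] goes with names[i]), one possible workaround is to sort the pairs by index before calling read_csv and reorder the columns afterwards. The following is only a minimal sketch: read_columns_by_index is a hypothetical helper, not part of pandas, and the inline DATA string is a stand-in for test.csv built from three of the rows shown in the question.

```python
from io import StringIO

import pandas as pd

# Stand-in for test.csv (assumption: tab-separated, no header row).
DATA = (
    "foo\tb\t-46473.3117983\t0.0481618121947\n"
    "foo\tb\t-46537.6827055\t0.0436893868921\n"
    "bar\tc\t-33424.1224914\t6.7981041851\n"
)


def read_columns_by_index(source, cols, names, **kwargs):
    """Hypothetical helper: read the columns at positions `cols` and label
    them so that cols[i] gets names[i], whatever the order of `cols`."""
    # usecols is set-like, so read_csv returns the selected columns in file
    # order. Sort the (index, name) pairs by index so the names line up.
    pairs = sorted(zip(cols, names), key=lambda pair: pair[0])
    file_order_cols = [col for col, _ in pairs]
    file_order_names = [name for _, name in pairs]

    df = pd.read_csv(
        source,
        header=None,
        usecols=file_order_cols,
        names=file_order_names,
        **kwargs,
    )
    # Put the columns back in the order the caller listed the names.
    return df[list(names)]


df = read_columns_by_index(
    StringIO(DATA), cols=[1, 2, 0], names=["x", "y", "z"], sep="\t"
)
print(df)
```

With this sketch, column z holds the foo/bar values from file column 0, x holds file column 1, and y holds the numbers from file column 2, while the DataFrame columns come back in the x, y, z order that was passed in.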
