python - pandas read_csv usecols and names out of sync
When trying to read columns by their indices from a tabular file with pandas read_csv, it seems that usecols and names get out of sync with each other.
For example, having the file test.csv:
    foo	a	-46450.494736	0.0728830817231
    foo	a	-46339.7126846	0.0695018062805
    foo	a	-46322.4942905	0.0866205763556
    foo	b	-46473.3117983	0.0481618121947
    foo	b	-46537.6827055	0.0436893868921
    foo	b	-46467.2102205	0.0485001911304
    bar	c	-33424.1224914	6.7981041851
    bar	c	-33461.4101485	7.40607068177
    bar	c	-33404.6396495	4.72117502707
and trying to read 3 columns by index without preserving the original order:
    import pandas as pd

    cols = [1, 2, 0]
    names = ['x', 'y', 'z']
    df = pd.read_csv(
        'test.csv', sep='\t',
        header=None, index_col=None,
        usecols=cols, names=names)
I'm getting the following DataFrame:
         x  y             z
    0  foo  a -46450.494736
    1  foo  a -46339.712685
    2  foo  a -46322.494290
    3  foo  b -46473.311798
    4  foo  b -46537.682706
    5  foo  b -46467.210220
    6  bar  c -33424.122491
    7  bar  c -33461.410148
    8  bar  c -33404.639650
whereas I expect column z to have foo and bar, like this:
         z  x             y
    0  foo  a -46450.494736
    1  foo  a -46339.712685
    2  foo  a -46322.494290
    3  foo  b -46473.311798
    4  foo  b -46537.682706
    5  foo  b -46467.210220
    6  bar  c -33424.122491
    7  bar  c -33461.410148
    8  bar  c -33404.639650
I know that pandas stores DataFrames as dictionaries, so the order of the columns may be different from the requested usecols, but the problem here is that using usecols with indices together with names then doesn't make sense.
I need to read columns by their indices and assign names to them. Is there any workaround for this?
The documentation could be clearer on this (feel free to make an issue, or even better submit a pull request!), but usecols is set-like - it does not define the order of the columns, it is simply tested against for membership.
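    from io import StringIO
    import pandas as pd

    # Columns requested in file order
    pd.read_csv(StringIO("""a,b,c
    1,2,3
    4,5,6"""), usecols=[0, 1, 2])
    Out[31]:
       a  b  c
    0  1  2  3
    1  4  5  6

    # Same columns requested in reverse order: the result is identical
    pd.read_csv(StringIO("""a,b,c
    1,2,3
    4,5,6"""), usecols=[2, 1, 0])
    Out[32]:
       a  b  c
    0  1  2  3
    1  4  5  6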
names, on the other hand, is ordered. So in this case, the answer is to specify the names in the order in which the columns appear in the file.
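Applied to the question, this could look like the minimal sketch below. It assumes a two-row, in-memory stand-in for test.csv (the `data` string is made up for illustration) and introduces a helper variable, `names_in_file_order`, that is not part of the original post. The names are sorted by file position before being passed to read_csv, and the requested column order is restored afterwards by plain column selection.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for test.csv: tab-separated, no header row.
data = (
    "foo\ta\t-46450.494736\t0.0728830817231\n"
    "bar\tc\t-33424.1224914\t6.7981041851\n"
)

cols = [1, 2, 0]          # file positions, in the order we want them
names = ["x", "y", "z"]   # desired name for each entry in cols

# usecols only selects columns; they come back in file order,
# so pass the names sorted by file position.
names_in_file_order = [name for _, name in sorted(zip(cols, names))]

df = pd.read_csv(
    StringIO(data),
    sep="\t",
    header=None,
    usecols=cols,
    names=names_in_file_order,
)

# Restore the requested column order after parsing.
df = df[names]
print(df)
```

With this arrangement column z holds the values from file column 0 (foo, bar), and the columns appear in the order x, y, z.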