Python: What is an efficient way to get the top N numeric pandas columns with the highest frequency of a particular number?
I am trying to get the top N numeric columns with the highest frequency of 1s (the only other value being 0). I understand the easiest way is to sum over the numeric columns and sort them, but is there a more Pythonic or efficient way to achieve this?
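For reference, the sum-and-sort baseline described above might look like the following minimal sketch. It assumes the 0/1 columns are the only numeric columns; the helper name `top_n_by_sort` and the small sample frame are hypothetical, not taken from the question.

```python
import pandas as pd


def top_n_by_sort(df: pd.DataFrame, n: int) -> pd.Series:
    """Hypothetical helper: count the 1s per numeric column by summing, then sort."""
    # With 0/1 data, a column's sum equals the number of 1s in it.
    counts = df.select_dtypes("number").sum()
    # Full descending sort over all columns, then keep the first n entries.
    return counts.sort_values(ascending=False).head(n)


# Small sample frame mirroring the question's layout (illustrative data only).
df = pd.DataFrame({
    "non-numericcol1": ["abc", "xyz", "abc"],
    "non-numericcol2": ["pqr", "lmn", "lmn"],
    "col1": [1, 0, 0],
    "col2": [0, 0, 1],
    "col3": [1, 0, 1],
    "coln": [0, 1, 0],
})

result = top_n_by_sort(df, 3)
print(result)
```

With the default sort, the relative order of columns that share the same count is not guaranteed.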
Here is a sample of the dataframe:
df
non-numericcol1  non-numericcol2  col1  col2  col3  ...  coln
abc              pqr              1     0     1     ...  0
xyz              lmn              0     0     0     ...  1
abc              lmn              0     1     1     ...  0
I wish to get, let's say, the top 3 column names along with their counts.
For example: d = {'col3': 2000, 'col10200': 1500, 'col4900': 1000}
I am okay with the output being in another format (such as a pandas DataFrame). There are 10000 columns and 6000 rows in total.
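At the stated size (6000 rows by 10000 columns), one option is to drop down to NumPy and use `np.argpartition`, which selects the n largest counts without fully sorting every column. This is a sketch under the assumption that all numeric columns compare cleanly with the target value; `top_n_numpy` is a hypothetical name, the sample frame is illustrative, and the approach has not been benchmarked against the pandas one-liner.

```python
import numpy as np
import pandas as pd


def top_n_numpy(df: pd.DataFrame, n: int, value=1) -> dict:
    """Hypothetical helper: {column: count} for the n numeric columns with most `value`s."""
    num = df.select_dtypes("number")
    # Boolean matrix (rows x columns) of matches, summed down each column.
    counts = (num.to_numpy() == value).sum(axis=0)
    n = min(n, counts.size)
    if n <= 0:
        return {}
    # argpartition moves the n largest counts to the end without a full sort.
    idx = np.argpartition(counts, -n)[-n:]
    # Order only those n indices by descending count (tie order is arbitrary).
    idx = idx[np.argsort(counts[idx])[::-1]]
    return {num.columns[i]: int(counts[i]) for i in idx}


# Small sample frame mirroring the question's layout (illustrative data only).
df = pd.DataFrame({
    "non-numericcol1": ["abc", "xyz", "abc"],
    "non-numericcol2": ["pqr", "lmn", "lmn"],
    "col1": [1, 0, 0],
    "col2": [0, 0, 1],
    "col3": [1, 0, 1],
    "coln": [0, 1, 0],
})

result = top_n_numpy(df, 3)
print(result)
```

Partitioning is linear in the number of columns, while a full sort is O(m log m); with 10000 columns both are small next to the 60 million comparisons, so any gain mostly comes from skipping pandas overhead.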
Try this:
In [113]: df
Out[113]:
  non-numericcol1 non-numericcol2  col1  col2  col3  col4  coln
0             abc             pqr     1     0     1     0     0
1             xyz             lmn     0     0     0     0     1
2             abc             lmn     0     1     1     0     0

In [114]: df.select_dtypes(['number']).sum().nlargest(3)
Out[114]:
col3    2
col1    1
col2    1
dtype: int64
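The answer sums the columns, which counts 1s only because the data is strictly 0/1. To handle the "particular number" from the title and to return the dict shape shown in the question, a small variation compares against the value first and then calls `to_dict()`. A minimal sketch; the helper name `top_n_value_counts` is hypothetical and the sample frame copies the answer's data.

```python
import pandas as pd


def top_n_value_counts(df: pd.DataFrame, n: int, value=1) -> dict:
    """Hypothetical helper: {column: count} for the n numeric columns with most `value`s."""
    num = df.select_dtypes(["number"])
    # (num == value) is a boolean frame; summing it counts matches per column.
    # nlargest keeps the first-seen column when counts tie (keep="first" default).
    return (num == value).sum().nlargest(n).to_dict()


# Same data as the answer's example frame.
df = pd.DataFrame({
    "non-numericcol1": ["abc", "xyz", "abc"],
    "non-numericcol2": ["pqr", "lmn", "lmn"],
    "col1": [1, 0, 0],
    "col2": [0, 0, 1],
    "col3": [1, 0, 1],
    "col4": [0, 0, 0],
    "coln": [0, 1, 0],
})

print(top_n_value_counts(df, 3))
print(top_n_value_counts(df, 2, value=0))
```

The extra comparison costs one boolean pass over the data, but it keeps the counts correct if the columns ever hold values other than 0 and 1.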