python - How to one-hot-encode from a pandas column containing a list?
I would like to break down a pandas column consisting of lists of elements into as many columns as there are unique elements, i.e. one-hot-encode them (with the value 1 representing a given element existing in a row, and 0 in the case of absence).
For example, taking the dataframe df:

    col1  col2  col3
    c     33    [apple, orange, banana]
          2.5   [apple, grape]
    b     42    [banana]
I would like to convert this to:

df

    col1  col2  apple  orange  banana  grape
    c     33    1      1       1       0
          2.5   1      0       0       1
    b     42    0      0       1       0
How can I use pandas/sklearn to achieve this?
We can use sklearn.preprocessing.MultiLabelBinarizer:

    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer

    mlb = MultiLabelBinarizer()

    # pop col3 out of df, binarize its lists, and join the indicator columns back on
    df = df.join(pd.DataFrame(mlb.fit_transform(df.pop('col3')),
                              columns=mlb.classes_,
                              index=df.index))
Result:

    In [77]: df
    Out[77]:
      col1  col2  apple  banana  grape  orange
    0    c  33.0      1       1      0       1
    1        2.5      1       0      1       0
    2    b  42.0      0       1      0       0
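For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds a frame like the one in the question and applies the same MultiLabelBinarizer step. The col1 value "a" in the second row is only a placeholder assumption, since that cell is blank in the post. The last part is a pandas-only cross-check using explode and get_dummies; it is an alternative I am adding for comparison, not part of the answer above.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Example frame modelled on the question. The "a" in row 1 is a
# placeholder assumption: that cell is blank in the original post.
df = pd.DataFrame({
    "col1": ["c", "a", "b"],
    "col2": [33, 2.5, 42],
    "col3": [["apple", "orange", "banana"], ["apple", "grape"], ["banana"]],
})

# sklearn approach: one 0/1 indicator column per unique list element.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(df["col3"]),
    columns=mlb.classes_,   # classes_ holds the sorted unique labels
    index=df.index,         # keep the original index so the join aligns
)
result = df.drop(columns="col3").join(encoded)
print(result)

# pandas-only cross-check (alternative technique, not from the answer):
# explode the lists to one element per row, build dummies, then collapse
# back to one row per original index label.
dummies = (
    pd.get_dummies(df["col3"].explode())
    .groupby(level=0)
    .max()
    .astype(int)
)
print(dummies)
```

Note that mlb.classes_ is sorted, so the new columns come out in alphabetical order rather than in order of first appearance. The cross-check relies on the index being unique, because the groupby collapses rows by index label.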