python - How to one-hot-encode from a pandas column containing a list?


I would like to break down a pandas column consisting of lists of elements into as many columns as there are unique elements, i.e. one-hot-encode them (with value 1 representing a given element existing in a row, and 0 in case of absence).

For example, taking the dataframe df

    col1   col2   col3
    c      33     [apple, orange, banana]
           2.5    [apple, grape]
    b      42     [banana]

I would like to convert this to:

df

    col1   col2   apple   orange   banana   grape
    c      33     1       1        1        0
           2.5    1       0        0        1
    b      42     0       0        1        0

How can I use pandas/sklearn to achieve this?

We can use sklearn.preprocessing.MultiLabelBinarizer:

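    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer

    mlb = MultiLabelBinarizer()
    # pop() removes col3 from df and returns it; the binarized columns are joined back on the index
    df = df.join(pd.DataFrame(mlb.fit_transform(df.pop('col3')),
                              columns=mlb.classes_,
                              index=df.index))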

Result:

    In [77]: df
    Out[77]:
      col1  col2  apple  banana  grape  orange
    0    c  33.0      1       1      0       1
    1        2.5      1       0      1       0
    2    b  42.0      0       1      0       0
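
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds a sample frame like the one in the question (the col1 value of the second row does not survive in the text here, so "a" is used purely as a placeholder), applies MultiLabelBinarizer, and then cross-checks the result against a pandas-only variant built from Series.explode and pd.get_dummies. The pandas-only route is an alternative added here for comparison, not part of the answer above, and the names encoded, result and dummies are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data modelled on the question; "a" in col1 is a placeholder assumption.
df = pd.DataFrame({
    "col1": ["c", "a", "b"],
    "col2": [33, 2.5, 42],
    "col3": [["apple", "orange", "banana"], ["apple", "grape"], ["banana"]],
})

# Approach 1 (sklearn): one indicator column per unique list element.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(df["col3"]),
    columns=mlb.classes_,
    index=df.index,
)
result = df.drop(columns="col3").join(encoded)

# Approach 2 (pandas only, requires pandas >= 0.25 for explode):
# explode gives one row per list element (the original index is repeated),
# get_dummies makes an indicator per element, and the groupby collapses
# the repeated index back to one row per original record.
dummies = pd.get_dummies(df["col3"].explode(), dtype=int).groupby(level=0).max()

print(result)
print(dummies)
```

Both approaches order the new columns alphabetically, which is why the result above lists apple, banana, grape, orange rather than the order in which the elements first appear. MultiLabelBinarizer has the practical advantage that the fitted mlb object can be reused with transform() on new data with the same column layout.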
