Python - Replace a variable number of asterisks with NaN in a DataFrame
I'm trying to scrub climate data from NCDC, which has columns with a varying number of asterisks to indicate empty fields. I want to replace these with np.nan.
I have tried df.replace, but I'm struggling with the regex syntax to handle the variable length of the asterisks in a field. I suspect I need df.replace('?', np.nan), where '?' is a regex that matches any number of asterisks.
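Before involving pandas, it can help to check a candidate pattern on its own. The sketch below uses Python's re module and assumes each field arrives as a plain string. The anchored pattern ^\*+$ (start of string, one or more literal asterisks, end of string) is one candidate for the '?' placeholder; the helper name is_placeholder and the sample values are hypothetical, modeled on the NCDC columns.

```python
import re

# Candidate pattern: the whole field must be one or more literal asterisks.
# Without the ^ and $ anchors, a search would also hit values such as "12*3"
# that merely contain an asterisk.
ALL_ASTERISKS = re.compile(r"^\*+$")


def is_placeholder(field: str) -> bool:
    """Return True if the field consists only of asterisks (any count >= 1)."""
    return ALL_ASTERISKS.match(field) is not None


# Hypothetical sample values: placeholders of several lengths next to real readings.
for value in ["*", "**", "***", "*****", "******", "0.00", "1010.9", "CLR", "12*3"]:
    print(f"{value!r:>10} -> {is_placeholder(value)}")
```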
Here is a clip of the data:
     USAF   WBAN  YR--MODAHRMN  DIR  SPD  GUS  CLG  SKC  L  M  ...     SLP    ALT     STP  MAX  MIN  PCP01  PCP06  PCP24  PCPXX  SD
0  722543  12977  200601010053  160    6  ***  722  CLR  *  *  ...  1010.9  29.83  1007.2  ***  ***   0.00  *****  *****  *****  **
1  722543  12977  200601010153  160    9  ***  722  CLR  *  *  ...  1011.0  29.83  1007.2  ***  ***   0.00  *****  *****  *****  **
2  722543  12977  200601010253  160    9  ***  722  CLR  *  *  ...  1011.1  29.83  1007.2  ***  ***   0.00  *****  *****  *****  **
3  722543  12977  200601010313  160   10  ***  722  SCT  *  *  ...  ******  29.83  1007.2  ***  ***  *****  *****  *****  *****  **
4  722543  12977  200601010321  160   10  ***    4  BKN  *  *  ...  ******  29.83  1007.2  ***  ***  *****  *****  *****  *****  **
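If the clip comes from a whitespace-delimited text file, one alternative to cleaning afterwards (not part of the original question or answer) is to have pandas treat the asterisk runs as missing while reading, via the na_values parameter of pandas.read_csv. The sketch below assumes a whitespace-separated layout and uses a small hypothetical extract with only a few of the clip's columns; the real NCDC file layout may differ.

```python
import io

import pandas as pd

# Hypothetical whitespace-delimited extract with a few of the clip's columns.
raw = """USAF WBAN YR--MODAHRMN GUS SKC SLP PCP01 SD
722543 12977 200601010053 *** CLR 1010.9 0.00 **
722543 12977 200601010313 *** SCT ****** ***** **
"""

# The clip shows runs of 1 to 6 asterisks; cover a few more lengths to be safe.
asterisk_tokens = ["*" * n for n in range(1, 11)]

# sep=r"\s+" splits on any run of whitespace; na_values marks exact tokens as NaN.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", na_values=asterisk_tokens)
print(df)
print(df.dtypes)
```

Unlike a regex, na_values matches whole tokens exactly, so the list has to cover every asterisk length that can occur. In exchange, columns such as SLP come back as floats without a separate conversion step.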
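Use df.replace with the regex pattern '^\*+$'. It matches a field made up entirely of one or more asterisks: ^ and $ anchor the match to the start and end of the value, and \*+ is one or more literal asterisks. That works well enough for this: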
In [790]: df.replace(r'^\*+$', np.nan, regex=True)
Out[790]:
     USAF   WBAN  YR--MODAHRMN  DIR  SPD  GUS  CLG  SKC    L    M     SLP  \
0  722543  12977  200601010053  160    6  NaN  722  CLR  NaN  NaN  1010.9
1  722543  12977  200601010153  160    9  NaN  722  CLR  NaN  NaN  1011.0
2  722543  12977  200601010253  160    9  NaN  722  CLR  NaN  NaN  1011.1
3  722543  12977  200601010313  160   10  NaN  722  SCT  NaN  NaN     NaN

     ALT     STP  MAX  MIN  PCP01  PCP06  PCP24  PCPXX   SD
0  29.83  1007.2  NaN  NaN   0.00    NaN    NaN    NaN  NaN
1  29.83  1007.2  NaN  NaN   0.00    NaN    NaN    NaN  NaN
2  29.83  1007.2  NaN  NaN   0.00    NaN    NaN    NaN  NaN
3  29.83  1007.2  NaN  NaN    NaN    NaN    NaN    NaN  NaN
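A self-contained version of the same call follows, on a small hypothetical DataFrame built from a few of the clip's columns. It assumes every cell was loaded as a string, so after the replace the numeric columns still hold text; converting them with pd.to_numeric is an extra step beyond the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the clip, with every cell stored as a string.
df = pd.DataFrame(
    {
        "GUS": ["***", "***", "***", "***"],
        "SKC": ["CLR", "CLR", "CLR", "SCT"],
        "SLP": ["1010.9", "1011.0", "1011.1", "******"],
        "PCP01": ["0.00", "0.00", "0.00", "*****"],
        "SD": ["**", "**", "**", "**"],
    }
)

# Replace any cell consisting solely of one or more asterisks with NaN.
# replace returns a new DataFrame; df itself is left unchanged.
cleaned = df.replace(r"^\*+$", np.nan, regex=True)

# The numeric columns are still strings here; convert them explicitly.
for col in ["SLP", "PCP01"]:
    cleaned[col] = pd.to_numeric(cleaned[col])

print(cleaned)
print(cleaned.dtypes)
```

Anchoring the pattern matters because, with a non-string replacement value such as np.nan, a match anywhere in the cell replaces the whole cell; an unanchored \*+ would also wipe out any value that merely contains an asterisk.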