Python regex: match and replace the beginning and end of a string but keep the middle
I have a dataframe of holiday names. The problem is that some holidays are observed on different days, sometimes on the day of another holiday. Here are some example problems:
1 "independence day (observed)"
2 "christmas eve, christmas day (observed)"
3 "new year's eve, new year's day (observed)"
4 "martin luther king, jr. day"
I want to replace ' (observed)' with '', and also remove everything before the comma, but only if ' (observed)' is matched. The output should be:
1 "independence day"
2 "christmas day"
3 "new year's day"
4 "martin luther king, jr. day"
I am able to do both independently:
(foo['holiday']
    .replace(to_replace=r' \(observed\)', value='', regex=True)
    .replace(to_replace=r'.+, ', value='', regex=True))
But this causes a problem for 'martin luther king, jr. day'.
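To see why the two independent replacements interfere, here is a minimal sketch that reproduces them with plain `re.sub` instead of pandas. The function name `two_step` is hypothetical and only for illustration; the two patterns are the ones from the question.

```python
import re

def two_step(name):
    # Step 1: drop the " (observed)" suffix wherever it appears.
    name = re.sub(r' \(observed\)', '', name)
    # Step 2: drop everything up to the last ", ". This runs unconditionally,
    # even when step 1 found nothing to remove.
    return re.sub(r'.+, ', '', name)

print(two_step("christmas eve, christmas day (observed)"))  # christmas day
print(two_step("martin luther king, jr. day"))              # jr. day
```

The second step has no way of knowing whether ' (observed)' was present, so it also strips the comma-separated prefix from names that should be left alone.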
replace.py
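import re

holidays = [
    "independence day (observed)",
    "christmas eve, christmas day (observed)",
    "new year's eve, new year's day (observed)",
    "martin luther king, jr. day",
]

for holiday in holidays:
    # Keep only group 2 (the middle part). Strings that do not end in
    # " (observed)" do not match at all and are printed unchanged.
    print(re.sub(r'^(.*?, )?(.*?)( \(observed\))$', r'\2', holiday))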
Output

> python replace.py
independence day
christmas day
new year's day
martin luther king, jr. day
Explanation

^ : match at the start of the string.
(.*?, )? : match anything followed by a comma and a space. Make it a lazy match, so it doesn't consume any portion of the string we want to keep. The last ? makes the whole thing optional, because some of the sample input doesn't have a comma at all.
(.*?) : grab the part we want for later use in a capturing group. This part is also a lazy match because...
( \(observed\)) : some strings might have " (observed)" on the end, so we declare that in a separate group here. The lazy match in the prior piece won't consume this.
$ : match at the end of the string.
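As a rough sketch of how the pieces above fit together, the pattern can be compiled once and wrapped in a small helper. The names `OBSERVED` and `strip_observed` are hypothetical, introduced only for this example; the pattern itself is the one from replace.py.

```python
import re

# Same pattern as in replace.py, written as a raw string and compiled once.
OBSERVED = re.compile(r'^(.*?, )?(.*?)( \(observed\))$')

def strip_observed(name):
    """Hypothetical helper: keep only the name of the observed holiday."""
    # Group 2 is the middle part. Names without a trailing " (observed)"
    # do not match the pattern, so sub() returns them unchanged.
    return OBSERVED.sub(r'\2', name)

holidays = [
    "independence day (observed)",
    "christmas eve, christmas day (observed)",
    "new year's eve, new year's day (observed)",
    "martin luther king, jr. day",
]

cleaned = [strip_observed(h) for h in holidays]
for name in cleaned:
    print(name)
```

For the dataframe in the question, the same pattern and the replacement r'\2' could presumably be passed to `foo['holiday'].str.replace(..., regex=True)`, assuming `foo['holiday']` is a column of strings; that variant is not tested here.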