Python regex: match and replace the beginning and end of a string but keep the middle
I have a dataframe of holiday names. The problem is that some holidays are observed on different days, sometimes on the day of another holiday. Here are some example problems:

1 "independence day (observed)"
2 "christmas eve, christmas day (observed)"
3 "new year's eve, new year's day (observed)"
4 "martin luther king, jr. day"

I want to replace ' (observed)' with '', and also remove everything before the comma, but only if ' (observed)' is matched. The output should be:

1 "independence day"
2 "christmas day"
3 "new year's day"
4 "martin luther king, jr. day"

I am able to do both independently:
(foo['holiday']
 .replace(to_replace=r' \(observed\)', value='', regex=True)
 .replace(to_replace=r'.+, ', value='', regex=True))

but this causes a problem with 'martin luther king, jr. day'.
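The failure can be reproduced without pandas. The sketch below applies the same two patterns one after the other with `re.sub`, on the assumption that the pandas `replace(..., regex=True)` calls substitute each string the same way; the helper name `two_step` is made up for illustration.

```python
import re

def two_step(name):
    """Apply the two replacements from the question one after the other."""
    # Step 1: drop the " (observed)" suffix if it is present.
    name = re.sub(r' \(observed\)', '', name)
    # Step 2: drop everything up to and including the last ", ".
    # This runs unconditionally, whether or not step 1 changed anything.
    return re.sub(r'.+, ', '', name)

print(two_step("christmas eve, christmas day (observed)"))  # christmas day
print(two_step("martin luther king, jr. day"))              # jr. day
```

The second step has no way of knowing whether ' (observed)' was present, so it also strips the part before the comma in 'martin luther king, jr. day'.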
replace.py
import re

holidays = [
    "independence day (observed)",
    "christmas eve, christmas day (observed)",
    "new year's eve, new year's day (observed)",
    "martin luther king, jr. day",
]

# Keep only the second group; strings that do not match are printed unchanged.
for holiday in holidays:
    print(re.sub(r'^(.*?, )?(.*?)( \(observed\))$', r'\2', holiday))

Output
> python replace.py
independence day
christmas day
new year's day
martin luther king, jr. day

Explanation
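^: match at the start of the string.
(.*?, )?: match anything followed by a comma and a space. Make it a lazy match, so it doesn't consume any portion of the string we want to keep. The last ? makes the whole thing optional, because some of the sample input doesn't have a comma at all.
(.*?): grab the part we want for later use in a capturing group. This part is a lazy match because...
( \(observed\)): some strings might have " (observed)" on the end, so we declare that in a separate group here. The lazy match in the prior piece won't consume this. Because this group is required, a string without the suffix does not match at all and re.sub returns it unchanged.
$: match at the end of the string.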
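To see what each group captures, here is a small sketch using `match.groups()` on a few of the sample strings. The names `PATTERN` and `show_groups` are made up for illustration; the pattern itself is the one from replace.py.

```python
import re

# Same pattern as in replace.py, written as a raw string.
PATTERN = re.compile(r'^(.*?, )?(.*?)( \(observed\))$')

def show_groups(text):
    """Return the three captured groups, or None when the pattern does not match."""
    match = PATTERN.match(text)
    return match.groups() if match else None

# No comma: the optional first group does not participate and is None.
print(show_groups("independence day (observed)"))
# Comma present: the first group takes the part we want to drop.
print(show_groups("christmas eve, christmas day (observed)"))
# No " (observed)" suffix: the pattern does not match at all.
print(show_groups("martin luther king, jr. day"))
```

Only the second group is referenced in the replacement, so it does not matter that the first group is sometimes empty.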