Python regex: match and replace the beginning and end of a string but keep the middle
I have a DataFrame of holiday names. My problem is that some holidays are observed on a different day, while others are observed on the day of the holiday itself. Here are some example problems:

    1  "independence day (observed)"
    2  "christmas eve, christmas day (observed)"
    3  "new year's eve, new year's day (observed)"
    4  "martin luther king, jr. day"

I want to replace ' (observed)' with '' and also remove everything before the comma, but only if ' (observed)' is matched. The output should be:

    1  "independence day"
    2  "christmas day"
    3  "new year's day"
    4  "martin luther king, jr. day"

I am able to do both independently:

    (foo['holiday']
     .replace(to_replace=r' \(observed\)', value='', regex=True)
     .replace(to_replace='.+, ', value='', regex=True))

but that caused a problem with 'martin luther king, jr. day'.
replace.py
    import re

    holidays = [
        "independence day (observed)",
        "christmas eve, christmas day (observed)",
        "new year's eve, new year's day (observed)",
        "martin luther king, jr. day",
    ]

    for holiday in holidays:
        # Keep only group 2: the name between the optional "..., " prefix
        # and the required " (observed)" suffix.
        print(re.sub(r'^(.*?, )?(.*?)( \(observed\))$', r'\2', holiday))

Output

    > python replace.py
    independence day
    christmas day
    new year's day
    martin luther king, jr. day

Explanation
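^: match at the start of the string.

(.*?, )?: match anything followed by a comma and a space. Make it a lazy match, so that it doesn't consume any portion of the string that we want to keep. The last ? makes the whole thing optional, because some of the sample input doesn't have a comma at all.

(.*?): grab the part we want to keep in a capturing group for later use. This part is also a lazy match because...

( \(observed\)): some strings have " (observed)" on the end, so we declare that in a separate group here. The lazy match in the prior piece won't consume this. Because this group is not optional, a string without " (observed)" does not match the pattern at all, and re.sub returns it unchanged. That is why 'martin luther king, jr. day' keeps its comma.

$: match at the end of the string.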
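
To see the groups described above in action, here is a minimal sketch that prints what each group captures for a few of the sample strings. It uses the same pattern as replace.py; the names PATTERN and strip_observed are hypothetical helpers introduced only for this illustration and are not part of the original answer.

```python
import re

# Same pattern as in replace.py, compiled once so it can be reused.
PATTERN = re.compile(r'^(.*?, )?(.*?)( \(observed\))$')


def strip_observed(name):
    """Keep only group 2 when the name ends in " (observed)".

    Hypothetical helper for illustration; names that do not match
    the pattern are returned unchanged by sub().
    """
    return PATTERN.sub(r'\2', name)


samples = [
    "independence day (observed)",
    "christmas eve, christmas day (observed)",
    "martin luther king, jr. day",
]

for name in samples:
    match = PATTERN.match(name)
    # match is None when the string does not end in " (observed)",
    # so there is nothing for sub() to replace.
    groups = match.groups() if match else None
    print(repr(name), '->', groups, '->', repr(strip_observed(name)))
```

If the names live in a pandas column, as in the question, one option would be foo['holiday'].map(strip_observed), assuming every value in the column is a string.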