html - Regex not capturing newlines when used in sed or perl
I have a CSV file that I'm trying to clean, and part of that is removing HTML tags inside the values. I came across this solution in another thread: sed -e 's/<[^>]*>//g' file.html
Before trying it out, I tested the regex (/<[^>]*>/g) using RegExr. I used the following text sample:
<asd> < asd > < asdsad
adsad >
On RegExr, all three tags matched. However, when I use the sed command to remove the tags, the third tag, which spans a line break, remains, i.e. I'm left with:
< asdsad
adsad >
I need to be able to remove multiline tags as well, because many of the tags in the CSV I'm attempting to clean have attributes with quotes, e.g. class="some-class-name", and the quotes are messing up the CSV formatting.
I've also tried a perl command, since perl is supposed to have better multiline handling. I tried perl -pe 's/<[^>]*>//g' file, but it had the same result as sed.
Edit: To address concerns about a possible duplicate: this question is about why one regex engine (RegExr) captures different matches than others (sed and perl), and how to get the others to produce the first one's results. The possible duplicate's answer happened to solve my problem, although my question came from a different (yet similar) place.
For clarity I'll post the answer here, from @lukstorms' comment. The answer comes from this thread.
tl;dr: the -0 flag solved the issue, at least for perl.
full command: perl -0pe 's/<[^>]*>//g' file