html - Regex not capturing newlines when used in sed or perl



I have a CSV file that I'm trying to clean, and part of that is removing HTML tags inside the values. I came across this solution in another thread: sed -e 's/<[^>]*>//g' file.html

Before trying it out, I tested the regex (/<[^>]*>/g) using RegExr. I used the following text sample:

<asd> < asd > < asdsad  adsad > 

On RegExr, all 3 tags matched. However, when I use the sed command to remove the tags, the third tag (the one that spans a line break in the file) remains, i.e. I'm left with:

< asdsad  adsad > 

I need to be able to remove multiline tags as well, because many of the tags in the CSV I'm attempting to clean have attributes with quotes, e.g. class="some-class-name", and those quotes are messing up the CSV formatting.

I've also tried a perl command, since perl is supposed to have better multiline handling. I tried perl -pe 's/<[^>]*>//g' file, but had the same result as with sed.

Edit: to address concerns about a possible duplicate, this question is about why one regex engine (RegExr) captures different entities than others (sed and perl), and how to get the others to reproduce the first one's results. The possible duplicate's answer happened to solve my problem, although my question came from a different (yet similar) place.

For clarity I'll post the answer here, taken from @lukstorms' comment. The answer comes from this thread.

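tl;dr: the -0 flag solved the issue, at least for perl. Both sed and perl -p apply the substitution to one line at a time, so a tag whose < and > sit on different lines is never seen as a whole. With -0, perl splits its input on NUL characters instead of newlines, so a file without NUL bytes is read as a single record and [^>]* can match across the line break.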

Full command: perl -0pe 's/<[^>]*>//g' file
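To make the difference concrete, here is a minimal sketch that runs the same substitution in line-by-line mode and in whole-file mode. The file name sample.txt and the exact position of the line break inside the third tag are assumptions for illustration. The sed variant is not from the original answer: it is a commonly used loop that joins all lines into the pattern space first, and it assumes the bracket expression [^>] matches an embedded newline, as it does in GNU sed.

```shell
# Hypothetical sample file: the third tag is split across two lines.
printf '<asd> < asd > < asdsad\n adsad >\n' > sample.txt

# -p alone processes one line at a time. No single line contains both the
# "<" and the ">" of the third tag, so that tag is left in place.
line_mode=$(perl -pe 's/<[^>]*>//g' sample.txt)

# -0 sets the input record separator to NUL. With no NUL bytes in the file,
# the whole file is one record and the tag is matched across the newline.
whole_file=$(perl -0pe 's/<[^>]*>//g' sample.txt)

# sed also works line by line. This loop appends every remaining line to the
# pattern space ($!N), repeats until the last line ($!ba), then substitutes.
sed_joined=$(sed -e ':a' -e '$!N' -e '$!ba' -e 's/<[^>]*>//g' sample.txt)

# Brackets make the leftover whitespace visible.
printf 'perl -p   : [%s]\n' "$line_mode"
printf 'perl -0p  : [%s]\n' "$whole_file"
printf 'sed joined: [%s]\n' "$sed_joined"
```

Both whole-file variants hold the entire input in memory, which is fine for a typical CSV but worth keeping in mind for very large files.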

