html - Regex not capturing newlines when used in sed or perl

August 15, 2010

This question already has an answer here:

How to search and replace across multiple lines with Perl? (2 answers)

I have a CSV file I'm trying to clean up, and part of that is removing HTML tags inside some of the values. I came across this solution in another thread: sed -e 's/<[^>]*>//g' file.html
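This is a minimal sketch of what that substitution does. It uses Python's re module as a stand-in for sed, which is an assumption for illustration; the simple pattern <[^>]*> means the same thing in both. The pattern matches a "<", then any run of characters that are not ">", then a ">". re.sub replaces every non-overlapping match, like the trailing g in s///g. The helper name strip_tags is hypothetical.

```python
import re

# Same pattern as the sed command: "<", then any run of non-">" characters, then ">".
TAG = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Remove every <...> tag from text (hypothetical helper, not from the post)."""
    # re.sub replaces all non-overlapping matches, like the /g flag in s///g.
    return TAG.sub("", text)


# A CSV-style value whose tag carries a quoted attribute.
value = '<p class="some-class-name">Hello, <b>world</b></p>'
print(strip_tags(value))  # Hello, world
```

Because [^>] also matches quotes and spaces, an attribute such as class="some-class-name" is removed together with its tag.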

Before trying it out, I tested the regex (/<[^>]*>/g) using RegExr, with the following text sample:

<asd> < asd > < asdsad
adsad >

On RegExr, all three tags matched. However, when I use the sed command to remove the tags, the third tag remains, i.e. I'm left with:

< asdsad
adsad >
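The difference can be reproduced outside sed. The sketch below uses Python as a stand-in and assumes that the third tag of the sample spans two lines, as the title and the remark about multiline tags suggest. One function applies the pattern to the whole text at once, the way an online tester like RegExr does. The other applies it one line at a time, which is how sed and perl -p feed their input to s///. Both function names are hypothetical.

```python
import re

TAG = re.compile(r"<[^>]*>")


def sub_whole(text: str) -> str:
    # Like an online tester such as RegExr: the pattern sees the whole text at once.
    return TAG.sub("", text)


def sub_per_line(text: str) -> str:
    # Like sed or perl -p: the substitution runs on one line at a time,
    # so a match can never extend past the end of the current line.
    return "".join(TAG.sub("", line) for line in text.splitlines(keepends=True))


sample = "<asd> < asd > < asdsad\nadsad >\n"
print(repr(sub_whole(sample)))     # '  \n'
print(repr(sub_per_line(sample)))  # '  < asdsad\nadsad >\n'
```

Only the per-line version leaves the third tag behind. Within a single line, the "<" of that tag has no matching ">", so there is nothing for the pattern to match. The pattern itself crosses newlines without trouble; it just never sees both halves of the tag together.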

I need to be able to remove multiline tags as well, because many of the tags in the CSV I'm attempting to clean have attributes with quotes, such as class="some-class-name", and those quotes are messing up the CSV formatting.

I've also tried a Perl command, since Perl is supposed to have better multiline handling. I tried perl -pe 's/<[^>]*>//g' file, but it had the same result as sed.

Edit: To address concerns about this being a possible duplicate, my question is about why one regex engine (RegExr) matches different things than others (sed and Perl), and how to get the others to produce the first one's results. The possible duplicate's answer happened to solve my problem, although my question came from a different (yet similar) place.

For clarity, I'll post the answer here, based on @lukstorms' comment. The answer comes from this thread.

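TL;DR: the -0 flag solved the issue, at least for Perl. Both sed and perl -p process their input one line at a time, so the regex never sees a tag that spans two lines as a whole. With -0 and no digits, Perl uses the null character as its input record separator. A text file that contains no NUL bytes is therefore read as a single record, and [^>]* can match across the newline.

Full command: perl -0pe 's/<[^>]*>//g' file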
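To show why -0 helps, here is a hedged Python simulation of perl -p under different input record separators. It assumes the same two-line sample, and the helper names records and sub_by_records are hypothetical. The text is split into records that each keep their trailing separator, as $_ does under perl -p, and the substitution runs once per record. A separator of "\n" reproduces the default line-by-line behavior. The separator "\0", which is what -0 with no digits selects, leaves a NUL-free file as one record.

```python
import re
from typing import Iterator

TAG = re.compile(r"<[^>]*>")


def records(text: str, sep: str) -> Iterator[str]:
    """Split text into records that keep their trailing separator, like $_ under perl -p."""
    start = 0
    while start < len(text):
        end = text.find(sep, start)
        if end == -1:
            # The last record has no trailing separator.
            yield text[start:]
            return
        end += len(sep)
        yield text[start:end]
        start = end


def sub_by_records(text: str, sep: str) -> str:
    """Apply the tag substitution once per record, then reassemble the output."""
    return "".join(TAG.sub("", rec) for rec in records(text, sep))


sample = "<asd> < asd > < asdsad\nadsad >\n"
print(repr(sub_by_records(sample, "\n")))  # '  < asdsad\nadsad >\n'
print(repr(sub_by_records(sample, "\0")))  # '  \n'
```

If the file might contain NUL bytes, perlrun documents -0777 as the way to make Perl read the file whole, because no byte has that value.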
