How to grep any format of percentages from a file in R?

September 15, 2010

I need a grep function to extract percentages from multiple files that are formatted differently. For example, they can be written in the following ways: (5%, 2.46%, 12.9%, 5 %, 2.46 %, 5 12.9 %, 5 percent, 2.46 percent, 5 per cent, ...etc). I also want to make sure there is at least a space in front of and behind each match, to avoid extracting HTML codes or things like:

<td width="97%"></td>

This code is working wrong. I was thinking maybe there is a way to put in placeholders, like the asterisks below, for the variety of numbers I am looking for, like this:

  txt <- tryCatch(readLines(ds2[i, temp]), error = function(e) readLines(ds2[i, temp]))
  t <- grep("**.**%", txt)
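A note on the pattern above: in a regular expression, `*` is a quantifier ("zero or more of the preceding item"), not a stand-in for digits, so it cannot serve as a placeholder the way it does in a file glob. A minimal sketch of what a digit placeholder might look like instead; the sample `txt` vector and the name `pct_pattern` are made up for illustration:

```r
# Hypothetical sample lines standing in for the contents of one file
txt <- c("rate was 12.9% last year", "no figures here", "only 5 % left")

# \\d+         one or more digits
# (\\.\\d+)?   optional decimal part
# \\s*         optional whitespace before the percent sign
pct_pattern <- "\\d+(\\.\\d+)?\\s*%"

grep(pct_pattern, txt)                # indices of the lines that contain a percentage
grep(pct_pattern, txt, value = TRUE)  # the matching lines themselves
```

This only finds lines containing a percentage; it does not extract the numbers or exclude HTML attributes.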

Rather than write a single regex expression, it may be easier to do this in multiple steps. Using the examples you gave:

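  x <- c('5%', '2.46%', '12.9%', '5 %', '2.46 %', '5 12.9 %',
         '5 percent', '2.46 percent', '5 per cent',
         'etc..', '<td width="97%"></td>')

  get_pct <- function(x) {
    # drop percentages inside HTML attributes, e.g. width="97%"
    x <- gsub('="[^"]+%"', '', x)
    # normalize "percent", "per cent", and " %" to a bare "%"
    x <- gsub('\\s*per\\s*cent|\\s*%', '%', x)
    # flag elements that contain a number directly followed by "%"
    is_pct <- grepl('\\d+(\\.\\d+)?%', x)
    # keep only the number in front of the "%"; everything else becomes NA
    as.numeric(ifelse(is_pct, gsub('.*?(\\d+\\.?\\d*)%.*', '\\1', x), NA))
  }

  get_pct(x)
  # [1]  5.00  2.46 12.90  5.00  2.46 12.90  5.00  2.46  5.00    NA    NA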

Here's the same thing, step by step:

  # eliminate percentages from HTML tags
  x <- gsub('="[^"]+%"', '', x)
  x
  # [1] "5%"              "2.46%"           "12.9%"           "5 %"             "2.46 %"          "5 12.9 %"
  # [7] "5 percent"       "2.46 percent"    "5 per cent"      "etc.."           "<td width></td>"

  # standardize the % symbol
  x <- gsub('\\s*per\\s*cent|\\s*%', '%', x)
  x
  # [1] "5%"              "2.46%"           "12.9%"           "5%"              "2.46%"           "5 12.9%"
  # [7] "5%"              "2.46%"           "5%"              "etc.."           "<td width></td>"

  # find percentages (a number directly followed by "%")
  is_pct <- grepl('\\d+(\\.\\d+)?%', x)

  # extract values
  x <- ifelse(is_pct, gsub('.*?(\\d+\\.?\\d*)%.*', '\\1', x), NA)
  as.numeric(x)
  # [1]  5.00  2.46 12.90  5.00  2.46 12.90  5.00  2.46  5.00    NA    NA
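The steps above return one value per element. When the input is whole lines read from a file, a line may hold several percentages, and the question also asks for whitespace on both sides of a match. A rough sketch of one way to handle that with `regmatches()` and `gregexpr()`; the function name `extract_pcts` and the sample `lines` are hypothetical, and the pattern assumes that "whitespace or start/end of the line" counts as a boundary:

```r
# Hypothetical helper: return every percentage found in each line of text.
extract_pcts <- function(lines) {
  # number, optional whitespace, then "%", "percent", or "per cent";
  # the lookarounds require whitespace (or the string boundary) on both sides,
  # which skips values embedded in HTML attributes such as width="97%"
  pattern <- "(?<!\\S)\\d+(?:\\.\\d+)?\\s*(?:%|per\\s*cent)(?!\\S)"
  hits <- regmatches(lines, gregexpr(pattern, lines, perl = TRUE, ignore.case = TRUE))
  # strip the unit from each match and convert what is left to numeric
  lapply(hits, function(h) {
    as.numeric(sub("\\s*(%|per\\s*cent)$", "", h, perl = TRUE, ignore.case = TRUE))
  })
}

# Made-up sample lines for illustration
lines <- c("growth of 5% and a 2.46 percent drop",
           '<td width="97%"></td>',
           "5 12.9 %",
           "roughly 5 per cent of users",
           "no numbers here")
result <- extract_pcts(lines)
```

The result is a list with one numeric vector per input line, empty where nothing matched. Because the boundary is strict, a value followed directly by punctuation (for example `5%.`) is skipped; the lookahead would need loosening to accept such cases.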
