R - Extract & count all unique 2- and 3-place subsets of strings

August 15, 2010

I have a column in a dataframe that contains sequences (of uneven length from one row to the next) composed of 0, 1, and 2. (The strings don't need to be numeric; they represent degrees of stress in syllables over whole sentences.) A minimal (very simplified) example:

> df
          b
foo 0100101
bar   01201

What I need is a dataframe that provides every 2- or 3-place combination found within the column (digits can occur next to themselves, e.g. 00), and the total count of each combination over the whole dataframe. (A count for each row would be nice, too, but I fear that would take some reshaping, and it's not the goal right now.) Abbreviated desired result:

> output
combo     count
00        1
01        5
10        2
...
001       1
010       2
...

And so on. I've tried numerous variations on str_count without success.

Get every 2- or 3-element substring of each string in df$b, then use table to count the frequency of each:

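# For each substring length i (2, then 3), slide a window of width i
# along every string in df$b, then tabulate all the pieces together.
table(unlist(lapply(c(2, 3),
                    function(i) lapply(df$b,
                                       function(x) sapply(1:(nchar(x) - (i - 1)),
                                                          function(j) substr(x, j, j + i - 1))))))
#  00 001  01 010 012  10 100 101  12 120  20 201
#   1   1   5   2   1   2   1   1   1   1   1   1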
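The question asks for a dataframe with combo and count columns rather than a table object. A minimal sketch of one way to get there in base R follows; the helper name `ngrams` and the variable names are illustrative assumptions, not part of the original answer, and the guard for strings shorter than the window is an addition.

```r
# Hypothetical helper: all overlapping substrings of length n in one string.
ngrams <- function(x, n) {
  if (nchar(x) < n) return(character(0))  # string too short for this window
  starts <- seq_len(nchar(x) - n + 1)
  substring(x, starts, starts + n - 1)
}

b <- c("0100101", "01201")  # the example column from the question

# Pool the 2- and 3-character substrings of every string and tabulate them.
all_combos <- unlist(lapply(2:3, function(n) lapply(b, ngrams, n = n)))
counts <- table(all_combos)

# Reshape the table into the combo/count layout the question asks for.
output <- data.frame(combo = names(counts),
                     count = as.vector(counts),
                     stringsAsFactors = FALSE)
output
```

Only combinations that actually occur appear in the result; zero counts are not listed.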

Alternatively, use expand.grid to generate all combinations of 2 and 3 elements drawn from 0, 1, and 2. Then use gregexpr to count the occurrences of each combination in df$b:

sapply(c(do.call(paste0, expand.grid(0:2, 0:2)),
         do.call(paste0, expand.grid(0:2, 0:2, 0:2))),
       function(x) {
           temp = unlist(gregexpr(pattern = x, text = df$b))
           length(temp[temp != -1])
       })
#  00  10  20  01  11  21  02  12  22 000 100 200 010 110 210 020 120 220 001
#   1   2   1   5   0   0   0   1   0   0   1   0   2   0   0   0   1   0   1
# 101 201 011 111 211 021 121 221 002 102 202 012 112 212 022 122 222
#   1   1   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
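One caveat with the gregexpr approach: by default it reports non-overlapping matches, so a pattern that can overlap itself (00 inside 000, or 010 inside 01010) may be undercounted. The example data happens not to contain such a case. A possible workaround, sketched here with hypothetical helper names, is to wrap the pattern in a lookahead and pass perl = TRUE so that every starting position is reported:

```r
# Hypothetical helper: count matches as gregexpr does by default
# (non-overlapping).
count_plain <- function(pattern, text) {
  hits <- unlist(gregexpr(pattern, text))
  sum(hits != -1)
}

# Hypothetical helper: count every starting position, overlaps included.
count_overlapping <- function(pattern, text) {
  # The lookahead (?=...) matches at a position without consuming
  # characters, so the search resumes one character later.
  hits <- unlist(gregexpr(paste0("(?=", pattern, ")"), text, perl = TRUE))
  sum(hits != -1)
}

count_plain("00", "0000")        # 2
count_overlapping("00", "0000")  # 3
count_overlapping("01", c("0100101", "01201"))  # 5, as in the tables above
```

This assumes the patterns are plain digit strings with no regex metacharacters, as they are in this question.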

data

df = structure(list(a = c("foo", "bar"), b = c("0100101", "01201")),
               .Names = c("a", "b"), row.names = c(NA, -2L), class = "data.frame")
