R - Extract & count all unique 2- and 3-place subsets of strings
I have a column in a dataframe that contains sequences (of uneven length from one row to the next) composed of 0, 1, and 2. (The strings don't need to be numeric; they represent degrees of stress in the syllables of whole sentences.) A minimal (very simplified) example:
> df
a    b
foo  0100101
bar  01201
What I need is a dataframe that lists every 2- or 3-place combination within the column (numbers can occur with themselves, e.g. 00), along with the total count of each combination over the whole dataframe. (A count for each row would be nice, too, but I fear that would take reshaping, and it's not the goal right now.) Abbreviated desired result:
> output
combo count
00    1
01    5
10    2
...
001   1
010   2
...
And so on. I've tried numerous variations on str_count without success.
1
Extract every 2- and 3-element combination string in df$b with substr, then use table to count their frequency:
table(unlist(lapply(c(2, 3), function(i)
  lapply(df$b, function(x)
    sapply(1:(nchar(x) - (i - 1)), function(j) substr(x, j, j + i - 1))))))
#  00 001  01 010 012  10 100 101  12 120  20 201
#   1   1   5   2   1   2   1   1   1   1   1   1
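The answer above returns a named table rather than the combo/count data frame shown in the question. Below is a minimal base-R sketch of one way to get that shape, plus the per-row counts the question mentions as a nice-to-have. The helper names ngrams and count_ngrams are hypothetical, and substring() replaces the per-position substr() calls; the nchar() guard keeps strings shorter than the combination length from contributing anything.

```r
# Hypothetical helpers (not part of the answer above); base R only.

# All contiguous substrings of length n in one string x
ngrams <- function(x, n) {
  if (nchar(x) < n) return(character(0))  # string too short: no n-grams
  starts <- seq_len(nchar(x) - n + 1)
  substring(x, starts, starts + n - 1)     # vectorised over start positions
}

# Total count of every 2- and 3-element combination across all strings,
# returned as a data frame with columns combo and count
count_ngrams <- function(strings, sizes = c(2, 3)) {
  grams <- unlist(lapply(sizes, function(n)
    unlist(lapply(strings, ngrams, n = n))))
  tab <- table(grams)
  data.frame(combo = names(tab), count = as.vector(tab),
             stringsAsFactors = FALSE)
}

df <- data.frame(a = c("foo", "bar"), b = c("0100101", "01201"),
                 stringsAsFactors = FALSE)
out <- count_ngrams(df$b)
print(out)

# Per-row counts: one row per string, one column per combo found in out
grams_by_row <- lapply(df$b, function(s) c(ngrams(s, 2), ngrams(s, 3)))
per_row <- t(vapply(grams_by_row,
                    function(g) as.vector(table(factor(g, levels = out$combo))),
                    integer(nrow(out))))
dimnames(per_row) <- list(df$a, out$combo)
print(per_row)
```

With the sample data, out holds the same 12 combinations and counts as the table above, and per_row has one row per value of a.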
2
Use expand.grid to get all combinations of 2 and 3 elements of 0, 1, and 2. Then use gregexpr to count their occurrences in df$b:
sapply(c(do.call(paste0, expand.grid(0:2, 0:2)),
         do.call(paste0, expand.grid(0:2, 0:2, 0:2))),
       function(x) {
         temp = unlist(gregexpr(pattern = x, text = df$b))
         length(temp[temp != -1])
       })
#  00  10  20  01  11  21  02  12  22 000 100 200 010 110 210 020 120 220 001
#   1   2   1   5   0   0   0   1   0   0   1   0   2   0   0   0   1   0   1
# 101 201 011 111 211 021 121 221 002 102 202 012 112 212 022 122 222
#   1   1   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
Data:
df = structure(list(a = c("foo", "bar"), b = c("0100101", "01201")),
               .Names = c("a", "b"), row.names = c(NA, -2L),
               class = "data.frame")
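One caveat with the gregexpr approach: gregexpr reports non-overlapping matches, so in "0000" the pattern "00" is found twice, whereas a sliding window (as in the substr answer) finds it at three positions. On the sample data the two answers happen to agree. A minimal sketch of an overlap-safe variant, assuming PCRE lookahead support via perl = TRUE; count_overlapping is a hypothetical helper name. The combos are plain digits, so no regex escaping is needed.

```r
# Count matches of pattern in text, allowing overlaps: the zero-width
# lookahead "(?=...)" matches at every start position, so gregexpr()
# does not skip past the characters of a previous match.
count_overlapping <- function(pattern, text) {
  hits <- unlist(gregexpr(paste0("(?=", pattern, ")"), text, perl = TRUE))
  sum(hits != -1)  # gregexpr returns -1 for a string with no match
}

# Same 2- and 3-element combinations of 0, 1, 2 as in the answer above
combos <- c(do.call(paste0, expand.grid(0:2, 0:2)),
            do.call(paste0, expand.grid(0:2, 0:2, 0:2)))

df <- data.frame(a = c("foo", "bar"), b = c("0100101", "01201"),
                 stringsAsFactors = FALSE)
counts <- sapply(combos, count_overlapping, text = df$b)
print(counts)

count_overlapping("00", "0000")          # 3 start positions
length(unlist(gregexpr("00", "0000")))   # 2 non-overlapping matches
```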