dplyr - Conditional counting in R. Add if both rows == TRUE
I have a data frame of bacterial colony counts (accn) from 2 different methods of sampling: swabs and plates. I'd like to count the times when the colony counts agree for both methods across a series of standards (e.g. if accn<="2.5", etc.).
head(ea)
  sample group     accn
1      e     1 14.84500
2      s     1  2.07500
3      e     2 13.70167
4      s     2  6.60000
5      e     3 11.45833
6      s     3  7.90000
So far I've got:
s <- (ea$accn <= "2.5" & ea$sample == "s")
p <- (ea$accn <= "2.5" & ea$sample == "p")
pe <- cbind(s, p)
pe <- as.data.frame(pe)
sum(pe)
But I receive this error:
Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric variables
With dplyr:
library(dplyr)
ea %>%
  mutate(s = ifelse(as.numeric(accn) <= 2.5 & sample == "s", 1, 0)) %>%
  mutate(p = ifelse(as.numeric(accn) <= 2.5 & sample == "p", 1, 0)) %>%
  summarise(pe_sum = sum(s, p))
But if you want the data frame itself, then:
ea %>%
  mutate(s = ifelse(as.numeric(accn) <= 2.5 & sample == "s", 1, 0)) %>%
  mutate(p = ifelse(as.numeric(accn) <= 2.5 & sample == "p", 1, 0))
If you don't care about having distinct "p" and "s" columns, you can write it more succinctly:
ea %>%
  mutate(new = ifelse(as.numeric(accn) <= 2.5 & sample %in% c("s", "p"), 1, 0)) %>%
  summarise(new_sum = sum(new))
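To check that the two-column and one-column versions give the same total, here is a minimal sketch that rebuilds only the six rows shown in head(ea). It assumes accn is already numeric, so as.numeric() is dropped. Those six rows contain only "e" and "s" samples, so the p indicator is all zeros in this toy data.

```r
library(dplyr)

# Toy data copied from the head(ea) output in the question (only 6 rows)
ea <- data.frame(
  sample = c("e", "s", "e", "s", "e", "s"),
  group  = c(1, 1, 2, 2, 3, 3),
  accn   = c(14.84500, 2.07500, 13.70167, 6.60000, 11.45833, 7.90000)
)

# Two-column version: one 0/1 indicator per sampling method
two_col <- ea %>%
  mutate(s = ifelse(accn <= 2.5 & sample == "s", 1, 0),
         p = ifelse(accn <= 2.5 & sample == "p", 1, 0)) %>%
  summarise(pe_sum = sum(s, p))

# One-column version: a single indicator covering both methods
one_col <- ea %>%
  mutate(new = ifelse(accn <= 2.5 & sample %in% c("s", "p"), 1, 0)) %>%
  summarise(new_sum = sum(new))

two_col$pe_sum   # 1 (only the "s" row with accn 2.075 qualifies)
one_col$new_sum  # 1
```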
Or use what you have:
s <- (ea$accn <= "2.5" & ea$sample == "s")
p <- (ea$accn <= "2.5" & ea$sample == "p")
But then:
sum(s, p)
Or:
s <- (ea$accn <= "2.5" & ea$sample == "s")
p <- (ea$accn <= "2.5" & ea$sample == "p")
pe <- cbind(s, p)
But then:
sum(pe) # keeping the object as a matrix, not spinning it into a data frame.
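One caveat with the base R versions: if accn is numeric, comparing it against the string "2.5" makes R convert the numbers to text and compare them as strings, which is probably not what is wanted. A small sketch with made-up vectors (accn and method here are hypothetical stand-ins for the columns of ea):

```r
# Hypothetical stand-ins for ea$accn and ea$sample
accn   <- c(14.845, 2.075, 6.6)
method <- c("s", "s", "p")

# Character threshold: accn is coerced to text, so "14.845" <= "2.5" is TRUE
as_string <- accn <= "2.5"

# Numeric threshold: values are compared as numbers
as_number <- accn <= 2.5

s <- as_number & method == "s"
p <- as_number & method == "p"

sum(s, p)         # logical vectors can be summed directly
sum(cbind(s, p))  # a logical matrix can be summed as well
```

Here as_string flags the 14.845 row even though it is far above 2.5, while as_number flags only the 2.075 row.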
To get the sum for each value from 1 to 30 (optional), per your question in the comment section, the answer would be:
library(dplyr)
x <- 1:30
(sapply(x, function(x) {ifelse(as.numeric(ea$accn) <= x & ea$sample == "s", 1, 0)}) +
  sapply(x, function(x) {ifelse(as.numeric(ea$accn) <= x & ea$sample == "p", 1, 0)})) %>%
  as.data.frame() %>%
  summarise_all(sum)
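If a plain named vector of counts is enough, the same idea can be sketched in base R with vapply. This is an alternative to the code above, not a replacement for it. The vectors accn and method are hypothetical stand-ins built from the six head(ea) rows.

```r
# Hypothetical stand-ins for ea$accn and ea$sample (six rows from head(ea))
accn   <- c(14.84500, 2.07500, 13.70167, 6.60000, 11.45833, 7.90000)
method <- c("e", "s", "e", "s", "e", "s")

thresholds <- 1:30

# For each threshold, count the "s" or "p" rows at or below it
counts <- vapply(
  thresholds,
  function(thr) sum(accn <= thr & method %in% c("s", "p")),
  integer(1)
)
names(counts) <- thresholds

counts[c("3", "7", "8")]
```

With these six rows the "s" values are 2.075, 6.6 and 7.9, so the count is 1 at threshold 3, 2 at threshold 7, and 3 from threshold 8 upward.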
Though I don't know the exact structure of the output you're seeking.
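If "agree" means that both rows of a group (one per sampling method) meet the standard, one possible reading is to test each group as a whole and then count the groups. This is only a sketch under that assumption. The data is the six rows from head(ea), and the threshold of 14 is a made-up value chosen so that some groups pass.

```r
library(dplyr)

# Toy data copied from the head(ea) output in the question (only 6 rows)
ea <- data.frame(
  sample = c("e", "s", "e", "s", "e", "s"),
  group  = c(1, 1, 2, 2, 3, 3),
  accn   = c(14.84500, 2.07500, 13.70167, 6.60000, 11.45833, 7.90000)
)

threshold <- 14  # hypothetical standard, not taken from the question

# TRUE for a group only when every row in it is at or below the threshold
agree <- ea %>%
  group_by(group) %>%
  summarise(both_meet = all(accn <= threshold))

sum(agree$both_meet)  # 2: groups 2 and 3 pass, group 1 fails on 14.845
```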