r - Avoiding intermediate dlply step when starting with a dataframe and ending with a dataframe -

April 15, 2010

I am using plyr to perform a bootstrapping function on subsets of a dataset.

Because the boot function creates a list object, I am using dlply to store the output of the function, and then ldply to get the parts of the boot output that I want back out.

My example dataset is as follows:

dat = data.frame(
  x = rnorm(10, sd = 1),
  y = rnorm(10, sd = 1),
  z = rep(c("sppa", "sppb", "sppc", "sppd", "sppe"), 2),
  u = rep(c("sitea", "siteb"), 5)
)

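The exact function isn't terribly important, but for the sake of reproducibility, here is the function I'm using:

boot_fun = function(x, i) {
  # NOTE: this overwrites the indices supplied by boot(); with the 10-row
  # example data, any index above 10 selects a missing row (NA)
  i = sample(1:24, 24, replace = TRUE)
  ts1 = mean(x[i, "x"])
  ts2 = sample(x[i, "y"])
  mean(ts1) - mean(ts2)
}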

My plyr function is the following:

temp = dlply(dat, c("z", "u"), summarise,
             boot_object = boot(dat, boot_fun, R = 1000))

Since what I want out of the boot object is the mean and the CI, I then perform the following plyr function:

temp2 = ldply(temp, summarise,
              mean = mean(boot$t),
              lowci = quantile(boot$t, 0.025),
              highci = quantile(boot$t, 0.975))

This works and accomplishes what I want (although with an error about subsetting that doesn't seem to affect what I care about), but I feel like there should be a way to skip the intermediate dlply step.

-EDIT- To clarify what I'm trying to do if I didn't need to split into groups:

If I were splitting manually instead of using plyr, it would be something like the following:

temp = boot(dat[dat$z == "sppa" & dat$u == "sitea", ], boot_fun, R = 1000)
temp2 = list()  # holds the summary statistics
temp2$mean = mean(temp$t)
temp2$lowci = quantile(temp$t, 0.025)
temp2$highci = quantile(temp$t, 0.975)

If I didn't care about groups at all and just wanted the whole dataset, it would be like:

temp = boot(dat, boot_fun, R = 1000)
temp2 = list()  # holds the summary statistics
temp2$mean = mean(temp$t)
temp2$lowci = quantile(temp$t, 0.025)
temp2$highci = quantile(temp$t, 0.975)

Your example is not reproducible for me.

When I run temp = boot(dat, boot_fun, R = 1000), I get this warning:

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = dat, statistic = boot_fun, R = 1000)

Bootstrap Statistics :
WARNING: All values of t1* are NA
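One likely cause of those NA values, assuming the 10-row example data from the question: boot_fun draws its own indices from 1:24, so most draws point at rows that don't exist. A minimal sketch of that effect, using a toy data frame d (a made-up stand-in, not the question's dat):

```r
# Toy data frame with 10 rows, the same size as the example data
d <- data.frame(x = 1:10)

# Row 24 does not exist, so indexing it yields NA
vals <- d[c(3, 24), "x"]
print(vals)

# mean() propagates the NA unless na.rm = TRUE is set
print(mean(vals))
```

If that is what is happening, using the i that boot() passes in, rather than resampling inside the statistic, should avoid the missing rows.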

I think your current code is pretty efficient, but if you're looking for other possibilities, you could try the tidyverse: 1) group_by the relevant columns, 2) nest the relevant data for bootstrapping, 3) run the bootstrap on the nested data, 4) isolate the statistics you desire, 5) return to a normal data frame.

library(boot)
library(tidyverse)

dat1 <- dat %>%
  group_by(z, u) %>%                                        # 1) group
  nest() %>%                                                # 2) nest each group's rows
  mutate(data = map(data, ~boot(.x, boot_fun, R = 1000))) %>%  # 3) bootstrap each group
  mutate(data = map(data, ~data.frame(mean = mean(.x$t),    # 4) keep only the statistics
                                      lowci = quantile(.x$t, 0.025),
                                      highci = quantile(.x$t, 0.975)))) %>%
  unnest(data)                                              # 5) back to a flat data frame
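If you'd rather stay in plyr, ddply can take a function that bootstraps a subset and returns a one-row data frame, which should skip the intermediate list. This is only a rough sketch under some assumptions: stat_fun and boot_summary are hypothetical names, stat_fun is a simplified statistic that uses the indices boot() supplies (it is not the original boot_fun), and I group by u alone so each group has 5 rows.

```r
library(boot)
library(plyr)

set.seed(42)  # only so the sketch is repeatable
dat <- data.frame(
  x = rnorm(10, sd = 1),
  y = rnorm(10, sd = 1),
  z = rep(c("sppa", "sppb", "sppc", "sppd", "sppe"), 2),
  u = rep(c("sitea", "siteb"), 5)
)

# Hypothetical statistic: difference of means on the resampled rows,
# using the indices i that boot() passes in
stat_fun <- function(d, i) mean(d[i, "x"]) - mean(d[i, "y"])

# Hypothetical helper: bootstrap one subset and return a one-row data frame
boot_summary <- function(d, R = 200) {
  b <- boot(d, stat_fun, R = R)
  data.frame(
    mean   = mean(b$t),
    lowci  = unname(quantile(b$t, 0.025)),
    highci = unname(quantile(b$t, 0.975))
  )
}

# ddply splits by u, applies boot_summary to each piece, and row-binds the results
res <- ddply(dat, "u", boot_summary)
print(res)
```

The boot object only lives inside boot_summary, so nothing list-shaped has to be stored between the two plyr calls.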
