Operation on lists as elements of a dataframe in R


I have a time series with an id, a date, and a list of the dates at which an event occurred. I want to know how many times the event has happened before a given date within the time series.

Here is a sample dataframe:

id <- c(1,1,1,2,2,2,3,3,3)
date <- c(2000,2001,2002)
df <- data.frame(id,date)

rand1 <- c(runif(5)*4+1999)
rand2 <- c(runif(6)*4+1999)
rand3 <- c(runif(100)*4+1999)

df$events <- list(rand1, rand1, rand1, rand2, rand2, rand2, rand3, rand3, rand3)

This code solves the problem correctly:

for (i in c(1:9)){
  print(i)
  df[i,]$past <- sum( df[i,]$events[[1]] < df[i,]$date)
}

But it seems wildly inefficient to go line by line through the dataframe. My real dataset has 4 million rows, so I need something a little more sensible.

Here is what I tried first. I'm not sure what it's doing, but it ends up setting every element of df$past2 to the same integer.

df$past2 <- sum(df$events[[1]] < df$date) 

The resulting df:

id    date  events       past  past2
<dbl> <dbl> <list>       <dbl> <int>
1     2000  <dbl [5]>    3     6
1     2001  <dbl [5]>    3     6
1     2002  <dbl [5]>    4     6
2     2000  <dbl [6]>    0     6
2     2001  <dbl [6]>    3     6
2     2002  <dbl [6]>    5     6
3     2000  <dbl [100]>  26    6
3     2001  <dbl [100]>  55    6
3     2002  <dbl [100]>  74    6

So:

1) What is the df$past2 calculation doing?

2) Is there a way to do this kind of operation on lists as elements of a dataframe without going line by line?

Thanks.

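The problem with df$past2 is that df$events[[1]] returns only the first row's list, i.e. df[1,]$events[[1]]. That single vector is compared against the whole df$date column (with recycling), sum() collapses the result into one number, and that number is then assigned to every row.

One solution to this problem is to split each row of the dataframe into a list and use lapply: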
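To see why every row ends up with the same value, the expression can be evaluated step by step. This is only a minimal sketch: the vectors `events1` and `date` and the dataframe `d` are small hand-picked stand-ins (hypothetical, not the random data from the question).

```r
# Hypothetical stand-ins for df$events[[1]] and df$date
events1 <- c(1999.5, 2000.5, 2002.5)                 # events of the first row only
date    <- c(2000, 2001, 2002, 2000, 2001, 2002)     # the whole date column

# events1 is recycled along date: one comparison per element of date
cmp <- events1 < date

# sum() collapses all comparisons into a single number
total <- sum(cmp)

# Assigning a length-1 value to a column repeats it on every row
d <- data.frame(date = date)
d$past2 <- total
print(d)
```

Here `total` is a single number, so `d$past2` is constant, which mirrors the constant 6 in the output above.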

df$past2 = unlist(lapply(split(df,seq(nrow(df))),function(x) sum(x$events[[1]]< x$date))) 

However, because this involves some data manipulation, I am not sure it will be efficient on a dataframe with 4 million rows. You might need data.table or dplyr to find a more efficient solution.
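As a possible base R alternative, mapply can walk the events list and the date column in parallel, which avoids building one small dataframe per row with split. The sketch below rebuilds the sample data from the question; the seed and the column name `past3` are additions made here for illustration, and whether this is fast enough for 4 million rows is untested.

```r
# Rebuild the sample data from the question (seed added for reproducibility)
set.seed(1)
id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
date <- c(2000, 2001, 2002)
df <- data.frame(id, date)

rand1 <- runif(5) * 4 + 1999
rand2 <- runif(6) * 4 + 1999
rand3 <- runif(100) * 4 + 1999
df$events <- list(rand1, rand1, rand1, rand2, rand2, rand2, rand3, rand3, rand3)

# Pair each row's event vector (e) with that row's date (d)
# and count the events that fall before the date
df$past3 <- mapply(function(e, d) sum(e < d), df$events, df$date)

# Row-by-row reference computation, used only to check the result
expected <- vapply(
  seq_len(nrow(df)),
  function(i) sum(df$events[[i]] < df$date[i]),
  integer(1)
)
print(identical(df$past3, expected))
```

The function passed to mapply does the same comparison as the loop in the question, just without indexing into the dataframe on each iteration.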

