r - Extracting data from a list of lists into its own `data.frame` with `purrr`
Representative sample data (list of lists):
l <- list(
  structure(list(a = -1.54676469632688, b = "s", c = "t",
    d = structure(list(id = 5L, label = "utah", link = "asia/anadyr", score = -0.21104594634643), .Names = c("id", "label", "link", "score")),
    e = 49.1279871269422), .Names = c("a", "b", "c", "d", "e")),
  structure(list(a = -0.934821052832427, b = "k", c = "t",
    d = list(
      structure(list(id = 8L, label = "south carolina", link = "pacific/wallis", score = 0.526540892113734, externalid = -6.74354377676955), .Names = c("id", "label", "link", "score", "externalid")),
      structure(list(id = 9L, label = "nebraska", link = "america/scoresbysund", score = 0.250895465294041, externalid = 16.4257470807879), .Names = c("id", "label", "link", "score", "externalid"))),
    e = 52.3161400117052), .Names = c("a", "b", "c", "d", "e")),
  structure(list(a = -0.27261485993069, b = "f", c = "p",
    d = list(
      structure(list(id = 8L, label = "georgia", link = "america/nome", score = 0.526494135483816, externalid = 7.91583574935589), .Names = c("id", "label", "link", "score", "externalid")),
      structure(list(id = 2L, label = "washington", link = "america/shiprock", score = -0.555186440792989, externalid = 15.0686663219837), .Names = c("id", "label", "link", "score", "externalid")),
      structure(list(id = 6L, label = "north dakota", link = "universal", score = 1.03168296038975), .Names = c("id", "label", "link", "score")),
      structure(list(id = 1L, label = "new hampshire", link = "america/cordoba", score = 1.21582056168681, externalid = 9.7276418869132), .Names = c("id", "label", "link", "score", "externalid")),
      structure(list(id = 1L, label = "alaska", link = "asia/istanbul", score = -0.23183264861979), .Names = c("id", "label", "link", "score")),
      structure(list(id = 4L, label = "pennsylvania", link = "africa/dar_es_salaam", score = 0.590245339334121), .Names = c("id", "label", "link", "score"))),
    e = 132.1153538536), .Names = c("a", "b", "c", "d", "e")),
  structure(list(a = 0.202685974077313, b = "x", c = "o",
    d = structure(list(id = 3L, label = "delaware", link = "asia/samarkand", score = 0.695577130634724, externalid = 15.2364820698193), .Names = c("id", "label", "link", "score", "externalid")),
    e = 97.9908914452971), .Names = c("a", "b", "c", "d", "e")),
  structure(list(a = -0.396243444741009, b = "z", c = "p",
    d = list(
      structure(list(id = 4L, label = "north dakota", link = "america/tortola", score = 1.03060272795705, externalid = -7.21666936522344), .Names = c("id", "label", "link", "score", "externalid")),
      structure(list(id = 9L, label = "nebraska", link = "america/ojinaga", score = -1.11397997280413, externalid = -8.45145052697411), .Names = c("id", "label", "link", "score", "externalid"))),
    e = 123.597945533926), .Names = c("a", "b", "c", "d", "e"))
)
I have a list of lists, by virtue of a JSON data download.
The list has 176 elements, each with 33 nested elements, some of which are lists of varying length.
I am interested in analyzing the data contained in one particular nested list, which has a length of ~150 for each of the 176 elements. Each entry in it has either 4 or 5 elements (some have 4 and some have 5). I am trying to extract this nested list of interest and convert it into a `data.frame` so that I am able to perform some analysis.
In the representative sample data above, I am interested in the nested list `d` for each of the 5 elements of `l`. The desired `data.frame` would therefore look something like:
id label          link            score      externalid
5  utah           asia/anadyr    -0.2110459  NA
8  south carolina pacific/wallis  0.5265409  -6.743544
.
.
I've been attempting to use `purrr`, as it appears to have a sensible and consistent flow for processing data in lists, but I am running into errors whose cause I can't understand. I don't fully understand the commands/logic of `purrr` or of lists (likely both). The code I've been attempting, with the error it throws, is:
df <- map_df(l, "d", ~as.data.frame(.))
Error: incompatible sizes (5 != 4)
I believe this has to do with the differing lengths of `d` in each component, or perhaps the differing contained data (sometimes 4 elements, sometimes 5), or perhaps the function I've used here is misspecified. Truthfully, I'm not entirely sure.
I have worked around this using a for loop, but I know this is inefficient, hence my question here on SO.
This is the for loop I currently employ:
df <- data.frame(id = integer(), label = character(), score = numeric(), externalid = numeric())
for (i in seq_along(l)) {
  df_temp <- l[[i]][[4]] %>% map_df(~as.data.frame(.))
  df <- rbind(df, df_temp)
}
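Some assistance, preferably with `purrr` (or alternatively with some version of `apply`, as that is still superior to my for loop), would be much appreciated. Also, if there's a resource that explains the above, I'd like to understand it rather than just find the right code.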
You can do this in 3 steps: first pulling out `d`, then binding the rows within each element of `d`, and then binding everything into a single object.
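To see why the within-element binding step is needed, it can help to inspect the shape of `d` first. The following is a minimal sketch on a cut-down, hypothetical list `l_small` (not the full sample data) that mimics the two shapes seen in the question: a single record stored directly in `d`, and a list of several records. The names `l_small`, `d_is_nested` and `fields_per_record` are made up for illustration.

```r
library(purrr)

# Hypothetical cut-down data: element 1 stores one record directly in `d`,
# element 2 stores a list of two records (with an extra `externalid` field).
l_small <- list(
  list(a = 1, d = list(id = 5L, label = "utah", link = "asia/anadyr", score = -0.21)),
  list(a = 2, d = list(
    list(id = 8L, label = "south carolina", link = "pacific/wallis",
         score = 0.53, externalid = -6.74),
    list(id = 9L, label = "nebraska", link = "america/scoresbysund",
         score = 0.25, externalid = 16.43)
  ))
)

# TRUE when `d` is a list of records, FALSE when it is a single flat record
d_is_nested <- map_lgl(l_small, ~ is.list(.x$d[[1]]))
d_is_nested

# Number of fields per record, whichever shape `d` has
fields_per_record <- map(l_small, function(x) {
  if (is.list(x$d[[1]])) lengths(x$d) else length(x$d)
})
fields_per_record
```

If both shapes and both field counts show up, as they do here, a single `as.data.frame()` call over everything is unlikely to line up, which fits the size mismatch reported in the error.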
I use `bind_rows` from `dplyr` for the within-list row binding. `map_df` does the final row binding.
library(purrr)
library(dplyr)
l %>%
  map("d") %>%
  map_df(bind_rows)
This is also equivalent:
map_df(l, ~bind_rows(.x[["d"]]))
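The three steps can also be written out with intermediate objects, which may make the logic easier to follow. This is only a sketch under assumptions: it runs on a small hypothetical list `l_small` rather than the full sample data, it adds an explicit normalisation step (wrapping single records in a list) that the answer's code does not need, and it uses `map()` plus `bind_rows()` for the final binding because recent `purrr` releases mark `map_df()` as superseded.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical cut-down data with both shapes of `d`
l_small <- list(
  list(a = 1, d = list(id = 5L, label = "utah", link = "asia/anadyr", score = -0.21)),
  list(a = 2, d = list(
    list(id = 8L, label = "south carolina", link = "pacific/wallis",
         score = 0.53, externalid = -6.74),
    list(id = 9L, label = "nebraska", link = "america/scoresbysund",
         score = 0.25, externalid = 16.43)
  ))
)

# Step 1: pull out `d` from every top-level element
d_list <- map(l_small, "d")

# Assumed extra step: wrap single records so every `d` is a list of records
d_norm <- map(d_list, function(d) if (is.list(d[[1]])) d else list(d))

# Step 2: one tibble row per record, bound within each element of `d`
d_frames <- map(d_norm, ~ bind_rows(map(.x, as_tibble)))

# Step 3: bind everything into a single object;
# bind_rows() fills columns that a record lacks with NA
df_small <- bind_rows(d_frames)
df_small
```

Records without `externalid` end up with `NA` in that column, which matches the desired output described in the question.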
The result looks like:
# A tibble: 12 x 5
      id label          link                      score externalid
   <int> <chr>          <chr>                     <dbl>      <dbl>
 1     5 utah           asia/anadyr          -0.2110459         NA
 2     8 south carolina pacific/wallis        0.5265409  -6.743544
 3     9 nebraska       america/scoresbysund  0.2508955  16.425747
 4     8 georgia        america/nome          0.5264941   7.915836
 5     2 washington     america/shiprock     -0.5551864  15.068666
 6     6 north dakota   universal             1.0316830         NA
 7     1 new hampshire  america/cordoba       1.2158206   9.727642
 8     1 alaska         asia/istanbul        -0.2318326         NA
 9     4 pennsylvania   africa/dar_es_salaam  0.5902453         NA
10     3 delaware       asia/samarkand        0.6955771  15.236482
11     4 north dakota   america/tortola       1.0306027  -7.216669
12     9 nebraska       america/ojinaga      -1.1139800  -8.451451
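Since the question also mentions the `apply` family, here is a rough base R sketch of the same idea, with no packages. It is an assumption-laden illustration rather than a tested replacement: it runs on a small hypothetical list `l_small`, and the helper names `records`, `cols` and `rows` are made up here.

```r
# Hypothetical cut-down data with both shapes of `d`
l_small <- list(
  list(a = 1, d = list(id = 5L, label = "utah", link = "asia/anadyr", score = -0.21)),
  list(a = 2, d = list(
    list(id = 8L, label = "south carolina", link = "pacific/wallis",
         score = 0.53, externalid = -6.74),
    list(id = 9L, label = "nebraska", link = "america/scoresbysund",
         score = 0.25, externalid = 16.43)
  ))
)

# Collect every record into one flat list, wrapping single records in list()
records <- do.call(c, lapply(l_small, function(x) {
  if (is.list(x$d[[1]])) x$d else list(x$d)
}))

# All field names that occur in any record, in order of first appearance
cols <- unique(unlist(lapply(records, names)))

# One single-row data.frame per record, padding missing fields with NA
rows <- lapply(records, function(r) {
  for (nm in setdiff(cols, names(r))) r[[nm]] <- NA
  as.data.frame(r[cols], stringsAsFactors = FALSE)
})

df_base <- do.call(rbind, rows)
df_base
```

The padding loop is what `bind_rows()` does automatically; base `rbind()` requires every data frame to have the same columns.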