R - Iterating over columns in a data frame in order to replace values from matching data in a list of data frames
I'm interested in building a function that makes use of apply/sapply or Map to iterate over the available columns in dta and replace the values in each column with the matched values from a data frame in a nameless list of data frames, where the list item index corresponds to the column number of the dta data frame.
Example
Given the following objects:
```r
set.seed(1)
size <- 20

# Data set
dta <- data.frame(
  unita = sample(letters[1:4], size = size, replace = TRUE),
  unitb = sample(letters[16:20], size = size, replace = TRUE),
  unitc = sample(month.abb[1:4], size = size, replace = TRUE),
  somevalue = sample(1:1e6, size = size, replace = TRUE)
)

# Meta data
lstmeta <- list(
  # Unit A definitions
  data.frame(
    v1 = c("a", "b", "d"),
    v2 = c("letter a", "letter b", "letter d")
  ),
  # Unit B definitions
  data.frame(
    v1 = c("t", "q"),
    v2 = c("small t", "small q")
  ),
  # Unit C definitions (month.abb values are capitalised, e.g. "Mar")
  data.frame(
    v1 = c("Mar", "Jan"),
    v2 = c("march", "january")
  )
)
```
Desired results
When applied on dta, the function should return a data.frame corresponding to the extract below:
```
   unita   unitb   unitc somevalue
letter b small t     Apr    912876
letter b small q   march    293604
       c       s     Apr    459066
letter d       p   march    332395
letter a small q   march    650871
letter d small q     Apr    258017
letter d       p january    478546
       c small q     Feb    766311
       c small t   march     84247
letter a small q   march    875322
letter a       r     Feb    339073
letter a       r     Apr    839441
       c       r     Feb    346684
letter b       p january    333775
letter d small t january    476352
(...)
```
Existing approach
```r
replacelbls <- function(dataset, lstdict) {
  sapply(seq_along(dataset), function(i) {
    # Take the corresponding metadata data frame
    dtadict <- lstdict[[i]]
    # Replace values in the selected column:
    # matches on v1 push the corresponding values from v2
    dataset[,i][match(dataset[,i], dtadict[,1])] <-
      dtadict[,2][match(dtadict[,1], dataset[,i])]
  })
}

# Testing -----------------------------------------------------------------
replacelbls(dataset = dta, lstdict = lstmeta)
```
Of course, the approach proposed above does not work, as I'm trying to use NA in assignments; still, it summarises what I want to achieve:
```
Error in x[...] <- m : NAs are not allowed in subscripted assignments
In addition: Warning message:
In `[<-.factor`(`*tmp*`, match(dataset[, i], dtadict[, 1]), value = c(NA,  :
  invalid factor level, NA generated
```
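The error comes from `match()`: it returns `NA` for every value without a dictionary entry, and R refuses `NA` subscripts in an assignment whose right-hand side has more than one element. The original code also uses dictionary row numbers as column positions, iterates over all four columns although the list has only three items, and modifies `dataset` only inside the anonymous function. The sketch below keeps the index-based idea but assigns only where a match exists; `toy`, `dicts` and `replace_matched` are hypothetical names used for illustration on character columns, not the answer given further down.

```r
# Hypothetical toy data; character columns avoid "invalid factor level" warnings
toy <- data.frame(
  unita = c("a", "b", "c"),
  unitb = c("t", "s", "q"),
  somevalue = c(1, 2, 3),
  stringsAsFactors = FALSE
)
dicts <- list(
  data.frame(v1 = c("a", "b"), v2 = c("letter a", "letter b"), stringsAsFactors = FALSE),
  data.frame(v1 = c("t", "q"), v2 = c("small t", "small q"), stringsAsFactors = FALSE)
)

replace_matched <- function(dataset, lstdict) {
  # Visit only as many columns as there are dictionaries, pairing them by position
  dataset[seq_along(lstdict)] <- lapply(seq_along(lstdict), function(i) {
    col <- as.character(dataset[[i]])
    hit <- match(col, lstdict[[i]][[1]])  # NA where no definition exists
    ok <- !is.na(hit)
    # Assign only at matched positions, so no NA subscript is ever used
    col[ok] <- as.character(lstdict[[i]][[2]])[hit[ok]]
    col
  })
  dataset  # return the modified copy instead of discarding it
}

res <- replace_matched(toy, dicts)
res
```

Unmatched values (`"c"`, `"s"`) are left as they were, and `somevalue` is never visited because there is no third dictionary.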
Additional remarks
Source data set
The key characteristics of the data are:
- The list is nameless, so subsetting has to be done by item number, not by name
- Item numbers correspond to column numbers
- There is no full match between the metadata data frames in the list of data frames and the unit columns in the data
- The somevalue column should not be iterated on, as it may contain labels that should not be replaced
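Because the list is nameless and covers only the unit columns, column *i* has to be paired with list item *i*, stopping at the length of the list so that the trailing value column is skipped. The hypothetical diagnostic below (names `toy`, `dicts` and `unmatched` are illustrative, not from the question) uses base `Map()` on that positional pairing to list the values in each column that have no definition and will therefore stay unchanged.

```r
# Hypothetical toy data mirroring the structure of dta / lstmeta
toy <- data.frame(
  unita = c("a", "b", "c", "d"),
  unitc = c("Jan", "Feb", "Mar", "Apr"),
  somevalue = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
dicts <- list(
  data.frame(v1 = c("a", "b", "d"), v2 = c("letter a", "letter b", "letter d"),
             stringsAsFactors = FALSE),
  data.frame(v1 = c("Mar", "Jan"), v2 = c("march", "january"),
             stringsAsFactors = FALSE)
)

# Pair column i with dictionary i; columns beyond length(dicts) are not visited
unmatched <- Map(function(col, dict) setdiff(unique(as.character(col)), dict$v1),
                 toy[seq_along(dicts)], dicts)
unmatched
```

The result is named after the visited columns, which makes it easy to spot a dictionary that has been attached to the wrong position.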
Solution
- I'm not interested in dplyr/data.table/sqldf-based solutions.
- I'm not interested in nested for-loops.
The following approach works for the example data:
```r
replacelbls <- function(dataset, lstdict) {
  dataset[seq_along(lstdict)] <- Map(function(x, lst) {
    x <- as.character(x)
    idx <- match(x, as.character(lst$v1))
    replace(x, !is.na(idx), as.character(lst$v2)[na.omit(idx)])
  }, dataset[seq_along(lstdict)], lstdict)
  dataset
}

head(replacelbls(dta, lstmeta))
#      unita   unitb unitc somevalue
# 1 letter b small t   Apr    912876
# 2 letter b small q march    293604
# 3        c       s   Apr    459066
# 4 letter d       p march    332395
# 5 letter a small q march    650871
# 6 letter d small q   Apr    258017
```
This assumes that you want to apply the changes to the first x columns of the data, where x is the length of the meta-list. You might want to include a step that converts back to factor, since this approach converts the adjusted columns to character class.
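A minimal sketch of that extra step, assuming you simply want factor columns to come back as factors: the hypothetical helper `relabel_keep_class` below applies the same `match()`/`replace()` logic as the answer and rebuilds a factor only when the input was one.

```r
# Hypothetical helper (not part of the answer): relabel, then restore factor class
relabel_keep_class <- function(x, dict) {
  was_factor <- is.factor(x)
  out <- as.character(x)
  idx <- match(out, as.character(dict$v1))
  out <- replace(out, !is.na(idx), as.character(dict$v2)[na.omit(idx)])
  if (was_factor) factor(out) else out  # character input stays character
}

f <- factor(c("a", "b", "c", "a"))
d <- data.frame(v1 = c("a", "b"), v2 = c("letter a", "letter b"))
r <- relabel_keep_class(f, d)
r
```

Note that `factor(out)` sorts the new levels alphabetically; if the original level order matters, it has to be carried over explicitly.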
Another remark on factors: you could potentially speed up performance by working on the levels of factor variables instead of the whole column. The general process is similar but requires a few more steps to check classes, etc.
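A sketch of the level-based variant, assuming the column is already a factor: each distinct level is looked up once, and assigning the relabelled vector back through `levels<-` updates every element that carries that level. The helper name `relabel_levels` is hypothetical.

```r
# Hypothetical helper: relabel a factor by editing its levels, not its elements
relabel_levels <- function(x, dict) {
  stopifnot(is.factor(x))
  lv <- levels(x)
  idx <- match(lv, as.character(dict$v1))  # one lookup per distinct level
  ok <- !is.na(idx)
  lv[ok] <- as.character(dict$v2)[idx[ok]]
  levels(x) <- lv  # all elements with a relabelled level change at once
  x
}

f <- factor(c("Mar", "Apr", "Jan", "Mar"))
d <- data.frame(v1 = c("Mar", "Jan"), v2 = c("march", "january"))
r <- relabel_levels(f, d)
r
```

The cost of `match()` now depends on the number of levels rather than the number of rows. If two levels map to the same label, `levels<-` merges them into a single level.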