R - How to efficiently swap elements between columns in a dataframe?
I asked a similar question before, but realized that the previous example was a little special in the sense that the factor levels were equally spaced. Here I want to reframe the question in a more generic way, since the solutions in the old thread do not work properly for it.
Suppose I have the following dataframe in R:
    set.seed(1)
    (tmp <- data.frame(x = 1:10,
                       r1 = sample(c('a','d','f','g','i'), 10, replace = TRUE),
                       r2 = sample(c('d','f','g','i','z'), 10, replace = TRUE),
                       stringsAsFactors = FALSE))
    x r1 r2
    1 1 d f
    2 2 d d
    3 3 f
    4 4 f
    5 5 d
    6 6 g
    7 7
    8 8 g z
    9 9 g f
    10 10
Notice that the 2 columns r1 and r2 do not share the same elements. I want the following: if the difference between the element index (the sequential order among the elements) of column r1 and that of column r2 is an odd number, the levels of the 2 factors need to be switched between them, which can be performed through the following code:
    for (ii in 1:dim(tmp)[1]) {
      kk <- which(levels(as.factor(tmp$r2)) %in% tmp[ii, 'r2'], arr.ind = TRUE) -
            which(levels(as.factor(tmp$r1)) %in% tmp[ii, 'r1'], arr.ind = TRUE)
      if (kk %% 2 != 0) {
        # swap elements between the 2 columns
        qq <- tmp[ii, ]$r1
        tmp[ii, ]$r1 <- tmp[ii, ]$r2
        tmp[ii, ]$r2 <- qq
      }
    }
As the 2 columns r1 and r2 don't share the same elements, I purposefully created the dataframe tmp so that r1 and r2 are not factors, in order to swap elements between the 2 columns with the kludge code above. Below is the output after swapping:
    x r1 r2
    1 1 d f
    2 2 d d
    3 3 f
    4 4 f
    5 5 d
    6 6 g
    7 7
    8 8 z g
    9 9 f g
    10 10
My solution is awkward and slow for a big dataframe. Is there an elegant way to perform this operation?
    # convert to character
    dat[, c("r1", "r2")] <- lapply(dat[, c("r1", "r2")], as.character)
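The conversion matters because assigning a value that is not among a factor's levels produces NA with a warning, so swapping between two factor columns with different level sets would lose data. As a minimal sketch of the conversion step, assuming a made-up two-row data frame `fac` (not part of the original data):

```r
# Hypothetical two-row example: factor columns with different level sets
fac <- data.frame(r1 = factor(c("d", "f")),
                  r2 = factor(c("a", "d")))

# Convert both columns to character in one step
fac[, c("r1", "r2")] <- lapply(fac[, c("r1", "r2")], as.character)

# Inspect the column classes after the conversion
col_classes <- sapply(fac, class)
col_classes
```

After this, both `r1` and `r2` are plain character vectors, so values can move freely between them.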
Next, vectorize the row-change condition. The TRUE elements of the resulting logical vector mark the rows that need to be swapped.
    # logical indicator of the rows to change
    changeind <- !!((match(dat$r2, levels(as.factor(dat$r2))) -
                     match(dat$r1, levels(as.factor(dat$r1)))) %% 2)
    # perform the swapping for the given rows
    dat[changeind, c("r1", "r2")] <- dat[changeind, c("r2", "r1")]
Here, I use match to select the rows where changes are needed. After this, I perform a simple swapping of the variables with [.
This returns

    dat
    x r1 r2
    1 1 d f
    2 2 d d
    3 3 f
    4 4 f
    5 5 d
    6 6 g
    7 7
    8 8 g z
    9 9 f g
    10 10
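The indicator logic can be traced by hand on a smaller example. The sketch below assumes a made-up data frame `toy` whose values were chosen only so that both outcomes (swap and no swap) occur; it uses base R only.

```r
# Toy data (hypothetical values, not taken from the question)
toy <- data.frame(x  = 1:4,
                  r1 = c("a", "d", "f", "a"),
                  r2 = c("d", "d", "z", "z"),
                  stringsAsFactors = FALSE)

# Position of each value among the sorted unique values of its own column
idx1 <- match(toy$r1, levels(as.factor(toy$r1)))  # levels: a, d, f
idx2 <- match(toy$r2, levels(as.factor(toy$r2)))  # levels: d, z

# %% 2 yields 0 or 1 even for negative differences; !! turns that into logical
change <- !!((idx2 - idx1) %% 2)

# Swap the two columns only in the flagged rows
toy[change, c("r1", "r2")] <- toy[change, c("r2", "r1")]
toy
```

The index differences are 0, -1, -1 and 1, so rows 2 to 4 are flagged and swapped while row 1 is left alone. Row 2 looks unchanged only because both of its values are "d".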
Note that there may be a typo in the desired output, since

    identical((sapply(seq_len(nrow(dat)), function(x)
                 which(levels(as.factor(dat$r2)) %in% dat[x, 'r2'], arr.ind = TRUE) -
                 which(levels(as.factor(dat$r1)) %in% dat[x, 'r1'], arr.ind = TRUE)) %% 2) != 0,
              changeind)
    [1] TRUE
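One detail to keep in mind when comparing against the loop in the question: that loop recomputes the levels from tmp on every iteration, after earlier rows may already have been swapped, so the level sets it uses can change while it runs. The check above, like the vectorized code, computes the levels from the unmodified columns. The following is a sketch of a loop with the levels fixed up front, compared against the vectorized swap; `swap_loop`, `swap_vec` and the data frame `demo` are hypothetical names and data introduced here for illustration.

```r
# Hypothetical helper: row-by-row swap with levels computed once, before any swap
swap_loop <- function(df) {
  lev1 <- levels(as.factor(df$r1))
  lev2 <- levels(as.factor(df$r2))
  for (ii in seq_len(nrow(df))) {
    kk <- match(df$r2[ii], lev2) - match(df$r1[ii], lev1)
    if (kk %% 2 != 0) {
      qq <- df$r1[ii]
      df$r1[ii] <- df$r2[ii]
      df$r2[ii] <- qq
    }
  }
  df
}

# Hypothetical helper: the vectorized swap wrapped in a function
swap_vec <- function(df) {
  change <- !!((match(df$r2, levels(as.factor(df$r2))) -
                match(df$r1, levels(as.factor(df$r1)))) %% 2)
  df[change, c("r1", "r2")] <- df[change, c("r2", "r1")]
  df
}

set.seed(42)  # arbitrary seed; the exact values drawn do not matter here
demo <- data.frame(x  = 1:20,
                   r1 = sample(c("a", "d", "f", "g", "i"), 20, replace = TRUE),
                   r2 = sample(c("d", "f", "g", "i", "z"), 20, replace = TRUE),
                   stringsAsFactors = FALSE)

res_loop <- swap_loop(demo)
res_vec  <- swap_vec(demo)

# Both approaches should agree column by column
identical(res_loop$r1, res_vec$r1) && identical(res_loop$r2, res_vec$r2)
```

Fixing the levels before the loop makes the result independent of the order in which rows are processed, which is the behaviour the vectorized version has by construction.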
Data
    dat <- structure(list(x = 1:10,
        r1 = structure(c(1L, 1L, 4L, 4L, 1L, 3L, 4L, 5L, 2L, 4L),
                       .Label = c("d", "f", "g", "i", "z"), class = "factor"),
        r2 = structure(c(3L, 2L, 3L, 3L, 5L, 5L, 5L, 4L, 4L, 1L),
                       .Label = c("a", "d", "f", "g", "i"), class = "factor")),
        .Names = c("x", "r1", "r2"), class = "data.frame",
        row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))