Calculating distance to minimum of similar cases (observations) in R
I have a dataset that describes the results of applying 3 algorithms to a number of cases. For each combination of algorithm and case, there is a result.
df <- data.frame(
  c("case1", "case1", "case1", "case2", "case2", "case2"),
  c("algo1", "algo2", "algo3", "algo1", "algo2", "algo3"),
  c(10, 11, 12, 22, 23, 20)
)
names(df) <- c("case", "algorithm", "result")
df
These algorithms aim to minimize the result value. For each algorithm and case, I want to calculate the gap to the lowest result achieved by any algorithm on the same case.
gap <- function(caseid, result) {
  filtered = subset(df, case == caseid)
  return (result - min(filtered[, 'result']))
}
When I apply the function manually, I get the expected results.
gap("case1", 10) # prints 0, since 10 is the best value for case1
gap("case1", 11) # prints 1, since 11-10=1
gap("case1", 12) # prints 2, since 12-10=2
gap("case2", 22) # prints 2, since 22-20=2
gap("case2", 23) # prints 3, since 23-20=3
gap("case2", 20) # prints 0, since 20 is the best value for case2
However, when I want to calculate the new column across the whole dataset, I get bogus results for case2.
df$gap <- gap(df$case, df$result)
df
This produces:

   case algorithm result gap
1 case1     algo1     10   0
2 case1     algo2     11   1
3 case1     algo3     12   2
4 case2     algo1     22  12
5 case2     algo2     23  13
6 case2     algo3     20  10
It seems the gap function is working against the overall minimum result of the whole data frame, whereas it should only consider rows with the same case. Maybe the subset filtering in the gap function is not working properly?
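The suspicion about the filtering can be checked directly. The snippet below is a minimal sketch that rebuilds the same data frame and shows what the comparison inside gap evaluates to when the whole case column is passed in as caseid. The variable names keep and filtered are only illustrative.

```r
# Same data as in the question
df <- data.frame(
  case = c("case1", "case1", "case1", "case2", "case2", "case2"),
  algorithm = c("algo1", "algo2", "algo3", "algo1", "algo2", "algo3"),
  result = c(10, 11, 12, 22, 23, 20)
)

# With caseid = df$case, the comparison is elementwise: each row's case is
# compared with itself, so the condition is TRUE for every row.
keep <- df$case == df$case
all(keep)             # TRUE

# subset() therefore keeps all six rows, and min() is taken over the
# whole data frame rather than per case.
filtered <- subset(df, case == df$case)
nrow(filtered)        # 6
min(filtered$result)  # 10
```

So the filter itself works as written. The function is simply called once with whole vectors instead of once per row, which is why every gap is measured against 10.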
We can use dplyr: group by case, then subtract each group's minimum from result.
library(dplyr)
df %>%
  group_by(case) %>%
  mutate(result = result - min(result))
# A tibble: 6 x 3
# Groups:   case [2]
#   case   algorithm result
#   <fctr> <fctr>     <dbl>
#1 case1  algo1          0
#2 case1  algo2          1
#3 case1  algo3          2
#4 case2  algo1          2
#5 case2  algo2          3
#6 case2  algo3          0
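If adding a dependency is not desirable, the same grouped calculation can probably be done in base R. The following is a sketch, not part of the original answer: it uses ave() to compute the per-case minimum and stores the difference in a new gap column, leaving result untouched.

```r
# Same data as in the question
df <- data.frame(
  case = c("case1", "case1", "case1", "case2", "case2", "case2"),
  algorithm = c("algo1", "algo2", "algo3", "algo1", "algo2", "algo3"),
  result = c(10, 11, 12, 22, 23, 20)
)

# ave() applies FUN within each group defined by df$case and returns a
# vector aligned with the original rows, so it can be subtracted directly.
df$gap <- df$result - ave(df$result, df$case, FUN = min)
df$gap  # 0 1 2 2 3 0
```

Note that FUN must be passed by name, because ave() treats every unnamed argument after the first as a grouping variable.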