Collapse columns in a dataframe (R)
Basically, I have a dataframe, df:
             beginning1 protein2 protein3 protein4 biomarker1
    pathway3 a          g        NA       NA       f
    pathway8 z          g        NA       NA       e
    pathway9 a          g        z        h        f
    pathway6 y          g        z        h        e
    pathway2 a          g        d        NA       f
    pathway5 q          g        d        NA       e
    pathway1 a          d        k        NA       f
    pathway7 a          b        c        d        f
    pathway4 v          b        c        d        e
and I want to combine the rows that have identical "protein2" to "protein4" values and condense them, giving the following:
             beginning1 protein2 protein3 protein4 biomarker1
    pathway3 a,z        g        NA       NA       f,e
    pathway9 a,y        g        z        h        f,e
    pathway2 a,q        g        d        NA       f,e
    pathway1 a          d        k        NA       f
    pathway7 a,v        b        c        d        f,e
This is similar to a question asked before (Consolidating duplicate rows in a dataframe); the difference is that I am also consolidating the "beginning1" values.
So far, I have tried:
    library(data.table)
    dat <- data.table(df)
    total_collapse <- dat[, .(biomarker1 = paste0(biomarker1, collapse = ", ")),
                          by = .(beginning1, protein1, protein2, protein3)]
    total_collapse <- dat[, .(beginning1 = paste0(beginning1, collapse = ", ")),
                          by = .(protein1, protein2, protein3)]
which gives the output:
             beginning1 protein2 protein3 protein4 biomarker1
    pathway3 a          g        NA       NA       f,e
    pathway9 a          g        z        h        f,e
    pathway2 a          g        d        NA       f,e
    pathway1 a          d        k        NA       f
    pathway7 a          b        c        d        f,e
Does anyone know how to fix this problem? I have tried duplicating the solution from "Collapse / concatenate / aggregate a column to a single comma separated string within each group", but have had no success.
I'm sorry if this is a simple error; I'm pretty new to R.
Here's a possible solution using dplyr:
    library(dplyr)

    df %>%
      group_by_at(vars(protein2:protein4)) %>%
      summarize_all(paste, collapse = ",")
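For anyone who would rather stay with data.table, as in the question, here is a minimal sketch of the same idea: paste both columns together in one grouped call instead of two separate calls that each start again from dat. The example data is rebuilt from the table in the question, with "na" read as a missing value (NA). Storing the pathway labels in an ordinary column called pathway is my own assumption (a hypothetical name), and keeping the first label of each group is just one way to match the desired output.

```r
library(data.table)

# Example data rebuilt from the question. The "pathway" column name is
# hypothetical: it holds what the question shows as row labels.
dat <- data.table(
  pathway    = c("pathway3", "pathway8", "pathway9", "pathway6", "pathway2",
                 "pathway5", "pathway1", "pathway7", "pathway4"),
  beginning1 = c("a", "z", "a", "y", "a", "q", "a", "a", "v"),
  protein2   = c("g", "g", "g", "g", "g", "g", "d", "b", "b"),
  protein3   = c(NA, NA, "z", "z", "d", "d", "k", "c", "c"),
  protein4   = c(NA, NA, "h", "h", NA, NA, NA, "d", "d"),
  biomarker1 = c("f", "e", "f", "e", "f", "e", "f", "f", "e")
)

# Collapse beginning1 and biomarker1 in a single grouped call, so that one
# result does not overwrite the other. Rows sharing the same protein2,
# protein3 and protein4 values (including NA) form one group.
total_collapse <- dat[, .(
  pathway    = pathway[1],                         # keep the first label
  beginning1 = paste(beginning1, collapse = ","),
  biomarker1 = paste(biomarker1, collapse = ",")
), by = .(protein2, protein3, protein4)]

print(total_collapse)
```

Note that the grouping columns come first in the result; setcolorder() can rearrange them if the original column order matters.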