Using SQLite and dplyr for fuzzy matching in R
I am working with an extremely large dataset of names, using RSQLite to run it in R. I have taken the original file (14 million rows) and grouped identical names using group_by() from dplyr. I'm making sure that separate columns record how many times each value was grouped and what the mean scores are. This is a single file, so I am not matching against a master file. It works well and reduces the file to 5 million rows based on identical string matching.
Is there a way to group the names through a fuzzy match, while keeping the count and mean variables?
Here is the code I'm using:
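library(dplyr)

# Keep only the columns needed for the summary
employees <- company %>% select(name, score)
head(employees)

# Collapse identical names, recording group size and mean score
employeescores <- employees %>%
  group_by(name) %>%
  summarize(count = n(), score = mean(score))

write.csv(employeescores, file = "employeescores.csv", row.names = FALSE)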
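For illustration, here is a rough sketch of one way the fuzzy grouping might look. It is not a tested solution for the full 5 million rows. It assumes the exact-match summary already has name, count and score columns, and it uses base R's adist() (edit distance) with single-linkage hclust() and cutree() to cluster similar names. The toy data, the fuzzy_cluster() helper, the max_dist threshold of 2 and the first-letter blocking key are all hypothetical choices made up for this example. Because the rows are already means, the merged score is a count-weighted mean rather than a plain mean of means.

```r
library(dplyr)

# Toy stand-in for the exact-match summary (hypothetical data).
employeescores <- data.frame(
  name  = c("john smith", "jon smith", "john smyth",
            "mary jones", "mary jonse", "alan brown"),
  count = c(10, 2, 1, 5, 1, 3),
  score = c(4, 3, 5, 2, 4, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: give the same cluster id to names that are linked
# by a chain of edit distances no larger than max_dist.
fuzzy_cluster <- function(names, max_dist = 2) {
  if (length(names) < 2) return(rep(1L, length(names)))  # hclust needs n >= 2
  d <- as.dist(adist(names))             # pairwise Levenshtein distances
  cutree(hclust(d, method = "single"), h = max_dist)
}

fuzzyscores <- employeescores %>%
  mutate(block = substr(name, 1, 1)) %>%  # blocking key (assumption)
  group_by(block) %>%
  mutate(cluster = fuzzy_cluster(name, max_dist = 2)) %>%
  group_by(block, cluster) %>%
  summarize(
    name  = name[which.max(count)],       # most frequent spelling as label
    score = weighted.mean(score, count),  # count-weighted mean of the means
    count = sum(count),                   # total rows behind the cluster
    .groups = "drop"
  )

print(fuzzyscores)
```

Blocking matters because adist() builds a full n-by-n matrix for each block, so a first-letter key is almost certainly too coarse at this scale and a finer key (for example a phonetic code or a longer prefix) would be needed. Single linkage can also chain loosely related names together, so the threshold would need checking against real data.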