Using SQLite and dplyr for fuzzy matching in R


I am working with an extremely large dataset of names, using RSQLite to run it in R. I have taken the original file (14 million rows) and grouped identical names using "group_by" in dplyr. Separate columns record how many times each value was grouped and what the mean score is. This is a single file; I am not matching against a master file. This works well and reduces the file down to 5 million rows based on identical string matching.

Is there a way to group names through a fuzzy match, while keeping the count and mean variables?

Here is the code I'm using:

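library(dplyr)

# Keep only the columns needed for grouping
employees <- company %>% select(name, score)
head(employees)

# Collapse identical names, keeping the group size and the mean score
employeescores <- employees %>%
  group_by(name) %>%
  summarize(count = n(), score = mean(score))

write.csv(employeescores, file = "employeescores.csv", row.names = FALSE)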
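To make the question concrete, here is a minimal sketch of what "fuzzy grouping while keeping count and mean" could look like. It is not a tested solution for the full data. It assumes a table shaped like employeescores (name, count, score), uses base R's adist() for edit distance with hclust() and cutree() to form clusters, and the threshold of 2 edits and the toy rows are made up for illustration. The merged score has to be weighted by count, otherwise it becomes a mean of means.

```r
library(dplyr)

# Toy stand-in for the already-collapsed table (hypothetical values).
employeescores <- data.frame(
  name  = c("John Smith", "Jon Smith", "Jane Doe", "Jane Do", "Alice Wong"),
  count = c(3, 1, 2, 2, 5),
  score = c(80, 60, 90, 70, 50),
  stringsAsFactors = FALSE
)

# Pairwise Levenshtein (edit) distances between the distinct names.
d <- adist(employeescores$name)

# Single-linkage clustering, cut at an edit distance of 2 (assumed threshold).
clusters <- cutree(hclust(as.dist(d), method = "single"), h = 2)

fuzzy_scores <- employeescores %>%
  mutate(cluster = clusters) %>%
  group_by(cluster) %>%
  summarize(
    name  = name[which.max(count)],           # most frequent spelling labels the group
    score = sum(score * count) / sum(count),  # count-weighted mean, not a mean of means
    count = sum(count)                        # defined last so the lines above see the original count
  )
```

adist() builds a full n-by-n matrix, so this would not run as-is on 5 million names. It would presumably need to be applied within smaller blocks (for example, names sharing a first letter), and that blocking choice is also an assumption.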

