Using SQLite and dplyr for fuzzy matching in R

May 15, 2011

I am working with an extremely large dataset of names, using RSQLite to run it in R. I have taken the original file (14 million rows) and grouped identical names using group_by from dplyr. Separate columns record how many times each value has been grouped and what the mean score is. This is a single file; I am not matching against a master file. This works well and reduces the file to 5 million rows based on identical string matching.

Is there a way to group names through a fuzzy match, while keeping the count and mean variables?

Here is the code I'm using:

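    library(dplyr)

    # Keep only the columns needed for grouping
    employees <- company %>% select(name, score)
    head(employees)

    # Collapse identical names, recording how many rows were merged and their mean score
    employeescores <- employees %>%
      group_by(name) %>%
      summarize(count = n(), score = mean(score))

    write.csv(employeescores, file = "employeescores.csv", row.names = FALSE)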
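One possible way to extend the exact-match grouping to a fuzzy one is sketched below. It is an illustration, not a tested solution for the 14-million-row file. It assumes the fuzzy step runs on the already-summarized `employeescores` table, replaced here by a small made-up tibble. The helper `fuzzy_cluster`, the first-letter blocking key, and the edit-distance threshold of 2 are all hypothetical choices. Names are compared with base R's `adist` (Levenshtein distance) and clustered with single-linkage `hclust`. The mean is recomputed with `weighted.mean` so that it still reflects the original rows.

```r
library(dplyr)

# Toy stand-in for the exact-match summary (made-up values, for illustration only).
employeescores <- tibble(
  name  = c("John Smith", "Jon Smith", "John Smyth",
            "Mary Jones", "Marie Jones", "Zed Quill"),
  count = c(10L, 2L, 1L, 5L, 5L, 3L),
  score = c(4, 3, 5, 2, 4, 1)
)

# Hypothetical helper: returns one cluster id per element of x.
# Names linked by a chain of edit distances <= max_dist share an id.
fuzzy_cluster <- function(x, max_dist = 2) {
  if (length(x) < 2) return(rep(1L, length(x)))  # hclust needs at least 2 items
  d <- as.dist(adist(x))                         # pairwise Levenshtein distances
  as.integer(cutree(hclust(d, method = "single"), h = max_dist))
}

fuzzy_scores <- employeescores %>%
  mutate(clean = tolower(trimws(name)),
         block = substr(clean, 1, 1)) %>%   # assumed blocking key: first letter
  group_by(block) %>%
  mutate(cluster = fuzzy_cluster(clean)) %>%
  group_by(block, cluster) %>%
  summarize(
    name  = name[which.max(count)],         # label cluster by most frequent spelling
    score = weighted.mean(score, count),    # mean over the underlying original rows
    count = sum(count),                     # total rows merged into this cluster
    .groups = "drop"
  )

print(fuzzy_scores)
```

The blocking step matters because `adist` builds a full pairwise distance matrix, so memory grows with the square of the block size. With 5 million distinct names, a first-letter block would almost certainly be too coarse, and a finer key (for example a longer prefix or a phonetic code) would probably be needed. Single linkage can also chain dissimilar names together through intermediates, so the threshold is worth checking against a sample of real names.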
