sql - Querying database to find potential duplicates based on multiple columns -

July 15, 2015

i have table in database have 100k clients , erroneous system in place quite time coming find out might potentially have thousands of duplicate records, should not have been duplicates.

this basic structure of table...

clientid         fname           lname           dob          sysemid ---------------------------------------------------------------------- 123              janette         marukan         1/2/1990     111111s 478              jannete         maruckan        1/2/1990     111111s

these not fields in system goal able use columns above in search query. i'm trying take first 2 or 3 letters of first , last names, dob sysemid , find records potential duplicates.

as can seen above in table have 2 clients, names different - in fact 2 clients both same, saved in system two. there way me somehow query whole tables , find that?

i don't have 1 specific client in place - it's more of general kind of query show me potential duplicates of clients same 2 or 3 letters of 1st name, last name , dob starters?

you use window functions:

select t (select t.*,              count(*) on (partition left(fname, 2), left(lname, 2), dob, sysemid) cnt       t      ) t cnt > 1 order sysemid, dob, fname, lname;

this may 1 of rare situations soundex() useful. after all, designed strings in mind. so:

select t (select t.*,              count(*) on (partition soundex(fname), soundex(lname), dob, sysemid) cnt       t      ) t cnt > 1 order sysemid, dob, fname, lname;

Search This Blog

RT

sql - Querying database to find potential duplicates based on multiple columns -

Comments

Post a Comment

Popular posts from this blog

Ansible warning on jinja2 braces on when -

Parsing a protocol message from Go by Java -

node.js - Node js - Trying to send POST request, but it is not loading javascript content -