sql - Querying database to find potential duplicates based on multiple columns -
i have table in database have 100k clients , erroneous system in place quite time coming find out might potentially have thousands of duplicate records, should not have been duplicates.
this basic structure of table...
clientid fname lname dob sysemid ---------------------------------------------------------------------- 123 janette marukan 1/2/1990 111111s 478 jannete maruckan 1/2/1990 111111s
these not fields in system goal able use columns above in search query. i'm trying take first 2 or 3 letters of first , last names, dob sysemid , find records potential duplicates.
as can seen above in table have 2 clients, names different - in fact 2 clients both same, saved in system two. there way me somehow query whole tables , find that?
i don't have 1 specific client in place - it's more of general kind of query show me potential duplicates of clients same 2 or 3 letters of 1st name, last name , dob starters?
you use window functions:
select t (select t.*, count(*) on (partition left(fname, 2), left(lname, 2), dob, sysemid) cnt t ) t cnt > 1 order sysemid, dob, fname, lname;
this may 1 of rare situations soundex()
useful. after all, designed strings in mind. so:
select t (select t.*, count(*) on (partition soundex(fname), soundex(lname), dob, sysemid) cnt t ) t cnt > 1 order sysemid, dob, fname, lname;
Comments
Post a Comment