sql server - Selecting records for deletions based on relationship with previous and next records -
i have sql server 2014 table millions of gps coordinates, each @ particular time. interval between registrations not fixed , varies 1 second couple of hours. want keep 1 measurement every 4 minutes, other records have deleted.
i tried while loop in t-sql traverses every record, inside loop select statement double cross apply return record if sits in beween 2 other records not more 4 minutes apart. strategy turns out slow.
can done set-based solution ? or there way speed-up query ? (the test query below printing, not yet deleting)
select * #mytemp gps order timepoint asc declare @id uniqueidentifier declare @d1 varchar(19) declare @d2 varchar(19) declare @d3 varchar(19) while exists (select * #mytemp ) begin select top 1 @id = id #mytemp order timepoint asc select @d1 = convert(varchar(19), a.justbefore, 121), @d2 = convert(varchar(19), b.tijdstip, 121), @d3 = convert(varchar(19), c.justafter, 121) gps b cross apply ( select top 1 timepoint justbefore gps (b.timepoint > timepoint ) , (b.id = @id ) order timepoint desc ) cross apply ( select top 1 timepoint justafter gps (datediff(n,a.justbefore,timepoint ) between -4 , 0) , (b.timepoint < timepoint ) order timepoint asc ) c print 'id=' + cast(@id varchar(50)) + ' / d1=' + @d1 + ' / d2=' + @d2 + ' / d3=' + @d3 delete #mytemp id = @id end
--
sample data: id timepoint lat lon 1 20170725 13:05:27 12,256 24,123 2 20170725 13:10:27 12,254 24,120 3 20170725 13:10:29 12,253 24,125 4 20170725 13:11:55 12,259 24,127 5 20170725 13:11:59 12,255 24,123 6 20170725 13:14:28 12,254 24,126 7 20170725 13:16:52 12,259 24,121 8 20170725 13:20:53 12,257 24,125
in case records 3,4,5 should deleted. record 7 should stay gap between 7 , 8 longer 4 minutes.
looking @ numbers... looks 1 & 2 stay (5 mins apart)...3, 4, & 5 should go... 6 stays (4 mins 2)... 7 should go (only 2 mins 6) , 8 stays (6 mins 6)...
if correct, following you're looking for... if object_id('tempdb..#testdata', 'u') not null drop table #testdata; create table #testdata ( id int not null primary key clustered, timepoint datetime2(0) not null, lat decimal(9,3), lon decimal(9,3) ); insert #testdata (id, timepoint, lat, lon) values (1, '20170725 13:05:27', 12.256, 24.123), (2, '20170725 13:10:27', 12.254, 24.120), (3, '20170725 13:10:29', 12.253, 24.125), (4, '20170725 13:11:55', 12.259, 24.127), (5, '20170725 13:11:59', 12.255, 24.123), (6, '20170725 13:14:28', 12.254, 24.126), (7, '20170725 13:16:52', 12.259, 24.121), (8, '20170725 13:20:53', 12.257, 24.125); -- select * #testdata td; --================================================================================ cte_addlag ( select td.id, td.timepoint, td.lat, td.lon, minfromprev = datediff(mi, lag(td.timepoint, 1) on (order td.timepoint), td.timepoint) #testdata td ), cte_timegroup ( select *, timegroup = isnull(sum(al.minfromprev) on (order al.timepoint rows unbounded preceding) / 4, 0) cte_addlag al ) select top 1 ties tg.id, tg.timepoint, tg.lat, tg.lon cte_timegroup tg order row_number() on (partition tg.timegroup order tg.timepoint);
results...
id timepoint lat lon ----------- --------------------------- --------------------------------------- --------------------------------------- 1 2017-07-25 13:05:27 12.256 24.123 2 2017-07-25 13:10:27 12.254 24.120 6 2017-07-25 13:14:28 12.254 24.126 8 2017-07-25 13:20:53 12.257 24.125
hth, jason
Comments
Post a Comment