SQL Server - Standalone R and in-database R (SQL Server) give different results
I am working on a forecasting model for monthly data which I intend to use in SQL Server 2016 (in-database).
I created a simple TBATS model for testing:
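    library(forecast)  # provides msts, tsclean, boxcox.lambda, tbats, forecast

    # Build a multi-seasonal time series from the value column (year, month, value)
    dataset <- msts(data = dataset[, 3],
                    start = c(as.numeric(dataset[1, 1]), as.numeric(dataset[1, 2])),
                    seasonal.periods = c(1, 12))
    # Replace outliers and missing values, using an estimated Box-Cox lambda
    dataset <- tsclean(dataset, replace.missing = TRUE,
                       lambda = boxcox.lambda(dataset, method = "loglik", lower = -2, upper = 1))
    # Fit the TBATS model
    dataset <- tbats(dataset, use.arma.errors = TRUE, use.parallel = TRUE, num.cores = NULL)
    # Forecast 24 months ahead with 80% and 95% intervals
    dataset <- forecast(dataset, level = c(80, 95), h = 24)
    dataset <- as.data.frame(dataset)

The dataset is imported from a .csv file created by an SQL query.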
Later, I used the same code in SQL Server, with the input being the same query I used for the .csv file (meaning the data is the same as well).
However, when I executed the script, I noticed that I got different results. The numbers are fine and make perfect sense, and both SQL Server and standalone R give me a forecast table, but the numbers in the two tables differ by a few percent (about 3% on average).
Is there an explanation for this? It bothers me because I need the best possible results.
Edit: This is how the data looks, for easier understanding. It's a three-column table: year, month, value of transactions (the numbers are randomised because the data is classified). Altogether I have data for 9 years.
    2008  11  1093747561919.38
    2008  12   816860005030.31
    2009   1   341394536377.06
    2009   2   669993867646.25
    2009   3   717585597605.75
    2009   4   627553319006.03
    2009   5   984146176491.78
    2009   6   605488762214.33
    2009   7   355366795222.40
    2009   8   549252969698.07
    2009   9   598237364101.23

This is an example of the results. The top 2 rows are from SQL Server, the bottom 2 rows are from RStudio.

    t  point          lo80         hi80
    1  872379.7412    557105.271   1187654.211
    2  1093817.266    778527.1078  1409107.424
    1  806050.6884    517606.464   1094494.913
    2  1031845.483    743387.015   1320303.95

Edit 2: I checked each part of the code and figured out that the difference in results arises at the TBATS model.
SQL Server returns: TBATS(0.684, {0,0}, -, {<12,5>})
RStudio returns: TBATS(0.463, {0,0}, -, {<12,5>})
This explains the difference in the forecast values, but the question remains, since these should be the same.
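To make the comparison of the two model descriptors concrete, here is a minimal sketch in base R that splits each string into its parts and reports which part differs. The helper `parse_tbats` and the component labels are hypothetical names introduced for illustration; the labels assume the usual reading of the forecast package's notation, where the first value is the Box-Cox parameter, followed by the ARMA orders, the damping parameter, and the seasonal period with its number of Fourier terms.

```r
# Model descriptors copied from the two environments described above.
sql_server <- "TBATS(0.684, {0,0}, -, {<12,5>})"
rstudio    <- "TBATS(0.463, {0,0}, -, {<12,5>})"

# Hypothetical helper: split a TBATS descriptor into its four components.
# The component names are an assumption about the notation, not package output.
parse_tbats <- function(descriptor) {
  pattern <- "^TBATS\\(([^,]+), \\{([^}]*)\\}, ([^,]+), \\{(.*)\\}\\)$"
  parts <- regmatches(descriptor, regexec(pattern, descriptor))[[1]]
  stopifnot(length(parts) == 5)  # full match plus four capture groups
  c(box_cox = parts[2], arma_orders = parts[3],
    damping = parts[4], seasonal = parts[5])
}

a <- parse_tbats(sql_server)
b <- parse_tbats(rstudio)

# Names of the components whose values are not identical.
differing <- names(a)[a != b]
print(differing)
```

If that reading of the notation is right, the two runs agree on the model structure and disagree only on the estimated Box-Cox parameter, which would be enough to shift every forecast value.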
I'll answer this myself for anyone having the same problem in the future:
It seems there is a difference in how the R engine executes depending on the OS and the runtime. I tested by running standalone R on my PC and on the server, using RStudio and Microsoft R Open, and by running R in-database on my PC and on the server. I also tested different runtimes.
If anyone wants to test this themselves, the R runtime can be changed under Tools - Global Options - General - R version (in RStudio).
All tests returned different results. This does not mean the results are wrong (in my case at least, since I'm forecasting real business data and the results have wide intervals anyway).
This may not be an actual solution, but I hope it can prevent someone from panicking for a week like I did.
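When comparing runs like these, it can help to record what each environment actually is before looking at the numbers. The following is a minimal sketch, not a definitive diagnostic: the function name `environment_fingerprint` is hypothetical, and the choice of fields is an assumption about what is likely to differ between a standalone R session and an in-database one.

```r
# Hypothetical helper: collect details that may differ between R installations.
environment_fingerprint <- function(packages = c("forecast")) {
  versions <- vapply(packages, function(p) {
    # system.file() returns "" when the package is not installed
    if (nzchar(system.file(package = p))) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))

  list(
    r_version = R.version.string,        # e.g. which R runtime is in use
    platform  = R.version$platform,      # architecture and OS triple
    os        = Sys.info()[["sysname"]], # operating system name
    cores     = parallel::detectCores(), # relevant when use.parallel = TRUE
    packages  = versions                 # installed package versions, NA if missing
  )
}

fingerprint <- environment_fingerprint()
str(fingerprint)
```

Running this once in each environment and comparing the output shows whether the R version, platform, or forecast package version differs between them.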