r - Interpreting ACF and PACF plots for SARIMA model
I'm new to time series and used the monthly ozone concentration data from Rob Hyndman's website for forecasting.
After doing a log transformation and differencing at lags 1 and 12 to get rid of the trend and seasonality respectively, I plotted the ACF and PACF shown [in the image][2]. Am I on the right track, and how would I interpret this as a SARIMA model?
There also seems to be a pattern every 11 lags in the PACF plot, which makes me think I should do more differencing (at 11 lags), but doing so gives me a worse plot. I'd really appreciate any help!
EDIT: I got rid of the differencing at lag 1 and used only lag 12 instead, and this is what I got for the ACF and PACF.
From there, I deduced that SARIMA(1,0,1)x(1,1,1) (AIC: 520.098) or SARIMA(1,0,1)x(2,1,1) (AIC: 521.250) would be a good fit, but auto.arima gave me (3,1,1)x(2,0,0) (AIC: 560.7) by default, and (1,1,1)x(2,0,0) (AIC: 558.09) without stepwise and approximation.
I am confused about which model to use. Based on the lowest AIC, is SARIMA(1,0,1)x(1,1,1) the best? Also, one thing that concerns me is that none of the models pass the Ljung-Box test. Is there any way I can fix this?
It is quite difficult to manually select a model order that will perform well at forecasting a dataset. This is why Rob has built the 'auto.arima' function in his R forecast package: to figure out which model may perform best based on certain metrics.
When you see a PACF plot with significantly negative spikes, that usually means you have over-differenced your data. Try removing the 1st order difference and keeping the 12th order difference. Then carry on making your best guess.
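A minimal sketch of that comparison, using only base R. The ozone data is not included in this post, so the built-in monthly `AirPassengers` series is used as a stand-in (an assumption for illustration only); the variable names are hypothetical.

```r
# Stand-in data: AirPassengers is a monthly ts shipped with base R.
# Replace it with your own ts object (frequency = 12).
y <- log(AirPassengers)

d12    <- diff(y, lag = 12)    # seasonal (lag-12) difference only
d1_d12 <- diff(d12, lag = 1)   # seasonal difference plus first difference

# Plot the correlograms of both versions for a side-by-side look.
op <- par(mfrow = c(2, 2))
acf(d12,     lag.max = 48, main = "ACF: lag-12 difference")
pacf(d12,    lag.max = 48, main = "PACF: lag-12 difference")
acf(d1_d12,  lag.max = 48, main = "ACF: lag-1 and lag-12 differences")
pacf(d1_d12, lag.max = 48, main = "PACF: lag-1 and lag-12 differences")
par(op)

# Lag-1 autocorrelation of the doubly differenced series.
# Element 1 of $acf is lag 0, so element 2 is lag 1.
r1 <- acf(d1_d12, plot = FALSE)$acf[2]
r1
```

A strongly negative `r1` (a common rule of thumb is around -0.5 or below) is one hint that the extra difference was not needed; it is a heuristic, not a test.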
I'd recommend trying his auto.arima function and passing it a time series object with frequency = 12. He has a good writeup of seasonal ARIMA models here:
https://www.otexts.org/fpp/8/9
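As a rough sketch of that suggestion: the snippet below builds a monthly `ts` object and hands it to `forecast::auto.arima`. The data is again the built-in `AirPassengers` series as a stand-in, and `run_full_search` is a hypothetical flag added here only because the exhaustive search is slower. The call is guarded so the snippet still runs if the forecast package is not installed.

```r
# Stand-in data; the start date matches AirPassengers, not the ozone series.
x <- as.numeric(AirPassengers)
y <- ts(log(x), start = c(1949, 1), frequency = 12)  # monthly series

run_full_search <- FALSE  # hypothetical flag: set TRUE for the slower search

if (requireNamespace("forecast", quietly = TRUE)) {
  # Default: stepwise search with approximations (fast).
  fit_fast <- forecast::auto.arima(y)
  print(fit_fast)

  if (run_full_search) {
    # Searches more models and uses exact likelihoods (slower).
    fit_full <- forecast::auto.arima(y, stepwise = FALSE,
                                     approximation = FALSE)
    print(fit_full)
  }
} else {
  message("Package 'forecast' is not installed; skipping auto.arima().")
}
```

One caveat when reading the output: AIC values are only comparable between models that use the same orders of differencing, so the AIC of a model with d = 1, D = 0 should not be ranked against one with d = 0, D = 1.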
If you would like more insight into manually selecting a SARIMA model order, this is a good read:
https://onlinecourses.science.psu.edu/stat510/node/67
In response to your edit: I think it would be beneficial to this post if you clarified your objective. Which of the following are you trying to achieve?
1. Find a model whose residuals satisfy the Ljung-Box test.
2. Produce the most accurate out-of-sample forecast.
3. Manually select the lag orders such that the ACF and PACF plots show no significant lags remaining.
In my opinion, #2 is the most sought-after objective, so I'll assume that is your goal. In my experience, #3 often produces poor results out of sample. In regards to #1, I am usually not concerned about correlations remaining in the residuals. We know we do not have the true model for this time series, so I do not feel there's any reason to expect that an approximate model that performs well out of sample will not have left something behind in the residuals, perhaps something more complex, or nonlinear, etc.
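For reference, objective #1 can be checked with base R along these lines. This is only a sketch: the data is the built-in `AirPassengers` stand-in, the model order is taken from the SARIMA [0,1,1] [1,1,1]12 result in this answer, and the choice of 24 lags is an assumption.

```r
y <- log(AirPassengers)  # stand-in for the log of the ozone series

# Seasonal ARIMA(0,1,1)(1,1,1)[12] fitted with stats::arima.
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(1, 1, 1), period = 12))

# Ljung-Box test on the residuals. fitdf removes one degree of
# freedom per estimated ARMA coefficient.
n_coef <- length(coef(fit))
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box",
               fitdf = n_coef)
print(lb)
```

A small p-value means autocorrelation remains in the residuals; as argued above, that by itself does not say the model forecasts badly.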
To provide you with another SARIMA result, I ran this data through some code I've developed and found that the following equation produced the minimal error on a cross-validation period.
The final model is: SARIMA [0,1,1] [1,1,1]12 with a constant, using the log of the time series. The errors in the cross-validation period are: MAPE = 16%, MAE = 0.46, RSQR = 74%.
Here is the partial autocorrelation plot of the residuals for your information.
This is similar in methodology to selecting an equation based on AICc, to my understanding, but is ultimately a different approach. Regardless, if your objective is out-of-sample accuracy, I'd recommend evaluating equations in terms of their out-of-sample accuracy rather than in-sample fit, tests, or plots.
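A minimal sketch of an out-of-sample check in base R. It is not the code used for the numbers above: it uses a single hold-out of the last 24 months rather than any particular cross-validation scheme, the built-in `AirPassengers` series as stand-in data, and it omits the constant, because `stats::arima` does not fit one when differencing is applied.

```r
y <- AirPassengers       # stand-in monthly series on its original scale
h <- 24                  # assumed hold-out length: the last 24 months
n <- length(y)

# Split into a training ts and a numeric test vector.
train <- ts(y[1:(n - h)], start = start(y), frequency = frequency(y))
test  <- as.numeric(y[(n - h + 1):n])

# Fit SARIMA(0,1,1)(1,1,1)[12] on the log of the training data only.
fit <- arima(log(train), order = c(0, 1, 1),
             seasonal = list(order = c(1, 1, 1), period = 12))

# Forecast h steps ahead and back-transform to the original scale.
pred <- as.numeric(exp(predict(fit, n.ahead = h)$pred))

# Out-of-sample error measures.
err  <- test - pred
mae  <- mean(abs(err))
mape <- 100 * mean(abs(err / test))
c(MAE = mae, MAPE = mape)
```

Running the same split for each candidate order and ranking the candidates by `mae` or `mape` is one simple way to compare them on out-of-sample accuracy.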