ARIMA Forecasting Model Reference
This notebook is a practical reference for ARIMA-style forecasting.
It shows:
- how to create or load a small univariate time series
- what ARIMA order parts mean
- how to search candidate orders by AIC
- how to forecast one step or several steps ahead
- how to run rolling predictions without future leakage
- how to save outputs
Install packages
In [ ]:
pip install pandas numpy statsmodels matplotlib scikit-learn
Create folders
In [1]:
from pathlib import Path
data_dir = Path("data")
output_dir = Path("outputs")
data_dir.mkdir(parents=True, exist_ok=True)
output_dir.mkdir(parents=True, exist_ok=True)
# print(data_dir.resolve())
# print(output_dir.resolve())
Load a small time series
In [2]:
import numpy as np
import pandas as pd
try:
import statsmodels.api as sm
raw = sm.datasets.sunspots.load_pandas().data.copy()
raw["YEAR"] = raw["YEAR"].astype(int)
series_df = pd.DataFrame({
"date": pd.to_datetime(raw["YEAR"].astype(str) + "-01-01"),
"value": raw["SUNACTIVITY"].astype(float)
})
except Exception:
rng = np.random.default_rng(42)
n = 180
date_index = pd.date_range("1900-01-01", periods=n, freq="YS")
value = 50 + 20 * np.sin(np.arange(n) / 7) + rng.normal(0, 5, n)
series_df = pd.DataFrame({
"date": date_index,
"value": value
})
series_df = series_df.sort_values("date").reset_index(drop=True)
series_df.to_csv(data_dir / "arima_reference_series.csv", index=False)
print(series_df.shape)
series_df.head()
(309, 2)
Out[2]:
| date | value | |
|---|---|---|
| 0 | 1700-01-01 | 5.0 |
| 1 | 1701-01-01 | 11.0 |
| 2 | 1702-01-01 | 16.0 |
| 3 | 1703-01-01 | 23.0 |
| 4 | 1704-01-01 | 36.0 |
Create a pandas Series
In [3]:
series = series_df.set_index("date")["value"].astype(float)
series.tail()
Out[3]:
date 2004-01-01 40.4 2005-01-01 29.8 2006-01-01 15.2 2007-01-01 7.5 2008-01-01 2.9 Name: value, dtype: float64
Plot the series
In [4]:
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 4))
plt.plot(series.index, series.values)
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Reference Time Series")
plt.tight_layout()
plt.show()
ARIMA order parts
| Part | Meaning |
|---|---|
p |
Number of autoregressive past value terms. |
d |
Number of differencing operations. |
q |
Number of moving-average past error terms. |
AIC |
Model selection score balancing fit and complexity. |
Split train and validation data
In [5]:
split_idx = int(len(series) * 0.80)
train_series = series.iloc[:split_idx]
valid_series = series.iloc[split_idx:]
print(train_series.index.min(), train_series.index.max())
print(valid_series.index.min(), valid_series.index.max())
1700-01-01 00:00:00 1946-01-01 00:00:00 1947-01-01 00:00:00 2008-01-01 00:00:00
Select ARIMA order by AIC
In [6]:
import warnings
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.filterwarnings("ignore", category=UserWarning, module="statsmodels")
warnings.filterwarnings("ignore", category=ConvergenceWarning)
p_grid = [0, 1, 2, 3]
d_grid = [0, 1]
q_grid = [0, 1, 2, 3]
def select_arima_order(values):
y = np.asarray(values, dtype=float)
y = y[np.isfinite(y)]
best_order = None
best_aic = np.inf
rows = []
for p in p_grid:
for d in d_grid:
for q in q_grid:
if p == 0 and d == 0 and q == 0:
continue
order = (p, d, q)
try:
fit = ARIMA(
y,
order=order,
enforce_stationarity=False,
enforce_invertibility=False
).fit()
aic = float(fit.aic) if np.isfinite(fit.aic) else np.inf
except Exception:
aic = np.inf
rows.append({
"p": p,
"d": d,
"q": q,
"aic": aic
})
if aic < best_aic:
best_aic = aic
best_order = order
return best_order, best_aic, pd.DataFrame(rows)
best_order, best_aic, order_search = select_arima_order(train_series.values)
print(best_order)
print(best_aic)
order_search.sort_values("aic").head()
(3, 0, 3) 1988.401253640831
Out[6]:
| p | d | q | aic | |
|---|---|---|---|---|
| 26 | 3 | 0 | 3 | 1988.401254 |
| 30 | 3 | 1 | 3 | 1994.446241 |
| 21 | 2 | 1 | 2 | 1997.513314 |
| 22 | 2 | 1 | 3 | 2006.143511 |
| 28 | 3 | 1 | 1 | 2007.765344 |
Fit the selected model and forecast validation range
In [7]:
fit = ARIMA(
train_series.values,
order=best_order,
enforce_stationarity=False,
enforce_invertibility=False
).fit()
forecast_values = fit.forecast(steps=len(valid_series))
forecast_df = pd.DataFrame({
"date": valid_series.index,
"actual": valid_series.values,
"forecast": forecast_values
})
forecast_df.head()
Out[7]:
| date | actual | forecast | |
|---|---|---|---|
| 0 | 1947-01-01 | 151.6 | 118.908909 |
| 1 | 1948-01-01 | 136.3 | 120.537470 |
| 2 | 1949-01-01 | 134.7 | 105.618410 |
| 3 | 1950-01-01 | 83.9 | 79.149295 |
| 4 | 1951-01-01 | 69.4 | 49.478250 |
Evaluate the forecast
In [8]:
from sklearn.metrics import mean_absolute_error, mean_squared_error
metrics = pd.DataFrame([
{
"model": "ARIMA",
"order": str(best_order),
"mae": mean_absolute_error(forecast_df["actual"], forecast_df["forecast"]),
"rmse": np.sqrt(mean_squared_error(forecast_df["actual"], forecast_df["forecast"]))
}
])
metrics
Out[8]:
| model | order | mae | rmse | |
|---|---|---|---|---|
| 0 | ARIMA | (3, 0, 3) | 33.93463 | 46.552381 |
Rolling ARIMA predictions
In [9]:
def rolling_arima_predictions(series, start_position, horizon, order):
rows = []
for pos in range(start_position, len(series) - horizon):
train_to_date = series.iloc[:pos + 1]
decision_date = series.index[pos]
target_date = series.index[pos + horizon]
actual = series.iloc[pos + horizon]
try:
model = ARIMA(
train_to_date.values,
order=order,
enforce_stationarity=False,
enforce_invertibility=False
).fit()
forecast_values = model.forecast(steps=horizon)
predicted = float(forecast_values[-1])
except Exception:
predicted = np.nan
rows.append({
"decision_date": decision_date,
"target_date": target_date,
"horizon": horizon,
"actual": actual,
"predicted": predicted
})
return pd.DataFrame(rows)
rolling_df = rolling_arima_predictions(
series,
start_position=split_idx,
horizon=1,
order=best_order
)
rolling_df.head()
Out[9]:
| decision_date | target_date | horizon | actual | predicted | |
|---|---|---|---|---|---|
| 0 | 1947-01-01 | 1948-01-01 | 1 | 136.3 | 160.871123 |
| 1 | 1948-01-01 | 1949-01-01 | 1 | 134.7 | 108.012351 |
| 2 | 1949-01-01 | 1950-01-01 | 1 | 83.9 | 106.334482 |
| 3 | 1950-01-01 | 1951-01-01 | 1 | 69.4 | 39.007939 |
| 4 | 1951-01-01 | 1952-01-01 | 1 | 31.5 | 45.923533 |
Evaluate rolling predictions
In [10]:
rolling_clean = rolling_df.dropna(subset=["predicted"]).copy()
rolling_metrics = pd.DataFrame([
{
"model": "Rolling ARIMA",
"order": str(best_order),
"horizon": 1,
"mae": mean_absolute_error(rolling_clean["actual"], rolling_clean["predicted"]),
"rmse": np.sqrt(mean_squared_error(rolling_clean["actual"], rolling_clean["predicted"]))
}
])
rolling_metrics
Out[10]:
| model | order | horizon | mae | rmse | |
|---|---|---|---|---|---|
| 0 | Rolling ARIMA | (3, 0, 3) | 1 | 15.765002 | 20.96658 |
Plot rolling predictions
In [11]:
plt.figure(figsize=(8, 4))
plt.plot(rolling_clean["target_date"], rolling_clean["actual"], label="Actual")
plt.plot(rolling_clean["target_date"], rolling_clean["predicted"], label="Predicted")
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Rolling ARIMA Predictions")
plt.legend()
plt.tight_layout()
plt.show()