Ctrl K

ARIMA Forecasting Model Reference

This notebook is a practical reference for ARIMA-style forecasting.

It shows:

  • how to create or load a small univariate time series
  • what ARIMA order parts mean
  • how to search candidate orders by AIC
  • how to forecast one step or several steps ahead
  • how to run rolling predictions without future leakage
  • how to save outputs

Install packages

In [ ]:
pip install pandas numpy statsmodels matplotlib scikit-learn

Create folders

In [1]:
from pathlib import Path

data_dir = Path("data")
output_dir = Path("outputs")

data_dir.mkdir(parents=True, exist_ok=True)
output_dir.mkdir(parents=True, exist_ok=True)

# print(data_dir.resolve())
# print(output_dir.resolve())

Load a small time series

In [2]:
import numpy as np
import pandas as pd

try:
    import statsmodels.api as sm

    raw = sm.datasets.sunspots.load_pandas().data.copy()
    raw["YEAR"] = raw["YEAR"].astype(int)
    series_df = pd.DataFrame({
        "date": pd.to_datetime(raw["YEAR"].astype(str) + "-01-01"),
        "value": raw["SUNACTIVITY"].astype(float)
    })
except Exception:
    rng = np.random.default_rng(42)
    n = 180
    date_index = pd.date_range("1900-01-01", periods=n, freq="YS")
    value = 50 + 20 * np.sin(np.arange(n) / 7) + rng.normal(0, 5, n)
    series_df = pd.DataFrame({
        "date": date_index,
        "value": value
    })

series_df = series_df.sort_values("date").reset_index(drop=True)
series_df.to_csv(data_dir / "arima_reference_series.csv", index=False)

print(series_df.shape)
series_df.head()
(309, 2)
Out[2]:
date value
0 1700-01-01 5.0
1 1701-01-01 11.0
2 1702-01-01 16.0
3 1703-01-01 23.0
4 1704-01-01 36.0

Create a pandas Series

In [3]:
series = series_df.set_index("date")["value"].astype(float)

series.tail()
Out[3]:
date
2004-01-01    40.4
2005-01-01    29.8
2006-01-01    15.2
2007-01-01     7.5
2008-01-01     2.9
Name: value, dtype: float64

Plot the series

In [4]:
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 4))
plt.plot(series.index, series.values)
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Reference Time Series")
plt.tight_layout()
plt.show()
No description has been provided for this image

ARIMA order parts

Part Meaning
p Number of autoregressive past value terms.
d Number of differencing operations.
q Number of moving-average past error terms.
AIC Model selection score balancing fit and complexity.

Split train and validation data

In [5]:
split_idx = int(len(series) * 0.80)

train_series = series.iloc[:split_idx]
valid_series = series.iloc[split_idx:]

print(train_series.index.min(), train_series.index.max())
print(valid_series.index.min(), valid_series.index.max())
1700-01-01 00:00:00 1946-01-01 00:00:00
1947-01-01 00:00:00 2008-01-01 00:00:00

Select ARIMA order by AIC

In [6]:
import warnings

from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tools.sm_exceptions import ConvergenceWarning

warnings.filterwarnings("ignore", category=UserWarning, module="statsmodels")
warnings.filterwarnings("ignore", category=ConvergenceWarning)

p_grid = [0, 1, 2, 3]
d_grid = [0, 1]
q_grid = [0, 1, 2, 3]

def select_arima_order(values):
    y = np.asarray(values, dtype=float)
    y = y[np.isfinite(y)]

    best_order = None
    best_aic = np.inf
    rows = []

    for p in p_grid:
        for d in d_grid:
            for q in q_grid:
                if p == 0 and d == 0 and q == 0:
                    continue

                order = (p, d, q)

                try:
                    fit = ARIMA(
                        y,
                        order=order,
                        enforce_stationarity=False,
                        enforce_invertibility=False
                    ).fit()
                    aic = float(fit.aic) if np.isfinite(fit.aic) else np.inf
                except Exception:
                    aic = np.inf

                rows.append({
                    "p": p,
                    "d": d,
                    "q": q,
                    "aic": aic
                })

                if aic < best_aic:
                    best_aic = aic
                    best_order = order

    return best_order, best_aic, pd.DataFrame(rows)

best_order, best_aic, order_search = select_arima_order(train_series.values)

print(best_order)
print(best_aic)

order_search.sort_values("aic").head()
(3, 0, 3)
1988.401253640831
Out[6]:
p d q aic
26 3 0 3 1988.401254
30 3 1 3 1994.446241
21 2 1 2 1997.513314
22 2 1 3 2006.143511
28 3 1 1 2007.765344

Fit the selected model and forecast validation range

In [7]:
fit = ARIMA(
    train_series.values,
    order=best_order,
    enforce_stationarity=False,
    enforce_invertibility=False
).fit()

forecast_values = fit.forecast(steps=len(valid_series))

forecast_df = pd.DataFrame({
    "date": valid_series.index,
    "actual": valid_series.values,
    "forecast": forecast_values
})

forecast_df.head()
Out[7]:
date actual forecast
0 1947-01-01 151.6 118.908909
1 1948-01-01 136.3 120.537470
2 1949-01-01 134.7 105.618410
3 1950-01-01 83.9 79.149295
4 1951-01-01 69.4 49.478250

Evaluate the forecast

In [8]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

metrics = pd.DataFrame([
    {
        "model": "ARIMA",
        "order": str(best_order),
        "mae": mean_absolute_error(forecast_df["actual"], forecast_df["forecast"]),
        "rmse": np.sqrt(mean_squared_error(forecast_df["actual"], forecast_df["forecast"]))
    }
])

metrics
Out[8]:
model order mae rmse
0 ARIMA (3, 0, 3) 33.93463 46.552381

Rolling ARIMA predictions

In [9]:
def rolling_arima_predictions(series, start_position, horizon, order):
    rows = []

    for pos in range(start_position, len(series) - horizon):
        train_to_date = series.iloc[:pos + 1]
        decision_date = series.index[pos]
        target_date = series.index[pos + horizon]
        actual = series.iloc[pos + horizon]

        try:
            model = ARIMA(
                train_to_date.values,
                order=order,
                enforce_stationarity=False,
                enforce_invertibility=False
            ).fit()
            forecast_values = model.forecast(steps=horizon)
            predicted = float(forecast_values[-1])
        except Exception:
            predicted = np.nan

        rows.append({
            "decision_date": decision_date,
            "target_date": target_date,
            "horizon": horizon,
            "actual": actual,
            "predicted": predicted
        })

    return pd.DataFrame(rows)

rolling_df = rolling_arima_predictions(
    series,
    start_position=split_idx,
    horizon=1,
    order=best_order
)

rolling_df.head()
Out[9]:
decision_date target_date horizon actual predicted
0 1947-01-01 1948-01-01 1 136.3 160.871123
1 1948-01-01 1949-01-01 1 134.7 108.012351
2 1949-01-01 1950-01-01 1 83.9 106.334482
3 1950-01-01 1951-01-01 1 69.4 39.007939
4 1951-01-01 1952-01-01 1 31.5 45.923533

Evaluate rolling predictions

In [10]:
rolling_clean = rolling_df.dropna(subset=["predicted"]).copy()

rolling_metrics = pd.DataFrame([
    {
        "model": "Rolling ARIMA",
        "order": str(best_order),
        "horizon": 1,
        "mae": mean_absolute_error(rolling_clean["actual"], rolling_clean["predicted"]),
        "rmse": np.sqrt(mean_squared_error(rolling_clean["actual"], rolling_clean["predicted"]))
    }
])

rolling_metrics
Out[10]:
model order horizon mae rmse
0 Rolling ARIMA (3, 0, 3) 1 15.765002 20.96658

Plot rolling predictions

In [11]:
plt.figure(figsize=(8, 4))
plt.plot(rolling_clean["target_date"], rolling_clean["actual"], label="Actual")
plt.plot(rolling_clean["target_date"], rolling_clean["predicted"], label="Predicted")
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Rolling ARIMA Predictions")
plt.legend()
plt.tight_layout()
plt.show()
No description has been provided for this image