ARIMA Forecasting Model Reference

This notebook is a practical reference for ARIMA-style forecasting.

It shows:

how to create or load a small univariate time series
what ARIMA order parts mean
how to search candidate orders by AIC
how to forecast one step or several steps ahead
how to run rolling predictions without future leakage
how to save outputs

Install packages

In [ ]:

pip install pandas numpy statsmodels matplotlib scikit-learn

Create folders

In [1]:

from pathlib import Path

data_dir = Path("data")
output_dir = Path("outputs")

data_dir.mkdir(parents=True, exist_ok=True)
output_dir.mkdir(parents=True, exist_ok=True)

# print(data_dir.resolve())
# print(output_dir.resolve())

Load a small time series

In [2]:

import numpy as np
import pandas as pd

try:
    import statsmodels.api as sm

    raw = sm.datasets.sunspots.load_pandas().data.copy()
    raw["YEAR"] = raw["YEAR"].astype(int)
    series_df = pd.DataFrame({
        "date": pd.to_datetime(raw["YEAR"].astype(str) + "-01-01"),
        "value": raw["SUNACTIVITY"].astype(float)
    })
except Exception:
    rng = np.random.default_rng(42)
    n = 180
    date_index = pd.date_range("1900-01-01", periods=n, freq="YS")
    value = 50 + 20 * np.sin(np.arange(n) / 7) + rng.normal(0, 5, n)
    series_df = pd.DataFrame({
        "date": date_index,
        "value": value
    })

series_df = series_df.sort_values("date").reset_index(drop=True)
series_df.to_csv(data_dir / "arima_reference_series.csv", index=False)

print(series_df.shape)
series_df.head()

(309, 2)

Out[2]:

	date	value
0	1700-01-01	5.0
1	1701-01-01	11.0
2	1702-01-01	16.0
3	1703-01-01	23.0
4	1704-01-01	36.0

Create a pandas Series

In [3]:

series = series_df.set_index("date")["value"].astype(float)

series.tail()

Out[3]:

date
2004-01-01    40.4
2005-01-01    29.8
2006-01-01    15.2
2007-01-01     7.5
2008-01-01     2.9
Name: value, dtype: float64

Plot the series

In [4]:

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 4))
plt.plot(series.index, series.values)
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Reference Time Series")
plt.tight_layout()
plt.show()

No description has been provided for this image

ARIMA order parts

Part	Meaning
`p`	Number of autoregressive past value terms.
`d`	Number of differencing operations.
`q`	Number of moving-average past error terms.
`AIC`	Model selection score balancing fit and complexity.

Split train and validation data

In [5]:

split_idx = int(len(series) * 0.80)

train_series = series.iloc[:split_idx]
valid_series = series.iloc[split_idx:]

print(train_series.index.min(), train_series.index.max())
print(valid_series.index.min(), valid_series.index.max())

1700-01-01 00:00:00 1946-01-01 00:00:00
1947-01-01 00:00:00 2008-01-01 00:00:00

Select ARIMA order by AIC

In [6]:

import warnings

from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tools.sm_exceptions import ConvergenceWarning

warnings.filterwarnings("ignore", category=UserWarning, module="statsmodels")
warnings.filterwarnings("ignore", category=ConvergenceWarning)

p_grid = [0, 1, 2, 3]
d_grid = [0, 1]
q_grid = [0, 1, 2, 3]

def select_arima_order(values):
    y = np.asarray(values, dtype=float)
    y = y[np.isfinite(y)]

    best_order = None
    best_aic = np.inf
    rows = []

    for p in p_grid:
        for d in d_grid:
            for q in q_grid:
                if p == 0 and d == 0 and q == 0:
                    continue

                order = (p, d, q)

                try:
                    fit = ARIMA(
                        y,
                        order=order,
                        enforce_stationarity=False,
                        enforce_invertibility=False
                    ).fit()
                    aic = float(fit.aic) if np.isfinite(fit.aic) else np.inf
                except Exception:
                    aic = np.inf

                rows.append({
                    "p": p,
                    "d": d,
                    "q": q,
                    "aic": aic
                })

                if aic < best_aic:
                    best_aic = aic
                    best_order = order

    return best_order, best_aic, pd.DataFrame(rows)

best_order, best_aic, order_search = select_arima_order(train_series.values)

print(best_order)
print(best_aic)

order_search.sort_values("aic").head()

(3, 0, 3)
1988.401253640831

Out[6]:

	p	d	q	aic
26	3	0	3	1988.401254
30	3	1	3	1994.446241
21	2	1	2	1997.513314
22	2	1	3	2006.143511
28	3	1	1	2007.765344

Fit the selected model and forecast validation range

In [7]:

fit = ARIMA(
    train_series.values,
    order=best_order,
    enforce_stationarity=False,
    enforce_invertibility=False
).fit()

forecast_values = fit.forecast(steps=len(valid_series))

forecast_df = pd.DataFrame({
    "date": valid_series.index,
    "actual": valid_series.values,
    "forecast": forecast_values
})

forecast_df.head()

Out[7]:

	date	actual	forecast
0	1947-01-01	151.6	118.908909
1	1948-01-01	136.3	120.537470
2	1949-01-01	134.7	105.618410
3	1950-01-01	83.9	79.149295
4	1951-01-01	69.4	49.478250

Evaluate the forecast

In [8]:

from sklearn.metrics import mean_absolute_error, mean_squared_error

metrics = pd.DataFrame([
    {
        "model": "ARIMA",
        "order": str(best_order),
        "mae": mean_absolute_error(forecast_df["actual"], forecast_df["forecast"]),
        "rmse": np.sqrt(mean_squared_error(forecast_df["actual"], forecast_df["forecast"]))
    }
])

metrics

Out[8]:

	model	order	mae	rmse
0	ARIMA	(3, 0, 3)	33.93463	46.552381

Rolling ARIMA predictions

In [9]:

def rolling_arima_predictions(series, start_position, horizon, order):
    rows = []

    for pos in range(start_position, len(series) - horizon):
        train_to_date = series.iloc[:pos + 1]
        decision_date = series.index[pos]
        target_date = series.index[pos + horizon]
        actual = series.iloc[pos + horizon]

        try:
            model = ARIMA(
                train_to_date.values,
                order=order,
                enforce_stationarity=False,
                enforce_invertibility=False
            ).fit()
            forecast_values = model.forecast(steps=horizon)
            predicted = float(forecast_values[-1])
        except Exception:
            predicted = np.nan

        rows.append({
            "decision_date": decision_date,
            "target_date": target_date,
            "horizon": horizon,
            "actual": actual,
            "predicted": predicted
        })

    return pd.DataFrame(rows)

rolling_df = rolling_arima_predictions(
    series,
    start_position=split_idx,
    horizon=1,
    order=best_order
)

rolling_df.head()

Out[9]:

	decision_date	target_date	horizon	actual	predicted
0	1947-01-01	1948-01-01	1	136.3	160.871123
1	1948-01-01	1949-01-01	1	134.7	108.012351
2	1949-01-01	1950-01-01	1	83.9	106.334482
3	1950-01-01	1951-01-01	1	69.4	39.007939
4	1951-01-01	1952-01-01	1	31.5	45.923533

Evaluate rolling predictions

In [10]:

rolling_clean = rolling_df.dropna(subset=["predicted"]).copy()

rolling_metrics = pd.DataFrame([
    {
        "model": "Rolling ARIMA",
        "order": str(best_order),
        "horizon": 1,
        "mae": mean_absolute_error(rolling_clean["actual"], rolling_clean["predicted"]),
        "rmse": np.sqrt(mean_squared_error(rolling_clean["actual"], rolling_clean["predicted"]))
    }
])

rolling_metrics

Out[10]:

	model	order	horizon	mae	rmse
0	Rolling ARIMA	(3, 0, 3)	1	15.765002	20.96658

Plot rolling predictions

In [11]:

plt.figure(figsize=(8, 4))
plt.plot(rolling_clean["target_date"], rolling_clean["actual"], label="Actual")
plt.plot(rolling_clean["target_date"], rolling_clean["predicted"], label="Predicted")
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Rolling ARIMA Predictions")
plt.legend()
plt.tight_layout()
plt.show()