FI-2010 Benchmark Dataset - EDA
The FI-2010 dataset is the standard benchmark for limit order book (LOB) mid-price movement prediction research. It covers ten trading days (June 1-14, 2010) of five Finnish stocks traded on the Helsinki Stock Exchange, part of Nasdaq Nordic, with the order book reconstructed to 10 levels of depth.
The dataset is structured as a supervised classification problem: given a 144-dimensional snapshot of the order book state, predict whether the mid-price will move up, down, or remain stationary over the next k events (k = 1, 2, 3, 5, 10).
Data source: Ntakaris et al. (2018), available via the authors upon request.
Original dataset: https://etsin.fairdata.fi/dataset/73eb48d7-4dbc-4a10-a52a-da745b47a649
This notebook uses a subset (fi2010_subset.npz): 2,000 rows sampled from the test split of each of the 9 cross-validation folds (18,000 rows total, ~10 MB). The full dataset contains more than 300,000 rows across train and test. All EDA code runs unchanged on the full dataset: replace fi2010_subset.npz with the full .txt files using the loader in the Methodology section.
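As a rough illustration of how a subset like this could be drawn, the sketch below samples a fixed number of rows per fold without replacement and keeps the sampled indices in their original order. The function name `sample_per_fold`, the seed, and the synthetic fold vector are assumptions for illustration; the actual procedure used to build fi2010_subset.npz is not documented here.

```python
import numpy as np

def sample_per_fold(fold_ids, n_per_fold, seed=0):
    """Return sorted row indices with at most n_per_fold rows from each fold."""
    rng = np.random.default_rng(seed)
    picked = []
    for fold in np.unique(fold_ids):
        rows = np.flatnonzero(fold_ids == fold)
        take = min(n_per_fold, rows.size)
        # Sample without replacement, then sort to preserve event order.
        picked.append(np.sort(rng.choice(rows, size=take, replace=False)))
    return np.concatenate(picked)

# Synthetic stand-in (hypothetical): 9 folds with 5,000 rows each.
demo_folds = np.repeat(np.arange(1, 10), 5000)
demo_idx = sample_per_fold(demo_folds, n_per_fold=2000)
print(demo_idx.size)  # 9 folds x 2,000 rows = 18000
```

Sorting the sampled indices keeps rows in event order within each fold, which matters for any plot that uses the row index as a time axis.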
| Parameter | Value |
|---|---|
| Stocks | 5 Finnish stocks, Helsinki Stock Exchange |
| Period | June 1–14, 2010 (10 trading days) |
| LOB depth | 10 levels (bid and ask) |
| Features | 144 (40 raw LOB + 104 time-series derived) |
| Label horizons | k = 1, 2, 3, 5, 10 events ahead |
| Label classes | 1 = down, 2 = stationary, 3 = up |
| Normalisation | Z-score (NoAuction variant used here) |
| CV splits | 9 anchored walk-forward folds |
| Subset rows | 18,000 (2,000 per fold) |
Related papers
Benchmark Dataset for Mid-Price Forecasting of Limit Order Book Data with Machine Learning Methods
Ntakaris, A., Magris, M., Kanniainen, J., Gabbouj, M., and Iosifidis, A.
Journal of Forecasting, 37(8), 852-866, 2018

DeepLOB: Deep Learning for Limit Order Books
Zhang, Z., Zohren, S., and Roberts, S.
IEEE Transactions on Signal Processing, 2019

Temporal Attention Augmented Bilinear Network for Financial Time-Series Data Analysis
Tran, D.T., Iosifidis, A., Kanniainen, J., and Gabbouj, M.
IEEE Transactions on Neural Networks and Learning Systems, 2019
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import warnings
warnings.filterwarnings('ignore')
from pathlib import Path
from IPython.display import display, HTML
FL_BLUE = '#2563eb'
FL_SLATE = '#64748b'
FL_AMBER = '#f59e0b'
FL_GREEN = '#16a34a'
FL_RED = '#ef4444'
FL_BG = '#ffffff'
FL_GRID = '#e2e8f0'
FL_TEXT = '#0f172a'
FL_TEXT2 = '#334155'
FL_BORDER = '#e2e8f0'
matplotlib.rcParams.update({
    'figure.facecolor': FL_BG,
    'axes.facecolor': FL_BG,
    'axes.edgecolor': FL_BORDER,
    'axes.labelcolor': FL_TEXT2,
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.grid': True,
    'grid.color': FL_GRID,
    'grid.linewidth': 0.7,
    'xtick.color': FL_TEXT2,
    'ytick.color': FL_TEXT2,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'axes.labelsize': 11,
    'axes.titlesize': 12,
    'axes.titlecolor': FL_TEXT,
    'axes.titlepad': 12,
    'legend.frameon': False,
    'legend.fontsize': 10,
    'figure.dpi': 300,
    'savefig.bbox': 'tight',
    'font.family': 'sans-serif',
    'font.sans-serif': ['Inter', 'Helvetica Neue', 'Arial', 'DejaVu Sans'],
})
# To use the full dataset instead, replace this block with load_fi2010_txt()
# defined in the Methodology section at the bottom of this notebook.
SUBSET_PATH = Path('data/fi2010_subset.npz')
arc = np.load(SUBSET_PATH, allow_pickle=True)
X = arc['X'] # (N, 144) float32
Y = arc['Y'] # (N, 5) int8 - columns = k=1,2,3,5,10
CF = arc['cf'] # (N,) int8 - fold index 1–9
HORIZON_LABELS = arc['horizon_labels'] # [1,2,3,5,10]
FEATURE_NAMES = arc['feature_names'] # 144 strings
N, D = X.shape
print(f'Subset loaded: {N:,} rows × {D} features')
#print(f'Label matrix: {Y.shape} (5 horizons)')
#print(f'Horizons: {HORIZON_LABELS}')
#print(f'Folds present: {np.unique(CF)}')
#print(f'Rows per fold: {np.bincount(CF)[1:]}')
LABEL_MAP = {1: 'Down', 2: 'Stationary', 3: 'Up'}
LABEL_COLORS = {1: FL_RED, 2: FL_SLATE, 3: FL_GREEN}
Subset loaded: 18,000 rows × 144 features
Dataset structure
Each row in the feature matrix represents a single order book state snapshot. The 144 features are organised as:
- Features 1-40: The raw LOB state - 10 bid prices, 10 ask prices, 10 bid sizes, 10 ask sizes (all z-score normalised)
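- Features 41-144: 104 handcrafted features derived from the raw LOB state. In Ntakaris et al. (2018) these comprise time-insensitive quantities (spreads, mid-prices, price differences, means, accumulated differences) and time-sensitive quantities (price and volume derivatives, order-flow intensities, accelerations)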
The label matrix has 5 columns, one per prediction horizon. Each label is 1 (down), 2 (stationary), or 3 (up), representing the direction of the mid-price change over the next k order book events.
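To make the 1/2/3 encoding concrete, the sketch below derives labels from a mid-price series with a simple threshold rule on the relative change between the current mid-price and the mean of the next k mid-prices. This is a simplified illustration: the function name `direction_labels`, the threshold `alpha`, and the toy prices are assumptions, and the labels shipped with FI-2010 should be used as provided rather than recomputed this way.

```python
import numpy as np

def direction_labels(mid, k, alpha=0.002):
    """Label each event 1 (down), 2 (stationary) or 3 (up).

    Compares the mean of the next k mid-prices with the current mid-price.
    The last k events have no full look-ahead window and are dropped.
    alpha is a hypothetical threshold on the relative change.
    """
    mid = np.asarray(mid, dtype=float)
    n = mid.size - k
    # future_mean[t] = mean(mid[t+1], ..., mid[t+k]) via a cumulative sum.
    csum = np.concatenate([[0.0], np.cumsum(mid)])
    future_mean = (csum[k + 1:k + 1 + n] - csum[1:1 + n]) / k
    change = (future_mean - mid[:n]) / mid[:n]
    labels = np.full(n, 2, dtype=np.int8)
    labels[change > alpha] = 3
    labels[change < -alpha] = 1
    return labels

toy_mid = [100.0, 100.0, 101.0, 101.0, 100.0, 100.0]
labels_k1 = direction_labels(toy_mid, k=1)
labels_k2 = direction_labels(toy_mid, k=2)
print(labels_k1.tolist(), labels_k2.tolist())
```

With a longer horizon, fewer events fall inside the stationary band in this toy series, which mirrors the shrinking stationary share at larger k.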
raw_lob = FEATURE_NAMES[:40]
derived = FEATURE_NAMES[40:]
summary_rows = []
for j, h in enumerate(HORIZON_LABELS):
    # Count each class at horizon h and format it as "count (share)".
    vals, cnts = np.unique(Y[:, j], return_counts=True)
    d = {LABEL_MAP[int(v)]: f'{c:,} ({c / N:.1%})' for v, c in zip(vals, cnts)}
    summary_rows.append({
        'Horizon k': int(h),
        'Down (1)': d.get('Down', '-'),
        'Stationary (2)': d.get('Stationary', '-'),
        'Up (3)': d.get('Up', '-'),
    })
feature_summary_df = pd.DataFrame([
    {
        'Metric': 'Feature matrix',
        'Value': f'{N:,} rows x {D} columns'
    },
    {
        'Metric': 'Raw LOB features',
        'Value': f'features 1 to 40 ({len(raw_lob)} features)'
    },
    {
        'Metric': 'Derived features',
        'Value': f'features 41 to 144 ({len(derived)} features)'
    }
])
label_summary_df = pd.DataFrame(summary_rows)
print('Feature summary')
display(feature_summary_df)
print('Label distribution by horizon')
display(label_summary_df)
Feature summary
| | Metric | Value |
|---|---|---|
| 0 | Feature matrix | 18,000 rows x 144 columns |
| 1 | Raw LOB features | features 1 to 40 (40 features) |
| 2 | Derived features | features 41 to 144 (104 features) |
Label distribution by horizon
| | Horizon k | Down (1) | Stationary (2) | Up (3) |
|---|---|---|---|---|
| 0 | 1 | 5,747 (31.9%) | 6,392 (35.5%) | 5,861 (32.6%) |
| 1 | 2 | 6,473 (36.0%) | 4,853 (27.0%) | 6,674 (37.1%) |
| 2 | 3 | 7,029 (39.1%) | 3,803 (21.1%) | 7,168 (39.8%) |
| 3 | 5 | 7,593 (42.2%) | 2,642 (14.7%) | 7,765 (43.1%) |
| 4 | 10 | 8,057 (44.8%) | 1,624 (9.0%) | 8,319 (46.2%) |
Label distribution across horizons
At k = 1 the three classes are close to balanced in this subset, with stationary the largest at 35.5% because many order book events do not move the mid-price. As the horizon grows, the stationary share shrinks (9.0% at k = 10) while down and up stay roughly equal to each other (44.8% and 46.2%), so stationary becomes a small minority class. This imbalance is a key challenge for LOB prediction models.
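One common way to handle this kind of imbalance is to weight classes inversely to their frequency during training. The sketch below computes such weights from the k = 10 counts in the label distribution table above. The helper name `inverse_frequency_weights` is an assumption for illustration and is not part of this notebook's pipeline.

```python
import numpy as np

def inverse_frequency_weights(counts):
    """Weight each class by N / (n_classes * count); a balanced set gives all ones."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (counts.size * counts)

# Down, Stationary, Up counts for k = 10 in the 18,000-row subset.
k10_counts = [8057, 1624, 8319]
k10_weights = inverse_frequency_weights(k10_counts)
print(dict(zip(['Down', 'Stationary', 'Up'], k10_weights.round(2).tolist())))
```

The rare stationary class receives a weight several times larger than the two directional classes, so errors on it count for more in a weighted loss.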
selected_horizons = [1, 3, 10]
for h in selected_horizons:
    # Locate the label column for horizon h and compute class shares.
    h_idx = np.where(HORIZON_LABELS == h)[0][0]
    vals, cnts = np.unique(Y[:, h_idx], return_counts=True)
    labels = [LABEL_MAP[int(v)] for v in vals]
    colors = [LABEL_COLORS[int(v)] for v in vals]
    pcts = cnts / cnts.sum() * 100
    plt.figure(figsize=(8, 4.5))
    bars = plt.bar(labels, pcts, color=colors, alpha=0.85, width=0.5)
    for bar, pct in zip(bars, pcts):
        # Annotate each bar with its percentage.
        plt.text(
            bar.get_x() + bar.get_width() / 2,
            bar.get_height() + 0.5,
            f'{pct:.1f}%',
            ha='center',
            fontsize=9,
            color=FL_TEXT2
        )
    plt.title(f'Label distribution for k = {h}')
    plt.ylabel('Share (%)')
    plt.ylim(0, max(pcts) * 1.15)
    plt.tick_params(axis='both', length=0)
    plt.tight_layout()
    plt.show()
Label distribution across CV folds (k=1)
The 9 cross-validation folds are anchored walk-forward splits - each fold corresponds to a later time window, with training data always preceding test data. The label distribution varies across folds, reflecting different market regimes on different days.
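The anchored scheme can be sketched as follows, assuming one fold per additional trading day: fold i trains on days 1 to i and tests on day i + 1, so ten days yield nine folds. The function name `anchored_walk_forward` and the day-level granularity are assumptions for illustration; the fold files distributed with the dataset define the actual splits.

```python
def anchored_walk_forward(n_days):
    """Yield (fold, train_days, test_day) with the training window anchored at day 1."""
    for fold in range(1, n_days):
        train_days = list(range(1, fold + 1))  # grows by one day per fold
        test_day = fold + 1                    # always the next unseen day
        yield fold, train_days, test_day

splits = list(anchored_walk_forward(10))
for fold, train_days, test_day in splits:
    print(f'fold {fold}: train days 1-{train_days[-1]}, test day {test_day}')
```

Because the training window only ever grows and the test day always follows it, no fold trains on data from after its test period.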
h_idx = 0  # column 0 of Y is the k = 1 horizon
folds = np.unique(CF)
fold_dist = []
for cf in folds:
    mask = CF == cf
    y_cf = Y[mask, h_idx]
    vals, cnts = np.unique(y_cf, return_counts=True)
    row = {
        'Fold': int(cf),
        'N': int(mask.sum())
    }
    for v in [1, 2, 3]:
        # Report 0.0% for any class absent from this fold.
        idx = np.where(vals == v)[0]
        row[LABEL_MAP[v]] = f'{cnts[idx[0]] / mask.sum():.1%}' if len(idx) else '0.0%'
    fold_dist.append(row)
fold_dist_df = pd.DataFrame(fold_dist)
print('Label distribution per fold for k = 1')
display(fold_dist_df)
plt.figure(figsize=(8, 4.5))
bottoms = np.zeros(len(folds))
for cls, color in [(1, FL_RED), (2, FL_SLATE), (3, FL_GREEN)]:
    # Stack one bar segment per class on top of the previous classes.
    pcts = []
    for cf in folds:
        mask = CF == cf
        pcts.append((Y[mask, h_idx] == cls).mean() * 100)
    plt.bar(
        folds,
        pcts,
        bottom=bottoms,
        color=color,
        alpha=0.85,
        label=LABEL_MAP[cls],
        width=0.6
    )
    bottoms += np.array(pcts)
plt.xlabel('CV fold')
plt.ylabel('Share (%)')
plt.title('Label distribution per fold for k = 1')
plt.xticks(folds)
plt.legend(loc='upper right')
plt.tick_params(axis='both', length=0)
plt.tight_layout()
plt.show()
Label distribution per fold for k = 1
| | Fold | N | Down | Stationary | Up |
|---|---|---|---|---|---|
| 0 | 1 | 2000 | 31.4% | 36.4% | 32.2% |
| 1 | 2 | 2000 | 23.6% | 49.2% | 27.2% |
| 2 | 3 | 2000 | 27.8% | 37.1% | 35.1% |
| 3 | 4 | 2000 | 35.9% | 29.2% | 34.9% |
| 4 | 5 | 2000 | 35.6% | 31.7% | 32.6% |
| 5 | 6 | 2000 | 34.6% | 30.6% | 34.7% |
| 6 | 7 | 2000 | 34.3% | 35.5% | 30.2% |
| 7 | 8 | 2000 | 31.5% | 32.5% | 36.0% |
| 8 | 9 | 2000 | 32.6% | 37.2% | 30.1% |
Raw LOB feature statistics
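The first 40 features are the z-score normalised order book state: 10 bid prices, 10 ask prices, 10 bid sizes, 10 ask sizes. Z-score normalisation targets zero mean and unit variance, but this subset does not reproduce that exactly: in the summary table below, the first five columns have means between -0.50 and 0.39 and standard deviations between 0.16 and 0.38. Size features can show heavier tails due to occasional large orders.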
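Z-score statistics are typically estimated on one portion of the data and then applied elsewhere, which is one possible reason a sampled subset can deviate from zero mean and unit variance. The sketch below illustrates the effect with synthetic data; the function names and the synthetic distributions are assumptions, not the normalisation code used to produce FI-2010.

```python
import numpy as np

def fit_zscore(train):
    """Estimate per-column mean and standard deviation on the training rows."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return mean, std

def apply_zscore(x, mean, std):
    """Normalise x with statistics estimated elsewhere."""
    return (x - mean) / std

rng = np.random.default_rng(0)
train = rng.normal(loc=10.0, scale=2.0, size=(5000, 3))
later = rng.normal(loc=10.8, scale=0.5, size=(2000, 3))  # shifted, calmer regime

mean, std = fit_zscore(train)
z_train = apply_zscore(train, mean, std)
z_later = apply_zscore(later, mean, std)
print('train:', z_train.mean(axis=0).round(3), z_train.std(axis=0).round(3))
print('later:', z_later.mean(axis=0).round(3), z_later.std(axis=0).round(3))
```

The training rows come out centred with unit spread by construction, while the later rows keep a non-zero mean and a smaller spread because they were normalised with statistics from a different period.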
X_raw = X[:, :40]
feat_mean = X_raw.mean(axis=0)
feat_std = X_raw.std(axis=0)
feature_groups = (
    [f'BidP{i}' for i in range(1, 11)] +
    [f'AskP{i}' for i in range(1, 11)] +
    [f'BidS{i}' for i in range(1, 11)] +
    [f'AskS{i}' for i in range(1, 11)]
)
group_colors = (
    [FL_GREEN] * 10 +
    [FL_RED] * 10 +
    [FL_BLUE] * 10 +
    [FL_AMBER] * 10
)
import matplotlib.patches as mpatches
legend = [
    mpatches.Patch(color=FL_GREEN, label='Bid prices (1 to 10)'),
    mpatches.Patch(color=FL_RED, label='Ask prices (1 to 10)'),
    mpatches.Patch(color=FL_BLUE, label='Bid sizes (1 to 10)'),
    mpatches.Patch(color=FL_AMBER, label='Ask sizes (1 to 10)'),
]
plt.figure(figsize=(8, 4.5))
plt.bar(range(40), feat_mean, color=group_colors, alpha=0.8, width=0.7)
plt.axhline(0, color=FL_GRID, linewidth=0.8)
plt.ylabel('Mean (z-score)')
plt.title('Raw LOB features: mean value after z-score normalization')
plt.xticks([])
plt.legend(handles=legend, fontsize=9, loc='upper right')
plt.tick_params(length=0)
plt.tight_layout()
plt.show()
plt.figure(figsize=(8, 4.5))
plt.bar(range(40), feat_std, color=group_colors, alpha=0.8, width=0.7)
plt.ylabel('Std (z-score)')
plt.title('Raw LOB features: standard deviation')
plt.xticks(
    range(0, 40, 5),
    [feature_groups[i] for i in range(0, 40, 5)],
    rotation=30,
    ha='right'
)
plt.tick_params(length=0)
plt.tight_layout()
plt.show()
raw_feature_stats_df = pd.DataFrame({
    'Feature': feature_groups,
    'Mean': feat_mean,
    'Std': feat_std
})
raw_feature_stats_df[['Mean', 'Std']] = raw_feature_stats_df[['Mean', 'Std']].round(4)
print('Raw LOB feature summary')
display(raw_feature_stats_df.head())
Raw LOB feature summary
| | Feature | Mean | Std |
|---|---|---|---|
| 0 | BidP1 | 0.3906 | 0.1625 |
| 1 | BidP2 | -0.4987 | 0.3760 |
| 2 | BidP3 | 0.3891 | 0.1623 |
| 3 | BidP4 | -0.4689 | 0.2286 |
| 4 | BidP5 | 0.3912 | 0.1625 |
Feature correlation structure
Correlation matrix of the 40 raw LOB features. Bid and ask price levels are highly correlated with each other (the order book is a contiguous price ladder). Size features show lower correlation - volume at each level varies more independently.
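A compact way to check these block-level statements numerically is to average the absolute correlations within each 10 x 10 group. The sketch below does this on synthetic data; the function name `block_mean_correlation`, the group size, and the synthetic factor model are assumptions for illustration.

```python
import numpy as np

def block_mean_correlation(x, group_size=10):
    """Average the absolute correlations within each pair of feature groups.

    Self-correlations on the diagonal are excluded from within-group means.
    """
    corr = np.corrcoef(x.T)
    n_groups = x.shape[1] // group_size
    off_diag = ~np.eye(group_size, dtype=bool)
    out = np.empty((n_groups, n_groups))
    for a in range(n_groups):
        for b in range(n_groups):
            block = np.abs(corr[a * group_size:(a + 1) * group_size,
                                b * group_size:(b + 1) * group_size])
            out[a, b] = block[off_diag].mean() if a == b else block.mean()
    return out

# Synthetic stand-in: 20 "price" columns share one factor, 20 "size" columns are independent.
rng = np.random.default_rng(0)
factor = rng.normal(size=(4000, 1))
prices = factor + 0.1 * rng.normal(size=(4000, 20))
sizes = rng.normal(size=(4000, 20))
demo_blocks = block_mean_correlation(np.hstack([prices, sizes]))
print(demo_blocks.round(2))
```

Applied to the notebook's data, `block_mean_correlation(X[:, :40])` would give a 4 x 4 summary to compare against the visual impression from the heatmap.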
corr = np.corrcoef(X[:, :40].T)
plt.figure(figsize=(8, 6))
im = plt.imshow(corr, cmap='RdYlGn', vmin=-1, vmax=1, aspect='auto')
plt.colorbar(im, fraction=0.03, pad=0.02)
for boundary in [10, 20, 30]:
plt.axhline(boundary - 0.5, color='white', linewidth=1.5)
plt.axvline(boundary - 0.5, color='white', linewidth=1.5)
plt.xticks(
    [5, 15, 25, 35],
    ['Bid prices', 'Ask prices', 'Bid sizes', 'Ask sizes'],
    fontsize=10
)
plt.yticks(
    [5, 15, 25, 35],
    ['Bid prices', 'Ask prices', 'Bid sizes', 'Ask sizes'],
    fontsize=10
)
plt.title('Raw LOB feature correlation matrix (40 features)')
plt.tick_params(length=0)
plt.tight_layout()
plt.show()
Mid-price proxy series
The raw dataset does not include a direct price column - prices are z-score normalised. Feature 1 (BidPrice1, the best bid) and Feature 11 (AskPrice1, the best ask) can be combined to form a normalised mid-price proxy. Plotting this across a single fold shows the characteristic patterns that models are trained to predict.
fold1_mask = CF == 1
bid1 = X[fold1_mask, 0]
ask1 = X[fold1_mask, 10]
mid = (bid1 + ask1) / 2
spread = ask1 - bid1
n_fold = fold1_mask.sum()
idx = np.arange(n_fold)
plt.figure(figsize=(8, 4.5))
plt.plot(idx, mid, color=FL_BLUE, linewidth=0.8, alpha=0.9)
plt.ylabel('Normalized mid-price')
plt.title('Fold 1 normalized mid-price proxy (BidP1 + AskP1) / 2')
plt.tick_params(length=0)
plt.tight_layout()
plt.show()
plt.figure(figsize=(8, 4.5))
plt.fill_between(idx, spread, alpha=0.3, color=FL_AMBER)
plt.plot(idx, spread, color=FL_AMBER, linewidth=0.6)
plt.ylabel('Normalized spread')
plt.title('Bid ask spread proxy (AskP1 - BidP1, z-score units)')
plt.tick_params(length=0)
plt.tight_layout()
plt.show()
y_fold1 = Y[fold1_mask, 0]
plt.figure(figsize=(8, 4.5))
for cls, color in [(1, FL_RED), (3, FL_GREEN)]:
    # Plot the directional classes first.
    mask_cls = y_fold1 == cls
    plt.scatter(
        idx[mask_cls],
        mid[mask_cls],
        c=color,
        s=1,
        alpha=0.4,
        label=LABEL_MAP[cls]
    )
# Stationary events are drawn smaller and fainter.
plt.scatter(
    idx[y_fold1 == 2],
    mid[y_fold1 == 2],
    c=FL_SLATE,
    s=0.5,
    alpha=0.2,
    label='Stationary'
)
plt.ylabel('Normalized mid-price')
plt.xlabel('Event index (fold 1)')
plt.title('Mid-price colored by k = 1 label')
plt.legend(markerscale=6, fontsize=9, loc='upper right')
plt.tick_params(length=0)
plt.tight_layout()
plt.show()
Order book depth profile
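Average z-score normalised size at each of the 10 bid and ask levels. Resting volume often builds up at deeper levels, as in the LOBSTER data examined in the AMZN EDA, but this subset does not show a monotone profile: the tabulated means alternate between about 0.39 at odd levels and -0.34 to -0.63 at even levels. An alternating pattern like this suggests checking the assumed column-to-level mapping against FEATURE_NAMES before reading the chart as a depth profile.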
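The depth profile depends on which columns are treated as bid and ask sizes. The code in this section assumes a blocked layout, with columns 21-30 as bid sizes and 31-40 as ask sizes. The sketch below builds index maps for that layout and for a hypothetical per-level interleaved layout, so the assumption can be compared against FEATURE_NAMES. The helper name `lob_column_indices`, the layout names, and the interleaved field order are assumptions for illustration, not something this notebook confirms.

```python
import numpy as np

# Field order within each layout; the interleaved order is a hypothetical example.
FIELDS_BLOCKED = ('bid_price', 'ask_price', 'bid_size', 'ask_size')
FIELDS_INTERLEAVED = ('ask_price', 'ask_size', 'bid_price', 'bid_size')

def lob_column_indices(layout, n_levels=10):
    """Map each field to the 0-based columns holding levels 1..n_levels."""
    if layout == 'blocked':
        # All levels of one field are contiguous.
        return {f: np.arange(i * n_levels, (i + 1) * n_levels)
                for i, f in enumerate(FIELDS_BLOCKED)}
    if layout == 'interleaved':
        # The four fields of one level are contiguous.
        return {f: np.arange(i, 4 * n_levels, 4)
                for i, f in enumerate(FIELDS_INTERLEAVED)}
    raise ValueError(f'unknown layout: {layout!r}')

blocked = lob_column_indices('blocked')
interleaved = lob_column_indices('interleaved')
print(blocked['bid_size'][:3], interleaved['bid_size'][:3])
```

Printing `FEATURE_NAMES[blocked['bid_size']]` next to `FEATURE_NAMES[interleaved['bid_size']]` shows which index map, if either, selects the bid-size columns.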
bid_size_feats = X[:, 20:30].mean(axis=0)
ask_size_feats = X[:, 30:40].mean(axis=0)
levels = np.arange(1, 11)
width = 0.35
plt.figure(figsize=(8, 4.5))
plt.bar(
    levels - width / 2,
    bid_size_feats,
    width=width,
    color=FL_GREEN,
    alpha=0.85,
    label='Bid size'
)
plt.bar(
    levels + width / 2,
    ask_size_feats,
    width=width,
    color=FL_RED,
    alpha=0.85,
    label='Ask size'
)
plt.xlabel('Level')
plt.ylabel('Mean z-score normalized size')
plt.title('Average order book depth per level')
plt.xticks(levels)
plt.axhline(0, color=FL_GRID, linewidth=0.8)
plt.legend()
plt.tick_params(length=0)
plt.tight_layout()
plt.show()
depth_df = pd.DataFrame({
    'Level': levels,
    'Bid size mean': bid_size_feats,
    'Ask size mean': ask_size_feats
})
depth_df[['Bid size mean', 'Ask size mean']] = depth_df[['Bid size mean', 'Ask size mean']].round(4)
display(depth_df)
| | Level | Bid size mean | Ask size mean |
|---|---|---|---|
| 0 | 1 | 0.3930 | 0.3848 |
| 1 | 2 | -0.6110 | -0.4346 |
| 2 | 3 | 0.3865 | 0.3947 |
| 3 | 4 | -0.6324 | -0.3377 |
| 4 | 5 | 0.3936 | 0.3838 |
| 5 | 6 | -0.4682 | -0.3653 |
| 6 | 7 | 0.3857 | 0.3955 |
| 7 | 8 | -0.5509 | -0.3405 |
| 8 | 9 | 0.3941 | 0.3828 |
| 9 | 10 | -0.3735 | -0.3402 |
Label entropy and class balance by horizon
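Shannon entropy of the label distribution measures how balanced the classes are: lower entropy means a more imbalanced distribution, and the maximum for three classes is log2(3) ≈ 1.585 bits. In this subset entropy falls as the horizon grows, from 1.583 bits at k = 1 to 1.347 bits at k = 10, because the stationary class shrinks and the labels concentrate in the two directional classes. The majority-class share rises from 35.5% to 46.2% over the same range, so a trivial majority baseline becomes stronger at longer horizons even though the up/down split stays close to even.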
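The entropy values can be reproduced directly from class counts without SciPy. The sketch below applies H = -Σ p_i log2(p_i) to the k = 1 and k = 10 counts from the label distribution table; the function name `entropy_bits` is an assumption for illustration.

```python
import numpy as np

def entropy_bits(counts):
    """Shannon entropy in bits of the distribution implied by class counts."""
    counts = np.asarray(counts, dtype=float)
    probs = counts[counts > 0] / counts.sum()  # drop empty classes: 0 * log(0) := 0
    return float(-(probs * np.log2(probs)).sum())

h_k1 = entropy_bits([5747, 6392, 5861])   # Down, Stationary, Up at k = 1
h_k10 = entropy_bits([8057, 1624, 8319])  # Down, Stationary, Up at k = 10
h_max = float(np.log2(3))                 # uniform three-class distribution
print(round(h_k1, 3), round(h_k10, 3), round(h_max, 3))
```

The two values should agree with the Shannon entropy column of the summary table at the end of this section, with k = 1 sitting just below the three-class maximum.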
from scipy.stats import entropy as shannon_entropy
rows = []
for j, h in enumerate(HORIZON_LABELS):
    # Class probabilities, entropy in bits, and majority share at horizon h.
    vals, cnts = np.unique(Y[:, j], return_counts=True)
    probs = cnts / cnts.sum()
    ent = float(shannon_entropy(probs, base=2))
    majority_pct = probs.max()
    rows.append({
        'Horizon k': int(h),
        'Down %': f'{probs[vals == 1][0]:.1%}' if 1 in vals else '0%',
        'Stationary %': f'{probs[vals == 2][0]:.1%}' if 2 in vals else '0%',
        'Up %': f'{probs[vals == 3][0]:.1%}' if 3 in vals else '0%',
        'Majority class': f'{majority_pct:.1%}',
        'Shannon entropy': round(ent, 3),
    })
entropy_df = pd.DataFrame(rows)
print('Label balance summary by horizon')
display(entropy_df)
entropies = entropy_df['Shannon entropy'].tolist()
plt.figure(figsize=(8, 4.5))
plt.plot(HORIZON_LABELS, entropies, marker='o', color=FL_BLUE, linewidth=2)
plt.axhline(
    np.log2(3),
    color=FL_GRID,
    linewidth=1,
    linestyle='--',
    label='Max entropy (uniform)'
)
plt.xlabel('Prediction horizon k')
plt.ylabel('Shannon entropy (bits)')
plt.title('Label entropy vs horizon')
plt.xticks(HORIZON_LABELS)
plt.legend()
plt.tick_params(length=0)
plt.tight_layout()
plt.show()
Label balance summary by horizon
| | Horizon k | Down % | Stationary % | Up % | Majority class | Shannon entropy |
|---|---|---|---|---|---|---|
| 0 | 1 | 31.9% | 35.5% | 32.6% | 35.5% | 1.583 |
| 1 | 2 | 36.0% | 27.0% | 37.1% | 37.1% | 1.571 |
| 2 | 3 | 39.1% | 21.1% | 39.8% | 39.8% | 1.533 |
| 3 | 5 | 42.2% | 14.7% | 43.1% | 43.1% | 1.455 |
| 4 | 10 | 44.8% | 9.0% | 46.2% | 46.2% | 1.347 |