Notebooks | MLNotebooks

30 results

ML Models

A practical ARIMA forecasting reference using a small time-series dataset with fallback data. Covers ordered splitting, ARIMA order selection, model fitting, rolling forecasts, forecast errors, and future prediction.

KNN Regression Reference

A practical K-nearest neighbors regression reference using a small scikit-learn dataset. Covers scaling, train-test split, pipeline fitting, distance behavior, k-value comparison, evaluation metrics, and prediction examples.

LSTM Regression Reference

A practical LSTM regression reference using generated sequence data. Covers sequence tensor creation, PyTorch model definition, scaling, training loop, validation loss tracking, prediction, and saved outputs.

Random Forest Regression Reference

A practical Random Forest regression reference using a tabular scikit-learn dataset. Covers model training, evaluation metrics, feature importance, tree-depth behavior, prediction inspection, and saved comparison outputs.

XGBoost Regression Reference

A practical XGBoost regression reference using a tabular scikit-learn dataset. Covers boosted tree training, parameter meanings, evaluation metrics, feature importance, boosting-round comparison, and prediction examples.

NLP

Article Fetching and Text Extraction Reference

A practical reference for fetching article pages with requests, inspecting HTTP responses, extracting readable text and metadata with Trafilatura, and debugging failed URLs.

NLP Text Embedding and Sentiment Reference

A practical NLP reference for article text cleaning, keyword filtering, VADER sentiment scoring, sentence embeddings, cosine similarity, and semantic search.
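The cosine similarity and semantic search steps mentioned above can be sketched in a few lines. This is a minimal illustration using plain Python lists as stand-ins for sentence embeddings; the function names `cosine_similarity` and `semantic_search` are hypothetical and are not taken from the notebook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, doc_vecs, top_k=3):
    """Return (score, index) pairs for the top_k most similar documents."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

In practice the vectors would come from a sentence-embedding model, and a vectorised library would likely replace the loops; the ranking logic stays the same.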

Topic Sentiment Index with GDELT

A daily news sentiment index derived from GDELT news coverage across selected topics. Built from 15-minute GKG tiles using keyword relevance scoring and an optimized download and retrieval pipeline.

Statistics

Python Statistical Tests Reference

A practical Python reference for paired observations, grouped metric summaries, Friedman tests, Wilcoxon signed-rank tests, Bonferroni correction, and model comparison tables.
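As a rough illustration of the Bonferroni correction listed above: each p-value is multiplied by the number of comparisons and capped at 1, then compared against the significance level. The function name `bonferroni` and the default `alpha=0.05` are assumptions made for this sketch, not details from the notebook.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust p-values and flag which hypotheses are rejected."""
    m = len(p_values)  # number of comparisons in the family
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject
```

A typical use would be to pass in the p-values from pairwise Wilcoxon signed-rank tests after a significant Friedman test, so that the family-wise error rate stays at or below `alpha`.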

Data Science

Python Files and Data Objects Reference

A practical Python reference for pathlib paths, text files, JSON configs, NumPy arrays, reshape, hstack, vstack, NPZ, pickle, Parquet, CSV, and data export choices.

Remote Sensing

Earth Engine Satellite Features

A practical Earth Engine workflow for loading GeoJSON regions, initializing with an EE project, extracting compact Sentinel-1 and Sentinel-2 satellite features, returning small values directly, and aligning weekly features in pandas.

Information Retrieval

MultiNews

Exploratory analysis of MultiNews, a large-scale multi-document news summarisation dataset widely used for summarisation, retrieval, and multi-document NLP research.

Quantitative Finance

Bet Against Beta Strategy

An empirical test of the Frazzini & Pedersen (2014) BAB factor on NASDAQ stocks. Compares risk-adjusted returns across beta buckets and against QQQ buy-and-hold.

Company Analysis

Volatility, earnings impact, profitability, and risk analysis for major NASDAQ-listed companies: AAPL, NVDA, GOOGL, MSFT, META.

Dubai Rent Yields by Area

Project-level rental yield estimation, regional ROI comparison, and 10-year cumulative return projection using Dubai Land Department public transaction data.

S&P 500 Long-Term Performance

S&P 500 annual return history, loss probability by holding period, rolling volatility with calibrated regime bands, multi-window Sharpe ratios, and a bootstrapped 5-year forward return interval.

S&P 500 Uncertainty with Bootstrap

A walkthrough of bootstrap resampling using real S&P 500 data. Builds the historical return distribution, simulates 1,000 five-year paths, and derives uncertainty bands without parametric assumptions.
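The resampling idea can be sketched as follows: draw annual returns with replacement from the historical sample, compound them over the horizon, and read an uncertainty band off the empirical percentiles. This is a minimal sketch under stated assumptions; `bootstrap_paths` and `percentile_band` are hypothetical names, the simple i.i.d. resampling scheme and the 5%/95% band are choices made here, and the sample returns in the usage lines are placeholder numbers, not S&P 500 data.

```python
import random

def bootstrap_paths(annual_returns, horizon=5, n_paths=1000, seed=0):
    """Simulate cumulative returns by resampling annual returns with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    outcomes = []
    for _ in range(n_paths):
        growth = 1.0
        for r in rng.choices(annual_returns, k=horizon):
            growth *= 1.0 + r  # compound one resampled year
        outcomes.append(growth - 1.0)
    return outcomes

def percentile_band(values, lower=0.05, upper=0.95):
    """Empirical band taken from the sorted outcomes (nearest-rank style)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]

# Placeholder returns for illustration only; these are not real index data.
sample_returns = [0.12, -0.05, 0.20, 0.03, -0.15, 0.08, 0.25, -0.02]
paths = bootstrap_paths(sample_returns)
low, high = percentile_band(paths)
```

Resampling individual years treats returns as independent, which ignores any serial dependence; a block bootstrap would be one alternative if that matters.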

Selective VIXY Hedge Strategy

A hedged equity strategy that holds ten large-cap NASDAQ stocks long at all times and selectively adds a VIXY position during volatility spikes. Compared against equal-weight buy-and-hold and QQQ.

Technical Strategy Backtests

A combined strategy notebook comparing ATR Exit, Bollinger Band Breakout, Simple Moving Average Crossover, and Oversold Mean-Reversion across ten large-cap NASDAQ stocks, with equal-weight buy-and-hold and QQQ as baselines.
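Of the strategies listed, the simple moving average crossover is the easiest to sketch: hold the stock while the fast average sits above the slow one, and stay out otherwise. The code below is an illustration under assumptions; the names `sma` and `crossover_signals`, the window lengths, and the long-only 0/1 signal convention are choices made here and may differ from the notebook.

```python
def sma(prices, window):
    """Simple moving average; None until a full window is available."""
    out = [None] * len(prices)
    for i in range(window - 1, len(prices)):
        out[i] = sum(prices[i - window + 1 : i + 1]) / window
    return out

def crossover_signals(prices, fast=3, slow=5):
    """1 when the fast SMA is above the slow SMA, otherwise 0."""
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    signals = []
    for f, s in zip(fast_ma, slow_ma):
        if f is None or s is None:
            signals.append(0)  # not enough history yet, stay flat
        else:
            signals.append(1 if f > s else 0)
    return signals
```

Because each signal uses that day's closing price, a backtest would normally act on it from the following day to avoid look-ahead bias.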

U.S. Macro Analysis

Key U.S. macroeconomic indicators from FRED: interest rates and yield curve dynamics, inflation measures, real GDP growth and labor market conditions, and federal debt trends.

Market Microstructure

Crypto LOB Data with Binance API

Live and historical cryptocurrency market data via Binance's public REST and WebSocket APIs. Covers spot order book snapshots, aggregated trades, and streaming depth updates at 100ms granularity.

FI-2010

The standard benchmark dataset for limit order book mid-price movement prediction. Works with ten trading days of data for five Finnish stocks from the Helsinki Stock Exchange, reconstructed at 10 levels of depth with 144 features over five prediction horizons.

LOBSTER

Tick-level limit order book data reconstructed from NASDAQ's Historical TotalView ITCH feed. Works with the sample data: a message stream and order book snapshots for each trading day, at up to 50 levels of depth.

Data Sources

Brave Search API

Using the Brave Search API from Python: authentication, web search, and result parsing.

FRED

Federal Reserve Economic Data: over 800,000 U.S. and international macroeconomic time series from the Federal Reserve Bank of St. Louis. Covers interest rates, inflation, GDP, employment, trade, and commodity prices.

World Bank

Open development data from the World Bank covering GDP, trade, poverty, health, and education indicators across countries and decades. Accessible via the Data360 REST API, no authentication required.

Coding

Python Foundations Reference

A compact reference for foundational Python operations and concepts: variables, types, expressions, data structures, functions, NumPy, and Pandas.

Research Tools

Citation Retriever for BibTeX

Parse a .bib file, resolve citation counts from the Semantic Scholar API, and produce a ranked summary of your bibliography.

Playwright Screenshots

Capture high-fidelity screenshots of selected web pages using headless Chromium from a notebook cell.

Podcast Transcription

Fetch a podcast episode from an RSS feed, transcribe locally with Whisper, and organize the output into a structured Markdown file using a local LLM.