The Problem
This started as an exploratory project with two goals: stress-test AI-assisted development on a genuinely hard problem, and learn the strengths and weaknesses of different ML training approaches across different data spaces. Algorithmic trading is a perfect whetstone. The problem space is well-defined, the feedback loop is honest, and the data challenges are real.

The scope covered multi-source ingestion, with Polygon for equities and CoinGecko, GeckoTerminal, and Binance for crypto, each with different schemas, gaps, and reliability characteristics; model training across multiple timeframes, model types, feature groups, and position sizing strategies to understand where different approaches break down; and paper and live trading integrations with Alpaca and Binance, with strict validation and local ledger parity to catch any discrepancy between what the system thinks it did and what actually happened.

A full frontend dashboard was built alongside the backend but was intentionally deprioritised: keeping the UI in sync with rapidly changing model configurations added friction without adding signal.

The honest result: capital protection models trained extremely well and proved resilient in bear and sideways markets. Generating consistent alpha is a harder problem, and one worth circling back to.
A platform built to run trading hypothesis experiments systematically at scale. Each hypothesis lives in a YAML experiment config, not in code, and results are tracked in PostgreSQL across binary and multi-class architectures, multiple timeframes, and dozens of feature group configurations. A three-tier model gate enforces scientific rigour before any model touches live trading. Data pipelines pull from four sources, normalise across different schemas, and fill gaps continuously. The system runs on dedicated hardware, with GPU-accelerated training on an RTX 4090 and paper trading sessions running around the clock.
What's interesting
Hypotheses live in config, not code
Each experiment is a YAML file defining the model type, symbol, timeframe, feature groups, holding period, and profit target. Running a new hypothesis is a single command against a config file. This kept the codebase stable while allowing rapid iteration across dozens of configurations. All results are tracked in PostgreSQL, never in JSON registries or flat files.
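A minimal sketch of what such a config and its loader might look like. The field names and validation below are illustrative; the write-up names what an experiment defines (model type, symbol, timeframe, feature groups, holding period, profit target) but not the actual schema.

```python
# Illustrative experiment config and loader; all field names are assumptions.
import yaml  # PyYAML

EXAMPLE_CONFIG = """
experiment: btc_momentum_v1        # hypothetical experiment name
model_type: binary_classifier
symbol: BTC-USD
timeframe: 5m
feature_groups: [momentum, volatility, volume]
holding_period_bars: 12
profit_target_pct: 0.75
"""

REQUIRED = {"model_type", "symbol", "timeframe",
            "feature_groups", "holding_period_bars", "profit_target_pct"}

def load_experiment(text: str) -> dict:
    """Parse an experiment config and fail fast on missing fields."""
    cfg = yaml.safe_load(text)
    missing = REQUIRED - cfg.keys()
    if missing:
        raise ValueError(f"config missing fields: {sorted(missing)}")
    return cfg

if __name__ == "__main__":
    print(load_experiment(EXAMPLE_CONFIG))
```

Failing fast on a malformed config is what keeps the single-command-per-hypothesis workflow honest: a typo surfaces before any training compute is spent.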
Scientific method enforced at the architecture level
A three-tier model gate runs after every training run: sanity checks (files exist, metrics populated, minimum data coverage), performance checks (positive return, Sharpe above 0.5, win rate within a plausible band, beats buy-and-hold), and statistical validation. Data splits are always chronological, never shuffled. Forward returns are calculated after the split, so labels near the end of the training window can never incorporate price movement from the test period. Only gate-passing models get committed to the models directory.
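A hedged sketch of how such a gate might be structured. The Sharpe and return thresholds come from the description above; the win rate band, coverage minimum, and significance cut are illustrative stand-ins.

```python
from dataclasses import dataclass

@dataclass
class TrainingRun:
    artifacts_exist: bool
    rows_used: int
    total_return: float
    sharpe: float
    win_rate: float
    buy_and_hold_return: float
    p_value: float  # from whatever test the statistical tier runs

def passes_gate(run: TrainingRun) -> bool:
    # Tier 1: sanity checks: artifacts present, minimum coverage (threshold assumed)
    if not run.artifacts_exist or run.rows_used < 10_000:
        return False
    # Tier 2: performance: positive return, Sharpe > 0.5, beats buy-and-hold
    if run.total_return <= 0 or run.sharpe < 0.5:
        return False
    if not 0.40 <= run.win_rate <= 0.70:  # "in range": bounds assumed
        return False
    if run.total_return <= run.buy_and_hold_return:
        return False
    # Tier 3: statistical validation (significance threshold assumed)
    return run.p_value < 0.05
```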
Feature pipeline as the single source of truth
Early in the project, the training and backtesting code paths calculated rolling features over different historical windows, producing different feature distributions and making results unreliable. A centralised FeaturePipeline class now enforces identical feature calculation across training, backtesting, and live inference. Every model saves its feature metadata at training time and loads it at inference. This was the unlock that made experiment results trustworthy.
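A minimal sketch of the single-source-of-truth idea: one class owns feature computation, persists its exact spec at training time, and rebuilds itself from that spec at inference. The two example features, method names, and the JSON sidecar format are assumptions; only the class name FeaturePipeline comes from the project.

```python
import json
import pandas as pd

class FeaturePipeline:
    """Computes features identically for training, backtesting, and live
    inference, so distributions can't drift between code paths."""

    def __init__(self, windows: dict[str, int]):
        self.windows = windows  # rolling window per feature, e.g. {"ret_mean": 20}

    def transform(self, ohlcv: pd.DataFrame) -> pd.DataFrame:
        feats = pd.DataFrame(index=ohlcv.index)
        returns = ohlcv["close"].pct_change()
        feats["ret_mean"] = returns.rolling(self.windows["ret_mean"]).mean()
        feats["vol_std"] = returns.rolling(self.windows["vol_std"]).std()
        return feats.dropna()

    def save_metadata(self, path: str) -> None:
        """Persist the feature spec alongside the trained model."""
        with open(path, "w") as f:
            json.dump(self.windows, f)

    @classmethod
    def from_metadata(cls, path: str) -> "FeaturePipeline":
        """Rebuild the exact training-time pipeline for inference."""
        with open(path) as f:
            return cls(json.load(f))
```

The point is that inference never re-declares windows or feature lists by hand; it can only load what training saved.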
Multi-source data pipeline with quality validation
Market data comes from Polygon (equities, live and historical), CoinGecko (crypto, 5-minute live), GeckoTerminal (crypto, 1-minute live and historical backfill), and Binance (crypto gap fill and execution), each with different schemas, update frequencies, and reliability. Binance.US USD pairs turned out to have a structural volume coverage problem, reporting only 7.6% of expected volume because crypto trading is USDT-dominated; that led to separate feature sets: 12 features for crypto and 14 for equities. Collectors run continuously and gap fillers backfill missing windows. Data quality checks run before training.
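A hedged sketch of the gap-detection half of that loop: given the expected bar interval, find the missing windows so a filler can backfill them from a secondary source. The function and its signature are illustrative, not the project's actual API.

```python
import pandas as pd

def find_gaps(timestamps: pd.Series, interval: pd.Timedelta) -> list[tuple]:
    """Return (start, end) spans where consecutive bars sit further apart
    than the expected interval."""
    ts = timestamps.sort_values().reset_index(drop=True)
    gaps = []
    for i, delta in enumerate(ts.diff()):
        if pd.notna(delta) and delta > interval:
            gaps.append((ts[i - 1] + interval, ts[i]))
    return gaps

# Example: 1-minute bars with three candles missing
bars = pd.Series(pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:01",
    "2024-01-01 00:05", "2024-01-01 00:06",
]))
print(find_gaps(bars, pd.Timedelta(minutes=1)))
# [(Timestamp('2024-01-01 00:02:00'), Timestamp('2024-01-01 00:05:00'))]
```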