The Problem
This started as an exploratory project with two goals: stress-test AI-assisted development on a genuinely hard problem, and learn the strengths and weaknesses of different ML training approaches across different data spaces. Algorithmic trading is a perfect whetstone. The problem space is well-defined, the feedback loop is honest, and the data challenges are real.

The scope covered multi-source ingestion, with Polygon for equities and CoinGecko, GeckoTerminal, and Binance for crypto, each with different schemas, gaps, and reliability characteristics; model training across multiple timeframes, model types, feature groups, and position sizing strategies to understand where different approaches break down; and paper and live trading integrations with Alpaca and Binance, with strict validation and local ledger parity to catch any discrepancy between what the system thinks it did and what actually happened.

A full frontend dashboard was built alongside the backend but was intentionally deprioritised: keeping the UI in sync with rapidly changing model configurations added friction without adding signal.

The honest result: capital protection models trained extremely well and proved resilient in bear and sideways markets. Generating consistent alpha is a harder problem, and one worth circling back to.
A platform built to run trading hypothesis experiments systematically at scale. Each hypothesis lives in a YAML experiment config, not in code, and results are tracked in PostgreSQL across binary and multi-class architectures, multiple timeframes, and dozens of feature group configurations. A three-tier model gate enforces scientific rigour before any model touches live trading. Data pipelines pull from four sources, normalise across different schemas, and fill gaps continuously. The system runs on dedicated hardware, with GPU-accelerated training on an RTX 4090 and paper trading sessions running around the clock.
What's interesting
Hypotheses live in config, not code
Each experiment is a YAML file defining the model type, symbol, timeframe, feature groups, holding period, and profit target. Running a new hypothesis is a single command against a config file. This kept the codebase stable while allowing rapid iteration across dozens of configurations. All results are tracked in PostgreSQL, never in JSON registries or flat files.
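A minimal sketch of what such a config and its loader might look like. The field names and validation below are illustrative; the write-up names what an experiment defines (model type, symbol, timeframe, feature groups, holding period, profit target) but not the actual schema.

```python
# Illustrative experiment config and loader; all field names are assumptions.
import yaml  # PyYAML

EXAMPLE_CONFIG = """
experiment: btc_momentum_v1        # hypothetical experiment name
model_type: binary_classifier
symbol: BTC-USD
timeframe: 5m
feature_groups: [momentum, volatility, volume]
holding_period_bars: 12
profit_target_pct: 0.75
"""

REQUIRED = {"model_type", "symbol", "timeframe",
            "feature_groups", "holding_period_bars", "profit_target_pct"}

def load_experiment(text: str) -> dict:
    """Parse an experiment config and fail fast on missing fields."""
    cfg = yaml.safe_load(text)
    missing = REQUIRED - cfg.keys()
    if missing:
        raise ValueError(f"config missing fields: {sorted(missing)}")
    return cfg

if __name__ == "__main__":
    print(load_experiment(EXAMPLE_CONFIG))
```

Failing fast on a malformed config is what keeps the single-command-per-hypothesis workflow honest: a typo surfaces before any training compute is spent.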
Scientific method enforced at the architecture level
A three-tier model gate runs after every training run: sanity checks (files exist, metrics populated, minimum data coverage), performance checks (positive return, Sharpe above 0.5, win rate within a plausible band, beats buy-and-hold), and statistical validation. Data splits are always chronological, never shuffled. Forward returns are calculated after the split, so labels near the end of the training window can never incorporate price movement from the test period. Only gate-passing models get committed to the models directory.
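A hedged sketch of how such a gate might be structured. The Sharpe and return thresholds come from the description above; the win rate band, coverage minimum, and significance cut are illustrative stand-ins.

```python
from dataclasses import dataclass

@dataclass
class TrainingRun:
    artifacts_exist: bool
    rows_used: int
    total_return: float
    sharpe: float
    win_rate: float
    buy_and_hold_return: float
    p_value: float  # from whatever test the statistical tier runs

def passes_gate(run: TrainingRun) -> bool:
    # Tier 1: sanity checks: artifacts present, minimum coverage (threshold assumed)
    if not run.artifacts_exist or run.rows_used < 10_000:
        return False
    # Tier 2: performance: positive return, Sharpe > 0.5, beats buy-and-hold
    if run.total_return <= 0 or run.sharpe < 0.5:
        return False
    if not 0.40 <= run.win_rate <= 0.70:  # "in range": bounds assumed
        return False
    if run.total_return <= run.buy_and_hold_return:
        return False
    # Tier 3: statistical validation (significance threshold assumed)
    return run.p_value < 0.05
```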
Feature pipeline as the single source of truth
Early in the project, the training and backtesting code paths calculated rolling features over different historical windows, producing different feature distributions and making results unreliable. A centralised FeaturePipeline class now enforces identical feature calculation across training, backtesting, and live inference. Every model saves its feature metadata at training time and loads it at inference. This was the unlock that made experiment results trustworthy.
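A minimal sketch of the single-source-of-truth idea: one class owns feature computation, persists its exact spec at training time, and rebuilds itself from that spec at inference. The two example features, method names, and the JSON sidecar format are assumptions; only the class name FeaturePipeline comes from the project.

```python
import json
import pandas as pd

class FeaturePipeline:
    """Computes features identically for training, backtesting, and live
    inference, so distributions can't drift between code paths."""

    def __init__(self, windows: dict[str, int]):
        self.windows = windows  # rolling window per feature, e.g. {"ret_mean": 20}

    def transform(self, ohlcv: pd.DataFrame) -> pd.DataFrame:
        feats = pd.DataFrame(index=ohlcv.index)
        returns = ohlcv["close"].pct_change()
        feats["ret_mean"] = returns.rolling(self.windows["ret_mean"]).mean()
        feats["vol_std"] = returns.rolling(self.windows["vol_std"]).std()
        return feats.dropna()

    def save_metadata(self, path: str) -> None:
        """Persist the feature spec alongside the trained model."""
        with open(path, "w") as f:
            json.dump(self.windows, f)

    @classmethod
    def from_metadata(cls, path: str) -> "FeaturePipeline":
        """Rebuild the exact training-time pipeline for inference."""
        with open(path) as f:
            return cls(json.load(f))
```

The point is that inference never re-declares windows or feature lists by hand; it can only load what training saved.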
Multi-source data pipeline with quality validation
Market data comes from Polygon (equities, live and historical), CoinGecko (crypto, 5-minute live), GeckoTerminal (crypto, 1-minute live and historical backfill), and Binance (crypto gap fill and execution), each with different schemas, update frequencies, and reliability. Binance.US USD pairs turned out to have a structural volume coverage problem, reporting only 7.6% of expected volume because crypto trading is USDT-dominated; that led to separate feature sets: 12 features for crypto and 14 for equities. Collectors run continuously and gap fillers backfill missing windows. Data quality checks run before training.
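A hedged sketch of the gap-detection half of that loop: given the expected bar interval, find the missing windows so a filler can backfill them from a secondary source. The function and its signature are illustrative, not the project's actual API.

```python
import pandas as pd

def find_gaps(timestamps: pd.Series, interval: pd.Timedelta) -> list[tuple]:
    """Return (start, end) spans where consecutive bars sit further apart
    than the expected interval."""
    ts = timestamps.sort_values().reset_index(drop=True)
    gaps = []
    for i, delta in enumerate(ts.diff()):
        if pd.notna(delta) and delta > interval:
            gaps.append((ts[i - 1] + interval, ts[i]))
    return gaps

# Example: 1-minute bars with three candles missing
bars = pd.Series(pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:01",
    "2024-01-01 00:05", "2024-01-01 00:06",
]))
print(find_gaps(bars, pd.Timedelta(minutes=1)))
# [(Timestamp('2024-01-01 00:02:00'), Timestamp('2024-01-01 00:05:00'))]
```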