Reinforcement learning for futures trading — combining multi-timeframe market structure analysis, LLM-powered regime classification, and MaskablePPO for discrete order generation on Nikkei 225 and Nasdaq 100 futures.
A fully offline, broker-decoupled training simulation for futures trading. Processes historical market data through a feature pipeline, enriches it with LLM assessments, and trains an RL agent to generate trading decisions.
Inverse Fair Value Gaps, multi-timeframe trend alignment, and liquidity pools computed with pure Polars on 5-minute OHLCV candles.
Claude assesses market regime, setup quality, and risk-reward — encoded as 11 normalized observation features for the RL agent.
Gymnasium environment with action masking trains a PPO agent to select entry, size, stop, and target for each 5-minute bar.
Sharpe, max drawdown, win rate, profit factor across episodes. Promotion criteria gate deployment quality.
CandleValidator, gap detection, quality audits with per-day completeness, spike detection, and OHLC relationship checks.
Fill simulation with 1-tick slippage, MAE/MFE tracking, breakeven trailing stops, and commission modeling (JPY & USD contracts).
An IFVG-centric trading strategy: detect structural gaps, wait for price inversion, confirm with multi-timeframe trend alignment, and size trades based on room-to-right.
Scan 3-candle patterns for gaps ≥ 4 ticks. When price trades through, it becomes an Inverse FVG — a support/resistance zone. NIY tick size: 5.0 JPY, NQ: 0.25 USD.
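The 3-candle scan can be sketched in a few lines (a minimal sketch, assuming a bullish gap means candle 3's low clears candle 1's high; the `Candle` type and function name are hypothetical, not the project's actual API):

```python
from dataclasses import dataclass

@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float

def find_bullish_fvgs(candles: list[Candle], tick_size: float,
                      min_gap_ticks: float = 4.0) -> list[tuple[int, float, float]]:
    """Scan rolling 3-candle windows for bullish Fair Value Gaps.

    A bullish FVG exists when candle 3's low sits above candle 1's high,
    leaving an unfilled zone of at least min_gap_ticks ticks.
    """
    gaps = []
    for i in range(len(candles) - 2):
        c1, c3 = candles[i], candles[i + 2]
        gap_ticks = (c3.low - c1.high) / tick_size
        if gap_ticks >= min_gap_ticks:
            gaps.append((i, c1.high, c3.low))  # (start index, zone bottom, zone top)
    return gaps
```

The zone only becomes an Inverse FVG later, once price trades back through it; that lifecycle is tracked separately.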
features/ifvg.py

Multi-timeframe structure: daily (weight 0.4), 4-hour (0.35), 15-minute (0.25). Composite trend score from −1.0 to +1.0 filters for directional conviction.
features/trend.py

Cluster swing points into liquidity pools, count obstacles toward target, check daily range exhaustion. Scored 0–100 for long and short.
features/room_to_right.py

Claude classifies regime (trending, choppy, event-driven, low-liquidity), assesses quality, and provides confidence + narrative. Encoded as 11 features.
llm/

MaskablePPO observes 39 features and selects from 81 action combinations: entry direction, position size, stop distance, and target type.
rl/env.py

Nine packages with clean dependency boundaries. Common provides the foundation, data handles ingestion, features compute market structure, sim manages offline simulation, llm integrates Claude, rl contains the environment, execution targets IB, and monitoring handles alerts.
Three pluggable data sources feed into a unified backfill pipeline that validates, normalizes, and stores 5-minute OHLCV candles in TimescaleDB.
1-minute CSVs aggregated to 5m via Polars group_by_dynamic. Symbol mapping:
NQ→NSXUSD, NIY→JPXJPY. Semicolon or comma delimited.
Pre-formatted 5m CSVs with standard 8-column schema. Auto-assigns contract months if missing. Supports Databento exports and custom sources.
data/sources.py

Real-time and historical via IB Gateway on port 4002. Read-only mode by default. Client ID isolation for concurrent connections.
data/ib_client.py

The BackfillService orchestrates historical data loading with resume support,
batch processing, validation, and progress logging.
Query MAX(timestamp) from candles_5m for the instrument. Skip past already-stored dates.
Configurable batch size (default 30 days). Each batch fetched from the CandleSource protocol.
OHLCV validation rejects null/negative/inconsistent rows. Valid candles upserted to TimescaleDB. Commit per batch for crash resilience.
@runtime_checkable
class CandleSource(Protocol):
def fetch(self, instrument: str, start: date, end: date) -> pl.DataFrame:
"""Return 5m OHLCV DataFrame with standard 8-column schema.
Columns: timestamp, instrument, open, high, low, close,
volume, contract_month"""
...
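The batch-windowing logic can be sketched in isolation (function name hypothetical; the real service first queries MAX(timestamp) to pick the resume start, then walks windows like these):

```python
from datetime import date, timedelta
from typing import Iterator

def batch_ranges(start: date, end: date,
                 batch_days: int = 30) -> Iterator[tuple[date, date]]:
    """Yield inclusive (batch_start, batch_end) windows covering [start, end].

    Each window is fetched from a CandleSource, validated, and upserted;
    committing once per batch means a crash loses at most one window.
    """
    cur = start
    while cur <= end:
        batch_end = min(cur + timedelta(days=batch_days - 1), end)
        yield cur, batch_end
        cur = batch_end + timedelta(days=1)
```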
Multi-layer validation ensures data integrity from ingestion through training. The CandleValidator runs four checks; the GapDetector finds missing candles; the QualityAudit produces per-day completeness reports.
| Check | Condition |
|---|---|
| ohlc_invalid | L ≤ O,C ≤ H relationship, null/missing values |
| spike | Close-to-close change > 5% threshold |
| zero_volume | Volume = 0 during active session hours |
| gap | Missing expected candles in session window |
Severity levels: WARNING, ERROR, CRITICAL
The QualityAudit scans date ranges producing DailyQualityMetrics:
- completeness_pct — actual vs expected candle count
- spike_count — anomalous price jumps
- gap_count — missing candles per day
- ohlc_issues — relationship violations
- zero_volume_count — dead periods

Fast Polars-based row filtering. Returns (valid_df, rejected_df). Checks:
# No NaN/null in OHLC, no negative prices
# low <= high, low <= open, low <= close
# high >= open, high >= close
# Volume: not null, not NaN, not negative
Quarterly contract calendar with automatic month assignment and two back-adjustment methods for creating seamless continuous price series across contract rolls.
| Code | Months |
|---|---|
| H (Mar) | Jan – Mar |
| M (Jun) | Apr – Jun |
| U (Sep) | Jul – Sep |
| Z (Dec) | Oct – Dec |
Example: 2024-02-15 → 2024H, 2024-07-01 → 2024U
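The calendar mapping reduces to a small function (a sketch that assigns purely by calendar quarter, ignoring any roll-date offset):

```python
from datetime import date

# Quarterly futures month codes: H=Mar, M=Jun, U=Sep, Z=Dec
_CODES = {3: "H", 6: "M", 9: "U", 12: "Z"}

def contract_month(d: date) -> str:
    """Assign a trade date to its quarterly contract, e.g. 2024-02-15 -> 2024H."""
    quarter_end = ((d.month - 1) // 3 + 1) * 3  # 3, 6, 9, or 12
    return f"{d.year}{_CODES[quarter_end]}"
```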
Ratio (default) — Multiply prior prices by
new_close / old_close at each roll. Preserves % returns, keeps prices positive.
Difference (Panama Canal) — Subtract the price gap at each roll from all earlier prices. Preserves absolute differences.
# Detect roll points where contract_month changes
rolls = detect_rolls(df_sorted) # -> list[RollPoint]
# Each RollPoint contains:
# timestamp, old_contract, new_contract,
# old_close, new_close, ratio_factor, diff_factor
# Apply cumulative adjustment backward from newest contract
result = back_adjust(df, method=AdjustmentMethod.RATIO)
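A pure-Python sketch of the ratio method (list-based for clarity; the real implementation operates on Polars DataFrames and RollPoint records):

```python
def back_adjust_ratio(prices: list[float],
                      rolls: list[tuple[int, float, float]]) -> list[float]:
    """Scale all prices before each roll by new_close / old_close.

    rolls: (index of the first bar on the new contract, old_close, new_close).
    Working newest-roll-first makes the factors accumulate backward, so the
    oldest contract ends up scaled by the product of every roll factor.
    """
    adjusted = list(prices)
    for idx, old_close, new_close in sorted(rolls, reverse=True):
        factor = new_close / old_close
        for i in range(idx):
            adjusted[i] *= factor
    return adjusted
```

Because every pre-roll price is multiplied by the same factor, percentage returns across the roll are preserved and prices stay positive, which is why ratio is the default.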
Six stages compute market structure from raw candles. Each is a pure function on Polars DataFrames — deterministic and side-effect free.
5m candles → 15m, 4h, daily frames via group_by_dynamic
EMA-20/50, slopes, ATR-14, swing points, structure classification, displacement detection. Composite score: D1×0.4 + H4×0.35 + M15×0.25
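The composite score is a straight weighted sum (a sketch; per-timeframe scores are assumed already normalized to [-1, 1]):

```python
def composite_trend_score(d1: float, h4: float, m15: float) -> float:
    """Blend per-timeframe trend scores: daily dominates, M15 refines."""
    return d1 * 0.4 + h4 * 0.35 + m15 * 0.25
```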
features/trend.py

3-candle gap detection → inversion tracking → lifecycle states (active/tested/mitigated/expired) → quality scoring
features/ifvg.py

Trading window boundaries, minutes since open, overnight range, prior session high/low/close
features/session.py

Liquidity pool clustering, obstacle counting, exhaustion measurement
features/room_to_right.py

Boolean filter: IFVGs ≥ 2, |trend_score| ≥ 0.3, max RTR ≥ 30, in trading window
features/pre_screen.py

| Quality | Gap Ticks | Body Ratio |
|---|---|---|
| High | ≥ 8 | AND body > 70% of range |
| Medium | ≥ 6 | OR body > 60% |
| Low | Everything else | |
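Read as code, the table becomes (a sketch of the tiering rules; the function name is hypothetical):

```python
def classify_ifvg_quality(gap_ticks: float, body_ratio: float) -> str:
    """Tier an IFVG: High needs both conditions, Medium needs either."""
    if gap_ticks >= 8 and body_ratio > 0.70:
        return "high"
    if gap_ticks >= 6 or body_ratio > 0.60:
        return "medium"
    return "low"
```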
Data flows through five transformations: enrichment, assessment, episode construction, time-based splitting, and RL training. Each stage uses Polars DataFrames with Parquet I/O.
Enriched candles grouped by session_date. Each session becomes one Episode
with window boundaries from in_trading_window indices.
Minimum 6 in-window candles required.
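Episode construction can be sketched like this (the session/flag representation is simplified; the real code slices enriched DataFrames by session_date):

```python
def build_episodes(sessions: dict[str, list[bool]],
                   min_candles: int = 6) -> list[tuple[str, int, int]]:
    """One episode per session_date: (date, first_in_window, last_in_window).

    Sessions with fewer than min_candles in-window bars are dropped.
    """
    episodes = []
    for day, in_window in sessions.items():
        idxs = [i for i, flag in enumerate(in_window) if flag]
        if len(idxs) >= min_candles:
            episodes.append((day, idxs[0], idxs[-1]))
    return episodes
```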
Three-way split by date. No data leakage — strictly chronological:
Anthropic Claude analyzes enriched candle windows to produce structured assessments with setup classification, confidence scoring, and market regime analysis. Assessments are cached to Parquet for reproducible training.
Context builder sends last 50 bars as compact CSV plus trend summary, IFVG context, session timing, and room-to-right metrics. Target: under 4,000 tokens.
System prompt defines IFVG criteria, 5 setup types, 5 trend alignments, 4 market regimes, and room-to-right scoring guidelines.
llm/prompt_v1.txt llm/context_builder.py

Pydantic model with 11 fields returned via tool use:

- setup_type — 5 enum values
- confidence — 0.0–1.0
- ifvg_quality — high/medium/low
- trend_alignment — per-timeframe dict
- regime — 4 enum values
- risk_reward_estimate, room_to_right_estimate
- narrative, concerns

The encode_assessment() function normalizes the structured assessment into 11 float features for the observation vector:
| Feature | Encoding | Range |
|---|---|---|
| llm_confidence | Direct passthrough | [0, 1] |
| llm_setup_type | Enum index / 4.0 | [0, 1] |
| llm_ifvg_quality | high=1.0, medium=0.66, low=0.33 | [0.33, 1] |
| llm_rr_estimate | min(rr / 5.0, 1.0) | [0, 1] |
| llm_regime | Enum index / 3.0 | [0, 1] |
| llm_trend_* | Bullish=1, Turning=±0.5, Neutral=0, Bearish=-1 | [-1, 1] |
| llm_rtr_estimate | value / 100.0 | [0, 1] |
| llm_concern_count | min(count / 5.0, 1.0) | [0, 1] |
Setup Types: bullish_reversal, bearish_reversal,
bullish_continuation, bearish_continuation, no_setup
Market Regimes: trending_day (strong directional),
choppy (overlapping candles), event_driven (unusual volatility),
low_liquidity (thin order book)
Trend Alignments: bullish, bearish,
turning_bullish, turning_bearish, neutral
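Putting the table and enums together, the encoder might look like this (a sketch: the dict-based input and the timeframe keys are assumptions, since the real input is a Pydantic model):

```python
SETUP_TYPES = ["bullish_reversal", "bearish_reversal",
               "bullish_continuation", "bearish_continuation", "no_setup"]
REGIMES = ["trending_day", "choppy", "event_driven", "low_liquidity"]
QUALITY = {"high": 1.0, "medium": 0.66, "low": 0.33}
TREND = {"bullish": 1.0, "turning_bullish": 0.5, "neutral": 0.0,
         "turning_bearish": -0.5, "bearish": -1.0}
TIMEFRAMES = ("d1", "h4", "m15", "m5")  # assumed keys for the 4 trend features

def encode_assessment(a: dict) -> list[float]:
    """Normalize an LLM assessment into 11 floats for the observation vector."""
    return [
        a["confidence"],                            # [0, 1] passthrough
        SETUP_TYPES.index(a["setup_type"]) / 4.0,   # enum index / 4
        QUALITY[a["ifvg_quality"]],
        min(a["risk_reward_estimate"] / 5.0, 1.0),  # capped at 5R
        REGIMES.index(a["regime"]) / 3.0,
        *(TREND[a["trend_alignment"][tf]] for tf in TIMEFRAMES),
        a["room_to_right_estimate"] / 100.0,
        min(len(a["concerns"]) / 5.0, 1.0),
    ]
```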
The agent observes 39 normalized features in 7 groups and selects from a MultiDiscrete action space with 4 dimensions. Action masking prevents invalid combinations.
- LLM (11): llm_confidence, llm_setup_type, llm_ifvg_quality, llm_rr_estimate, llm_regime, llm_trend_* ×4, llm_rtr_estimate, llm_concern_count
- IFVG (5): ifvg_count_active, ifvg_nearest_dist, ifvg_best_quality, ifvg_avg_fill_pct, ifvg_direction_bias
- Trend (7): trend_score, ema20_slope_* ×3, structure_* ×3
- Session (5): minutes_since_open, window_progress_pct, overnight_range, prior_session_range, in_trading_window
- Room-to-right (4): rtr_score_long, rtr_score_short, exhaustion_pct, exhaustion_flag
- Volatility (3): vol_ratio, bar_range_norm, bar_body_ratio
- Position (4): in_position, unrealized_pnl_r, daily_pnl_r, trades_today

Boolean mask of shape (12,) — flattened across 4 sub-actions. Entry blocked when:
| Feature | Normalization | Output Range |
|---|---|---|
| ifvg_count_active | val / 10.0 | [0, 1] |
| ifvg_nearest_dist | val / ATR | [0, 1] |
| ifvg_best_quality | val / 3.0 | [0, 1] |
| rtr_score_* | val / 100.0 | [0, 1] |
| minutes_since_open | val / 480.0 (8h max) | [0, 1] |
| overnight_range | val / ATR | [0, 1] |
| unrealized_pnl_r | unrealized / risk / 5.0 | [-1, 1] |
| daily_pnl_r | val / 5.0 | [-1, 1] |
| trades_today | val / 10.0 | [0, 1] |
| ema20_slope_* | rising=1, flat=0, falling=-1 | [-1, 1] |
| structure_* | uptrend=1, ranging=0, downtrend=-1 | [-1, 1] |
The position module tracks the full lifecycle of each trade: fill simulation with slippage, MAE/MFE tracking, breakeven trailing stops, and commission modeling.
Orders filled within candle range with 1-tick slippage.
Long: min(price + tick, high).
Short: max(price - tick, low).
Returns None if range doesn't reach order.
MAE (Max Adverse Excursion): worst drawdown in ticks. MFE (Max Favorable Excursion): best unrealized profit. Updated every bar for trade analysis.
Per-contract commission based on currency:
JPY: 80.0 × size
USD: 1.25 × size
Deducted as commission ticks from realized P&L.
Each bar, the position is updated with conservative order checking: stop first, then target.
Long: candle_low ≤ stop_price. Short: candle_high ≥ stop_price. Checked first (conservative).
Long: candle_high ≥ target. Short: candle_low ≤ target. Returns CompletedTrade if hit.
Track adverse/favorable excursion. When mfe_ticks ≥ risk_ticks (1R profit), move stop to breakeven.
Exit reasons: stop, target, session_end (force-close at window end).
Three reward components shape the agent toward profitable, disciplined trading.
Exact formulas from rl/reward.py:
Core signal: risk-reward ratio with target bonus.
Overtrading + accelerating loss penalty.
Prevents over-entering low-quality setups.
def compute_trade_reward(trade: CompletedTrade) -> float:
reward = trade.realized_rr * 1.0
if trade.hit_target:
reward += 0.3
return reward
def compute_step_penalty(trades_today, daily_pnl_r,
max_trades=5, drawdown_threshold=-2.0):
penalty = 0.0
if trades_today > max_trades:
penalty -= 0.05
if daily_pnl_r < drawdown_threshold:
penalty -= 0.8 * abs(daily_pnl_r - drawdown_threshold)
return penalty
def compute_patience_bonus(action_is_skip, in_position):
if action_is_skip and not in_position:
return 0.01
return 0.0
sb3-contrib MaskablePPO trains on session episodes with periodic checkpointing, evaluation callbacks, and TensorBoard logging.
total_timesteps 1,000,000
learning_rate 3e-4
n_steps 2,048
batch_size 256
gamma 0.99
clip_range 0.2
ent_coef 0.01
policy_net_arch [64, 64]
max_daily_loss_r 3.0 R
max_trades_session 5
checkpoint_freq 50,000 steps
eval_freq 50,000 steps
trailing_stop breakeven @ 1R
Two-layer MLP [64, 64] with shared feature extractor for actor and critic.
Input: 39-dim observation vector. Output: MultiDiscrete([3,3,3,3]) action logits + value estimate.
Trained models are evaluated on held-out episodes against baseline agents. Promotion criteria gate deployment quality.
Additional metrics: profit factor, avg RR, total R, trades per session. Sharpe annualized by √252.
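The annualization step can be sketched as (a sketch; assumes one episode per trading day and sample standard deviation):

```python
import math
from statistics import mean, stdev

def annualized_sharpe(daily_returns: list[float]) -> float:
    """Mean over std of per-day returns, scaled by sqrt(252) trading days."""
    sd = stdev(daily_returns)
    if sd == 0:
        return 0.0
    return mean(daily_returns) / sd * math.sqrt(252)
```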
Samples random valid actions respecting action masks. Uses per-sub-action sampling from valid options. Baseline for "can the agent beat random?"
rl/baselines.py

Always enters long with size=1, stop=medium, target=nearest (2R). Respects masks — skips when blocked. Tests "is selective entry better than always-in?"
rl/baselines.py

def meets_promotion_criteria(metrics: EvalMetrics) -> bool:
return (
metrics.sharpe_ratio >= 1.0
and metrics.max_drawdown_r <= 5.0
and metrics.win_rate >= 0.40
)
Trained models are saved with full metadata for reproducibility. Each training run creates a timestamped directory with the model, config, and evaluation results.
models/
20260222_101530/
model.zip # MaskablePPO weights
metadata.json # Training config, instrument, timestamps
eval_metrics.json # EvalMetrics from validation episodes
config_snapshot.yaml # Full GothamSettings at training time
Pydantic-settings with YAML layering. Four priority levels from init kwargs (highest) to default.yaml (lowest). 9 config sections.
Direct constructor arguments for testing and programmatic overrides.
GOTHAM_ prefix with __ nesting. Example: GOTHAM_TRAINING__LEARNING_RATE=1e-4
config/{GOTHAM_ENV}.yaml (default: dev)
config/default.yaml — singleton via get_settings()
database:
host: localhost
port: 5432
name: gotham
user: gotham
password: changeme
URL-encoded credentials. Sync + async URLs (asyncpg).
ib:
host: 127.0.0.1
port: 4002
client_id: 1
timeout: 30
readonly: true
instruments:
nikkei:
symbol: NIY
exchange: CME
currency: JPY
tick_size: 5.0
point_value: 500.0
session: tokyo
nasdaq:
symbol: NQ
exchange: CME
currency: USD
tick_size: 0.25
point_value: 20.0
session: us
features:
ifvg_min_gap_ticks: 4.0
ifvg_max_age_bars: 100
ema_fast: 20
ema_slow: 50
atr_period: 14
displacement_body_pct: 0.70
displacement_atr_mult: 1.5
rtr_lookback_days: 20
pre_screen_min_ifvgs: 2
pre_screen_min_trend: 0.3
pre_screen_min_rtr: 30.0
training:
total_timesteps: 1000000
learning_rate: 0.0003
n_steps: 2048
batch_size: 256
gamma: 0.99
clip_range: 0.2
ent_coef: 0.01
checkpoint_freq: 50000
eval_freq: 50000
max_daily_loss_r: 3.0
max_trades_per_session: 5
policy_net_arch: [64, 64]
llm:
model: claude-sonnet-4-5-20250929
max_tokens: 4096
temperature: 0.3
backfill:
data_dir: data/raw
source: histdata
batch_days: 30
sim:
data_dir: data
enriched_dir: data/enriched
assessments_dir: data/assessments
model_dir: models
logging:
level: INFO
format: json
rotation: "50MB"
log_dir: logs
Docker services, Makefile targets, CI/CD pipeline, and CLI commands for the full development lifecycle.
| Service | Details |
|---|---|
| TimescaleDB | pg18 on port 5432 |
| IB Gateway | Port 4002, paper trading |
make docker-up # start services
make docker-down # stop services
make install # uv sync --all-extras
make lint # ruff check + mypy
make format # ruff format + fix
make test # pytest -m unit
make test-all # pytest (all markers)
make test-cov # coverage report
# Train a model
uv run python -m gotham.rl train \
--enriched-path data/enriched/nq.parquet \
--instrument NQ --timesteps 1000000
# Evaluate a model
uv run python -m gotham.rl evaluate \
--model-path models/20260222_101530 \
--instrument NQ --n-episodes 50
# Backfill data
uv run python -m gotham.data backfill \
--instrument NQ --start 2023-01-01
# Quality audit
uv run python -m gotham.data quality-audit \
--instrument NQ --days 30
ruff — line-length 100, rules E,F,W,I,UP,B,SIM,RUF, target py311.
mypy — disallow_untyped_defs = true,
warn_return_any = true. All function signatures require type annotations.
pytest — markers: @unit, @integration,
@slow. conftest auto-clears settings cache.
uv — Fast Python package manager.
requires-python = ">=3.11".