
Gotham RL

Reinforcement learning for futures trading — combining multi-timeframe market structure analysis, LLM-powered regime classification, and MaskablePPO for discrete order generation on Nikkei 225 and Nasdaq 100 futures.

39 observation features · 9 source packages · 81 action combinations

What Gotham RL Does

A fully offline, broker-decoupled training simulation for futures trading. Processes historical market data through a feature pipeline, enriches it with LLM assessments, and trains an RL agent to generate trading decisions.

Market Structure Analysis

Inverse Fair Value Gaps, multi-timeframe trend alignment, and liquidity pools computed with pure Polars on 5-minute OHLCV candles.

LLM Regime Classification

Claude assesses market regime, setup quality, and risk-reward — encoded as 11 normalized observation features for the RL agent.

MaskablePPO Training

Gymnasium environment with action masking trains a PPO agent to select entry, size, stop, and target for each 5-minute bar.

Rigorous Evaluation

Sharpe, max drawdown, win rate, profit factor across episodes. Promotion criteria gate deployment quality.

Data Quality Framework

CandleValidator, gap detection, quality audits with per-day completeness, spike detection, and OHLC relationship checks.

Position Simulation

Fill simulation with 1-tick slippage, MAE/MFE tracking, breakeven trailing stops, and commission modeling (JPY & USD contracts).

The Mental Model

An IFVG-centric trading strategy: detect structural gaps, wait for price inversion, confirm with multi-timeframe trend alignment, and size trades based on room-to-right.

Detect Fair Value Gaps

Scan 3-candle patterns for gaps ≥ 4 ticks. When price trades through, it becomes an Inverse FVG — a support/resistance zone. NIY tick size: 5.0 JPY, NQ: 0.25 USD.

features/ifvg.py
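A minimal standalone sketch of the 3-candle scan described above. The function name, the dict-based candle format, and the return shape are illustrative, not the actual features/ifvg.py API (which operates on Polars DataFrames):

```python
NQ_TICK = 0.25  # tick sizes from the doc: NQ 0.25 USD, NIY 5.0 JPY


def detect_bullish_fvg(candles, min_gap_ticks=4.0, tick=NQ_TICK):
    """A bullish FVG forms when candle 3's low clears candle 1's high
    by at least min_gap_ticks; once price later trades back through
    the zone, it becomes an Inverse FVG."""
    gaps = []
    for i in range(len(candles) - 2):
        c1, c3 = candles[i], candles[i + 2]
        gap = c3["low"] - c1["high"]
        if gap >= min_gap_ticks * tick:
            gaps.append({"start": i, "lower": c1["high"], "upper": c3["low"]})
    return gaps
```

The bearish case mirrors this with `c1["low"] - c3["high"]`.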

Confirm Trend Alignment

Multi-timeframe structure: daily (weight 0.4), 4-hour (0.35), 15-minute (0.25). Composite trend score from −1.0 to +1.0 filters for directional conviction.

features/trend.py
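The weighting scheme can be sketched directly; the weights are from the doc, while per-timeframe scoring (EMA structure, slopes) is elided:

```python
# Timeframe weights from the doc: daily 0.40, 4-hour 0.35, 15-minute 0.25
TF_WEIGHTS = {"daily": 0.40, "h4": 0.35, "m15": 0.25}


def composite_trend_score(tf_scores: dict[str, float]) -> float:
    """Weighted sum of per-timeframe scores, each in [-1, 1];
    the composite therefore also lands in [-1, 1]."""
    return sum(TF_WEIGHTS[tf] * s for tf, s in tf_scores.items())
```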

Assess Room-to-Right

Cluster swing points into liquidity pools, count obstacles toward target, check daily range exhaustion. Scored 0–100 for long and short.

features/room_to_right.py

LLM Regime Check

Claude classifies regime (trending, choppy, event-driven, low-liquidity), assesses quality, and provides confidence + narrative. Encoded as 11 features.

llm/

RL Agent Decides

MaskablePPO observes 39 features and selects from 81 action combinations: entry direction, position size, stop distance, and target type.

rl/env.py

System Architecture

Nine packages with clean dependency boundaries. Common provides the foundation, data handles ingestion, features compute market structure, sim manages offline simulation, llm integrates Claude, rl contains the environment, execution targets IB, and monitoring handles alerts.

Data Sources & Backfill

Three pluggable data sources feed into a unified backfill pipeline that validates, normalizes, and stores 5-minute OHLCV candles in TimescaleDB.

HistData

1-minute CSVs aggregated to 5m via Polars group_by_dynamic. Symbol mapping: NQ→NSXUSD, NIY→JPXJPY. Semicolon or comma delimited.

data/sources.py

CSV Import

Pre-formatted 5m CSVs with standard 8-column schema. Auto-assigns contract months if missing. Supports Databento exports and custom sources.

data/sources.py

Interactive Brokers

Real-time and historical via IB Gateway on port 4002. Read-only mode by default. Client ID isolation for concurrent connections.

data/ib_client.py

Backfill Service

The BackfillService orchestrates historical data loading with resume support, batch processing, validation, and progress logging.

Check Resume Point

Query MAX(timestamp) from candles_5m for the instrument. Skip past already-stored dates.

Fetch in Batches

Configurable batch size (default 30 days). Each batch fetched from the CandleSource protocol.

Validate & Upsert

OHLCV validation rejects null/negative/inconsistent rows. Valid candles upserted to TimescaleDB. Commit per batch for crash resilience.

CandleSource Protocol
from datetime import date
from typing import Protocol, runtime_checkable

import polars as pl

@runtime_checkable
class CandleSource(Protocol):
    def fetch(self, instrument: str, start: date, end: date) -> pl.DataFrame:
        """Return 5m OHLCV DataFrame with standard 8-column schema.
        Columns: timestamp, instrument, open, high, low, close,
                 volume, contract_month"""
        ...
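The batch walk can be sketched as follows; `backfill_batches` and its signature are illustrative, and the real BackfillService adds validation, upsert, and resume via MAX(timestamp):

```python
from datetime import date, timedelta


def backfill_batches(source, instrument: str, start: date, end: date,
                     batch_days: int = 30):
    """Fetch [start, end] in fixed-size windows from a CandleSource."""
    cursor = start
    frames = []
    while cursor <= end:
        batch_end = min(cursor + timedelta(days=batch_days - 1), end)
        frames.append(source.fetch(instrument, cursor, batch_end))
        # real service: validate, upsert to candles_5m, commit per batch
        cursor = batch_end + timedelta(days=1)
    return frames
```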

Validation & Quality Audit

Multi-layer validation ensures data integrity from ingestion through training. The CandleValidator runs four checks; the GapDetector finds missing candles; the QualityAudit produces per-day completeness reports.

CandleValidator Checks

  • ohlc_invalid — L ≤ O,C ≤ H relationship, null/missing values
  • spike — close-to-close change > 5% threshold
  • zero_volume — volume = 0 during active session hours
  • gap — missing expected candles in session window

Severity levels: WARNING, ERROR, CRITICAL

Quality Audit Report

The QualityAudit scans date ranges producing DailyQualityMetrics:

  • completeness_pct — actual vs expected candle count
  • spike_count — anomalous price jumps
  • gap_count — missing candles per day
  • ohlc_issues — relationship violations
  • zero_volume_count — dead periods
Ingestion-level validation (validate_candles)

Fast Polars-based row filtering. Returns (valid_df, rejected_df). Checks:

# No NaN/null in OHLC, no negative prices
# low <= high, low <= open, low <= close
# high >= open, high >= close
# Volume: not null, not NaN, not negative
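A row-level version of those checks, for illustration only (the real validate_candles expresses them as vectorized Polars filter expressions and returns DataFrames, not booleans):

```python
import math


def candle_is_valid(c: dict) -> bool:
    """Mirror the check list above for a single candle dict."""
    o, h, l, cl = c["open"], c["high"], c["low"], c["close"]
    v = c["volume"]
    # No NaN/null/negative prices
    if any(p is None or math.isnan(p) or p < 0 for p in (o, h, l, cl)):
        return False
    # Volume: not null, not NaN, not negative
    if v is None or math.isnan(v) or v < 0:
        return False
    # OHLC relationship: low is the floor, high is the ceiling
    return l <= h and l <= o and l <= cl and h >= o and h >= cl
```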

Futures Contracts & Back-Adjustment

Quarterly contract calendar with automatic month assignment and two back-adjustment methods for creating seamless continuous price series across contract rolls.

Quarterly Calendar

  • H (Mar) — Jan – Mar
  • M (Jun) — Apr – Jun
  • U (Sep) — Jul – Sep
  • Z (Dec) — Oct – Dec

Example: 2024-02-15 → 2024H, 2024-07-01 → 2024U
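The month-to-contract assignment is a straight lookup; this sketch uses a hypothetical function name, not the project's actual API:

```python
from datetime import date

# Quarter code per calendar month: Jan–Mar → H, Apr–Jun → M,
# Jul–Sep → U, Oct–Dec → Z
MONTH_TO_CODE = ["H"] * 3 + ["M"] * 3 + ["U"] * 3 + ["Z"] * 3


def assign_contract(d: date) -> str:
    """Return the quarterly contract label for a trade date."""
    return f"{d.year}{MONTH_TO_CODE[d.month - 1]}"
```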

Back-Adjustment Methods

Ratio (default) — Multiply prior prices by new_close / old_close at each roll. Preserves % returns, keeps prices positive.

Difference (Panama Canal) — Subtract the price gap at each roll from all earlier prices. Preserves absolute differences.

Roll detection & adjustment code
# Detect roll points where contract_month changes
rolls = detect_rolls(df_sorted)  # -> list[RollPoint]

# Each RollPoint contains:
#   timestamp, old_contract, new_contract,
#   old_close, new_close, ratio_factor, diff_factor

# Apply cumulative adjustment backward from newest contract
result = back_adjust(df, method=AdjustmentMethod.RATIO)
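For a single roll, the ratio method reduces to one scale factor applied to everything before the roll point; this standalone sketch works on a plain price list rather than the DataFrame-based back_adjust:

```python
def ratio_back_adjust(prices: list[float], roll_idx: int,
                      old_close: float, new_close: float) -> list[float]:
    """Scale every price before the roll by new_close / old_close so
    percentage returns are preserved across the roll (single-roll case;
    multiple rolls compound the factors backward from the newest contract)."""
    factor = new_close / old_close
    return [p * factor if i < roll_idx else p for i, p in enumerate(prices)]
```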

Market Structure Features

Six stages compute market structure from raw candles. Each is a pure function on Polars DataFrames — deterministic and side-effect free.

Pipeline Execution Order

1. Multi-Timeframe Aggregation

5m candles → 15m, 4h, daily frames via group_by_dynamic

sim/aggregate.py

2. Trend Features

EMA-20/50, slopes, ATR-14, swing points, structure classification, displacement detection. Composite score: D1×0.4 + H4×0.35 + M15×0.25

features/trend.py

3. IFVG Features

3-candle gap detection → inversion tracking → lifecycle states (active/tested/mitigated/expired) → quality scoring

features/ifvg.py

4. Session Features

Trading window boundaries, minutes since open, overnight range, prior session high/low/close

features/session.py

5. Room-to-Right Features

Liquidity pool clustering, obstacle counting, exhaustion measurement

features/room_to_right.py

6. Pre-Screen Gate

Boolean filter: IFVGs ≥ 2, |trend_score| ≥ 0.3, max RTR ≥ 30, in trading window

features/pre_screen.py
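The gate's logic can be sketched as a single predicate; the field names here are illustrative stand-ins for the enriched-frame columns:

```python
def pre_screen(bar: dict) -> bool:
    """Boolean filter from the thresholds above: enough active IFVGs,
    directional conviction, room to run, and inside the trading window."""
    return (
        bar["ifvg_count_active"] >= 2
        and abs(bar["trend_score"]) >= 0.3
        and max(bar["rtr_score_long"], bar["rtr_score_short"]) >= 30
        and bar["in_trading_window"]
    )
```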
IFVG quality scoring criteria
  • High — gap ≥ 8 ticks AND body > 70% of range
  • Medium — gap ≥ 6 ticks OR body > 60%
  • Low — everything else
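Those criteria translate directly to a small classifier; the function name is illustrative:

```python
def ifvg_quality(gap_ticks: float, body_ratio: float) -> str:
    """Quality tier per the criteria above: High needs both conditions,
    Medium needs either, Low is the fallback."""
    if gap_ticks >= 8 and body_ratio > 0.70:
        return "high"
    if gap_ticks >= 6 or body_ratio > 0.60:
        return "medium"
    return "low"
```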

From Raw Candles to Trained Agent

Data flows through five transformations: enrichment, assessment, episode construction, time-based splitting, and RL training. Each stage uses Polars DataFrames with Parquet I/O.

Episode Construction

Enriched candles grouped by session_date. Each session becomes one Episode with window boundaries from in_trading_window indices. Minimum 6 in-window candles required.

sim/episode_slicer.py

Time-Based Splits

Three-way split by date. No data leakage — strictly chronological:

  • Train ≤ 2024-06-30
  • Validation 2024-07-01 – 2024-12-31
  • Test > 2024-12-31
sim/splits.py

Claude Market Assessment

Anthropic Claude analyzes enriched candle windows to produce structured assessments with setup classification, confidence scoring, and market regime analysis. Assessments are cached to Parquet for reproducible training.

Prompt Engineering

Context builder sends last 50 bars as compact CSV plus trend summary, IFVG context, session timing, and room-to-right metrics. Target: under 4,000 tokens.

System prompt defines IFVG criteria, 5 setup types, 5 trend alignments, 4 market regimes, and room-to-right scoring guidelines.

llm/prompt_v1.txt llm/context_builder.py

LLMAssessment Schema

Pydantic model with 11 fields returned via tool use:

  • setup_type — 5 enum values
  • confidence — 0.0–1.0
  • ifvg_quality — high/medium/low
  • trend_alignment — per-timeframe dict
  • regime — 4 enum values
  • risk_reward_estimate, room_to_right_estimate
  • narrative, concerns

Assessment Encoding

The encode_assessment() function normalizes the structured assessment into 11 float features for the observation vector:

  • llm_confidence — direct passthrough → [0, 1]
  • llm_setup_type — enum index / 4.0 → [0, 1]
  • llm_ifvg_quality — high=1.0, medium=0.66, low=0.33 → [0.33, 1]
  • llm_rr_estimate — min(rr / 5.0, 1.0) → [0, 1]
  • llm_regime — enum index / 3.0 → [0, 1]
  • llm_trend_* — bullish=1, turning=±0.5, neutral=0, bearish=-1 → [-1, 1]
  • llm_rtr_estimate — value / 100.0 → [0, 1]
  • llm_concern_count — min(count / 5.0, 1.0) → [0, 1]
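A sketch of a subset of that encoding (the real encode_assessment takes the Pydantic LLMAssessment model and returns all 11 features; the dict interface here is illustrative):

```python
def encode_assessment_subset(a: dict) -> dict[str, float]:
    """Normalize a handful of assessment fields into [0, 1] floats."""
    quality = {"high": 1.0, "medium": 0.66, "low": 0.33}
    return {
        "llm_confidence": a["confidence"],                          # passthrough
        "llm_ifvg_quality": quality[a["ifvg_quality"]],
        "llm_rr_estimate": min(a["risk_reward_estimate"] / 5.0, 1.0),  # capped
        "llm_rtr_estimate": a["room_to_right_estimate"] / 100.0,
        "llm_concern_count": min(len(a["concerns"]) / 5.0, 1.0),
    }
```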
Setup types & market regimes

Setup Types: bullish_reversal, bearish_reversal, bullish_continuation, bearish_continuation, no_setup

Market Regimes: trending_day (strong directional), choppy (overlapping candles), event_driven (unusual volatility), low_liquidity (thin order book)

Trend Alignments: bullish, bearish, turning_bullish, turning_bearish, neutral

Observation & Action Space

The agent observes 39 normalized features in 7 groups and selects from a MultiDiscrete action space with 4 dimensions. Action masking prevents invalid combinations.

Observation Vector  Box(-1, 1, (39,), float32)

LLM (11)
  • llm_confidence
  • llm_setup_type
  • llm_ifvg_quality
  • llm_rr_estimate
  • llm_regime
  • llm_trend_* ×4
  • llm_rtr_estimate
  • llm_concern_count
IFVG (5)
  • ifvg_count_active
  • ifvg_nearest_dist
  • ifvg_best_quality
  • ifvg_avg_fill_pct
  • ifvg_direction_bias
Trend (7)
  • trend_score
  • ema20_slope_* ×3
  • structure_* ×3
Session (5)
  • minutes_since_open
  • window_progress_pct
  • overnight_range
  • prior_session_range
  • in_trading_window
RTR (4)
  • rtr_score_long
  • rtr_score_short
  • exhaustion_pct
  • exhaustion_flag
Micro + Portfolio (7)
  • vol_ratio
  • bar_range_norm
  • bar_body_ratio
  • in_position
  • unrealized_pnl_r
  • daily_pnl_r
  • trades_today

Action Space  MultiDiscrete([3, 3, 3, 3])

Entry
  • 0 Skip
  • 1 Long
  • 2 Short
Size
  • 0 1 contract
  • 1 2 contracts
  • 2 3 contracts
Stop
  • 0 Tight (1×ATR)
  • 1 Medium (1.5×ATR)
  • 2 Wide (2×ATR)
Target
  • 0 Nearest (2R)
  • 1 Extended (3R)
  • 2 Trail (5R)

Action Masking

Boolean mask of shape (12,) — flattened across 4 sub-actions. Entry blocked when:

  • Already in position → only skip allowed
  • Daily loss limit hit (−3R) → only skip
  • Max trades reached (5 per session) → only skip
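Those rules can be sketched as a mask builder; the function and the choice to restrict only the entry dimension are a simplification, not the exact rl/env.py implementation:

```python
def build_action_mask(in_position: bool, daily_pnl_r: float,
                      trades_today: int, max_daily_loss_r: float = 3.0,
                      max_trades: int = 5) -> list[bool]:
    """Flattened (12,) mask over MultiDiscrete([3, 3, 3, 3]).
    When entry is blocked, only index 0 (skip) stays valid."""
    entry_blocked = (
        in_position
        or daily_pnl_r <= -max_daily_loss_r
        or trades_today >= max_trades
    )
    entry = [True, not entry_blocked, not entry_blocked]  # skip, long, short
    return entry + [True] * 9  # size, stop, target stay valid
```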
Full observation normalization table
  • ifvg_count_active — val / 10.0 → [0, 1]
  • ifvg_nearest_dist — val / ATR → [0, 1]
  • ifvg_best_quality — val / 3.0 → [0, 1]
  • rtr_score_* — val / 100.0 → [0, 1]
  • minutes_since_open — val / 480.0 (8h max) → [0, 1]
  • overnight_range — val / ATR → [0, 1]
  • unrealized_pnl_r — unrealized / risk / 5.0 → [-1, 1]
  • daily_pnl_r — val / 5.0 → [-1, 1]
  • trades_today — val / 10.0 → [0, 1]
  • ema20_slope_* — rising=1, flat=0, falling=-1 → [-1, 1]
  • structure_* — uptrend=1, ranging=0, downtrend=-1 → [-1, 1]

Trade Lifecycle Simulation

The position module tracks the full lifecycle of each trade: fill simulation with slippage, MAE/MFE tracking, breakeven trailing stops, and commission modeling.

Fill Simulation

Orders filled within candle range with 1-tick slippage. Long: min(price + tick, high). Short: max(price - tick, low). Returns None if range doesn't reach order.
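The fill rule above can be sketched directly; `simulate_fill` is an illustrative name for the position module's internals:

```python
def simulate_fill(side: str, order_price: float, low: float, high: float,
                  tick: float):
    """One tick of adverse slippage, capped at the candle extreme;
    None when the candle range never reaches the order price."""
    if side == "long":
        if high < order_price:
            return None
        return min(order_price + tick, high)
    if low > order_price:
        return None
    return max(order_price - tick, low)
```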

MAE / MFE Tracking

MAE (Max Adverse Excursion): worst drawdown in ticks. MFE (Max Favorable Excursion): best unrealized profit. Updated every bar for trade analysis.

Commissions

Per-contract commission based on currency:
JPY: 80.0 × size
USD: 1.25 × size
Deducted as commission ticks from realized P&L.

Position Update Logic

Each bar, the position is updated with conservative order checking: stop first, then target.

Check Stop Hit

Long: candle_low ≤ stop_price. Short: candle_high ≥ stop_price. Checked first (conservative).

Check Target Hit

Long: candle_high ≥ target. Short: candle_low ≤ target. Returns CompletedTrade if hit.

Update MAE/MFE & Trailing Stop

Track adverse/favorable excursion. When mfe_ticks ≥ risk_ticks (1R profit), move stop to breakeven.

Exit reasons: stop, target, session_end (force-close at window end).
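The stop-before-target ordering can be sketched as a bar-level check; the function is illustrative and omits the MAE/MFE and trailing-stop updates:

```python
def check_exit(side: str, low: float, high: float,
               stop: float, target: float):
    """Conservative ordering: if one bar touches both levels,
    the stop is assumed to have filled first."""
    if side == "long":
        if low <= stop:
            return "stop"
        if high >= target:
            return "target"
    else:
        if high >= stop:
            return "stop"
        if low <= target:
            return "target"
    return None
```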

Reward Function

Three reward components shape the agent toward profitable, disciplined trading. Exact formulas from rl/reward.py:

Trade Reward

+realized_rr × 1.0
+0.3 if hit_target

Core signal: risk-reward ratio with target bonus.

Step Penalties

−0.05 if trades > 5
−0.8 × |pnl − (−2R)|

Overtrading + accelerating loss penalty.

Patience Bonus

+0.01 skip while flat

Prevents over-entering low-quality setups.

Exact Python implementations
def compute_trade_reward(trade: CompletedTrade) -> float:
    reward = trade.realized_rr * 1.0
    if trade.hit_target:
        reward += 0.3
    return reward

def compute_step_penalty(trades_today, daily_pnl_r,
                         max_trades=5, drawdown_threshold=-2.0):
    penalty = 0.0
    if trades_today > max_trades:
        penalty -= 0.05
    if daily_pnl_r < drawdown_threshold:
        penalty -= 0.8 * abs(daily_pnl_r - drawdown_threshold)
    return penalty

def compute_patience_bonus(action_is_skip, in_position):
    if action_is_skip and not in_position:
        return 0.01
    return 0.0

MaskablePPO Training

sb3-contrib MaskablePPO trains on session episodes with periodic checkpointing, evaluation callbacks, and TensorBoard logging.

Hyperparameters

total_timesteps   1,000,000
learning_rate     3e-4
n_steps           2,048
batch_size        256
gamma             0.99
clip_range        0.2
ent_coef          0.01
policy_net_arch   [64, 64]

Risk Limits & Callbacks

max_daily_loss_r       3.0 R
max_trades_session     5
checkpoint_freq        50,000 steps
eval_freq              50,000 steps
trailing_stop          breakeven @ 1R

Policy Architecture

Two-layer MLP [64, 64] with shared feature extractor for actor and critic. Input: 39-dim observation vector. Output: MultiDiscrete([3,3,3,3]) action logits + value estimate.

Model Assessment

Trained models are evaluated on held-out episodes against baseline agents. Promotion criteria gate deployment quality.

EvalMetrics

Promotion gate:
  • Sharpe Ratio ≥ 1.0
  • Max Drawdown ≤ 5R
  • Win Rate ≥ 40%

Additional metrics: profit factor, avg RR, total R, trades per session. Sharpe annualized by √252.
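The √252 annualization might look like this, assuming per-session R returns as the input (the exact definition in the project's evaluation code may differ):

```python
import math
import statistics


def annualized_sharpe(session_returns_r: list[float]) -> float:
    """Mean / stdev of per-session R returns, scaled by sqrt(252)
    trading days; 0.0 when the series has no variance."""
    sd = statistics.stdev(session_returns_r)
    if sd == 0:
        return 0.0
    return statistics.mean(session_returns_r) / sd * math.sqrt(252)
```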

RandomAgent

Samples random valid actions respecting action masks. Uses per-sub-action sampling from valid options. Baseline for "can the agent beat random?"

rl/baselines.py

AlwaysEnterAgent

Always enters long with size=1, stop=medium, target=nearest (2R). Respects masks — skips when blocked. Tests "is selective entry better than always-in?"

rl/baselines.py
Promotion criteria code
def meets_promotion_criteria(metrics: EvalMetrics) -> bool:
    return (
        metrics.sharpe_ratio >= 1.0
        and metrics.max_drawdown_r <= 5.0
        and metrics.win_rate >= 0.40
    )

Model Storage & Metadata

Trained models are saved with full metadata for reproducibility. Each training run creates a timestamped directory with the model, config, and evaluation results.

models/
  20260222_101530/
    model.zip              # MaskablePPO weights
    metadata.json          # Training config, instrument, timestamps
    eval_metrics.json      # EvalMetrics from validation episodes
    config_snapshot.yaml   # Full GothamSettings at training time

Layered Config System

Pydantic-settings with YAML layering. Four priority levels from init kwargs (highest) to default.yaml (lowest). 9 config sections.

1. Init Kwargs — Highest

Direct constructor arguments for testing and programmatic overrides.

2. Environment Variables

GOTHAM_ prefix with __ nesting. Example: GOTHAM_TRAINING__LEARNING_RATE=1e-4

3. Env-Specific YAML Overlay

config/{GOTHAM_ENV}.yaml (default: dev)

4. Base Defaults — Lowest

config/default.yaml — singleton via get_settings()
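The priority order can be illustrated with a plain-dict sketch (pydantic-settings handles this via settings sources internally; `layer_settings` is not the project's API):

```python
def layer_settings(defaults: dict, env_yaml: dict,
                   env_vars: dict, init_kwargs: dict) -> dict:
    """Later layers win; each layer maps section -> {key: value},
    and sections are merged shallowly."""
    merged: dict = {}
    for layer in (defaults, env_yaml, env_vars, init_kwargs):
        for section, values in layer.items():
            merged.setdefault(section, {}).update(values)
    return merged
```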

DatabaseConfig
database:
  host: localhost
  port: 5432
  name: gotham
  user: gotham
  password: changeme

URL-encoded credentials. Sync + async URLs (asyncpg).

IBConfig
ib:
  host: 127.0.0.1
  port: 4002
  client_id: 1
  timeout: 30
  readonly: true
InstrumentsConfig
instruments:
  nikkei:
    symbol: NIY
    exchange: CME
    currency: JPY
    tick_size: 5.0
    point_value: 500.0
    session: tokyo
  nasdaq:
    symbol: NQ
    exchange: CME
    currency: USD
    tick_size: 0.25
    point_value: 20.0
    session: us
FeatureConfig
features:
  ifvg_min_gap_ticks: 4.0
  ifvg_max_age_bars: 100
  ema_fast: 20
  ema_slow: 50
  atr_period: 14
  displacement_body_pct: 0.70
  displacement_atr_mult: 1.5
  rtr_lookback_days: 20
  pre_screen_min_ifvgs: 2
  pre_screen_min_trend: 0.3
  pre_screen_min_rtr: 30.0
TrainingConfig
training:
  total_timesteps: 1000000
  learning_rate: 0.0003
  n_steps: 2048
  batch_size: 256
  gamma: 0.99
  clip_range: 0.2
  ent_coef: 0.01
  checkpoint_freq: 50000
  eval_freq: 50000
  max_daily_loss_r: 3.0
  max_trades_per_session: 5
  policy_net_arch: [64, 64]
LLMConfig, BackfillConfig, SimConfig, LoggingConfig
llm:
  model: claude-sonnet-4-5-20250929
  max_tokens: 4096
  temperature: 0.3

backfill:
  data_dir: data/raw
  source: histdata
  batch_days: 30

sim:
  data_dir: data
  enriched_dir: data/enriched
  assessments_dir: data/assessments
  model_dir: models

logging:
  level: INFO
  format: json
  rotation: "50MB"
  log_dir: logs

Development & Operations

Docker services, Makefile targets, CI/CD pipeline, and CLI commands for the full development lifecycle.

Docker Services

  • TimescaleDB — pg18 on port 5432
  • IB Gateway — port 4002, paper trading
make docker-up     # start services
make docker-down   # stop services

Makefile Targets

make install     # uv sync --all-extras
make lint        # ruff check + mypy
make format      # ruff format + fix
make test        # pytest -m unit
make test-all    # pytest (all markers)
make test-cov    # coverage report

CLI Commands

# Train a model
uv run python -m gotham.rl train \
  --enriched-path data/enriched/nq.parquet \
  --instrument NQ --timesteps 1000000

# Evaluate a model
uv run python -m gotham.rl evaluate \
  --model-path models/20260222_101530 \
  --instrument NQ --n-episodes 50

# Backfill data
uv run python -m gotham.data backfill \
  --instrument NQ --start 2023-01-01

# Quality audit
uv run python -m gotham.data quality-audit \
  --instrument NQ --days 30

Code Quality

ruff — line-length 100, rules E,F,W,I,UP,B,SIM,RUF, target py311.

mypy — disallow_untyped_defs = true, warn_return_any = true. All function signatures require type annotations.

pytest — markers: @unit, @integration, @slow. conftest auto-clears settings cache.

uv — Fast Python package manager. requires-python = ">=3.11".

Built With

  • Python 3.11+ — StrEnum, modern typing
  • Polars — all data processing
  • Gymnasium — RL environment
  • sb3-contrib — MaskablePPO
  • Anthropic SDK — Claude assessments
  • Pydantic — config + schemas
  • TimescaleDB — time-series storage
  • SQLAlchemy — async ORM
  • structlog — structured logging
  • TensorBoard — training monitoring
  • ruff + mypy — lint + strict types
  • uv — fast package manager