API Reference

Comprehensive documentation for the rlx-search HTTP API. Manage datasets, compute pattern matches, run ANN searches, and train RL agents.

RL Training Data API

Endpoints for Reinforcement Learning agents to retrieve training data based on similar historical patterns.

Why Context-Aware RL?

Traditional Reinforcement Learning often fails in finance due to non-stationarity: market dynamics change over time (e.g., the "rules" of 2020 differ from those of 2023). Training an agent on the entire history confuses it, leading to mediocre performance.

Context-Aware RL solves this by turning a non-stationary problem into a stationary one. Instead of training on random data, we use the Pattern Search Engine to retrieve a cluster of historical episodes that are structurally similar to the current market state.

This allows you to train a specialized agent on the fly. For example, if the market currently resembles the "SVB Crisis" crash, the API feeds the agent only similar historical crashes. The agent quickly learns the optimal policy for this specific regime (e.g., "Short Aggressively"), ignoring irrelevant bull market data.

Workflow

  1. Identify Context: Provide the current market state (vector) or a timestamp (`anchorTs`) to the API.
  2. Retrieve Cluster: The API finds the top 50-100 most similar historical episodes using HNSW ANN search.
  3. Train Specialist: Initialize an ephemeral RL environment using only these episodes. Train a PPO/SAC agent for a few thousand steps.
  4. Execute: Use the trained agent to predict the action for the current real-time step.

Full Training Example (Python)

This script demonstrates how to train a PPO agent specifically for a target market scenario (e.g., the SVB Crisis) using the API.

import gymnasium as gym
import numpy as np
import requests
from gymnasium import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

class ContextAwareTradingEnv(gym.Env):
    """
    Custom Environment that resets ONLY to historical episodes 
    similar to our target context.
    """
    def __init__(self, episodes_data):
        super().__init__()
        self.episodes = episodes_data
        self.current_transitions = []
        self.step_idx = 0
        self.max_steps = 24
        
        # Actions: 0=Hold, 1=Long, 2=Short
        self.action_space = spaces.Discrete(3)
        
        # Observation: [Price, Volatility, Time_Left]
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32
        )

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Sample one of the "parallel universes" using gymnasium's seeded RNG
        episode = self.episodes[self.np_random.integers(len(self.episodes))]
        self.current_transitions = episode.get("transitions", [])
        self.step_idx = 0
        return self._get_obs(), {}

    def step(self, action):
        if self.step_idx >= len(self.current_transitions):
            return self._get_obs(), 0, True, False, {}

        step_data = self.current_transitions[self.step_idx]
        ret = step_data.get("ret", 0.0)
        
        # Simple Reward Logic
        reward = 0.0
        if action == 1: reward = ret      # Long
        elif action == 2: reward = -ret   # Short
        
        self.step_idx += 1
        done = self.step_idx >= len(self.current_transitions)
        return self._get_obs(), reward, done, False, {}

    def _get_obs(self):
        # Simplified observation: last seen price/volatility plus remaining time fraction
        if self.step_idx == 0:
            price = 0.0
            vol = 0.0
        else:
            prev = self.current_transitions[self.step_idx - 1]
            price = prev.get("price", 0.0)
            vol = prev.get("volatility", 0.0)

        time_left = 1.0 - self.step_idx / max(len(self.current_transitions), 1)
        return np.array([price, vol, time_left], dtype=np.float32)

# --- Main Workflow ---

# 1. Fetch Context: Get episodes similar to SVB Crisis (March 2023)
ANCHOR_TS = 1678406400000 
resp = requests.post("http://localhost:8787/api/rl/episodes", json={
    "symbol": "BTCUSDT",
    "interval": "1h",
    "anchorTs": ANCHOR_TS,
    "forecastHorizon": 24,
    "numEpisodes": 50
})
resp.raise_for_status()
episodes = resp.json().get("episodes", [])

# 2. Train Specialist Agent
env = DummyVecEnv([lambda: ContextAwareTradingEnv(episodes)])
model = PPO("MlpPolicy", env, verbose=1, learning_rate=0.001)
model.learn(total_timesteps=5000)

# 3. The agent is now an expert on "SVB-like" crashes.
print("Agent trained on specific market regime.")

POST /api/rl/episodes

Get Episodes

Returns similar historical episodes for training RL agents. Provide either `currentState` (vector) or `anchorTs` (timestamp).

Example Request

curl -s -X POST "$BASE_URL/api/rl/episodes" \
-H "Content-Type: application/json" \
-d "{
\"symbol\": \"BTCUSDT\",
\"interval\": \"1h\",
\"anchorTs\": 1678406400000,
\"forecastHorizon\": 24,
\"numEpisodes\": 20,
\"minSimilarity\": 0.80,
\"includeActions\": true,
\"rewardType\": \"returns\"
}" | jq

POST /api/rl/training-batch

Get Training Batch

Returns flattened arrays (`states`, `nextStates`, `rewards`, `dones`) optimized for efficient batch training. The response contains a `meta` object and a `data` object with the flattened tensors.

Example Request

curl -s -X POST "$BASE_URL/api/rl/training-batch" \
-H "Content-Type: application/json" \
-d "{
\"symbol\": \"BTCUSDT\",
\"interval\": \"1h\",
\"queryLength\": 40,
\"forecastHorizon\": 24,
\"batchSize\": 100
}" | jq

GET /api/rl/regimes

Get Regimes

Returns available market regimes for regime-based training strategies.

Example Request

curl -s -G "$BASE_URL/api/rl/regimes" | jq
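
A minimal sketch of listing the regimes from Python, assuming the response wraps them in a `regimes` key (the exact descriptor fields depend on your deployment):

import requests

resp = requests.get("http://localhost:8787/api/rl/regimes")
resp.raise_for_status()

# Each entry describes one market regime available for regime-based training.
for regime in resp.json().get("regimes", []):
    print(regime)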