API Reference
Comprehensive documentation for the rlx-search HTTP API. Manage datasets, compute pattern matches, run ANN searches, and train RL agents.
RL Training Data API
Endpoints for Reinforcement Learning agents to retrieve training data based on similar historical patterns.
Why Context-Aware RL?
Traditional Reinforcement Learning often fails in finance due to non-stationarity: market dynamics change over time (e.g., the "rules" of 2020 differ from 2023). Training an agent on the entire history confuses it, leading to mediocre performance.
Context-Aware RL solves this by turning a non-stationary problem into a stationary one. Instead of training on random data, we use the Pattern Search Engine to retrieve a cluster of historical episodes that are structurally identical to the current market state.
This allows you to train a specialized agent on the fly. For example, if the market currently resembles the "SVB Crisis" crash, the API feeds the agent only similar historical crashes. The agent quickly learns the optimal policy for this specific regime (e.g., "Short Aggressively"), ignoring irrelevant bull market data.
Workflow
- Identify Context: Provide the current market state (vector) or a timestamp (`anchorTs`) to the API.
- Retrieve Cluster: The API finds the top 50-100 most similar historical episodes using HNSW ANN search.
- Train Specialist: Initialize an ephemeral RL environment using only these episodes. Train a PPO/SAC agent for a few thousand steps.
- Execute: Use the trained agent to predict the action for the current real-time step.
Full Training Example (Python)
This script demonstrates how to train a PPO agent specifically for a target market scenario (e.g., the SVB Crisis) using the API.
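A minimal, self-contained sketch of that workflow. The endpoint path and the `anchorTs`/`numEpisodes`/`minSimilarity` parameters come from this page; the host/port, the `episodes` response key, the per-episode `states`/`rewards` fields, and the epsilon-greedy trainer are all illustrative assumptions (in practice you would swap the trainer for PPO/SAC, e.g., via stable-baselines3).

```python
import json
import random
import urllib.request

API_URL = "http://localhost:8080/api/rl/episodes"  # assumed host/port

def fetch_episodes(anchor_ts, num_episodes=50, min_similarity=0.80):
    """POST the current context and return similar historical episodes."""
    payload = json.dumps({
        "anchorTs": anchor_ts,
        "numEpisodes": num_episodes,
        "minSimilarity": min_similarity,
    }).encode()
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # "episodes" (and the per-episode fields below) are assumed response keys.
        return json.load(resp)["episodes"]

class EpisodeEnv:
    """Gym-style env that resets to a random retrieved episode.
    Each episode is assumed to look like {"states": [[...], ...], "rewards": [...]}.
    Actions: 0 = flat, 1 = long, 2 = short."""
    def __init__(self, episodes):
        self.episodes = episodes

    def reset(self):
        ep = random.choice(self.episodes)
        self.states, self.rewards, self.t = ep["states"], ep["rewards"], 0
        return self.states[0]

    def step(self, action):
        position = {0: 0.0, 1: 1.0, 2: -1.0}[action]  # sign of our exposure
        reward = position * self.rewards[self.t]       # P&L for this step
        self.t += 1
        done = self.t >= len(self.states)
        obs = self.states[-1] if done else self.states[self.t]
        return obs, reward, done

def train_specialist(env, steps=2000, eps=0.2):
    """Toy epsilon-greedy stand-in for PPO/SAC: learns average value per action."""
    value, counts = [0.0, 0.0, 0.0], [1, 1, 1]
    obs, done = env.reset(), False
    for _ in range(steps):
        if done:
            obs, done = env.reset(), False
        a = (random.randrange(3) if random.random() < eps
             else max(range(3), key=lambda i: value[i]))
        obs, r, done = env.step(a)
        counts[a] += 1
        value[a] += (r - value[a]) / counts[a]  # incremental mean update
    return max(range(3), key=lambda i: value[i])  # best action for this regime
```

On a cluster of crash-like episodes (mostly negative per-step returns), the specialist converges on the short action, mirroring the "Short Aggressively" policy described above.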
POST /api/rl/episodes
Get Episodes
Returns similar historical episodes for training RL agents. Provide either `currentState` (vector) or `anchorTs` (timestamp).
Example Request
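An illustrative request body. The `anchorTs`, `numEpisodes`, and `minSimilarity` fields come from this page; the timestamp value and its epoch-milliseconds unit are assumptions.

```json
{
  "anchorTs": 1678406400000,
  "numEpisodes": 50,
  "minSimilarity": 0.80
}
```

To anchor on a live market state instead of a past moment, replace `anchorTs` with a `currentState` array of recent prices.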
AI Engineer's Guide: The "Parallel Universe" Generator
This is the core of Context-Aware RL. It solves the "Data Scarcity" problem. You only have one reality (the current candle), but to train a neural network, you need thousands of examples. This endpoint finds those examples from history.
How to use it
1. Define Context: Send `anchorTs` (a specific past moment) or `currentState` (a vector of recent prices).
2. Get Universes: The API returns `numEpisodes` (e.g., 50) historical scenarios that started exactly like your context.
3. Train: Spin up a gym environment that resets to a random episode from this list every time the agent dies.
Why it works
By training on 50 similar historical scenarios, your agent learns the invariant properties of this specific market structure (e.g., "After a 3-sigma drop, volatility usually compresses"). It becomes a specialist for the current regime.
Parameter Tip: "minSimilarity"
Keep `minSimilarity` around 0.80. If it's too high (e.g., 0.95), you might get zero results (the filter becomes over-specific, akin to overfitting). If it's too low (e.g., 0.50), you pollute the training data with irrelevant noise (akin to underfitting).
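One way to act on that tip is a hypothetical client-side helper that starts strict and relaxes `minSimilarity` until the API returns enough episodes to train on; the threshold ladder and the minimum episode count are illustrative choices, not API behavior.

```python
def fetch_with_relaxation(fetch_fn, thresholds=(0.90, 0.80, 0.70), min_count=10):
    """Try a strict minSimilarity first, then relax it until we get enough data.
    fetch_fn(min_similarity=...) is assumed to return a list of episodes."""
    episodes = []
    for t in thresholds:
        episodes = fetch_fn(min_similarity=t)
        if len(episodes) >= min_count:
            break  # enough training data at this strictness level
    return episodes
```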
POST /api/rl/training-batch
Get Training Batch
Returns flattened arrays (`states`, `nextStates`, `rewards`, `dones`) optimized for efficient batch training. The response contains a `meta` object and a `data` object with the flattened tensors.
Example Request
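An illustrative request body, assuming this endpoint accepts the same context parameters as `/episodes`; the `currentState` values are made up for the example.

```json
{
  "currentState": [1.012, 0.987, 1.004, 0.991],
  "numEpisodes": 100,
  "minSimilarity": 0.80
}
```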
High-Performance: The Tensor Factory
This endpoint is for heavy-duty training. While `/episodes` works well for Python loops, it is slow for massive datasets. This endpoint bypasses the per-episode Python loop entirely and returns raw, flattened data arrays ready for your GPU.
The "Tuple" Structure
It returns the standard RL transition tuple (s, r, s', d) pre-calculated for thousands of steps:
- States (s): Flattened array of input vectors.
- Rewards (r): Array of per-step rewards.
- Next States (s'): The state at t+1.
- Dones (d): Boolean flags marking episode ends.
Use Case: Offline RL
Use this if you want to train a Transformer or a large PPO model on 100,000+ steps of market data. You download the batch once and feed it into PyTorch/TensorFlow through a custom DataLoader, with minimal per-step Python overhead.
Zero-Copy Friendly
The data is returned as flat JSON arrays of floats. In a production setup, you can load this directly into a NumPy array or Torch tensor without complex parsing logic.
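A sketch of that mapping. The `meta`/`data` objects and the `states`/`nextStates`/`rewards`/`dones` field names come from this page; the `stateDim` key inside `meta` is an assumption about how the state-vector width is reported.

```python
import numpy as np

def batch_to_tensors(response_json):
    """Convert a training-batch JSON response into aligned NumPy arrays (s, r, s', d)."""
    meta, data = response_json["meta"], response_json["data"]
    dim = meta["stateDim"]  # width of one state vector (assumed key)
    s  = np.asarray(data["states"],     dtype=np.float32).reshape(-1, dim)
    s2 = np.asarray(data["nextStates"], dtype=np.float32).reshape(-1, dim)
    r  = np.asarray(data["rewards"],    dtype=np.float32)
    d  = np.asarray(data["dones"],      dtype=bool)
    assert len(s) == len(s2) == len(r) == len(d)  # one transition per row
    return s, r, s2, d
```

The resulting arrays can be handed to `torch.from_numpy` or sliced into minibatches directly, which is what makes the flat layout "zero-copy friendly" in practice.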