A vector-similarity approach to discovering analogous market regimes, preserving the structural nuance that neural networks blur and statistical models average away. Interpretable pattern retrieval and probabilistic scenario planning for professional trading.
Abstract: The "Glass Box" Approach
Modern markets operate in regimes that legacy models fail to recognize. Neural networks ("Black Boxes") can produce signals but offer little explainability, which makes them hard to justify for large capital under compliance and risk-management requirements. We propose a "Glass Box" Pattern Memory that indexes market structure using vector similarity search. Instead of "predicting" the future, our engine retrieves historical precedents for the current situation, so every output can be traced back to the evidence behind it.
Motivation: Regime Detection & Honest Backtesting
Practitioners need a way to recover historical context: “Where have we seen this kind of behavior before?” Our system searches for similar situations in a high-dimensional structural space. Crucially, backtesting follows a "Time Machine" methodology: rigorous walk-forward analysis with recursive, point-in-time lookups that eliminates look-ahead bias and gives an honest measure of strategy robustness.
Honest Backtesting: The "Time Machine"
Many backtests are flawed because the model is fit on the entire dataset, including data that lies in the future relative to each simulated decision. Our engine enforces strict temporal isolation.
Recursive Lookups
For every point in the backtest (e.g., Jan 1, 2020), the engine rebuilds its index using only data available up to that moment. It cannot "see" the crash of March 2020 until it happens.
Walk-Forward Validation
We replay history step by step, giving the engine only what a trader would have known on each day. This exposes how strategies perform during regime shifts, not just on average.
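The point-in-time discipline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a single price series, a fixed window length and forward horizon, and a brute-force Euclidean nearest-neighbour search standing in for the full Graph Memory. The names `walk_forward`, `WINDOW`, and `HORIZON` are hypothetical. The key detail is that a historical window becomes searchable only once its forward outcome has fully elapsed; otherwise the outcome attached to a match would leak future data.

```python
import math

# Hypothetical parameters for this sketch, not the engine's settings.
WINDOW = 5    # window length, in returns
HORIZON = 3   # forward horizon, in returns


def log_returns(prices):
    """Bar-to-bar log returns; element i is the return from bar i to bar i+1."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def window_at(rets, end):
    """The WINDOW returns ending at (and including) return index `end`."""
    return rets[end - WINDOW + 1 : end + 1]


def forward_outcome(rets, end):
    """Cumulative log return over the HORIZON returns after `end`."""
    return sum(rets[end + 1 : end + 1 + HORIZON])


def walk_forward(prices, k=2):
    """At each decision time t, search only windows whose outcome is known at t."""
    rets = log_returns(prices)
    results = []
    for t in range(WINDOW - 1, len(rets)):
        query = window_at(rets, t)
        # A candidate ending at e is usable only if e + HORIZON <= t,
        # i.e. its whole forward outcome lies at or before time t.
        candidates = range(WINDOW - 1, t - HORIZON + 1)
        scored = sorted((math.dist(query, window_at(rets, e)), e) for e in candidates)
        cohort = [(e, forward_outcome(rets, e)) for _, e in scored[:k]]
        results.append((t, cohort))
    return results
```

Filtering the searchable set at every step is slower than fitting once on the full history, but it is what keeps a March 2020 window invisible to a January 2020 decision. Early steps simply return empty cohorts because no outcome is known yet.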
Method Overview
Our pipeline ingests market data, derives structure‑aware features, indexes windows into a Graph Memory, retrieves analogous cohorts via multi‑metric similarity, and projects forward quantile envelopes for range‑first planning.
Figure: End-to-end pipeline for structural retrieval and probabilistic planning (Ingest → Features → Graph → Search → Bands).
Ingest: normalization and windowing aligned to the volatility regime.
Features: shape, dispersion, persistence, and state-transition families.
Graph: construction with temporal-coherence constraints.
Search: composite similarity and consensus neighborhoods for retrieval.
Bands: quantile aggregation with median tracking and ongoing auditing.
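One way the Ingest and Features stages could look, sketched under assumptions: each window of log returns is divided by its own realized volatility (one possible reading of "aligned to volatility regime"), and each feature family is represented by a single simple statistic. The statistics chosen here (a volatility-normalized cumulative path for shape, realized volatility for dispersion, lag-1 autocorrelation for persistence, and the up/down sign-flip rate for state transitions) are illustrative stand-ins, not the engine's actual features; all function names are hypothetical.

```python
import statistics


def vol_normalize(rets):
    """Scale a window of returns by its own realized volatility (population stdev)."""
    sigma = statistics.pstdev(rets)
    return [r / sigma for r in rets] if sigma > 0 else [0.0] * len(rets)


def lag1_autocorr(xs):
    """Lag-1 autocorrelation; 0.0 for a constant series."""
    mean = statistics.fmean(xs)
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dev, dev[1:])) / denom


def features(rets):
    """Map one window of log returns to a dict with one entry per feature family."""
    z = vol_normalize(rets)
    path, total = [], 0.0
    for r in z:
        total += r
        path.append(total)  # cumulative, volatility-normalized path
    signs = [1 if r > 0 else -1 if r < 0 else 0 for r in rets]
    flips = sum(1 for a, b in zip(signs, signs[1:]) if a * b < 0)
    return {
        "shape": path,                                 # scale-free geometry
        "dispersion": statistics.pstdev(rets),         # realized volatility
        "persistence": lag1_autocorr(rets),            # trend vs mean reversion
        "transitions": flips / max(len(rets) - 1, 1),  # up/down switching rate
    }
```

Because the shape component is divided by the window's own volatility, a calm window and a turbulent window with the same path geometry yield the same shape vector, while the separate dispersion entry keeps the volatility context instead of discarding it.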
Vector-Based Pattern Memory Architecture
Our Pattern Memory encodes time-localized market structure as high-dimensional vectors stored in a searchable graph. Each node represents one window, enriched with features that capture shape, volatility context, and microstructure cues. These features are not compressed into opaque weights that blur detail; they are preserved as rich, multi-metric representations.
Edges connect windows that are structurally analogous under multiple similarity metrics, enabling nuanced pattern discovery that classical averaging methods miss. The result is a searchable vector database of regimes that supports precise retrieval, clustering, and transfer across instruments while maintaining interpretability.
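A toy version of such a graph, assuming each node is a fixed-length feature vector and an edge joins a node to its `k` nearest neighbours by Euclidean distance. The temporal-coherence constraint is interpreted here, as an assumption, as forbidding edges between windows whose start times are closer than `min_gap` bars, so a window is never linked to a near-copy of itself shifted by a few bars. `build_graph` and `min_gap` are hypothetical names.

```python
import math


def build_graph(vectors, k=2, min_gap=5):
    """Directed k-nearest-neighbour graph over window feature vectors.

    vectors[i] is the feature vector of the window starting at bar i.
    Pairs with |i - j| < min_gap (overlapping windows) never get an edge.
    Returns {node: [(neighbour, distance), ...]} sorted by distance.
    """
    graph = {}
    for i, v in enumerate(vectors):
        candidates = [
            (math.dist(v, w), j)
            for j, w in enumerate(vectors)
            if abs(i - j) >= min_gap  # temporal-coherence constraint (assumed form)
        ]
        candidates.sort()
        graph[i] = [(j, d) for d, j in candidates[:k]]
    return graph
```

The brute-force search above is quadratic in the number of windows; at scale it would usually be replaced by an approximate nearest-neighbour index, which this sketch does not attempt.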
Key Advantages Over Black-Box Models:
✓ Preserved complexity: Vector features capture shape, persistence, dispersion, and state transitions without averaging them into a single latent space.
✓ Multi-metric similarity: Composite distances and consensus neighborhoods improve robustness without over-smoothing.
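A sketch of one way to combine several metrics, assuming three illustrative distances (Euclidean, correlation distance, and the absolute difference in volatility) and a rank-based consensus: a candidate enters the cohort only if it ranks in the top `m` under at least `quorum` of the metrics, and its composite score is its mean rank across metrics. The metric set and the quorum rule are assumptions chosen for clarity, not the document's specification; function names are hypothetical.

```python
import math
import statistics


def euclidean(a, b):
    return math.dist(a, b)


def corr_distance(a, b):
    """1 - Pearson correlation: 0 for identical shape, 2 for a mirror image."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 1.0 if den == 0 else 1.0 - sum(x * y for x, y in zip(da, db)) / den


def vol_gap(a, b):
    return abs(statistics.pstdev(a) - statistics.pstdev(b))


METRICS = (euclidean, corr_distance, vol_gap)  # illustrative choice


def consensus_cohort(query, library, m=3, quorum=2):
    """Return [(index, mean_rank)] for candidates in the top-m of >= quorum metrics."""
    ranks = {i: [] for i in range(len(library))}
    for metric in METRICS:
        order = sorted(range(len(library)), key=lambda i: metric(query, library[i]))
        for rank, i in enumerate(order):
            ranks[i].append(rank)
    cohort = [
        (i, statistics.fmean(r))
        for i, r in ranks.items()
        if sum(1 for x in r if x < m) >= quorum
    ]
    return sorted(cohort, key=lambda item: item[1])
```

For example, against the query `[0, 1, 2, 3]`, the mirror image `[3, 2, 1, 0]` matches the query's volatility exactly but ranks poorly on the other two metrics, so the quorum keeps it out; a single blended distance could have let it in.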
The system scales beyond a single instrument. It can retrieve patterns from Bitcoin and test whether analogous structure emerges on Solana, or jointly aggregate matches from BTC, ETH, and SOL to assess whether a cross‑asset consensus edge exists. This enables portability of memory across markets without assuming identical dynamics.
Example 1 — Transfer
“Find situations like those seen in BTC and apply to SOL if the structure reappears.”
Example 2 — Consensus
“Find similar structures on SOL, ETH, and BTC and summarize if a consistent edge persists across assets.”
Figure: Transfer and consensus across assets (transfer of a matched cohort to another instrument; aggregation of cohorts into a consensus).
Cross‑instrument mapping without assuming identical dynamics.
Consensus formed by aggregating cohorts across assets.
Use cases: confirmation, divergence detection, robustness checks.
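The consensus example can be sketched as follows, assuming each asset's retrieval step has already produced a cohort of forward returns. The document does not define "a consistent edge"; here it is assumed to mean that every asset's cohort median points the same way and at least `min_hit` of each cohort's outcomes agree with that direction. The function names and threshold are hypothetical.

```python
import statistics


def sign(x):
    return (x > 0) - (x < 0)


def cross_asset_consensus(cohorts, min_hit=0.6):
    """cohorts: {asset: [forward returns]}. Returns (per-asset summary, consensus flag)."""
    per_asset = {}
    for asset, xs in cohorts.items():
        med = statistics.median(xs)
        direction = sign(med)
        agree = sum(1 for x in xs if sign(x) == direction) / len(xs)
        per_asset[asset] = {"n": len(xs), "median": med,
                            "direction": direction, "agreement": agree}
    directions = {s["direction"] for s in per_asset.values()}
    consensus = (
        len(directions) == 1          # all assets point the same way
        and 0 not in directions       # and that way is not flat
        and all(s["agreement"] >= min_hit for s in per_asset.values())
    )
    return per_asset, consensus
```

Returning the per-asset breakdown alongside the flag keeps the result auditable: when consensus fails, the summary shows which asset diverged, which serves the divergence-detection use case.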
Probabilistic Envelopes, Not Points
For each cohort of retrieved matches, we collect the forward outcomes that followed each match and summarize their distribution as quantile bands (e.g., 10–90%, 25–75%, 40–60%) plus a dynamic median. Traders get a range-first view of scenario space, not a single path, which aligns with risk management, position sizing, and guardrail design under uncertainty.
Figure: Quantile envelopes (10–90%, 25–75%, and 40–60% bands with a dynamic median) illustrate scenario space versus a single path.
Quantiles estimated from matched cohort forward outcomes.
Median tracked for directional context; avoid single‑path bias.
Designed for sizing, guardrails, and scenario planning.
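A minimal sketch of the band computation, assuming each matched window contributes a forward path of cumulative returns of equal length and that quantiles are taken independently at each forward step using linear interpolation. Taking the median per step, rather than selecting one "median path", is one reading of "dynamic median"; the band levels mirror those in the text, and the function names are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


BANDS = {"10-90": (0.10, 0.90), "25-75": (0.25, 0.75), "40-60": (0.40, 0.60)}


def envelope(paths):
    """paths: equal-length cohort forward paths. Returns per-step bands and median."""
    out = {"median": [], **{name: [] for name in BANDS}}
    for t in range(len(paths[0])):
        column = [p[t] for p in paths]  # every cohort outcome at forward step t
        out["median"].append(quantile(column, 0.5))
        for name, (lo, hi) in BANDS.items():
            out[name].append((quantile(column, lo), quantile(column, hi)))
    return out
```

Computing quantiles step by step lets the bands widen or narrow along the horizon as the cohort's outcomes disperse, which is what makes the envelope usable for sizing and guardrails.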
Interpretability: No Black Box Opacity
Unlike neural networks that produce opaque predictions from compressed latent spaces, our vector-based approach returns actual historical windows that structurally match the current regime.
Traders see the matched patterns, their similarity scores, the cohort composition, and the forward outcome envelopes: a complete, transparent line from evidence to decision. No hidden layers, no averaged-away details, no black-box mystery; just real patterns that preserve the complexity professional traders need to make informed decisions.
Pilot Scope and Scalability
The current pilot covers a limited set of instruments and datasets. Graph Memory is designed to scale to broader universes and cross-market searches as data coverage expands. The graph structure supports incremental indexing, so existing structure is reused as new windows are added.
Roadmap: Towards Semantic Search
We are actively researching algorithmic improvements to deepen structural understanding beyond simple geometry:
● Dynamic Time Warping (DTW): To recognize patterns that are temporally distorted (stretched or compressed) but structurally identical.
● Semantic Embeddings: Exploring hybrid models that use contrastive learning to capture non-linear relationships while maintaining the "Glass Box" retrieval paradigm.
● Multi-Scale Context: Integrating higher-timeframe trends into the local pattern definition for context-aware matching.
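For reference, the classic dynamic-programming form of DTW is short. This is the textbook algorithm with an absolute-difference local cost and no warping-window constraint, not the engine's planned implementation; a production version would typically add a band constraint (such as Sakoe-Chiba) to bound the cost and rule out degenerate alignments.

```python
import math


def dtw(a, b):
    """Dynamic time warping distance with |x - y| local cost, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances while b[j-1] repeats
                cost[i][j - 1],      # b advances while a[i-1] repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

For instance, `[0, 1, 1, 2, 2, 3, 3, 2, 1, 0]` is a time-stretched copy of `[0, 1, 2, 3, 2, 1, 0]`; the two have different lengths, so a bar-by-bar distance does not even apply, yet their DTW distance is zero. That invariance is what the roadmap item targets.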
Limitations and Disclaimer
Historical analysis does not guarantee future results. Structural similarity may fail under regime breaks, liquidity shocks, or novel catalysts. We emphasize range‑based planning and ongoing accuracy auditing. Outputs are decision support, not signals to execute blindly.
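Ongoing accuracy auditing of range forecasts can be made concrete with a coverage check: over past predictions, count how often the realized outcome fell inside a band and compare that rate with the band's nominal coverage (80% for 10–90%, 50% for 25–75%). The record format, tolerance, and function names below are hypothetical; this is one standard way to audit interval forecasts, not a description of the engine's own audit.

```python
def coverage(records):
    """records: list of (lower, upper, realized). Fraction with lower <= realized <= upper."""
    hits = sum(1 for lo, hi, y in records if lo <= y <= hi)
    return hits / len(records)


def audit(records, nominal, tolerance=0.1):
    """Flag a band whose empirical coverage drifts from nominal by more than tolerance."""
    observed = coverage(records)
    return {
        "observed": observed,
        "nominal": nominal,
        "calibrated": abs(observed - nominal) <= tolerance,
    }
```

Observed coverage persistently below nominal would suggest the retrieved cohorts understate risk, which bears directly on sizing even when the median tracks well.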