Deterministic exact query engine

Exact state-signature queries over complex finite-state data.

Signature Index turns structured operational data into a finite state-signature query problem. It is designed to answer exact, threshold, near-miss, drilldown, segment-conditioned and outcome-count questions quickly and verifiably.

Core role: Exact query layer
Answer type: Counts, matches, near-misses, outcomes
Primary fit: Rich repeated state-signature workloads
Not: A prediction model or database replacement

What it is

A specialized engine for asking exact questions about states.

Many industrial, operational and research systems need more than storage, dashboards or prediction. They also need fast memory over complex situations: whether a state has appeared before, which states were close to it, where it occurred and what outcomes followed.

Signature Index is built for that layer. It does not decide which domain question is important. It makes large families of exact state-signature questions practical once the domain is translated into objects, states, signatures, segments and measurable outcomes.

Domain data → Finite states → SI engine → Exact answers
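As a toy illustration of the translation step, the Python sketch below discretizes raw readings into finite levels, composes them into a signature tuple and counts the observed signatures, so that an exact-match question becomes a single lookup. The dimension names, cut points and helpers (`level`, `signature`, `build_index`) are hypothetical and do not describe how Signature Index encodes data internally.

```python
from collections import Counter

# Hypothetical per-dimension cut points (illustrative, not SI's schema).
CUTS = {"temp": [60.0, 80.0], "vib": [0.5, 1.5]}

def level(value, cuts):
    """Map a raw value to a finite level: index of the first cut it falls below."""
    for i, c in enumerate(cuts):
        if value < c:
            return i
    return len(cuts)

def signature(row):
    """Translate one raw row into a finite state signature (a tuple of levels)."""
    return tuple(level(row[dim], CUTS[dim]) for dim in sorted(CUTS))

def build_index(rows):
    """Count how many rows carry each observed signature."""
    return Counter(signature(r) for r in rows)

rows = [
    {"temp": 55.0, "vib": 0.2},
    {"temp": 72.0, "vib": 1.0},
    {"temp": 75.0, "vib": 1.1},
    {"temp": 90.0, "vib": 2.0},
]
index = build_index(rows)

# Exact-match question: how many rows share this reading's signature?
print(index[signature({"temp": 70.0, "vib": 0.9})])  # 2
```

A `Counter` returns 0 for signatures that were never observed, so "has this state appeared before?" is simply a nonzero check.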

Query families

Designed for rich repeated workloads.

Exact match

How many objects matched this exact state signature?

Threshold / roll-up

How many matched a broader state condition?

Near-miss

Which states are close to the selected signature, and what happened next?

Broad-to-exact

Which exact sub-states sit inside a broad alert or condition?

Segment-conditioned

How does this signature behave by line, asset, site, shift, model or region?

Multi-outcome counts

What were the counts for failure, defect, latency, alert, cost or other outcomes?
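The other query families can be sketched over a small in-memory table of observed (segment, signature) counts. This is a naive illustration under stated assumptions: signatures are tuples of integer levels, "threshold" means a minimum level on one dimension, "near-miss" means L1 distance over level indices, and the segment labels, outcome names and function names are invented. Each function scans the distinct observed signatures rather than raw rows; it is not the SI engine's query path.

```python
from collections import Counter, defaultdict

# Toy observed support (hypothetical): each record is one object with a segment,
# a finite state signature (tuple of levels) and per-object outcome flags.
records = [
    ("line_A", (2, 1), {"defect": 1, "alert": 0}),
    ("line_A", (2, 2), {"defect": 0, "alert": 1}),
    ("line_B", (2, 1), {"defect": 1, "alert": 1}),
    ("line_B", (0, 1), {"defect": 0, "alert": 0}),
    ("line_B", (2, 1), {"defect": 0, "alert": 1}),
]

# Built once: object counts and outcome totals per (segment, signature).
counts = Counter()
outcomes = defaultdict(Counter)
for seg, sig, outs in records:
    counts[(seg, sig)] += 1
    outcomes[(seg, sig)].update(outs)

def threshold_count(dim, min_level):
    """Threshold / roll-up: objects whose level on `dim` is >= `min_level`."""
    return sum(n for (_, s), n in counts.items() if s[dim] >= min_level)

def drilldown(dim, min_level):
    """Broad-to-exact: exact sub-signatures (with counts) inside a broad condition."""
    sub = Counter()
    for (_, s), n in counts.items():
        if s[dim] >= min_level:
            sub[s] += n
    return dict(sub)

def near_miss(query, radius=1):
    """Near-miss: observed signatures at L1 distance 1..radius from `query`."""
    hits = Counter()
    for (_, s), n in counts.items():
        d = sum(abs(a - b) for a, b in zip(s, query))
        if 0 < d <= radius:
            hits[s] += n
    return dict(hits)

def by_segment(sig):
    """Segment-conditioned: object count for one exact signature, per segment."""
    return {seg: n for (seg, s), n in counts.items() if s == sig}

def outcome_counts(sig):
    """Multi-outcome: summed outcome counts for one signature across segments."""
    total = Counter()
    for (_, s), outs in outcomes.items():
        if s == sig:
            total.update(outs)
    return dict(total)

print(threshold_count(0, 2))   # 4
print(drilldown(0, 2))         # {(2, 1): 3, (2, 2): 1}
print(near_miss((2, 1)))       # {(2, 2): 1}
print(by_segment((2, 1)))      # {'line_A': 1, 'line_B': 2}
print(outcome_counts((2, 1)))  # {'defect': 2, 'alert': 2}
```

Because all answers are derived from the same count table, they are exact with respect to the observed support; only the cost of answering changes between families.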

100M-event benchmark

Public ClickBench workload: 500,000 hidden state-signature queries.

The benchmark tested Signature Index as a deterministic exact state-signature query engine. It did not test prediction, anomaly detection or model accuracy.

Rows / events: 99,997,497
Total queries: 500,000
Query families: 5
Correctness mismatches: 0
Build time (MacBook Air M4, 24 GB): ~19.7 min
Peak RSS: ~2.42 GB
Query family          SI speedup vs named baseline   SI speedup vs reference-check path   Interpretation
Exact                 ~439,441×                      ~704,270×                            Exact state-signature lookup
Segment-conditioned   ~372,104×                      ~624,308×                            Signature behavior within a segment
Threshold / roll-up   ~95.2×                         ~65.4×                               Broader state-condition queries
Near-miss L1          ~21.4×                         ~98.2×                               Close state-signature neighborhoods
Broad-to-exact        ~591.8×                        ~4,672×                              Exact sub-states under a broad condition

The benchmark represents a large repeated query workload over a public clickstream dataset. Speedups are shown against the fastest named baseline used for each query family and against the reference-check path. Exact correctness was required first: 0 mismatches across 500,000 queries.
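The correctness-first protocol can be illustrated with a small harness that answers the same queries through an index and through a reference full scan, counts mismatches, and only then reports a speedup. The `Counter` index, the scan, the random data and the function names are assumptions for illustration; this is not the benchmark harness, and the timing it prints says nothing about SI's measured numbers.

```python
import random
import time
from collections import Counter

def reference_count(rows, sig):
    """Reference-check path: full scan over all rows for one exact signature."""
    return sum(1 for r in rows if r == sig)

def run_check(rows, queries):
    """Answer queries via an index and via the reference scan; return the
    mismatch count and the wall-clock speedup of the index over the scan."""
    index = Counter(rows)  # built once; excluded from the query timing below

    t0 = time.perf_counter()
    fast = [index[q] for q in queries]
    t_index = time.perf_counter() - t0

    t0 = time.perf_counter()
    ref = [reference_count(rows, q) for q in queries]
    t_ref = time.perf_counter() - t0

    mismatches = sum(1 for a, b in zip(fast, ref) if a != b)
    speedup = t_ref / t_index if t_index > 0 else float("inf")
    return mismatches, speedup

random.seed(0)
rows = [(random.randrange(4), random.randrange(4)) for _ in range(10_000)]
# Levels up to 4 so some queries hit unseen signatures (expected count 0).
queries = [(random.randrange(5), random.randrange(5)) for _ in range(100)]
mismatches, speedup = run_check(rows, queries)
print(mismatches)  # 0
print(f"~{speedup:,.0f}x")
```

Build time is kept out of the query timing here, in the same spirit as the summary listing build time and peak RSS separately from the per-family speedups.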

HEP / scientific data validation

Repeated selection and region queries over HEP-style event data.

The public technical report includes a HEP-oriented validation pass focused on repeated exact selection, region, cut-grid, neighborhood and segment-conditioned count workloads. The purpose is not to replace ROOT, RDataFrame or experiment-specific analysis frameworks. The purpose is to test whether an observed-support memory layer can accelerate repeated query families once event-level data have been translated into finite signatures.

Correctness mismatches: 0
Canonical Z-region vs repeated scan: ~652.6×
Segment-conditioned vs repeated scan: ~2,136×
Calibrated break-even: ~1.12 queries
100M-event scenario, 100 queries: ~89×
1B-event scenario, 1,000 queries: ~892×

What was tested

The HEP validation emphasizes query families that naturally recur during exploratory analysis: region selections, mass-bucket scans, cut-grid families, near-neighborhood counts and segment-conditioned selections. Results in the report distinguish measured validation summaries from calibrated scaling and break-even estimates.
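A cut-grid family is a set of related selections evaluated over a grid of cut values. The sketch below assumes events have already been binned into hypothetical `(pt_bin, mass_bin)` signatures, answers every grid cell from the observed signature counts, and checks the result against a repeated scan over the events. Bin meanings, values and function names are illustrative only, not the report's HEP schema.

```python
from collections import Counter
from itertools import product

# Hypothetical events already translated into (pt_bin, mass_bin) signatures.
events = [(0, 3), (1, 4), (2, 4), (2, 5), (3, 4), (1, 2), (3, 5), (2, 4)]

support = Counter(events)  # observed support: signature -> event count

def grid_from_support(pt_cuts, mass_windows):
    """Cut-grid family answered from signature counts (iterates distinct
    signatures per cell, not individual events)."""
    return {
        (pt_min, (lo, hi)): sum(
            n for (pt, m), n in support.items() if pt >= pt_min and lo <= m <= hi
        )
        for pt_min, (lo, hi) in product(pt_cuts, mass_windows)
    }

def grid_by_scan(pt_cuts, mass_windows):
    """Same family answered by rescanning every event for every grid cell."""
    return {
        (pt_min, (lo, hi)): sum(
            1 for pt, m in events if pt >= pt_min and lo <= m <= hi
        )
        for pt_min, (lo, hi) in product(pt_cuts, mass_windows)
    }

pt_cuts = [0, 1, 2]
mass_windows = [(4, 4), (4, 5)]
grid = grid_from_support(pt_cuts, mass_windows)
assert grid == grid_by_scan(pt_cuts, mass_windows)  # exactness check first
print(grid[(2, (4, 5))])  # 5
```

The saving comes from the number of distinct signatures being much smaller than the number of events, which is what makes a repeated family of cuts cheap once the support is built.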

How to read the results

SI is strongest when many related queries are asked repeatedly over the same observed support. Narrow pre-aggregates can still win for a single fixed aggregate. The HEP result should be read as evidence for a complementary repeated-query layer, not as a claim to replace established HEP analysis stacks.
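The trade-off between building a memory layer and rescanning can be made concrete with one simple cost model, assumed here for illustration and not necessarily the calibration used in the report. With build cost B, per-query scan cost s and per-query index cost q, N repeated scans cost N·s while the index path costs B + N·q, so break-even falls at N* = B / (s - q) and the build-inclusive speedup is N·s / (B + N·q). The cost values below are hypothetical.

```python
def break_even_queries(build_cost, scan_cost, query_cost):
    """Query count N at which build + N*query_cost equals N*scan_cost."""
    if scan_cost <= query_cost:
        return float("inf")  # the index never pays for itself
    return build_cost / (scan_cost - query_cost)

def amortized_speedup(n_queries, build_cost, scan_cost, query_cost):
    """Build-inclusive speedup of the index path over N repeated scans."""
    return (n_queries * scan_cost) / (build_cost + n_queries * query_cost)

# Hypothetical costs in arbitrary time units (not taken from the report).
build, scan, query = 10.0, 1.0, 0.001
print(round(break_even_queries(build, scan, query), 2))       # 10.01
print(round(amortized_speedup(100, build, scan, query), 2))   # 9.9
print(round(amortized_speedup(1000, build, scan, query), 2))  # 90.91
```

Under this model the build-inclusive speedup grows roughly in proportion to N while the build cost dominates, and approaches s / q as N becomes large.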

Evidence across domains

One engine pattern, several state-signature workloads.

These materials show SI as a reusable exact query layer. The domain changes the translation into states, segments and outcomes; the public claim remains the same: exact, fast, verifiable state-signature query serving.

Scale benchmark

Public ClickBench 100M-event workload

500,000 hidden exact, threshold, near-miss, broad-to-exact and segment-conditioned queries over nearly 100M public clickstream events.

  • 99,997,497 rows / events
  • 500,000 total hidden queries
  • 0 correctness mismatches
  • Speedups up to ~439,441× vs named baseline
Open benchmark summary →

AI infrastructure

GPU cluster telemetry risk memory

Public Alibaba GPU Cluster Trace workload: recurring telemetry-state signatures, tail bottlenecks, fail-slow states, near-miss incidents and outcome counts.

  • 3,033,232 telemetry-state rows
  • 300 hidden queries
  • 0 mismatches vs reference
  • 513.7× median speedup vs fastest public baseline
Open AI infrastructure deck →

HEP / research data

HEP-style repeated selection workload

Scientific event-level validation for repeated exact selections, region queries, cut-grid families, neighborhood counts and segment-conditioned queries.

  • 0 correctness mismatches
  • ~652.6× canonical Z-region selection vs repeated scan
  • ~2,136× segment-conditioned family count vs repeated scan
  • Calibrated scaling estimates up to ~892× in repeated-query scenarios
Open technical report →

Quant research

Financial-state query memory

Controlled post-dataset query layer for exploratory financial analysis: exact regimes, threshold variants, near-miss cohorts and historical outcome counts.

  • 7.5M instrument-date rows
  • 375M formula values at 50 formulas
  • 0 mismatches in internal checks
  • Build-inclusive break-even within normal exploratory query counts
Internal case material available on request

Industrial fit

Where the engine is naturally useful.

Signature Index is strongest where the problem can be expressed as finite objects, multi-level states, composed signatures, segments and measurable outcomes.
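One common way to turn several multi-level state dimensions into a single composed signature is mixed-radix encoding, where each dimension contributes one digit whose base is its number of levels. The sketch below is an assumption about representation, with invented dimension names and cardinalities; the source does not say how SI composes signatures.

```python
from math import prod

# Hypothetical dimensions and their number of levels (cardinalities).
DIMS = [("temperature", 3), ("vibration", 4), ("mode", 2)]
RADICES = [k for _, k in DIMS]

def compose(levels):
    """Pack one level per dimension into a single integer signature
    (mixed-radix encoding; the first dimension is most significant)."""
    code = 0
    for lvl, k in zip(levels, RADICES):
        if not 0 <= lvl < k:
            raise ValueError(f"level {lvl} out of range for radix {k}")
        code = code * k + lvl
    return code

def decompose(code):
    """Inverse of compose: recover the per-dimension levels."""
    levels = []
    for k in reversed(RADICES):
        code, lvl = divmod(code, k)
        levels.append(lvl)
    return tuple(reversed(levels))

n_signatures = prod(RADICES)  # size of the finite signature space: 3 * 4 * 2
print(n_signatures)           # 24
print(compose((2, 1, 1)))     # 19
print(decompose(19))          # (2, 1, 1)
```

The encoding is a bijection onto 0..n_signatures-1, so composed signatures can serve as compact integer keys while still decoding back to per-dimension states for drilldown.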

Asset performance management

Exact and near-miss memory over asset states, alarms, conditions and historical outcomes.

Predictive maintenance support

A deterministic evidence layer underneath predictive or diagnostic systems.

Manufacturing quality

Drilldown from broad defect classes to exact process-state signatures.

Process-state exploration

Fast queries over thresholds, operating regimes, recipes, units, sites and outcomes.

Private evaluation without disclosure

SI can be evaluated as a controlled black-box engine.

A practical evaluation can be scoped around an agreed state-signature workload and validated by reference counts, hidden answer keys or agreed audit outputs. The public artifact includes a semantic reference implementation; private evaluations can use domain-oriented evaluation harnesses without exposing proprietary internals.

01 Case pack: Agree on state dimensions, segments, outcomes and query families upfront.

02 Black-box run: Run SI as an engine on a public, synthetic, anonymized or partner-defined state matrix.

03 Validation: Check exactness against reference counts, hidden answer keys, hashes or an agreed output schema.

04 Metrics: Report mismatch count, latency, throughput, memory, build time and speedup vs agreed baselines.
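The validation step can be sketched as two checks: a per-query comparison against reference counts, and a digest comparison that lets an answer key stay hidden while still being verifiable. The canonical JSON encoding, the SHA-256 choice, the query ids and the helper names are assumptions for illustration, not a prescribed evaluation format.

```python
import hashlib
import json

def answer_digest(answers):
    """SHA-256 over a canonical JSON encoding of {query_id: count}.
    Sorting keys makes the digest independent of result order."""
    canonical = json.dumps(answers, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def mismatches(answers, reference):
    """Query ids whose count differs from the reference (or is missing)."""
    return sorted(q for q in reference if answers.get(q) != reference[q])

# Hypothetical outputs: engine answers vs a hidden reference answer key.
engine = {"q002": 7, "q001": 42, "q003": 0}
reference = {"q001": 42, "q002": 7, "q003": 0}

print(mismatches(engine, reference))                      # []
print(answer_digest(engine) == answer_digest(reference))  # True
```

With this split, an evaluator can publish only the digest of the answer key up front and reveal the full key later for per-query mismatch reporting.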

Looking for 2–3 private evaluation cases

The best fit is a repeated exact state-signature workload: asset telemetry, process states, alarm histories, quality events, downtime records, GPU-cluster telemetry or high-dimensional research states.

Boundaries

What Signature Index does not claim.

Materials

Download the short technical materials.

The public materials include the technical report, reference artifact, benchmark summaries and domain-oriented evidence. The public code is a semantic reference implementation, not a production-optimized engine.