Universal Data Layer

1 Purpose

The Universal Data Layer (UDL) is Orbofi’s retrieval‑oriented substrate that transforms a natural‑language agent declaration into a fully contextualized, real‑time‑aware AI agent. It combines:

  • Domain Classification – multi‑label transformer that maps free‑text to one or more domain ontologies.

  • Ontology Mapping – a hierarchical knowledge graph that standardizes tags, entities, and relations.

  • Data‑Source Resolution – adapter layer that binds ontological tags to first‑party & third‑party data APIs.

  • Function Calling Gateway – standardized OpenAI function‑schema interface that injects tools & live data.

  • Context Cache – low‑latency vector store (HNSW index, PQ‑compressed vectors) that co‑locates retrieved snippets with the agent’s runtime.

Together, these components deliver high‑fidelity context with millisecond‑scale retrieval for every agent.

Figure 1. Universal Data Layer Pipeline (see infographic below).

2 High‑Level Flow (Text → Agent)

  1. Text Definition – User submits: “You are an agent specialising in quantum computing.”

  2. Domain Classifier assigns labels: {quantum‑computing, physics, research‑papers}.

  3. Ontology Mapper expands labels to canonical entity IDs (e.g., arXiv category quant‑ph).

  4. Data‑Source Resolver attaches connectors (arXiv API, Qiskit docs, IBM qubit telemetry, etc.).

  5. Context Cache fetches & embeds the top‑K chunks (BM25 retrieval, then embedding re‑rank).

  6. Agent Runtime merges the cached context with the LLM prompt.

  7. Function Calling enables on‑demand calls (e.g., get_latest_arxiv("quantum error correction")).

  8. Enriched Agent responds with domain‑accurate, up‑to‑date answers.
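
The two‑stage retrieval in step 5 can be sketched as follows. This is a minimal illustration, not Orbofi's implementation: the function names (bm25_scores, retrieve, toy_embed) and the parameters candidates and top_k are assumptions, and toy_embed is a hashed bag‑of‑words stand‑in for a real embedding model.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    return vec


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(
    query: str,
    docs: Sequence[str],
    embed: Callable[[str], Sequence[float]],
    candidates: int = 20,
    top_k: int = 8,
) -> list[str]:
    """Stage 1: BM25 shortlist. Stage 2: re-rank the shortlist by embedding similarity."""
    lexical = bm25_scores(query, docs)
    shortlist = sorted(range(len(docs)), key=lambda i: lexical[i], reverse=True)[:candidates]
    q_vec = embed(query)
    reranked = sorted(shortlist, key=lambda i: cosine(q_vec, embed(docs[i])), reverse=True)
    return [docs[i] for i in reranked[:top_k]]
```

The cheap lexical pass keeps the embedding model off most of the corpus; only the shortlist is embedded and compared, which is the usual reason for ordering the stages this way.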

3 Core Components

3.1 Domain Classifier

  • Model: bert‑base‑multilingual‑cased, fine‑tuned on 1.2M labelled prompts.

  • Latency: ≤ 25 ms per request (ONNX Runtime, GPU, batch size 64).

  • Output: up to 8 domain tags with confidence scores.
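
How a multi‑label head might turn raw scores into "up to 8 domain tags with confidence scores" is sketched below. The per‑label sigmoid and the 0.5 threshold are assumptions for illustration; the source does not state how confidences are computed or thresholded, and select_domain_tags is a hypothetical name.

```python
import math


def select_domain_tags(
    logits: dict[str, float],
    threshold: float = 0.5,   # assumed cut-off, not from the source
    max_tags: int = 8,        # "up to 8 domain tags"
) -> list[tuple[str, float]]:
    """Apply an independent sigmoid per label, keep confident labels, cap the count."""
    scored = [(label, 1.0 / (1.0 + math.exp(-z))) for label, z in logits.items()]
    kept = [(label, p) for label, p in scored if p >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_tags]
```

Unlike a softmax, independent sigmoids let several domains score highly at once, which is what allows a prompt to map to {quantum‑computing, physics, research‑papers} together.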

3.2 Ontology Mapper

  • JSON‑LD knowledge graph (~2.4 M nodes, 18 M edges).

  • Supports transitive closure (is‑a, part‑of, related‑to).

  • Ensures canonical identifiers across overlapping domains.
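
Transitive closure over typed edges can be illustrated with a breadth‑first walk. The adjacency‑dict layout and the node names below are hypothetical; the actual JSON‑LD graph schema is not described in this document.

```python
from collections import deque

# Hypothetical layout: node -> relation -> list of target nodes.
Graph = dict[str, dict[str, list[str]]]


def closure(
    graph: Graph,
    start: str,
    relations: tuple[str, ...] = ("is-a", "part-of", "related-to"),
) -> set[str]:
    """Return every node reachable from `start` by following the given relations."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for rel in relations:
            for target in graph.get(node, {}).get(rel, []):
                # Skip already-visited nodes so cycles terminate.
                if target != start and target not in seen:
                    seen.add(target)
                    queue.append(target)
    return seen
```

Passing relations=("is-a", "part-of") restricts the walk to hierarchical edges, which may be preferable when related‑to links would pull in loosely connected domains.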

3.3 Data‑Source Aggregation

  • A curated, modular adapter set (HTTP & gRPC), extensible through drag‑and‑drop data‑source aggregators that plug in bespoke, public, or private feeds.

  • On‑the‑Fly Retrieval: If no native adapter exists for a given ontology tag or keyword, the resolver spins up a dynamic search worker (SerpAPI/Bing Web & News, RSS hubs, signed crawler) to pull the latest documents in real time.

  • Semantic Post‑Filtering: Freshly‑fetched docs are embedded, clustered, and re‑ranked against the agent’s ontology context to ensure topical precision before cache insertion.
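
The native‑adapter‑or‑fallback behaviour described above might look like the sketch below. DataSourceResolver, register, and fetch are hypothetical names, and the adapters are plain callables standing in for real HTTP/gRPC clients and search workers.

```python
from typing import Callable

# An adapter takes a query string and returns raw documents (assumed shape).
Adapter = Callable[[str], list[str]]


class DataSourceResolver:
    """Route an ontology tag to its native adapter, else to a dynamic search worker."""

    def __init__(self, fallback: Adapter) -> None:
        self._adapters: dict[str, Adapter] = {}
        self._fallback = fallback

    def register(self, tag: str, adapter: Adapter) -> None:
        self._adapters[tag] = adapter

    def fetch(self, tag: str, query: str) -> tuple[str, list[str]]:
        adapter = self._adapters.get(tag)
        if adapter is not None:
            return "native", adapter(query)
        # No native adapter: search on the tag plus the query text.
        return "dynamic-search", self._fallback(f"{tag} {query}")
```

Returning the route label alongside the documents lets downstream steps, such as semantic post‑filtering, treat freshly searched material more strictly than curated sources.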

3.4 Context Cache

  • Hybrid HNSW‑ANN + LRU hot‑cache.

  • Median retrieval latency 8 ms for K = 8 chunks.
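
The LRU hot‑cache half of this design can be sketched as a thin wrapper around any nearest‑neighbour search function. HotCache and its search callable are assumptions; the HNSW index itself is out of scope here and is represented only by that callable.

```python
from collections import OrderedDict
from typing import Callable


class HotCache:
    """LRU cache in front of a slower search(query, k) function."""

    def __init__(self, search: Callable[[str, int], list[str]], capacity: int = 1024) -> None:
        self._search = search
        self._capacity = capacity
        self._entries: OrderedDict[tuple[str, int], list[str]] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query: str, k: int = 8) -> list[str]:
        key = (query, k)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = self._search(query, k)
        self._entries[key] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result
```

Repeated queries are served from memory without touching the ANN index, so the index only pays for cold or evicted lookups.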

3.5 Function Calling Gateway

  • JSON‑schema registry; auto‑generates tool definitions per adapter.

  • Streams partial tool results back to the LLM incrementally, token by token.
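
Auto‑generating a tool definition from an adapter could work roughly as below. This is an assumption about the mechanism: it derives an OpenAI‑style function tool entry from a Python signature via inspect, and the get_latest_arxiv stub (including its max_results parameter) is hypothetical and not connected to arXiv.

```python
import inspect
from typing import Any, Callable

# Minimal mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_definition(func: Callable[..., Any]) -> dict[str, Any]:
    """Build a function-calling tool entry from a function's signature and docstring."""
    properties: dict[str, dict[str, str]] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def get_latest_arxiv(query: str, max_results: int = 5) -> list[str]:
    """Return the newest arXiv entries matching a query."""
    raise NotImplementedError("hypothetical adapter stub")
```

Calling tool_definition(get_latest_arxiv) yields a dictionary that can be placed in the tools list of a chat request; parameters with defaults are treated as optional.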

4 Real‑Time Enrichment Loop

  1. Trigger: The data source emits a webhook, or the polling interval elapses.

  2. Diff Detect: Changed docs are published to a Kafka topic keyed by agent_id.

  3. Re‑Index: Changed docs are re‑embedded (SBERT‑mini) and their vectors upserted.

  4. Notify Agent: Long‑poll channel pushes delta context; agent can optionally call refresh_context().
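
The diff‑detect and re‑index steps can be approximated with content hashing, as in the sketch below. It assumes change detection by SHA‑256 digest, which the source does not specify, and it leaves out the Kafka transport and agent notification entirely; IncrementalIndex and sync are hypothetical names.

```python
import hashlib
from typing import Callable


class IncrementalIndex:
    """Re-embed and upsert only the documents whose content changed."""

    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        self._embed = embed
        self._digests: dict[str, str] = {}
        self.vectors: dict[str, list[float]] = {}

    def sync(self, docs: dict[str, str]) -> list[str]:
        """Upsert new or changed docs; return their IDs as the delta."""
        changed: list[str] = []
        for doc_id, text in docs.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self._digests.get(doc_id) == digest:
                continue  # unchanged: skip the embedding call
            self._digests[doc_id] = digest
            self.vectors[doc_id] = self._embed(text)
            changed.append(doc_id)
        return changed
```

The list returned by sync is the natural payload for a delta notification, since it names exactly the entries whose vectors were replaced.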

5 Security & Governance

  • Auth: OAuth 2.1 with JWT; granular scopes (e.g., read:arxiv, write:dune).

  • PII Scrubbing: Regex + NER filter before embedding.

  • Audit Logs: Signed, append‑only (OpenTelemetry + AWS QLDB).
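
The regex half of PII scrubbing might resemble the sketch below. The patterns are illustrative assumptions and deliberately narrow: emails, plus phone numbers written with a leading "+". The NER pass mentioned above is not shown, and production patterns would need far broader coverage.

```python
import re

# Assumed patterns for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Requiring a leading "+" avoids matching numeric IDs such as arXiv identifiers.
PHONE = re.compile(r"\+\d[\d\s().-]{7,}\d")


def scrub(text: str) -> str:
    """Replace matched emails and phone numbers with placeholders before embedding."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Scrubbing before embedding matters because vectors are hard to redact afterwards; once a value is encoded into the store, removing it means re‑embedding the chunk.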

6 Example: Quantum‑Computing Agent

POST /v1/agents
{
  "description": "You are a quantum‑computing research assistant.",
  "goals": ["summarise latest quant‑ph papers", "compare error‑rates across qubit types"]
}
  • Within 300 ms the agent receives:

    • 5 latest arXiv abstracts (quant‑ph).

    • IBM Quantum service status.

    • Vector‑DB embeddings of Nielsen & Chuang textbook chapters.

7 Key Benefits

  • Lightning Setup: Any domain in < 4 s from plain text.

  • Always Fresh: Auto‑syncs when upstream data changes.

  • Composable: Plug‑and‑play adapters & function schemas.

  • Secure by Default: Field‑level ACLs and full audit trail.

