Universal Data Layer

1 Purpose
The Universal Data Layer (UDL) is Orbofi’s retrieval‑oriented substrate that transforms a natural‑language agent declaration into a fully contextualized, real‑time‑aware AI agent. It combines:
Domain Classification – multi‑label transformer that maps free‑text to one or more domain ontologies.
Ontology Mapping – a hierarchical knowledge graph that standardizes tags, entities, and relations.
Data‑Source Resolution – adapter layer that binds ontological tags to first‑party & third‑party data APIs.
Function Calling Gateway – standardized OpenAI function‑schema interface that injects tools & live data.
Context Cache – low‑latency vector store (HNSW, pq‑compressed) that co‑locates retrieved snippets with the agent’s runtime.
Together, these components deliver high‑fidelity context with millisecond‑scale retrieval for every agent.
Figure 1. Universal Data Layer Pipeline (see infographic below).
2 High‑Level Flow (Text → Agent)
Text Definition – User submits: “You are an agent specialising in quantum computing.”
Domain Classifier assigns labels: {quantum‑computing, physics, research‑papers}.
Ontology Mapper expands labels to canonical entity IDs (e.g., arXiv category quant‑ph).
Data‑Source Resolver attaches connectors (arXiv API, Qiskit docs, IBM qubit telemetry, etc.).
Context Cache fetches & embeds the top‑K chunks (BM25 → embedding re‑rank).
Agent Runtime merges the cached context with the LLM prompt.
Function Calling enables on‑demand calls (e.g., get_latest_arxiv("quantum error correction")).
Enriched Agent responds with domain‑accurate, up‑to‑date answers.
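The flow above can be condensed into a short Python sketch. Every helper here is a stand‑in for the corresponding UDL component, not Orbofi's production API:

def classify_domains(text: str) -> list[str]:
    # Stand-in for the Domain Classifier (section 3.1): free text -> domain tags.
    return ["quantum-computing", "physics", "research-papers"]

def map_to_ontology(tags: list[str]) -> list[str]:
    # Stand-in for the Ontology Mapper (section 3.2): tags -> canonical entity IDs.
    return ["arxiv:quant-ph"]

def resolve_data_sources(entities: list[str]) -> list[str]:
    # Stand-in for the Data-Source Resolver (section 3.3): IDs -> bound connectors.
    return ["arxiv_api", "qiskit_docs", "ibm_qubit_telemetry"]

def fetch_and_rank(adapters: list[str], query: str, k: int = 8) -> list[str]:
    # Stand-in for the Context Cache (section 3.4): BM25 filter -> embedding re-rank.
    return [f"chunk-{i} from {adapters[0]}" for i in range(k)]

def build_agent(definition: str) -> dict:
    tags = classify_domains(definition)
    entities = map_to_ontology(tags)
    adapters = resolve_data_sources(entities)
    context = fetch_and_rank(adapters, query=definition)
    # The Agent Runtime merges this context into the LLM prompt and exposes the
    # adapters as callable tools through the Function Calling Gateway (section 3.5).
    return {"prompt": definition, "context": context, "tools": adapters}

agent = build_agent("You are an agent specialising in quantum computing.")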
3 Core Components
3.1 Domain Classifier
Model: bert‑base‑multilingual‑cased fine‑tuned on 1.2M labelled prompts.
Latency: ≤ 25 ms per request (ONNX Runtime, GPU batch size 64).
Output: up to 8 domain tags with confidence scores.
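A minimal sketch of the multi‑label step, assuming a Hugging Face checkpoint with a sigmoid head; the label set is illustrative, and the untrained head below only demonstrates the score‑and‑threshold logic, not the production weights:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint: in production this is the fine-tuned multi-label model,
# so the freshly initialised head here yields meaningless scores.
MODEL_ID = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, num_labels=8, problem_type="multi_label_classification"
)
LABELS = ["quantum-computing", "physics", "research-papers", "finance",
          "crypto", "gaming", "biotech", "sports"]  # illustrative domain tags

def classify(text: str, threshold: float = 0.5) -> list[tuple[str, float]]:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits)[0]                 # independent per-label confidences
    return [(LABELS[i], float(p)) for i, p in enumerate(probs) if p >= threshold]

print(classify("You are an agent specialising in quantum computing."))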
3.2 Ontology Mapper
JSON‑LD knowledge graph (~2.4 M nodes, 18 M edges).
Supports transitive closure (is‑a, part‑of, related‑to).
Ensures canonical identifiers across overlapping domains.
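A minimal sketch of label expansion, shown here with networkx over a toy slice of the graph (node IDs and relations are illustrative; the production graph is JSON‑LD with ~2.4 M nodes):

import networkx as nx

# Toy ontology slice; each edge carries its relation type.
G = nx.DiGraph()
G.add_edge("quantum-computing", "physics", relation="is-a")
G.add_edge("quantum-error-correction", "quantum-computing", relation="part-of")
G.add_edge("arxiv:quant-ph", "quantum-computing", relation="related-to")

def expand(tag: str) -> set[str]:
    # Transitive closure over is-a / part-of / related-to:
    # everything reachable from the tag plus everything that reaches it.
    broader = nx.descendants(G, tag)    # concepts the tag rolls up into
    narrower = nx.ancestors(G, tag)     # finer-grained concepts and canonical IDs
    return {tag} | broader | narrower

print(expand("quantum-computing"))
# expands to physics, quantum-error-correction, and arxiv:quant-ph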
3.3 Data‑Source Aggregation
A curated, modular adapter set (HTTP & gRPC), with unlimited expansion via drag‑and‑drop data‑source aggregators: plug in public, private, or bespoke feeds.
On‑the‑Fly Retrieval: If no native adapter exists for a given ontology tag or keyword, the resolver spins up a dynamic search worker (SerpAPI/Bing Web & News, RSS hubs, signed crawler) to pull the latest documents in real time.
Semantic Post‑Filtering: Freshly fetched docs are embedded, clustered, and re‑ranked against the agent’s ontology context to ensure topical precision before cache insertion.
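A minimal sketch of the adapter contract and the on‑the‑fly fallback; the class names and registry layout are assumptions, not the production interface:

from typing import Protocol

class DataSourceAdapter(Protocol):
    """Contract every HTTP/gRPC adapter implements."""
    def fetch(self, query: str, limit: int = 10) -> list[dict]: ...

class ArxivAdapter:
    def fetch(self, query: str, limit: int = 10) -> list[dict]:
        # A real adapter calls the upstream API here (arXiv, RSS, telemetry, ...).
        return [{"source": "arxiv", "title": f"stub result for {query!r}"}]

class DynamicSearchWorker:
    """Fallback when no native adapter exists for an ontology tag:
    a web/news search plus signed crawler pulls documents in real time."""
    def fetch(self, query: str, limit: int = 10) -> list[dict]:
        return [{"source": "web-search", "title": f"stub result for {query!r}"}]

REGISTRY: dict[str, DataSourceAdapter] = {"arxiv:quant-ph": ArxivAdapter()}

def resolve(tag: str) -> DataSourceAdapter:
    # Native adapter if one is registered, otherwise spin up a dynamic search worker.
    return REGISTRY.get(tag, DynamicSearchWorker())

docs = resolve("arxiv:quant-ph").fetch("quantum error correction")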
3.4 Context Cache
Hybrid HNSW‑ANN + LRU hot‑cache.
Median retrieval latency 8 ms for K = 8 chunks.
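A minimal sketch of the hybrid lookup using hnswlib plus an LRU hot‑cache; the dimensions, parameters, and random placeholder embeddings are illustrative:

import hnswlib
import numpy as np
from functools import lru_cache

DIM, K = 384, 8                                   # e.g. SBERT-mini embedding size, top-K chunks

# ANN index over chunk embeddings (pq-compressed in production; random placeholders here).
index = hnswlib.Index(space="cosine", dim=DIM)
index.init_index(max_elements=10_000, ef_construction=200, M=16)
chunks = [f"chunk-{i}" for i in range(1_000)]
index.add_items(np.random.rand(1_000, DIM).astype(np.float32), ids=list(range(1_000)))

def embed(text: str) -> np.ndarray:
    # Stand-in for the real embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random((1, DIM), dtype=np.float32)

@lru_cache(maxsize=4_096)                         # hot-cache for repeated queries
def retrieve(query: str) -> tuple[str, ...]:
    labels, _ = index.knn_query(embed(query), k=K)
    return tuple(chunks[i] for i in labels[0])

print(retrieve("quantum error correction"))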
3.5 Function Calling Gateway
JSON‑schema registry; auto‑generates tool definitions per adapter.
Streams partial results back to the LLM via tokens.
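A minimal sketch of an auto‑generated tool definition in the OpenAI function‑schema format, matching the get_latest_arxiv call from section 2 (the parameter names are assumptions, not the production registry):

# Tool definition the gateway would register for the arXiv adapter and pass
# to the LLM via the `tools` parameter, streaming partial results back.
GET_LATEST_ARXIV = {
    "type": "function",
    "function": {
        "name": "get_latest_arxiv",
        "description": "Fetch the most recent arXiv abstracts matching a query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string",
                          "description": "Search phrase, e.g. 'quantum error correction'."},
                "max_results": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
}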
4 Real‑Time Enrichment Loop
Trigger: Data‑source emits webhook or polling interval hits.
Diff Detect: Kafka topic keyed by agent_id publishes changed docs.
Re‑Index: Changed vectors re‑embedded (SBERT‑mini) and upserted.
Notify Agent: Long‑poll channel pushes delta context; agent can optionally call refresh_context().
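A minimal sketch of the diff‑detect and re‑index steps, assuming kafka-python and sentence-transformers; the topic name, message shape, and upsert target are assumptions:

import json
from kafka import KafkaConsumer
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")        # stand-in for SBERT-mini

def upsert(agent_id: str, doc_id: str, vector) -> None:
    # Stand-in for the vector-store upsert and the delta push to the agent.
    print(f"re-indexed {doc_id} for agent {agent_id}")

# Assumed message shape: {"agent_id": "...", "doc_id": "...", "text": "..."}
consumer = KafkaConsumer(
    "udl.changed-docs",                                # assumed topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    doc = message.value
    vector = model.encode(doc["text"])                 # re-embed the changed document
    upsert(doc["agent_id"], doc["doc_id"], vector)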
5 Security & Governance
Auth: OAuth 2.1 w/ JWT; granular scopes (read:arxiv, write:dune).
PII Scrubbing: Regex + NER filter before embedding.
Audit Logs: Signed, append‑only (OpenTelemetry + AWS QLDB).
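A minimal sketch of the PII scrubbing pass, combining a regex layer with spaCy NER; the patterns, model name, and entity labels are illustrative:

import re
import spacy

# Regex layer for structured PII (illustrative patterns, not exhaustive).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}
nlp = spacy.load("en_core_web_sm")                     # NER layer for names, orgs, places

def scrub(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    doc = nlp(text)
    for ent in reversed(doc.ents):                     # reversed so character offsets stay valid
        if ent.label_ in {"PERSON", "ORG", "GPE"}:
            text = text[:ent.start_char] + f"[{ent.label_}]" + text[ent.end_char:]
    return text

print(scrub("Contact Alice Chen at alice@example.com about the IBM telemetry feed."))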
6 Example: Quantum‑Computing Agent
POST /v1/agents
{
  "description": "You are a quantum‑computing research assistant.",
  "goals": ["summarise latest quant‑ph papers", "compare error‑rates across qubit types"]
}

Within 300 ms the agent receives:
5 latest arXiv abstracts (quant‑ph).
IBM Quantum service status.
Vector‑DB embeddings of Nielsen & Chuang textbook chapters.
7 Key Benefits
Lightning Setup: Any domain in < 4s from plain text.
Always Fresh: Auto‑syncs when upstream data changes.
Composable: Plug‑and‑play adapters & function schemas.
Secure by Default: Field‑level ACLs and full audit trail.