# Universal Data layer

\ <br>

<figure><img src="https://169289273-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F27jJiqws9JFvjcsPeR1J%2Fuploads%2Fgs0EIzgnwxwbfPtByZhQ%2Fimage.png?alt=media&#x26;token=0e19df4c-1de4-461d-98a2-3ee6085ab87a" alt=""><figcaption></figcaption></figure>

Purpose

The Universal Data Layer (UDL) is Orbofi’s retrieval‑oriented substrate that transforms a natural‑language **agent declaration** into a fully contextualized, real‑time‑aware AI agent. It combines:

* **Domain Classification** – multi‑label transformer that maps free‑text to one or more domain ontologies.
* **Ontology Mapping** – a hierarchical knowledge graph that standardizes tags, entities, and relations.
* **Data‑Source Resolution** – adapter layer that binds ontological tags to first‑party & third‑party data APIs.
* **Function Calling Gateway** – standardized OpenAI function‑schema interface that injects tools & live data.
* **Context Cache** – low‑latency vector store (HNSW, pq‑compressed) that co‑locates retrieved snippets with the agent’s runtime.

Together, these components deliver high‑fidelity context with millisecond‑scale retrieval for every agent.

> **Figure 1.** Universal Data Layer Pipeline (see infographic below).

### 2  High‑Level Flow (Text → Agent)

1. **Text Definition** – User submits: “You are an agent specialising in *quantum computing*.”
2. **Domain Classifier** assigns labels: *{quantum‑computing, physics, research‑papers}*.
3. **Ontology Mapper** expands labels to canonical entity IDs (e.g., arXiv category `quant‑ph`).
4. **Data‑Source Resolver** attaches connectors (arXiv API, Qiskit docs, IBM qubit telemetry, etc.).
5. **Context Cache** fetches & embeds the top‑K chunks (BM25‑>embedding re‑rank).
6. **Agent Runtime** merges the cached context with the LLM prompt.
7. **Function Calling** enables on‑demand calls (e.g., `get_latest_arxiv("quantum error correction")`).
8. **Enriched Agent** responds with domain‑accurate, up‑to‑date answers.

### 3  Core Components

#### 3.1  Domain Classifier

* **Model:** `bert‑base‑multilingual‑cased` finetuned on 1.2M labelled prompts.
* **Latency:** ≤ 25 ms per request (ONNX‑runtime, GPU batch size 64).
* **Output:** up to 8 domain tags with confidence scores.

#### 3.2  Ontology Mapper

* JSON‑LD knowledge graph (\~2.4 M nodes, 18 M edges).
* Supports transitive closure (`is‑a`, `part‑of`, `related‑to`).
* Ensures canonical identifiers across overlapping domains.

#### 3.3  Data‑Source aggregation

* A curated, modular adapter set (HTTP & gRPC). Unlimited expansion via our drag and drop data-source aggregators: plug in bespoke/public or private feeds&#x20;

* **On‑the‑Fly Retrieval:** If no native adapter exists for a given ontology tag or keyword, the resolver spins up a **dynamic search worker** (SerpAPI/Bing Web & News, RSS hubs, signed crawler) to pull the latest documents in real time.

* **Semantic Post‑Filtering:** Freshly‑fetched docs are embedded, clustered, and re‑ranked against the agent’s ontology context to ensure topical precision before cache insertion.

#### 3.4  Context Cache

* Hybrid HNSW‑ANN + LRU hot‑cache.
* Median retrieval latency 8 ms for K = 8 chunks.

#### 3.5  Function Calling Gateway

* JSON‑schema registry; auto‑generates tool definitions per adapter.
* Streams partial results back to the LLM via tokens.

### 4  Real‑Time Enrichment Loop

1. **Trigger:** Data‑source emits webhook or polling interval hits.
2. **Diff Detect:** Kafka topic keyed by `agent_id` publishes changed docs.
3. **Re‑Index:** Changed vectors re‑embedded (SBERT‑mini) and upserted.
4. **Notify Agent:** Long‑poll channel pushes delta context; agent can optionally call `refresh_context()`.

### 5  Security & Governance

* **Auth:** OAuth 2.1 w/ JWT; granular scopes (`read:arxiv`, `write:dune`).
* **PII Scrubbing:** Regex + NER filter before embedding.
* **Audit Logs:** Signed, append‑only (OpenTelemetry + AWS QLDB).

### 6  Example: Quantum‑Computing Agent

```
POST /v1/agents
{
  "description": "You are a quantum‑computing research assistant.",
  "goals": ["summarise latest quant‑ph papers", "compare error‑rates across qubit types"]
}
```

* Within 300 ms the agent receives:
  * 5 latest arXiv abstracts (quant‑ph).
  * IBM Quantum service status.
  * Vector‑DB embeddings of Nielsen & Chuang textbook chapters.

### 7  Key Benefits

* **Lightning Setup:** Any domain in < 4s from plain text.
* **Always Fresh:** Auto‑syncs when upstream data changes.
* **Composable:** Plug‑and‑play adapters & function schemas.
* **Secure by Default:** Field‑level ACLs and full audit trail.

***

<br>
