The Cognitive Operating System That Builds Itself.
RiNET is not an application. RiNET is an operating system — modular, autonomous, predictive. One brain. Six platforms. 32M+ structured records. Zero downtime. Sovereign EU infrastructure. Zero cloud dependency for inference. While the industry POCs LangChain demos, we operate.
Three servers. One mind.
EU-sovereign, three-node cognitive mesh. WireGuard over 10.10.0.0/24. Cognition on dedicated GPU, memory on the I/O node, public surface at the edge — every layer split for blast-radius isolation.
Six platforms. One mind.
Every platform shares the same cognitive core — what one learns, all know. What one ingests, all access. Cross-platform knowledge propagation in real time, anchored to a sovereign knowledge graph.
The system that builds itself.
700+ verified facts ingested per hour. 5 AI models orchestrated in parallel. GPU saturated 24/7 — by design. Knowledge growing exponentially. Smarter every day than the day before. If you are still wiring Supabase + LangChain + a hosted vector DB, you are already obsolete.
- LLM smart routing: DeepSeek / Groq / vLLM-local / Ollama, selected per request
- Self-learning: 26,000+ facts/day, zero skip rate, evidence-anchored
- Forensic engine: 20,360+ findings, entity risk profiling
- Neo4j knowledge graph: 111K+ nodes, entity risk scoring across companies and persons, temporal relationships
- 14 specialized AI persona agents — legal, financial, journalist, prosecutor, auditor, social-media analyst, data scientist...
- DABI chat: 30+ patterns, 100% DB-grounded, zero hallucinations
- Redis cache: 24-56× faster on heavy queries, hot-path sub-second
- EVERYTHING passing through the system stays in the brain. Forever. WORM-retained. Tamper-proof.
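A 24-56× Redis speedup on heavy queries typically comes from a plain cache-aside pattern: check the cache, fall back to the expensive query, store the result with a TTL. A minimal sketch of that pattern — an in-memory dict stands in for Redis, and the key name, TTL, and `heavy_query` are illustrative, not RiNET's actual code:

```python
import time
from typing import Any, Callable

# In-memory stand-in for the Redis client; a real deployment would use
# redis.Redis with SETEX/EX for TTL-based expiry.
_cache: dict[str, tuple[float, Any]] = {}

def cached(key: str, ttl: float, compute: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value if still fresh, else recompute and store."""
    now = time.monotonic()
    hit = _cache.get(key)
    if hit is not None and now - hit[0] < ttl:
        return hit[1]                 # hot path: no heavy query
    value = compute()                 # cold path: run the expensive query once
    _cache[key] = (now, value)
    return value

calls = 0
def heavy_query() -> int:
    global calls
    calls += 1                        # count how often the slow path runs
    return 42

first = cached("report:q1", ttl=60.0, compute=heavy_query)
second = cached("report:q1", ttl=60.0, compute=heavy_query)
print(first, second, calls)  # → 42 42 1 — the heavy query ran only once
```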
$ rinet status --live
CORE.............. ONLINE
Six models. One GPU. Sub-second response.
vLLM and Ollama running side-by-side on a single RTX 4000 SFF Ada. BGE-M3 embedder over CUDA at 1024-dim. Three reranker instances behind a load-balanced pool. Custom QLoRA fine-tunes shipped to production. A smart router that picks DeepSeek, Groq, vLLM-local, or Ollama per request — not a fixed waterfall, a routing decision. All EU sovereign, all under one shell.
vLLM
PagedAttention serving · OpenAI-compatible
- Qwen2.5-7B-Instruct-AWQ
- 4-bit AWQ · half precision
- 8,192-token context
- 40% GPU mem · enforce-eager
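Those four bullets map one-to-one onto vLLM launch flags. A plausible invocation, assuming a recent vLLM with the `vllm serve` entrypoint (flag names can shift between versions — check yours):

```shell
# Serve Qwen2.5-7B AWQ on the OpenAI-compatible endpoint:
# 4-bit AWQ weights, half precision, 8,192-token window,
# 40% of GPU memory reserved, eager mode (no CUDA graphs).
vllm serve Qwen/Qwen2.5-7B-Instruct-AWQ \
  --quantization awq \
  --dtype half \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.40 \
  --enforce-eager
```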
Ollama
6 local models · hot-swap
- qwen2.5:7b · 4.4 GB
- dabi-gemma · 8.0 GB
- dabi-budget · 4.4 GB · LoRA-merged
- dabi-3b-hr · 5.8 GB · Croatian
- gemma3:4b · 3.1 GB
- nomic-embed-text · 0.3 GB
BGE-M3 embedder
CUDA · v2.0-fixed · 5+ days uptime
- 1,024-dim dense vectors
- Multilingual · 100+ languages
- Powers 18.9 M vectors / 54 Qdrant collections
- Self-healing health check daemon
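Under the hood, querying those collections reduces to cosine similarity between unit-normalized dense vectors. A toy sketch — random 1024-dim vectors stand in for BGE-M3 embeddings, and a brute-force scan stands in for Qdrant's index:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# 5 stored "documents" and a query that is a noisy copy of document 2.
corpus = normalize(rng.standard_normal((5, 1024)))
query = normalize(corpus[2] + 0.05 * rng.standard_normal(1024))

scores = corpus @ query           # cosine similarity (all vectors unit-norm)
best = int(np.argmax(scores))
print(best)  # → 2: the perturbed source document ranks first
```

Random high-dimensional unit vectors are nearly orthogonal (cosine ≈ 0), which is why the one genuinely related document stands out so cleanly.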
Custom LoRAs
Fine-tunes shipped to production
- dabi-budget LoRA: 92.7% train / 91.0% eval token accuracy
- 3 epochs · 783 steps · 1 h 37 m on RTX 4000 SFF Ada (20 GB)
- dabi-qlora-qwen2.5-3b-v1: HR-domain adapter
- Daily training timer present, currently paused — reactivate when training data refresh window opens
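Figures like "92.7% train / 91.0% eval token accuracy" are conventionally computed as the share of supervised tokens whose argmax prediction matches the label, with prompt and padding positions masked out (the `-100` convention from Hugging Face). A minimal sketch of that metric, not RiNET's evaluation code:

```python
import numpy as np

def token_accuracy(logits: np.ndarray, labels: np.ndarray,
                   ignore_index: int = -100) -> float:
    """Fraction of supervised tokens whose argmax prediction matches the label.

    Positions labelled `ignore_index` (prompt/padding under the Hugging Face
    masking convention) are excluded from numerator and denominator alike.
    """
    preds = logits.argmax(axis=-1)
    mask = labels != ignore_index
    return float((preds[mask] == labels[mask]).mean())

# 1 sequence, 6 positions, vocab of 4; the last two positions are masked.
logits = np.zeros((1, 6, 4))
labels = np.array([[1, 2, 3, 0, -100, -100]])
for pos, tok in enumerate([1, 2, 0, 0]):   # the model "predicts" these tokens
    logits[0, pos, tok] = 1.0

print(token_accuracy(logits, labels))  # 3 of 4 supervised tokens correct → 0.75
```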
Reranker pool
3 instances · load-balanced
- Cross-encoder relevance scoring
- Sub-100 ms typical re-rank on top-50
- Powers DABI chat + RAG hybrid retrieval
- Parallel pool · failure tolerance built-in
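The pool's behaviour can be sketched as round-robin dispatch with skip-on-failure. The scorers below are toy stand-ins (a word-overlap function in place of a real cross-encoder), and the class name is illustrative:

```python
import itertools
from typing import Callable

Scorer = Callable[[str, list[str]], list[float]]

class RerankerPool:
    """Round-robin over N reranker instances, skipping to the next on failure."""

    def __init__(self, instances: list[Scorer]):
        self._instances = instances
        self._ring = itertools.cycle(range(len(instances)))

    def rerank(self, query: str, docs: list[str]) -> list[str]:
        for _ in range(len(self._instances)):       # try each instance at most once
            idx = next(self._ring)
            try:
                scores = self._instances[idx](query, docs)
            except RuntimeError:
                continue                             # instance down → next in ring
            order = sorted(range(len(docs)), key=lambda i: -scores[i])
            return [docs[i] for i in order]
        raise RuntimeError("all reranker instances failed")

def down(query: str, docs: list[str]) -> list[float]:
    raise RuntimeError("connection refused")         # simulates a crashed instance

def overlap_scorer(query: str, docs: list[str]) -> list[float]:
    return [float(len(set(query.split()) & set(d.split()))) for d in docs]

pool = RerankerPool([down, overlap_scorer, overlap_scorer])
top = pool.rerank("court notice zagreb",
                  ["weather report", "court notice issued in zagreb"])
print(top[0])  # the overlapping document ranks first despite one dead instance
```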
LLM Smart Router
Per-request model selection (not a fixed waterfall)
- User-facing (brain, chat, app, api) → DeepSeek
- Editorial (portal, articles) → Groq llama-4-scout
- Background / bulk (self-learn, ingest) → vLLM-local
- Failsafe → Ollama dabi-gemma local
All four paths are live. The router picks based on request source, content type, and budget — failover is automatic when a primary returns a 5xx or times out, and the next tier is called in < 50 ms.
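A minimal sketch of that decision, assuming a tier table keyed by request source and exception-based failover. `ROUTES`, `route`, and the backend callables are illustrative, not RiNET's actual code:

```python
from typing import Callable

# Hypothetical tier table mirroring the routing rules listed above.
ROUTES = {
    "chat":   ["deepseek", "groq", "vllm-local", "ollama-local"],
    "portal": ["groq", "deepseek", "vllm-local", "ollama-local"],
    "ingest": ["vllm-local", "ollama-local"],
}

def route(source: str, backends: dict[str, Callable[[str], str]],
          prompt: str) -> tuple[str, str]:
    """Walk this source's tier list; fall through on timeout/connection errors."""
    for name in ROUTES.get(source, ["ollama-local"]):
        try:
            return name, backends[name](prompt)
        except (TimeoutError, ConnectionError):
            continue                  # primary failed → next tier, no request lost
    raise RuntimeError("all tiers exhausted")

def _deepseek_down(prompt: str) -> str:
    raise TimeoutError("upstream 504")   # simulate the primary timing out

backends = {
    "deepseek": _deepseek_down,
    "groq": lambda p: "groq:" + p,
    "vllm-local": lambda p: "vllm:" + p,
    "ollama-local": lambda p: "ollama:" + p,
}
tier, answer = route("chat", backends, "hello")
print(tier, answer)  # → groq groq:hello — automatic failover past DeepSeek
```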
It learns. Continuously. While you read this.
Every minute, dozens of new evidence-anchored facts land in the knowledge graph — extracted from court notices, procurement awards, financial filings, EU register changes, and the press wire. No human in the loop, no model fabrication: every fact carries a source row id and a trust tier. The counter below ticks the live total, drawn from /api/status and interpolated between fetches at the measured rate.
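The interpolation itself is one line: last fetched total plus measured rate times elapsed time. A sketch with illustrative names (700 facts/hour is roughly 0.2 facts/second):

```python
def interpolated_total(last_total: int, rate_per_sec: float,
                       seconds_since_fetch: float) -> int:
    """Estimate the live counter between status fetches.

    The UI polls the real total, measures the ingest rate, and linearly
    interpolates until the next poll. Function and argument names are
    illustrative, not the page's actual code.
    """
    return last_total + int(rate_per_sec * seconds_since_fetch)

# 50 s after a fetch that returned 31,998,000, at ~0.2 facts/second:
print(interpolated_total(31_998_000, 0.2, 50))  # → 31998010
```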
Services that fix themselves.
Ten self-heal daemons watch every other service. When one crashes, the watchdog catches the exit, restarts it inside a back-off envelope, and posts to Telegram if the same service keeps falling over. The cognition node alone runs 154 services right now; a manual systemctl restart is the exception, not the routine.
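A compressed sketch of that loop: exponential back-off between restarts, an alert once the same service has crashed N times. All names are illustrative, and the Telegram post is reduced to a string:

```python
def backoff_delays(base: float = 1.0, cap: float = 60.0,
                   attempts: int = 8) -> list[float]:
    """Exponential restart back-off envelope: 1, 2, 4, ... capped at `cap` seconds."""
    return [min(base * 2 ** n, cap) for n in range(attempts)]

def watchdog(restart_log: list[str], service: str, crashes: int,
             alert_after: int = 3) -> list[str]:
    """Restart on every crash; alert once the same service has fallen
    `alert_after` times in a row. The real daemons act on process exits
    and post to Telegram; here both are reduced to log strings."""
    alerts = []
    for n in range(crashes):
        delay = backoff_delays()[min(n, 7)]
        restart_log.append(f"restart {service} (after {delay}s back-off)")
        if n + 1 >= alert_after:
            alerts.append(f"ALERT: {service} crashed {n + 1}x")
    return alerts

log: list[str] = []
alerts = watchdog(log, "rinet-ingest", crashes=4)
print(len(log), len(alerts))  # → 4 2 — four restarts, alerts from the 3rd crash on
```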
- rinet-supervisor steps in if a service's heartbeat file is stale > 10 min.
- systemd triggers rinet-self-heal.service on a missed heartbeat.
- (Full inventory: systemctl list-units on the cognition node.)
Ri.NET — Civic Intelligence Operating System
Seven-layer architecture. 600+ autonomous agents. 18.6M vectors across 47 specialized Qdrant collections, including 1.26M court notices and 978k legal documents. Forensic engine surfaces 20,360+ conflict-of-interest signals. Hourly evidence-grounded RAG eval framework. Custom LoRA fine-tunes on dedicated EU GPUs. REST API access. EU sovereign infrastructure. Public documentation.
Technical deep-dive. Public documentation. Transparent pricing. Engineering team direct. rinet.dev →
Pluggable intelligence.
Every module runs standalone or inside any platform. One API call — instant intelligence.
The future is already operational.
While others talk about AI, Ri.NET is already building it.
If your city, company or project does not yet have an AI that thinks — someone else already does. The gap is widening every hour.
Watch demo →