▸ Sovereign Models · Inference + Fine-tuning

Open-weight LLMs.
Closed-tenant inference.

Run Llama, Mistral, Qwen, DeepSeek, Phi and Gemma on dedicated infrastructure in your tenant. OpenAI-compatible endpoint. Zero data egress. Audit-grade logs.

14 models · EU data residency · 0 egress · OpenAI-compatible

14 sovereign models, ready to serve.

Curated, quantized and packaged for cost-efficient inference. Hosted in EU AZs. Replicated on demand. Per-token billing with reserved capacity available.

llama-3.3-70b
Meta · open weight
70B

Top-tier generalist. Strong reasoning, multilingual, 128k context. Great default for new deployments.

q4 · q8 · fp16 · 128k ctx
from €0.40 / 1M tok
Use →
mistral-large
Mistral · open weight
123B

European frontier model. Excellent at multilingual reasoning, code, and structured output. 128k context.

q4 · fp16 · 128k ctx
from €0.55 / 1M tok
Use →
qwen-2.5-72b
Alibaba · open weight
72B

Strong coder, especially at Python and SQL. Best-in-class for tool calling and agentic workflows.

q4 · q8 · fp16 · 32k ctx
from €0.38 / 1M tok
Use →
deepseek-v3
DeepSeek · open weight
671B MoE

Frontier mixture-of-experts: ~37B parameters activated per token. Premier reasoning and math.

q4 · fp8 · 128k ctx
from €1.20 / 1M tok
Use →
phi-4
Microsoft · open weight
14B

Small but mighty. Reasoning that punches above its weight class. Excellent for cost-sensitive workloads.

q4 · q8 · fp16 · 16k ctx
from €0.12 / 1M tok
Use →
gemma-3-27b
Google · open weight
27B

Best-in-class multimodal. Reads images, OCRs documents. Strong default for visual-document workflows.

q4 · q8 · multimodal · 128k ctx
from €0.22 / 1M tok
Use →
llama-3.2-3b
Meta · open weight
3B

Edge-friendly mini-model. Sub-100ms TTFT. Perfect for autocomplete, classification, intent detection.

q4 · fp16 · 128k ctx
from €0.04 / 1M tok
Use →
nomic-embed-v2
Nomic · open weight
embed

Long-context embeddings. 8192-token chunks, MoE-routed. Use with your tenant Qdrant or Chroma.

fp16 · 768 dim · multilingual
from €0.02 / 1M tok
Use →
+ custom fine-tune
Your weights
LoRA

Bring your own LoRA, QLoRA, or fully fine-tuned weights. We host, version, and serve them. SafeTensors only.

any base · any quant · versioned
from €89 / mo
Upload →
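The nomic-embed-v2 card above mentions 8192-token chunks. Here is a minimal sketch of chunking a long document before embedding it; it assumes the tenant endpoint mirrors OpenAI's /v1/embeddings route (check your tenant docs), and the `chunk_text` / `embed_document` helper names are ours, not part of any SDK.

```python
# pip install openai  (reuses the same OpenAI-compatible client as the samples below)

def chunk_text(text: str, max_chars: int = 8192 * 4) -> list[str]:
    """Split a document into pieces that fit the 8192-token window,
    using a rough 4-characters-per-token heuristic. Swap in a real
    tokenizer for exact budgeting."""
    if not text:
        return [""]
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def embed_document(client, text: str, model: str = "nomic-embed-v2") -> list[list[float]]:
    """client is an openai.OpenAI instance pointed at your tenant
    base_url. Returns one vector per chunk (768-dim, per the card),
    ready to upsert into your tenant Qdrant or Chroma."""
    resp = client.embeddings.create(model=model, input=chunk_text(text))
    return [d.embedding for d in resp.data]
```

The character heuristic deliberately under-fills the window; precise budgeting needs the model's own tokenizer.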

Three guarantees, written into the contract.

01

No egress, ever

Inference happens on dedicated GPUs in EU AZs. Your prompts, completions and embeddings never leave the tenant. Egress is blocked at the VPC level as a technical control, not just a policy.

02

Customer-managed keys

BYO HSM-backed keys for encryption at rest and audit log signing. Rotate on your schedule. Pull a signed evidence pack any quarter for your auditors.

03

Auditable infrastructure

Every request is logged with the model version, parameters, latency and token counts. Logs are append-only, signed, and exportable as CSV or streamed to your SIEM.
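As a concrete picture of what "signed" can mean, here is a sketch of verifying one exported CSV row with HMAC-SHA256. The field list, separator, and key handling are illustrative assumptions, not the actual tenant scheme; the evidence pack defines the real one.

```python
import hashlib
import hmac

def verify_log_row(row: dict, key: bytes) -> bool:
    """Illustrative only: recompute an HMAC-SHA256 over the logged
    fields and compare it to the exported signature in constant time.
    Field order and canonicalisation here are assumptions."""
    payload = "|".join(row[k] for k in ("ts", "model", "latency_ms", "tokens"))
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, row["sig"])
```

Any tampering with a logged field, even a single token count, makes the recomputed digest diverge and the check fail.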

Your OpenAI code already works.

Change two lines — the base URL and the API key. Same endpoints, same response shapes, same SDKs.

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://api.kawkav.com/v1",
    api_key="kwk_…",  # your tenant key
)

resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a sovereign assistant."},
        {"role": "user", "content": "Summarize GDPR Article 25."},
    ],
    temperature=0.2,
)
print(resp.choices[0].message.content)
// npm i openai
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.kawkav.com/v1",
  apiKey: process.env.KAWKAV_KEY,
});

const r = await client.chat.completions.create({
  model: "mistral-large",
  messages: [
    { role: "system", content: "Be precise." },
    { role: "user", content: "Outline a HIPAA-safe RAG architecture." },
  ],
});
console.log(r.choices[0].message.content);
curl https://api.kawkav.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kwk_…" \
  -d '{
    "model": "qwen-2.5-72b",
    "messages": [
      {"role": "user", "content": "Generate a Postgres schema for clinical trials."}
    ]
  }'

Per-token, monthly minimum, or fully reserved.

Pay as you go
€0.04 / 1M tok
Starting price · phi-4 mini
  • Per-token billing, no monthly commit
  • All 14 catalog models
  • Shared-tenant inference pool
  • 99.5% SLA
  • EU data residency
Get an API key →
Reserved capacity
From €1,200/mo
Dedicated GPU(s) · single-tenant
  • Reserved GPU — A100, H100, B200
  • Unlimited tokens at served capacity
  • Pin specific model versions
  • Air-gap / on-prem option
  • DPA + SCC + DPIA support
  • 99.99% SLA available
Talk to enterprise →

Try it. Five minutes.

Sign up, get an API key, point your OpenAI client at us. Migrate later or stay forever — same SDK either way.

Get an API key → · Back to home