Server
The server side of Privatemode hosts the inference service and processes prompts securely. Its architecture is designed to be highly scalable while never compromising confidentiality.
It consists of three main parts:
- The AI inference application itself, the workers
- The Contrast framework for running the workers in a Confidential Computing Environment (CCE)
- A key management service, the secret service
Workers
Workers are central to the backend. Each worker hosts an AI model and serves inference requests. The inference code and the model are supplied externally, by the platform provider and the model provider respectively.
The containerized inference code, in the following referred to as AI code, runs in a secure and isolated environment.
Each worker is a confidential container running in Contrast's runtime environment, which isolates containers (more precisely, Kubernetes Pods) in confidential VMs (CVMs). The runtime CVM is minimal, immutable, and verifiable through remote attestation. The Contrast documentation describes it in more detail.
Inference code
The inference code is provided by an external party, such as Hugging Face TGI, vLLM, or NVIDIA Triton, and is frequently updated. In the case of Privatemode, the inference code is currently provided by vLLM. The inference code is included in the remote attestation flow.
This code operates within a confidential computing environment that encrypts all data in memory. Within this secure environment, the inference code can access user data. To ensure that the inference code doesn't leak user data, the system relies on remote attestation, enabling the client to review and verify the code's integrity and behavior before execution.
This architecture ensures that (1) the infrastructure can't access user data or the inference code, and (2) the inference code doesn't leak user data to unprotected memory, the disk, or the network.
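The verification step described above can be illustrated with a minimal sketch. It is not Privatemode's or Contrast's actual client API: `AttestationReport`, `is_trusted`, and `send_prompt_if_trusted` are hypothetical names, the report is reduced to two digests, and the check of the report's hardware signature that a real verifier performs is omitted. The sketch only shows the core idea: the client compares what the worker attests to against reference values it already trusts, and withholds the prompt if they differ.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class AttestationReport:
    """Hypothetical, heavily simplified stand-in for a hardware attestation report."""
    launch_measurement: str  # hex digest of the CVM's initial state
    workload_digest: str     # hex digest of the inference code image


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of data."""
    return hashlib.sha256(data).hexdigest()


def is_trusted(report: AttestationReport,
               trusted_measurements: set[str],
               trusted_workloads: set[str]) -> bool:
    """Accept the report only if both digests match a trusted reference value."""
    def matches(value: str, allowed: set[str]) -> bool:
        # compare_digest performs a constant-time comparison.
        return any(hmac.compare_digest(value, ref) for ref in allowed)

    return (matches(report.launch_measurement, trusted_measurements)
            and matches(report.workload_digest, trusted_workloads))


def send_prompt_if_trusted(prompt: str,
                           report: AttestationReport,
                           trusted_measurements: set[str],
                           trusted_workloads: set[str]) -> str:
    """Release the prompt only after the worker's report has been verified."""
    if not is_trusted(report, trusted_measurements, trusted_workloads):
        raise PermissionError("attestation failed; prompt withheld")
    # Placeholder for the real, encrypted request to the verified worker.
    return f"released {len(prompt)} characters to verified worker"
```

Because the inference code is updated frequently, the set of trusted reference values in such a scheme would have to be updated alongside each release; how those values are distributed is outside the scope of this sketch.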
Confidential computing environment
Confidential Computing Environments (CCEs) provide robust hardware-based security and workload isolation.
Encryption in transit (TLS) and at rest (disk encryption) have become widespread. Confidential computing closes the remaining gap: it protects data at runtime, while it's being processed, so that data stays encrypted throughout its entire lifecycle.
In Privatemode, all workloads run inside AMD SEV-SNP based Confidential VMs (CVMs).
With SEV-SNP, the memory of virtual machines (VMs) is encrypted. The processor manages the encryption keys and ensures they aren't accessible to untrusted software. Because encryption is hardware-accelerated, performance penalties are minimal. Memory encryption reduces the attack surface and shields workloads from:
- Unauthorized Access: Even if a malicious actor compromises the server-side system, including the hypervisor or other VMs, SEV-SNP's encryption keeps your data unreadable.
- Sophisticated Memory Attacks: SEV-SNP goes beyond confidentiality by adding integrity protection. It ensures that the data your VM reads is the same data it previously wrote, preventing tampering attempts.
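The integrity guarantee in the second point can be illustrated with a toy model. This is a conceptual sketch only, not how SEV-SNP works internally: the hardware enforces memory integrity through its Reverse Map Table rather than through software hashes, and `ToyProtectedMemory` and its methods are hypothetical names. The sketch only demonstrates the property itself: a read either returns exactly what the guest last wrote or fails.

```python
import hashlib


class ToyProtectedMemory:
    """Toy model: page contents sit where the host can modify them, while the
    guest keeps a private digest of what it last wrote to each page."""

    def __init__(self) -> None:
        self.host_visible_pages: dict[int, bytes] = {}  # the host can tamper here
        self._guest_digests: dict[int, bytes] = {}      # private to the guest

    def guest_write(self, page: int, data: bytes) -> None:
        """Store data and remember its digest."""
        self.host_visible_pages[page] = data
        self._guest_digests[page] = hashlib.sha256(data).digest()

    def guest_read(self, page: int) -> bytes:
        """Return the page only if it still matches what the guest last wrote."""
        data = self.host_visible_pages[page]
        if hashlib.sha256(data).digest() != self._guest_digests[page]:
            raise RuntimeError(f"integrity violation on page {page}")
        return data


memory = ToyProtectedMemory()
memory.guest_write(0, b"model weights")
untouched = memory.guest_read(0)           # succeeds: the page is unmodified

memory.host_visible_pages[0] = b"tampered"  # simulated host-side modification
try:
    memory.guest_read(0)
    tamper_detected = False
except RuntimeError:
    tamper_detected = True
```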