Overview
Continuum consists of two main parts: the server side and the client side. The server side hosts the AI service and processes prompts securely. The client side verifies the server, encrypts prompts, and sends inference requests to it. This page explains how these components interact and details their respective roles.
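The interaction boils down to a three-step client flow: verify, encrypt, send. The following sketch illustrates that flow only; the endpoint, key establishment, and wire format (a 12-byte nonce prepended to AES-GCM ciphertext) are assumptions made for this example, not Continuum's actual protocol or API.

```python
import os

import requests
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

WORKER_URL = "https://worker.example.com"  # hypothetical worker address


def verify_attestation(url: str) -> bytes:
    # Placeholder: a real client fetches the worker's attestation report,
    # checks the OS measurement, and derives a shared key from attested
    # key-exchange material. Here we just return a dummy key.
    return os.urandom(32)


def run_inference(prompt: str) -> bytes:
    key = verify_attestation(WORKER_URL)  # 1. verify the server
    aead = AESGCM(key)

    # 2. Encrypt the prompt under the key established during attestation.
    nonce = os.urandom(12)
    ciphertext = nonce + aead.encrypt(nonce, prompt.encode(), None)

    # 3. Send the inference request; only the verified worker can decrypt it.
    resp = requests.post(f"{WORKER_URL}/inference", data=ciphertext)

    # The worker's proxy returns an encrypted response; decrypt it locally.
    return aead.decrypt(resp.content[:12], resp.content[12:], None)
```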
Server-side architecture
The server side of Continuum hosts the inference service. Its architecture consists of two main components: the workers and the attestation service. We'll dive into each of these components in the following sections.
Worker
Worker nodes are central to the backend. They host an AI model and serve inference requests. The necessary inference code and model are provided externally by the platform provider and the model provider, respectively.
The containerized inference code, referred to in the following as the AI code, runs in a secure, isolated environment.
Each worker is a confidential VM (CVM) running Continuum's customized Linux OS, Continuum OS. This OS is minimal, immutable, and verifiable through remote attestation. Continuum OS hosts workloads in a secure sandbox environment and mediates network traffic through a server-side encryption proxy.
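Verifiability through remote attestation amounts to comparing measurements: the CVM's hardware produces signed evidence of what it booted, and the client checks that evidence against a known-good value. The sketch below shows this general idea with a hypothetical report format; the real flow additionally validates the hardware vendor's certificate chain and signature (e.g., AMD SEV-SNP or Intel TDX evidence).

```python
# Pinned launch measurement of Continuum OS, as the image provider would
# publish it so clients can reproduce and pin it. Dummy 48-byte value.
EXPECTED_OS_MEASUREMENT = bytes.fromhex("aa" * 48)


def verify_report(report: dict) -> None:
    # Hypothetical report format: a dict carrying the measured OS hash.
    # A real verifier first checks the hardware vendor's signature over
    # the report; that step is omitted here.
    if report["os_measurement"] != EXPECTED_OS_MEASUREMENT:
        raise ValueError("worker is not running the expected Continuum OS")
    # Only after this check does the client trust keys bound to the report.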
AI code sandbox
The AI code, provided by the platform provider, runs in a gVisor sandbox. In Continuum's case, the platform provider is vLLM. This sandbox isolates the AI code from the host, handling system calls in a user-space kernel and blocking network traffic to prevent data leaks.
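Inside the sandbox, the AI code is an ordinary inference server. With vLLM as the platform provider, the decrypted request that reaches it is a standard OpenAI-compatible call. The host, port, and model name below are vLLM defaults and example values, not details of Continuum's internal wiring.

```python
import requests

# What the proxy might forward into the sandbox after decryption: a plain
# OpenAI-compatible chat completion request, as served by vLLM.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # vLLM's default port
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # example model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```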
Server-side encryption proxy
The AI code sandbox has an attached proxy container, which is its only connection to the outside world. The proxy decrypts incoming requests from the client and forwards them to the sandbox. In the opposite direction, it encrypts responses and sends them back to the user.
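Conceptually, the proxy's request path is decrypt, forward, encrypt. The sketch below mirrors the hypothetical wire format from the client example above (a 12-byte nonce prepended to AES-GCM ciphertext) and assumes an internal vLLM endpoint; key establishment is handled by the attestation flow and is out of scope here.

```python
import os

import requests
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Assumed internal endpoint of the sandboxed AI code.
SANDBOX_URL = "http://localhost:8000/v1/chat/completions"


def handle_request(key: bytes, wire: bytes) -> bytes:
    """Decrypt a client request, forward it to the sandboxed AI code,
    and return the encrypted response."""
    aead = AESGCM(key)

    # Inbound: decrypt the client's request (nonce || ciphertext).
    plaintext = aead.decrypt(wire[:12], wire[12:], None)

    # Forward the decrypted request into the sandbox; the proxy is the
    # sandbox's only connection to the outside world.
    resp = requests.post(
        SANDBOX_URL,
        data=plaintext,
        headers={"Content-Type": "application/json"},
    )

    # Outbound: encrypt the response before it leaves the worker.
    nonce = os.urandom(12)
    return nonce + aead.encrypt(nonce, resp.content, None)
```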