Overview
The architecture of the Privatemode API is designed to:
- Provide strict confidentiality for your AI interactions.
- Make confidentiality and security transparently verifiable from the client.
Components
Components on both the client and server side work together to achieve these security properties.
Client
On the client side, a single component is essential.
The privatemode-proxy is run by the user and has two main responsibilities:
- Verifies the integrity and authenticity of the service before any inference communication begins.
- Encrypts all inference requests and decrypts AI responses.
As it handles all communication with the service, we also refer to it simply as the client or client software.
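The two responsibilities above can be sketched as follows. This is a minimal illustration, not Privatemode's actual implementation: the measurement value, function names, and the one-time-pad stand-in cipher are all assumptions (the real service uses an authenticated encryption scheme and hardware-backed attestation).

```python
import hashlib
import secrets

# Hypothetical reference value; in practice this is derived from a
# reproducible build of the server-side software.
EXPECTED_MEASUREMENT = hashlib.sha256(b"trusted-server-image").hexdigest()

def attest(reported_measurement: str) -> bool:
    # Responsibility 1: verify integrity and authenticity of the service
    # before any inference communication begins.
    return secrets.compare_digest(reported_measurement, EXPECTED_MEASUREMENT)

def xor(data: bytes, pad: bytes) -> bytes:
    # One-time-pad stand-in for the real authenticated encryption scheme.
    return bytes(a ^ b for a, b in zip(data, pad))

def send_prompt(prompt: bytes, reported_measurement: str) -> bytes:
    if not attest(reported_measurement):
        raise RuntimeError("attestation failed: refusing to send any data")
    # Responsibility 2: encrypt the request before it leaves the client.
    pad = secrets.token_bytes(len(prompt))
    ciphertext = xor(prompt, pad)
    # The AI worker inside the CCE would decrypt with the shared key:
    assert xor(ciphertext, pad) == prompt
    return ciphertext
```

Note the ordering: `send_prompt` refuses to transmit anything until attestation succeeds, mirroring the proxy's design.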
Server
The server architecture is designed to strictly isolate any Privatemode service from the rest of the infrastructure, preventing external access or unintended data leaks while ensuring confidential data processing.
Confidential Computing Environments (CCE) are set up to:
- Ensure strong isolation of all service components from the infrastructure and environment.
- Enforce runtime encryption of all data processed by the service.
- Provide provider-independent and transparent hash generation for integrity and authenticity verification.
The Attestation Service (AS) runs within a CCE and facilitates:
- Scalable authenticity and integrity checks of the GenAI endpoint.
- Secure key exchange between the client and the GenAI inference service to ensure end-to-end encryption.
The GenAI Inference Service runs within a CCE and:
- Processes inference requests using state-of-the-art LLMs.
- Ensures that models never learn from inference data, preventing unintended data leaks.
Architectural principles
The Privatemode API is built for seamless integration while maintaining strict confidentiality in all data interactions. This is achieved through our core architectural principles.
Seamless integration
The Privatemode API is designed for seamless usage and easy integration, handling all the complexities of confidential computing behind the scenes. On the client side, the key component is the privatemode-proxy, which manages remote attestation and end-to-end encryption.
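Because the proxy hides the confidential-computing machinery, client code can talk to it like any local HTTP API. The sketch below builds such a request with the standard library; the URL, port, and model name are illustrative assumptions, and the actual send is left commented out because it requires a running proxy.

```python
import json
import urllib.request

# Assumed local endpoint of the privatemode-proxy; adjust to your deployment.
PROXY_URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "latest",  # illustrative model name
    "messages": [
        {"role": "user", "content": "Summarize confidential computing."}
    ],
}

request = urllib.request.Request(
    PROXY_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request)  # attestation and encryption happen in the proxy
```

From the application's perspective this is an ordinary REST call; the proxy transparently attests the server and encrypts the payload before forwarding it.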
Verifiable security
Remote attestation is a cornerstone of confidential computing, and it plays a critical role in Privatemode.
In the context of Privatemode, the client uses remote attestation to verify that all server-side software components are both trustworthy and in their intended state. By leveraging independent cryptographic certificates and hardware-enforced signatures, remote attestation ensures that the GenAI endpoint is genuinely confidential, securely isolated, and running valid, trusted AI code.
Our open-source approach, combined with reproducible builds, ensures complete transparency in both the verification process and the security of our service for all API users.
Successful remote attestation is always a necessary precondition for any key exchange and prompt transfer.
To learn more about attestation in Privatemode, visit the dedicated section.
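The following sketch illustrates the shape of such an attestation check: the client sends a fresh nonce, receives a signed quote over the server's measurement, and accepts only if both the signature verifies and the measurement matches the expected reference value. Everything here is an assumption for illustration; in particular, real quotes are signed with vendor-rooted asymmetric keys in hardware, not the HMAC stand-in used below.

```python
import hashlib
import hmac
import secrets

# Stand-in for the hardware vendor's signing key (illustrative only).
VENDOR_KEY = b"hypothetical-vendor-key"
# Reference measurement, reproducible from the open-source build.
EXPECTED = hashlib.sha256(b"reference-server-build").digest()

def issue_quote(measurement: bytes, nonce: bytes) -> bytes:
    # Server side (hardware): bind the measurement to the client's nonce.
    return hmac.new(VENDOR_KEY, measurement + nonce, hashlib.sha256).digest()

def verify(measurement: bytes, nonce: bytes, quote: bytes) -> bool:
    # Client side: check the signature AND that the measurement matches
    # the expected value for the trusted build.
    expected_quote = hmac.new(VENDOR_KEY, measurement + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(quote, expected_quote) and measurement == EXPECTED

nonce = secrets.token_bytes(16)  # freshness: a replayed old quote won't verify
quote = issue_quote(EXPECTED, nonce)
```

The nonce is what makes the check fresh: a quote captured from an earlier session cannot be replayed against a new nonce.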
End-to-end encryption
By verifying the server side through remote attestation, the client ensures that prompt encryption keys are securely exchanged and stored. These keys are never shared with anyone except the privatemode-proxy on the client side and the AI worker running completely isolated within a CCE on the server side.
All prompts and responses are secured with modern encryption schemes, ensuring a confidential channel between the user and the AI.
You can find more details in our encryption section.
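As one way such a key exchange can work, both sides can derive the session key locally from a secret established after attestation, so the key itself never travels over the wire. The sketch below uses HKDF (RFC 5869) built from the standard library; the labels, salt, and key size are illustrative assumptions, not Privatemode's actual parameters.

```python
import hashlib
import hmac

def hkdf(shared_secret: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand."""
    prk = hmac.new(salt, shared_secret, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                      # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Both the proxy and the attested AI worker run the same derivation over the
# secret exchanged after successful attestation (values illustrative):
client_key = hkdf(b"shared-secret", b"session-salt", b"prompt-encryption")
server_key = hkdf(b"shared-secret", b"session-salt", b"prompt-encryption")
```

Because only the attested AI worker holds the other half of the exchange, no party outside the CCE, including the infrastructure provider, can derive the key.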
Protection against learning
Some attacks, known as training data extraction attacks, allow user data to be extracted directly from a model through carefully crafted prompts if that data was used for model training.
Because we can never access your prompts, we can't train our models on your data. As a result, our AI models never retain any information from your prompts, ensuring that no other API user can extract your data. Thanks to our open-source approach, this is fully transparent and verifiable by anyone.
Protection against the infrastructure
The Privatemode API uses confidential computing to shield the AI worker that processes your prompts on the server side. Essentially, the AI worker is a virtual machine (VM) that has access to an AI accelerator like the Nvidia H100 and runs the AI code. The AI code loads an AI model onto the accelerator, pre-processes prompts, and feeds them to the AI model. Privatemode applies confidential computing to both the VM and the AI accelerator and establishes a secure connection between the two.
With this approach, Privatemode shields the AI worker (and all data it processes) from the rest of the infrastructure. Here, "the infrastructure" includes the entire hardware and software stack that the AI worker runs on, as well as the people managing that stack.
Currently, Privatemode's server-side components typically run on Microsoft Azure. In that case, Microsoft Azure is "the infrastructure", and Privatemode's use of confidential computing ensures that Microsoft Azure can't access any of your data.
Protection against Edgeless Systems
We, at Edgeless Systems, are your GenAI SaaS provider. Confidential computing ensures that GenAI endpoints operate in a fully isolated environment. Independent cryptographic certificates and key material are used to establish the CCE. This setup is verifiable by the client through remote attestation and guarantees that the endpoints are trustworthy and can't be manipulated by us.
The Privatemode API further ensures that all data exchanged with the AI is end-to-end encrypted. Prompts and responses remain completely private.
Our open-source approach works hand-in-hand with confidential computing. Together, they establish a verifiable and confidential channel between you and the GenAI endpoint.
This design ensures that we can never access your data.