All attestation verification happens client-side. We have verifiers in Python [1] and Go [2] (the Go verifier is exposed over FFI to our other SDKs, like WASM and Swift). We push all the verification logic to the client so the verification process is entirely transparent and auditable.
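To make that concrete, here's a minimal sketch in Go of what the client-side check boils down to. The types and the single-key signature check are simplifications of mine, not the actual API of the repos below; real reports carry a certificate chain back to the vendor's root of trust.

```go
package verifier

import (
	"bytes"
	"crypto/ed25519"
	"fmt"
)

// Attestation is illustrative only, not the verifier's real wire format.
type Attestation struct {
	Measurement []byte // hash of the code the enclave claims to be running
	Signature   []byte // produced by the hardware's attestation key
}

// VerifyAttestation performs the two essential client-side checks:
// (1) the report is signed by the hardware vendor's key (simplified
// here to a single ed25519 key), and (2) the measured code matches
// the hash the build pipeline published.
func VerifyAttestation(vendorKey ed25519.PublicKey, att Attestation, expected []byte) error {
	if !ed25519.Verify(vendorKey, att.Measurement, att.Signature) {
		return fmt.Errorf("attestation not signed by vendor root of trust")
	}
	if !bytes.Equal(att.Measurement, expected) {
		return fmt.Errorf("enclave is not running the published code")
	}
	return nil
}
```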
An attacker would need to compromise our build pipeline to publish a backdoored VM image [1] and extract key material from the hardware to forge an attestation [2]. The build process publishes a hash of the code to Sigstore's transparency log [3], which would make the attack auditable.
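Here's a rough sketch of the cross-check an auditor (or the client itself) can do against the log; `fetchLoggedMeasurement` is a hypothetical stand-in for a real Sigstore lookup:

```go
package audit

import (
	"bytes"
	"fmt"
)

// fetchLoggedMeasurement is a hypothetical stub standing in for a real
// Sigstore client call: look up the code hash the build pipeline logged
// for a given repo and release.
func fetchLoggedMeasurement(repo, version string) ([]byte, error) {
	return nil, fmt.Errorf("stub: query the transparency log here")
}

// AuditRelease compares what the enclave attests to run against what the
// build published. A backdoored image either never made it into the log
// (so clients reject it) or sits in the log for everyone to see; that is
// the audit trail.
func AuditRelease(attestedMeasurement []byte, repo, version string) error {
	logged, err := fetchLoggedMeasurement(repo, version)
	if err != nil {
		return err
	}
	if !bytes.Equal(attestedMeasurement, logged) {
		return fmt.Errorf("running code does not match the logged build")
	}
	return nil
}
```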
That said, a sufficiently resourced attacker wouldn't need to inject a backdoor at all. If the attacker already possesses the keys (e.g. the attacker IS the hardware manufacturer, or has coerced the manufacturer into handing them over), they would just need to gain access to the host server (which we control) to reach the hypervisor, then use those keys to read memory or launch a new enclave with a forged attestation. We're planning to write a much more detailed "how to hack ourselves" blog post in the future.
We actually plan to run an experiment at DEFCON, likely next year, where we give SSH access to a test machine running the enclave and have people try to exfiltrate data from inside the enclave while keeping the machine running.
We have to trust that the hardware manufacturer (Intel/AMD/NVIDIA) designed their chips to execute the instructions we inspect, so we're assuming trust in the vendor's silicon either way.
The real benefit of confidential computing is to extend that trust to the source code too (the inference server, OS, firmware).
Hi Nate. I routinely use your various networking-related FOSS tools. Surprising to see you now working in the AI infrastructure space, let alone co-founding a YC-funded startup! Tinfoil looks über neat. All the best (:
The verified trust boundary extends from the CPU to the GPU [1], and TLS encrypts all data between the client and the enclave, so we can't see anything in the clear.
HTTP parsing and application logic happen on the CPU as normal. The GPU runs CUDA just like any other app, after its integrity is verified by the CPU. Data on the PCIe bus between the CPU and GPU is encrypted too.
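Roughly, the CPU-side admission flow looks like the sketch below. Every name in it is an illustrative stand-in of mine (the real flow goes through NVIDIA's confidential-computing driver and attestation tooling); the point is the ordering: verify the GPU's signed report first, only then enable encrypted PCIe transfers and schedule CUDA work.

```go
package enclave

import "fmt"

// GPUReport stands in for the GPU's signed attestation report.
type GPUReport struct {
	FirmwareHash []byte // measured GPU firmware
	Signature    []byte // signed by a key fused into the GPU
}

// Hypothetical stubs for the vendor-specific steps.
func verifyDeviceSignature(r GPUReport) bool { return false }
func firmwareIsKnownGood(h []byte) bool      { return false }
func enableEncryptedDMA()                    {}

// AdmitGPU runs inside the CPU enclave before any data touches the GPU.
func AdmitGPU(report GPUReport) error {
	if !verifyDeviceSignature(report) {
		return fmt.Errorf("GPU report not rooted in a vendor key")
	}
	if !firmwareIsKnownGood(report.FirmwareHash) {
		return fmt.Errorf("unexpected GPU firmware measurement")
	}
	enableEncryptedDMA() // session keys negotiated between CPU and GPU
	return nil
}
```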
Could you talk more about how this works? I don't think the linked article gives enough detail on how the trust boundary extends from the CPU to the GPU.
Does the CPU have the ability to see unencrypted data?
>You're not terminating the TLS connection from the client anywhere besides the enclave?
Yes.
>How do you load balance or front end all of this effectively?
We don't, at least not yet. That's why all our model endpoints have different subdomains. In the next couple of months, we're planning to generate a keypair inside the enclave using HPKE that will be used to encrypt the data, as I described in this comment: https://news.ycombinator.com/item?id=43996849
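Roughly, that flow would look something like the sketch below. I've written it with Cloudflare's CIRCL HPKE package, and the suite and `info` string are placeholders of mine; the only committed part is generating the keypair inside the enclave (with the public key bound into the attestation) and sealing client data to it.

```go
package main

import (
	"crypto/rand"
	"fmt"

	"github.com/cloudflare/circl/hpke"
)

func main() {
	suite := hpke.NewSuite(hpke.KEM_X25519_HKDF_SHA256, hpke.KDF_HKDF_SHA256, hpke.AEAD_ChaCha20Poly1305)

	// Enclave side: generate the keypair. The private key never leaves
	// the enclave; the public key is published via the attestation.
	pub, priv, err := hpke.KEM_X25519_HKDF_SHA256.Scheme().GenerateKeyPair()
	if err != nil {
		panic(err)
	}

	// Client side: seal a request to the enclave's public key.
	sender, err := suite.NewSender(pub, []byte("example-info"))
	if err != nil {
		panic(err)
	}
	enc, sealer, err := sender.Setup(rand.Reader)
	if err != nil {
		panic(err)
	}
	ct, err := sealer.Seal([]byte("prompt goes here"), nil)
	if err != nil {
		panic(err)
	}

	// Enclave side: open the request with the enclave-held private key.
	receiver, err := suite.NewReceiver(priv, []byte("example-info"))
	if err != nil {
		panic(err)
	}
	opener, err := receiver.Setup(enc)
	if err != nil {
		panic(err)
	}
	pt, err := opener.Open(ct, nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decrypted inside the enclave: %s\n", pt)
}
```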
AS34553 hosts my personal services (email, website, VM hosting, nuclear fusion reactor, etc.). It's an "autonomous system": the kind of network that lets you announce your own IP space with BGP.
[1] https://github.com/tinfoilsh/tinfoil-python

[2] https://github.com/tinfoilsh/verifier