> Bubblewrap and Docker are not hardened security isolation mechanisms, but that's okay with me.
Edit to add: my understanding is that the major flaw in this approach is potential bugs in the Linux kernel that would allow sandbox escape. I'd appreciate your insight if there are easier/more probable attack vectors.
I'm not saying sandboxes aren't needed; I'm saying VMs/containers already provide the core tech and it's easy to DIY a sandbox. I'd love to understand what value E2B offers over VMs.
That's right. But they (E2B) rely on the underlying cloud infra to achieve high scalability. Personally, I'm still not sure about the value they add on top of cloud-hosted VMs. GCP/AWS already offer huge discounts to startups, which should be enough for VM-based sandboxing of agents in the MVP phase.
This is exactly what I am building for a friend in a semi-amateur fashion with LLMs. Looking at your codebase, I would probably end up with something very similar in 6 months. You even have an Air toml and use Firecracker, not to mention Go. Great minds think alike, I suppose :D. Mine is not for AI but for running unvetted data science scripts. Simple stuff mostly. I am using rootless Podman (I think you are using Docker? Or perhaps Packer, which is a tool I didn't know about until now) to create the microVM images, and the images have no network access. We're creating a .ext4 disk image to bring in the data/script.
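The image-building step is roughly this - a minimal Go sketch, assuming mkfs.ext4 supports -d (mke2fs 1.43+); the paths and the 64 MiB size are placeholders, not taken from either project:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
)

// buildDataImage packs the contents of stagingDir into an ext4 image that
// a microVM can attach as a block device. mkfs.ext4 -d populates the
// filesystem from a directory, so no root or loop mount is needed.
func buildDataImage(stagingDir, imagePath string, sizeMiB int64) error {
	// Create a sparse file of the requested size.
	f, err := os.Create(imagePath)
	if err != nil {
		return err
	}
	if err := f.Truncate(sizeMiB * 1024 * 1024); err != nil {
		f.Close()
		return err
	}
	f.Close()

	// Format it as ext4 and copy the staging directory's contents in.
	cmd := exec.Command("mkfs.ext4", "-q", "-d", stagingDir, imagePath)
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// Placeholder paths/size: ./staging holds the script plus its data.
	if err := buildDataImage("./staging", "./data.ext4", 64); err != nil {
		log.Fatal(err)
	}
	fmt.Println("data.ext4 ready to attach as a Firecracker drive")
}
```

The resulting image then just gets attached to the microVM as an extra drive.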
I think I might just "take" this if the resource requirements are not too demanding. Thanks for sharing. Do you have docs for deploying on bare metal?
Not sure what your customers look like, but I for one would also be fine with "fair source" licenses (there are several - fair source, fair code, the Defold license, etc.).
These give customers 100% control but keep Amazon, Google, and other cling-on folks like WP Engine from reselling your work. It avoids the Docker, Elasticsearch, Redis fate.
"OSI" is a submarine from big tech hyperscalers that mostly take. We should have gone full Stallman, but fair source is a push back against big tech.
When we were starting out, we figured there was no solution that would satisfy our requirements for running untrusted code, so we had to build our own.
The reason we open-sourced this is that we want everyone to be able to run our Sandboxes - in contrast to the majority of our competitors, whose goal is to lock you into their offering.
With open source you have the choice, and luckily Manus, Perplexity, and Nvidia choose us for their workloads.
Regarding the browser instances: While VM boot times have definitely improved, accessing a site through a full browser render isn't always the most efficient way to retrieve information. Our goal is to get the most up-to-date information as fast as possible.
For example, something we may consider for the future is balancing when to implement direct API access versus browser rendering. If a website offers the same information via an API, that would almost always be faster and lighter than spinning up a headless browser, regardless of how fast the VM boots. While we don't support that hybrid approach yet, it illustrates why we are optimizing for the best tool for the job rather than just defaulting to a full browser every time.
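To make that concrete, here is a hypothetical sketch of that routing decision in plain Go stdlib; the endpoint map and the renderWithBrowser stand-in are made up for illustration, not our actual implementation:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// Known structured endpoints, keyed by host (illustrative only).
var apiEndpoints = map[string]string{
	"example.com": "https://api.example.com/v1/pages", // hypothetical
}

// fetchPage prefers a structured API when one is known and only falls
// back to a full headless-browser render when that fails.
func fetchPage(host, path string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}

	if api, ok := apiEndpoints[host]; ok {
		resp, err := client.Get(api + path)
		if err == nil && resp.StatusCode == http.StatusOK {
			defer resp.Body.Close()
			if body, readErr := io.ReadAll(resp.Body); readErr == nil {
				return string(body), nil // cheap path: no VM, no browser
			}
		}
		// API missing or unhealthy: fall through to the heavier path.
	}
	return renderWithBrowser(host, path)
}

// renderWithBrowser stands in for booting a VM and driving a headless
// browser; the real thing is out of scope for this sketch.
func renderWithBrowser(host, path string) (string, error) {
	return fmt.Sprintf("<rendered https://%s%s via headless browser>", host, path), nil
}

func main() {
	out, err := fetchPage("example.com", "/pricing")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(out)
}
```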
Regarding robots.txt: We agree. Not all potential customers are going to want a service that respects robots.txt or other content-owner-friendly policies. As I alluded to in another comment, we have a difficult task ahead of us to do our best by both the content owners and the developers trying to access that content.
As part of Mozilla, we have certain values that we work by and will remain true to. If that ultimately means some number of potential customers choose a competitor, that is a trade-off we are comfortable with.
Katakate is built on top of Kata and sets up a stack combining Kubernetes (K3s), Kata, Firecracker, and the devmapper snapshotter for thin-pool provisioning. Combining these tools is highly non-trivial and can be a headache for many, especially AI engineers who are often more comfortable with Python workflows. The stack gets deployed with an Ansible playbook. It implements a CLI, API, and Python SDK to make it super easy to use. A lot of defense-in-depth settings are also baked in so that you don't need to understand those systems at a low level to get a secure setup.
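At the Kubernetes layer the result is roughly this - a sketch using plain client-go rather than the Katakate SDK, where the RuntimeClass name "kata-fc" is an assumption about how the Kata + Firecracker handler is registered on the cluster:

```go
package main

import (
	"context"
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load ~/.kube/config (K3s can export a compatible kubeconfig).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Assumed name for the Kata + Firecracker runtime handler.
	runtimeClass := "kata-fc"
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "untrusted-job"},
		Spec: corev1.PodSpec{
			RuntimeClassName: &runtimeClass, // run inside a microVM, not a plain container
			RestartPolicy:    corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:    "runner",
				Image:   "python:3.12-slim",
				Command: []string{"python", "-c", "print('hello from a microVM')"},
			}},
		},
	}

	created, err := clientset.CoreV1().Pods("default").Create(context.Background(), pod, metav1.CreateOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("scheduled pod:", created.Name)
}
```

The point of the CLI/SDK is that you never have to write or reason about this plumbing yourself.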