If you need random access to s3 within an ec2 region, sticking nginx in front of it as a caching proxy is unbelievably faster than hitting s3 directly; even moreso if you're multi-region.
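A minimal sketch of that setup, with the bucket name and cache sizes as placeholders (this assumes a public bucket; private buckets need request signing, e.g. via njs):

```nginx
# Hypothetical cache zone; tune sizes and paths to your workload.
proxy_cache_path /var/cache/nginx/s3 levels=1:2 keys_zone=s3cache:10m
                 max_size=10g inactive=60m use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_pass https://my-bucket.s3.amazonaws.com;
        proxy_set_header Host my-bucket.s3.amazonaws.com;

        proxy_cache s3cache;
        proxy_cache_valid 200 60m;
        # Keep serving stale objects while a refresh is in flight,
        # and collapse concurrent misses into one upstream fetch.
        proxy_cache_use_stale error timeout updating;
        proxy_cache_lock on;
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

With the cache warm, reads are served from local disk inside the region instead of a round trip to S3, which is where the latency win comes from.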
I recently experimented with trying to have nginx rewrite images from png/jpeg to webp for clients that support it. I ended up with a solution where a lambda triggered off new files added to a bucket and re-encoded them as webp alongside the originals. When a request came into nginx, it would examine the URL and client Accept headers, and then first try to fetch the webp file from s3 before falling back to fetching the original from s3.
I was somewhat surprised that nginx was capable of doing it efficiently, given the nginx configuration format and all the moving pieces.
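That Accept-header negotiation with an S3 fallback can be sketched roughly like this (bucket name and resolver address are placeholders, not the poster's actual config):

```nginx
# Map the client's Accept header to an optional ".webp" suffix.
map $http_accept $webp_suffix {
    default        "";
    "~*image/webp" ".webp";
}

server {
    listen 80;
    # A resolver is required because proxy_pass uses variables.
    resolver 169.254.169.253 valid=30s;

    location ~* \.(png|jpe?g)$ {
        proxy_intercept_errors on;
        # S3 returns 403 (not 404) for missing keys unless the caller
        # has ListBucket permission, so intercept both.
        error_page 403 404 = @original;
        proxy_pass https://my-bucket.s3.amazonaws.com$uri$webp_suffix;
    }

    location @original {
        proxy_pass https://my-bucket.s3.amazonaws.com$uri;
    }
}
```

The `map` picks the `.webp` suffix only for clients that advertise support, and the `error_page` fallback fetches the original when the webp variant doesn't exist yet.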
If you're doing this with an EC2 instance, why not use CloudFront? And if you need a tiny bit of logic, you could use an edge-optimized API Gateway and toss a Lambda in there to do the logic bits.
That would work, too. It would be very expensive at scale and would be difficult to keep tail latencies down. Which might be ok for some cases, but wasn't for mine.
So, nginx is a freemium web server ($2500 IIRC; the open source community edition deliberately withheld features like hot reload of configs — not sure of the current status w.r.t. version/feature parity, etc.)
It can also serve as a proxy server, but we already have the finest proxy server in the world as open source: HAProxy
I urge anyone to learn its admittedly obscure but simple config file switches and be amazed at how many layers this software can operate on.
When you really need to performance tune your frontend in real-time, you will appreciate HAProxy and what it offers.
After using HAProxy at work, I've been slowly moving my personal setups over to it. The config is, like nginx's, a bit weird to learn at first, but it really is performant.
Here be dragons. The free version of nginx will only do DNS resolution on a backend hostname at startup, the paid version will do periodic lookups.
They do mention this further down the page, but in eight months, when it randomly breaks, you'd better hope you remember that it needs to be periodically restarted to keep working.
This is by far the stupidest paywalled feature ever, because it amounts to downtime extortion.
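A common workaround on open-source nginx is to put the upstream hostname in a variable and add a `resolver` directive, which defers DNS resolution to request time instead of config load time. A sketch, with the hostname and resolver address as placeholders:

```nginx
server {
    listen 80;
    # Use your VPC or local resolver; 'valid' caps how long
    # answers are reused regardless of the record's TTL.
    resolver 169.254.169.253 valid=30s;

    location / {
        # A variable in proxy_pass forces per-request resolution,
        # unlike a literal hostname, which is resolved once at startup.
        set $s3_upstream "https://my-bucket.s3.amazonaws.com";
        proxy_pass $s3_upstream;
    }
}
```

The trade-off is that this bypasses `upstream {}` blocks, so features tied to them (keepalive pools, load-balancing methods) don't apply.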
I'm quite interested in Caddy. The last time I checked, things were in a rough spot with the v2 transition, but it looks like the documentation has improved.
Kong Gateway - which is built on top of NGINX - provides frequent DNS lookups for free in the open source version; we implemented this feature a long time ago (2017?) to overcome this limitation.
So if you need this capability for free, check it out. Not only that, it does SRV record resolution too.
I haven’t tried to pitch something to nginx, but as long as you did it as a clean implementation, “We’re declining to merge, since this is duplicative of code in our paid offering” is the general approach. And then you’re able to maintain your patch set alongside their upstream source.
Nothing. For instance, the Debian package nginx-extras includes implementations of some closed-source nginx features. But in my experience the patches are not particularly well-maintained (since they obviously won't be merged by nginx, there's already an official paid version, and the features are named differently from the closed-source ones so they're harder to find).
Indeed, that makes me wonder how hard it is for companies like Nginx to profit from open source. Well, they were acquired by F5 a couple of years ago, so they were probably doing quite well, I think.
SeaweedFS and this project have different purposes. This project is intended to show off how to configure NGINX to act as an S3 proxying gateway by using [njs](https://nginx.org/en/docs/njs/). If you look at the github for it, you will see it is just a collection of nginx config and javascript files. This all works with standard open source NGINX. All it does is proxy files like an L7 load balancer, but in this case, it adds AWS v2/v4 headers to the upstream requests.
As for caching, that is totally configurable to whatever you want; the example configuration is set to 1 hour, but that is arbitrary. In fact, one of the interesting things here is all the additional functionality that can be enabled because the proxying is being done by NGINX.
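The 1-hour TTL is just a `proxy_cache_valid` setting. A sketch, assuming a cache zone (here called `s3cache`, a hypothetical name) has already been defined with `proxy_cache_path`:

```nginx
location / {
    proxy_cache s3cache;           # zone assumed defined elsewhere
    proxy_cache_valid 200 302 1h;  # the arbitrary 1-hour example
    proxy_cache_valid 404     1m;  # cache negative answers briefly
    # Ignore S3's own caching headers so the local TTLs win:
    proxy_ignore_headers Cache-Control Expires;
    proxy_pass https://my-bucket.s3.amazonaws.com;
}
```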
Regarding read and write, that can be enabled for AWSv2 signatures, but it is more difficult to do in AWSv4 signatures. I have an idea about how to accomplish it with v4 signatures, but it will take some time to prototype it.
SeaweedFS is very different from Nginx. It's just the names are so similar.
There are 2 ways to cache: write through and write back. You are using write through, which needs to write to the remote storage before returning. Write back is only writing to local copy, which is much faster to return. The actual updates are executed asynchronously.
Coincidentally I wanted to do that for a side-project I launched yesterday [1]. I tried their nginx-s3-gateway image but couldn't get the authorization to AWS S3 to work.
I replaced it with Varnish, with the files publicly available on the (cheaper) S3-compatible Scaleway. I guess a simple Nginx would have worked the same at that point. My goal was mostly to minimize the bandwidth cost (which is not metered on my server).