If you need random access to s3 within an ec2 region, sticking nginx in front of it as a caching proxy is unbelievably faster than hitting s3 directly; even moreso if you're multi-region.
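A minimal sketch of that setup, with the bucket name and cache sizes as placeholders (this assumes a public bucket; private buckets need request signing, e.g. via njs):

```nginx
# Hypothetical cache zone; tune sizes and paths to your workload.
proxy_cache_path /var/cache/nginx/s3 levels=1:2 keys_zone=s3cache:10m
                 max_size=10g inactive=60m use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_pass https://my-bucket.s3.amazonaws.com;
        proxy_set_header Host my-bucket.s3.amazonaws.com;

        proxy_cache s3cache;
        proxy_cache_valid 200 60m;
        # Keep serving stale objects while a refresh is in flight,
        # and collapse concurrent misses into one upstream fetch.
        proxy_cache_use_stale error timeout updating;
        proxy_cache_lock on;
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

With the cache warm, reads are served from local disk inside the region instead of a round trip to S3, which is where the latency win comes from.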
I recently experimented with trying to have nginx rewrite images from png/jpeg to webp for clients that support it. I ended up with a solution where a lambda triggered off new files added to a bucket and re-encoded them as webp alongside the originals. When a request came into nginx, it would examine the URL and client Accept headers, and then first try to fetch the webp file from s3 before falling back to fetching the original from s3.
I was somewhat surprised that nginx was capable of doing it efficiently, given the nginx configuration format and all the moving pieces.
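That Accept-header negotiation with an S3 fallback can be sketched roughly like this (bucket name and resolver address are placeholders, not the poster's actual config):

```nginx
# Map the client's Accept header to an optional ".webp" suffix.
map $http_accept $webp_suffix {
    default        "";
    "~*image/webp" ".webp";
}

server {
    listen 80;
    # A resolver is required because proxy_pass uses variables.
    resolver 169.254.169.253 valid=30s;

    location ~* \.(png|jpe?g)$ {
        proxy_intercept_errors on;
        # S3 returns 403 (not 404) for missing keys unless the caller
        # has ListBucket permission, so intercept both.
        error_page 403 404 = @original;
        proxy_pass https://my-bucket.s3.amazonaws.com$uri$webp_suffix;
    }

    location @original {
        proxy_pass https://my-bucket.s3.amazonaws.com$uri;
    }
}
```

The `map` picks the `.webp` suffix only for clients that advertise support, and the `error_page` fallback fetches the original when the webp variant doesn't exist yet.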
If you're doing this with an EC2 instance, why not use CloudFront? And if you need a tiny bit of logic, you could use an edge-optimized API Gateway and toss a Lambda in there to do the logic bits.
That would work, too. It would be very expensive at scale and would be difficult to keep tail latencies down. Which might be ok for some cases, but wasn't for mine.
So, nginx is a freemium web server ($2500 IIRC; the open source community edition deliberately withheld features like hot reload of configs — not sure of the current status w.r.t. version/feature parity, etc.)
It can also serve as a proxy server, but we already have the finest proxy server in the world as open source: HAProxy
I urge anyone to learn its admittedly obscure but simple config file switches and be amazed at how many layers this software can operate on.
When you really need to performance tune your frontend in real-time, you will appreciate HAProxy and what it offers.
After using HAProxy at work, I've been slowly moving my personal setups over to it. The config is, like nginx's, a bit weird to learn at first, but it really is performant.
Here be dragons. The free version of nginx will only do DNS resolution on a backend hostname at startup, the paid version will do periodic lookups.
They do mention this further down the page, but in eight months, when it randomly breaks, you'd better hope you remember that it needs to be periodically restarted to keep working.
This is by far the stupidest paywalled feature ever, because it amounts to downtime extortion.
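A common workaround on open-source nginx is to put the upstream hostname in a variable and add a `resolver` directive, which defers DNS resolution to request time instead of config load time. A sketch, with the hostname and resolver address as placeholders:

```nginx
server {
    listen 80;
    # Use your VPC or local resolver; 'valid' caps how long
    # answers are reused regardless of the record's TTL.
    resolver 169.254.169.253 valid=30s;

    location / {
        # A variable in proxy_pass forces per-request resolution,
        # unlike a literal hostname, which is resolved once at startup.
        set $s3_upstream "https://my-bucket.s3.amazonaws.com";
        proxy_pass $s3_upstream;
    }
}
```

The trade-off is that this bypasses `upstream {}` blocks, so features tied to them (keepalive pools, load-balancing methods) don't apply.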
I'm quite interested in Caddy. The last time I checked, things were in a rough spot with the v2 transition, but it looks like the documentation has improved.
Kong Gateway - which is built on top of NGINX - provides frequent DNS lookups for free in the open source version; we implemented this feature a long time ago (2017?) to overcome this limitation.
So if you need this capability for free, check it out. Not only that, it does SRV record resolution too.
I haven’t tried to pitch something to nginx, but as long as you did it as a clean implementation, “We’re declining to merge, since this is duplicative of code in our paid offering” is the general approach. And then you’re able to maintain your patch set alongside their upstream source.
Nothing. For instance, the Debian package nginx-extras includes implementations of some closed-source nginx features. But in my experience the patches are not particularly well-maintained (since they obviously won't be merged by nginx, there's already an official paid version, and the features are named differently from the closed-source ones so they're harder to find).
Indeed, that makes me wonder how hard it is for companies like Nginx to profit from open source. Well, they were acquired by F5 a couple of years ago, so they were probably doing quite well, I think.
SeaweedFS and this project have different purposes. This project is intended to show off how to configure NGINX to act as an S3 proxying gateway by using [njs](https://nginx.org/en/docs/njs/). If you look at the github for it, you will see it is just a collection of nginx config and javascript files. This all works with standard open source NGINX. All it does is proxy files like an L7 load balancer, but in this case, it adds AWS v2/v4 headers to the upstream requests.
As for caching, that is totally configurable to whatever you want; the example configuration is set to 1 hour, but that is arbitrary. In fact, one of the interesting things here is all the additional functionality that can be enabled because the proxying is being done by NGINX.
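The 1-hour TTL is just a `proxy_cache_valid` setting. A sketch, assuming a cache zone (here called `s3cache`, a hypothetical name) has already been defined with `proxy_cache_path`:

```nginx
location / {
    proxy_cache s3cache;           # zone assumed defined elsewhere
    proxy_cache_valid 200 302 1h;  # the arbitrary 1-hour example
    proxy_cache_valid 404     1m;  # cache negative answers briefly
    # Ignore S3's own caching headers so the local TTLs win:
    proxy_ignore_headers Cache-Control Expires;
    proxy_pass https://my-bucket.s3.amazonaws.com;
}
```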
Regarding read and write, that can be enabled for AWSv2 signatures, but it is more difficult to do in AWSv4 signatures. I have an idea about how to accomplish it with v4 signatures, but it will take some time to prototype it.
SeaweedFS is very different from Nginx. It's just the names are so similar.
There are 2 ways to cache: write through and write back. You are using write through, which needs to write to the remote storage before returning. Write back is only writing to local copy, which is much faster to return. The actual updates are executed asynchronously.
Coincidentally I wanted to do that for a side-project I launched yesterday [1]. I tried their nginx-s3-gateway image but couldn't get the authorization to AWS S3 to work.
I replaced it with Varnish, with the files publicly available on the (cheaper) S3-compatible Scaleway. I guess a simple Nginx would have worked the same at that point. My goal was mostly to minimize the bandwidth cost (which is not metered on my server).