Adding GIT SHAs to your application's HTTP headers (andrewvc.com)
59 points by andrewvc on March 14, 2011 | hide | past | favorite | 24 comments


Using a header isn't quite right. Just implement a URL on your app at /_revision that grabs the HEAD commit. Then write a git-diff-prod command like this:

  #!/bin/sh
  # Diff the local tree against whatever commit production reports
  git diff "$(curl -s "http://www.yourapp.com/_revision")"
Get your paths squared away, chmod +x it, then type in:

  git diff-prod
And boom! Instant diff with HEAD in production.

Of course, you might want to get your deploy scripts squared away, or figure out why your servers aren't restarting, if that's the reason for this.
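For illustration, the /_revision endpoint suggested above could be a tiny Rack-style app (a sketch; the REVISION-file convention and the `git rev-parse` fallback are assumptions, not from the thread):

```ruby
# Minimal Rack-compatible app serving the deployed commit SHA at /_revision.
class RevisionApp
  REVISION_FILE = 'REVISION'.freeze

  def call(env)
    if env['PATH_INFO'] == '/_revision'
      [200, { 'Content-Type' => 'text/plain' }, [revision]]
    else
      [404, { 'Content-Type' => 'text/plain' }, ['not found']]
    end
  end

  def revision
    # Prefer a REVISION file written at deploy time; fall back to asking git.
    if File.exist?(REVISION_FILE)
      File.read(REVISION_FILE).strip
    else
      `git rev-parse HEAD`.strip
    end
  end
end
```

Writing a REVISION file at deploy time keeps git off the production box entirely, which also sidesteps the Heroku issue mentioned below.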


Yeah, that was the original reason I implemented it (since fixed), but I just found having it live was too useful to take away.


While it's a neat little trick, I'm not so sure I like this. This is adding unnecessary headers and traffic overhead. It doesn't even address some of the points the author mentioned, e.g. "someone deployed something in a weird way and bypassed the normal deployment process" (if somebody is doing this you have bigger problems anyway). If you are managing your releases in a sensible and automated way then you will also typically have an easy way to check which version is currently deployed or running, which doesn't involve adding more baggage to the HTTP headers.


True, but I'm not worried about the extra bytes of an HTTP header.

It's actually really useful when coupled with Rails+Unicorn w/ downtimeless deploy. Every once in a while unicorn will run a deploy but stop picking up code changes, and this is a good way to verify that your deploy didn't go all the way through.

Additionally, it integrates more easily with shell scripts than most other solutions. Lastly, if you're really worried about the header, you don't need one; you can just make a special /git_sha page that renders it out.

Additionally, GIT_SHAs are good for cache busting in some situations (though totally inappropriate for others).


That's a really good point. Doing this will invalidate browser cache on most sites whenever you do a deployment. If you are the type that does production deployments throughout the day, then this could slow down the user experience. However, maybe you want an easy-to-use javascript cache reset tool for every deployment.
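The cache-busting use mentioned above usually means stamping asset URLs with the deployed SHA, so browsers refetch assets only after a deploy. A minimal sketch (the helper name and the GIT_SHA env var are made up for illustration):

```ruby
# SHA-based cache busting: append the deployed commit to asset URLs.
# GIT_SHA is assumed to be set once at boot (e.g. by the deploy script).
GIT_SHA = ENV.fetch('GIT_SHA', 'dev')

def busted_asset_path(path)
  "#{path}?v=#{GIT_SHA}"
end
```

Since the SHA changes on every deploy, this does exactly what the comment above warns about: it invalidates every stamped asset, changed or not.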


Neat, but kinda overkill to send it on every request. If you're doing Rails (as the article is), try out pig ( https://github.com/bendyworks/pig ) and go to /revision on your site.

Does not work with Heroku, since they strip the .git folder... we should probably add that to the readme.


You know, the other day I was lamenting the sorry state that is caching on the web. I decided a pretty good protocol would be similar to this: make a standard http header that includes a SHA of the content for that url. That way if the content changes the SHA changes, and you get really good cache-ability without the worries of content timeout etc[1]. Further, cache control pragmas/headers/etc don't accidentally get misconfigured in a bad way. Caches just send a header request upstream and if the header hasn't changed, just blast the content on the presumably faster/cheaper local link.

[1] They could still exist for hints or what not, but I presume reasonably advanced caches will be able to heuristically figure out how long the content lasts and do things like start sending results while waiting on the header to return... with some sort of reset if it turns out to be a bad cached result.


Are you describing http://en.wikipedia.org/wiki/HTTP_ETag or something different?


Apparently ETag, but you know, with people actually using it, and a standardized hashing mechanism (to reduce duplication of common elements across sites).

(honestly I had never even heard of it before just now).


All browsers take advantage of it. Not very many library http clients do, though. (I had to write my own, https://github.com/paul/resourceful )
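For anyone who, like the commenter above, hadn't run into ETags: the conditional-GET handshake boils down to something like this (a simplified sketch; real servers also handle weak validators, multiple tags in If-None-Match, etc.):

```ruby
require 'digest'

# Simplified ETag check: hash the body, compare against the client's
# If-None-Match header, and answer 304 Not Modified on a match so the
# body never goes back over the wire.
def respond_with_etag(body, if_none_match)
  etag = %("#{Digest::SHA1.hexdigest(body)}")
  if if_none_match == etag
    [304, { 'ETag' => etag }, []]
  else
    [200, { 'ETag' => etag }, [body]]
  end
end
```

The client stores the ETag from the first 200 response and echoes it back as If-None-Match on revalidation.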


Common elements are shared by using a single source (e.g. google's CDN for javascripts) but standardizing makes little sense imvho. For one, you can use an ETag without generating the page and hashing it if it only depends on a db-stored resource, and that is mightily application dependent.


1) What does a CDN have to do with the simple real-world fact that content gets duplicated around sites all the time, particularly stupid crap like funny picture memes? A good global hash of this would certainly help optimize web-caching.

2) A well designed app will hopefully not have every page rendered dynamically at the server anymore, but just have an api to send data back to the page for rendering client side. This benefits from content caching (provided the content is big enough to benefit from caching rather than retransmission).

3) Using something like varnish in front of your site will allow you to dynamically invalidate the hash when needed, and in the interim, a generated result to a url is no different than a static one.


1) That common stuff reused as-is (such as jQuery) can be cached and reused by many; funny picture memes are usually resized, cropped, and modified. I frankly do not see many of them, but I load jQuery 200 times a day. It seems to me you want to optimize a non-bottleneck.

2) How do you send such content? Suppose it's in the db: you need to convert it to a wire format (JSON or whatever) and then generate a hash for it. But you don't need to; you could just send a revision id as the ETag without even reading the full data from the db.

3) Whatever you want to do with a hash, you can do using a hash as the ETag. The latter is simply more generic.


1) You are doing browser level cache optimization and bike-shedding, I am looking at it from an ISP level. Once you have a big pipe and many users, things like this actually start to have a noticeable effect on bandwidth usage. Further, no matter what your use case, it has almost no bearing on the normal (statistically normal no digressions please) use case.

2) So I see what you are saying, but you are wrong -- the ETag still requires a lookup of "did content change since then, based on this tag". My system does not preclude a similar mapping of (tag value, last change) in the server.

3) Yeah, I know, my point is that a standardized hashing method for the ETag provides benefits on top of the ETag. Sometimes everyone playing nicely together actually works out better than lots of flexibility.


GWT's perfect caching is pretty similar to this - use the MD5 of the file contents as the filename, and send out 'cache forever' headers with the file. Thus, if the file hasn't changed, the browser doesn't even need to make an HTTP request or wait for a 304 response. The MD5 filenames are stored in a 'never cache' bootstrap.

http://code.google.com/webtoolkit/doc/latest/DevGuideCompili...

I'm surprised I don't see this type of thing more often in other libraries.
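The GWT scheme described above is easy to replicate: embed the content hash in the filename, serve it with far-future cache headers, and keep only the name-mapping uncacheable. A sketch of the naming half (the helper is made up; wiring it into the bootstrap is up to your build):

```ruby
require 'digest'

# Content-addressed asset naming: embed the MD5 of the contents in the
# filename. Any change to the contents produces a new name, so files
# under the old name can safely be cached forever.
def hashed_filename(path, contents)
  ext = File.extname(path)
  base = File.basename(path, ext)
  "#{base}.#{Digest::MD5.hexdigest(contents)}#{ext}"
end
```

Unchanged contents always map to the same name, so repeat deploys don't churn the cache for files that didn't change.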


Sounds useful for early deployment stages, but later on you really should tag the code you deploy and perhaps include the tag in your application headers (or in the page footer, or somewhere). And even better, build packages after tagging in your VCS, which gives you numerous advantages, like being able to quickly check if someone modified the code in-place on the production machine.


So you have git installed in production?


I bet he has "ls" also. The question is, "what's your point"?


If it is not necessary, don't install it. Rsync or scp is probably "good enough" for production.


Apparently not if you want to include the git commit id in your HTTP headers :)


But git will let you do a diff to see if someone made manual changes.


You can create a file with your git commit id in your deployment script and have it run on your source control box.


A lot of people use git for deployment. It is also used during automatic dependency installation.


Awesome. A real hack. Beautiful, simple, useful.



