Git from the Inside Out (2015)

troughway · on Dec 10, 2019

>Git from the Inside Out

https://github.com/git/git

https://www.aosabook.org/en/git.html

https://codewords.recurse.com/issues/two/git-from-the-inside...

http://gitimmersion.com/

neves · on Dec 10, 2019

Why are these links better than the original article?

Mijka · on Dec 11, 2019

Complementary links from some list, I guess. The third one matches the current article.

akkartik · on Dec 10, 2019

> Notice how just `git add`ing a file saves its content to the objects directory. Its content will still be safe inside Git if the user deletes data/letter.txt from the working copy.

Holy crap, how do I not know this in 14+ years of working with git?!

The `git add --help` manpage seems to make no reference to this feature, it just talks about adding the file to the index.

coldpie · on Dec 10, 2019

It's not really a feature, more of a side-effect. Git-add causes git to record the state of added files. You can see this because if you make changes to an added (but uncommitted) file, you can see the diff between that uncommitted index and the state on disk. That index state must exist somewhere. Where it exists is in the object dir, just like everything else Git knows about.

(The article is slightly incorrect in that I think Git will eventually delete unreferenced state files during git-fsck; it's not stored forever. But there are a lot of heuristics during fsck to help keep data that could be valuable if the user messed up.)
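A minimal Python sketch of what coldpie describes, assuming loose objects only (no packfiles) and SHA-1 object names, which is Git's default. `hash_object` and `cat_file` are hypothetical helpers named after `git hash-object -w` and `git cat-file -p`; they follow the documented loose-object layout but are not Git's own code.

```python
import hashlib
import os
import zlib


def hash_object(objects_dir: str, content: bytes) -> str:
    """Store content as a loose blob, roughly what `git hash-object -w` does.

    The object name is the SHA-1 of the header b"blob <size>\\0" plus the raw
    bytes; the same bytes, zlib-compressed, go to objects/<2 hex>/<38 hex>.
    """
    store = b"blob %d\0" % len(content) + content
    oid = hashlib.sha1(store).hexdigest()
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    if not os.path.exists(path):  # objects are immutable: same content, same file
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(store))
    return oid


def cat_file(objects_dir: str, oid: str) -> bytes:
    """Read a loose blob back, roughly what `git cat-file -p <oid>` prints."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, body = raw.partition(b"\0")
    kind, size = header.split(b" ")
    if kind != b"blob" or int(size) != len(body):
        raise ValueError(f"{oid} is not a well-formed blob")
    return body
```

For example, storing empty content yields `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known empty-blob ID, written to `objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391`. Nothing here involves a commit, which is why the content survives deleting the working-copy file.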

amarshall · on Dec 10, 2019

> Git will eventually delete unreferenced state files during git-fsck

Yes, but if it’s in the index currently, then it is referenced and won’t be garbage collected.

akkartik · on Dec 10, 2019

I always assumed the state of the index is stored in the binary file `.git/index`, and that mutations of the index overwrite this file. Is this not accurate?

tux1968 · on Dec 10, 2019

The index doesn't contain the objects themselves; every added file (whether it is committed or not) is held as a regular object. The index simply holds a list of object IDs.

So yes, if you change a file and re-add it, the index will be overwritten and the original object will become dangling without any references to it.
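tux1968's point about re-adding can be modeled in a few lines. This is a hypothetical toy (`ToyRepo` and `blob_id` are illustrative names): only the index counts as a reference here, whereas real Git also treats commits, trees, refs and reflogs as roots when deciding what is reachable.

```python
from __future__ import annotations

import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 object name Git assigns to a blob with this content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


class ToyRepo:
    """Hypothetical model: an object store plus a flat index (path -> blob id).

    Only index entries count as references; commits, trees, refs and
    reflogs, which also keep objects alive in real Git, are left out.
    """

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # oid -> content, like .git/objects
        self.index: dict[str, str] = {}      # path -> oid, like .git/index

    def add(self, path: str, content: bytes) -> str:
        oid = blob_id(content)
        self.objects[oid] = content  # the blob is stored at add time
        self.index[path] = oid       # re-adding only overwrites the entry
        return oid

    def dangling(self) -> set[str]:
        """Stored objects that no index entry points at any more."""
        return set(self.objects) - set(self.index.values())
```

Adding `data/letter.txt` with `a` and then again with `b` leaves the first blob in `objects` but out of `index`, so `dangling()` returns just that first ID. In a real repository `git fsck` reports such blobs as dangling, and `git gc` prunes unreachable loose objects once they are older than `gc.pruneExpire` (two weeks by default).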

gcz · on Dec 10, 2019

You're mostly right. The index is a representation of the working directory tree, and any modification of the index will modify .git/index. The thing is, in Git a directory (a "tree" in Git terms) is a collection of file references (blob hashes), so the index is the staged tree: a collection of blob hashes that will become the tree of your next commit. The blob contents are stored as objects in .git/objects.

There is a reference of the index format in: Documentation/technical/index-format.txt [1]

[1]: https://github.com/git/git/blob/master/Documentation/technic...

coldpie · on Dec 10, 2019

I admit I'm getting out of my depth here (it's been a while since I read gitcore-tutorial), but I think that's the "to-be-committed" commit object. So the index file "points to" the object file that stores the state of the object you added.

akkartik · on Dec 10, 2019

I just `git add`ed a 100MB file to a test repo, and `.git/index` only grew to 104 bytes. So it seems to only contain metadata. TIL.

coldpie · on Dec 10, 2019

And if you go dig into `objects/`, you'll find your (possibly compressed) 100 MB object under its hash, and can view it with "git cat-file -p $hash" without ever having committed it :)

Edit: And I bet if you dig around your index file enough, you'll find that hash someplace in there.
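To find that hash in `.git/index` yourself, here is a rough Python reader for the version-2 layout described in Documentation/technical/index-format.txt. It assumes 20-byte SHA-1 IDs, no extended flags and paths shorter than 4095 bytes, and it skips extensions and the trailing checksum; `read_index` is a hypothetical name, not part of any Git tooling.

```python
from __future__ import annotations

import struct


def read_index(data: bytes) -> list[tuple[str, str, int]]:
    """Return (path, blob id, stage) for each entry of a version-2 index.

    Assumes 20-byte SHA-1 ids, no extended flags and names shorter than
    0xFFF bytes; extensions and the trailing checksum are ignored.
    """
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC" or version != 2:
        raise ValueError("this sketch only reads version-2 index files")
    entries = []
    pos = 12
    for _ in range(count):
        # First 40 bytes: ten 32-bit stat fields (ctime s/ns, mtime s/ns,
        # dev, ino, mode, uid, gid, size). Then the 20-byte object id.
        oid = data[pos + 40 : pos + 60].hex()
        (flags,) = struct.unpack(">H", data[pos + 60 : pos + 62])
        name_len = flags & 0x0FFF
        stage = (flags >> 12) & 0x3  # 0 normally, 1-3 for conflicted entries
        path = data[pos + 62 : pos + 62 + name_len].decode()
        entries.append((path, oid, stage))
        pos += (62 + name_len + 8) & ~7  # entry is NUL-padded to a multiple of 8
    return entries
```

On a real repository, `read_index(open(".git/index", "rb").read())` should list each staged path next to its blob ID, provided Git wrote a version-2 index (versions 3 and 4 are rejected by this sketch). The per-entry layout is mostly stat data plus one ID, which fits akkartik's observation that the index barely grows when a 100 MB file is added.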

chimeracoder · on Dec 10, 2019

> And if you go dig into `objects/`, you'll find your (possibly compressed) 100 MB object under its hash, and can view it with "git cat-file -p $hash" without ever having committed it :)
>
> Edit: And I bet if you dig around your index file enough, you'll find that hash someplace in there.

And if you want to know how that works, there's an article in the next issue of Code Words that goes into that as well: https://codewords.recurse.com/issues/three/unpacking-git-pac...

skrebbel · on Dec 10, 2019

The staging area^W^Windex^Wcache is a terribly designed mess, that's why.

The best way to think about it is it's just a half-finished commit, i.e. one without a message and an author and a date. But otherwise git treats the index like any other commit. Adding stuff to the staging area is like amending that commit. Actually committing it is like amending that commit again, but without changing the files, only editing the metadata (message, author, etc). And then moving the current branch to it.

You could totally simulate the cache by doing exactly this, i.e. a series of `git commit -a --amend` commands (just make sure you don't push halfway). The idea behind the staging area is that you obviously need this all the time, because reasons, so let's force you to go through the hassle for every commit you might want to make.

Because it's just a commit that hates your guts, it has all the same side effects as making a real commit has.
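skrebbel's "half-finished commit" framing lines up with Git's plumbing: `git write-tree` turns the index into a tree object, and `git commit-tree` wraps that tree with the parent, author, committer and message the index lacks. The Python below is a hypothetical miniature of those two steps, limited to top-level regular files (mode `100644`, no subdirectories) and a fixed `+0000` timezone.

```python
from __future__ import annotations

import hashlib


def object_id(kind: str, body: bytes) -> str:
    """SHA-1 name of a Git object: hash of "<kind> <size>\\0" + body."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


def write_tree(index: dict[str, str]) -> tuple[str, bytes]:
    """Turn a flat index (path -> blob id) into one tree object.

    Sketch: top-level regular files only (mode 100644), no subdirectories.
    Each entry is "<mode> <name>\\0" followed by the raw 20-byte blob id.
    """
    body = b""
    for name in sorted(index):  # tree entries are sorted by name
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(index[name])
    return object_id("tree", body), body


def commit_tree(tree: str, parent: str | None, who: str, when: int,
                msg: str) -> tuple[str, bytes]:
    """Add the metadata the index lacks: parent, author, committer, message."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {who} {when} +0000", f"committer {who} {when} +0000", "", msg]
    body = ("\n".join(lines) + "\n").encode()
    return object_id("commit", body), body
```

Since the tree ID depends only on the staged content, committing the same index twice with different messages gives two distinct commits that share one tree. An empty index maps to `4b825dc642cb6eb9a060e54bf8d69288fbee4904`, Git's well-known empty-tree ID.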

minitech · on Dec 10, 2019

> The idea behind the staging area is that you obviously need this all the time, because reasons, so let's force you to go through the hassle for every commit you might want to make.

I do need it all the time, and simulating it with commits would be horribly unsuited to getting work done. (The funny thing is that what you mentioned – `git commit -a` – doesn’t accomplish that, but it does skip the hassle you just said was forced.) It’s also a clean place to handle conflict resolution state, because that shouldn’t go in commits.

skrebbel · on Dec 11, 2019

> (The funny thing is that what you mentioned – `git commit -a` – doesn’t accomplish that, but it does skip the hassle you just said was forced.)

I wish it did, but it doesn't stage deletions, therefore also messing up renames, which is exactly never what you want.

minitech · on Dec 11, 2019

It does commit deletions. It doesn’t commit untracked files. If you really want to track every new file unprompted, an alias can be made for `git add --all && git commit`.

stouset · on Dec 10, 2019

> The idea behind the staging area is that you obviously need this all the time

I have seen commits by people who habitually do `git commit -a`, and it has led me to the unavoidable conclusion that yes, you obviously need to stage commits all of the time.

james-skemp · on Dec 10, 2019

While confusing for some, I love this part of `git add`.

While branching is already easy enough, I'll regularly get to a point where I may want to spend a few minutes going down a path. I'll either be happy where it leads me, or realize it was a bad idea and scrap it.

I'll `git add` the current state, make the changes I want, test it out, and then either revert back to what's staged, or like where I'm at and `git add` the rest in.

That and `git add -p` also mean that I rarely do a `git commit -a` or the like; stage it, then commit it.

reificator · on Dec 10, 2019

That sounds like the intended workflow of `git stash`.

james-skemp · on Dec 10, 2019

Then I did a bad job of explaining my workflow. :)

Stash is 'I want to temporarily keep track of where I'm at and roll back to a previous state, likely so I can do something else.' You could do a commit/branch, but it's temporary/not finished.

My add is 'I have stuff that I'm working on and want to try going in another direction for a bit; I'll either add it in if I like where I went, or go back to what I've staged.'

Example from a few hours ago:

We've lost the primary on a project that is doing ... interesting things with Grunt. The packages are three years out of date, which is what I'm tackling first. I'm opting to break these into commits based on related groupings of packages.

So, `npm upgrade package1`, test things out, then `git add package-lock.json && git add package.json -p`. Now I upgrade another package and after testing determine that this is a pretty significant change, even though it should have been easy. Since I haven't staged my last npm upgrade I can easily discard the changes and still have all of my `npm upgrade package1` modifications. Now I can choose to commit those staged changes or try a different, but related, package upgrade.

Simple example, but easy to expand this to something that touches a handful or more of files.

The alternative would be to commit each individual upgrade, or roll back the last upgrade/thing(s) you did.

Another common use case is if I'm writing some code and realize I'd like to refactor a bit before I commit it. `git add` the working code, refactor, and if it's starting to look like a commit unto itself I can always `git commit` what I have staged, versus having to undo. (`git commit --amend` would work in this case too, but I do a lot of work with third-party code and am never quite sure if I'll want to keep something for historical purposes/an alternative way to do something.)

akkartik · on Dec 10, 2019

What I like about `git add` is that it creates a backup I can restore from in extremis but without cluttering up various dashboard views like `git status` or `git branch` or `git stash list`.

frenchyatwork · on Dec 10, 2019

It's certainly an understandable point of confusion. It's not clear to me whether the current behavior was actually intentional or just a byproduct of implementation.

If you add a file, then modify it, and then commit it, your old version gets committed. That caused me a bit of confusion back in the day.
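The surprise frenchyatwork describes follows from `git add` copying the file's content at that moment. A hypothetical three-area toy (`ToyWorkflow` is an illustrative name, not a Git API) shows it: the commit snapshots the index, and the later edit only shows up as an unstaged difference until it is added again.

```python
from __future__ import annotations

import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 object name Git assigns to a blob with this content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


class ToyWorkflow:
    """Hypothetical model of the working tree, the index and commits."""

    def __init__(self) -> None:
        self.working: dict[str, bytes] = {}      # files on disk
        self.objects: dict[str, bytes] = {}      # oid -> content
        self.index: dict[str, str] = {}          # path -> oid
        self.commits: list[dict[str, str]] = []  # each commit: path -> oid

    def add(self, path: str) -> None:
        content = self.working[path]  # snapshot the file as it is right now
        oid = blob_id(content)
        self.objects[oid] = content
        self.index[path] = oid

    def commit(self) -> dict[str, str]:
        snapshot = dict(self.index)  # reads the index, never the disk
        self.commits.append(snapshot)
        return snapshot

    def unstaged(self) -> list[str]:
        """Staged paths whose working copy differs (roughly `git diff`)."""
        return [p for p, oid in self.index.items()
                if p not in self.working or blob_id(self.working[p]) != oid]
```

Writing `old`, adding, then writing `new` and committing records the `old` blob, while `unstaged()` still lists the file until it is added again.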

boustrophedon · on Dec 10, 2019

The behavior is pretty intentional - you can use `git add -p` explicitly to only add parts of a given file to the index.

tonyedgecombe · on Dec 11, 2019

A mistake I seem to make fairly often. That or not adding new files in the first place.

dang · on Dec 10, 2019

A thread from 2016: https://news.ycombinator.com/item?id=12802949

A bit from 2015: https://news.ycombinator.com/item?id=9793069

Discussed at the time: https://news.ycombinator.com/item?id=9272249

sdan · on Dec 10, 2019

How did they generate those nice looking graphs?

maryrosecook · on Dec 10, 2019

Hiya! Author of the article here. I used OmniGraffle.

tomca32 · on Dec 10, 2019

Hey Mary, Tomislav here from the 2013 summer batch. Just wanted to let you know it was an absolute pleasure learning from you at HS. You're a wonderful teacher and I learned a ton. Thanks!

sdan · on Dec 10, 2019

Interesting. Wish there was a way to auto-generate these graphs with my git history.

I like them!

tcoff91 · on Dec 10, 2019

Lots of git clients like Fork have pretty graphs for browsing your history. Even the terminal client can draw a graph in `git log` if you give it the right params (e.g. `git log --graph --oneline --all`).
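For auto-generating diagrams in the spirit of the article's, one rough approach is to dump each commit with its parents via `git log --all --format='%H %P'` (`%H` is the commit hash, `%P` the parent hashes) and convert that to Graphviz DOT. The script below is a hypothetical sketch: it abbreviates hashes to seven characters, which could collide in large repositories, and it draws commits only, not trees or blobs.

```python
import sys


def log_to_dot(log_text: str) -> str:
    """Convert `git log --all --format='%H %P'` output into Graphviz DOT.

    Each input line is a commit hash followed by zero or more parent hashes.
    Edges point from each commit to its parents, the same direction in which
    commit objects reference their parents.
    """
    out = ["digraph history {", "  rankdir=RL;", "  node [shape=box];"]
    for line in log_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        commit, parents = parts[0][:7], parts[1:]
        out.append(f'  "{commit}";')
        for parent in parents:
            out.append(f'  "{commit}" -> "{parent[:7]}";')
    out.append("}")
    return "\n".join(out)


if __name__ == "__main__":
    print(log_to_dot(sys.stdin.read()))
```

Assuming the script is saved as `log_to_dot.py`: `git log --all --format='%H %P' | python3 log_to_dot.py > history.dot`, then `dot -Tsvg history.dot -o history.svg` renders it.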

fit2rule · on Dec 10, 2019

I regularly use 'gource' to get a lovely picture of my repos:

https://gource.io

coldpie · on Dec 10, 2019

Have you tried "gitk"?

shadykiller · on Dec 10, 2019

I gave a similar talk “Inside Git Guts with Ruby” at RubyConf India 2013 - https://m.youtube.com/watch?v=lPlwkxrG2NM

I had to learn a lot of git internals and it was super fun