Git from the Inside Out (2015) (recurse.com)
195 points by dpeck on Dec 10, 2019 | 34 comments



Why are these links better than the original article?


Complementary links from some list, I guess. The third one matches the current article.


> Notice how just `git add`ing a file saves its content to the objects directory. Its content will still be safe inside Git if the user deletes data/letter.txt from the working copy.

Holy crap, how do I not know this in 14+ years of working with git?!

The `git add --help` manpage seems to make no reference to this feature, it just talks about adding the file to the index.


It's not really a feature, more of a side-effect. Git-add causes git to record the state of added files. You can see this because if you make changes to an added (but uncommitted) file, you can see the diff between that uncommitted index and the state on disk. That index state must exist somewhere. Where it exists is in the object dir, just like everything else Git knows about.
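To make the "it exists in the object dir" point concrete, here is a minimal sketch of how a blob's ID and on-disk location are derived. It assumes a SHA-1 repository and loose (unpacked) objects; `blob_id` and `store_blob` are hypothetical helper names, not Git APIs, and real `git add` also updates the index and may apply filters that are ignored here.

```python
import hashlib
import zlib
from pathlib import Path


def _with_header(content: bytes) -> bytes:
    # A blob is stored as "blob <size>\0" followed by the raw content.
    return b"blob " + str(len(content)).encode() + b"\0" + content


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    return hashlib.sha1(_with_header(content)).hexdigest()


def store_blob(git_dir: Path, content: bytes) -> Path:
    """Write a loose blob under git_dir/objects, roughly as `git add` does."""
    oid = blob_id(content)
    # The first two hex digits name the directory, the rest name the file.
    path = git_dir / "objects" / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(_with_header(content)))
    return path
```

Because the ID depends only on the content, deleting the working-copy file does not affect the stored object.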

(The article is slightly incorrect in that I think Git will eventually delete unreferenced state files during git-gc; it's not stored forever. But there are a lot of heuristics during gc to help keep data that could be valuable if the user messed up.)


> Git will eventually delete unreferenced state files during git-gc

Yes, but if it’s in the index currently, then it is referenced and won’t be garbage collected.


I always assumed the state of the index is stored in the binary file `.git/index`, and that mutations of the index overwrite this file. Is this not accurate?


The index doesn't contain the objects themselves; every added file (whether it is committed or not) is held as a regular object. The index simply holds a list of object IDs.

So yes, if you change a file and re-add it, the index will be overwritten and the original object will become dangling without any references to it.
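A toy model of that re-add scenario, as a sketch only: `ToyRepo` is a hypothetical class, it keeps objects in a dict rather than on disk, and it ignores commits, so "dangling" here just means "not referenced by the index".

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 of the blob header plus content, as Git computes it."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyRepo:
    """Toy model: an object store plus an index mapping paths to object IDs."""

    def __init__(self):
        self.objects = {}  # object ID -> content
        self.index = {}    # path -> object ID

    def add(self, path: str, content: bytes) -> str:
        oid = blob_id(content)
        self.objects[oid] = content  # stored even though nothing is committed
        self.index[path] = oid       # overwrites any earlier entry for path
        return oid

    def dangling(self) -> set:
        """Object IDs that no index entry points at."""
        return set(self.objects) - set(self.index.values())


repo = ToyRepo()
first = repo.add("data/letter.txt", b"first draft\n")
second = repo.add("data/letter.txt", b"second draft\n")
# The first blob is still in the store, but the index only lists the second.
```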


You're mostly right. The index is a representation of the working directory tree, and any modification of the index will modify .git/index. The thing is, in git a directory (a tree in git language) is a collection of file references (blob hashes); thus the index is the staged tree, a collection of blob hashes that will become the tree of your next commit. Blob contents are stored as objects in .git/objects.

There is a reference of the index format in: Documentation/technical/index-format.txt [1]

[1]: https://github.com/git/git/blob/master/Documentation/technic...


I admit I'm getting out of my knowledge here (it's been a while since I read gitcore-tutorial), but I think that's the "to-be-committed" commit object. So the index file "points to" the object file that stores the state of the object you added.


I just `git add`ed a 100MB file to a test repo, and `.git/index` only grew to 104 bytes. So it seems to only contain metadata. TIL.
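For anyone curious what that metadata looks like, here is a rough sketch of a parser for the version-2 index layout described in index-format.txt. It assumes SHA-1 object IDs, path names shorter than 4095 bytes, and no extensions worth reading; `parse_index` is a hypothetical helper. Under those assumptions a 12-byte header, one 72-byte entry for a short path, and a 20-byte trailing checksum add up to 104 bytes, which is consistent with the size reported above.

```python
import struct


def parse_index(data: bytes) -> list:
    """Parse a version-2 Git index; return (path, object ID, size) per entry."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC" or version != 2:
        raise ValueError("only version-2 index files are handled in this sketch")
    entries = []
    offset = 12
    for _ in range(count):
        # Ten 4-byte stat fields (ctime, mtime, dev, ino, mode, uid, gid, size).
        fields = struct.unpack(">10I", data[offset:offset + 40])
        # The 20-byte object ID: a reference to the blob, not its content.
        oid = data[offset + 40:offset + 60].hex()
        (flags,) = struct.unpack(">H", data[offset + 60:offset + 62])
        name_len = flags & 0x0FFF
        path = data[offset + 62:offset + 62 + name_len].decode()
        entries.append((path, oid, fields[9]))
        # Entries are NUL-padded to a multiple of 8 bytes (at least one NUL).
        offset += (62 + name_len + 8) & ~7
    return entries
```

Reading a real file would be `parse_index(open(".git/index", "rb").read())`; newer index versions need more handling than this sketch provides.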


And if you go dig into `objects/`, you'll find your (possibly compressed) 100 MB object under its hash, and can view it with "git cat-file -p $hash" without ever having committed it :)

Edit: And I bet if you dig around your index file enough, you'll find that hash someplace in there.
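A small sketch of what viewing a blob amounts to for a loose object, assuming the object has not been moved into a packfile. `loose_object_path` and `decode_loose_object` are hypothetical helper names; the real `git cat-file -p` also pretty-prints trees and commits and reads packed objects.

```python
import zlib


def loose_object_path(oid: str) -> str:
    """Path of a loose object relative to the .git directory."""
    return f"objects/{oid[:2]}/{oid[2:]}"


def decode_loose_object(raw: bytes) -> tuple:
    """Decompress a loose object file and split it into (type, content)."""
    data = zlib.decompress(raw)
    # The header is "<type> <size>" terminated by a single NUL byte.
    header, _, content = data.partition(b"\0")
    kind, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return kind.decode(), content
```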


> And if you go dig into `objects/`, you'll find your (possibly compressed) 100 MB object under its hash, and can view it with "git cat-file -p $hash" without ever having committed it :)
>
> Edit: And I bet if you dig around your index file enough, you'll find that hash someplace in there.

And if you want to know how that works, there's an article in the next issue of Code Words that goes into that as well: https://codewords.recurse.com/issues/three/unpacking-git-pac...


The staging area^W^Windex^Wcache is a terribly designed mess, that's why.

The best way to think about it is it's just a half-finished commit, i.e. one without a message and an author and a date. But otherwise git treats the index like any other commit. Adding stuff to the staging area is like amending that commit. Actually committing it is like amending that commit again, but without changing the files, only editing the metadata (message, author, etc). And then moving the current branch to it.

You could totally simulate the cache by doing exactly this, i.e. a series of `git commit -a --amend` commands (just make sure you don't push halfway). The idea behind the staging area is that you obviously need this all the time, because reasons, so let's force you to go through the hassle for every commit you might want to make.

Because it's just a commit that hates your guts, it has all the same side effects as making a real commit has.


> The idea behind the staging area is that you obviously need this all the time, because reasons, so let's force you to go through the hassle for every commit you might want to make.

I do need it all the time, and simulating it with commits would be horribly unsuited to getting work done. (The funny thing is that what you mentioned – `git commit -a` – doesn’t accomplish that, but it does skip the hassle you just said was forced.) It’s also a clean place to handle conflict resolution state, because that shouldn’t go in commits.


> (The funny thing is that what you mentioned – `git commit -a` – doesn’t accomplish that, but it does skip the hassle you just said was forced.)

I wish it did, but it doesn't stage deletions, therefore also messing up renames, which is exactly never what you want.


It does commit deletions. It doesn’t commit untracked files. If you really want to track every new file unprompted, an alias can be made for `git add --all && git commit`.
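The distinction can be sketched with a toy model. `stage_like_commit_a` is a hypothetical function, and both dicts map paths to file contents for simplicity, whereas the real index holds object IDs.

```python
def stage_like_commit_a(index: dict, worktree: dict) -> dict:
    """Return what `git commit -a` would commit, in a toy model."""
    staged = {}
    for path in index:  # only paths Git already tracks are considered
        if path in worktree:
            staged[path] = worktree[path]  # modifications are picked up
        # A tracked path missing from the worktree is dropped: a deletion.
    return staged  # untracked worktree files never appear


index = {"kept.txt": "old", "removed.txt": "gone soon"}
worktree = {"kept.txt": "new", "untracked.txt": "never added"}
result = stage_like_commit_a(index, worktree)
```

Here `result` contains only `kept.txt` with its new content: the deletion is recorded and the untracked file is left out.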


> The idea behind the staging area is that you obviously need this all the time

I have seen commits by people who habitually do `git commit -a`, and it has led me to the unavoidable conclusion that yes, you obviously need to stage commits all of the time.


While confusing for some, I love this part of `git add`.

While branching is already easy enough, I'll regularly get to a point where I may want to spend a few minutes going down a path. I'll either be happy where it leads me, or realize it was a bad idea and scrap it.

I'll `git add` the current state, make the changes I want, test it out, and then either revert back to what's staged, or like where I'm at and `git add` the rest in.

That and `git add -p` also mean that I rarely do a `git commit -a` or the like; stage it, then commit it.
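As a rough illustration of using the index as a checkpoint, here is a toy model with two hypothetical helpers. In real Git the corresponding commands would be `git add <path>` and then `git restore <path>` (or `git checkout -- <path>`) to go back to what is staged.

```python
def stage(index: dict, worktree: dict, path: str) -> None:
    """Record the current working copy of path in the index."""
    index[path] = worktree[path]


def restore_from_index(index: dict, worktree: dict, path: str) -> None:
    """Throw away unstaged edits by copying the staged version back."""
    worktree[path] = index[path]


index = {}
worktree = {"app.js": "working version"}
stage(index, worktree, "app.js")              # checkpoint the good state
worktree["app.js"] = "experiment gone wrong"  # try something
restore_from_index(index, worktree, "app.js") # scrap it
```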


That sounds like the intended workflow of `git stash`.


Then I did a bad job of explaining my workflow. :)

Stash is 'I want to temporarily keep track of where I'm at and roll back to a previous state, likely so I can do something else.' You could do a commit/branch, but it's temporary/not finished.

My add is 'I have stuff that I'm working on and want to try going in another direction for a bit; I'll either add it in if I like where I went, or go back to what I've staged.'

Example from a few hours ago:

We've lost the primary on a project that is doing ... interesting things with Grunt. The packages are three years out of date, which is what I'm tackling first. I'm opting to break these into commits based upon related groupings of packages.

So, `npm upgrade package1`, test things out, then `git add package-lock.json && git add package.json -p`. Now I upgrade another package and after testing determine that this is a pretty significant change, even though it should have been easy. Since I haven't staged my last npm upgrade I can easily discard the changes and still have all of my `npm upgrade package1` modifications. Now I can choose to commit those staged changes or try a different, but related, package upgrade.

Simple example, but easy to expand this to something that touches a handful or more of files.

The alternative would be to commit each individual upgrade, or roll back the last upgrade/thing(s) you did.

Another common use case is if I'm writing some code and realize I'd like to refactor a bit before I commit it. `git add` the working code, refactor, and if it's getting to look like it's a commit unto itself I can always `git commit` what I have staged, versus having to undo. (`git commit --amend` would work in this case too, but I do a lot of work with third-party code and am never quite sure if I'll want to keep something for historical purposes/an alternative way to do something.)


What I like about `git add` is that it creates a backup I can restore from in extremis but without cluttering up various dashboard views like `git status` or `git branch` or `git stash list`.


It's certainly an understandable point of confusion. It's not clear to me if the current behavior was actually intentional, or just a byproduct of implementation.

If you add a file, then modify it, and then commit it, your old version gets committed. That caused me a bit of confusion back in the day.
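That add, modify, commit sequence can be modeled in a few lines. This is a toy sketch with a hypothetical `StagingDemo` class; it stores contents directly instead of object IDs and ignores everything else a real commit records.

```python
class StagingDemo:
    """Toy model: a commit snapshots the index, not the working tree."""

    def __init__(self):
        self.worktree = {}
        self.index = {}
        self.commits = []

    def write(self, path: str, content: str) -> None:
        self.worktree[path] = content

    def add(self, path: str) -> None:
        # The index captures the content as it is right now.
        self.index[path] = self.worktree[path]

    def commit(self) -> dict:
        snapshot = dict(self.index)  # later worktree edits are not included
        self.commits.append(snapshot)
        return snapshot


repo = StagingDemo()
repo.write("letter.txt", "v1")
repo.add("letter.txt")
repo.write("letter.txt", "v2")  # modified after staging
snapshot = repo.commit()
```

The snapshot holds "v1" while the working tree still holds "v2", which is the behavior described above.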


The behavior is pretty intentional - you can use `git add -p` explicitly to only add parts of a given file to the index.


A mistake I seem to make fairly often. That or not adding new files in the first place.



How did they generate those nice looking graphs?


Hiya! Author of the article here. I used OmniGraffle.


Hey Mary, Tomislav here from the 2013 summer batch. Just wanted to let you know it was an absolute pleasure learning from you at HS. You're a wonderful teacher and I learned a ton. Thanks!


Interesting. Wish there was a way to auto-generate these graphs with my git history.

I like them!


Lots of git clients like Fork have pretty graphs for browsing your history. Even the terminal client can generate graphs in the git log if you give it the right params.


I regularly use 'gource' to get a lovely picture of my repos:

https://gource.io


Have you tried "gitk"?


I gave a similar talk “Inside Git Guts with Ruby” at RubyConf India 2013 - https://m.youtube.com/watch?v=lPlwkxrG2NM

I had to learn a lot of git internals and it was super fun



