
SHA-1 will still work fine for the purpose of git. It is just no longer considered secure for cryptographic operations such as digital signatures; that doesn't mean you can't use it for other purposes, like git does. Using it is still fine and always will be.

Making the hashing algorithm exchangeable would have introduced complexity into software that is already complex, and made it less efficient (one of the reasons git was created was speed for large projects like the Linux kernel), for no real purpose. If you want to change the algorithm, given that you will break compatibility with all existing repositories, tools, and clients, you should make a fork of git, because you are changing too much.

I don't see the case for migrating to SHA-256. Collisions are still very unlikely to occur accidentally. Sure, if you want to damage a repository you can create one on purpose, but you can just as well commit hooks that contain malware, or alter the git history in any way you want, so what's the point?



Collisions definitely do matter for git security: many people pin explicit git hashes for their dependencies, and thus they can be tricked into running malicious forks. This requires placing a chosen commit in the git repo (so, unlike a second-preimage break, it does not mean that you could attack repos you have no control over), but that's not an unrealistic threat model overall.


What is the threat model though?

I don't think it's possible to create a collision that's also executable code which adds a security hole or anything.

So what exactly would they achieve with the collision?

And how do they push these gigantic files that have the hash collisions to a server? The upload time would be significant.


1) People systematically underestimate the possibility of creating collisions that still do something "interesting", like being polyglots (files that can be interpreted in multiple formats, executable or otherwise). See PoC||GTFO, specifically anything by Ange Albertini, for examples; grep https://github.com/angea/pocorgtfo/blob/master/README.md for "MD5". I specifically recommend this writeup: https://github.com/angea/pocorgtfo/blob/master/writeups/19/R... .

1bis) You can use an existing collision to create new collisions. People seem to think you need to generate all the work again from scratch; this is not true. See PoC||GTFO for proof by example.

1cis) The files do not need to be gigantic. See PoC||GTFO for proof by example.

2) You can do the collision in advance, and publish the malicious version later. What it accomplishes is that the concept of "this Git hash unambiguously specifies a revision" no longer works, and one of them can be malicious.

3) The standard should be "obviously safe beyond a reasonable doubt", not "not obviously unsafe to a non-expert". By the latter standard, pretty much any random encryption construction is fine. (The examples I gave use MD5, not SHA-1, but that's a matter of degrees.)

4) SHA-256 was published years before git first was.


What do you mean by the `bis` and `cis` suffixes to your entry labels?


It's just a subdivision; it might as well have said 1a, 1b... -- but "bis" and "cis/tris" (and possibly tetrakis) tend to emphasize that they're addenda, not equal points.


It should normally be "bis" and "ter".

The Latin for "once, twice, thrice, four times, five times" is "semel, bis, ter, quater, quinquies". ("Bis" and "ter" are the only really short ones.)

It's moderately common in European standards and bureaucracy to use "bis" and "ter" for "version/revision 2" and "version/revision 3", respectively. For example https://en.wikipedia.org/wiki/List_of_ITU-T_V-series_recomme...


Huh, good point; I wonder if my mind mixed up the org chem with the numbers (likely) or if that's some kind of unique Belgian affectation.


Ah, gotcha. I thought I recognized the prefixes from Organic, but I don't think I've seen it used like this here. Neat!


The possible attack is to prepare 2 versions of a commit, both resulting in the same commit id. Then later on, after the project is successful/etc, swap out the commit with the second version, while keeping the other commits intact.

Granted, the file that the commit touches would need to be not touched in other commits. That's not out of the question in a typical software project - maybe a file in the utils folder which is only written once and never changed?

> I don't think it's possible to create a collision that's also executable code

You can include an unreadable binary blob in the commit. Tweak the blob to find the collision while keeping the code the way the attack requires.


> swap out the commit

What's the method for doing this? Does a "git push" replace objects with identical hashes on the remote? Or a "git pull" replace identical hashes on the local repo?

I suspect finding a hash collision is only the first difficult part of actually pulling this off. You may need direct write access to the file system of the target. And even then, everyone else who has already fetched the repo may not be impacted. At that point collisions become moot, because you can rewrite the entire git history however you want.


History teaches us: if a system isn't hardened against something, we can assume it's possible. If a Git server isn't specifically hardened against that, it might still be tricked into updating the file by an adversarial client. Or an attacker can temporarily add hooks that will replace the file on the server. Or an integration testing system might have write access to the server repo.


> Granted, the file that the commit touches would need to be not touched in other commits.

That's not how git works. The commit contains the entire tree. You could prepare two separate repositories such that `git checkout deadbeef0001deadbeef` in one checks out the linux kernel and in the other checks out ILOVEYOU.exe.


You're right. The commit id points to a commit object, which points to a tree object and subsequently to individual blob objects. That makes it considerably harder: you need to find a collision between 2 blob objects, both of which are executable and don't look suspicious.
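
To make that chain concrete, here is a minimal sketch of how the ids are derived, assuming the standard loose-object layout (a `<type> <size>` header, a NUL byte, then the body). The file name, author line, and helper names are made up for illustration.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes, algo: str = "sha1") -> str:
    """Hash a Git object: '<type> <size>' header, a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


def tree_entry(mode: str, name: str, object_id_hex: str) -> bytes:
    """One tree entry: '<mode> <name>', a NUL byte, then the raw binary id."""
    return f"{mode} {name}".encode() + b"\0" + bytes.fromhex(object_id_hex)


# Blob: only the file contents are hashed.
blob_id = git_object_id("blob", b"hello world\n")

# Tree: refers to the blob by its id, not by its contents.
tree_id = git_object_id("tree", tree_entry("100644", "hello.txt", blob_id))

# Commit: refers to the tree by its id (hypothetical author and message).
commit_body = (
    f"tree {tree_id}\n"
    "author A U Thor <author@example.com> 0 +0000\n"
    "committer A U Thor <author@example.com> 0 +0000\n"
    "\n"
    "Initial commit\n"
).encode()
commit_id = git_object_id("commit", commit_body)

print(blob_id)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(tree_id)
print(commit_id)
```

Since the tree and the commit only ever see `blob_id`, two different blobs with the same id would produce the same tree id and the same commit id.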


That one is nasty.


The files don't need to be gigantic. You could, for example, have a binary config file which in one colliding version encodes a potentially dangerous debugging setting as, e.g., "allow_unauthenticated_rpc = false" but in the other sets it to "true".


A denial of service of sorts? (Something broken and unusable is delivered instead, as distinct from something usable but maliciously so.)

I agree that the chances of ever getting a second pre-image that not only makes sense, but does so in some malicious way may as well be zero, surely?


The part about the 2nd pre-image chances being effectively zero may be true, but the nasty cases described upthread don't need a 2nd pre-image. (You could do a lot worse with one, granted!)


In addition to the attack described in a sibling comment, when a hashing algorithm has been broken in some way, it is safe to assume that other, more advanced collision attacks will soon be discovered.


> It is just no longer considered secure for cryptographic operations such as digital signatures; that doesn't mean you can't use it for other purposes, like git does.

Git effectively uses its hashes as a component of a digital signature scheme, in the form of signed tags and signed commits. The GPG signature covers the commit object, which identifies the content (tree) by its hash.
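
A rough illustration of why that matters, as a sketch only: HMAC stands in for the GPG signature here (it is not what Git uses), and the key, author line, and helper names are all hypothetical. The point is just that the signed bytes contain the tree id, never the file contents.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-signing-key"  # stand-in for a GPG private key


def sign(payload: bytes) -> str:
    # HMAC-SHA256 is only a placeholder for a real GPG signature.
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def commit_payload(tree_id: str) -> bytes:
    # The text that gets signed: it names the tree by id and nothing more.
    return (
        f"tree {tree_id}\n"
        "author A U Thor <author@example.com> 0 +0000\n"
        "committer A U Thor <author@example.com> 0 +0000\n"
        "\n"
        "Release 1.0\n"
    ).encode()


def verify(tree_id: str, signature: str) -> bool:
    return hmac.compare_digest(sign(commit_payload(tree_id)), signature)


tree_id = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of the empty tree
signature = sign(commit_payload(tree_id))

# Verification only checks the id. Any content that hashes to the same
# tree id would pass, which is why the hash has to be collision resistant.
print(verify(tree_id, signature))
print(verify("0" * 40, signature))
```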


> SHA-1 will still work fine for the purpose of git. It is just no longer considered secure for cryptographic operations such as digital signatures; that doesn't mean you can't use it for other purposes, like git does. Using it is still fine and always will be.

This suggests git does not rely on its hash for security properties, which seems false? What is the purpose of pinning revisions or signing git tags?


Honestly I find these rationalizations around the use of SHA-1 annoying and counter-productive. The rule is simple: don't use SHA-1. If you already use SHA-1, migrate away from it. You know that plenty of software out there that interfaces with git expects that the commit hash will be unique. Is it a security risk? Maybe, maybe not. I don't care to find out.

It doesn't matter until it starts mattering. If the Git devs had done the right thing over a decade ago we wouldn't be having this discussion. The longer they wait the more painful the migration will be.

SHA-2 was published in 2001, git was released in 2005, and now we're in 2021 and we're having this discussion again. The first concrete attack was released in "early 2005" according to Wikipedia, so there's really no excuse.

Just do it, make a major change where you replace SHA-1 with SHA-256 and call it a day. It's going to be painful for a few months and then we'll move on.

For me these discussions demonstrate the immaturity of software engineering. In other industries regulators would've banned the use of SHA-1 and you couldn't get certified if you used it.

Do electronic engineers regularly try to argue "well ok RoHS says we can't have lead solder in this product but frankly for this one it's fine the casing is completely waterproof and there are no risks for the customer"? No, they don't. If the spec says no lead, then either it's no lead or you can't sell your product. End of story.

SHA-1 is the lead solder of software engineering. Only acceptable for military use.


> You know that plenty of software out there that interfaces with git expects that the commit hash will be unique

You also know that plenty of software out there that interfaces with git has hardcoded assumptions (like, for example, the assumption that the commit hash will be exactly 40 characters long). Some tools parse the output of git log and other commit-bearing commands to make decisions. Will changing git to SHA-256 create new unforeseen security risks due to breakage of those tools (for example, by only grabbing the first 40 characters of a SHA-256 digest instead of all 64 or by just outright crashing)? Maybe, maybe not.
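
A minimal sketch of the truncation failure mode, assuming a hypothetical tool that parses `<hash> <subject>` lines. The hashes and function names below are invented for illustration.

```python
import re

# Hardcoded SHA-1 assumption: exactly 40 hex characters at the start.
LEGACY_PATTERN = re.compile(r"[0-9a-f]{40}")

# Accepts either a 40-character or a 64-character hex id.
FLEXIBLE_PATTERN = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def legacy_extract(line: str):
    """Grab 40 hex characters from the start of the line, as old tools might."""
    match = LEGACY_PATTERN.match(line)
    return match.group(0) if match else None


def extract(line: str):
    """Take the whole first token and accept it only if it is a full id."""
    token = line.split(" ", 1)[0]
    return token if FLEXIBLE_PATTERN.fullmatch(token) else None


sha1_id = "ab" * 20    # hypothetical 40-character id
sha256_id = "cd" * 32  # hypothetical 64-character id

sha1_line = f"{sha1_id} Fix typo"
sha256_line = f"{sha256_id} Fix typo"

print(legacy_extract(sha256_line) == sha256_id[:40])  # silently truncated
print(extract(sha256_line) == sha256_id)
print(extract(sha1_line) == sha1_id)
```

The legacy parser does not crash on the longer id; it quietly returns a prefix, which is the kind of breakage that is easy to miss.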

IMO you would create more security risks with the git integration breakage that would accompany migrating to SHA-256 than by staying with SHA-1.

At this point it's almost like you want a new tool/new command. `git` vs. `git2`. New projects use git2, existing projects use git (or something like that). Otherwise confusion and backwards-compatibility breakage will abound.


> New projects use git2, existing projects use git (or something like that). Otherwise confusion and backwards-compatibility breakage will abound.

Like uh, python2 and python3? ;-p


The current plan is for Git to essentially use SHA-1 hashes as aliases for SHA-256, using a lookup table. This would mean that any given SHA-1 hash would, in a particular repository, map to one and only one SHA-256 hash, which is then used to retrieve the object. Eventually the SHA-1 hashes would be deprecated (via a per-repository mode switch, so every Git user or host would migrate according to their preferences).

https://git-scm.com/docs/hash-function-transition/
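
As a toy sketch of the alias idea (not the actual on-disk format): the class and method names are hypothetical, and it only handles blobs. As I understand the transition document, trees and commits also need their embedded object names converted before the SHA-1 name is computed, which this skips.

```python
import hashlib


def object_id(obj_type: str, body: bytes, algo: str) -> str:
    """Hash '<type> <size>', a NUL byte, then the body with the given algorithm."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


class TranslationTable:
    """Stores objects under SHA-256 names and keeps SHA-1 names as aliases."""

    def __init__(self):
        self._sha1_to_sha256 = {}
        self._objects = {}  # SHA-256 name -> body

    def add_blob(self, body: bytes) -> str:
        sha256_id = object_id("blob", body, "sha256")
        sha1_id = object_id("blob", body, "sha1")
        # Each SHA-1 alias may map to one and only one SHA-256 name.
        existing = self._sha1_to_sha256.setdefault(sha1_id, sha256_id)
        if existing != sha256_id:
            raise ValueError("SHA-1 alias already maps to a different object")
        self._objects[sha256_id] = body
        return sha256_id

    def lookup(self, name: str) -> bytes:
        # Resolve a SHA-1 alias if there is one; otherwise treat as SHA-256.
        return self._objects[self._sha1_to_sha256.get(name, name)]


table = TranslationTable()
new_name = table.add_blob(b"hello world\n")
old_name = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"  # SHA-1 blob id

print(table.lookup(old_name) == table.lookup(new_name))
```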


The difference is that this breakage would be immediately visible. All code that mishandles these hashes would break right away. With collisions, the problem can remain undetected for a long time, potentially until somebody smart and with bad intentions finds a way to break your system in some creative way.

And again, if we had done this when it should have been done, i.e. pre-2010, we wouldn't be having this discussion. The longer we wait the more painful the migration will be whenever somebody manages to actually bruteforce collisions for git commits. We're not there. Yet.


If the lead solder is actually functional in some way then they may have to attempt to find an exception under RoHS. They could attempt to define the usage so as to have an application not covered by RoHS for example.

This analogy is kind of confusing because RoHS is an imposed standard. A user of SHA1 is expected to make their own decision about appropriate usage. They might reasonably continue the usage of SHA1 for their specific use case. The real world is full of such compromises.


> SHA-1 will still work fine for the purpose of git.

So why are they changing it? That's pretty strong evidence it's not fine. I found this Stack Overflow question, "Why does Git use a cryptographic hash function?" [1], which points to [2]. Note: pretty much every DVCS uses a cryptographic hash function. That doesn't seem like an accident.

Reading through some of these old posts and threads it seems like performance was the main factor combined with the expectation that SHA1 collisions just wouldn't be an issue. The latter I find to be surprisingly naive.

[1]: https://stackoverflow.com/questions/28792784/why-does-git-us...

[2]: https://ericsink.com/vcbe/html/cryptographic_hashes.html


And, to that point, I'm not really convinced that a cryptographic hashing algorithm is really a great choice for git.

It is nice that it checks off the boxes for even distribution of hashes, but there's a bunch of other hashing algorithms that can do that without the performance penalty inherent in crypto hashes. For example, FNV seems like a good fit for something like git.


Is hashing a significant bottleneck in any git deployment? I'd expect that the diffing would be vastly more expensive for instance.

Besides, don't many modern CPUs support things like SHA-256 in hardware?
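
One rough way to get a feel for it on your own machine, as a sketch: this times Python's `hashlib`, not git's own hashing code, and whether hardware SHA instructions get used depends on how the underlying library was built. The payload size and repeat count are arbitrary choices.

```python
import hashlib
import time


def time_hash(algo: str, data: bytes, repeats: int = 5) -> float:
    """Return the best-of-N wall-clock time in seconds to hash `data` once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algo, data).digest()
        best = min(best, time.perf_counter() - start)
    return best


payload = b"\0" * (8 * 1024 * 1024)  # 8 MiB of zero bytes

results = {algo: time_hash(algo, payload) for algo in ("sha1", "sha256")}

for algo, seconds in results.items():
    print(f"{algo}: {len(payload) / seconds / 1e6:.0f} MB/s")
```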


FNV is really cool and has a reasonable quality given its simplicity. But it does have issues (sticky state, and I think the avalanche characteristics also weren't great) that are solved by only slightly more complex hashing algorithms.
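
For reference, a minimal sketch of 32-bit FNV-1a, which shows how little there is to it. The constants are the standard 32-bit offset basis and prime; the function name is my own.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    state = FNV32_OFFSET_BASIS
    for byte in data:
        state ^= byte
        state = (state * FNV32_PRIME) & 0xFFFFFFFF  # keep the state at 32 bits
    return state


print(f"{fnv1a_32(b''):08x}")   # 811c9dc5
print(f"{fnv1a_32(b'a'):08x}")  # e40c292c
```

Note that it makes no attempt at collision resistance, so anyone can search for colliding inputs on purpose.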


> SHA-1 will still work fine for the purpose of git.

So why are so many corporations, individuals, and orgs working hard to protect against it?

Hint: because it's not actually fine.



