uasi · 2026-01-22T05:52:21 1769061141

Git can display diffs between binary files using custom diff drivers:

> Put the following line in your .gitattributes file: *.docx diff=word

> This tells Git that any file that matches this pattern (.docx) should use the “word” filter when you try to view a diff that contains changes. What is the “word” filter? You have to set it up [in .gitconfig].

https://git-scm.com/book/en/v2/Customizing-Git-Git-Attribute...

danudey · 2026-01-22T18:19:24 1769105964

In their 'Git is unsuited for applications' blog post[0] they also say the following:

> We currently have to clone the whole repository just to edit translation files. That is problematic for big repositories. The repository for posthog.com for example is ~680MB in size. Even though we only need translation files which would be at max 1MB in size, we have to clone the whole repository. That is also one of the reasons why git is not used at Facebook, Google & Co which have repository sizes in the gigabytes.

I get that it can be a bit complex, but Git can handle this circumstance pretty easily if you know how (or write a script for it).

For example, cloning the GIMP repo from GitLab takes me about 56 seconds and uses up 632 MB on disk, using just `git clone <repo>`.

In comparison, running these commands:

    git clone --quiet --filter=blob:none --sparse https://gitlab.gnome.org/GNOME/gimp.git gimp-sparse-clone
    git -C gimp-sparse-clone sparse-checkout add po po-libgimp po-plug-ins po-python po-script-fu po-tags po-tips po-windows-installer

(You can also run `git sparse-checkout init --no-cone` and then just `git sparse-checkout add '*.po'` to grab every .po file in the repo and nothing else; the quotes keep the shell from expanding the pattern itself.)

Takes 14 seconds on my laptop and uses 59 MB of disk space, and checks out only the specified directories and their contents.

So yeah, it's not as automatic as one might like, but ship a shell script to your translators and you're good to go. The 'Git can't do X' arguments are mostly untrue; it should really be 'Getting git to do X is more complicated than I would prefer' or 'Explaining how to do X in git is a pain', both of which are legitimate complaints.

[0] https://samuelstroschein.com/blog/git-limitations/

theknarf · 2026-01-22T07:48:55 1769068135

Would be interesting to see some tooling built around being a custom diff driver for a bunch of different standard formats!

WorldMaker · 2026-01-22T19:05:54 1769108754

I had some interesting luck with the generic approach of unzipping the DOCX/XLSX/ODT/etc. and then recursively applying other filters, like XML and JSON formatters/prettifiers, to the contents.

(My work [1] in this space predated git so it wasn't written as a git diff filter, instead it automated source control. But the same principles could be used in the other direction.)

Not the highest level diffs you could possibly get, but at least for a programmer even ugly XML and JSON diffs were still nice to have over binary diffs.

[1] https://github.com/WorldMaker/musdex
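To make the unzip-then-prettify idea concrete, here is a minimal sketch of how it might look as a text converter for diffs. It is not how musdex works; the function names, the `=== name ===` separator, and the list of extensions are all assumptions made up for illustration, and only the Python standard library is used.

```python
import io
import json
import sys
import zipfile
from xml.dom import minidom


def render_member(name: str, data: bytes) -> str:
    """Return a diff-friendly text form of one archive member."""
    lower = name.lower()
    try:
        if lower.endswith((".xml", ".rels")):
            # Indent the XML so that a line-based diff has something to hold on to.
            return minidom.parseString(data).toprettyxml(indent="  ")
        if lower.endswith(".json"):
            return json.dumps(json.loads(data), indent=2, sort_keys=True) + "\n"
        if lower.endswith((".zip", ".docx", ".xlsx", ".odt")):
            # Nested archive: apply the same treatment recursively.
            return render_archive(data)
    except Exception:
        pass  # Unparseable content falls through to the placeholder below.
    return f"<binary, {len(data)} bytes>\n"


def render_archive(data: bytes) -> str:
    """Concatenate the rendered members of a zip archive, sorted by name."""
    parts = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in sorted(archive.infolist(), key=lambda i: i.filename):
            if info.is_dir():
                continue
            parts.append(f"=== {info.filename} ===\n")
            parts.append(render_member(info.filename, archive.read(info)))
    return "".join(parts)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Used as a command: print the rendered form of the file named on the command line.
    with open(sys.argv[1], "rb") as handle:
        sys.stdout.write(render_archive(handle.read()))
```

Saved under a hypothetical name such as `zip_textconv.py`, a script like this could presumably be set as the `textconv` command of a diff driver (`git config diff.<name>.textconv`) and matched to `*.docx` in `.gitattributes`, since git passes the file path as the last argument and reads the converted text from stdout.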

theknarf · 2026-01-22T07:53:31 1769068411

I found this in my GitHub stars: https://github.com/xltrail/git-xl?tab=readme-ov-file

And then there is also Pandoc that I guess could be helpful in this regard.

nine_k · 2026-01-22T17:36:51 1769103411

This is great for showing diffs. To actually make git store only deltas, not entire binaries, you would need to configure "clean" and "smudge" filters for the format. Given that docx (and xlsx) are a bunch of XML files compressed by zip, you can actually have clean diffs, and small commits.
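One way to read this: a "clean" filter could rewrite the archive with compression turned off before git stores it, so that git's own delta compression gets plain XML to work with. A rough sketch of such a filter, under the assumption that the application still opens an uncompressed archive (so no "smudge" step is strictly needed); the function name and the `--clean` flag are invented for this example:

```python
import io
import sys
import zipfile


def store_uncompressed(data: bytes) -> bytes:
    """Rewrite a zip archive so that every member is stored without compression."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            # Keep name, order and timestamp; a fresh ZipInfo defaults to ZIP_STORED.
            stored = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            stored.external_attr = info.external_attr
            dst.writestr(stored, src.read(info))
    return out.getvalue()


if __name__ == "__main__" and sys.argv[1:] == ["--clean"]:
    # Filter mode: git pipes the working-tree file in on stdin and stores what comes out.
    sys.stdout.buffer.write(store_uncompressed(sys.stdin.buffer.read()))
```

Wiring it up would presumably be a `*.docx filter=zipstore` line in `.gitattributes` plus `git config filter.zipstore.clean "python3 zip_store.py --clean"`, where `zipstore` and `zip_store.py` are placeholder names. Some formats care about member order or about specific members staying uncompressed, which is why the sketch preserves order and should be tested against real files before being trusted.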

packetlost · 2026-01-22T16:04:55 1769097895

Yeah, this is how I would prefer to solve this problem personally, but it would be really nice to have some collection of tools that cover common binary file formats automatically instead of having to configure this manually every time.

cat5e · 2026-01-22T17:05:59 1769101559

This is really great. I read the Git config article, but I thought the image diff example was kinda lackluster. I'm sure some better metrics could be extracted for a more descriptive diff.

Thanks for sharing!

uasi · 2026-01-11T15:13:29 1768144409

1.6 *dollars

uasi · 2026-01-08T07:39:11 1767857951

I'm not deeply familiar with this, but from reading the `go mod tidy` manual[1], it seems that running `go mod tidy` loads all packages imported from the main module (including transitive dependencies) and records them with their precise versions back to `go.mod`, which should prevent them from being substituted with later versions. Am I understanding this correctly?

[1]: https://go.dev/ref/mod#go-mod-tidy

kadoban · 2026-01-08T07:44:04 1767858244

go.mod will always match whatever versions are being used directly, as far as I know. But it's not possible to lock them using go.mod. Like if you wanted to bump one version only in go.mod, you're then stumped for actually doing that. Because _probably_ the only reasonable way to get that to build is to do `go mod tidy` after doing that, which will modify go.mod itself. And you can't _really_ go back in and undo it unless you just manually do all of go.mod and go.sum yourself.

ncruces · 2026-01-08T07:51:25 1767858685

Running `go mod tidy` months apart with no other changes to your module will not change your go.mod. It certainly won't update dependencies.

You run that when you've made manual changes (to go.mod or to your Go code), or when you want to slim down your go.sum to the bare minimum needed for the current go.mod.

And that's one common way to update a dependency: you can edit your go.mod manually. But there are also commands to update dependencies one by one.

arccy · 2026-01-08T11:10:30 1767870630

go always requires a dependency graph that is consistent with all the declared requirements.

Which means if you wanted to update one version, it might bump up the requirements on its dependencies, and that's all the changes you see from running go mod tidy afterwards.

Manually constructing an inconsistent dependency graph will not work.
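A toy model may help show why bumping one version drags others along. The sketch below is a simplified take on minimal version selection (walk every requirement reachable from the main module and keep the highest version asked for per module); it is not the go command's implementation, versions are plain integers instead of semver strings, and the modules `a`, `b` and `c` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

ModuleVersion = Tuple[str, int]  # (module path, version); real versions are semver strings


def build_list(root_requires: Iterable[ModuleVersion],
               requires: Dict[ModuleVersion, List[ModuleVersion]]) -> Dict[str, int]:
    """Return, per module path, the highest version required anywhere in the graph."""
    selected: Dict[str, int] = {}
    pending = list(root_requires)
    seen = set()
    while pending:
        module = pending.pop()
        if module in seen:
            continue
        seen.add(module)
        path, version = module
        selected[path] = max(selected.get(path, version), version)
        # Follow this exact version's own requirements as well.
        pending.extend(requires.get(module, []))
    return selected


# Hypothetical graph: a@2 needs a newer c than a@1 did.
requires = {
    ("a", 1): [("c", 1)],
    ("a", 2): [("c", 3)],
    ("b", 1): [("c", 2)],
}
before = build_list([("a", 1), ("b", 1)], requires)
after = build_list([("a", 2), ("b", 1)], requires)
print(before)
print(after)
```

In this model, changing only `a` from 1 to 2 in the root requirements raises the selected `c` from 2 to 3, which is the kind of follow-on change `go mod tidy` would then write back to go.mod.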

uasi · 2025-12-17T05:32:32 1765949552

塔 can be pronounced as tou, too, or somewhere between the two. It depends on the speaker, speaking style, and possibly dialect. Either way, Japanese speakers rely more on context and pitch accent than actual pronunciation, so it communicates fine.

kazinator · 2025-12-17T07:47:19 1765957639

> 塔 can be pronounced as tou

No it can't, unless someone is spelling it out, or singing it in a song where it is given two notes, or just hyper-correcting their speech based on their knowledge of writing.

Annoyed speech and such can break words into their morae for emphasis, which breaks up diphthongs.

E.g. angry Japanese five-year-old:

ga kkō ni i ki ta ku nā i!!! (I don't wanna go to school!!!)

"nā i" is not the regular way of saying "nai". The idea that "nai" has that as an alternative pronunciation is a strawman.

uasi · 2025-12-17T16:45:44 1765989944

You're right. I looked up 現代仮名遣いの告示 [0] for the first time, and it says 塔（とう） is officially pronounced as "too". I had it backwards - I thought that 塔 is "tou", but due to the varying sounds of う, people could (and often preferred to) pronounce it as "too" in everyday speech.

This kind of misconception seems not uncommon. There's an FAQ on NHK's website [1] that addresses the question of whether 言う（いう） is pronounced "iu" or "yuu". The answer is "yuu", and the article makes it clear that: "It's not that [iu] is used for polite/careful speech and [yuu] for casual speech - there is no such distinction."

I think native speakers learn words by hearing them and seeing them written in hiragana, before learning the underlying rules, so they know "too" is written as とう, but might not realize that とう shouldn't be pronounced as "tou" or いう as "iu". These are at least less obvious than cases like は in こんにちは never being "ha".

Personally, if I heard someone say 塔 as "tou" or 言う as "iu", I probably wouldn't count it as incorrect, nor would I even notice the phonetic difference.

[0] https://www.bunka.go.jp/kokugo_nihongo/sisaku/joho/joho/kiju...

[1] https://www.nhk.or.jp/bunken/research/kotoba/20160801_2.html

BalinKing · 2025-12-20T03:52:45 1766202765

FWIW I think 言う is a different phenomenon entirely, because おう is pronounced as two vowels when it has grammatical meaning (in this case, as the verb ending), or between different words/morphemes. But my (non-native) understanding was that for nouns and such, or within the main morpheme of a verb (e.g. 葬る), “ou” is (usually) indistinguishable from “oo”.

Lightkey · 2025-12-17T09:09:59 1765962599

> as tou, too, or somewhere between the two.

I see what you did there.

uasi · 2025-12-17T05:17:18 1765948638

> 方 and 頬 (hou vs hoo) is a better example.

As a native Japanese speaker, this example is eye-opening. I hadn't even realized that the u in 方 is pronounced as /o:/. I believe most Japanese people haven't either, despite unknowingly pronouncing it that way.

Also, I have no idea how to Hepburn-romanize 方 vs 頬, 負う vs 王, and 塔 vs 遠. If I had to romanize, I would just write it as whatever the romaji input method understands correctly (hou/hoo, ou/ou, and tou/too, in this case).

kazinator · 2025-12-17T07:10:39 1765955439

Your comment is astonishing.

If you know the word 方, that it is /ho:/, and you know that it has a う in it when written out, how can you not know that う stands for making the o long? The only vowel is the long o.

Japanese kindergarten kids can recognize hiragana words with "おう", correctly identifying it as /o:/. By the time they learn the 方 kanji they would have seen it written in hiragana umpteen times, like AよりBのほうがいい and whatnot.

uasi · 2025-12-17T09:06:51 1765962411

Well, speaking for myself, I internalized how う is pronounced differently in different contexts when I was young, and by now I've almost forgotten there's a difference I need to be conscious of.

When I hear /ho:/ in a certain context, "ほう(方)" immediately comes to mind, without noticing that what I heard was a long o. To me it's just the う sound. And if someone pointed to their face while saying /ho:/, I'd think it's the お sound as in "ほお(頬)".

raincole · 2025-12-17T09:11:25 1765962685

Because they're a native speaker. Native speakers are often utterly oblivious to the 'rules' of their own languages.

Every time I read a rule about my mother tongue (Mandarin) online I was like, lol what nonsense foreigners made up... And then I realized that rule does exist. I've just internalized it for so long.

pitkali · 2025-12-17T10:02:44 1765965764

A typical example for English is the adjective order.

naniwaduni · 2025-12-17T17:10:13 1765991413

Adjective order in English is basically that most essential qualities of the object go closest to the head. There are lists out there that try to break this down into categories of adjective ("opinion-size-age-shape-colour-origin-material-purpose"), and to some extent the anglo intuitions on which sorts of properties are more or less essential are not trivial, but it's not as arbitrary as people want to make it out to be.

SilasX · 2025-12-17T18:50:13 1765997413

This. People act like it's a hyper-complicated rule that English speakers magically infer, when in reality, a) other languages do it, and b) it's a much simpler rule (that you've given) which someone overcomplicated.

As a counterexample (in line with your explanation), consider someone snarking on the WallStreetBets forum: "Come on, guys, this is supposed to be Wall Street bets, not Wall Street prudent hedges!" Adjective order changes because the intended significance changes. (Normally it would be "prudent Wall Street hedges".)

Side note: please don't nitpick about whether "Wall Street" is functionally an adjective here. The same thing would happen if the forum had been named "FinancialBets".

kazinator · 2025-12-17T19:42:37 1766000557

People "overcomplicate" the rule because they find counterexamples to the simple rule.

It's a fool's errand because the way human language works is that people happily accept odd exceptions by rote memory. So the rule simply says that there exist these exceptions. Also, there is something called euphony: speakers find utterances questionable if they are not in some canonical form they are used to hearing. For instance "black & white" is preferred over "white & black".

The rules boil down to "what people are used to hearing, regardless of the underlying grammar offering other possibilities".

Cpoll · 2025-12-17T21:34:43 1766007283

Isn't this a bad example? There's only one adjective in "prudent hedges." Changing which noun "prudent" acts on isn't a matter of adjective order.

(I suppose Wall Street is a proper adjective, like "New York pizza," but you said no nitpicking)

kazinator · 2025-12-18T02:20:48 1766024448

In compound noun phrases, nouns serve as adjective-like modifiers.

By the way, modifying compounds generally must not be plurals, to the extent that even pluralia tantum words like scissors and pants get forced into a pseudo-singular form in order to serve as modifiers, giving us scissor lift and pant leg, which must not be scissors lift and pants leg.

An example of a noun phrase containing many modifying nouns is something like: law school entrance examination grading procedure workflow.

The order among modifying nouns is semantically critical and different from euphonic adjective order; examples in which modifying nouns are permuted, resulting in strange or nonsensical interpretations, or bad grammar, are not valid for demonstrating constraints among the order of true adjectives which independently apply to their subject.

For instance, red, big house is strange and wants to be big, red house. The house is independently big and red.

This is not related to why entrance examination grading procedure cannot be changed to examination entrance grading procedure. The modifiers do not target the head, but each other. "entrance" applies to "examination", not to "procedure" or "grading".

SilasX · 2025-12-18T00:19:29 1766017169

Did you read the second sentence of that paragraph? The same thing would happen with a legit adjective, like if the forum had been named "FinancialBets": "Guys, this is financial bets, not financial prudent hedges."

uasi · 2025-12-03T11:44:02 1764762242

Although many kanji can be algorithmically composed, manual adjustment of each character's shape is still necessary for production-grade fonts. For example, if you closely compare the 彳 radical between 徧, 行, and 桁, you'll notice subtle differences in width, stroke length, angle, and margin.

uasi · 2025-10-13T07:39:21 1760341161

Thunderbird is no different than Electron apps, though. It's built on a browser engine, renders UI written in HTML + CSS (+ XUL partially), consumes ~500MB of RAM on idle, etc.

array_key_first · 2025-10-13T15:56:52 1760371012

That's because thunderbird is a full featured application. It contains a browser because you can actually use that browser. It's not using the browser as a mere presentation engine.

uasi · 2025-05-31T16:05:44 1748707544

JVM predates BEAM.

cess11 · 2025-05-31T17:08:00 1748711280

Java is ten years younger than Erlang. When Java was marketed as the new hot thing in 1996-1997 Erlang was widely used within Ericsson, and in 1997-1998 someone in Ericsson had swallowed the Sun bait and forced the start of a transition to Java. Joe Armstrong and others went to management and convinced them that since Erlang was now useless it should be released as free software, and it surprisingly was.

Which meant that some of them promptly resigned from their jobs and started a company doing freedom Erlang. It took until 2005 or so for Ericsson to confess that they had made a mistake in trying to Java all the things and got around to using Erlang again.

lioeters · 2025-05-31T18:30:28 1748716228

> it took until 2005 or so for Ericsson to confess that they had made a mistake

Impressive that someone was able to make that call and accept the situation, after investing half a decade moving to Java. Also says something about the staying power of Erlang and its paradigm, that the company was able to re-adopt it again.

cess11 · 2025-05-31T20:16:37 1748722597

Well, the IT bubble had burst and Sun was basically two thirds down the sewer at the time. Re-adopting something you had built and proven in the early days of cell phones probably looked like very reasonable risk management.

They kept using Java for some things, of course.

pjmlp · 2025-05-31T16:33:59 1748709239

Bytecode-based runtimes and compiler toolchains predate both, and Erlang came to life in 1986, a couple of years before Oak.

uasi · 2025-05-17T09:24:51 1747473891

> create a new global object named "Resource" which has the needed method prototypes that can be overwritten.

Those methods could conflict with existing methods already used in other ways if you’d want to make an existing class a subclass of Resource.

uasi · 2025-04-10T09:31:18 1744277478

Patchwork robes and non-orange clothes are common in Japanese Buddhism. The styles and colors vary significantly depending on the sect and one’s rank.[1]

[1] https://www2.ntj.jac.go.jp/dglib/contents/learn/edc28/shiru/...