More

belak · 2025-07-28T20:47:27 1753735647

I'm have no experience with abstracting away the backend, but Dockerizing is actually pretty easy now - there's an offline mode[1] where you can have sqlx generate some files which let it work when there's no DB running.

[1]: https://docs.rs/sqlx/latest/sqlx/macro.query.html#offline-mo...

belak · on Dec 4, 2024

I think you're missing much of the context and are misrepresenting what happened.

As far as I'm aware, there was no crying he didn't want to work with people, but there was a frustration that he was not open to having a Markdown "standard". To the point where he actively opposed efforts to standardize it, at least under the name Markdown[1].

This is legally and technically fine, as he owns a trademark for Markdown, but when you combine the inconsistent application of that trademark (GitHub Flavored Markdown is seemingly fine, but Common Markdown was not), along with him calling it "Jeff Atwood's crusade" and mocking the effort[2], it's not a great look and resulted in quite a few frustrated people.

As an open source project, you're right that he doesn't owe anything to anyone, but that doesn't mean people have to be entirely happy about how the situation was handled either.

[1]: https://blog.codinghorror.com/standard-markdown-is-now-commo...

[2]: https://soundcloud.com/thetalkshow/ep-88-cat-pictures-side-1, around 1:15

latexr · on Dec 4, 2024

> As an open source project

But it’s not an open-source project. It’s a download link to a Perl script which is never updated and effectively no one uses.

There’s currently zero value to the code he wrote or the reference in his page. The only thing of worth that remains of his original implementation is the concept and the most basic syntax.

the_af · on Dec 4, 2024

The point is Gruber was contacted out of courtesy, his name as the initial author carrying some weight in people's minds, and he reacted in an openly hostile and mocking way.

latexr · on Dec 5, 2024

I understand that. To be clear, I was adding more context, not defending him (see above on the thread).

the_af · on Dec 5, 2024

True! I didn't realize you were the same user who sparked this specific thread. We're in agreement.

belak · on March 2, 2023

They introduced Bitbucket Pipelines around when I left, about 7 years ago.

- https://bitbucket.org/product/features/pipelines

- https://bitbucket.org/blog/introducing-bitbucket-pipelines-b...

belak · on Aug 2, 2022

This is absolutely true - when I was at Bitbucket (ages ago at this point) and we were having issues with our DB server (mostly due to scaling), almost everyone we talked to said "buy a bigger box until you can't any more" because of how complex (and indirectly expensive) the alternatives are - sharding and microservices both have a ton more failure points than a single large box.

I'm sure they eventually moved off that single primary box, but for many years Bitbucket was run off 1 primary in each datacenter (with a failover), and a few read-only copies. If you're getting to the point where one database isn't enough, you're either doing something pretty weird, are working on a specific problem which needs a more complicated setup, or have grown to the point where investing in a microservice architecture starts to make sense.

thayne · on Aug 2, 2022

One issue I've seen with this is that if you have a single, very large database, it can take a very, very long time to restore from backups. Or for that matter just taking backups.

I'd be interested to know if anyone has a good solution for that.

dsr_ · on Aug 2, 2022

Here's the way it works for, say, Postgresql:

- you rsync or zfs send the database files from machine A to machine B. You would like the database to be off during this process, which will make it consistent. The big advantage of ZFS is that you can stop PG, snapshot the filesystem, and turn PG on again immediately, then send the snapshot. Machine B is now a cold backup replica of A. Your loss potential is limited to the time between backups.

- after the previous step is completed, you arrange for machine A to send WAL files to machine B. It's well documented. You could use rsync or scp here. It happens automatically and frequently. Machine B is now a warm replica of A -- if you need to turn it on in an emergency, you will only have lost one WAL file's worth of changes.

- after that step is completed, you give machine B credentials to login to A for live replication. Machine B is now a live, very slightly delayed read-only replica of A. Anything that A processes will be updated on B as soon as it is received.

You can go further and arrange to load balance requests between read-only replicas, while sending the write requests to the primary; you can look at Citus (now open source) to add multi-primary clustering.

mgiampapa · on Aug 2, 2022

This isn't really a backup, it's redundancy which is good thing but not the same as a backup solution. You can't get out of a drop table production type event this way.

Twisell · on Aug 2, 2022

The previous commenter was probably unaware of the various way to backup recent postgresql release.

For what you describe a "point in time recovery" backup would probably be the more adequate flavor https://www.postgresql.org/docs/current/continuous-archiving...

It was first release around 2010 and gained robustness with every release hence not everyone is aware of it.

The for instance I don't think it's really required anymore to shutdown the database to do the initial sync if you use the proper tooling (for instance pg_basebackup if I remember correctly)

maxclark · on Aug 2, 2022

Going back 20 years with Oracle DB it was common to use "triple mirror" on storage to make a block level copy of the database. Lock the DB for changes, flush the logs, break the mirror. You now have a point in time copy of the database that could be mounted by a second system to create a tape backup, or as a recovery point to restore.

It was the way to do it, and very easy to manage.

gav · on Aug 2, 2022

If you add a delay of say 30 minutes for one of your replicas, you have another option in a "drop table" type event.

hamandcheese · on Aug 2, 2022

If you stop at the first bullet point then you have a backup solution.

thayne · on Aug 2, 2022

It doesn't solve the problem that sending that snapshot to a backup location takes a long time.

dsr_ · on Aug 2, 2022

No, it doesn't.

It takes exactly the time that it takes, bottlenecked by:

* your disk read speed on one end and write speed on the other, modulo compression

* the network bandwidth between points A and B, modulo compression

* the size of the data you are sending

So, if you have a 10GB database that you send over a 10Gb/s link to the other side of the datacenter, it might be as little as 10 seconds. If you have a 10TB database that you send over a nominally 1GB/s link but actually there's a lot of congestion from other users, to a datacenter on the other side of the world, that might take a hundred hours or so.

rsync can help a lot here, or the ZFS differential snapshot send.

nightshift1 · on Aug 2, 2022

Unless your storage is already mirrored off-site. Ex: EMC srdf

thayne · on Aug 3, 2022

so say the disk fails on your main DB. or for some reason a customer needs data from 6 months ago, which is no longer in your local snapshots. In order to restore the data, you have to transfer the data for the full database back over.

With multiple databases, you only have to transfer a single database, not all of your data.

hamandcheese · on Aug 5, 2022

pg_dump has an option to output one-table-per-file. You can use this for selective restores later.

dsr_ · on Aug 2, 2022

Precisely so.

hamandcheese · on Aug 2, 2022

Do you even have to stop Postgres if using ZFS snapshots? ZFS snapshots are atomic, so I’d expect that to be fine. If it wasn’t fine, that would also mean Postgres couldn’t handle power failure or other sudden failures.

dsr_ · on Aug 2, 2022

You have choices.

* shut down PG. Gain perfect consistency.

* use pg_dump. Perfect consistency at the cost of a longer transaction. Gain portability for major version upgrades.

* Don't shut down PG: here's what the manual says:

However, a backup created in this way saves the database files in a state as if the database server was not properly shut down; therefore, when you start the database server on the backed-up data, it will think the previous server instance crashed and will replay the WAL log. This is not a problem; just be aware of it (and be sure to include the WAL files in your backup). You can perform a CHECKPOINT before taking the snapshot to reduce recovery time.

* Midway: use SELECT pg_start_backup('label', false, false); and SELECT * FROM pg_stop_backup(false, true); to generate WAL files while you are running the backup, and add those to your backup.

mike_hearn · on Aug 2, 2022

Presumably it doesn't matter if you break your DB up into smaller DBs, you still have the same amount of data to back up no matter what. However, now you also have the problem of snapshot consistency to worry about.

If you need to backup/restore just one set of tables, you can do that with a single DB server without taking the rest offline.

thayne · on Aug 2, 2022

> you still have the same amount of data to back up no matter what

But you can restore/back up the databases in parallel.

> If you need to backup/restore just one set of tables, you can do that with a single DB server without taking the rest offline.

I'm not aware of a good way to restore just a few tables from a full db backup. At least that doesn't require copying over all the data (because the backup is stored over the network, not on a local disk). And that may be desirable to recover from say a bug corrupting or deleting a customer's data.

rszorness · on Aug 2, 2022

Try out pg_probackup. It works on database files directly. Restore is as fast as you can write on your ssd.

I've setup a pgsql server with timescaledb recently. Continuing backup based on WAL takes seconds each hour and a complete restore takes 15 minutes for almost 300 GB of data because the 1 GBit connection to the backup server is the bottleneck.

xuki · on Aug 3, 2022

For MySQL there is xtrabackup - https://www.percona.com/software/mysql-database/percona-xtra....

nick__m · on Aug 2, 2022

On mariadb you can tell the replica to enter into a snapshotable state[1] and take a simple lvm snapshot, tell the the database it's over, backup your snapshot somewhere else and finally delete the snapshot.

1) https://mariadb.com/kb/en/storage-snapshots-and-backup-stage...

Svenstaro · on Aug 2, 2022

I found this approach pretty cool in that regard: https://github.com/pgbackrest/pgbackrest

raarts · on Aug 3, 2022

Not a solution but using event sourcing would have prevented this.

altdataseller · on Aug 2, 2022

What if your product simply stores a lot of data (ie a search engine) How is that weird?

belak · on Aug 2, 2022

That's fair - I added "are working on a specific problem which needs a more complicated setup" to my original comment as a nicer way of referring to edge cases like search engines. I still believe that 99% of applications would function perfectly fine with a single primary DB.

zasdffaa · on Aug 2, 2022

Depends what you mean by a database I guess. I take it to mean an RDBMS.

RDBMSs provide guarantees that web searching doesn't need. You can afford to lose a pieces of data, provide not-quite-perfect results for web stuff. It's just wrong for an RDBMS.

altdataseller · on Aug 2, 2022

What if you are using the database as a system of record to index into a real search engine like Elasticsearch? For a product where you have tons of data to search from (ie text from web pages)

IggleSniggle · on Aug 2, 2022

In regards to Elasticsearch, you basically opt-in to which behavior you want/need. You end up in the same place: potentially losing some data points or introducing some "fuzziness" to the results in exchange for speed. When you ask Elasticsearch to behave in a guaranteed atomic manner across all records, performing locks on data, you end up with similar constraints as in a RDBMS.

Elasticsearch is for search.

If you're asking about "what if you use an RDBMS as a pointer to Elasticsearch" then I guess I would ask: why would you do this? Elasticsearch can be used as a system of record. You could use an RDBMS over top of Elasticsearch without configuring Elasticsearch as a system of record, but then you would be lying when you refer to your RDBMS as a "system of record." It's not a "system of record" for your actual data, just a record of where pointers to actual data were at one point in time.

I feel like I must be missing what you're suggesting here.

altdataseller · on Aug 2, 2022

Having just an Elasticsearch index without also having the data in a primary store like a RDMS is an anti-pattern and not recommended by almost all experts. Whether you want to call it a “system of record”, i wont argue semantics. But the point is, its recommended hacing your data in a primary store where you can index into elasticsearch.

zasdffaa · on Aug 3, 2022

Have you a link for this? Never heard of this requirement (but not an elastic user so no surprise).

skeeter2020 · on Aug 2, 2022

This is not typically going to be stored in an ACID-compliant RDBMS, which is where the most common scaling problem occurs. Search engines, document stores, adtech, eventing, etc. are likely going to have a different storage mechanism where consistency isn't as important.

rmbyrro · on Aug 2, 2022

a search engine won't need joins, but other things (ie text indexing) that can be split in a relatively easier way.

belak · on July 26, 2022

The article is great, but the title here has been editorialized a bit. I'm not super familiar with HN, so what's the best way to get that fixed to match the actual article?

phnofive · on July 26, 2022

Send mail to hn@ycombinator.com :)

Note that this was submitted by the author, who's in the thread.

belak · on March 1, 2022

The original plan was to release in February, but in the Beta 2 release [1], they said this: "Because we are taking the time to issue a second beta, we now expect that the Go 1.18 release candidate will be issued in February, with the final Go 1.18 release in March."

It looks like there are still a few release blockers [2]. I'd imagine RC is fairly soon though.

EDIT: as mentioned by _fz_ below, RC1 has already released. Seems like the full release will most likely still be in March.

[1]: https://go.dev/blog/go1.18beta2

[2]: https://github.com/golang/go/issues?q=is%3Aopen+label%3Arele...

_fz_ · on March 1, 2022

> I'd imagine RC is fairly soon though.

RC1 was released two weeks ago.

LukeShu · on March 1, 2022

Well, not quite 2 weeks ago; it'll be 2 weeks on Thursday. I say this because policy is to issue a release "no sooner than two weeks after" issuing the release candidate.

https://go.dev/s/release

belak · on March 1, 2022

Thanks! Good catch! I was going off blog posts and didn't see that there was a download for RC1.

belak · on Nov 23, 2021

It's also available as a mirror at https://github.com/golang/pkgsite. All the golang.org/x/* packages are thankfully available there, making them pretty easy to find.

belak · on March 14, 2021

The short version is "it depends". Essentially, each separate Wayland compositor needs to add support for the nvidia driver (or the driver needs to support more standardized APIs).

Gnome and KDE support Wayland on nvidia hardware (though it's a bit rough around the edges). Wlroots (and therefore Sway) doesn't because they don't want to have a separate code-path for one driver which doesn't want to support standards and because the APIs the driver does support wouldn't work well with the code model of wlroots.

qudat · on March 14, 2021

Nvidia works with wlroots if you use nouveau and not the proprietary drivers.

solarkraft · on March 14, 2021

As much as I appreciate Noveau, it's not really practically usable with recent cards (due to Nvidias faults!).

shmerl · on March 14, 2021

It's probably not relevant in the long term anyway. I expect desktop Nvidia usage on Linux to pretty much plummet while their driver remains a blob. The trend is negative for them already.

So I fully support wlroots developer's approach to it - show Nvidia to the door until they come back with upstreamed driver.

belak · on Sept 21, 2020

I've seen endlessh[0] referenced in the past. Here's an article that talks about this and other SSH tarpits[1].

[0]: https://github.com/skeeto/endlessh

[1]: https://nullprogram.com/blog/2019/03/22/

belak · on March 1, 2020

Oh cool, I missed this in the 1.14 release notes. This partially fixes it, but it's not easy to implement multiple interfaces with overlapping functions because you still can't implement multiple functions on the same type with the same name.