
I wish devs would normalize their data rather than shove everything into a JSON(B) column, especially when there is a consistent schema across records.

It's much harder to set up proper indexes and enforce constraints, and it adds overhead every time you actually want to use the data.
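As an illustration (a minimal sketch using SQLite's JSON1 functions; the table and column names are made up), compare how a constraint lives in the schema for normalized columns versus how it has to reach into the blob for a JSON column:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Normalized: the schema itself enforces the invariant, and the index is direct.
con.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0),
        currency TEXT NOT NULL
    )
""")
con.execute("CREATE INDEX idx_orders_currency ON orders (currency)")

# JSON column: the same invariant needs an explicit json_extract() CHECK,
# and every query must reach into the blob.
con.execute("""
    CREATE TABLE orders_json (
        id INTEGER PRIMARY KEY,
        doc TEXT NOT NULL CHECK (json_extract(doc, '$.amount_cents') >= 0)
    )
""")

con.execute("INSERT INTO orders (amount_cents, currency) VALUES (?, ?)", (995, "USD"))
try:
    con.execute("INSERT INTO orders_json (doc) VALUES (?)", ('{"amount_cents": -1}',))
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the CHECK still fires, but only because we wrote it by hand
```

The JSON variant works, but every constraint and index has to be spelled out as an expression over the blob rather than falling out of the column definitions.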


JSON columns shine when

* The data does not map well to database tables, e.g. tree structures (of course those could be represented as many table rows too, but it's complicated and may be slower when you always need to operate on the whole tree anyway)

* Your programming language has better types and programming facilities than SQL offers; for example, in our Haskell+TypeScript code base we can conveniently serialise large nested data structures with 100s of types into JSON, without having to think about how to represent those trees as tables.
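The same convenience exists in other typed languages. A Python sketch with hypothetical Itinerary/Leg domain types (representing these relationally would take several tables plus foreign keys; as JSON it is one serialisation call):

```python
import json
from dataclasses import dataclass, asdict

# Invented nested domain types for illustration.
@dataclass
class Leg:
    origin: str
    destination: str

@dataclass
class Itinerary:
    passenger: str
    legs: list  # list[Leg]

trip = Itinerary("Ada", [Leg("LHR", "JFK"), Leg("JFK", "SFO")])
# asdict() recurses through nested dataclasses, so the whole tree
# serialises in one call with no table mapping to design.
doc = json.dumps(asdict(trip))
print(doc)
```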


You do need some fancy in-house way to migrate old JSON documents to the new shape in case you want to evolve the (implicit) JSON schema.

I find this one of the hardest parts of using JSON, and the main reason why I'd rather put the data in proper columns. Once I go JSON, I need a fair bit of code to deal with migrations (either doing them during schema migrations, or some way to do them at read/write time).


Yes, that's what we do: Migrations with proper sum types and exhaustiveness checking.

Since OP is using Haskell, the actual code most likely won't really touch the JSON type but the actual domain type. This makes migrations super easy to write. Of course they could have written a fancy in-house way to do that, or just used the safe-copy library, which solves this problem and has been around for almost two decades. In particular, it solves the "nested version control" problem of data structures containing other data structures with varying versions.
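Not Haskell, but the version-chain idea can be sketched in a few lines of Python; the version numbers and field changes below are invented for illustration:

```python
import json

# Hypothetical read-time migration: each stored document carries a "version"
# field, and step functions upgrade it one version at a time.
CURRENT_VERSION = 3

def migrate_v1_to_v2(doc):
    # v2 split "name" into first_name/last_name (example schema change)
    first, _, last = doc.pop("name").partition(" ")
    doc.update(first_name=first, last_name=last, version=2)
    return doc

def migrate_v2_to_v3(doc):
    # v3 added an optional "tags" list with a default
    doc.setdefault("tags", [])
    doc["version"] = 3
    return doc

MIGRATIONS = {1: migrate_v1_to_v2, 2: migrate_v2_to_v3}

def load(raw: str) -> dict:
    """Parse a stored JSON document and upgrade it to CURRENT_VERSION."""
    doc = json.loads(raw)
    while doc.get("version", 1) < CURRENT_VERSION:
        doc = MIGRATIONS[doc.get("version", 1)](doc)
    return doc

print(load('{"version": 1, "name": "Ada Lovelace"}'))
```

Because each step only knows about its two adjacent versions, documents written at any historical version can be upgraded by walking the chain, which is essentially what safe-copy automates.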

I find that JSON(B) works best when you have a collection of data with different or variant concrete types that aren't 1:1 matches. Ex: the actual transaction result when you have different payment processors (PayPal, Amazon, Google, Apple Pay, etc.)... you don't necessarily want or care to have N different tables for a clean mapping (along with the overhead of a join) just to pull the transaction details in their original format(s).

Another example is a classifieds website, where your extra details for a Dress are going to be quite a bit different from the details for a Car or Watch. But, again, you don't necessarily want to inflate the table structure for a fully normalized flow.
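A sketch of that hybrid layout (SQLite here for brevity, with made-up categories; the same shape works with a Postgres jsonb column): common columns stay relational, and per-category attributes live in one details blob:

```python
import sqlite3, json

con = sqlite3.connect(":memory:")
# Shared fields get real columns; category-specific fields go in "details".
con.execute("""
    CREATE TABLE listings (
        id INTEGER PRIMARY KEY,
        category TEXT NOT NULL,
        price_cents INTEGER NOT NULL,
        details TEXT NOT NULL   -- category-specific JSON
    )
""")
rows = [
    ("dress", 4500,   {"size": "M", "material": "linen"}),
    ("car",   950000, {"mileage_km": 82000, "fuel": "diesel"}),
    ("watch", 210000, {"movement": "automatic", "diameter_mm": 40}),
]
con.executemany(
    "INSERT INTO listings (category, price_cents, details) VALUES (?, ?, ?)",
    [(c, p, json.dumps(d)) for c, p, d in rows],
)
# Queries can still combine both halves: a relational filter plus JSON extraction.
for cat, mileage in con.execute(
    "SELECT category, json_extract(details, '$.mileage_km') "
    "FROM listings WHERE category = 'car'"
):
    print(cat, mileage)
```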

If you're using a concretely typed service language, it can help. C# does a decent job here. But even then, mixing in Zod with Hono and OpenAPI isn't exactly difficult on the JS/TS front.


Yeah, document formats (jsonb) are excellent for apps etc. that interface with the messy real world: ecommerce, govt systems, anything involving forms, payments, etc.

Trying to map everything in a relational way puts you in a world of pain.


For very simple JSON data whose schema never changes, I agree.

But the more complex it is, the more complex the relational representation becomes. JSON responses from some APIs could easily require 8 new tables to store the data, with lots of arbitrary new primary keys and foreign key constraints, and your queries will be full of JOINs that need proper indexing set up...

Oftentimes it's just not worth it, especially if your queries are relatively simple, but you still need to store the full JSON in case you need the data in the future.

Obviously storing JSON in a relational database feels a bit like a Frankenstein monster. But at the end of the day, it's really just about what's simplest to maintain and provides the necessary performance.

And the whole point of the article is how easy it is to set up indexes on JSON.
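In SQLite terms, an expression index over a JSON path is a one-liner; Postgres spells the same idea `CREATE INDEX ... ((doc->>'user_id'))`, or a GIN index over the whole jsonb column. The table and field names below are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")
# Expression index over a JSON path: queries using the same expression can use it.
con.execute("CREATE INDEX idx_events_user ON events (json_extract(doc, '$.user_id'))")
con.execute("INSERT INTO events (doc) VALUES ('{\"user_id\": 42, \"kind\": \"login\"}')")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events "
    "WHERE json_extract(doc, '$.user_id') = 42"
).fetchall()
print(plan)  # the plan should mention idx_events_user rather than a full table scan
```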


When a data tree is tightly coupled (like a complex sample of nested data with some arrays from a sensor) and the entire tree is treated like a single thing by writes, the JSON column just keeps things easier. Reads can be accelerated with indexes as demonstrated here.

I fully agree that's wrong (I can't imagine the overhead on some larger tables I have if that had happened). That said, people often want weird customizations in medium-sized tables that would set one on a path to annoying 100-column tables if we couldn't express customizations in a "simple" JSON column (that is more or less polymorphic).

A typical example is a price-setting product I work on: there are price ranges that are universal (and DB columns reflect that part), but customers all have weird custom requests for pricing, like rebates on the 3rd weekend after X-mas (but only if the customer is related to Uncle Rudolph who picks his nose).


But if you have to model those custom pricing structures anyway, the question is what you gain by not reflecting them in the database schema.

There's no reason to put all those extra fields in the same table that contains the universal pricing information.


Normalisation brings its own overhead though.

These types of books are always interesting to me because they tackle so many different things. They cover a range of topics at a high level (data manipulation, visualization, machine learning) and each could have its own book. They balance teaching programming while introducing concepts (and sometimes theory).

In short I think it's hard to strike an appropriate balance between these but this seems to be a good intro level book.


I never liked the global leaderboard since I was usually asleep when the puzzles were released. I likely never would have had a competitive time anyway.

I never had any hope of, or interest in, competing on the leaderboard, but I found it fun to check it out, see times and time differences ("omg 1 min for part 1 and 6 for part 2"), look up the names of the leaders to check if they have something public about their solutions, etc. One time I even ran into the name of an old friend, so it was a good excuse to say hi.

I believe Everybody Codes has a leaderboard where the clock starts when you first open the puzzle. So if you're looking for coding puzzles with a leaderboard, that one would be fair for you.

https://everybody.codes/events


The IDE has been available for a while.


In beta state.


Precip.ai or go grab the MRMS data yourself


We can't pull/push to our repos:

    git@github.com: Permission denied (publickey).
    fatal: Could not read from remote repository.
    
    Please make sure you have the correct access rights
    and the repository exists.


Same in Canada and for colleagues in the UK and Poland.


Same here (Canada east coast)

Edit: I was just able to pull again


Germany here. Seems to be global then.


For tasks like planning travel I often am trying to optimize multiple goals at once. I might find cheaper flights on certain days but more expensive hotels. This is much easier on larger screens because you can view more information side by side.


The most impressive part to me is finding the right channel to communicate with the hospital. We had to dispute a billing issue with our hospital, and it simply wasn't possible to talk to any person who wasn't part of the "patient relations" team. Billing problems went through patient relations, who talked to the billers.


> Also, the rainfall. Some farmers go from morning to night never saying a word that isn't a complaint about the rainfall being wrong.

Yes. Some of them use proper rain gauges but some just complain about it. Basically none of them understand the difference between a point measurement and an areal average estimate.


Farmers will always have reason to complain about rain.

Farmers need rain, but there is never a perfect time for it to rain. There is always something they need to do that can't be done because it rained. If rain were 100% predictable months in advance, farmers would just plan not to do those things on rain days (rain days often last a couple of days because things need to dry), but it isn't, so they are often in the middle of something that cannot be interrupted when rain interrupts them.

Of course the other problem is that sometimes it doesn't rain, and then they can get all the above jobs done - but because there is no rain nothing grew (well), and so the harvests are bad...


> Some of them use proper rain gauges

OOOhh is there a device I can get that tracks this for home?


Easiest thing in the world, put a cup outside: https://www.zoro.com/acurite-rain-gauge-magnifying-12-12-in-...


With property graphs being adopted into the SQL standard, this isn't surprising.


The fact that GQL is now supported by some relational databases doesn't mean they'll become an alternative to native graph databases.


Yeah I guess it’s like saying that relational DBs supporting native JSON type meant the end of NoSQL DBs.


>relational DBs supporting native JSON type meant the end of NoSQL DBs

This holds true for 95% of cases of well-made software.


It pretty much did, except for very specific use cases.

