Great point! It was originally about both of these things, but the storage aspect isn't discussed much anymore because it's really not a concern.
The data integrity issue is still a concern, and I talk about that in the book. You need to manage data integrity in your application and think about how to handle updates properly. But it's completely doable and many people have.
> You need to manage data integrity in your application
This is just another way of saying you need to implement your own system for managing consistency. Dynamo offers transactions now, but they don’t offer actual serialization. Your transaction will simply fail if it runs into any contention. You might think that’s ok, just retry. But because you’ve chosen to sacrifice modelling your data, this will happen a lot. If you want to use aggregates in your Dynamo, you have to update an aggregate field every time you update your data, which means you create single points of contention for your failure prone transactions to run into all over your app. There are ways around this, but they all come at the cost of further sacrificing consistency guarantees. The main issue with that being that once you’ve achieved an inconsistent state, it’s incredibly difficult to even know it’s happened, let alone to fix it.
Then you run into the issue of actually implementing your data access. With a denormalized data set, implementing all your data access comes at the expense of increasing interface complexity. Your schema ends up having objects, which contain arrays of objects, which contain arrays... and you have to design all of your interfaces around keys that belong to the top level parent object.
The relational model wasn’t designed to optimize one type of performance over another. It was designed to optimize operating on a relational dataset, regardless of the implementation details of the underlying management system. Trying to cram a relational set into a NoSQL DB unavoidably comes at the expense of some very serious compromises, with the primary benefit being that the DB itself is easier to administer. It’s not as simple as cost of storage vs cost of compute. NoSQL DBs like dynamo (actually especially dynamo) are great technology, with many perfectly valid use cases. But RDBMS is not one of them, and everybody I’ve seen attempt to use it as an RDBMS eventually regrets it.
Are there any basic examples you can give around maintaining that integrity?
I'm liking DynamoDB for tasks that fit nicely within a single domain, have relatively pain-free access patterns, etc. And I've found good fits, but there are some places where the eventual consistency model makes me nervous.
I'm specifically thinking about updating multiple different DynamoDB keys that might need to be aggregated for a data object. The valid answer may be "don't do that!" – if so, what should I do?
> Are there any basic examples you can give around maintaining that integrity?
For those types of use cases, the OP’s advice would actually require implementing a fully bespoke concurrency control system in your business logic layer. Without trying to disparage the OP, this is for all intents and purposes, impossible (aside from also being very, very impractical). There’s some things you can do to create additional almost-functional (though still highly impractical) consistency controls for dynamo (like throttling through FIFO queues), but they all end up being worse performance and scaling trade-offs then you’d get from simply using an RDBMS.
A lot of it boils down to the fact that dynamo doesn’t have (and wasn’t designed to have) locking, meaning that pretty much any concurrency control system you want to implement on top of it, is eventually going to run into a brick wall. The best you’d possibly be able to do is a very, very slow and clunky reimplementation of some of Spanner’s design patterns.
Yeah, the limited transaction support is a killer for many use cases.
Thankfully, it’s not actually all that hard to implement your own “dynamo layer” on top of an SQL database and get most of the scaling benefits without giving up real transactions.
The data integrity issue is still a concern, and I talk about that in the book. You need to manage data integrity in your application and think about how to handle updates properly. But it's completely doable and many people have.