SQL cannot be "fixed". It's too broken, too ubiquitous, too customized, too optimized. Will these functors integrate with the thousands of optimizations modern database systems rely on?
This post has the format: "SQL is extremely terribly awful at writing business logic. But if people implemented this one half-baked, unrealistic idea, then it would suck a little bit less!" The first part is spot-on: SQL is a horrible language for writing business logic. So: stop. Stop doing that.
Stop writing business logic in SQL!
Imagine you joined a new team, and they said: our program uses a bunch of XML files as its data format, and that's why most of our business logic is implemented in XPath. Just sit there and ponder that situation for a little bit. This is what you all sound like to me. Stop it!
> Testing the simplest query requires building up a massive object graph, just to test an infinitesimal slice of it.
Yes, correct, because SQL is awful at representing business logic, and foreign key constraints are business logic! The fact that an employee has one supervisor who is also an employee (FK Employee.SupervisorID -> Employee.ID) is a business constraint. (And one that can easily change, by the way.) All these database engineers who can't pull themselves away from their stored procedures will argue with me to their graves about foreign key constraints, but they're generally terrible. They're a business constraint that just happens to be supported by the database engine, but they're still a business constraint. Stop putting your business logic in your persistence layer.
We've been getting better at building programming languages that actually support software engineering every year. Use one of those. SQL is an output of one of your program layers. Just like XPath would be an output of your program. Compose SQL in an actual real programming language that isn't horrible instead of doing any of this.
> Yes, correct, because SQL is awful at representing business logic, and foreign key constraints are business logic!
This strikes me as a unnecessary all-or-nothing argument. I would agree that you shouldn't put all your business logic in the database but that doesn't mean you shouldn't put any business logic in your database. If foreign key constraints are business logic then so are table definitions and column types. It's not awful for those things as it's literally built right in.
My advice is to ensure, as best as possible, that invalid states are impossible to represent. If you can't have an address without being attached to a client, make that impossible to represent in the database. It's one less thing you'll ever have to worry about.
> This strikes me as a unnecessary all-or-nothing argument
I completely understand why it comes across that way. HN Comment length is limited and the current practice is a comfy local optimum, so any one change away from it is going to feel bad, and it's hard to get across my full theory of database design in comment form. I would not recommend most existing teams simply stop using foreign keys, unless they also adapted a lot of other changes that coexist with it. Still, there is a better world for database design, and that world probably does not use foreign key constraints.
> I would agree that you shouldn't put all your business logic in the database but that doesn't mean you shouldn't put any business logic in your database.
I actually would (lightly) argue that does mean you shouldn't put any business logic in your database, as I find that highly coherent design philosophy leads to highly coherent code. The most important question to me is about how the system can handle change over time. Foreign keys are one part of a larger set of "design principles" that everyone seems to take for granted that lead to a system that cannot handle change.
> If foreign key constraints are business logic then so are table definitions and column types.
I disagree on this one. Table definitions and column types are a "level below" business logic, as evidenced by the fact that you could probably refactor a lot of schemas for better/different performance characteristics without changing any of the actual business logic that drive them. The business logic doesn't (shouldn't) care whether you've used a bitfield or a bunch of boolean columns or a child set-inclusion table to represent a bunch of yes/no facts about some entity. They're "persistence logic", and the database is exactly the right place for that.
> My advice is to ensure, as best as possible, that invalid states are impossible to represent.
The problem is that an "invalid state" is highly dependent on your perspective. A state that is invalid to use in one part of your codebase is perfectly valid in other parts. The ubiquity of "draft", "draft final", and "final" shows that humans want to represent "invalid states" all the time! Aren't you glad your IDE lets you represent invalid code? Okay, fine, that data is not ready for the annual submittal, but does that mean I'm really not allowed to save it at all?
And these "invalid states" are extremely volatile as your focus expands around your domain. All contact info must have a primary contact method -- well, unless they're just "contact stubs". All facilities need a facility manager -- well, unless they're 3rd party facilities, or storage facilities, or in Region 9, or start with the letter 'P'.
Haven't you been 300 hours into the development of a system when suddenly someone springs on you, "Oh yeah, Bob has two bosses. Why should that be a problem?" And you're screaming internally, "Of course it's a problem! You assured me six months ago that everyone only has one supervisor, and now you're telling me this one guy has two!? This changes EVERYTHING!"
But why? Why does it change everything? Why does every little change to our database schemas always seem to propagate to the entire system? It's because we're designing databases wrong. And foreign keys are a (small) part of that wrongness.
> The problem is that an "invalid state" is highly dependent on your perspective. A state that is invalid to use in one part of your codebase is perfectly valid in other parts. The ubiquity of "draft", "draft final", and "final" shows that humans want to represent "invalid states" all the time! Aren't you glad your IDE lets you represent invalid code? Okay, fine, that data is not ready for the annual submittal, but does that mean I'm really not allowed to save it at all?
It's perfectly possible to define a schema that can represent draft or incomplete data, syntactically invalid code (as a VARCHAR/TEXT), etc. while still also enforcing that a complete and valid domain object must be available when the record is in a non-draft state.
> Haven't you been 300 hours into the development of a system when suddenly someone springs on you, "Oh yeah, Bob has two bosses. Why should that be a problem?" And you're screaming internally, "Of course it's a problem! You assured me six months ago that everyone only has one supervisor, and now you're telling me this one guy has two!? This changes EVERYTHING!"
Schema updates and migrations should not be this difficult in any reasonably designed system.
> Haven't you been 300 hours into the development of a system when suddenly someone springs on you, "Oh yeah, Bob has two bosses. Why should that be a problem?" And you're screaming internally, "Of course it's a problem! You assured me six months ago that everyone only has one supervisor, and now you're telling me this one guy has two!? This changes EVERYTHING!"
I have had this issue and it does change everything -- all the systems that depend on that data, the calculations involved, the user interface. The change in the database schema is actually relatively minor compared to the rest of the code. But with an enforced schema there's a never a case where it will be wrong -- the code with either work all the way one way or the other but never both at the same time.
Neither of examples provided has anything to do with foreign key relationships. If Bob can have two bosses, that's changing a column into a relation. That's going to have an effect beyond the database. If the facilities manager is optional then that's changing a column to be nullable. Again most the work after that change is outside of the database.
Even if drafts or temporary invalid states exist, I'm never going to want orphaned data. You said table definitions are column types are a "level below" business logic but the table definition is directly tied to what you're trying to store and how you're storing it. I think I would simply argue that foreign key definitions are also part of that "level below". It might be that not all facilities need a facility manager but if one is specified it better exist and you better not be able to remove it without affecting the facility.
I have a bug right now where in some race-condition situation the database is catching the issue with the constraints. I haven't found the bug yet (it's low priority) but at least it's not filling my database with invalid data -- it is just crashing that operation.
> I have a bug right now where in some race-condition situation the database is catching the issue with the constraints. I haven't found the bug yet (it's low priority) but at least it's not filling my database with invalid data -- it is just crashing that operation.
This right here. The thing that all proponents for nosqls and "this one trick
will turn sql hell into sql heaven" don't seem to take into account is that a
properly designed schema will save your ass when the front end(s) have bugs;
and yes, properly designing a schema is work, up-front work that needs to be
thought about before writing the first line of code (or sql).
Thanks for posting this, at least I know I'm not the only one that feels
this way.
For whatever it is worth, I have reached the same conclusion and I have been building systems like you describe for the last few years.
Recently I changed jobs. In the new team they love their ORM with all the foreign keys and direct mapping between business logic data objects and the database schema.
Needless to say, it is a mess. Despite all the automated tooling they have around migrations that "should make it easy" to change things, the reality is that the whole application depends on what the database schemas look like and migrating the existing representation to a new one would break everything.
This has become an elephant in the room that nobody talks about and everyone works around.
What is business logic? What specifically is SQL bad at?
I’m not sold by your foreign key example. Constraints are great. I guess you can do that in python? Good like getting ACID right
As soon as your data stops resembling a tree and starts looking like a graph than all your “business logic” looks a lot like operations on relations and you are back to where you started.
I disagree that we should be happy about moving data out of the storage layer into the backend in cases where it does not make sense performance wise.
The problem is doing some processing in SQL in some cases has a huge performance advantages over moving data to the backend.
If your business logic requires you to take a million rows from table A, apply some rule, and output a million rows into table B -- then piping all that data through a backend is a bad solution performance wise. I reject the idea that all that extra IO (+extra overhead also for the database) is worth it just to use another programming language than SQL.
What we need is better alternatives that execute at the storage layer. Probably starting with proper languages that compile to SQL, like we saw with JavaScript.
Agree not to putting your business logic in stored procs, but hard neg on getting rid of all constraints.
I consider it defense in depth. Sure your application should absolutely ensure order lines aren't referencing a non-existing order, but bugs happen and I prefer the database returning an error to the database getting full of inconsistent data.
> Compose SQL in an actual real programming language that isn't horrible instead of doing any of this.
Got any recommendations? That's a serious question, I've tried many (and built my own) and some are okay but none feel remotely good enough when you consider how much progress we've made basically everywhere else in the wider world of software engineering. Relational databases are so good; they deserve better than SQL!
Shameless plug if you've tried a range of alternatives: I've tried building a version (Trilogy) targeted at analytics, and I'd be curious about how it stacks up to your experience with other attempts at this - can read more about the composition/reuse story here: https://trilogydata.dev/blog/composability_reuse.html
I looked at it when it hit HN a while ago. Looks nice, but not exactly what I want because
> Joins and source tables are automatically resolved in Trilogy. You won't ever explicitly specify one in your query; you're declaring what you want, not how to get it. The responsibility of how to get it is delegated to the semantic model.
I don't fully understand how the semantic model is created (is this documented anywhere?), but I don't think I would enjoy a hard separation between the query layer and the semantic layer regardless.
I would prefer a continuum of "how much indirection do I really want" to be available to the user. My own exploration of this topic is https://docs.racket-lang.org/plisqin/index.html and you can compare section 2 (Using define-schema) to section 7.2 (Plisqin Desugared) if you want to know what I mean about a continuum.
Unrelated, the article we are commenting on has inspired me such that I think I have an answer to the big type system questions that eluded me when I decided to put Plisqin on the shelf. Maybe time to pick it up again... but probably not.
Ah thanks for the response! That makes sense - I gloss over it a bit in the demo intentionally, but that's a miss. The semantic definition flow is pretty much identical to what I think I'm reading with plisqin - there's an inline definition format (similar to DDLs in SQL), with the opportunity to reuse it by saving it and importing it. The separation is between SELECT/DDL - the relationships aren't defined in a select via Joins, they're defined before it in a separate statement then implicitly reused in the select.
Plisqin looks fun, the design rules resonate. If you don't pick it up again today, perhaps another time!
Find a good ORM that allows you to keep things at the "set logic" level for as long as possible and has a good internal query representation. Avoid or turn off any fancier features of the ORM like "navigation properties" or "implicit subtyping" or automatically detecting changes or anything that happens "magically" in the background: prefer simplicity and being explicit about everything. For C#, this is EntityFramework with most of its features turned off -- compose queries by joining IQueryables together, and only drop down into in-memory iteration when you have to.
If that's not available, I'd probably look for a library that presents a structured representation of the SQL abstract syntax tree (probably a subset) as an object, along with a translation to the various dialects. Pass those ASTs around and compose them together. I'd probably slowly build abstractions on top of this that eventually look similar to what I described in the first paragraph.
https://kysely.dev/ (SQL query builder for Typescript) is good, although I'm not sure it lives up to your expectations. Instead of an ORM, build the underlying SQL queries efficiently.
It basically allows to express queries independent of the target language. E.g. it's entirely possible that the same LINQ query can target PG, MSSQL Server, objects in local memory or RavenDB (NoSql). Quality of the generated query depends on the translating library ofc.
Syntax is very nice and much more aligned with programming than SQL.
E.g:
someData.Where(x => x.id =="someId").OrderBy(x => x.LastModified).Take(10).ToListAsync()
Note that the query is not executed until ToListAsync (or other materializing calls) is called. This allows for composabillity.
There is an alternative form of writing it more resembling SQL, but I've never seen it much used.
I don't understand what you are trying to say about foreign keys. Are you advocating moving FKs from database to application?
If yes, you are opening yourself up to race conditions that will only become visible in concurrent environment. I had a distinct misfortune of working with an ERP system based on Oracle which didn't use database-level FKs. Predictably, their customers would occasionally complain about "dangling" data, which, to my knowledge, has never been properly resolved.
It is possible to move foreign keys and other constraints and consistency checks into the application. But that comes at a huge cost: Either you need to be generous with your use of transactions to ensure that no invalid state is ever written, and your application has to know and use those transactions. Which is something application developers in my experience often fail at. Or you have to accept that broken inconsistent data is a fact of life that can happen at any time, and will increase in times of high throughput and concurrency (where the business part of the company will care most about things running smoothly). Also, you will limit yourself to exactly one application using the database, because otherwise you will have to duplicate your constraint logic in any application precisely.
There is also a cost of doing constraints and consistency in SQL, the language is somewhat limited in what you can easily represent (anything is possible, but only some things are easy). And the handling of violations still needs to be implemented in the application, which often necessitates at least some duplication and fiddling to e.g. get friendly and useful error messages.
So imho, yes, SQL has its problems, but it is still the lesser evil in almost all cases.
> Or you have to accept that broken inconsistent data is a fact of life that can happen at any time
You have to accept this anyway. Data can be broken and inconsistent in so many different ways that you cannot protect against every possible way. People will type the wrong thing in sometimes, in ways you never expected. Someone pasting text into the wrong field is just as "broken and inconsistent" as an ID field pointing to a nonexistent entity. How important those errors are to the business depends on how important those columns are, not whether they're an ID vs. text. And as another person pointed out, data always ends up split/replicated among multiple systems anyway.
> Also, you will limit yourself to exactly one application using the database, because otherwise you will have to duplicate your constraint logic in any application precisely.
Layer your architecture? Expose an API? Microservices? There are all sorts of ways around this problem.
Yes, it makes it harder for people to just log into your database and run their own SQL queries. Sometimes that may be politically hard to sell. But I heard this same argument in college when private variables were introduced: wait, but, what if someone wants to directly manipulate private variables of class members? That is not a virtue. It's perfectly reasonable to assert that some internals are simply private implementation details. You are reserving your right to change those details in the future as the application changes. You are separating the details of your persistence layer from the public interface that you're exposing. That is a virtue.
I am not sure what your point is, as you have not yet described it fully. Do you mean to suggest that you just drop the C of ACID? What is there to gain exactly besides the freedom that is like the freedom of a blind man crossing the freeway.
How do you regain the C from the application layer? Do you reject scaling horizontally? Or does the C just interfere with achieving the mellow developer nirvana.
It's inevitable that some data will be split between more than one service / database. I'm not saying we should have data whimsically split up, but foreign key constraint logic must eventually come to live in the app layer too.
> Imagine you joined a new team, and they said: our program uses a bunch of XML files as its data format, and that's why most of our business logic is implemented in XPath. Just sit there and ponder that situation for a little bit.
Been there 20 years ago! It sucks at unimaginable levels!
I agree with most of this, but I don't understand why you'd treat FKs as business logic. IMO, it's clearly a part of the data structure. Like, say if you have a domain that looks like:
type User = {
username: string;
posts: Post[];
}
type Post = {
content: string;
}
(for the DDD-brained, assume User is an aggregate and Post is a value object here)
The natural way to represent this in the database is:
CREATE TABLE users (
username text PRIMARY KEY
);
CREATE TABLE posts (
author text NOT NULL REFERENCES users (username),
content text NOT NULL
);
I just don't see why you'd do it in any other way.
Hey dicytea, thanks for getting that blogging app up. Our users love it. Hey, I was talking to John, and he said he's got some guest columns he wants you to throw up there. Just some blog posts from thinkers he likes that he thinks would be a good fit for our site. Can you throw up some of those?
What do you mean they need to be users of the site? They're just some people whose posts we want to feature. No, they shouldn't need to log in. What are you talking about? What does that have to do with it?
</BossVoice>
Oops, turns out the rule "all blog post authors are users of the site" was actually just a volatile business rule after all.
Yes, and isn't it wonderful that you get an error message when you try to change a business rule, forcing you to properly encode the new rule instead?
A lot of these types of scenarios are missing the fact that, without these enforcements in the database, sooner or later a developer is going to make a change that violates an existing business rule without realising that they just broke a rule!
Rules change. We know this. What is valuable is being told that some new rule conflicts with an existing rule.
If you don't enforce the business rules in the database, how do you know when a new rule conflicts with some existing rule?
I think the fact that these functions/functors/macros already exist in a few different varieties in mainstream database systems presents a challenge to your argument.
SQL can be fixed, has been fixed, and is a underutilized tool in my personal opinion.
I think you’re throwing the baby out with the bath water here a bit.
I agree that stored-price are rarely, if ever the solution.
I do however think that having something that enforces certain constraints across all your data, and forces you to handle schema evolutions reasonably over time is really powerful. Database implementations are (mostly) full of hard-fought correctness decisions, and I don’t think most devs are in a position to “whip out” an implementation that cares for a number of these on-demand.
We had the whole NoSQL thing, we saw what when we threw most of this stuff away.
Is sql good enough? For many problems yeah. Is it _really_ good enough? Nah probably not, we could do a lot better, but I think a better design looks more like a better sql-like language and less like a full programming language.
I don't know why you are getting downvoted, and I generally don't like a lot of business logic in the persistence layer, but representing the things you absolutely dont want another part of your app doing via FKs is...pretty normal.
If you are arguing it spreads out your business logic into multiple places, that's fair, but so does... a lot of things. Databases with decent fks and constraints have a tendency to outlive the apps that build atop them, and communicating these constraints in long lived fashions can be ... good actually.
> representing the things you absolutely dont want another part of your app doing via FKs is...pretty normal
"Normal" has never been compelling in a world where most software sucks and most projects overrun their budgets.
> If you are arguing it spreads out your business logic into multiple places, that's fair, but so does... a lot of things
Stop doing those things too!
> Databases with decent fks and constraints have a tendency to outlive the apps that build atop them
This is oft repeated, but I think there are some flaws with it:
1. I'd ask the question: is that because those apps sucked? It's possible to have a mediocre database and then an even worse app on top of it; in that case it wouldn't be surprising if the app itself died first.
2. Did the next app use the existing schema as-is, or did it adapt/migrate the schema to a different form that worked for that next app? There's a big difference between saying "this data turned out to be useful for other purposes" (of course!) and "this data representation turned out to be so perfect and elegant that we kept using it"
3. Do you have an overly narrow definition of "apps" here? I'd argue if six teams are all doing their own things with the database -- some writing code, some building views, some hooking into it with Excel and PowerBI -- then I'm not overly interested in the fate of the "OG app" that the database was built for. If that one dies, you still have five other "apps" using that database. We're not talking about a schema sticking around because it's so useful, we're talking about an entrenched, locked-in tech that nobody can change because it's too costly. That's a predictable symptom of a bad representation! That's exactly what happens with bad code! Good systems can handle change in isolated parts without breaking things for everybody else. Database schemas are so entangled with themselves, and SQL is so bereft of proper abstraction, that they can't change easily once they have many users.
1. no, its because they were so useful they expanded beyond their original containers.
2. The cluster of apps that then lived on the top of the schema (the four times I have worked on versions of this) mostly just extended things occasionally when they needed to, but oddly mostly did not and just read and wrote to the same schema; this required more coordination and slower development release cycles, but not really a problem for these companies.
3. Somewhat fair, but the real world and real shippable software (which you seem interested in) has a lot of shared concerns, a lot of trouble when you ETL or use APIs for everything instead of just... here's the data. Short term that seems entirely reasonable instead of creating contracts everywhere and spending the time and effort understanding more and more of the changes introduced into smartly decoupled systems.
In a nutshell you're right, SQL makes it easy to mix a bunch of concerns and ship them and let them grow organically. Which I would argue almost any software structure does, ESPECIALLY with data, except now you dont even get to know where all the bodies are buried.
Edit: for instance, I have worked on/converted many useful databases that were created in the 1970s, before I was born. Can you imagine a stateful application that you could easily understand and utilize all the data that's been ingested on for 50+ years?
> I have worked on/converted many useful databases that were created in the 1970s, before I was born
"Converted" makes it sound like you found a database with lots of useful data and then developed a different schema for it that was useful for your own needs. This is kinda my point: when people say the database outlives its application, they mean the data itself. That's somewhat irrelevant to the discussion of how to properly architect database schemas.
Did you find relational database schemas from the 1970s that made you go "ahh, this is perfect for what I need!"?
> Can you imagine a stateful application that you could easily understand and utilize all the data that's been ingested on for 50+ years?
I don't think I understand this question. Yes, I can certainly imagine an application that remains understandable, relevant, and maintainable for 50+ years. I hate the current expectation that apps and code have shelf-lives in the single-digit years. I hate the term "legacy code". These are immature attitudes from the "move fast and break things" eternal startup crowd whose only goal is a profitable exit. That is not "engineering", it's apathetic planned obsolescence, and everyone who participates in planned obsolescence should be ashamed of themselves. Code that could reasonably last 50+ years should be your default if you call yourself a "software engineer". Otherwise your title should be "YOLO coder-bro".
But isn't that the opposite of letting every random team have unfettered access to the internal private details of a database and entrenching every little design decision under a mountain of unmaintainable SQL? "Growing organically" sounds great until you realize you're just describing cancer. It took life trillions of tries before "growing organically" ended up with something half-passing as kinda-elegant if you squint. Do you want bridge engineers to let the design of a suspension bridge "grow organically"? Will that bridge last 50+ years?
You just can’t scale SQL code. If you have a small app, then sure. But then I don’t know why you’d invest so much in testing.
But if you need to scale your code to more than 3 people, SQL should only be your persistent state store. You probably already have a BL layer in another, more manageable programming language, so use that.
In my team I set a rule to remove all SQL functions and views, and move all the logic to our typescript backend. That resulted in code which is much easier to test, and is version controlled.
The DB is rarely our bottleneck.
Our reads are very simple, mostly paged sets from a table with a 1 level join.
When we need complex abstractions or transformations, instead of using function or views, we listen to table changes, and with a TS transformation layer we save a new row in a specific table for the abstraction. That way the BL is handled in TS, and performed in writes asynchronously instead of in-line in reads. The only downside is that the transformed data is not real time.
That mechanism allowed us to scale our data very easily without any performance hits.
These days? It doesn't really matter. For most purposes, you can just hold the entire important dataset in RAM.
For example, a humongous company with 500000 employees (Amazon) can trivially hold on a fairly low-end server the database of all the personnel, and all their salary payments, and all their employment history, badge photos, and office access logs.
If you can do processing outside of the database you don't have a lot, that's basically the definition of a lot of data. Don't be sorry, try to not have more data.
That's an odd rule of thumb. You going to do some processing outside of the DB, e.g. transformations, views. Then you have the event streaming and aggregations world. Then different type of DB. Distributed DBs where you could do processing on each node in code for example.
I think we agree. Transformations and views bring code to data (if you are willing to consider them code, which I am). Then when you say 'Distributed DBs where you could do processing on each node in code for example' it also sounds like exactly what I have in mind - code brought to data, not data to code.
This post has the format: "SQL is extremely terribly awful at writing business logic. But if people implemented this one half-baked, unrealistic idea, then it would suck a little bit less!" The first part is spot-on: SQL is a horrible language for writing business logic. So: stop. Stop doing that.
Stop writing business logic in SQL!
Imagine you joined a new team, and they said: our program uses a bunch of XML files as its data format, and that's why most of our business logic is implemented in XPath. Just sit there and ponder that situation for a little bit. This is what you all sound like to me. Stop it!
> Testing the simplest query requires building up a massive object graph, just to test an infinitesimal slice of it.
Yes, correct, because SQL is awful at representing business logic, and foreign key constraints are business logic! The fact that an employee has one supervisor who is also an employee (FK Employee.SupervisorID -> Employee.ID) is a business constraint. (And one that can easily change, by the way.) All these database engineers who can't pull themselves away from their stored procedures will argue with me to their graves about foreign key constraints, but they're generally terrible. They're a business constraint that just happens to be supported by the database engine, but they're still a business constraint. Stop putting your business logic in your persistence layer.
We've been getting better at building programming languages that actually support software engineering every year. Use one of those. SQL is an output of one of your program layers. Just like XPath would be an output of your program. Compose SQL in an actual real programming language that isn't horrible instead of doing any of this.