I work on a project that does tons of SQL-based data transformations (thousands of them). A lot of it isn't backed by unit tests. Some of our people with more of a C++ background have got the idea that this is crazy: every bit of business logic should be encapsulated in a function, just like in every other civilized language, so that it can be unit tested. They built a framework like this for SQL and implemented quite a few jobs with it.
The problem is most of the SQL-first devs hate it and would rather write untested code, or rely on alternative test automation techniques like bringing up a local DB instance with fake data, than have to refit their business logic into pure functions. The truth is that it's just not a way of working that comes naturally to people who write a lot of SQL for a living.
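To make the disagreement concrete, here is a toy sketch (in Python, with invented names) of the style the C++-background devs are pushing: the business rule lives in one pure, unit-testable function, and the SQL fragment is rendered from the same rule rather than written by hand.

```python
# Hypothetical sketch of the "business logic as a pure function" style.
# Names (discount_tier, spend thresholds) are made up for illustration.

def discount_tier(total_spend: float) -> str:
    """Pure business rule: classify a customer by total spend."""
    if total_spend >= 1000:
        return "gold"
    if total_spend >= 100:
        return "silver"
    return "basic"

def discount_tier_sql(column: str) -> str:
    """Render the same rule as a SQL CASE expression for use in a query."""
    return (
        f"CASE WHEN {column} >= 1000 THEN 'gold' "
        f"WHEN {column} >= 100 THEN 'silver' "
        f"ELSE 'basic' END"
    )

# The pure function is trivially unit-testable, no database required:
assert discount_tier(1500) == "gold"
assert discount_tier(500) == "silver"
assert discount_tier(5) == "basic"
```

The friction the SQL-first devs feel is visible even here: the rule now exists in two places (the function and the renderer), and keeping them in sync is exactly the kind of chore that doesn't come up when you just write the CASE expression directly.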
I can definitely sympathise with the SQL developers here (I have worked extensively with both SQL and C++).
There is a trade-off between unit tests and keeping control and overview of what you are writing.
Also, very often the job is much more about verifying that the input data is clean, and if that becomes more cumbersome, you may have lost the ability to debug the biggest source of errors.
And of course it sometimes hits a nerve when the "real engineers" come in and dictate exactly how you should do things. See also Joel Spolsky's "architecture astronauts".
dbt v1.8 added the ability to unit-test a view or CTAS statement (a natural extension of dbt, since it was designed to assist with moving and renaming the inputs to a view).
I haven't tried it yet but I look forward to doing so. Already dbt macro expansions with tests have provided me way more test coverage than I ever had in the pre-dbt days.
It's a footnote in the article, but it seemed like a natural half-step forward in the direction indicated by the article.
Pure functions in the context of databases are a rarity. A database is meant to persist data while upholding certain consistency, integrity and isolation guarantees. This basically means that most database-using SQL functions can never be pure, because having side effects is their whole purpose. What's more, integrity and isolation are basically untestable, especially with unit testing.
So IMHO the DB people are totally right: to test any SQL, you need database tables with data in them.
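This is the "local DB instance with fake data" approach from the top comment, sketched minimally with Python's built-in sqlite3 (table and column names invented for illustration): the query under test runs against a real engine, not a mocked pure function.

```python
import sqlite3

# An in-memory database seeded with fake data acts as the test fixture.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "  id INTEGER PRIMARY KEY,"
    "  customer TEXT NOT NULL,"
    "  amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10.0), ("alice", 5.0), ("bob", 7.5)],
)

# The SQL under test: per-customer totals.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

assert rows == [("alice", 15.0), ("bob", 7.5)]
```

The same pattern scales to a throwaway Postgres/MySQL instance in CI; the point is that the table and its data are part of the test, because that is what the SQL actually operates on.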
A long time ago I had a similar need. I did not want to churn out SQL directly for hundreds of small filters, so I wrote an expression parser and translator: all the logic lived in simple Python functions, which were then translated into SQL automatically. Both the functions and the translator were heavily tested. That parser also grew a few other "renderers" over time: an evaluator, a documenter, and what not.
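A toy version of that translate-Python-filters-to-SQL idea, using the stdlib `ast` module. This sketch handles only comparisons and and/or over column names and constants; the real thing described above was presumably far more complete.

```python
import ast

# Map Python comparison operators to their SQL spellings.
_OPS = {ast.Gt: ">", ast.GtE: ">=", ast.Lt: "<", ast.LtE: "<=", ast.Eq: "="}

def to_sql(node):
    """Recursively render a restricted Python expression AST as SQL."""
    if isinstance(node, ast.Expression):
        return to_sql(node.body)
    if isinstance(node, ast.BoolOp):
        joiner = " AND " if isinstance(node.op, ast.And) else " OR "
        return "(" + joiner.join(to_sql(v) for v in node.values) + ")"
    if isinstance(node, ast.Compare):
        op = _OPS[type(node.ops[0])]
        return f"{to_sql(node.left)} {op} {to_sql(node.comparators[0])}"
    if isinstance(node, ast.Name):          # bare names become column names
        return node.id
    if isinstance(node, ast.Constant):      # strings get SQL quoting
        return repr(node.value) if isinstance(node.value, str) else str(node.value)
    raise ValueError(f"unsupported node: {node!r}")

def filter_to_sql(expr: str) -> str:
    return to_sql(ast.parse(expr, mode="eval"))

print(filter_to_sql("amount > 100 and status == 'open'"))
# → (amount > 100 AND status = 'open')
```

Because the translator is plain code, both it and the filter functions can be unit tested without a database, which is the whole appeal of the approach.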
A lot of the enterprise software tools that I know of are not tested the way your professional C++ developers would expect, especially when the software comes not from the original vendor but is assembled/configured/enhanced by third-party vendors or consultants.
But of course even such software is "tested", albeit by the end user, over a long period of use, in an unstructured, haphazard, and in most cases unsatisfactory way.
The problem is particularly acute if the industry is not deeply rooted in software development.
SQL relies on the RDBMS's type system, just like C++ devs rely on C++'s OOP.
The problem lies with lax/weak table design that allows sloppy data (for example, all-nullable string fields) instead of strict types (int, float, varchar), foreign key constraints, etc.
You don't need unit tests if your table design is clean, because SQL operators are deterministic and tested by the RDBMS vendor.
You just need integration tests, dry-run tests, and periodic data integrity checks at the system boundaries where data is ingested/egressed.
SQL represents operations on sets of data and their relations, so you don't need a unit test on a single row, but a test suite over your entire dataset (a backup or synthetic copy).
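The "clean table design does the testing for you" point can be illustrated with a small sketch (sqlite3 here, table and constraint names invented): with NOT NULL and CHECK constraints, the engine itself rejects bad data, so a class of would-be unit tests becomes schema declarations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Strict design: no nullable free-for-all, values constrained at the schema level.
conn.execute("""
    CREATE TABLE payments (
        id      INTEGER PRIMARY KEY,
        amount  REAL NOT NULL CHECK (amount > 0),
        status  TEXT NOT NULL CHECK (status IN ('open', 'paid'))
    )
""")

# Valid row goes in fine.
conn.execute("INSERT INTO payments (amount, status) VALUES (10.0, 'open')")

# Invalid row is rejected by the engine, no application-level test needed.
try:
    conn.execute("INSERT INTO payments (amount, status) VALUES (-5.0, 'open')")
except sqlite3.IntegrityError as e:
    print("rejected by the schema:", e)
```

The remaining risk then sits at the boundaries (ingest/egress), which is exactly where the integration tests and periodic integrity checks mentioned above come in.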
> You don't need unit tests if your table design is clean, because SQL operators are deterministic and tested by the RDBMS vendor.
Imagine well structured tables in 3rd normal form.
And then a view which joins 20 tables and creates a report with a myriad of CASE/WHEN expressions.
Many things can go wrong: wrong join conditions, duplicated rows, omitted rows, wrong formulas.
You might say that there should be no such logic in database and I understand the sentiment, but circumstances (e.g. performance) might say it’s needed.
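A two-table sketch of how such a report can silently go wrong even over perfectly clean tables (sqlite3 here, names invented): joining on a non-unique key fans out rows and inflates the aggregate, which is exactly the kind of bug a test over known data would catch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL);
    CREATE TABLE shipments (order_id INTEGER NOT NULL, carrier TEXT NOT NULL);
    INSERT INTO orders VALUES (1, 100.0);
    INSERT INTO shipments VALUES (1, 'ups'), (1, 'fedex');  -- split shipment
""")

# Naive report: one order, but the join duplicates it per shipment row.
(total,) = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o JOIN shipments s ON s.order_id = o.id
""").fetchone()

print(total)  # 200.0, not 100.0 — the fan-out doubled the order amount
```

Both tables here are well-designed; the defect lives entirely in the query, which is why clean schema alone doesn't make report views safe.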
> The truth is that it's just not a way of working that comes naturally to people who write a lot of SQL for a living.
I don't know... Refactoring Databases is pretty old now. Plenty of DBAs can handle this style and even prefer it. I think this is more of a cultural problem in the small than in the large.
You know that SQL is completely pure up to the point where you specify a data alteration command, right?
Your framework is ditching the queries, which are completely pure (as enforced by the engine), and insisting on maybe-pure functions that all the engines have problems with.