I work on a project that does tons of SQL-based data transformations (thousands of them). A lot of it isn't backed by unit tests. Some of our people with more of a C++ background have got the idea that this is crazy: every bit of business logic should be encapsulated in a function, just like in every other civilized language, so that it can be unit tested. They built a framework like this for SQL and implemented quite a few jobs with it.
The problem is most of the SQL-first devs hate it and would rather write untested code, or rely on alternative test automation techniques like bringing up a local DB instance with fake data, than have to refit their business logic into pure functions. The truth is that it's just not a way of working that comes naturally to people who write a lot of SQL for a living.
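To make the disagreement concrete, here is a toy sketch (in Python, with invented names) of the style the C++-background devs are pushing: the business rule lives in one pure, unit-testable function, and the SQL fragment is rendered from the same rule rather than written by hand.

```python
# Hypothetical sketch of the "business logic as a pure function" style.
# Names (discount_tier, spend thresholds) are made up for illustration.

def discount_tier(total_spend: float) -> str:
    """Pure business rule: classify a customer by total spend."""
    if total_spend >= 1000:
        return "gold"
    if total_spend >= 100:
        return "silver"
    return "basic"

def discount_tier_sql(column: str) -> str:
    """Render the same rule as a SQL CASE expression for use in a query."""
    return (
        f"CASE WHEN {column} >= 1000 THEN 'gold' "
        f"WHEN {column} >= 100 THEN 'silver' "
        f"ELSE 'basic' END"
    )

# The pure function is trivially unit-testable, no database required:
assert discount_tier(1500) == "gold"
assert discount_tier(500) == "silver"
assert discount_tier(5) == "basic"
```

The friction the SQL-first devs feel is visible even here: the rule now exists in two places (the function and the renderer), and keeping them in sync is exactly the kind of chore that doesn't come up when you just write the CASE expression directly.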
I can definitely sympathise with the SQL developers here (I have worked extensively with both SQL and C++).
There is a trade-off between unit tests and keeping control and overview of what you are writing.
Also, very often the job is much more about verifying that the input data is clean, and if that becomes more cumbersome, you may have lost the ability to debug the biggest source of errors.
And of course it sometimes hits a nerve when the "real engineers" come in and dictate exactly how you should do things. See also Joel Spolsky's "architecture astronauts".
dbt v1.8 added the ability to unit-test a view or CTAS statement (a natural extension of dbt, since it was designed to assist with moving and renaming the inputs to a view).
I haven't tried it yet but I look forward to doing so. Already dbt macro expansions with tests have provided me way more test coverage than I ever had in the pre-dbt days.
It's a footnote in the article, but it seemed like a natural half-step forward in the direction indicated by the article.
Pure functions in the context of databases are a rarity. A database is meant to persist data while upholding certain consistency, integrity and isolation guarantees. This basically means that most database-using SQL functions can never be pure, because having side effects is their whole purpose. What's more, integrity and isolation are basically untestable, especially with unit testing.
So IMHO the DB people are totally right: to test any SQL, you need database tables with data in them.
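This is the "local DB instance with fake data" approach from the top comment, sketched minimally with Python's built-in sqlite3 (table and column names invented for illustration): the query under test runs against a real engine, not a mocked pure function.

```python
import sqlite3

# An in-memory database seeded with fake data acts as the test fixture.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "  id INTEGER PRIMARY KEY,"
    "  customer TEXT NOT NULL,"
    "  amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10.0), ("alice", 5.0), ("bob", 7.5)],
)

# The SQL under test: per-customer totals.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

assert rows == [("alice", 15.0), ("bob", 7.5)]
```

The same pattern scales to a throwaway Postgres/MySQL instance in CI; the point is that the table and its data are part of the test, because that is what the SQL actually operates on.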
A long time ago I had a similar need. I did not want to churn out SQL directly for hundreds of small filters, so I wrote an expression parser and translator: all the logic lived in simple Python functions, which were then translated into SQL automatically. Both the functions and the translator were heavily tested. That parser also grew a few other "renderers" over time: an evaluator, a documenter, and what not.
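A toy version of that translate-Python-filters-to-SQL idea, using the stdlib `ast` module. This sketch handles only comparisons and and/or over column names and constants; the real thing described above was presumably far more complete.

```python
import ast

# Map Python comparison operators to their SQL spellings.
_OPS = {ast.Gt: ">", ast.GtE: ">=", ast.Lt: "<", ast.LtE: "<=", ast.Eq: "="}

def to_sql(node):
    """Recursively render a restricted Python expression AST as SQL."""
    if isinstance(node, ast.Expression):
        return to_sql(node.body)
    if isinstance(node, ast.BoolOp):
        joiner = " AND " if isinstance(node.op, ast.And) else " OR "
        return "(" + joiner.join(to_sql(v) for v in node.values) + ")"
    if isinstance(node, ast.Compare):
        op = _OPS[type(node.ops[0])]
        return f"{to_sql(node.left)} {op} {to_sql(node.comparators[0])}"
    if isinstance(node, ast.Name):          # bare names become column names
        return node.id
    if isinstance(node, ast.Constant):      # strings get SQL quoting
        return repr(node.value) if isinstance(node.value, str) else str(node.value)
    raise ValueError(f"unsupported node: {node!r}")

def filter_to_sql(expr: str) -> str:
    return to_sql(ast.parse(expr, mode="eval"))

print(filter_to_sql("amount > 100 and status == 'open'"))
# → (amount > 100 AND status = 'open')
```

Because the translator is plain code, both it and the filter functions can be unit tested without a database, which is the whole appeal of the approach.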
A lot of the enterprise software tools that I know of are not tested the way your professional C++ developers would expect, especially when the software comes not from the original vendor but is assembled/configured/enhanced by third-party vendors or consultants.
But of course even such software is "tested", albeit by the end user, over a long period of use, in an unstructured, haphazard, and in most cases unsatisfactory way.
The problem is particularly acute if the industry is not deeply rooted in software development.
SQL relies on the RDBMS's type system, just like C++ devs rely on C++'s OOP.
The problem lies with lax/weak table design that allows sloppy data (for example, all-nullable string fields) instead of strict types (int, float, varchar), foreign key constraints, etc.
You don't need unit tests if your table design is clean, because SQL operators are deterministic and tested by the RDBMS vendor.
You just need integration tests, dry-run tests, and periodic data integrity checks at the system boundaries where data is ingested/egressed.
SQL represents operations on sets of data and their relations, so you don't need a unit test on a single row, but a test suite over your entire dataset (a backup or synthetic copy).
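The "clean table design does the testing for you" point can be illustrated with a small sketch (sqlite3 here, table and constraint names invented): with NOT NULL and CHECK constraints, the engine itself rejects bad data, so a class of would-be unit tests becomes schema declarations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Strict design: no nullable free-for-all, values constrained at the schema level.
conn.execute("""
    CREATE TABLE payments (
        id      INTEGER PRIMARY KEY,
        amount  REAL NOT NULL CHECK (amount > 0),
        status  TEXT NOT NULL CHECK (status IN ('open', 'paid'))
    )
""")

# Valid row goes in fine.
conn.execute("INSERT INTO payments (amount, status) VALUES (10.0, 'open')")

# Invalid row is rejected by the engine, no application-level test needed.
try:
    conn.execute("INSERT INTO payments (amount, status) VALUES (-5.0, 'open')")
except sqlite3.IntegrityError as e:
    print("rejected by the schema:", e)
```

The remaining risk then sits at the boundaries (ingest/egress), which is exactly where the integration tests and periodic integrity checks mentioned above come in.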
> You don't need unit tests if your table design is clean, because SQL operators are deterministic and tested by the RDBMS vendor.
Imagine well structured tables in 3rd normal form.
And then a view which joins 20 tables and creates a report with a myriad of CASE/WHEN expressions.
Many things can go wrong: wrong join conditions, duplicated rows, omitted rows, wrong formulas.
You might say that there should be no such logic in database and I understand the sentiment, but circumstances (e.g. performance) might say it’s needed.
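A two-table sketch of how such a report can silently go wrong even over perfectly clean tables (sqlite3 here, names invented): joining on a non-unique key fans out rows and inflates the aggregate, which is exactly the kind of bug a test over known data would catch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL);
    CREATE TABLE shipments (order_id INTEGER NOT NULL, carrier TEXT NOT NULL);
    INSERT INTO orders VALUES (1, 100.0);
    INSERT INTO shipments VALUES (1, 'ups'), (1, 'fedex');  -- split shipment
""")

# Naive report: one order, but the join duplicates it per shipment row.
(total,) = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o JOIN shipments s ON s.order_id = o.id
""").fetchone()

print(total)  # 200.0, not 100.0 — the fan-out doubled the order amount
```

Both tables here are well-designed; the defect lives entirely in the query, which is why clean schema alone doesn't make report views safe.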
> The truth is that it's just not a way of working that comes naturally to people who write a lot of SQL for a living.
I don't know... Refactoring Databases is pretty old now. Plenty of DBAs can handle this style and even prefer it. I think this is more of a cultural problem in the small than in the large.
You know that SQL is completely pure up to the point where you specify a data alteration command, right?
Your framework is ditching the queries, which are completely pure (as enforced by the engine), and insisting on maybe-pure functions that all the engines have problems with.