> Many inputs to query construction can't be parameterized.
I do not allow the client direct access to the database. They access the database through objects, which have methods that only permit parameterized input.
The client cannot specify the database, table, or columns. That is hard-coded. They can only specify values. Values that are always entered through a prepared statement.
No. Of course not. In all likelihood your SQL RDBMS will only allow you to bind data. Do you paginate? What about "LIMIT" and "OFFSET"? Do you sort tables? Then you have to handle column names. Do you sort in both directions? Then you have to handle "ASC" and "DESC". And so on.
Obviously, you can write simple code to do each of these safely. But then, you can write simple code to handle query parameters too.
By all means, use parameterized queries. Just don't assume they're a magic totem against SQLI.
You can and should bind LIMIT and OFFSET with parameters on many if not most drivers (some can't handle it but most that I work with these days do, though I'm in Python, perhaps JDBC or other drivers are a different story). ASC and DESC are SQL keywords so assuming you're not a terrible programmer (which I know is what you're addressing) you wouldn't have those rendered from user input any more than the WHERE or ORDER BY keywords.
The lexical structure of the statement and the parameters themselves are two different things. You are correct that "parameterization can't save you from SQL injection", if you're such a bad programmer that you're feeding user input directly to produce lexical tokens. One example is a recent blog post talking about how to escape table names for apps that feed things like "customer_name" to serve as the table name (totally insane). Your example of feeding in "sort=xyz" to produce the column name in the ORDER BY is also a pretty awful practice - I haven't seen that one in like a decade, but sure.
So of course parameterization doesn't magically protect a system against all forms of attack - the programmer could be feeding user input directly into shell commands too. The recommendation for parameterization is addressing the bulk of the issue at least among the code that I regularly work with, maybe you deal with crappier programmers than I do on a regular basis.
> Your example of feeding in "sort=xyz" to produce the column name in the ORDER BY is also a pretty awful practice - I haven't seen that one in like a decade, but sure.
This still happens every day. Google how to do dynamic sorting and you'll find a hundred examples of it in PHP, and as unfortunate as it is, that's how a significant portion of code is authored.
You're missing his point somewhat. I don't know if this applies universally to SQL, but consider MySQL.
$stmt = $db->prepare('SELECT bar FROM foo
LIMIT :offset, :count');
...
You can't do that. You can only bind data to fields.
$stmt = $db->prepare('SELECT bar FROM foo
ORDER BY :column :direction');
$stmt->bindValue(':column', 'foo');
$stmt->bindValue(':direction', 'DESC');
You can't do that either, because again, it's not data.
So, you end up having to do something similar to this (for the love of god, don't actually do this):
$stmt->prepare("SELECT bar FROM foo WHERE id = :id
ORDER BY {$column} {$direction}
LIMIT {$offset}, {$count}");
Bound parameters won't save you, and the potential attack vector is there if you're not really careful. That example shows the most stupid thing you could ever do, so don't do it.
Other than probably historical reasons and interface definitions, is there any reason why this is the case? I can't immediately think of any sensible argument against allowing the binding of a value into a limit/order field. In fact, it should be type-checking/converting it as well, to prevent exactly mistakes like this.
Even if the DB interface doesn't directly support this, it seems like it's something all client wrappers should handle, preferably at the low level in addition to having to pull in a whole ORM + assorted bacon for the purpose.
That would not have been a "real" prepared statement in my mindset so I get why I was confused.
I don't feel super smart, but wouldn't I simply make sure that $column is a valid column (ok, that one might need extra attention) and $direction would be either ASC or DESC and that $offset and $count are integers?
Yes it is impossible. For example the following query "SELECT person_name FROM persons WHERE person_id = ?" Will be parsed and put in the query cache. Then when it is run (multiple times) it will bind the value in the correct place, and if there is anything else than a numeric value when it is trying to bind, the query will fail to execute.
Yeah, that's kinda the whole point. You can put an incorrect value in there and it'll fail, but you can't change the query itself. (i.e. Bobby tables would be an incorrect value, not a dropped students table.)
Rephrasing the question, pessimistically (ideally as kindly as possible, but my intent could easily be lost):
I've implemented a [complex?] white-listing solution to a problem in a domain where there is a long history of developer mistakes, passing user input to a 3rd-party technology most developers only know the minimum required for their job. Isn't injection impossible?
Edit: Sorry for any confusion; this was my way of saying 'this is only bulletproof if you got every edge case 100% right; few security experts make guarantees of impossibility of failure'.
For the record, it's my understanding that white-listing is normally better than black-listing. I didn't mean that white-listing wasn't the best option, just that it sounded like it was being used.
"Many inputs to query construction can't be parameterized."
Such as? If you only use PreparedStatment (Java), or the equivalent for your language, and have a rule against, and never broken, to concatenate SQL strings together, you are safe from SQL injection.
Without dynamic sql, it is tough to do arbitrary (user-selected) columns, sorting, and server-side paging simultaneously (limiting the result set in the query instead of skipping through results in the app). SQL does not allow use of parameters in all the necessary locations in the query (result column list, order by list, etc.), and some databases are more limited than others.
User selected columns is easy, just select all required columns each time and display as needed, the performance impact of this is minimal. The fact that the query need not be reparsed is probably a bigger benefit performance wise.
Paging is equally easy, using ROWNUM < X in Oracle or equivalent in other RDBMS.
The order by is a good point. The obvious solution here would be to always have the same sorting as default, and do the user sorting client side (by client here, I mean calling application).
Yes, the requirement of paging server-side (when a result set is ridiculously massive or the client/network is so overloaded that alternatives are impractical) is the big one: page X of Y changes depending on the sort, so sorting has to be done server-side too.
Do you agree that this is a scenario requiring dynamic SQL?
It doesn't hurt to prepare a statement, execute it once and then deallocate it. Intersect the user-selected columns with a whitelist of possible columns, use a simple flag for sorting and map it to ASC/DESC. Prepend the primary key, bind parameters, execute, harvest the results, deallocate temporary resources.
This is somewhat less efficient, but if the alternative is to be open to sqli attacks it'd be my pleasure.
You could do without using a CTE, it's just not as convenient most of the time. Alternatively, you could pass the column names to a function (using a prepared statement, no less) and let that function figure out which columns are valid. If your tables change frequently, this would save more time in the long run.
We are talking about manual, purpose-built functionality designed to prevent SQLI in queries that use bound parameters. Don't move the goalposts: nobody is saying it's particularly hard to avoid SQL injection, just that bound parameters aren't a panacea.
If your tables has SO many columns that you can't simply return them all and show the relevant ones only at the application layer, .... well then just maybe there's something wrong with your database design.
I agree this is the best way; it's still dynamic sql (the database is not paramaterizing the columns, the application is). I think you accurately summarized best practices, but it always comes down to implementation and people make mistakes.