15

I know that as a general rule, you shouldn't construct SQL queries dynamically because of the possibility of SQL injection.

However, it could come in quite handy to break this rule and define for example my SELECT statements like so and use it for all my tables:

public static string SelectAll<T>() => $"SELECT * FROM {typeof(T)?.Name}s";

In this case, there is guaranteed to be no user input. Also, we're not talking about a public library or similar, but a closed app.

Would it therefore be acceptable to write code like the above or should one always act according to the rule that "you never know who or what will end up using this code and how" and under no circumstance construct SQL queries dynamically?

19
  • 12
    The specific example in your question is harmless. The trouble is (and @DocBrown points this out), people without sufficient experience tend to generalize in the wrong areas. A junior dev might look at that line of code and make the jump to "SELECT * from X where A = " + unsanitizedUserInput is ok (and it is very, very, very not OK). Commented Dec 19, 2022 at 13:01
  • 39
    You're just reinventing the idea of an ORM. Why not use one of the many existing ORMs that have already been built, tested, security validated?
    – Alexander
    Commented Dec 19, 2022 at 16:06
  • 3
    I don't know C#, but it seems like you still forgot to escape typeof(T)?.Name (with the rules for sql identifiers, not the rules for sql literals)
    – Bergi
    Commented Dec 20, 2022 at 8:09
  • 4
    @JaredSmith Agreed. Unfortunately, often ORMs are sold by using just that as a major appeal ("If we use this framework, we won't need a database expert!/to learn SQL!'). ORMs have their place, but the place isn't the same place you go when you need to cut corners. They're power tools, and as any power tool, they can wreck havoc in the wrong hands.
    – T. Sar
    Commented Dec 20, 2022 at 14:51
  • 6
    @T.Sar The irony is that you then need an expert in the ORM and potentially its own SQL-like query language. Then there's also all the wasted hours (really: days or weeks) trying to get them to do what you want in a reasonably performant way. That's my experience, at least.
    – JimmyJames
    Commented Dec 20, 2022 at 16:53

9 Answers 9

68

On one hand, I don't like such braindead rules, since they are clearly an oversimplification. A better wording would be IMHO

You shouldn't construct SQL queries dynamically using unsanitized input data from some source you don't trust to be or stay reliable.

On the other hand, such code bears a certain risk of becoming a bad example for junior developers in your team which don't treat it so carefully as you. Hence if you see a slight chance that this code will become an example for other code which could turn into a security risk, it is probably better to invest the extra effort for creating a parametrized query (when possible).

17
  • 1
    Not all DBMS support parameter for table name. Commented Dec 20, 2022 at 10:25
  • 12
    I misread this answer the first time and now I really want a Code Bear plush.
    – T. Sar
    Commented Dec 20, 2022 at 11:38
  • 1
    "Unsanitized" isn't quite right, because that's isn't precisely what parameterized queries do. That is, parameterized queries don't escape the input data to make it safe. Rather, they keep the input data separate from query data completely. "Unisolated" or "unquarantined" is probably a better way to phrase it. Commented Dec 21, 2022 at 21:34
  • 1
    @JoelCoehoorn: I spoke of unsanitized data especially in the context of dynamic queries, not parametrized queries. When you need to create a dynamic query and cannot use a parametrized one, you have to sanitize the data, not to "isolate" it.
    – Doc Brown
    Commented Dec 21, 2022 at 21:36
  • 1
    @BeniCherniavsky-Paskin: my answer isn't intended to explain all the gory details of correct, incorrect, safe, and unsafe dynamic SQL. For this, we have Stackoverflow and Codereview.SE.
    – Doc Brown
    Commented Dec 22, 2022 at 6:36
20

There's a whole tonne load of things that can go wrong with dynamically constructed queries. Hence the rule.

  • sql injection attacks
  • non-performant queries
  • non cached queries
  • server specific settings, character set, time formats, version
  • SQL brand, MySQL, MSSQL, Oracle etc
  • edge cases, what about views?
  • all your code now has to adhere to the classname + s = tablename. even Person and Sheep

Now, obviously if you use a ORM, say entity framework, it may well be using reflection to dynamically create sql queries. So its not like it's impossible to do. All of these problems can be solved and you can make it work.

So really the reason for the "Don't dynamically generate SQL" rules is not any of these problems, the reason is what I call the "Tip of the Iceberg" problem.

Lots of people think that doing this will save them work, when in reality dealing with all the problems and edge cases means that it is more work in the long run. You have made a thing which deals with the problem you can see, but as you progress you (or more likely your team) will encounter more and more of the iceberg.

4
  • 1
    Another unfortunate caveat is SQL non-portability. You can't generally build SQL string in one portion of code and send it to a DB in another place. In ideal world that should be possible — SQL is a query language, literals are the bedrock of query languages, and ANSI SQL standardized things like identifier and string literal syntax decades ago. Alas 😭. DBs have incompatible extensions to literal syntax, and what's worse options like MySQL's NO_BACKSLASH_ESCAPES may be a dynamic property of DB connection (SET SESSION sql_mode, with admin-controlled defaults to make it worse). Commented Dec 22, 2022 at 0:17
  • 100% i will add some extras
    – Ewan
    Commented Dec 22, 2022 at 10:30
  • What is it about views that creates an edge case in this context?
    – JimmyJames
    Commented Dec 22, 2022 at 17:21
  • I guess its more that you only have one target per type, presumably you have a table with the data but you might have a number of views of the data with different names
    – Ewan
    Commented Dec 22, 2022 at 17:56
11

It can be acceptable to concatenate SQL, but only under a very narrow and specific set of circumstances. Before reading this answer, you should read Doc Brown's and Ewan's answers for a good warning.

  • Don't concatenate variables/parameters.
    • Use parameterized queries/prepared statements instead.
  • You genuinely do not know what the SQL query is ahead of time.
  • Provide an obvious means for using prepared statement.
  • Pay very close attention to any SQL that is hard coded in a different programming language.

The only use cases where I've used dynamic queries has been for search forms. Since the query depends on which criteria the user chooses, it becomes impossible to know that query ahead of time. One way or another, you will need to build up your query by concatenating SQL. The criteria the user chooses can change the WHERE clause and determines which JOINs you need to perform.

But use prepared statements for user input!

And code-review the heck out of that code.


To focus on your specific example, the table name is parameterized. There isn't much benefit to this, and as Doc Brown warns, this could set a bad example for less-experienced developers. Ewan notes some additional disadvantages when it comes to query performance, which I don't think applies to the small code snippet in your question, but will apply if you concat a parameter.

4
  • 6
    This. I used concatenation in the past due to a variable filter: creating all possible variations of the filters would have been possible, but it certainly wouldn't have been clear nor maintainable. On the other hand, the query was still parameterized. The only thing that was dynamic was the query shape, and all the portions to create that shape were hardcoded static strings. Commented Dec 20, 2022 at 7:56
  • @MatthieuM. seconded. I'm no SQL expert, but doing almost the same thing in my code - it's a batch insert with an unknown (but limited) batch size. So I'm just lazily generating the SQL every time a new batch size is encountered.
    – jaskij
    Commented Dec 22, 2022 at 16:40
  • @jaskij, and to be fair, object-relational mappers do this behind the scenes, too. It's all SQL across the network, and the ORM builds SQL statements based on mappings and configuration. It's all dynamic, but it does not concat user input with SQL. It uses prepared statements. Commented Dec 22, 2022 at 16:54
  • 1
    Concatenating user input with SQL is how you get Little Bobby Tables. Or that guy who got many, many, speeding tickets because his vanity plates say NULL.
    – jaskij
    Commented Dec 22, 2022 at 17:05
4

or should one always act according to the rule that "you never know who or what will end up using this code and how"

This leads to paranoia. Like security.

You can go mad and protect everything from possible (but improbable) threats and spend an insane amount of resources on it... And yet, you won't be able (ever) to protect the code from human stupidity or ignorance.

Doesn't matter how much effort you put into this, it will take me a day to pervert or bypass it if I want. I can convince my PM that, against all claims, this time was necessary for us to meet the deadlines. It's "temporary" and we'll get (cough) rid of later (cough).1

So no, appealing the common sense won't work.

However, it could come in quite handy to break this rule and define for example my SELECT statements like so and use it for all my tables:

If you look for ways to ignore and bypass best practices you will find many. These are solutions that fall into the group I call happy ideas. Everyone has one and everyone can make them work. Sometimes are good, sometimes are ... well, let's just stop here.

But, just because something is possible doesn't mean you have to do it. As I said, it will take only a couple of minutes to copy SelectAll, make the template more complex and insecure and elaborate on my reasons. At some point, I could succeed at doing this and I will leave an open window to doubtful but accepted practices.

Would it, therefore, be acceptable to write code like the above

It depends on whether you can afford the trade-offs, or in this case, the risks. Risks, like security, are measured and managed proportionally to the threat they represent. How probable they are, how often and for how long we are exposed to them, etc. So, you are asking the wrong audience.

If you take measures to mitigate the risks, then it's "acceptable". But don't lie to yourself. Like @Ewan suggested, things can shift quickly and lead you to unexpected situations if you are not cautious about measuring the risks.

If you don't have enough experience on this subject, I would suggest implementing a well-known solution that allows you to do it in the safest way possible and is future-proof.


1: You can't imagine how fast projects embrace the normalization of deviance. Add then the fact that code, as evolves, is prone to get worse, not to get better.

2

I know that as a general rule, you shouldn't construct SQL queries dynamically because of the possibility of SQL injection.

This is wrong.

There is the possibility of SQL injection and other trouble if you dynamically build SQL without knowing what you're doing.

That is why several languages and frameworks include Query Builders that allow you to do so safely.

For your specific statement - can you guarantee that it will work for all possible values? Especially with the type abstraction the answer is clearly "no", because I can easily declare my own type and then call your class and now I'm selecting from a table that doesn't exist or, if I'm clever and your DB schema isn't, from a system table. With MySQL it would even be possible to select from a table in a different database.

2

As this is being downvoted, please do kindly note that the question is "Can XYZ..." and I am answering "Yes, it can". Observe the fine difference to "Yes, you must always do it this way" and how the examples are specific enough to not be misconstrued as a general advise to always do it this way in random, every-day application code.

Can it be acceptable to construct (concatenate) SQL queries dynamically?

Yes, while any dynamic query generation needs to do appropriate escaping (of SQL identifiers and value literals), it can be perfectly acceptable in specific cases to concatenate SQL queries dynamically, and in some cases it can be much better than an alternative.

Some examples:

  • If you wish to implement an ORM library, you by definition will most certainly be concatenating (generating) SQL statements all the time within the library itself.
  • If you are using an ORM library, you may never see any SQL yourself (if all goes well) but under the hood the ORM library of your choice does the concatenation for you all the time.
  • If you wish to implement some kind of report generator where the user has great flexibility to set up rules, filters, joins and so on, you may find that it may be much more useful to be able to provide SQL fragments (for trusted power users) instead of breaking the language into component parts and providing all of it through a complex GUI. In this case you absolutely do want to validate any user-provided SQL fragments.
  • Saving the data of a single user form, querying a single or a handful of objects, and so on and so forth, may require no dynamic SQL. In this case you would, again, use no SQL (but an ORM instead), or use SQL queries which are static, and with bind variables to transport the dynamic inputs. But sometimes you may wish to generate larger or very complex SQL statements within your own "user" code (i.e. not in the ORM) based on some logic within your application. This may even end up with no actual dynamic SQL, but always the same result - but you might still wish to use code to generate this instead of having a huge, SQL string in your source, difficult to understand and maintain.

Obviously, as with everything we do as programmers, there are always benefits and drawbacks, and picking the right tool for the job at hand is something one learns with experience; and often it also depends a lot on context (for example, the choice of RDBMS might influence aspects of what you are doing or should not be doing). Something which is a strict rule in theory might be perfectly fine to be broken in practice.

3
  • I didn't downvote, but your third item sounds like you actually want the user to provide sql fragments in the GUI? Also while yes, it is acceptable, you should emphasise that any dynamic query generation needs to do appropriate escaping (of sql identifiers and value literals).
    – Bergi
    Commented Dec 21, 2022 at 1:07
  • 1
    Thanks @Bergi, I've added your recommendations into the answer.
    – AnoE
    Commented Dec 21, 2022 at 9:44
  • 2
    This sounds a little like it misunderstands how query parameterization works. Parameterized queries do not escape or sanitize the parameter input data. Rather they keep that data separate from SQL command completely, so there's no chance at all of a malicious input somehow out-smarting the sanitization/escaping process. Commented Dec 21, 2022 at 21:37
1

I would not step down to manually concatenating SQL strings for convenience. This adds a lot of potential for errors that compromise the security of your application. If I can avoid this, I use an alternative like parametrized queries or an ORM or query builder. Having to think about all these issues would outweigh any convenience I might get out of this.

But there are cases where you can't avoid concatenating SQL strings manually. And in those cases it is of course acceptable to do this, as long as you keep the security issues in mind and ensure that all user-supplied input is whitelisted and everything that can be is still in parameters.

If you're using an ORM, you should be able to do most kinds of dynamic queries without having to manipulate SQL. In your example in C# it would be pretty straightforward to implement your function in EF Core, without any need to concatenate strings. But even then there are a few things that the ORM won't be able to do. The main case where I've encountered this are queries that are very dynamic and that adapt the returned rows to the input. Especially if it's not a simple select, but something more complex.

If you don't use an ORM or query builder you might have to resort to concatenating strings much earlier, e.g. if you have a set of filters that can be toggled for a specific query. It's not that hard to do something like this safely if you know what you're doing, but it's really not fun and becomes tedious and annoying very quickly. And if this gets more complex, it's easy to screw it up. I'd strongly recommend to use some query builder or ORMs if you need dynamic queries, if you don't use them you end up reimplementing a basic version of them anyway after some time.

1

This is one of those things where: you can do it, very many people do it, and very many successful apps rely on it, but it's quite difficult to do correctly, and if you do it yourself without having extensive experience it's very likely that you will make serious mistakes. So we say "don't do it". But of course there's more nuance to it than that. The way I would put it is:

If you need dynamically-generated queries, use a mature and well-tested ORM or query-builder framework.

If you want to learn, study a mature and well-tested ORM or query-builder framework.

If you're really interested in developing such things, or if you need a feature that you can't find anywhere, contribute to a mature and well-tested ORM or query-builder framework.

If you've been doing that for 5 years and you're convinced that there's a better way, go out and write your own. You may find out along the way that your approach isn't as good as you thought — but now you'll have the experience to recognize it, which you didn't have on day one.

0

The short answer is yes, of course. Whilst you ought to be able to predict your constructs and possibly even use precompilation to include or exclude clauses with commenting based upon bitmaps in Jacquard style is it acceptable.

Not everything is acceptable just because it is possible (NodeJS, .htaccess etc.)

Controls are implied. Knee jerk fresher "injection" objections are just noise. This is done in production commonly and has been done so for a very long time.

Given 2 "bucket" tables, would you write 2 effectively identical queries or parameterise the names? What about hashing across 10 of them?

Why do you think "WHERE 1" is widely used?

Even driven by user input, providing they only enter boolean (checkbox) or cleansed/validated types (numbers, strings, dates) injection is foiled.

Not the answer you're looking for? Browse other questions tagged or ask your own question.