11

From my experience, sql code changes almost always tend to be NOT incremental: someone creates a new stored procedure, or modifies an entire embedded sql query for optimization purposes, or creates a brand new table. When I receive one of these code review requests, I could not find any way except understanding the entire sql query. Often times these are really long nested queries, sometimes calling other procedures. Comprehending these changes and verifying them becomes a huge task. Then I am left with three options:

  1. approve without reviewing carefully;
  2. spend a significant amount of time to go line by line, understand the data model, run the query on some test system to see what it produces;
  3. ask the person to walk me through the changes.

I do not want to do 1 or 3. However, I also do not want to spend hours figuring out what the whole query is doing and whether the new version is equivalent but works faster.

Are there any technologies or tools or methodologies to ease this process? How do others perform such reviews without going through too much pain?

One fundamental difference from regular code changes seems like sql changes more frequently become (at least conceptually) entire rewrites which makes the problem more frequent to encounter - for me almost every sql review I do every single day.

Any recommendations are greatly appreciated.

Update:

You can reject a review or tell the author rewrite the change in more clear way, or add more comments or tests. However, what I am looking for is a way as a reviewer to actually take care of the review as it is. Most of the answers so far about improving the code change so that I can review more easily. That is obvious. On the other hand, the practical issue here is that I need to address the review without pushing it back to the author, without changing author behavior or restructure the organization.

I am also hoping for a sql specific recommendation, not some generic code review recommendation. I am hoping that may be there are tools that help the reviewer no matter how bad the code base is and how bad the code writing practices are.

Personally, I worked on many big and small companies as a developer close to 20 years. I maintain open source projects and I have a PhD in Computer Science. I would appreciate some help that says more than “change others and your world gets better” type of non pragmatic answer.

16
  • My question is specific to sql. I do not think it is a duplicate of what you are referencing.
    – CEGRD
    Commented Sep 12, 2019 at 8:04
  • 1
    Pushing the code back to the author may feel like dodging your job as a reviewer, but it is actually the opposite. You are taking responsibility for keeping the software maintainable, so that there remains a fighting chance to make further changes within a reasonable timeframe. Commented Sep 12, 2019 at 10:49
  • 5
    @CERGD, calm down a little. The problem here (as shown by comments on answers) is that you are proceeding on the assumption that an answer (more precisely, an answer within the constraints of your existing organisational procedures, and which meets the criteria of being an "SQL tool") already exists. Contributors here broadly feel that no such answer does exist, and that various emollients (like testing, documentation, liaising with the writer, or other changes in the relation between writer and reviewer) are being sharply rejected when they are actually the most sensible starting point.
    – Steve
    Commented Sep 12, 2019 at 10:57
  • 2
    @CEGRD, it can be a valid (if not scientifically definitive) answer for an experienced person to say "there is no known answer within the strict terms you've set" (although nobody said it quite so baldly). Your own question has not been laid out scientifically with all underpinning assumptions stated - it is just a practical problem casually stated. If you understood the problem with any scientific rigour, you would probably either have an idea what a tool needs to do to solve it (and describe it to us), or you would (like us) think it unlikely that any tool can solve the problem.
    – Steve
    Commented Sep 12, 2019 at 12:47
  • 2
    please can we not close this question? Its got a bit heated but the question is far more focused than the supposed dupe and someone might come along with a "oh redgate make a SQL comparison tool, its great" any second.
    – Ewan
    Commented Sep 12, 2019 at 13:46

4 Answers 4

11

You may not like this, but:

This is not a problem to be solved easily by additional technologies or tools.

An SQL which contains "long nested queries, sometimes calling other procedures" and cannot be easily understood should at least have proper indentation and comments. Otherwise, it will become an unmaintainable mess. So if it is really that hard to understand the query line-by-line, reject the change and ask the author for making it more readable.

And asking authors to explain their code during a review is not the worst thing , quite the opposite. You could both together go through the code and add the missing comments together.

Of course, the technological solution to this would be to avoid writing SQL queries at all and switch to something like an ORM which generates most queries automatically. But unfortunately, this is not a realistic solution for many real-world systems, since it could mean to rewrite large parts of the system new from the ground (and often ORMs introduce their own class of new problems, sometimes they introduce more problems than they solve).

10
  • 3
    @CEGRD: quite the opposite. Adding comments to an otherwise unreadable piece of code is a very pragmatic solution. An academic approach would be to insist on breaking down the query into smaller parts, or by trying to make the SQL so "self-docmenting" that no comments are needed. Both of these approaches don't work well for SQL in reality.
    – Doc Brown
    Commented Sep 12, 2019 at 8:30
  • 3
    @CEGRD, the bottom line is there isn't a way to achieve what you want - which is easy code reviews of complex SQL. SQL is capable of doing a lot of work (and encapsulating a lot of analysis and design) efficiently in few words and in a way that causes very efficient execution, and has at times a fiendish subtlety with all parts set in a careful interlock with each other. To review it is often as much work as to write it, and it's not reducible very far with any known tools or methods, because SQL itself already performs a lot of important reduction of the scale of the challenge.
    – Steve
    Commented Sep 12, 2019 at 9:40
  • 3
    @CEGRD, we could say "we don't know", but that doesn't quite emphasise that we do know (from our own experience and that of many others) that there is no easy answer to making complex SQL simple. Many queries are complex not for lack of skill by the writer or deficiencies of the language, but simply because they do a lot of inherently complex things with the data (which the firm has stated they must) or manage a lot of constraints in a carefully balanced solution that may take the writer a relatively long time to conceive and refine.
    – Steve
    Commented Sep 12, 2019 at 10:19
  • 3
    @CEGRD: when code - in whatever language - is written in an unreadable way, the only way I know to improve the situation is by cleaning up the code (and in case of complex SQLs, since usual refactoring techniques often won't work, commenting is often the only thing you can do). If you find a better way, feel free to post an answer of your own, after more than 3 decades of programming I am still open to learn something new.
    – Doc Brown
    Commented Sep 12, 2019 at 10:38
  • 1
    @CEGRD, we do not say you should not look for an answer. But a "better" answer seems to be one which does not change your colleagues' work processes in the slightest, so that curtails a huge field where feasible answers may lay. (1/2)
    – Steve
    Commented Sep 12, 2019 at 11:11
3

SQL queries are complex and brittle code that I experienced myself to be quite difficult to review. We do have a good portion of our codebase written as SQL templates (for good reasons I won't elaborate on), so I completely understand how the review process can become difficult (and yet, especially valuable), despite our continuous efforts to code quality.

What we did already have and helps a bit is to have code standards for SQL queries : normalized indentation levels, table naming conventions and so on.

But what did eased the process tremendously in our case, is to introduce unit testing of all modified/new SQL queries in our codebase alongside with code to be reviewed. This gives you a quick insight to what the query should be doing and confidence that it is correct in all cases. If the test suite doesn't give you this guarantee, the code should be rejected anyway.

Right now, I read SQL queries mostly to ensure conventions are followed and performance looks decent. Code correction and robustness is almost always better covered by an up-to-date test suite.

6
  • I hear you but introducing test cases do not solve the entire problem of having to still going through the entire code and verify whether the test cases are accurately testing the change. Am I misunderstanding?
    – CEGRD
    Commented Sep 12, 2019 at 8:19
  • 1
    @CEGRD A test (or a suite of tests) should inherently give you the guarantee that the implementation is correct, otherwise it's not a good test or it's not sufficiently tested. You should make sure the tests cover the change, not that the implementation cover the change. E.g. if you have a request that should update table A using B and C, in your test mock A, B and C data, check A updated correspond to the perimeter of your functionnal need, you don't need to actually care how it internally was updated (only should so for performance and style like I mentionned)
    – Diane M
    Commented Sep 12, 2019 at 8:30
  • I believe your assumption is that it is easy to see “a request should update table A using B and C”. That is not the case most of the time in real world queries. I am talking about 100 lines sql queries going over many tables with iterations, temporary tables, stored procedure calls etc.
    – CEGRD
    Commented Sep 12, 2019 at 9:07
  • 1
    @CEGRD Temporary tables and PL_SQL are implementation details, not functional needs, as long as in the end of the day the query does its job on a minimal example why would you care ?
    – Diane M
    Commented Sep 12, 2019 at 10:18
  • 1
    You can never ensure 100% code correctness. You can have tooling that makes you more confident that it is for some cases, including known edge cases. I highly doubt humanly compile and execute a query in your head is faster and more reliable than having it written down and done by machines. This is why as a reviewer you should mostlsy assure tests are covering enough of these cases, and that they pass.
    – Diane M
    Commented Sep 12, 2019 at 10:34
2

I guess my thoughts on reading your question are

  1. Are you defining what a code review is supposed to check?
  2. Why do you perceive a difference between incremental and non-incremental changes?
  3. omg stop putting business logic in sprocs

For me a code review checks a defined list of things, are there tests, is the CI pipeline configured correctly, is the style guide being followed, does the work actually do what was asked etc etc. Whatever, but its a list of checks that anyone can do on any code.

So, if your SQL change has tests, the tests pass, the test cover enough edge cases and the code does what the ticket asked for... Code Review passed!

In terms of understanding the impact of the change, sure you change the sproc you are calling but not the sproc code thats a big change that you can't understand from looking at the code.

But if even a small code change can break everything, a missing bracket or null check, an integer overflow etc Or a business logic change, which would code wise would seem correct either way but one way is required.

This is why you have tests. So you don't have to follow the code through and work out if it right. You run the test and it passes or doesnt and also the test explains what the code is supposed to do. The sproc updates the order, the select returns office around the world etc. You don't have to work out what the code is attempting because it's right there Assert.AreEqual(results, listOfOfficesinTheWorld)

My guess is that you have too much logic in sprocs which are maybe not even in source control and don't have tests, including performance tests for them.

I would suggest that you start to move logic out of the data layer, keep your sql simple, adding and removing data and have the manipulation logic in code with lots and lots of unit tests.

Update in response to changes:

I still think my answer applies. You want a :

tools or methodologies to ease this process (point 2)

Automating point 2 is equivalent to writing tests.

understand the data model,

Compare the before change and after change data models. If you have a test the commit to that test will show the incremental, or ground up change

run the query on some test system to see what it produces;

Run the sql, compare the result against expected. If you automate that there's your test right there and the commit showing any change in the expected result shows you the incremental or group up change. Naming and comments in the test explain the change EnsureCustomerIsAlwaysRight() SharesCanBeSoldInUnder100ms() etc

Expecting a tool to write the test for you is asking a bit much. If your code follows a pattern you might be able to template some basic stuff. But the expected result of your processes is going to be bespoke to your application. You are always going to have to fill in the hard bit manually.

Either fail the review on the basis that these tests aren't present, or as part of your current manual testing process, write the tests yourself. Then at least next time your job is easier.

This is the pragmatic answer. It's hard work and will take time, but it's simple to implement and understand, can be implemented incrementally, forces you to move towards separation of concerns and it's the normal solution to fix this common problem.

15
  • If we had tests and continous integration for sql side of things, then obviously we would not have the issue. I am hoping, not sure of course if at all possible, for some approach that helps without changing the “whole work place”.
    – CEGRD
    Commented Sep 12, 2019 at 8:16
  • Add a new Code Review requirement - sql must have tests
    – Ewan
    Commented Sep 12, 2019 at 8:20
  • Just having tests does not remove the necessity of understanding the changes. You still have to verify whether tests are accurately testing the changes.
    – CEGRD
    Commented Sep 12, 2019 at 8:22
  • 1
    @CEGRD it seems to me like everyone here is giving you basically the same answer "do more tests!" and you are just saying "that won't work!" But we all do the same job and it works for us.
    – Ewan
    Commented Sep 12, 2019 at 9:26
  • 1
    I'm not, it's a fine question. My answer is correct and will help you. But there's no way for any of us to make you change you mind if you dont think it will work. It's up to you to test each answer and see if they work or not
    – Ewan
    Commented Sep 12, 2019 at 9:31
1

Gotta be honest; I'm not a fan of these kinds of systems. They are very difficult to maintain, for several reasons which you have already articulated. Your review process is not the problem; your architecture is the problem.

As I see it, your choices are twofold:

  1. Begin having architectural reviews on a regular basis so that everyone stays familiar with your tables and queries, and provide adequate documentation and training that explains how it all works, or

  2. Shift your simple (CRUD) queries to the front end or middleware application and do away with the stored procedures, reserving those mostly for batch and server operations.

3
  • 1
    I am sorry but this is not a pragmatic answer. There is not much I can do with this answer practically.
    – CEGRD
    Commented Sep 12, 2019 at 8:21
  • Have you given some thought to the other community members' observations that the way you characterize you problem isn't exactly realistic? Commented Sep 12, 2019 at 12:45
  • In what sense? The problem is real. It is not some imaginary issue I am depicting. And often times I do have to review the code changes without the option of sending them back to the author. And I have no authority to change the organizational processes or enforcing things on others. Therefore, I am not saying the answers given so far do not make sense; what I am saying is that they are simply not applicable in my situation.
    – CEGRD
    Commented Sep 12, 2019 at 12:49

Not the answer you're looking for? Browse other questions tagged or ask your own question.