7

Please see the simple http://sqlfiddle.com/#!9/e853f/1 for this problem in operation.

I refer to MySQL ver 5.6.12-log

As I understand it, a left join returns NULL for columns in the rightmost dataset where the key in the left dataset does not exist in the right dataset.

However, I am getting data returned from the right hand side even where the left hand key does not exist in the right.

Can anyone explain what is going on here?

The SQLfiddle creates:

  • A table with 6 rows, each containing just an integer ID
  • A second table with 3 rows containing some of those integer IDs plus two more INT fields
  • A view based upon that second table that returns 3 rows containing the integer ID plus a textual field, derived from the two other INT fields

(Obviously, the 3 IDs in the view correspond to some of the IDs in the 6 row table.)

The SQL SELECT * FROM LEFT JOIN ON table_ID = view_ID; returns 6 rows as expected but all of them have data in the textual field instead of the 3 unmatched ones being NULL

BUT

If the method used in the view to derive the textual column is slightly altered, then the Left Join SQL gives the correct result. (You can show this by selectively commenting out one or other of the two methods in sql fiddle)

But surely doesn't the optimiser evaluate the view first, so it shouldn't matter how the data is created, just what it contains?

(This s a much simplified version of an earlier question of mine that I admit was rather too complicated to illicit sensible answers)

It has been suggested (Jeroen Mostert)that I show data and expected results. Here it is:

Table person

personID
--------
   1
   2
   3
   4
   5
   6

View payment_state

payment_personID  |   state
----------------------------
       1          |   'equal'
       2          |   'under'
       3          |   'over'

Query

SELECT * FROM  person 
LEFT JOIN   payment_state 
ON personID = payment_personID;

Expected result

personID | payment_personID  |state
-------------------------------------
    1    |      1            | 'equal'
    2    |      2            | 'under'
    3    |      3            | 'over'
    4    |     NULL          |  NULL
    5    |     NULL          |  NULL
    6    |     NULL          |  NULL

Actual result

personID | payment_personID  |state
-------------------------------------
    1    |      1            | 'equal'
    2    |      2            | 'under'
    3    |      3            | 'over'
    4    |     NULL          | 'equal'
    5    |     NULL          | 'equal'
    6    |     NULL          | 'equal'
5
  • If you switch the RDBMS used to "SQL Server 2017" or "PostgreSQL" and replace IFNULL with COALESCE (to make the syntax work) you will get exactly the results you expect. (Which is, I believe, also the result you should get as far as ANSI SQL is concerned, but don't quote me on that). This looks like a MySQL oddity. Commented Mar 23, 2018 at 12:13
  • Add the sample data and the current result and the expected result here. Formatted text, please.
    – jarlh
    Commented Mar 23, 2018 at 12:13
  • @jarlh 6 It is all very clear in the SQL fiddle. I have deliberately not reproduced it here to make my question succinct and clear Commented Mar 23, 2018 at 12:22
  • @user2834566: from at least one comment and at least one answer that don't seem to understand the issue at hand, consider the possibility that it's still not clear enough. You may wish to elaborate with a "I do this, I got this, I expected this" with no requirement of the reader to press buttons. Commented Mar 23, 2018 at 12:25
  • "left join returns NULL for columns in the rightmost dataset where the key in the left dataset does not exist in the right dataset" That is very poorly phrased & if it were clear it would be wrong because that only applies to certain joins involving keys, but the general definition does not involve keys. Left join on returns inner join on rows plus unmatched left table rows extended by NULLs. Unmatched rows are ones that are not part of a row in the inner join on result. Ones that could not make the ON condition true with any row from the other table.
    – philipxy
    Commented Mar 24, 2018 at 6:06

3 Answers 3

3

I beg to disagree with other answers. This is a MySQL defect. Actually it is bug #83707 in MySQL 5.6. It looks it's fixed in MySQL 5.7

This bug is already fixed in MariaDB 5.5.

The internal join strategy such as Nested Loop Join, Merge Join, or Hash Join does not matter. The result should be correct in any case.

I tried the same query in PostgreSQL and Oracle and it works as expected, returning null values on the last three rows.

Oracle Example

CREATE TABLE person (personID INT); 

INSERT INTO person (personID) VALUES (1); 
INSERT INTO person (personID) VALUES(2); 
INSERT INTO person (personID) VALUES(3);
INSERT INTO person (personID) VALUES(4);
INSERT INTO person (personID) VALUES(5);
INSERT INTO person (personID) VALUES(6);

CREATE TABLE payments (
   payment_personID INT,
   Due INT,
   Paid INT) ;

INSERT INTO payments  (payment_personID, due, paid) VALUES (1, 5, 5);
INSERT INTO payments  (payment_personID, due, paid) VALUES (2, 5, 3);
INSERT INTO payments  (payment_personID, due, paid) VALUES (3, 5, 8);

CREATE VIEW payment_state AS (
SELECT
   payment_personID,
  CASE 
   WHEN COALESCE(paid,0) < COALESCE(due,0) AND due <> 0 THEN 'under' 
   WHEN COALESCE(paid,0) > COALESCE(due,0) THEN 'over' 
   WHEN COALESCE(paid,0) = COALESCE(due,0) THEN 'equal' 
   END AS state 
FROM payments);

SELECT *
FROM
    person
LEFT JOIN 
    payment_state   
ON personID = payment_personID;

Result:

PERSONID  PAYMENT_PERSONID  STATE
========  ================  =====
       1                 1  equal
       2                 2  under
       3                 3  over
       6            <null>  <null>
       5            <null>  <null>
       4            <null>  <null>

Works perfectly!

PostgreSQL Example

CREATE TABLE person (personID INT); 
INSERT INTO person (personID) VALUES
(1),(2),(3),(4),(5),(6);

CREATE TABLE payments (
   payment_personID INT,
   Due INT,
   Paid INT) ;

INSERT INTO payments  (payment_personID, due, paid) VALUES
(1, 5, 5), (2, 5, 3), (3, 5, 8);

CREATE VIEW payment_state AS (
SELECT
   payment_personID,
  CASE 
   WHEN COALESCE(paid,0) < COALESCE(due,0) AND due <> 0 THEN 'under' 
   WHEN COALESCE(paid,0) > COALESCE(due,0) THEN 'over' 
   WHEN COALESCE(paid,0) = COALESCE(due,0) THEN 'equal' 
   END AS state 
FROM payments);

SELECT *
FROM
    person
LEFT JOIN 
    payment_state   
ON personID = payment_personID;

Result:

personid  payment_personid  state
========  ================  =====
       1                 1  equal
       2                 2  under
       3                 3  over
       4            <null>  <null>
       5            <null>  <null>
       6            <null>  <null>

Also, works perfectly!

6
  • 1
    Well, it appears the MySQL folks agree, and it's been fixed in 5.7+. Commented Mar 23, 2018 at 14:38
  • but changing IFNULL to COALESCE is also doesn't work sqlfiddle.com/#!9/b492af/1
    – Pelin
    Commented Mar 23, 2018 at 14:42
  • Ah... OK. So this bug was already reported in November 2016 as bug #83707. The bad news is they don't have any intention yet to fix it. Per ticket comment: "Suggested Fix: a warning should be issued." Commented Mar 23, 2018 at 14:42
  • I just tried MariaDB 5.5 (originally a clone of MySQL) and it works perfectly. Those guys already fixed it, but the MySQL guys (Oracle today) are dragging their feet. Commented Mar 23, 2018 at 14:48
  • So your solution is basically switching the RDBMS to "PostgreSQL" and replace IFNULL with COALESCE, like how @JeroenMostert suggested in the first comment?
    – Pelin
    Commented Mar 23, 2018 at 14:57
1

Processing algorithm of your view causes this result. For default, MySQL usually chooses MERGE, because it is more efficient. If you create a view with "TEMPTABLE" algorithm you will be able to see NULL for the unmatched rows.

http://www.mysqltutorial.org/create-sql-views-mysql.aspx

CREATE ALGORITHM =  TEMPTABLE VIEW  payment_state AS (
SELECT
   payment_personID,
 CASE 
   WHEN IFNULL(paid,0) < IFNULL(due,0) AND due <> 0 THEN 'under' 
   WHEN IFNULL(paid,0) > IFNULL(due,0) THEN 'over' 
   WHEN IFNULL(paid,0) = IFNULL(due,0) THEN 'equal' 
   END AS state 
FROM payments);
5
  • 1
    That's amazing. It's just like the view definition for additional columns was copied instead of actual results. The ALGORITHM = TEMPTABLE parts works like an added .ToList() in C#, materializing data here and now before JOINing it. You learn something new every day :) Commented Mar 23, 2018 at 12:47
  • 1
    Seems like a good illustration of the MySQL philosophy: make it fast, then give the user options to make it correct (if they insist). Commented Mar 23, 2018 at 13:06
  • That really is something I would not have expected! I had read the documentation on algorithms here (mysqltutorial.org/create-sql-views-mysql.aspx) but nowhere did it say 'Oh by the way, merge gives the wrong result'. I still don't quite follow the logic though. If merge combines the queries then somewhere it must be evaluating IFNULL(paid,0) = IFNULL(due,0) to get 'equal' for the unmatched keys. But even if it does this after the join, although the unmatched 'paid' columns will be null, none of the 'due' ones are so I don't quite see how it evaluates a 'equal' result. Commented Mar 23, 2018 at 13:27
  • Incidentally I tried putting the view definition as a sub query in the join instead of the view name (sort of simulating the merge algorithm) and I got the correct result, so I'm even more puzzled now. Still, to paraphrase Jerone re MySQL Philosophy, "Never mind the correctness, look at the speed" Commented Mar 23, 2018 at 13:27
  • I agree that is weird and I learnt it in hard way :) Glad that it works!
    – Pelin
    Commented Mar 23, 2018 at 13:30
0

This is the normal way LEFT JOIN works. It appends new columns to the result, then fills them with:

  • values pulled from the table being JOINed if the JOIN succeeds,
  • NULLs if the JOIN doesn't match (that includes the fields you joined ON)!

Normally there is no distinction between NULLs pulled from real tables (where JOIN matched) and NULLs filled in because the JOIN didn't match. The CASE + IFNULL just look for NULLs and swaps them to 0s (no matter their source). That's why you have results in the state column even in unmatched rows.

As a matter of fact, if you want to know if a given NULL you are looking at was a result of not matching a JOIN, you need to explicitly check this - if all key fields you JOINed on are NULLs, when the NULL in this column is a result of a fill-in. If the fields from key are present in this row yet there is still a NULL in other column, then it is there because it was pulled from the table you JOINed.

2
  • @ ensisNoctis quote "NULLs if the JOIN doesn't match". But my point is that I am not getting NULLs where the join doesn't match, I'm getting data. Commented Mar 23, 2018 at 12:28
  • Oh, now I see. You JOIN to a view, then results appear just as the CASE definition was copied to the SELECT (not enough data for a view itself = NULLs, but enough data after JOIN = columns filled). Sorry, I misunderstood your question. Commented Mar 23, 2018 at 12:45

Not the answer you're looking for? Browse other questions tagged or ask your own question.