0
\$\begingroup\$

The problem I have solved is as follows:

Consider a table of dogs, DOGGIES, that records on each row the ID of a doggy, one skill that they have, when they started having it, when they stopped having it (assume a useful placeholder if they never stop), and an extra bool column called IS_NO_LONGER_GOOD_BOY. This bool column is magically a property of the skill and not of the dog or the history. Find every doggy who has a skill with a 0 value for IS_NO_LONGER_GOOD_BOY without, at some point in time, having that same skill with a 1 value for IS_NO_LONGER_GOOD_BOY. For each doggy found, return the IS_NO_LONGER_GOOD_BOY = 0 row(s) that caused them to appear.

Doggies can have multiple skills at any given time, but doggies with no skills aren't in the data. There is no need to worry about NULLs, malformed data, or other such traps.

This was my solution. It works, but the code comments explain my dislike of it.

SELECT [ID], [SKILL_NAME], [START_DATE], [STOP_DATE], [IS_NO_LONGER_GOOD_BOY]
FROM [DOGGIES] AS [DOGS_OUT] --Not sure if the alias is needed, but it adds clarity?
WHERE EXISTS
(
    SELECT 1
    FROM
    --Really don't like this bit.
    --This SELECT 1 part only exists so I can write a WHERE after the indented query's HAVING.
    --You'd really think there'd be a way to put this inside the indented query.
    --But the innermost query needs the HAVING part, so I think it's inescapable?
    (
        SELECT [ID], [SKILL_NAME]
        FROM [DOGGIES]
        GROUP BY [ID], [SKILL_NAME]
        HAVING
                COUNT(CASE WHEN [IS_NO_LONGER_GOOD_BOY] = 0 THEN 1 END) > 0 --SQL has no COUNTIF
            AND
                COUNT(CASE WHEN [IS_NO_LONGER_GOOD_BOY] = 1 THEN 1 END) = 0
    ) AS [DOG_SKILL_PAIRS]
    WHERE
            [DOG_SKILL_PAIRS].[ID] = [DOGS_OUT].[ID]
        AND
            [DOG_SKILL_PAIRS].[SKILL_NAME] = [DOGS_OUT].[SKILL_NAME]
)

My question is simply if what I've done can be improved upon. Some suspicions of mine are as following:

  1. I get the feeling that all of this code should only need at most one level of nesting, rather than having a correlated subquery inside the outermost query.
  2. Working from the innermost layer outwards, I don't like how I've gone from a HAVING clause without a WHERE clause to a WHERE clause that filters the result of the HAVING clause. You typically go from WHERE to HAVING, not the other way around.
  3. Can window functions save the day? COUNT(CASE WHEN [IS_NO_LONGER_GOOD_BOY] = 0 THEN 1 END) OVER (PARTITION BY [ID], [SKILL_NAME]) AS [FALSE_COUNT] sounds like it ought to be a good idea, but I couldn't make it fit.

I'm on a 2016 version of SQL Server.

\$\endgroup\$
6
  • \$\begingroup\$ @Reinderien 2016. \$\endgroup\$
    – J. Mini
    Commented Jul 12, 2022 at 23:05
  • 1
    \$\begingroup\$ I find the problem statement and the schema confusing. To me, being a "good boy" is a property of the dog, not of a skill, and certainly not a property of a skill associated with a time interval. Also, being "no longer a good boy" implies that a TRUE value tells you something about the history, which is a weird way to store data. \$\endgroup\$ Commented Jul 12, 2022 at 23:16
  • \$\begingroup\$ @200_success It's absolutely a weird way to store data. Accept the schema at face value and the problem should then make sense. \$\endgroup\$
    – J. Mini
    Commented Jul 12, 2022 at 23:19
  • \$\begingroup\$ I wrote about the pitfalls of double negatives in my answer, but it's even tripped you up in the title of your question: the problem doesn't ask Find dogs who stopped being good boys without ever being good boys; it asks Find dogs who are and have always been good boys. \$\endgroup\$
    – Reinderien
    Commented Jul 13, 2022 at 0:04
  • \$\begingroup\$ (Aside: all dogs are good boys) \$\endgroup\$
    – Reinderien
    Commented Jul 13, 2022 at 0:05

1 Answer 1

2
\$\begingroup\$

First some commentary on the question. @200_success is right to point out that IS_NO_LONGER_GOOD_BOY does not make sense as a property on the skill, but that's hardly the only problem:

  • Don't store booleans in negative form! Store is_good_boy instead. Otherwise, double negatives produce headaches when trying to understand logic.
  • Re. assume a useful placeholder if they never stop: a "useful placeholder" is null. So a well-designed schema would ignore "There is no need to worry about NULLs"; this is not a "trap" but a "feature" and a "necessary mechanism for good schema design".
  • doggies should be a separate table from doggy_skills should be a separate table from skills. doggy_skills being a join table, it will have foreign key references to both of the other tables.

So, making a judgement call, I will assume recommendation of "what should be done" and not only "what should be done to the letter of the question". I'm sure you can fill out the latter based on the other points I suggest, if you so need.

You do not need a group by nor a count nor a having, but you do need a correlated subquery. You only need one level of nesting.

In terms of style, delete all of your [] escaping brackets; and (more as a matter of personal taste) there is no reason to SHOUT.

Suggested

create table doggies(
    id int primary key
);

create table skills(
    id int primary key,
    name varchar(100) not null
);

create table doggie_skills(
    doggie int not null references doggies(id),
    skill int not null references skills(id),
    start datetime not null default current_timestamp,
    stop datetime,
    is_good_boy bit not null default 1,
    
    primary key(doggie, skill, start),
    check(start < stop)
);

insert into doggies(id) values
    (1), (2), (3), (4), (5), (6);

insert into skills(id, name) values
    (10, 'roll over'),
    (11, 'brain surgery');

insert into doggie_skills(doggie, skill, start, stop, is_good_boy) values
    --- included ---
    -- Only one skill, current, good boy
    (1, 10, '2010-01-01T00:00:00', null, 1),
    
    -- One current skill with a history, always a good boy
    (2, 10, '2010-01-01T00:00:00', '2015-01-01T00:00:00', 1),
    (2, 10, '2016-01-01T00:00:00', null, 1),
    
    --- excluded ---
    -- no skills (3)
    
    -- no current skills
    (4, 11, '2010-01-01T00:00:00', '2015-01-01T00:00:00', 1),
    
    -- one current skill but not a good boy
    (5, 11, '2010-01-01T00:00:00', null, 0),
    
    -- one current skill, good boy now, bad boy before
    (6, 10, '2010-01-01T00:00:00', '2015-01-01T00:00:00', 0),
    (6, 10, '2016-01-01T00:00:00', null, 1);

-- Find every doggie who has a skill with a 1 value for is_good_boy
--                       ^^^   implying stop is null
-- without, at some point in time, having that same skill with a 0 value for is_good_boy

select * 
from doggie_skills as current_skill
where current_skill.is_good_boy = 1
    and current_skill.stop is null -- current
    and not exists(
        select 1
        from doggie_skills as old_skill
        where old_skill.skill = current_skill.skill
            and old_skill.doggie = current_skill.doggie
            and old_skill.is_good_boy = 0
    );

Also see fiddle.

\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.