\$\begingroup\$

I have two tables, emp and expenditure.

Emp:

ID, NAME

Expenditure:

ID, EMP_ID, AMOUNT

Each employee has a spending limit of 100. We want to find the employees whose total expenditure is > 100.

Output attributes needed: Emp name, exp id, amount

My query:

SELECT E.NAME,
    EXP.ID,
    EXP.AMOUNT
FROM EMP E
INNER JOIN expenditure EXP ON E.ID = EXP.EMP_ID
WHERE E.ID in
        (SELECT EMP_ID
            FROM
                (SELECT EMP_ID,
                        SUM(AMOUNT) AS TOTAL
                    FROM expenditure
                    GROUP BY EMP_ID
                    HAVING SUM(AMOUNT) > 100.00
                    ORDER BY TOTAL DESC) SUBQ)
ORDER BY EXP.AMOUNT desc;

Is it possible to optimize this? Can the subqueries be simplified?

\$\endgroup\$
  • Where does T.AMOUNT come from? Does this query work? Commented Mar 20, 2022 at 9:54
  • This question has been posted on Stack Overflow. I think that's where it belongs, and I voted to close it for that reason. Commented Mar 20, 2022 at 11:01
  • I think this is (basically) on-topic for CodeReview, since it's asking for optimization; though OP should show the full table definitions.
    – Reinderien
    Commented Mar 20, 2022 at 14:17
  • The ORDER BY TOTAL DESC in the subquery is useless. Commented Mar 20, 2022 at 18:27
  • Please note the 3 rules of posting SQL questions: "1) Provide context, 2) Include the schema, 3) If asking about performance, include indexes and the output of EXPLAIN SELECT."
    – Mast
    Commented Mar 21, 2022 at 14:28

1 Answer

\$\begingroup\$

Let's assume that your table declarations look like this and that you're using MySQL 8.0:

create table Employee(
    id serial primary key,
    name text not null
);

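create table Expenditure(
    id serial primary key,
    -- serial is bigint unsigned in MySQL, so the referencing column must match
    employee_id bigint unsigned not null,
    amount decimal(10, 2) not null check(amount > 0),
    -- MySQL parses but ignores an inline "references" on a column,
    -- so the foreign key is declared as a separate constraint
    foreign key (employee_id) references Employee(id)
        on update cascade on delete cascade
);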

(PostgreSQL would allow for the standards-compliant generated always as identity as well as a money column; MySQL supports neither.)
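
For comparison, here is a rough sketch of what the PostgreSQL declarations could look like with identity columns. It assumes PostgreSQL 10 or later, and it keeps a plain numeric amount rather than money to keep the sketch simple; treat it as an illustration, not a tested schema for your data.

```sql
-- PostgreSQL sketch (assumes version 10+ for identity columns)
create table Employee(
    id int generated always as identity primary key,
    name text not null
);

create table Expenditure(
    id int generated always as identity primary key,
    -- inline "references" is honoured by PostgreSQL
    employee_id int not null references Employee(id)
        on update cascade on delete cascade,
    amount numeric(10, 2) not null check(amount > 0)
);
```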

With your test data as

insert into Employee(name) values
   ('Bob'),
   ('Doug'),
   ('McKenzie');
   
insert into Expenditure(employee_id, amount)
select id, amount from (
    select 9 as amount
    union all select 2
    union all select 3
    union all select 5
) as amounts
cross join Employee where name = 'Bob';
   
insert into Expenditure(employee_id, amount)
select id, amount from (
    select 100 as amount
    union all select 190
    union all select 450
) as amounts
cross join Employee where name = 'Doug';

PostgreSQL also accepts the standard values list as a subquery, which MySQL does not:

insert into Expenditure(employee_id, amount)
select id, amount from (
    values (9), (2), (3), (5)
) as amounts(amount)
cross join Employee where name = 'Bob';
   
insert into Expenditure(employee_id, amount)
select id, amount from (
    values (100), (190), (450)
) as amounts(amount)
cross join Employee where name = 'Doug';
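
As a sanity check on the test data, a plain aggregate (not part of your original query) shows the per-employee totals that the limit is compared against. This is only a sketch against the assumed tables above:

```sql
-- Per-employee totals for the sample rows
select emp.name, sum(exp.amount) as total
from Employee emp
join Expenditure exp on exp.employee_id = emp.id
group by emp.id, emp.name
order by total desc;
-- Bob's rows add up to 9 + 2 + 3 + 5 = 19 and Doug's to 100 + 190 + 450 = 740,
-- so only Doug is over the limit. McKenzie has no expenditure rows and is
-- dropped by the inner join.
```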

You can eliminate some of your subqueries and your in clause by using a windowing expression:

select emp.name, exp.id, exp.amount
from Employee emp
join (
    select
        id, employee_id, amount,
        -- total per employee, repeated on each of that employee's rows
        sum(amount) over (partition by employee_id) as total
    from Expenditure
) exp on exp.employee_id = emp.id
where exp.total > 100
order by exp.amount desc;

You will always need at least one join to relate your employee and expenditure tables.

That said, the windowing syntax is a little unwieldy, and you can also write (somewhat closer to your original query)

select emp.name, exp.id, exp.amount
from Employee emp
join Expenditure exp on exp.employee_id = emp.id
where exp.employee_id in (
    -- employees whose total expenditure is over the limit
    select employee_id
    from Expenditure
    group by employee_id
    having sum(amount) > 100
)
order by exp.amount desc;

but with only one subquery and one sum() expression.
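
Another possible formulation, offered as a sketch rather than a recommendation, replaces the in clause with a join against the grouped subquery. The alias over_limit is made up for illustration, and the query uses the > 100 threshold and the output columns (name, expenditure id, amount) from the question:

```sql
select emp.name, exp.id, exp.amount
from Employee emp
join Expenditure exp on exp.employee_id = emp.id
join (
    -- one row per employee whose total is over the limit
    select employee_id
    from Expenditure
    group by employee_id
    having sum(amount) > 100
) over_limit on over_limit.employee_id = emp.id
order by exp.amount desc;
```

Whether this runs any differently from the in version depends on the optimizer and your indexes, so it is worth comparing the explain output of both on your real data.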

\$\endgroup\$
  • "MySQL is a toy RDBMS" I am no fan of mysql, but this seems unnecessary Commented Mar 20, 2022 at 18:31
  • It’s not if it’s true and adds value to the answer which IMO it does Commented Mar 20, 2022 at 22:11
  • This is a bread and butter problem, and windowing is totally peacocking. Yes the subquery (and there should be only one) can be simplified. Why the nesting? Why the sorting? Nothing tricky required. In fact it should be an ANSI join for readability.
    – mckenzm
    Commented Mar 21, 2022 at 7:37
  • MySQL is definitely not a toy. It also has window functions; your answer seems to imply that it doesn't. Commented Mar 21, 2022 at 11:02
  • @HoneyBadger actually MySQL only supports the INSERT ... VALUES ... syntax, not the full version as in the answer here (... from ( values (9), (2), (3), (5) ) as amounts(amount) ...) Commented Mar 21, 2022 at 15:51
