0
\$\begingroup\$

Given the following example data:

id  username  group  unit  department  team  status
1   user1     g1     u1    d1          t1    active
2   user2     g1     u1    d1          t2    active
3   user3     g1     u1    d1          t3    inactive
4   user4     g3     u6    d12         t30   active
5   user5     g25    u54   d70         t88   inactive

And the table definition for the table used above:

Create Table table_name(id INT, username VARCHAR(50), group_ VARCHAR(50), unit VARCHAR(50), department VARCHAR(50), team VARCHAR(50), status VARCHAR(50));

Insert Into table_name Values(1,'user1','g1','u1','d1','t1','active'),
(2,'user2','g1','u1','d1','t2','active'),
(3,'user3','g1','u1','d1','t3','inactive'),
(4,'user4','g3','u6','d12','t30','active'),
(5,'user5','g25','u54','d70','t88','inactive');

I receive selections as arrays. Each array represents a selection of group/unit/department/team, and for each selection I need to count how many active and how many inactive users it contains.

A selection does not have to include every level, so I might get, for example, the following selections:

["g1", "u1", "d1"] and ["g25", "u54", "d70", "t88"].

For this specific example, this is the query:

SELECT group_
     , unit
     , department
     , NULL AS team
     , COUNT(CASE WHEN status='active' THEN 1 END) AS active_count
     , COUNT(CASE WHEN status='inactive' THEN 1 END) AS inactive_count
  FROM table_name
 WHERE group_='g1'
   AND unit='u1'
   AND department='d1'
 GROUP BY group_
     , unit
     , department
UNION ALL
SELECT group_
     , unit
     , department
     , team
     , COUNT(CASE WHEN status='active' THEN 1 END) AS active_count
     , COUNT(CASE WHEN status='inactive' THEN 1 END) AS inactive_count
  FROM table_name
 WHERE group_='g25'
   AND unit='u54'
   AND department='d70'
   AND team='t88'
 GROUP BY group_
     , unit
     , department
     , team

demo: https://dbfiddle.uk/tBEDulvw

But I might get dozens of selection arrays, which means dozens of UNIONs, and I suspect that is not efficient.

Is it possible to write a more efficient query for this purpose?

It needs to work for an unknown number of selections, and also when it is unknown which levels were selected. You can assume, however, that a selection always follows the hierarchy: if a department is in the array, a unit and a group are in it as well.

Edit: Note that the number of arrays can vary. For example, I can get 3 arrays:

["g1", "u1", "d1"], ["g25", "u54", "d70", "t88"] and ["g3", "u6"]. The query would then be:

SELECT group_
     , unit
     , department
     , NULL AS team
     , COUNT(CASE WHEN status='active' THEN 1 END) AS active_count
     , COUNT(CASE WHEN status='inactive' THEN 1 END) AS inactive_count
  FROM table_name
 WHERE group_='g1'
   AND unit='u1'
   AND department='d1'
 GROUP BY group_
     , unit
     , department
UNION ALL
SELECT group_
     , unit
     , department
     , team
     , COUNT(CASE WHEN status='active' THEN 1 END) AS active_count
     , COUNT(CASE WHEN status='inactive' THEN 1 END) AS inactive_count
  FROM table_name
 WHERE group_='g25'
   AND unit='u54'
   AND department='d70'
   AND team='t88'
 GROUP BY group_
     , unit
     , department
     , team
UNION ALL
SELECT group_
     , unit
     , NULL AS department
     , NULL AS team
     , COUNT(CASE WHEN status='active' THEN 1 END) AS active_count
     , COUNT(CASE WHEN status='inactive' THEN 1 END) AS inactive_count
  FROM table_name
 WHERE group_='g3'
   AND unit='u6'
 GROUP BY group_
     , unit

demo: https://dbfiddle.uk/Nv2csSc8

\$\endgroup\$
6
  • 4
    \$\begingroup\$ All information for the review must be in the question. We can't review code not in the question. \$\endgroup\$
    – pacmaninbw
    Commented Jan 18, 2023 at 22:43
  • 1
    \$\begingroup\$ FYI, why do you need a union if there is only one table? \$\endgroup\$
    – pacmaninbw
    Commented Jan 18, 2023 at 22:44
  • \$\begingroup\$ Added the table definition to the post as well. The reason I used union is because the number of items in each of the selection arrays can be different but I still want to display it as empty if it was not selected (in the example you can see the first selection has no team so it shows as null. But there could be a better way to do it) \$\endgroup\$
    – pileup
    Commented Jan 18, 2023 at 22:54
  • 1
    \$\begingroup\$ The current question title, which states your concerns about the code, is too general to be useful here. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles. \$\endgroup\$ Commented Jan 19, 2023 at 8:00
  • \$\begingroup\$ One option is to do only the most granular GROUP BY (the one with the most grouping columns) in the SQL query, and to compute the less granular groupings on the caller side by summing the results of the most granular grouping. This might not be suitable if the most granular grouping returns a lot of rows for the client to group (that is, if the team column has many distinct values for the same set of values in the other columns). All of this assumes that you are also programming the client app in some other language. \$\endgroup\$
    – slepic
    Commented Jan 25, 2023 at 5:44
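
The client-side roll-up suggested in the last comment could look roughly like the sketch below. It is only an illustration under assumptions: Python with `sqlite3` stands in for whatever client language and database are really used, and the helper names `fetch_team_counts` and `roll_up` are hypothetical. The query runs once at team level; each selection is then treated as a prefix of (group, unit, department, team) and the matching rows are summed.

```python
import sqlite3


def fetch_team_counts(conn):
    """Run the most granular GROUP BY once (one row per team)."""
    sql = """
        SELECT group_, unit, department, team,
               COUNT(CASE WHEN status = 'active' THEN 1 END),
               COUNT(CASE WHEN status = 'inactive' THEN 1 END)
          FROM table_name
         GROUP BY group_, unit, department, team
    """
    return conn.execute(sql).fetchall()


def roll_up(team_counts, selections):
    """Sum the team-level counts for every selection.

    A selection such as ["g1", "u1", "d1"] matches every row whose
    leading levels are equal to it. Returns a dict that maps the
    selection (as a tuple) to (active_count, inactive_count).
    """
    totals = {}
    for selection in selections:
        key = tuple(selection)
        active = inactive = 0
        for *levels, row_active, row_inactive in team_counts:
            if tuple(levels[:len(key)]) == key:
                active += row_active
                inactive += row_inactive
        totals[key] = (active, inactive)
    return totals
```

Unlike the UNION ALL query, this sketch reports a selection that matches no rows as (0, 0) instead of leaving it out. The nested loop is fine for dozens of selections; for many more, indexing the team-level rows by prefix would be worth considering.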

1 Answer

3
\$\begingroup\$

As this answer to "How do I select data with a case statement and group by?" explains, a subquery can be used to get the result. The subquery uses a CASE expression to conditionally select the value for the team.

SELECT group_, unit, department, team,
       COUNT(CASE WHEN status='active' THEN 1 END) AS active_count,
       COUNT(CASE WHEN status='inactive' THEN 1 END) AS inactive_count
  FROM (
        SELECT group_
             , unit
             , department
             , CASE WHEN group_='g1' AND unit='u1' AND department='d1' THEN NULL ELSE team END AS team
             , status
          FROM table_name
         WHERE (group_='g1' AND unit='u1' AND department='d1')
            OR (group_='g25' AND unit='u54' AND department='d70' AND team='t88')
       ) AS groupedData
 GROUP BY group_, unit, department, team

DBFiddle sample

Below is a screenshot of the query plans for the original query and for the query above. It shows that the query with the subquery has one fewer table scan, three fewer scalar computations, one fewer stream aggregate and no concatenation (since it has no UNION).

Query plan comparison
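
For an arbitrary number of selections, one possible generalization is to put the selections into a derived table, pad the levels that were not selected with NULL, and join that table to `table_name` so that a NULL level matches any value. The sketch below is an assumption-laden illustration, not something measured against the query plans above: it uses Python's `sqlite3` only to make the SQL runnable, and the names `make_demo_db`, `count_by_selection` and the `sel` alias are hypothetical. In another engine the VALUES list could be replaced by a temporary table or a table-valued parameter.

```python
import sqlite3

LEVELS = ("group_", "unit", "department", "team")


def make_demo_db():
    """Build an in-memory copy of the question's example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name(id INT, username VARCHAR(50), group_ VARCHAR(50),"
        " unit VARCHAR(50), department VARCHAR(50), team VARCHAR(50), status VARCHAR(50))"
    )
    conn.executemany(
        "INSERT INTO table_name VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "user1", "g1", "u1", "d1", "t1", "active"),
            (2, "user2", "g1", "u1", "d1", "t2", "active"),
            (3, "user3", "g1", "u1", "d1", "t3", "inactive"),
            (4, "user4", "g3", "u6", "d12", "t30", "active"),
            (5, "user5", "g25", "u54", "d70", "t88", "inactive"),
        ],
    )
    return conn


def count_by_selection(conn, selections):
    """Return (group_, unit, department, team, active, inactive) per selection."""
    # Pad every selection to four levels; None (SQL NULL) means "not selected".
    # dict.fromkeys drops duplicate selections, which would otherwise be
    # merged into one group with doubled counts.
    padded = list(dict.fromkeys(
        tuple(s) + (None,) * (len(LEVELS) - len(s)) for s in selections
    ))
    if not padded:
        return []
    values = ", ".join("(?, ?, ?, ?)" for _ in padded)
    sql = f"""
        WITH sel(group_, unit, department, team) AS (VALUES {values})
        SELECT sel.group_, sel.unit, sel.department, sel.team,
               COUNT(CASE WHEN t.status = 'active' THEN 1 END) AS active_count,
               COUNT(CASE WHEN t.status = 'inactive' THEN 1 END) AS inactive_count
          FROM sel
          JOIN table_name AS t
            ON t.group_ = sel.group_
           AND (sel.unit IS NULL OR t.unit = sel.unit)
           AND (sel.department IS NULL OR t.department = sel.department)
           AND (sel.team IS NULL OR t.team = sel.team)
         GROUP BY sel.group_, sel.unit, sel.department, sel.team
         ORDER BY sel.group_, sel.unit, sel.department, sel.team
    """
    params = [value for row in padded for value in row]
    return conn.execute(sql, params).fetchall()
```

Calling `count_by_selection(make_demo_db(), [["g1", "u1", "d1"], ["g25", "u54", "d70", "t88"], ["g3", "u6"]])` returns one row per selection, with NULL in the levels that were not selected. As with the UNION ALL version, a selection that matches no rows produces no output row because of the inner join. The sketch relies on the assumption stated in the question that a selection always starts with a group, and whether the OR conditions perform well on a large table would have to be checked against the real engine's query plan.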

\$\endgroup\$
3
  • \$\begingroup\$ Thank you! In fact I did use a similar query. The problem is that it only works for this specific example; I couldn't make it more general. Imagine there can be dozens of these selection arrays. Here the conditional grouping only works for one specific input. \$\endgroup\$
    – pileup
    Commented Jan 19, 2023 at 4:34
  • 2
    \$\begingroup\$ @B.DLiroy - "Imagine there can be dozens of these selection arrays" would be completely different code to what is in the question. If you present example code for review instead of your real program, you waste everybody's time, including your own. \$\endgroup\$ Commented Jan 19, 2023 at 8:03
  • \$\begingroup\$ But why different? If there are more arrays, I need to add another SELECT at the bottom, please take a look at the updated post, I added the example and demo at the bottom. It remains the same structure just the number of SELECTs is dynamic. That's why your query works only for the specific example where there are 2 selections. My mistake is that I should've given the example with 3 arrays at first maybe? sorry. I also tried to edit your query to be more general but failed: dbfiddle.uk/GUdRjMp0. \$\endgroup\$
    – pileup
    Commented Jan 19, 2023 at 8:17
