14

I'm trying to write a query that finds the distinct values in one field and counts them per value of another column, where a value is counted under a given status only if all of its rows satisfy that status. The results should be displayed as follows (more explanation below):

Example db:

RowId    Status         MemberIdentifier
-----    ------         ----------------
1        In Progress    111111111
2        Complete       123456789
3        Not Started    146782452
4        Complete       111111111
5        Complete       123456789
6        Not Started    146782452
7        Complete       111111111

Desired Result:

Status         MemberIdentifierCount 
------         ---------------------- 
Not Started    1
In Progress    1
Complete       1

In the result above, the number of distinct MemberIdentifiers with a given Status is counted and displayed. If a MemberIdentifier has two rows with Status 'Complete' but one with Status 'In Progress', it is grouped and counted as in progress (e.g., MemberIdentifier = 111111111). For a MemberIdentifier to be grouped and counted as complete, all of its rows must have a Status of 'Complete' (e.g., MemberIdentifier = 123456789). Any insight would be appreciated (MySQL newbie).
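For anyone who wants to reproduce the example, here is a minimal setup sketch. The table name tbl and the column types are assumptions made for illustration; they are not taken from the real schema, which the comments below suggest is split across two tables.

```sql
-- Hypothetical single-table version of the example data.
-- Table name and column types are assumptions, not the real schema.
CREATE TABLE tbl (
  RowId            INT PRIMARY KEY,
  Status           VARCHAR(20) NOT NULL,
  MemberIdentifier VARCHAR(20) NOT NULL
);

INSERT INTO tbl (RowId, Status, MemberIdentifier)
VALUES
  (1, 'In Progress', '111111111'),
  (2, 'Complete',    '123456789'),
  (3, 'Not Started', '146782452'),
  (4, 'Complete',    '111111111'),
  (5, 'Complete',    '123456789'),
  (6, 'Not Started', '146782452'),
  (7, 'Complete',    '111111111');
```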

  • If a member has one record with "Not Started" and one record with "In Progress," then what is the final actual status? Commented Nov 6, 2017 at 14:44
  • The case where a member has records in "Not Started" and "In Progress" will never occur, but for the sake of logic the final status would be "In Progress"
    – Funsaized
    Commented Nov 6, 2017 at 15:14
  • In order to answer this question properly that matches with your database, we'll need some more information on the tables and structure. If you can update your question with some information of where these fields derive from I can write you a good query. I could do it with the content above, but after looking at some of your comments on the answers provided, it sounds like your table structure is a bit more complex than that. Commented Nov 9, 2017 at 23:33
  • @Bronco423 By the way, you wrote that the table above is obtained by joining on RowId, but it contains a duplicate: "Not Started 146782452" appears under two different RowIds (3 and 6). I would suggest also checking the earlier query where you join the tables. Also, as already said, please provide the original tables. You may have an easier time using sqlfiddle.com. Commented Nov 15, 2017 at 8:54

8 Answers

11

Per MemberIdentifier find the status you consider appropriate, e.g. 'In Progress' wins over 'Complete' and 'Not Started'. 'Not Started' wins over 'Complete'. Use conditional aggregation for this.

select status, count(*)
from
(
  select 
    case when sum(status = 'In Progress') > 0 then 'In Progress'
         when sum(status = 'Not Started') > 0 then 'Not Started'
         else 'Complete'
    end as status
  from mytable
  group by memberidentifier
) statuses
group by status;
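The comparisons inside sum() rely on MySQL evaluating a boolean expression as 1 or 0. A sketch of the same conditional aggregation with an explicit CASE, which may be easier to read or port, is shown below. It assumes the same mytable and column names as above; the aliases overall_status and MemberIdentifierCount are made up for illustration.

```sql
-- Same idea as above, with the 1/0 made explicit via CASE.
-- mytable is the assumed table name; aliases are illustrative only.
SELECT overall_status AS status,
       COUNT(*)       AS MemberIdentifierCount
FROM (
  SELECT memberidentifier,
         CASE
           WHEN SUM(CASE WHEN status = 'In Progress' THEN 1 ELSE 0 END) > 0
             THEN 'In Progress'   -- any 'In Progress' row wins
           WHEN SUM(CASE WHEN status = 'Not Started' THEN 1 ELSE 0 END) > 0
             THEN 'Not Started'   -- otherwise any 'Not Started' row wins
           ELSE 'Complete'        -- only reached when no row has either status
         END AS overall_status
  FROM mytable
  GROUP BY memberidentifier
) statuses
GROUP BY overall_status;
```

Giving the derived column its own name avoids any confusion between the raw status column and the per-member result.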
  • I'm having a bit of trouble getting this query to execute in the context of my tables (I believe the issue is where I'm trying to join). I didn't mention this in the question; however, the memberidentifier field comes from table1 and the status field comes from table2. The two tables are joined on table1.RowId=table2.RowId ... For my understanding, what is the "statuses" field in your query doing exactly?
    – Funsaized
    Commented Nov 6, 2017 at 15:16
  • statuses is not a field, but the name of the subquery I select from. In the subquery I get the status per memberidentifier and in the main query I count the rows per status. Commented Nov 6, 2017 at 15:20
6
+25
SELECT max_status AS Status
     , COUNT(*) AS ct
    FROM (
        SELECT MAX(Status) AS max_status
            FROM tbl
            GROUP BY MemberIdentifier
         ) AS a
    GROUP BY max_status;

This takes advantage of how these strings compare: "In Progress" > "Complete". As a side effect, a member with any other mix of statuses is assigned whichever status sorts highest alphabetically, which may not be the one you intend.
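A quick way to check the ordering that MAX() depends on is to compare the literals directly. This is only a sketch; the column aliases are made up for illustration.

```sql
-- MySQL returns 1 for true and 0 for false.
SELECT 'In Progress' > 'Complete'    AS in_progress_beats_complete,
       'Not Started' > 'In Progress' AS not_started_beats_in_progress;
```

Both columns should come back as 1, since the strings are compared from their first letters (C, then I, then N). That means MAX() would also pick 'Not Started' over 'In Progress' for a member that had both.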

  • +10. shortcut with alphabetic order of status values, using MAX() to give precedence to 'In Progress' over 'Complete'. Commented Nov 13, 2017 at 23:51
  • @spencer7593 - Kludge with a capital K. (I should have added a flashing neon sign on the code, explaining kludge.)
    – Rick James
    Commented Nov 13, 2017 at 23:54
  • the unknowns are other possible values of status... we were given some information about some possible values. "In theory, there's no difference between theory and practice. In practice, there is." Commented Nov 13, 2017 at 23:59
  • SELECT max_status AS Status , (a comma was missing, and I can't make the edit myself) Commented Nov 15, 2017 at 11:33
  • I don't know why you use MAX(Status) in the subquery; there is no difference if we remove MAX: rextester.com/VYIEFR77975 Commented Nov 15, 2017 at 11:36
5

SQL

SELECT AdjustedStatus AS Status,
       COUNT(*) AS MemberIdentifierCount
FROM
(SELECT IF(Status='Complete',
           IF(EXISTS(SELECT Status
                     FROM tbl t2
                     WHERE t2.Status = 'In Progress'
                       AND t2.MemberIdentifier = t1.MemberIdentifier),
              'In Progress',
              'Complete'),
           Status) AS AdjustedStatus,
        MemberIdentifier
 FROM tbl t1
 GROUP BY AdjustedStatus, MemberIdentifier) subq
GROUP BY AdjustedStatus;

Online demo

http://rextester.com/FFGM6300

Explanation

The first IF() function checks whether the status is "Complete" and, if so, checks for the existence of another record with the same MemberIdentifier but with a status of "In Progress". This is done via IF(EXISTS(SELECT ...)). If one is found, a status of "In Progress" is assigned to the AdjustedStatus field; otherwise AdjustedStatus is set from the (unadjusted) Status value.

With the adjusted status having been derived like this for each of the rows in the table, GROUP BY the AdjustedStatus and MemberIdentifier in order to get all unique combinations of these two field values. This is then made into a subquery - aliased as subq. Then aggregate over (GROUP BY) the AdjustedStatus and count the number of occurrences, i.e. the number of unique MemberIdentifiers for each.
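To see the adjustment step on its own, the per-row logic can be run without any grouping. This is a sketch, assuming tbl also carries the RowId column from the question; the two nested IF() calls are collapsed into a single condition, which should be equivalent.

```sql
-- Show the original and adjusted status side by side for every row.
-- Assumes tbl has RowId, Status and MemberIdentifier columns.
SELECT t1.RowId,
       t1.MemberIdentifier,
       t1.Status,
       IF(t1.Status = 'Complete'
          AND EXISTS (SELECT 1
                      FROM tbl t2
                      WHERE t2.Status = 'In Progress'
                        AND t2.MemberIdentifier = t1.MemberIdentifier),
          'In Progress',
          t1.Status) AS AdjustedStatus
FROM tbl t1
ORDER BY t1.RowId;
```

For the sample data in the question, the two 'Complete' rows of member 111111111 should come back with an AdjustedStatus of 'In Progress', and every other row should keep its original status.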

5

I assume you have 2 tables as below

CREATE TABLE table1 (RowId INT PRIMARY KEY, MemberIdentifier VARCHAR(255));
INSERT INTO table1 (RowId, MemberIdentifier)
VALUES
(1,'111111111'), (2, '123456789'), (3, '146782452'), (4, '111111111'),(5,'123456789'), (6,'146782452'), (7,'111111111');


CREATE TABLE table2 (RowId INT PRIMARY KEY, Status VARCHAR(255));
INSERT INTO table2 (RowId, Status)
VALUES
(1,'In Progress'), (2,'Complete'   ), (3,'Not Started'), (4,'Complete'   ), (5,'Complete'   ), (6,'Not Started'), (7,'Complete'   );

Assuming you don't have millions of records in these tables, you can use the query below to achieve what you want.

SELECT CASE WHEN not_started.Status = 'Not Started' 
            THEN 'Not Started' 
            WHEN in_progress.Status = 'In Progress' 
            THEN 'In Progress' 
            WHEN complete.Status = 'Complete' 
            THEN 'Complete' 
       END AS over_all_status,
       COUNT(*) AS MemberIdentifierCount
  FROM  (SELECT DISTINCT t1.MemberIdentifier
          FROM table1 t1) main
        LEFT OUTER JOIN   
            (SELECT DISTINCT t1.MemberIdentifier, t2.Status
              FROM table1 t1,
                   table2 t2 
             WHERE t1.RowId = t2.RowId
               AND t2.Status = 'In Progress') in_progress
            ON (main.MemberIdentifier = in_progress.MemberIdentifier)
        LEFT OUTER JOIN
            (SELECT DISTINCT t1.MemberIdentifier, t2.Status
              FROM table1 t1,
                   table2 t2 
             WHERE t1.RowId = t2.RowId
               AND t2.Status = 'Not Started') not_started
        ON (main.MemberIdentifier = not_started.MemberIdentifier)
        LEFT OUTER JOIN
            (SELECT DISTINCT t1.MemberIdentifier, t2.Status
              FROM table1 t1,
                   table2 t2 
             WHERE t1.RowId = t2.RowId
               AND t2.Status = 'Complete') complete
        ON (main.MemberIdentifier = complete.MemberIdentifier)
GROUP BY over_all_status;

Basically, the query creates one record per MemberIdentifier containing all three possible statuses. It then groups the result by the overall status and outputs the count.

Output from the query: (screenshot not preserved)

  • this query seems to be a bit overkill... like taking down a squirrel with deer rifle. Commented Nov 13, 2017 at 23:56
3

Use the following code to get the status of each MemberIdentifier:

select MemberIdentifier
,case 
when total = cn then 'Complete' 
when total < cn then 'In Progress' 
when total is null then 'Not Started' END as Fstatus
 from 
(
select sum(stat) total,MemberIdentifier,(select count(MemberIdentifier) as cnt from tbldata t1
     where t1.MemberIdentifier = C.MemberIdentifier
     group by MemberIdentifier) as cn
from (
select MemberIdentifier,case status when 'In Progress' then -1 
                                    when 'Complete' Then 1 
                                    when 'Not Started' then null End as Stat from tbldata 
 ) C
 group by MemberIdentifier

 ) as f1

Use the following code to get the count of MemberIdentifiers in each status:

Select count(fstatus) counts,fstatus from (
select MemberIdentifier
,case when total = cn then 'Complete' 
      when total < cn then 'In Progress' 
      when total is null then 'Not Started' END as Fstatus
 from 
(
select sum(stat) total,MemberIdentifier,(select count(MemberIdentifier) as cnt from tbldata t1
     where t1.MemberIdentifier = C.MemberIdentifier
     group by MemberIdentifier) as cn
from (
select MemberIdentifier
,case status when 'In Progress' then -1 when 'Complete' Then 1 when 'Not Started' then null End as Stat from tbldata 
 ) C
 group by MemberIdentifier

 ) as f1

 ) f2 group by fstatus
Output:
counts  fstatus
1       Complete
1       In Progress
1       Not Started
2

If the order of precedence for status is

 Not Started
 In Progress
 Complete

We can use a shortcut...

   SELECT t.memberIdentifier
        , MAX(t.status) AS status
     FROM mytable t
    GROUP BY t.MemberIdentifier

That gets us one row per distinct memberIdentifier, along with its highest-sorting status.

If a member has rows in both 'In Progress' and 'Complete' status, the query will return 'In Progress' as the status.

We will get status 'Complete' returned for a member only if that member does not have any rows with a status greater than 'Complete'.

To get counts from that result, we can reference that query as an inline view:

 SELECT q.status
      , COUNT(q.memberIdentifier) 
   FROM ( 
          SELECT t.memberIdentifier
               , MAX(t.status) AS status
            FROM mytable t
           GROUP BY t.MemberIdentifier
        ) q
  GROUP BY q.status
  ORDER BY q.status

Think of it this way: MySQL runs the query between the parens first (MySQL calls this a "derived table"). The result of that query is a set of rows which can be queried like a table.

We could do a COUNT(DISTINCT q.memberIdentifier) or, assuming memberIdentifier is guaranteed to be non-NULL, we could do COUNT(1) or SUM(1) and get an equivalent result. (The GROUP BY in the inline view guarantees us that memberIdentifier will be unique.)


In the more general case, where we don't have a convenient shortcut of alphabetic ordering for the precedence of the status... we could use an expression that returns values that are "in order". That makes the query a bit more complicated, but it would work the same.

We could replace t.status with something like this:

  CASE t.status
  WHEN 'Complete'    THEN 1
  WHEN 'In Progress' THEN 2
  WHEN 'Not Started' THEN 3
  ELSE 4
  END AS `status_priority`

And replace q.status with the inverse, to convert back to strings:

  CASE q.status_priority
  WHEN 1 THEN 'Complete'
  WHEN 2 THEN 'In Progress'
  WHEN 3 THEN 'Not Started'
  ELSE NULL
  END AS `status`

We'd need to decide how to handle values of status that aren't one of the three: are they ignored, or handled as a higher or lower priority than any of the others? (A test case would be rows with status = 'Unknown' and rows with status = 'Abracadabra'.)
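Putting the two CASE expressions together, the full query might look something like the sketch below. It assumes the same mytable as above and keeps the ELSE 4 and ELSE NULL branches exactly as written; the alias member_count is made up for illustration.

```sql
-- Map each status to a number, take the highest per member,
-- then map the number back to a string and count members per status.
SELECT CASE q.status_priority
         WHEN 1 THEN 'Complete'
         WHEN 2 THEN 'In Progress'
         WHEN 3 THEN 'Not Started'
         ELSE NULL                      -- unexpected status values end up here
       END AS status
     , COUNT(*) AS member_count
  FROM (
         SELECT t.memberIdentifier
              , MAX(CASE t.status
                      WHEN 'Complete'    THEN 1
                      WHEN 'In Progress' THEN 2
                      WHEN 'Not Started' THEN 3
                      ELSE 4
                    END) AS status_priority
           FROM mytable t
          GROUP BY t.memberIdentifier
       ) q
 GROUP BY q.status_priority
 ORDER BY q.status_priority;
```

With this numbering, any member that has an unrecognized status is counted in the NULL group, because 4 is the highest priority. Changing the precedence (for example, letting 'In Progress' outrank 'Not Started') only requires swapping the numbers in both CASE expressions.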

2

I just modified @thorsten-kettner's solution, as you were having trouble joining the tables. I have assumed you have 2 tables: table1, which has at least 2 columns (RowID and MemberIdentifier), and table2, which has at least 2 columns (RowID and Status).

select Status, count(*)
from(
  select 
    case when sum(newTable.Status = 'In Progress') > 0 then 'In Progress'
         when sum(newTable.Status = 'Not Started') > 0 then 'Not Started'
         else 'Complete'
    end as status
  from (
    select table1.RowId as RowId, table1.MemberIdentifier as MemberIdentifier, table2.Status as Status from table1 INNER JOIN table2 ON table1.RowId = table2.RowId
  )newTable
  group by newTable.MemberIdentifier
) statuses
group by Status;
1

Another way: use a dedicated table to configure the order, mapping each status to a power-of-two integer.

This mapping also lets the bit_or aggregate pivot the data into one row per member.

http://rextester.com/edit/ZSG98543

-- Table bit_progression to determine priority

CREATE TABLE bit_progression (bit_status int PRIMARY KEY, Status VARCHAR(255));
INSERT INTO bit_progression (bit_status, Status)
VALUES
(1,       'Not Started'),  
(2,       'Complete'   ),      
(4,       'In Progress');

select
    Status,
    count(*)
from
    (
    select
         MemberIdentifier,max(bit_status) bit_status
    from
        tbl natural join bit_progression
    group by
        MemberIdentifier
    ) Maxi natural join bit_progression
group by
    Status
;

This produces:

Status         count(*)
------         --------
Complete       1
In Progress    1
Not Started    1

Extra :

select
    MemberIdentifier,
    bit_or(bit_status) bits_status,
    case when bit_or(bit_status) & 4 = 4 then true end as withStatusInProgress,
    case when bit_or(bit_status) & 2 = 2 then true end as withStatusComplete,
    case when bit_or(bit_status) & 1 = 1 then true end as withStatusNotStarted
from
    tbl natural join bit_progression
group by
    MemberIdentifier
;

This produces:

MemberIdentifier    bits_status    withStatusInProgress    withStatusComplete    withStatusNotStarted
----------------    -----------    --------------------    ------------------    --------------------
111111111           6              1                       1                     NULL
123456789           2              NULL                    1                     NULL
146782452           1              NULL                    NULL                  1
