1
\$\begingroup\$

I'm playing with data from a Time Use survey as a beginner SQL user.

Gender is represented as 1's (Male) and 2's (Female), and the age of each participant is listed. I would like to divide them into age brackets with a total count, and see how many are male/female within each bracket.

What I have is this:

    SELECT count(Age) as 'Total',
CASE 
    WHEN Age<=19 THEN 'Teen'
    WHEN Age >=20 and Age<=29 THEN '20s' 
    WHEN Age>=30 and Age<=39 THEN '30s'
    WHEN Age>=40 and Age<=49 THEN '40s'
    WHEN Age>=50 and Age<=59 THEN '50s'
    WHEN Age>=60 and Age<=69 THEN '60s'
    WHEN Age>=70 and Age<=79 THEN '70s'
    WHEN Age>=80 and Age<=89 THEN '80s'
END as 'Age',
CASE
    WHEN Sex=1 THEN 'Male'
    WHEN Sex=2 THEN 'Female'
END as 'Gender'
from TimeUse_Summary
Group by CASE 
    WHEN Age<=19 THEN 'Teen'
    WHEN Age >=20 and Age<=29 THEN '20s' 
    WHEN Age>=30 and Age<=39 THEN '30s'
    WHEN Age>=40 and Age<=49 THEN '40s'
    WHEN Age>=50 and Age<=59 THEN '50s'
    WHEN Age>=60 and Age<=69 THEN '60s'
    WHEN Age>=70 and Age<=79 THEN '70s'
    WHEN Age>=80 and Age<=89 THEN '80s'
END, Sex
order by Age

The result is exactly what I'm looking for:

|Total  |Age   |Gender|
|:------|:-----|------|
|173    |Teen  |Male  |
|175    |Teen  |Female|
|438    |20s   |Male  |
|468    |20s   |Female|
|etc.                 |

Should I be condensing the CASE statements somehow?

\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

I agree with you that we could DRY up those CASE statements a bit.

The core issue is we would like to report against a relation that does not exist yet, so we must synthesize it. Several approaches spring to mind:

  1. CASE expressions, as you described
  2. add an age_bracket column, which would be denormalized
  3. create a temporary reporting table, which again is denormalization
  4. create the same relation as a VIEW report (a named query)
  5. create a CTE relation

There is almost no difference between those last two options. If you look at SELECT statements out in the wild, common table expressions tend to crop up more often than queries based on CREATE VIEW. Me, personally? I tend to prefer a VIEW, mostly because they compose with JOINs very nicely. And you can conveniently SELECT * from it during debugging, instantly replace it with an improved edit, and then go on to mention its name in some giant JOIN that was already mostly working.

There are CTE examples all over SO, so I will expand on item 4.

DROP VIEW  IF EXISTS
            time_use_bracket;
CREATE VIEW time_use_bracket AS
SELECT
  CASE 
    WHEN Age<=19 THEN 'Teen'
    WHEN Age>=20 and Age<=29 THEN '20s' 
    WHEN Age>=30 and Age<=39 THEN '30s'
    WHEN Age>=40 and Age<=49 THEN '40s'
    WHEN Age>=50 and Age<=59 THEN '50s'
    WHEN Age>=60 and Age<=69 THEN '60s'
    WHEN Age>=70 and Age<=79 THEN '70s'
    WHEN Age>=80 and Age<=89 THEN '80s'
  END AS age_bracket,
  Age AS age,
  CASE
    WHEN Sex=1 THEN 'Male'
    WHEN Sex=2 THEN 'Female'
  END AS gender
FROM TimeUse_Summary
;

SELECT
  COUNT(*) AS total,
  age_bracket,
  gender
FROM time_use_bracket
GROUP BY age_bracket, gender
ORDER BY MIN(age)
;

You read from the INT Age column, and synthesize a new VARCHAR column which, confusingly, is named Age. Recommend you choose a distinct name for it.


As an orthogonal issue, the CASE statement you supply is a bit tediously long and repetitive. Consider exploiting the % modulo operator to do much of the heavy lifting.

\$\endgroup\$
1
  • \$\begingroup\$ Thank you. I honestly forgot the modulo was a thing. But I appreciate your answer, as it's given me a goo amount to [re]review. Admittedly I haven't really spent much time on CTEs \$\endgroup\$
    – Erik
    Commented Jun 22, 2023 at 2:07

Not the answer you're looking for? Browse other questions tagged or ask your own question.