
Table Worker

worker_id  Salary
1          100000
2          80000
3          300000
4          500000
5          500000
6          200000
7          75000
8          90000

I want to add another column, salary_bin, such that a salary below 80k is bin 1, a salary between 80k and 100k is bin 2, a salary between 100k and 300k is bin 3, and a salary between 300k and 500k is bin 4.

Attempt:

alter table dbo.worker 
add salary_bin decimal(10,2) 
select worker_id, salary, 
case when salary>=0 and salary<80000 then salary_bin ='1',
case when salary>=80000 and salary<=100000 then salary_bin ='2',
case when salary>=100000 and salary<=300000 then salary_bin ='3',
case when salary>=300000 and salary<=500000 then salary_bin ='4
from Worker

Error:

Incorrect syntax near '='.

Can someone please help?

1 Answer

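Fix your case expression to use the correct syntax: a single case with one when per bin, closed by end, where each then returns a value instead of assigning one (then salary_bin ='1' is what the parser rejects near '='). Also remove the select and the table references, because a computed column automatically references the table you are adding it to, and in fact cannot reference another table (unless you use a scalar function).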

Also, to make bins you should ensure there is no overlap between values, i.e. you can't have one bin <= 100,000 and the next bin >= 100,000, because a salary of exactly 100,000 will then fall in the first bin, whereas you want it in the second. You need one side of each range to be >= and the other side to be just <. You would also normally want an else to handle any higher values, say bin 5.

alter table dbo.worker
add salary_bin as
    case when salary >= 0      and salary < 80000  then 1
         when salary >= 80000  and salary < 100000 then 2
         when salary >= 100000 and salary < 300000 then 3
         when salary >= 300000 and salary < 500000 then 4
         else 5
    end

Note: You don't quote numerical values.
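
As a quick sanity check, the same case expression can be run against a handful of boundary values before touching the table. This is only a sketch: the inline values list and the alias v are made up for illustration and are not part of the original table.

```sql
-- Hypothetical boundary values fed through the same CASE expression.
-- The derived table v(salary) exists only for this check.
select v.salary,
       case when v.salary >= 0      and v.salary < 80000  then 1
            when v.salary >= 80000  and v.salary < 100000 then 2
            when v.salary >= 100000 and v.salary < 300000 then 3
            when v.salary >= 300000 and v.salary < 500000 then 4
            else 5
       end as salary_bin
from (values (79999), (80000), (100000), (300000), (500000)) as v(salary);
-- 79999 -> 1, 80000 -> 2, 100000 -> 3, 300000 -> 4, 500000 -> 5
```

Note that with these bounds a salary of exactly 500,000 (workers 4 and 5 in the sample data) lands in the else bin, not bin 4. If 500,000 should count as bin 4, the last upper bound would need to be <= 500000.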

Also because a case expression stops at the first match, you can simplify by reversing the order of your conditions to be biggest to smallest and then remove the lower bound check as follows:

alter table dbo.worker
add salary_bin as
    case when salary >= 300000 then 4
         when salary >= 100000 then 3
         when salary >= 80000  then 2
         else 1
    end

Note: this doesn't handle a bin higher than 500,000 since that wasn't specified, but can easily be extended to do so.
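
One possible extension, sketched under the assumption that bin 5 should mean strictly above 500,000 (so that 500,000 itself stays in bin 4, as in the question's description). The cut-off is a guess, since the question never defines a fifth bin.

```sql
-- Only needed if salary_bin was already added by an earlier statement;
-- run it separately before re-adding the column.
alter table dbo.worker drop column salary_bin;

-- Assumption: bin 5 starts strictly above 500,000.
alter table dbo.worker
add salary_bin as
    case when salary > 500000  then 5
         when salary >= 300000 then 4
         when salary >= 100000 then 3
         when salary >= 80000  then 2
         else 1
    end;
```

Changing the first condition to salary >= 500000 would instead match the earlier version, where 500,000 falls into bin 5. Because the else branch catches everything that fails the comparisons above it, a negative or NULL salary also ends up in bin 1 here.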

Further: When specifying bins/buckets, the word "between" isn't precise enough because it doesn't specify whether the start and/or end values are included in or excluded from the range. One should always use "greater than", "greater than or equal to", "less than", or "less than or equal to" to be completely clear which values a bin contains.
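
The SQL BETWEEN operator itself is not ambiguous (it includes both endpoints), but that is exactly why it is a poor fit for adjacent bins: two neighbouring ranges written with it share their boundary value. A small illustration, using a made-up single-row values list:

```sql
-- BETWEEN is inclusive at both ends, so 100000 satisfies both ranges.
-- The derived table v(salary) is hypothetical and only used for this demo.
select v.salary,
       case when v.salary between 80000  and 100000 then 1 else 0 end as in_80k_to_100k,
       case when v.salary between 100000 and 300000 then 1 else 0 end as in_100k_to_300k
from (values (100000)) as v(salary);
-- both flag columns return 1 for salary = 100000
```

Writing each range as >= lower and < upper removes the shared endpoint.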

  • The numerical values in the WHENs probably don't need to be in single quotes (').
    – Thom A
    Commented Jul 24, 2021 at 10:47
  • Also, because only the first true clause is returned, you can shorten the later WHENs to < 100000, < 300000, and < 500000 respectively.
    – Thom A
    Commented Jul 24, 2021 at 10:53
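
The shortening suggested in the last comment could look like the sketch below. It is written as a plain select so it can be tried without altering the table; dropping the >= 0 check from the first branch is an extra assumption that goes slightly beyond the comment.

```sql
-- Ascending order with upper bounds only: each WHEN is reached
-- only if every earlier WHEN was not true.
select worker_id,
       salary,
       case when salary < 80000  then 1
            when salary < 100000 then 2
            when salary < 300000 then 3
            when salary < 500000 then 4
            else 5
       end as salary_bin
from dbo.worker;
```

With only upper bounds, a negative salary also lands in bin 1, and a salary of exactly 500,000 still falls through to the else branch.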
