
I have an SQL table with two integer columns. Call them a and b.

I want to SELECT out a random record, such that the record is selected with probability proportional to C + a/b for some constant C which I will choose.
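Written out, with n records and weights w_i = C + a_i/b_i (notation introduced here only for clarity), the requirement is:

```latex
P(\text{record } i) = \frac{w_i}{\sum_{j=1}^{n} w_j}
  = \frac{C + a_i / b_i}{nC + \sum_{j=1}^{n} a_j / b_j}
```

This is a valid probability distribution only when every w_i is non-negative and at least one is positive, so C has to be chosen with that in mind.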

So, for example, if C = 0 and there are two records, one with a=1, b=2 and one with a=2, b=3, then C+a/b = 1/2 for the first record and C+a/b = 2/3 for the second. These weights sum to 7/6, so the SELECT query should choose the first record with probability (1/2)/(7/6) = 3/7 (about 0.43) and the second record with probability (2/3)/(7/6) = 4/7 (about 0.57).
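Normalizing the weights gives these probabilities exactly. A minimal Python sketch using exact fractions, where selection_probabilities is a hypothetical helper written for illustration:

```python
from fractions import Fraction

def selection_probabilities(rows, c):
    """Return each row's selection probability: (c + a/b) divided by the sum of all weights."""
    weights = [c + Fraction(a, b) for a, b in rows]  # exact rational weights
    total = sum(weights)
    return [w / total for w in weights]

print(selection_probabilities([(1, 2), (2, 3)], c=0))  # [Fraction(3, 7), Fraction(4, 7)]
```

Passing C as an int or a Fraction keeps the result exact; a float C gives float probabilities instead.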

I thought I knew SQL well, but I am not even sure where to begin here. My idea was to first SELECT the total, C*number_of_records + SUM(a/b), then draw a random number between 0 and that total, and finally select the first record at which the running sum of C+a/b exceeds that random number. But I don't know how to write that in SQL.
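The running-sum idea can be written with a window function. The sketch below runs it against an in-memory SQLite database from Python; the table name t, its id column, and the helper pick_by_running_sum are illustrative assumptions, and window functions need SQLite 3.25 or later. It uses 1.0 * a / b because a / b on two integer columns is integer division in SQLite (and in several other databases). The random number r can be passed in so the choice is easy to test.

```python
import random
import sqlite3

def pick_by_running_sum(con, c, r=None):
    """Return the id of the first row (by id) whose running sum of c + a/b exceeds r."""
    total = con.execute("SELECT SUM(? + 1.0 * a / b) FROM t", (c,)).fetchone()[0]
    if r is None:
        r = random.random() * total  # uniform in [0, total)
    row = con.execute(
        """
        SELECT id FROM (
            SELECT id, SUM(? + 1.0 * a / b) OVER (ORDER BY id) AS running
            FROM t
        )
        WHERE running > ?
        ORDER BY running
        LIMIT 1
        """,
        (c, r),
    ).fetchone()
    if row is None:  # r landed at the very top of the range (float rounding): take the last row
        row = con.execute("SELECT MAX(id) FROM t").fetchone()
    return row[0]

# Usage with the two records from the question (C = 0).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
con.executemany("INSERT INTO t (a, b) VALUES (?, ?)", [(1, 2), (2, 3)])
print(pick_by_running_sum(con, 0, r=0.4))  # 1  (running sums are 0.5 and about 1.167)
print(pick_by_running_sum(con, 0, r=0.6))  # 2
```

Each pick scans the table for the total and again for the running sum, which matters at a few hundred thousand rows.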

  • I'd say this is a duplicate: stackoverflow.com/q/19412/341547 Unless you have something that's specifically different from that general discussion.
    – Mike Ryan
    Commented Mar 7, 2012 at 21:21
  • @MikeRyan I would say this is a different question, as the probability aspect brings a new factor into the question
    – ewok
    Commented Mar 7, 2012 at 21:23
  • @ewok -- ok, on a more thorough read, I think you're right -- though ultimately it will be extending those techniques.
    – Mike Ryan
    Commented Mar 7, 2012 at 21:25
  • How many rows might there be in the table? One possible tactic could get very slow if there are lots of rows (for arbitrary values of lots), while another option is awkward as heck to write, but would (?) only scan the table once. Commented Mar 7, 2012 at 21:52
  • ewok, no, this is not for homework. @Philip I think there would be a few hundred thousand rows in the database. Commented Mar 8, 2012 at 0:08

1 Answer


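You could sort the rows by a random key that depends on each row's weight and select the top 1 row from that query. The key needs some care: multiplying a random number by the weight does not give probabilities proportional to the weight, and in SQL Server RAND() without a seed is evaluated only once per query, so every row would get the same random number. Ordering by -LOG(u) / weight instead, where u is a fresh uniform random number for each row, makes the row with the smallest key win with probability weight / SUM(weight). In SQL Server, something like:

SELECT TOP 1 (your column names)
FROM (your table)
ORDER BY -LOG(1.0 - RAND(CHECKSUM(NEWID()))) / (your calculation)

Here (your calculation) is C + a/b computed with floating-point division (for example @C + 1.0 * a / b, with @C a variable holding your constant), and it must be positive for every row.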
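A key of -ln(u) / weight, with u drawn uniformly and independently for each row, makes the smallest key win with probability weight / SUM(weight) (this is sometimes called an exponential race). The sketch below tries that in SQLite from Python. SQLite has no built-in per-row uniform random float, so it registers a hypothetical Python function rand_key through sqlite3's create_function; the table t and the helper pick_weighted are also assumptions for illustration. Repeating the pick many times on the question's two records should give frequencies close to 3/7 and 4/7.

```python
import math
import random
import sqlite3

def rand_key(weight):
    """Exponential-race key -ln(u) / weight, with u uniform in (0, 1]."""
    u = 1.0 - random.random()  # random.random() is in [0, 1), so u is in (0, 1]
    return -math.log(u) / weight

con = sqlite3.connect(":memory:")
# Registered as non-deterministic (the default), so SQLite calls it once per row.
con.create_function("rand_key", 1, rand_key)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
con.executemany("INSERT INTO t (a, b) VALUES (?, ?)", [(1, 2), (2, 3)])

def pick_weighted(con, c):
    """Pick one id with probability proportional to c + a/b (requires c + a/b > 0)."""
    return con.execute(
        "SELECT id FROM t ORDER BY rand_key(? + 1.0 * a / b) LIMIT 1", (c,)
    ).fetchone()[0]

n = 20000
hits = sum(pick_weighted(con, 0) == 1 for _ in range(n))
print(hits / n)  # should be close to 3/7 (about 0.43)
```

Unlike the running-sum approach, this needs a single pass over the table per pick and no separate total.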
