I was wondering how to get random rows from a SQL query, since the full query returns over 10 billion rows and would overwhelm our servers.

How can I query a random sample of the rows within this query structure?

SELECT a, b, c
FROM test
WHERE test.a = 123
  AND test.b ILIKE '10008383825311900000'
LIMIT 1000000

2 Answers

The canonical answer is to sort randomly and use LIMIT:

select t.*
from t
order by rand()
limit 100;

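But do not do this! Sorting billions of rows only to keep 100 of them is extremely expensive. Instead, use rand() in a where clause, which keeps each row independently with the given probability. For a 1% sample: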

select t.*
from t
where rand() < 0.01;

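Random sampling methods in MySQL tend to require scanning the entire table, which is going to be expensive in your case. The rand() filter avoids the sort, but it still reads every row, and it returns approximately 1% of the rows rather than an exact count.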
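
The where rand() < 0.01 approach can be tried on a small scale. This is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption; the table t and its 100,000 rows are invented). SQLite has no rand() function, so the sketch registers a seeded Python random.random under that name so the query text stays the same.

```python
import random
import sqlite3

# In-memory SQLite stands in for the real database; table t and its rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany(
    "INSERT INTO t (v) VALUES (?)",
    ((f"row{i}",) for i in range(100_000)),
)

# SQLite has no rand(); register a seeded Python RNG under that name. It is
# registered as non-deterministic (the default), so SQLite calls it once per
# row instead of once per query.
conn.create_function("rand", 0, random.Random(42).random)

# Bernoulli sample: each row is kept independently with probability 0.01.
# Every row is still read, but nothing is sorted.
sample = conn.execute("SELECT t.* FROM t WHERE rand() < 0.01").fetchall()

# The sample size is random: close to 1% of 100,000, not exactly 1,000.
print(len(sample))
```

Because each row is kept or dropped independently, the number of returned rows varies with the seed; fixing the seed only makes the sketch reproducible.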

EDIT:

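To optimize your query, I would start by using = rather than ILIKE. Your pattern has no wildcards and contains only digits, so ILIKE matches exactly the same rows as =, and a plain equality comparison is what an index can serve most directly: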

SELECT a, b, c
FROM test 
WHERE test.a = 123 AND
     test.b = '10008383825311900000' 
LIMIT 1000000;

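You want an index on test(a, b, c): the lookup on a and b can then use the index, and including c lets the query be answered from the index alone without reading the table rows.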
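
Applied to the query in the question, = plus the index combines naturally with the rand() filter. A minimal sketch, again using in-memory SQLite from Python as a stand-in and an invented 30,000-row table; the index name idx_test_a_b_c and the 10% sampling rate are hypothetical choices. With the index in place, the database can typically look up only the rows matching a and b, so the rand() test runs on those rows rather than on the whole table.

```python
import random
import sqlite3

# Hypothetical stand-in for the asker's table `test` in in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    (
        (123 if i % 2 else 456,
         "10008383825311900000" if i % 3 else "other",
         f"c{i}")
        for i in range(30_000)
    ),
)
# The suggested index; the name idx_test_a_b_c is made up.
conn.execute("CREATE INDEX idx_test_a_b_c ON test (a, b, c)")

# SQLite lacks rand(), so expose a seeded Python RNG under that name.
conn.create_function("rand", 0, random.Random(7).random)

# The question's query with = instead of ILIKE and a 10% sampling predicate.
query = """
    SELECT a, b, c
    FROM test
    WHERE test.a = 123
      AND test.b = '10008383825311900000'
      AND rand() < 0.1
    LIMIT 1000000
"""

# The plan should report a search using the covering index rather than a
# full table scan; rand() is then evaluated only for the rows found.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan[0][-1])

sample = conn.execute(query).fetchall()
print(len(sample))
```

The exact wording of the plan line differs between SQLite versions, and MySQL's EXPLAIN output looks different again, so it is worth checking the plan on the real database rather than relying on this sketch.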

  • I added my query and hope you can have a look at it and tell me how to restructure it. Thank you.
    – LaLaTi
    Commented Sep 13, 2019 at 3:54

Here's another answer: number the filtered rows, then join them to a set of randomly drawn row numbers.

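select t1.a, t1.b, t1.c
from (
    -- number the filtered rows 1, 2, 3, ...
    select
        a, b, c
        ,row_number() over (order by a) as rn
    from test
    where
        test.a = 123
        AND test.b ILIKE '10008383825311900000'
    ) t1
    inner join (
        -- draw 1,000,000 random row numbers in the range 1..100; replace 100
        -- with the number of rows the filter matches to sample from all of them.
        -- distinct stops the same row from being returned more than once.
        select distinct floor(rand()*100) + 1 as rn
        from (select 1 from test limit 1000000) draws
    ) t2 on t2.rn = t1.rn;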
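
A small-scale sketch of this row-numbering approach, using in-memory SQLite from Python as a stand-in (an assumption; SQLite 3.25 or later is needed for row_number()). It draws row numbers in 1..n, adding 1 because row_number() starts at 1, and uses distinct so a row is not returned twice. Two further substitutions: some SQLite builds lack floor(), so CAST(... AS INTEGER) is used, which truncates and therefore equals floor for the non-negative values here; and the range n comes from a COUNT(*) of the filtered rows instead of a hardcoded number. The 6,000-row table and the 500 draws are invented.

```python
import random
import sqlite3

# In-memory SQLite as a stand-in; the rows are invented. 5,000 of 6,000
# rows match the filter from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    ((123 if i < 5_000 else 456, "10008383825311900000", f"c{i}") for i in range(6_000)),
)

# SQLite has no rand(); expose a seeded Python RNG under that name.
conn.create_function("rand", 0, random.Random(1).random)

# Size of the filtered set: the upper bound for the random row numbers.
n = conn.execute(
    "SELECT COUNT(*) FROM test WHERE a = 123 AND b = '10008383825311900000'"
).fetchone()[0]

query = """
    SELECT t1.a, t1.b, t1.c
    FROM (
        -- number the filtered rows 1..n
        SELECT a, b, c, ROW_NUMBER() OVER (ORDER BY a) AS rn
        FROM test
        WHERE test.a = 123 AND test.b = '10008383825311900000'
    ) AS t1
    INNER JOIN (
        -- one random row number in 1..n per draw; DISTINCT drops repeats
        SELECT DISTINCT CAST(rand() * :n AS INTEGER) + 1 AS rn
        FROM (SELECT 1 FROM test LIMIT :draws) AS draws
    ) AS t2 ON t2.rn = t1.rn
"""
sample = conn.execute(query, {"n": n, "draws": 500}).fetchall()
print(n, len(sample))
```

Because repeated draws collapse under DISTINCT, the sample usually holds somewhat fewer rows than the number of draws.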
  • I just updated and added my query. Could you take a look please and tell me how to incorporate your code?
    – LaLaTi
    Commented Sep 13, 2019 at 3:55
