When I run a SELECT after a number of joins on my tables, I get an output of 2 columns, and I want to select a distinct combination of Col1 and Col2 from the rowset returned.

The query that I run will be something like this:

select a.Col1,b.Col2 from a inner join b on b.Col4=a.Col3

Now the output will be something like this:

Col1 Col2  
1   z  
2   z  
2   x  
2   y  
3   x  
3   x  
3   y  
4   a  
4   b  
5   b  
5   b  
6   c  
6   c  
6   d  

Now I want the output to be something like the following:

1  z  
2  y  
3  x  
4  a  
5  b  
6  d 

It's OK if I pick the second column randomly, since my query output is about a million rows, and I really don't think there will be a case where the Col1 and Col2 output will be the same. Even if that happens, I can edit the value.

Can you please help me with this? I think basically Col3 needs to be a row number, and then I need to select two columns based on a random row number. I don't know how to translate this to SQL.

Consider the case 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e. GROUP BY will give me all these results, whereas I want 1a and 2d, or 1a and 2b: any such combination.
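To pin down the behaviour being asked for (one row per Col1, with Col2 picked at random from that Col1's rows), here is a small Python sketch over the sample output shown earlier. The function name `one_random_per_key` is hypothetical and only illustrates the semantics; it is not SQL.

```python
import random
from collections import defaultdict

# Sample join output from the question, as (Col1, Col2) pairs.
ROWS = [(1, "z"), (2, "z"), (2, "x"), (2, "y"), (3, "x"), (3, "x"), (3, "y"),
        (4, "a"), (4, "b"), (5, "b"), (5, "b"), (6, "c"), (6, "c"), (6, "d")]


def one_random_per_key(rows, rng=random):
    """Hypothetical helper: return {Col1: Col2}, one random row per Col1."""
    groups = defaultdict(list)
    for col1, col2 in rows:
        groups[col1].append(col2)
    # Each row in a group is equally likely, so duplicated values weigh more.
    return {col1: rng.choice(values) for col1, values in sorted(groups.items())}


for col1, col2 in one_random_per_key(ROWS).items():
    print(col1, col2)
```

Because the choice is made per row, Col1 = 3 (rows x, x, y) returns x about two times in three.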

OK, let me explain what I'm expecting:

with rs as(
select a.Col1,b.Col2,rownumber() as rowNumber from a inner join b on b.Col4=a.Col3)
select rs.Col1,rs.Col2 from rs where rs.rowNumber=Round( Rand() *100)

Now I am not sure how to get the row number or the random part working correctly!

Thanks in advance.

4 Answers

If you simply don't care which Col2 value is returned:

select a.Col1,MAX(b.Col2) AS Col2
from a inner join b on b.Col4=a.Col3 
GROUP BY a.Col1
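To see what the MAX version returns on the question's sample data, this sketch uses Python's built-in `sqlite3` module as a stand-in for SQL Server. The contents of tables a (Col1, Col3) and b (Col2, Col4) are an assumption, chosen so that the join reproduces the sample output row for row. With SQLite's default binary collation, MAX on text gives the alphabetically last value per group; in SQL Server the ordering follows the column's collation.

```python
import sqlite3

# Sample join output from the question, as (Col1, Col2) pairs.
ROWS = [(1, "z"), (2, "z"), (2, "x"), (2, "y"), (3, "x"), (3, "x"), (3, "y"),
        (4, "a"), (4, "b"), (5, "b"), (5, "b"), (6, "c"), (6, "c"), (6, "d")]

# Assumed tables: a(Col1, Col3) and b(Col2, Col4) share one unique key per row,
# so "b.Col4 = a.Col3" yields exactly the pairs in ROWS.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (Col1 INTEGER, Col3 INTEGER)")
con.execute("CREATE TABLE b (Col2 TEXT, Col4 INTEGER)")
con.executemany("INSERT INTO a VALUES (?, ?)", [(c1, k) for k, (c1, _) in enumerate(ROWS)])
con.executemany("INSERT INTO b VALUES (?, ?)", [(c2, k) for k, (_, c2) in enumerate(ROWS)])

max_rows = con.execute(
    "SELECT a.Col1, MAX(b.Col2) AS Col2 "
    "FROM a INNER JOIN b ON b.Col4 = a.Col3 "
    "GROUP BY a.Col1 ORDER BY a.Col1"
).fetchall()
print(max_rows)
# [(1, 'z'), (2, 'z'), (3, 'y'), (4, 'b'), (5, 'b'), (6, 'd')]
```

The result is the same on every run, and neighbouring groups can collapse onto one value (Col1 = 1 and 2 both get 'z'; 4 and 5 both get 'b'), which is the concern raised in the comments.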

If you do want a random value you could use the approach below.

 ;WITH T
     AS (SELECT a.Col1,
                b.Col2,
                ROW_NUMBER() OVER (PARTITION BY a.Col1 ORDER BY (SELECT NEWID())
                ) AS RN
         FROM   a
                INNER JOIN b
                  ON b.Col4 = a.Col3)
SELECT Col1,
       Col2
FROM   T
WHERE  RN = 1  
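The same ROW_NUMBER pattern can be tried outside SQL Server. This sketch assumes SQLite 3.25 or later (the first release with window functions) via Python's `sqlite3`, and swaps in SQLite's `random()` for T-SQL's `NEWID()` as the per-row random sort key within each Col1 partition. The table contents are assumed, chosen to reproduce the question's sample output.

```python
import sqlite3

# Sample join output from the question, as (Col1, Col2) pairs.
ROWS = [(1, "z"), (2, "z"), (2, "x"), (2, "y"), (3, "x"), (3, "x"), (3, "y"),
        (4, "a"), (4, "b"), (5, "b"), (5, "b"), (6, "c"), (6, "c"), (6, "d")]

# Assumed tables a(Col1, Col3) and b(Col2, Col4), joined on a unique key per row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (Col1 INTEGER, Col3 INTEGER)")
con.execute("CREATE TABLE b (Col2 TEXT, Col4 INTEGER)")
con.executemany("INSERT INTO a VALUES (?, ?)", [(c1, k) for k, (c1, _) in enumerate(ROWS)])
con.executemany("INSERT INTO b VALUES (?, ?)", [(c2, k) for k, (_, c2) in enumerate(ROWS)])

# random() plays the role of NEWID(): rows are shuffled inside each Col1
# partition, numbered from 1, and only the first row of each partition is kept.
QUERY = """
WITH T AS (
    SELECT a.Col1,
           b.Col2,
           ROW_NUMBER() OVER (PARTITION BY a.Col1 ORDER BY random()) AS RN
    FROM a
    INNER JOIN b ON b.Col4 = a.Col3
)
SELECT Col1, Col2 FROM T WHERE RN = 1 ORDER BY Col1
"""
picked = con.execute(QUERY).fetchall()
print(picked)  # one row per Col1; the Col2 values vary between runs
```

The cost of this approach is the sort inside every partition, which is what the CLR aggregate below is meant to avoid.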

Or, alternatively, use a CLR aggregate function. This approach has the advantage that it eliminates the need to sort each partition by NEWID(). An example implementation is below.

using System;
using System.Data.SqlTypes;
using System.IO;
using System.Security.Cryptography;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize = 8000)]
public struct Random : IBinarySerialize
{
    private MaxSoFar _maxSoFar;

    public void Init()
    {
    }

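    public void Accumulate(SqlString value)
    {
        // Tag each incoming value with a random key; keep the value with the largest key.
        int rnd = GetRandom();
        if (!_maxSoFar.Initialised || (rnd > _maxSoFar.Rand))
            _maxSoFar = new MaxSoFar(value, rnd);
    }

    public void Merge(Random group)
    {
        // Keep whichever partial result holds the larger key, consistent with Accumulate.
        if (group._maxSoFar.Initialised &&
            (!_maxSoFar.Initialised || group._maxSoFar.Rand > _maxSoFar.Rand))
        {
            _maxSoFar = group._maxSoFar;
        }
    }

    private static int GetRandom()
    {
        var buffer = new byte[4];

        // Dispose the provider once the four random bytes have been read.
        using (var rng = new RNGCryptoServiceProvider())
        {
            rng.GetBytes(buffer);
        }
        return BitConverter.ToInt32(buffer, 0);
    }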

    public SqlString Terminate()
    {
        return _maxSoFar.Value;
    }

    #region Nested type: MaxSoFar

    private struct MaxSoFar
    {
        private SqlString _value;

        public MaxSoFar(SqlString value, int rand) : this()
        {
            Value = value;
            Rand = rand;
            Initialised = true;
        }

        public SqlString Value
        {
            get { return _value; }
            set
            {
                _value = value;
                IsNull = value.IsNull;
            }
        }

        public int Rand { get; set; }

        public bool Initialised { get; set; }
        public bool IsNull { get; set; }
    }

    #endregion


    #region IBinarySerialize Members

    public void Read(BinaryReader r)
    {
        _maxSoFar.Rand = r.ReadInt32();
        _maxSoFar.Initialised = r.ReadBoolean();
        _maxSoFar.IsNull = r.ReadBoolean();

        if (_maxSoFar.Initialised && !_maxSoFar.IsNull)
            _maxSoFar.Value = r.ReadString();
    }

    public void Write(BinaryWriter w)
    {
        w.Write(_maxSoFar.Rand);
        w.Write(_maxSoFar.Initialised);
        w.Write(_maxSoFar.IsNull);

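        // Mirror Read(): only write a value if one was accumulated and it is not NULL.
        if (_maxSoFar.Initialised && !_maxSoFar.IsNull)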
            w.Write(_maxSoFar.Value.Value);
    }

    #endregion
}
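The aggregate's core idea can be checked without SQL Server using a small Python sketch. The class `RandomPick` is hypothetical and only mirrors the Init/Accumulate/Merge/Terminate shape; it is not the SQL Server CLR API. Every row gets an independent random key and only the largest key survives, so each row is equally likely to win (ties aside), and merging two partial results by keeping the larger key preserves that property.

```python
import random


class RandomPick:
    """Hypothetical sketch: keeps the value whose random key is the largest so far."""

    def __init__(self, rng=None):
        self._rng = rng if rng is not None else random.Random()
        self._best = None  # (key, value), or None before any value arrives

    def accumulate(self, value):
        # Same role as Accumulate: draw a 32-bit key and keep the larger one.
        key = self._rng.getrandbits(32)
        if self._best is None or key > self._best[0]:
            self._best = (key, value)

    def merge(self, other):
        # Same role as Merge: keep whichever partial result has the larger key.
        if other._best is not None and (self._best is None or other._best[0] > self._best[0]):
            self._best = other._best

    def terminate(self):
        # Same role as Terminate: None stands in for SQL NULL on empty input.
        return None if self._best is None else self._best[1]


# Usage: two partial aggregates, as a parallel plan might build, then merged.
left, right = RandomPick(), RandomPick()
for value in ["x", "x"]:
    left.accumulate(value)
right.accumulate("y")
left.merge(right)
print(left.terminate())  # prints x or y
```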
  • @Asha - MAX works on strings. It gives you the alphabetically last value. What datatype is your column? Commented Mar 6, 2011 at 13:19
  • Yup. I am aware that MAX works, but then 1a 1b 1z and 2a 2b 2z will return 1z and 2z, right?
    – Asha B
    Commented Mar 6, 2011 at 13:22
  • @Asha - Yes. It sounds like my second answer might be more what you need then. Commented Mar 6, 2011 at 13:23
  • Wait, what about the RN, i.e. in the WHERE clause?
    – Asha B
    Commented Mar 6, 2011 at 13:27
  • Can everyone who participated here upvote the answer? Unfortunately I can't. :)
    – Asha B
    Commented Mar 6, 2011 at 13:44

You need to group by a.Col1 to get rows that are distinct on a.Col1 alone. Then, since b.Col2 is not included in the GROUP BY, you need a suitable aggregate function to reduce all values in the group to just one; MIN is good enough if you just want one of the values.

select a.Col1, MIN(b.Col2) as c2
from a 
inner join b on b.Col4=a.Col3
group by a.Col1
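Running the MIN query on the question's sample data shows which value each group gets. As a stand-in for SQL Server this sketch uses Python's `sqlite3`, with assumed contents for tables a and b chosen so the join reproduces the sample output; with SQLite's default binary collation MIN returns the alphabetically first value per group.

```python
import sqlite3

# Sample join output from the question, as (Col1, Col2) pairs.
ROWS = [(1, "z"), (2, "z"), (2, "x"), (2, "y"), (3, "x"), (3, "x"), (3, "y"),
        (4, "a"), (4, "b"), (5, "b"), (5, "b"), (6, "c"), (6, "c"), (6, "d")]

# Assumed tables a(Col1, Col3) and b(Col2, Col4), joined on a unique key per row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (Col1 INTEGER, Col3 INTEGER)")
con.execute("CREATE TABLE b (Col2 TEXT, Col4 INTEGER)")
con.executemany("INSERT INTO a VALUES (?, ?)", [(c1, k) for k, (c1, _) in enumerate(ROWS)])
con.executemany("INSERT INTO b VALUES (?, ?)", [(c2, k) for k, (_, c2) in enumerate(ROWS)])

min_rows = con.execute(
    "SELECT a.Col1, MIN(b.Col2) AS c2 "
    "FROM a INNER JOIN b ON b.Col4 = a.Col3 "
    "GROUP BY a.Col1 ORDER BY a.Col1"
).fetchall()
print(min_rows)
# [(1, 'z'), (2, 'x'), (3, 'x'), (4, 'a'), (5, 'b'), (6, 'c')]
```

Like MAX, this is deterministic: it gives exactly one row per Col1, but never a random pick.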
  • @Asha: MSDN: MAX: "MAX can be used with numeric, character, and datetime columns, but not with bit columns. Aggregate functions and subqueries are not permitted." Commented Mar 6, 2011 at 13:20
  • Yup. I am aware that MAX works, but then 1a 1b 1z and 2a 2b 2z will return 1z and 2z, right?
    – Asha B
    Commented Mar 6, 2011 at 13:22
  • @Asha: Yes, but you've mentioned that its ok if I pick the second column randomly as my query output is like a million r Commented Mar 6, 2011 at 13:23
  • Yup, I did say that, but I don't want all of them to return the same value, right? Then I'd need to edit all my columns. I want collisions to be minimal, which is the case when you use random.
    – Asha B
    Commented Mar 6, 2011 at 13:25
  • @Asha B: MIN returns the alphabetically lowest value; if you really need a random value, you need a CTE for this. Maybe this SO question helps you. Commented Mar 6, 2011 at 13:31

If I understand you correctly, you want one line for each combination of columns 1 and 2. That can easily be done using GROUP BY or DISTINCT, for instance:

SELECT a.Col1, b.Col2
FROM a
INNER JOIN b ON b.Col4 = a.Col3
GROUP BY a.Col1, b.Col2

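Grouping by both columns returns one row per distinct (Col1, Col2) pair, so a Col1 value still repeats whenever it has several Col2 values. This sketch checks that on the question's sample data using Python's `sqlite3` as a stand-in for SQL Server, with assumed contents for tables a and b that reproduce the sample join output.

```python
import sqlite3

# Sample join output from the question, as (Col1, Col2) pairs.
ROWS = [(1, "z"), (2, "z"), (2, "x"), (2, "y"), (3, "x"), (3, "x"), (3, "y"),
        (4, "a"), (4, "b"), (5, "b"), (5, "b"), (6, "c"), (6, "c"), (6, "d")]

# Assumed tables a(Col1, Col3) and b(Col2, Col4), joined on a unique key per row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (Col1 INTEGER, Col3 INTEGER)")
con.execute("CREATE TABLE b (Col2 TEXT, Col4 INTEGER)")
con.executemany("INSERT INTO a VALUES (?, ?)", [(c1, k) for k, (c1, _) in enumerate(ROWS)])
con.executemany("INSERT INTO b VALUES (?, ?)", [(c2, k) for k, (_, c2) in enumerate(ROWS)])

pair_rows = con.execute(
    "SELECT a.Col1, b.Col2 FROM a INNER JOIN b ON b.Col4 = a.Col3 "
    "GROUP BY a.Col1, b.Col2 ORDER BY a.Col1, b.Col2"
).fetchall()
print(len(pair_rows))  # 11
```

Only the exact duplicates (3, x), (5, b) and (6, c) collapse; Col1 = 2 still appears three times, which is why this does not match the expected output of one row per Col1.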
  • I want a distinct combination of col1 and col2
    – Asha B
    Commented Mar 6, 2011 at 13:09
  • I don't want one line per distinct combination. Consider the case 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e: GROUP BY will give me all these results, whereas I want 1a and 2d, or 1a and 2b, any such combination. The second column's value can be randomly selected.
    – Asha B
    Commented Mar 6, 2011 at 13:10
  • If you run the query you will get exactly one row for each combination of col1 and col2; that is not dumb, it is SQL 101. Isn't that what you wanted, @Asha?
    – bjorsig
    Commented Mar 6, 2011 at 13:12
  • @Asha: you're offending people you want to help you? What anti-social behavior. Commented Mar 6, 2011 at 13:13
  • He doesn't want one row for each combination; he only wants one row for each value in Col1. I think the question with its expected results is clear enough, and this answer is clearly wrong.
    – krtek
    Commented Mar 6, 2011 at 13:13

You must use a GROUP BY clause:

select a.Col1,b.Col2 
from a 
inner join b on b.Col4=a.Col3
group by a.Col1
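SQLite, like the MySQL behaviour discussed in the comments, also accepts this incomplete GROUP BY and takes Col2 from an arbitrary row of each group; SQL Server rejects the query because b.Col2 is neither aggregated nor listed in the GROUP BY. This sketch runs it through Python's `sqlite3`, with assumed contents for tables a and b that reproduce the sample join output.

```python
import sqlite3

# Sample join output from the question, as (Col1, Col2) pairs.
ROWS = [(1, "z"), (2, "z"), (2, "x"), (2, "y"), (3, "x"), (3, "x"), (3, "y"),
        (4, "a"), (4, "b"), (5, "b"), (5, "b"), (6, "c"), (6, "c"), (6, "d")]

# Assumed tables a(Col1, Col3) and b(Col2, Col4), joined on a unique key per row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (Col1 INTEGER, Col3 INTEGER)")
con.execute("CREATE TABLE b (Col2 TEXT, Col4 INTEGER)")
con.executemany("INSERT INTO a VALUES (?, ?)", [(c1, k) for k, (c1, _) in enumerate(ROWS)])
con.executemany("INSERT INTO b VALUES (?, ?)", [(c2, k) for k, (_, c2) in enumerate(ROWS)])

# b.Col2 is a "bare" column here: SQLite picks it from some row of each group.
bare_rows = con.execute(
    "SELECT a.Col1, b.Col2 FROM a INNER JOIN b ON b.Col4 = a.Col3 "
    "GROUP BY a.Col1 ORDER BY a.Col1"
).fetchall()
print(bare_rows)  # one row per Col1; which Col2 appears is up to the engine
```

Note that "arbitrary" is not "random": the engine usually returns the same row on every run, so this does not give the spread of values the asker wants.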
  • You cannot select col1, col2 and only group by col1. You need to group by both.
    – bjorsig
    Commented Mar 6, 2011 at 13:06
  • This works just fine with MySQL; I don't know about T-SQL, though.
    – krtek
    Commented Mar 6, 2011 at 13:10
  • MySQL is the only database that allows an incomplete GROUP BY clause, simply returning non-deterministic results. See this link for an explanation of why it's a bad idea to use it: rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html
    – user330315
    Commented Mar 6, 2011 at 13:16
  • I'm well aware that MySQL will choose the value for the second column in a non-deterministic way, but since the OP said "its ok if I pick the second column randomly", this will be no big deal ;)
    – krtek
    Commented Mar 6, 2011 at 13:22
