In Excel I have a column (actually multiple columns, with ~30K rows) where each cell value looks something like this:

7AA914BC, 898B70FB, 898B70FB, 15DD4C5B, 15DD4C5B, 98D2185E, 898BAC48, 98D2185E, 8CFB1468, 8CFB1468, 98C35520, 98C35520, 98C35520, 98D13F8C, 98D13F8C, 98D13F8C, B04680D5, B04680D5, AB2BD8A8, AB2BD8A8, AB2BD8A8, 898C00B0

Each cell could hold as many as 50 comma-separated values. I want to extract the unique values into the adjacent cell, again separated by commas.

I am aware of the Text to Columns and Remove Duplicates functions, but neither of these is quite what I want. I have searched for an answer to this question to no avail. If it exists somewhere on this site, I apologize and would appreciate a link to that thread.

Thanks in advance!

4 Answers

3

Doing it with a formula requires a very convoluted array-formula version of TEXTJOIN:

=TEXTJOIN(", ",TRUE,IF(MATCH(TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),(ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))-1)*999+1,999)),TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),(ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))-1)*999+1,999)),0)=ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1)),TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),(ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))-1)*999+1,999)),""))

Because it is an array formula, it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting Edit mode.

This splits the string on the commas to create an array of the parts, then uses MATCH to test whether each part is the first occurrence of its value. The IF returns the value where it is the first occurrence and "" otherwise. TEXTJOIN ignores the "" entries, so only the unique values are returned.

TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),(ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))-1)*999+1,999))

The expression above is the part that splits the string on the commas and creates the array of values.
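The first-occurrence test can be easier to follow outside Excel. Below is a minimal Python sketch of the same logic, not something to paste into a worksheet. The function name unique_csv is hypothetical, and the `and p` filter stands in for the TRUE (ignore empty) argument of TEXTJOIN.

```python
def unique_csv(cell: str, sep: str = ",") -> str:
    # Split on the delimiter and trim each piece (the TRIM(MID(...)) part).
    parts = [p.strip() for p in cell.split(sep)]
    # Keep a piece only if the first match of its value is its own position,
    # mirroring MATCH(part, parts, 0) = ROW(...). Skip empty pieces, as
    # TEXTJOIN does when its second argument is TRUE.
    kept = [p for i, p in enumerate(parts) if parts.index(p) == i and p]
    # Re-join with comma-space, like TEXTJOIN(", ", ...).
    return ", ".join(kept)


print(unique_csv("7AA914BC, 898B70FB, 898B70FB, 15DD4C5B, 15DD4C5B"))
```

Like the formula, this compares every piece against the whole list, which is fine for about 50 values per cell.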

If you have the new Dynamic Array formulas, you can use UNIQUE() and shorten the formula considerably:

=TEXTJOIN(", ",TRUE,UNIQUE(TRIM(MID(SUBSTITUTE(A1,",",REPT(" ",999)),(ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))-1)*999+1,999))))

VBA would be better suited for this.

Put this in a standard module attached to the workbook:

Function MyUniqueStr(str As String, delim As String) As String
    ' Returns the distinct items of str (split on delim), each listed once.
    Dim dic As Object
    Set dic = CreateObject("Scripting.Dictionary")

    Dim strArr() As String
    strArr = Split(str, delim)

    Dim strPart As Variant
    For Each strPart In strArr
        ' Add each trimmed item only once; Exists avoids the duplicate-key error.
        If Not dic.Exists(Trim(strPart)) Then dic.Add Trim(strPart), Empty
    Next strPart

    ' Join the keys with the same delimiter; an empty input yields "".
    MyUniqueStr = Join(dic.Keys, delim)

End Function

Then one would simply use it like a normal formula:

=MyUniqueStr(A1,", ")

1
  • With 30K rows, that dic should probably be declared as Static dic As Object, with a check for whether Set dic = CreateObject("Scripting.Dictionary") still needs to be run; otherwise call dic.RemoveAll.
    – user385793
    Commented Nov 28, 2019 at 8:39
1

Or try this shorter formula, which does not need to be entered as an array formula with CSE:

=TEXTJOIN(", ",1,INDEX(FILTERXML("<a><b>"&SUBSTITUTE(A1,", ","</b><b>")&"</b></a>","//b[not(preceding::*=.)]"),0))

TEXTJOIN is a new function available in Office 365.
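To show what the XPath filter //b[not(preceding::*=.)] selects, here is a rough Python sketch of the same idea: wrap each value in a <b> element, then keep an element only if no earlier element has the same text. The function name unique_via_xml is hypothetical. The escape() call is an addition that the formula does not have, so the formula itself assumes the values contain no characters such as & or <.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def unique_via_xml(cell: str) -> str:
    # Build <a><b>v1</b><b>v2</b>...</a>, as the SUBSTITUTE call does.
    # escape() is an extra safeguard that is not part of the formula.
    items = "</b><b>".join(escape(v) for v in cell.split(", "))
    root = ET.fromstring("<a><b>" + items + "</b></a>")

    seen = set()
    kept = []
    for node in root.findall("b"):
        text = node.text or ""
        # Equivalent of not(preceding::*=.): no earlier node has this text.
        if text not in seen:
            seen.add(text)
            kept.append(text)
    # Skip empty entries, like the ignore-empty argument of TEXTJOIN.
    return ", ".join(v for v in kept if v)


print(unique_via_xml("98C35520, 98C35520, 98D13F8C, 98C35520"))
```

The sketch filters by hand because ElementTree's limited XPath support does not include the preceding axis.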

2
  • It is still an array-type formula, since INDEX still returns an array. Like SUMPRODUCT, it handles arrays natively, so it does not require CSE. But it is shorter, so good one. Commented Nov 28, 2019 at 0:05
  • The INDEX function is unnecessary. FILTERXML will return the desired array. You can use: =TEXTJOIN(", ",1,FILTERXML("<a><b>"&SUBSTITUTE(A1,", ","</b><b>")&"</b></a>","//b[not(preceding::*=.)]")). Commented Nov 29, 2019 at 2:10
1

For a formula-only solution, like bosco_yip's, UNIQUE() would be needed...

So:

=TEXTJOIN(",",TRUE,
                   UNIQUE(
                          FILTERXML("<Group><Elements>"&
                          SUBSTITUTE(A1,  ", ",  "</Elements><Elements>")&
                          "</Elements></Group>","/Group/Elements")
                          )
          )

Without UNIQUE(), this formula would just return the original list, except that the delimiter would be a bare comma rather than comma-space.

1

No script needed. This formula assumes the last value in the cell does not end with a comma and the "~" character is not found in the original content.

=mid(substitute(concatenate(unique(split(substitute(A1&",",", ",",~"),"~",FALSE),TRUE)),",",", "),1,len(substitute(concatenate(unique(split(substitute(A1&",",", ",",~"),"~",FALSE),TRUE)),",",", "))-2)
