46

Let's assume I have some data in Excel (and not in a real database). In one sheet, I have data, where one column functions as the ID, and I have made sure that the values in this column are unique. In another sheet, I also have some data, again with one column which can be taken as an ID, and it is also unique. If row N in Sheet 1 has some value, and row M in Sheet 2 has the same value, I am sure that row N and row M describe the same real-world object.

What I am asking: how can I get the equivalent of a full outer join without writing any macros? Formulas and all functions accessible through the ribbon are OK.

A small "play data" example:

Sheet 1:

Dostoyevski    Russia
Pushkin        Russia
Shelley        England
Flaubert       France
Hugo           France
Eichendorff    Germany
Byron          England
Zola           France

Sheet 2:

Shelley        Percy Bysshe
Eichendorff    Josef Freiherr Von
Flaubert       Gustave
Byron          Lord
Keller         Gottfried
Dostoyevski    Fyodor
Zola           Emile
Balzac         Honoré de

Desired output (sorting is not important):

Dostoyevski    Russia   Fyodor
Pushkin        Russia
Shelley        England  Percy Bysshe
Flaubert       France   Gustave
Hugo           France
Eichendorff    Germany  Josef Freiherr von
Byron          England  Lord
Zola           France   Emile
Keller                  Gottfried
Balzac                  Honoré de

To everybody who is horrified by this scenario: I know that this is The Wrong Way To Do It. If I have any choice, I would not use Excel for this. However, there are enough situations out there where a pragmatic solution is needed, stat, and a better (from IT point of view) solution cannot be applied.

4
  • If you sort both lists and put them side-by-side it should be fairly easy to manually match up the desired column. If you have a lot of data, you could write a function that takes two ranges and does this for you.
    – Dan
    Commented Jan 7, 2016 at 17:03
  • @Dan as I said, I am asking for doing this without writing a function. In situations where I can write it, the solution is trivial (at least for me, because I've written VBA before)
    – rumtscho
    Commented Jan 7, 2016 at 17:05
  • Excel supports SQL queries via Data connections. I'm terrible at it so I unfortunately can't offer much more advice but it may be worth looking into for you.
    – Kyle
    Commented Jan 7, 2016 at 19:33
  • @Kyle I added that option to my answer. It's not for the faint of heart and for this might be a mess to play and/or complete overkill with but hey, why not!
    – enderland
    Commented Jan 7, 2016 at 22:04

3 Answers 3

51
+100

Easy approach - standard Excel operations

First, copy/paste both key columns from both tables into a single, new sheet as a single column.

Use the "Remove Duplicates" get the single list of all your unique keys.

Then, add two columns (in this case), one for each of your data columns in each table. I recommend you use the format as table option too as it makes your formulas look much nicer. Using vlookup, use the following formula:

=IFERROR(VLOOKUP([@ID],Sheet4!A:B,2,FALSE),"")

Where Sheet4!A:B represents whatever the source table data table is for each respective value. The IFERROR prevents the ugly #N/A results which appear when vlookup is not successful and in this case return a blank cell.

This gives you your resulting table.


Sheet3:

enter image description here

Sheet4:

enter image description here

Result data:

enter image description here

Result formulas (Ctrl+~ will toggle this):

enter image description here


Built in SQL Query

You can also do this with the built-in SQL query. It's... much less user friendly, but maybe will be a better use case. This will likely require you to have formatted your "source" data as tables.

  1. Click on a cell in a new sheet
  2. Go to Data --> From Other Sources --> From Microsoft Query
  3. Select Excel Files* under the Databases tab and hit ok
  4. Select your workbook
  5. Select the following four fields:
    • enter image description here
  6. Click "next" and "ok" at the nice 1990s formatted warning you see
  7. Following these instructions create the first Left Outer Join. In my case I am using the "countries" table as the left source and the "names" as the right.
    • enter image description here
    • This only gives some of the rows (since you join on the ID)
  8. The "create a subtract join and then add it as union" part is more complicated..

    • Here is the subtract join configurations: enter image description here
    • Copy this join's SQL from the SQL button:
    • SELECT countries$.ID, countries$.Val1, names$.ID, names$.Val2 FROM {oj C:\Users\Username\Desktop\Book2.xlsx.countries$ countries$ LEFT OUTER JOIN C:\Users\Username\Desktop\Book2.xlsx.names$ names$ ON countries$.ID = names$.ID} WHERE (names$.ID Is Null)

  9. Go back to the first outer join you created. Manually edit the SQL and

    • add Union to the bottom
    • Add the above subtract join text to the bottom of the join
  10. Hit the "Return Data" button immediately to the left of the SQL button
    • You may want to edit the SQL to only select the specific data you want at this point. I find it easier to hide columns in the result.
  11. Place the Query somewhere and confirm it's location
    • enter image description here

Not for the faint of heart. But if you want a great chance to see some not-updated-as-long-as-you-might-have-been-alive parts of Office it's a great chance.

4
  • 1
    Do you have a reference for the [@ID] notation? I don't doubt that it works, but I've never seen it before and I do a lot of work in Excel.
    – T.J.L.
    Commented Jan 7, 2016 at 19:58
  • 1
    @T.J.L. if you format a table as a table, it'll use that sort of notation for referencing other columns. Much clearer than cell references. It might be a 2010+ feature? I'm not sure. I didn't use the magic of formatting as table enough prior to 2010..
    – enderland
    Commented Jan 7, 2016 at 20:05
  • support.office.com/en-us/article/…
    – Siphor
    Commented Jan 7, 2016 at 20:26
  • Can't you just change the word LEFT to the word FULL in the original query? I'm sure it's just using the ACE OLEDB driver to run that query, which supports FULL JOIN syntax.
    – Kyle Hale
    Commented Jan 8, 2016 at 3:56
19

As an alternative solution, may I suggest Power Query? It's a free Excel add-in from Microsoft for basically performing exactly this sort of thing. Its functionality will actually be directly included into Excel 2016 as well, so it's futureproofed.

Anyway, with Power Query, the steps are pretty simple:

  1. Import both tables as queries into the Power Query Editor.
  2. Perform a Merge Queries transformation on them, setting the appropriate join column and setting the join type as Full Outer.
  3. Load your result table into a new sheet.

The nice thing about this is once you've set this up, if you make changes to your base data tables you just hit Data > Refresh All and your Power Query result sheet gets updated as well.

7
  • Really interesting. A friend of mine that owns a restaurant was asking me about MS Office and showed me his 2016 suite, which doesn't include MS Access. I see now why that is. Excel had a lot of limitations. I never bought above 2003 because none of the features were of value and I didn't like the new look either; 2010 and above. Access does all of this and more so it begs the question, why are so many people trying to do Database work in Excel when Access is the way to achieve this? The only thing I can see is that perhaps to get a good Access DB working, it also needs good VB experience.
    – ejbytes
    Commented Jan 18, 2016 at 9:43
  • @ejbytes most people understand the concept of spreadsheets. Most do... not understand the concept of a database. I had a project converting a spreadsheet-as-database project to a database and the users still called it "the spreadsheet" - even after extensive explanations for how a database works!
    – enderland
    Commented Jan 18, 2016 at 14:54
  • @enderland I think you're right. A spreadsheet with non-rigorous rules and non-contiguous flexibility... and queries disguised as filters. A platform for make-shift tables and lists.
    – ejbytes
    Commented Jan 19, 2016 at 6:09
  • @ejbytes tldr version is MS wanted to test a columnstore memory engine in Excel to let in more rows, it included external data connections, it proved wildly popular, so they added in an ETL tool. Now they've rolled all the tools up as PowerBI.
    – Kyle Hale
    Commented Jan 19, 2016 at 15:05
  • @KyleHale I'm using Office 2010 and trying to do a FULL OUTER JOIN, but Power Query ignores IDs from the second table which are not in the first table. Commented Apr 23, 2016 at 20:56
2

A quick way to semi-simulate an (outer join) is to take your second list and paste it directly below your fist list, i.e. so your (primary keys) are all in the same (primary) column. Then Sort that primary column, you'll end up with an interleaved list then do a transposed IF equation (superuser is messing up table layout view..):

listA listB


sam blue tim 32874 tim red mary 5710 chris green gustav 047 fred blue
mary black

copy/paste/sort, result looks like this: A B chris green fred blue gustav 047 mary 5710 mary black sam blue tim 32874 tim red

then formula for cell c1: (c1)=if(A1=A2,B2)

result looks like this:

A B C chris green FALSE fred blue FALSE gustav 047 FALSE mary 5710 black mary black FALSE sam blue FALSE tim 32874 red tim red FALSE

Sort C to get rid of falses etc. That is a basic version of the idea, just expand it if more data neded. -wag770310

1
  • 1
    Can you format this in any way to make it readable? Commented Nov 13, 2019 at 7:27

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .