17

I'm used to working with VLOOKUP but this time I have a challenge. I don't want the first matching value, but the last. How? (I'm working with LibreOffice Calc but an MS Excel solution ought to be equally useful.)

The reason is that I have two text columns with thousands of rows, let's say one is a list of transaction payees (Amazon, Ebay, employer, grocery store, etc.) and the other is a list of spending categories (wages, taxes, household, rent, etc.). Some transactions don't have the same spending category every time, and I want to grab the most recently used one. Note that the list is sorted by neither column (in fact by date), and I don't want to change the sort order.

What I have (excluding error handling) is the usual "first-match" formula:

=VLOOKUP( 
[payee field] , [payee+category range] , [index of category column] , 
0 )

I've seen solutions like this, but I get #DIV/0! errors:

=LOOKUP(2 , 1/( [payee range] = [search value] ) , [category range] )

The solution can be any formula, not necessarily VLOOKUP. I can also swap the payee/category columns around. Just no change in sorting column, please.


Bonus points for a solution that picks the most frequent value rather than the last!

7 Answers 7

4

You can use an array formula to get data from the last matching record.

=INDEX(IF($A$1:$A$20="c",$B$1:$B$20),MAX(IF($A$1:$A$20="c",ROW($A$1:$A$20))))

Enter the formula using Ctrl+Shift+Enter.

This works like the INDEX/MATCH construction of a VLOOKUP, but with a conditional MAX used instead of MATCH.

Note that this assumes that your table starts at row 1. If your data starts at a different row, you will need to adjust the ROW(...) part by subtracting the difference between the top row and 1.

7
  • I'm confused about that literal "c" - I'd think that the evaluation is always false, so what does it really do? Commented Jul 11, 2014 at 19:23
  • I've tested your suggestion (and checked that it was accepted as an array formula). I assume Col A is payee and B is category, right? Unfortunately, LibreOffice returns "ERR:502" which translates into "Invalid argument: Function argument is not valid. For example, a negative number for the SQRT() function, for this please use IMSQRT()". I checked that all the functions exist with that name in LibreOffice, but I wonder whether LibreOffice's IF can't handle arrays. Commented Jul 11, 2014 at 19:34
  • Sorry, the literal "c" was just the payee name you wanted to match. That was a relic from my sample data I was playing with. I assume that will be replaced with a cell reference in your sheet.
    – Excellll
    Commented Jul 11, 2014 at 20:21
  • @TorbenGundtofte-Bruun Care to share the formula you're using? I may be able to troubleshoot it if I can see it. Also, you can always try to step through the formula with Evaluate Formula to see which part of the formula is generating the error. This feature exists in Excel, and I'd be surprised if LibreOffice Calc doesn't have the same feature.
    – Excellll
    Commented Jul 11, 2014 at 20:25
  • My original formula is straightforward, that's why it's not adequate :-) =VLOOKUP(J1061;$J$2:$K$9999;2;0) where col J contains payees and col K the categories. It returns the first match as expected. Commented Jul 12, 2014 at 11:20
2

(Answering here as no separate question for sorted data.)

If the data were sorted, you could use VLOOKUP with the range_lookup argument TRUE (or omitted, since it's the default), which is officially described for Excel as "search for approximate match".

In other words, for sorted data:

  • setting the last argument to FALSE returns the first value, and
  • setting the last argument to TRUE returns the last value.

This is largely undocumented and obscure, but dates to VisiCalc (1979), and today holds at least in Microsoft Excel, LibreOffice Calc, and Google Sheets. It is ultimately due to the initial implementation of LOOKUP in VisiCalc (and thence VLOOKUP and HLOOKUP), when there was no fourth parameter. The value is found by binary search, using inclusive left bound and exclusive right bound (a common and elegant implementation), which results in this behavior.

Technically this means that one starts the search with the candidate interval [0, n), where n is the length of the array, and the loop invariant condition is that A[imin] <= key && key < A[imax] (the left bound is <= the target, the right bound, which starts one after the end, is > the target; to validate, either check values at endpoints before, or check result after), and successively bisecting and choosing whichever side preserves this invariant: by exclusion one side will, until you get to an interval with 1 term, [k, k+1), and the algorithm then returns k. This need not be an exact match (!): it's just the closest match from below. In case of duplicate matches, this results in returning the last match, as it requires that the next value be greater than the key (or the end of the array). In case of duplicates you need some behavior, and this is reasonable and easy to implement.

This behavior is stated explicitly in this old Microsoft Knowledge Base article (emphasis added): "XL: How to Return the First or Last Match in an Array" (Q214069):

You can use the LOOKUP() function to search for a value within an array of sorted data and return the corresponding value contained in that position within another array. If the lookup value is repeated within the array, it returns the last match encountered. This behavior is true for the VLOOKUP(), HLOOKUP(), and LOOKUP() functions.

Official documentation for some spreadsheets follow; in neither is the "last match" behavior stated, but it's implied in the Google Sheets documentation:

  • Microsoft Excel

    TRUE assumes the first column in the table is sorted either numerically or alphabetically, and will then search for the closest value.

  • Google Sheets:

    If is_sorted is TRUE or omitted, the nearest match (less than or equal to the search key) is returned

1
  • That nearest match thing was driving me crazy!
    – davetapley
    Commented Mar 20, 2018 at 21:42
1

If the values in the search array are sequential (i.e. you're looking for the largest value, such as the latest date), you don't even need to use the INDIRECT function. Try this simple code:

=MAX(IF($A$1:$A$20="c",$B$1:$B$20,)

Again, enter the formula using CTRL + SHIFT + ENTER

1

Just turn the table upside down.

Then the first matching item will really be the last item, as you desire.

How to turn it upside down? Well, I don't mean literally do that!!

(Today, in 2021, we'd just use XLOOKUP() and tell it to look last to first instead of first to last. But read on for a solution that would have worked nicely in 2014. Or 2004 or 1994.)

As you may or may not be aware, the INDEX() function can reorder columns in the range given to it so that the output has a column that was to the right of another column now on its left. Or let you select only some columns, or use one column more than once. It gives you that same opportunity with the rows. People seldom think about the rows though if they like Stephen Kind, they certainly do consider He Who Walks Behind The Rows...

Anyway, how to reverse the rows? The trick in 2014 for making sure all the rows in a range were present in the result (when one had to when leaving the value blank or "0" did not work, which was basically due to having used special selection for the columns) was to use the value ROW(1:1000) (if there were a thousand rows in the range). Variations on the theme existed, but they revolved around using calculations to make a string and INDIRECT() to make that something ROW() would use. I will show that in the formula that follows, but in 2021 we can just use SEQUENCE() to make the desired high-to-low sequence needed.

The problem with ROW(1:1000) being written as ROW(1000:1) trying to get a sequence going down 1 at a time from 1,000 to 1 (1,000, 999, 998, 997,...3, 2, 1) is that Excel cheerfully takes your "wrong" address range and "helpfully" fixes it for you so that it records as 1:1000, not 1000:1... no matter what you do.

But... you know how many rows there are. You gave INDEX() a range, so you can figure how many rows. Even if you gave it a Named Range. Since you can, you can add 1 to that value, and then subtract the ROW(1:1000) that Excel will accept. In this example, it would be 1,000 rows, so you use 1,001 like so:

1001 - ROW(1:1000)

so Excel calculates 1001-1 (1000), 1001-2 (999), 1001-3 (998), and so on. See how you get a reverse sequence?

Use that for the row parameter (second parameter) in your INDEX() function. Now your table is turned upside down.

Since you did this for the rows, you WILL have to specify the columns as well. Since you just have two columns, use the simple array constant {1,2} for that as you want them in that order. (Nowadays, you could use a (normal rising) SEQUENCE() for this as well.)

Now your whole table is reversed and you use this as the second parameter in your VLOOKUP(), the lookup range.

And the first value found is really the last because with the table reversed, you are looking last to first.

0

I had a go at the most frequent value. Not sure if it would work in libreOffice, but it seems to work in excel

=INDEX($B$2:$B$9,MATCH(MAX(--($A$2:$A$9=D2)*COUNTIFS($B$2:$B$9,$B$2:$B$9,$A$2:$A$9,D2)),--($A$2:$A$9=D2)*COUNTIFS($B$2:$B$9,$B$2:$B$9,$A$2:$A$9,D2),0))

Column A would be the payee, column B would be the category, D2 is the payee you want to filter by. I'm not sure why it's putting extra linebreaks in the function above.

My function to find the last cell would be as follows:

=INDIRECT("B" & MAX(--($A$2:$A$9=D2)*ROW($A$2:$A$9)))

Indirect lets me specify the column I want to return and find the row directly (so I don't need to subtract the number of header rows.

Both these functions need to be entered using Ctrl+shift+enter

0
=LOOKUP([payee field] , [payee range] , [category range])

This will get you the last value

Do I get bonus points for being 3 years late?

-1

You got #DIV/0! errors because you should rather write your formula like :

=LOOKUP(2;IF(([payee range] = [search value]);1;"");[category range])

this will work and will found last match.

([payee range] = [search value]) : boolean matrix TRUE/FALSE

IF(([payee range] = [search value]);1;"") : pseudo-boolean matrix 1/""

=LOOKUP(2; {pseudo-boolean matrix 1/""} );[category range]) : alway return last 1 position

1
  • LOOKUP works only on sorted list, the output of your comarison will result in a list of 1s and spaces in a non-sorted way, so it won't give correct result. Commented Sep 24, 2015 at 11:19

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .