(This comes from a real-world requirement.)
The idea is easy - if you have the three-letter abbreviated name of a month, say "Jan", get the numeric month number (for "Jan", 1). Sure, sure, pick a language with tables or hashes and this is simple and straightforward - but for the purposes of this exercise, we care about optimizing for number of characters compared.
For the naive approach, doing this in whatever language:
if (month == "Jan") then m=1;
else if (month == "Feb") then m=2;
else if (month == "Mar") then m=3;
$\vdots$
if we make the obvious assumption that a string comparison works left to right and stops at the first mismatch, then when we run this for each day of the year (that is, for all $365$ days: "Jan 1", "Jan 2", ..., "Dec 30", "Dec 31"), this will end up requiring 3328 comparisons, or 9.12 character comparisons on average. Surely we can do better!
So the challenge is: write the lowest cost abbreviation-to-number converter you can.
How low can you go?
My current best is $\bf{1423}$ for an average of $\bf{3.90}$ character comparisons per call.
Current leading answer (best Verified submission) without hash/table use is by @ffao
(also) with a score of $\bf{1423}$ for an average of $\bf{3.90}$ character comparisons per call.
— this is believed to be optimal.
Current Accepted answer (best Verified submission) using hash/table as per rules is by
@Stewart Becker with a score of $\bf{487}$ for an average of $\bf{1.33}$ character comparisons per call.
The specifics:
- Code gets executed in-order as written; optimizations done by a compiler don't count.
- Relative comparisons not allowed - you get equal or not-equal only. (This is an actual limitation of the real-life application.)
- Character arithmetic/conversion to integers is not allowed. (Also an actual limitation of the real-life application.) You have to work with the characters as themselves, sorry.
- String compares are done left to right, and stops either at the first mismatch or when all characters have been compared and matched; one comparison is charged per character actually tested. (Comparing
xyz
toabc
costs $1$ and is false; toxya
, $3$ and false; toxyz
, $3$ and true.) - You can use substrings or individual characters (C string-array indexing, Python slicing, perl/PHP substr(), etc.) - comparing substrings or single characters works the same as any other string compare.
- For our purposes, a table/hash lookup counts as a single string compare of your key against a (probably theoretical) index.
- Ignore leap years. There are 365 days in a year, and February has 28 days.
- Your code needs to take an input that is a month abbreviation (exactly one of these: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec) and return the corresponding month number (1 through 12).
- Your code will be run 365 times, corresponding with each of the 365 days in a year, not necessarily in order. That is, 31 times it will be run with the input "Jan", 28 times with "Feb", 31 times with "Mar", 30 with "Apr", 31 with "May", 30 with "Jun", 31 with "Jul", 31 with "Aug", 30 with "Sep", 31 with "Oct", 30 with "Nov", and 31 with "Dec", in some random ordering.
- Your score is the total number of characters your code compares for all 365 invocations.
- Please provide your score and average comparison count with your answer!
- I'll verify your score.
- Best score gets the $\color{green}{\checkmark \small\text{Accept}}$!
______
Oh, and yes - this is a puzzle. Finding how to best do this is actually pretty interesting, and optimizing it even more so!