Build the first 6 letters of an Italian codice fiscale (tax identification number)

Question

Given a single string with a person's name and surname, output the first 6 characters of their codice fiscale.

Codice fiscale

The codice fiscale is a 16-character alphanumeric string that identifies (almost uniquely) every Italian resident.

We are interested in the first six characters, which encode the surname and the name, and consist of:

- three letters for the surname/family name, obtained by concatenating (until three letters are reached):
  - the surname's consonants in order
  - the surname's vowels (aeiou) in order
  - the "X" character
- three letters for the given name(s), obtained by concatenating (until three letters are reached):
  - the name's consonants in order
    - but if there are four or more consonants, then the second is skipped (this exception doesn't apply to surnames)
  - the name's vowels (aeiou) in order
  - the "X" character

For instance, the codice fiscale of Ms. Coding Challenges is CHLCNG, while Dr. Code Golf has GLFCDO, Sir Stack Exchange has XCHSCK, and Mr. Nicola Sap has SPANCL.

Note: although it's always pronounced as a vowel in Italian, y is a consonant for the purposes of generating a codice fiscale.
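As a rough illustration of the encoding rules above, here is a Python sketch that encodes one part (surname or given names) at a time. The function name `encode_part` and its `is_given_name` flag are hypothetical, chosen for this example only. It assumes ASCII input with at least two letters per part, as the challenge guarantees.

```python
VOWELS = "AEIOU"  # y is deliberately absent: it counts as a consonant

def encode_part(text, is_given_name=False):
    """Return the three-letter code for a surname or given-name string.

    Hypothetical helper for illustration; assumes ASCII input with
    at least two letters, so a single trailing X is enough padding.
    """
    letters = [c for c in text.upper() if c.isalpha()]  # drop spaces, ' and -
    consonants = [c for c in letters if c not in VOWELS]
    vowels = [c for c in letters if c in VOWELS]
    # Given names with four or more consonants skip the second one.
    if is_given_name and len(consonants) >= 4:
        del consonants[1]
    return "".join(consonants + vowels + ["X"])[:3]

print(encode_part("Challenges"), encode_part("Coding", is_given_name=True))
# CHL CNG
```

The surname code is computed without the skip rule, which is why "Exchange" gives XCH while the given name "Stack" gives SCK.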

Parsing the input

The input is a string made of ASCII alphabetic characters, spaces (" ", at least one is guaranteed to be present), and possibly apostrophes (') and hyphens (-). It is assumed to be stripped (trimmed) and not to contain consecutive spaces. The challenge is case-insensitive, so you can choose to restrict inputs to only uppercase or only lowercase. Considered as a whole, both name and surname have at least two alphabetic letters (i.e. you won't need to append more than one X per part).

To determine name and surname (that are both non-empty and are always given in this order), we will assume that:

- The last word (as defined by " "-separation) is part of the surname.
- If there are more than two words, the last word contains no ' or - characters, and the second-to-last word is two characters or fewer, then the second-to-last word is also part of the surname.
- All other words are part of the given name(s).

Some example name/surname separations:

Pinco Pallino     Mario De Rossi     Carla Lu Di Mare
--n-- ---s---     --n-- ---s----     ---n---- ---s---

Charlotte Oz D'Angelo     Betta-Ty Verdi     Alex Max Meucci
-----n------ ----s---     ----n--- --s--     ----n--- --s---
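The separation heuristic can be sketched directly in Python, without regular expressions. The name `split_name` is hypothetical, and the sketch assumes the input is already trimmed and contains no consecutive spaces, as stated above.

```python
def split_name(full_name):
    """Split a full name into (given_name, surname) per the challenge heuristic.

    Hypothetical helper for illustration only.
    """
    words = full_name.split(" ")
    surname_words = 1  # the last word always belongs to the surname
    if (len(words) > 2
            and "'" not in words[-1]
            and "-" not in words[-1]
            and len(words[-2]) <= 2):
        surname_words = 2  # a short second-to-last word joins the surname
    given = " ".join(words[:-surname_words])
    surname = " ".join(words[-surname_words:])
    return given, surname

print(split_name("Carla Lu Di Mare"))
# ('Carla Lu', 'Di Mare')
```

Note that the length test counts every character of the second-to-last word, so "De'" (three characters) stays in the given name.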

Output

Any reasonable output is valid: a string, two strings (surname code THEN name code), a list/array of characters. Case doesn't matter.

However, as opposed to the input, the surname code must come before the name code.

Examples and reference implementation

Some of these examples make it clear that the heuristic I developed to separate name from surname doesn't always work in the real world (Lea Rossi Del Bo's surname is "Rossi Del Bo" in real life, but it's "Bo" for the purpose of the challenge). You should follow the heuristic.

A-ha-ha Risolini      , RSLHHA
Alex Max Meucci       , MCCLMX
Betta-Ty Verdi        , VRDBTT
Carla Lu Di Mare      , DMRCLL
Code Golf             , GLFCDO
Coding Challenges     , CHLCNG
Charlotte Oz D'Angelo , DNGCRL
Dante Aligheri        , LGHDNT
Dario Fo              , FOXDRA
Ennio De Giorgi       , DGRNNE
Filippo Brunelleschi  , BRNFPP
Gaetana Aulenti       , LNTGTN
Gerard 't Hooft       , THFGRD
Giulietto Giulietti   , GLTGTT
He La                 , LAXHEX
Jacopo De Barbari     , DBRJCP
Jacopo De' Barbari    , BRBJPD
Jo Zayner             , ZYNJOX
Lea Rossi Del Bo      , BOXLSS
L'Orfeo Di Monteverdi , DMNLRF
Luigi Mc Mario        , MCMLGU
Mario De Rossi        , DRSMRA
Menu a'-la-carte      , LCRMNE
Mona Lisa             , LSIMNO
Niccolo dell'Arca     , DLLNCL
Nicola Sap            , SPANCL
Oi Xy                 , XYXOIX
Pellegrino Artusi     , RTSPLG
Pinco Pallino         , PLLPNC
Salvo Montalbano      , MNTSLV
Stack Exchange        , XCHSCK
Sonia O'Neill         , NLLSNO
Tylynn Ng             , NGXTLY
Uy Li                 , LIXYUX
Xenia Silberberg      , SLBXNE

Reference implementation (ATO)

Score

This is code-golf, shortest code wins.

This feels quite chameleon-y, wouldn't the forenames and family name usually be given as two separate inputs? — Jonathan Allan, Commented Jun 8 at 16:37
@JonathanAllan the parsing problem and the encoding problem, on their own, didn't feel too interesting, so I combined them. — Nicola Sap, Commented Jun 8 at 16:43
Edge cases would be good. Forenames and/or surnames with only vowels or only consonants; multiple apostrophes, multiple dashes, and a mixture in the last "word". — Jonathan Allan, Commented Jun 8 at 16:44
@KevinCruijssen the challenge already says that you can ignore that case. "Considered as a whole, both name and surname have at least two alphabetic letters (i.e. you won't need to append more than one X per part)." — Nicola Sap, Commented Jun 10 at 9:04
@NicolaSap Ah, I read past that. That saves a byte in my answer, thanks. :) — Kevin Cruijssen, Commented Jun 10 at 9:08

Arnauld · Accepted Answer · 2024-06-09 13:46:19Z

JavaScript (ES6), 138 bytes

Expects the input in lowercase and returns a string in lowercase as well.

s=>(g=r=>(s.split(/ (.. ?\w+|\S+)$/)[+!r].replace(/[aeiou\W]/g,c=>(c>{}?v+=c:0,""),v="").replace(r,"")+v+"x").slice(0,3))()+g(/\B.(?=..)/)

Try it online!

Commented

s => (                  // s = input string
  g = r => (            // g is a helper function taking a regex r
    s.split(            // split the original string ...
      / (.. ?\w+|\S+)$/ // ... using a regex that matches the surname
    )[+!r]              // keep the name if r is defined,
                        // or the surname otherwise
    .replace(           // replace:
      /[aeiou\W]/g,     //   match all vowels and non-letter characters
      c => (            //   for each matched character c:
        c > {} ?        //     if c is a letter (i.e. a vowel):
          v += c        //       append it to v
        :               //     else:
          0,            //       do nothing
        ""              //     always remove the original character
      ),                //
      v = ""            //   start with v = "" (string of vowels)
    )                   // end of replace() -> only consonants remain
    .replace(r, "") +   // apply the regex r to conditionally remove
                        // the second consonant (for the name only)
    v + "x"             // append the vowels and a trailing "x"
  ).slice(0, 3)         // keep the first 3 characters
)() +                   // 1st call to g for the surname with no regex
g(                      // 2nd call to g for the name with a regex that
  /\B.(?=..)/           // matches the 2nd character if the 4th one is
)                       // defined
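The surname-matching regex carries most of the parsing logic, so it may help to try it in isolation. The sketch below ports only that split to Python, where `re.split` also returns the captured group; the helper name `split_with_regex` is hypothetical and the code is an illustration rather than part of the answer.

```python
import re

# Same pattern as in the answer: a space, then either a two-character
# chunk followed by an optional space and word characters, or a run of
# non-spaces, anchored at the end of the string.
SURNAME = re.compile(r" (.. ?\w+|\S+)$")

def split_with_regex(full_name):
    """Return (given_name, surname); hypothetical helper for illustration."""
    # Splitting on a pattern with a capturing group yields
    # [text before the match, captured surname, text after the match].
    name, surname, _ = SURNAME.split(full_name)
    return name, surname

print(split_with_regex("Carla Lu Di Mare"))
# ('Carla Lu', 'Di Mare')
```

The regex engine tries each space from left to right, so the first space from which the rest of the string fits the pattern marks the start of the surname.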

Neil · Accepted Answer · 2024-06-08 19:46:42Z


Retina 0.8.2, 94 92 bytes

(.+?) (..? \w+|\S+)$
$2¶$1
%`\W

%O$`([AEIOU])|.
$.1
(¶.).(.[^AEIOU])
$1$2
%`$
X
%M!`^...
¶

Try it online! Takes input in upper case but link is to test suite that uppercases input. Explanation:

(.+?) (..? \w+|\S+)$
$2¶$1

Extract the surname and given name. (Note: this assumes that there won't be two spaces in a row.)

%`\W

Remove all non-letters, but on each name separately, so that the newline doesn't get removed.

%O$`([AEIOU])|.
$.1

Sort the vowels to the end on each name separately.
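The same idea, a stable sort keyed on "is this a vowel", can be sketched in Python. The function name `consonants_first` is hypothetical, and the sketch assumes the string already holds uppercase letters only.

```python
def consonants_first(letters):
    """Move vowels after consonants, keeping the order within each group."""
    # sorted() is stable and False sorts before True, so consonants
    # (key False) come first and relative order is preserved.
    return "".join(sorted(letters, key=lambda c: c in "AEIOU"))

print(consonants_first("NICOLA"))
# NCLIOA
```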

(¶.).(.[^AEIOU])
$1$2

Remove the second consonant of the given name if there are at least four.
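Since consonants now precede vowels, a non-vowel in fourth position means the name has at least four consonants. A Python sketch of that check follows; `drop_second_consonant` is a hypothetical name, the pattern is anchored with `^` instead of the newline used in the Retina program, and the input is assumed to be an already sorted, uppercase, letters-only given name.

```python
import re

def drop_second_consonant(sorted_name):
    """Remove the 2nd letter when the 4th letter is a consonant."""
    # Group 1 is the first letter, the bare dot is the letter to drop,
    # group 2 is the third letter plus a fourth non-vowel letter.
    return re.sub(r"^(.).(.[^AEIOU])", r"\1\2", sorted_name)

print(drop_second_consonant("CDNGOI"), drop_second_consonant("NCLIOA"))
# CNGOI NCLIOA
```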

%`$
X
%M!`^...

For each name, take the first three letters, filling in with an X if there are only two.

¶

Join the codes of the surname and given name together.

Edit: Saved 2 bytes thanks to @Arnauld.

edited Jun 8 at 19:46

answered Jun 8 at 15:12


Is there any situation where (.+?) (.. ?\w+|\S+)$ would not work? – Arnauld, Commented Jun 8 at 17:56
@Arnauld So what you're saying is that I can change both of the first two \Ss to .s because the character before or after a space can never be a space? I guess that's reasonable... – Neil, Commented Jun 8 at 19:44
@Neil I clarified in the question that the assumption is reasonable. – Nicola Sap, Commented Jun 8 at 20:16

Neil · Accepted Answer · 2024-06-08 22:53:09Z

Charcoal, 82 bytes

≔AEIOUθ≔⪪Ｓ η≔⟦⪫⮌Ｅ⊕∧⬤§η±¹№αι›³Ｌ§η±²⊟ηω⪫ηω⟧ηＵＭη⭆²Φι∧№αν⁼λ№θν⭆η…⁺Φι∨¬κ∨⊖μ№θ§⁺§η¹AA³X³

Try it online! Link is to verbose version of code. Takes input in upper case (could take in mixed case at a cost of a byte). Explanation:

≔AEIOUθ

Get the vowels in a variable as this saves a byte.

≔⪪Ｓ η

Input the name and split it on spaces.

≔⟦⪫⮌Ｅ⊕∧⬤§η±¹№αι›³Ｌ§η±²⊟ηω⪫ηω⟧η

Extract the surname and given name.

ＵＭη⭆²Φι∧№αν⁼λ№θν

For each name, sort the consonants to the start and the vowels to the end and ignore other characters.

⭆η…⁺Φι∨¬κ∨⊖μ№θ§⁺§η¹AA³X³

Remove the second consonant of four in the given name, then take the first three letters of each name, filling with X if necessary.

Nicola Sap · Accepted Answer · 2024-06-09 15:23:53Z

Python, 176 bytes

lambda F,K=lambda S,b="[\W":sub(b+"AEIOU]","",S):[(sub(f"(.){Q}.)",r"\1\2",K(P))+K(P,"[^")+"X")[:3]for P,Q in zip(split(" (.. ?\w+|\S+)$",F),[".(.","("])][::-1]
from re import*

Attempt This Online!

Expects uppercase input. Outputs a list of two strings: surname, then name.

Based on my previous answer (below) but treats name and surname in a loop rather than explicitly.

Indebted to Arnauld for the name/surname regexp / (.. ?\w+|\S+)$/. The "4 letters, skip the second" regexp /(.).(..)/ is also used by Neil.

Previous (with explanation):

Python, 180 bytes

from re import*
K=lambda S,b=0:sub("[[\^W"[b::2]+"AEIOU]","",S)+"X"*b
def f(F):
 n,s,_=split(" (.. ?\w+|\S+)$",F)
 return(K(s)+K(s,1))[:3]+(sub("(.).(..)",r"\1\2",K(n))+K(n,1))[:3]

Attempt This Online!

Expects uppercase input.

Explanation

from re import*                     # We will be using regexps.

K=lambda S,b=0:                     # Utility function. Takes a string and a 0/1 flag.
  sub("[[\^W"[b::2]+"AEIOU]","",S)  # If b=0, deletes vowels and symbols ([\WAEIOU]).
                                    # If b=1, deletes everything except vowels ([^AEIOU]).
  +"X"*b                            # If b=1, it also adds a "X".

def f(F):
 n,s,_=split(" (.. ?\w+|\S+)$",F)   # Arnauld's regexp splits name (n) and surname (s).
 return(K(s)+K(s,1)                 # Call K() on s, first for consonants (b=0), then for
                                    #  vowels and trailing "X" (b=1), and...
                   )[:3]            #  keep 3 chars.
  +(                                # Same for n, but, ...
    sub("(.).(..)",r"\1\2",K(n))    #  if K(n,0) contains any seq of 4 chars ("(X)Y(ZW)"),
                                    #  keep only the two groups in parentheses: "XZW".
    +K(n,1))[:3]

PS

Answering my own question (which is allowed, although waiting some time is advised). I only waited 8 hours because I didn't really get a head start (thought about this problem this morning), and I'm less proficient than other Python golfers so I don't believe this is going to be a winning Python entry.

In particular, I liked the regexp-based answers that have been posted so I wanted to try my own.

Kevin Cruijssen · Accepted Answer · 2024-06-10 09:16:44Z

05AB1E, 53 52 bytes

#R©¬a®g3@®1èg3‹**>®g‚£íðýlεáΣžMså}ÀDžNÃg4@N*.$Á'x«3£

Outputs as a lowercase pair of 3-char strings.

Try it online or verify all test cases.

Explanation:

Step 1: Split the input into first name and surname:

#               # Split the (implicit) input-string on spaces
 R              # Reverse this list
  ©             # Store it in variable `®` (without popping)
   ¬            # Push its first item (without popping the list)
    a           # Check if it's alphabetic (aka doesn't contain any "'-"-chars)
   ®            # Push list `®` again
    g           # Pop and push the length of this list
     3@         # Check if this list-length is >=3
   ®            # Push list `®` yet again
    1è          # Pop and leave just its second item
      g         # Pop and push the length of this string
       3‹       # Check if this string-length is <3
   **           # Combine all three checks
     >          # Increment this 0 or 1 to 1 or 2
      ®g        # Push the length of list `®` again
        ‚       # Pair the two together
         £      # Split the list into parts of those sizes
          í     # Reverse each inner list
           ðý   # Join each inner list with space delimiter

Try just step 1 online.

Step 2: Convert the surname and first name to the three letter Codice fiscale portions, and output the resulting pair:

l               # Convert everything to lowercase
 ε              # Map over the pair with surname and first name:
  á             #  Only leave the letters, removing " '-"-characters
   Σ            #  Sort these letters by:
    žM          #   Push the vowels(-y) constant string: "aeiou"
      s         #   Swap so the current letter is at the top
       å        #   Check if it's in this string
   }À           #  After the sort: rotate the string once towards the left
     D          #  Duplicate this string
      žNÃ       #  Only keep all consonants(+y) of this copy
         g      #  Pop and push the length of this consonants-string
          4@    #  Check if this consonants-length is >=4
            N*  #  Multiply it by the 0-based map-index
     .$         #  Remove that many leading characters
       Á        #  Rotate the string back once towards the right
        'x«    '#  Append "x"
           3£   #  Pop and keep the first three characters
                # (after which the result is output implicitly)
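The rotate, drop, rotate-back trick in step 2 may be easier to follow in a conventional language. The Python sketch below mirrors those steps one by one; `encode_with_rotation` is a hypothetical name, and `index` plays the role of the 0-based map index (0 for the surname, 1 for the first name).

```python
def encode_with_rotation(part, index):
    """Mirror the sort / rotate / drop / rotate-back steps; illustration only."""
    vowels = "aeiou"
    letters = [c for c in part.lower() if c.isalpha()]
    s = sorted(letters, key=lambda c: c in vowels)  # consonants first (stable)
    s = s[1:] + s[:1]                               # rotate left: 1st letter to the end
    n_consonants = sum(c not in vowels for c in s)
    drop = (n_consonants >= 4) * index              # 1 only for a name with 4+ consonants
    s = s[drop:]                                    # drops the original 2nd consonant
    s = s[-1:] + s[:-1]                             # rotate right: 1st letter back in front
    return ("".join(s) + "x")[:3]

print(encode_with_rotation("challenges", 0), encode_with_rotation("coding", 1))
# chl cng
```

Rotating first parks the leading letter at the end, so removing "leading characters" hits the second consonant instead of the first.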

Themoonisacheese · Accepted Answer · 2024-06-10 10:30:04Z

Bash, 234 bytes

s=${@: -2:1}
l=${@: -1}
u=1
(($#>2&&`wc -m<<<$s`<4&&`grep -c [\'-]<<<$l`<1))&&l=$s$l&&u=2
f=${@:1:(($#-$u))}
v="[aeiou '-]"
p=[!aeiou]
c=${f//$v}
((`wc -m<<<$c`>4))&&c=${c:0:1}${c:2}
head -c3<<<${l//$v}${l//$p}x
head -c3<<<$c${f//$p}x

Attempt This Online!

🏆: broke syntax highlighting

Call as a bash function or script, with the name as arguments in lowercase. You may need to escape apostrophes, as otherwise bash will try to parse a quoted string.
The ATO link has automatic lowercase conversion for your convenience.

Thanks to @GammaFunction for his entry in a previous challenge, which helped me separate consonants and vowels.

Explanation:

# implicit: bash automatically separates arguments by whitespace, so calling `script a b c` is equivalent to calling script(a, b, c) in other languages
s=${@: -2:1}     # set s to the (s)econd to last argument
l=${@: -1}       # set l to the (l)ast argument
u=1              # u counts the number of arguments that make up the last name, 1 by default
  $#>2           # there are more than 2 arguments
        `wc -m<<<$s`<4     # the second to last argument has 2 characters or less (wc counts a trailing char, and < is shorter than <=, so we use 4 instead of 2)
                        `grep -c [\'-]<<<$l`<1  # the last argument has no `'` or `-`
((    &&              &&                      ))  # if the 3 last expressions are all true...
                                                &&l=$s$l&&u=2 # then concatenate the last 2 arguments, making the last name, and set u to 2
f=${@:1:(($#-$u))} # set f to all arguments until u, making the (f)irst name
v="[aeiou '-]" #store a glob pattern for later
p=[!aeiou]     #store another glob pattern
c=${f//$v}     #set c to the (c)onsonants in the first name, using the first pattern.
((`wc -m<<<$c`>4))  #if there are more than 3 of them...
                  &&c=${c:0:1}${c:2}  #set c to c[0]+c[2:], skipping the second consonant
           ${l//$v}${l//$p}x   #concat consonants and vowels in the last name, and an additional x
head -c3<<<                    #trim the previous to 3 characters and output to stdout
head -c3<<<$c${f//$p}x         #do the same to the first name, but using $c, which may have the second consonant missing

Jan Blumschein · Accepted Answer · 2024-06-12 21:25:20Z


sed -r, 125 bytes

Expecting all-uppercase input

s/ (..? \w+|\S+)$/_&/
s/\W//g
:1
s/([AEIOU])([^AEIOU_])/\2\1/
t1
/^[^AEIOU_]{4}/s/(.)./\1/
s/_.*/X&X/
s/(...).*_(...).*/\2\1/

Try it online!

sed -r, 135 bytes

Variant that accepts input in either case

s/./\U&/g
s/ (..? \w+|\S+)$/_&/
s/\W//g
:1
s/([AEIOU])([^AEIOU_])/\2\1/
t1
/^[^AEIOU_]{4}/s/(.)./\1/
s/_.*/X&X/
s/(...).*_(...).*/\2\1/

Try it online!

edited Jun 12 at 21:25

answered Jun 12 at 21:04
