15

I am stuck with a NAME field, which typically is in the format:

FirstName LastName

However, I also have the occasional names that are in any of these formats (with prefix or suffix):

Mr. First Last
First Last Jr.

What do people think is a safe way to split these into FIRST/LAST name variables in PHP? I can't really come up with anything that tends to work all of the time...

9
  • 1
    Explode by string. Skip elements that contain a period. The first accepted element is the first name; the second is the last name. Or do you want to preserve the prefix/suffixes? Commented Jan 10, 2012 at 19:00
  • 1
    No solution will be perfect; this should have been sorted out in the original design. Names can be one or more words, in any order, with any punctuation.
    – user557846
    Commented Jan 10, 2012 at 19:01
  • 3
    You'd be better off validating input before it got to your PHP script to prevent the problem occurring. There's not a perfect way to handle all cases after the fact. Commented Jan 10, 2012 at 19:01
  • 1
    This is a particularly hard problem. Imagine 'Mr Vincent van Gogh' registering at your website: what would be the expected outcome?
    – TJHeuvel
    Commented Jan 10, 2012 at 19:01
  • 1
    For the benefit of anyone reading this, please read Falsehoods Programmers Believe About Names before implementing any solution to names that involves anything other than just a single open-ended "name" field.
    – Simba
    Commented Jun 5, 2015 at 16:22

16 Answers

24

A regex is the best way to handle something like this. Try this piece - it pulls out the prefix, first name, last name and suffix:

$array = array(
    'FirstName LastName',
    'Mr. First Last',
    'First Last Jr.',
    'Shaqueal O’neal',
    'D’angelo Hall',
);

foreach ($array as $name)
{
    $results = array();
    // Print the input on its own line, then the captured groups
    echo $name, PHP_EOL;
    preg_match('#^(\w+\.)?\s*([\'\’\w]+)\s+([\'\’\w]+)\s*(\w+\.?)?$#', $name, $results);
    print_r($results);
}

The result comes out like this:

FirstName LastName
Array
(
    [0] => FirstName LastName
    [1] => 
    [2] => FirstName
    [3] => LastName
)
Mr. First Last
Array
(
    [0] => Mr. First Last
    [1] => Mr.
    [2] => First
    [3] => Last
)
First Last Jr.
Array
(
    [0] => First Last Jr.
    [1] => 
    [2] => First
    [3] => Last
    [4] => Jr.
)
Shaqueal O’neal
Array
(
    [0] => Shaqueal O’neal
    [1] => 
    [2] => Shaqueal
    [3] => O’neal
)
D’angelo Hall
Array
(
    [0] => D’angelo Hall
    [1] => 
    [2] => D’angelo
    [3] => Hall
)

etc…

So in the $results array, $results[0] contains the entire string. $results[2] is always the first name and $results[3] is always the last name. $results[1] is the prefix and $results[4] (not always set) is the suffix. I also added code to handle both ' and ’ for names like Shaqueal O’neal and D’angelo Hall.
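As the comments below point out, \w without the u modifier only covers ASCII letters. A minimal sketch of a Unicode-aware variant follows; the function name splitNameRegex(), the named groups, and the decision to require a trailing period on the suffix are assumptions for illustration, not part of the answer above. It assumes valid UTF-8 input.

```php
<?php
// Sketch only: a Unicode-aware take on the "prefix first last suffix" pattern.
// splitNameRegex() and the group names are hypothetical, chosen for this example.
function splitNameRegex(string $name): ?array
{
    $pattern = '/^(?:(?<prefix>\p{L}+\.)\s+)?'   // optional prefix ending in a period
             . '(?<first>[\p{L}\'’-]+)\s+'       // first name: letters, apostrophes, hyphens
             . '(?<last>[\p{L}\'’-]+)'           // last name: same character set
             . '(?:\s+(?<suffix>\p{L}+\.))?$/u'; // optional suffix ending in a period

    if (!preg_match($pattern, trim($name), $m)) {
        return null; // the name does not fit this four-part shape
    }

    return [
        'prefix' => $m['prefix'] ?? '',
        'first'  => $m['first'],
        'last'   => $m['last'],
        'suffix' => $m['suffix'] ?? '',
    ];
}

var_dump(splitNameRegex('Mr. First Last'));
var_dump(splitNameRegex('Václav Havel'));
var_dump(splitNameRegex('Oscar De La Hoya')); // does not fit the pattern
```

Returning null for names that do not fit makes the 20% of cases the pattern cannot handle explicit, so the caller can fall back to storing the name unsplit.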

4
  • There are many cases where this doesn't work - see my answer below, especially for internationalization. Commented Jan 19, 2013 at 23:57
  • This is a good use of the 80/20 rule. Just be aware of the 20 for cases where that is not ok.
    – Jeff Davis
    Commented Oct 7, 2013 at 21:12
  • Brilliant. Thanks for posting this, as others have posited it's not perfect but it's often good enough. Commented Nov 16, 2013 at 21:26
  • Doesn't work if the name is like Mr. Jomon K J, but it does work for names like Mr. Jomon Johnson. Commented Dec 18, 2020 at 14:28
19

The accepted answer doesn't work for languages other than English, or for names such as "Oscar de la Hoya".

Here's something I did that I think is UTF-8 safe and works for all of those cases, building on the accepted answer's assumption that a prefix and suffix will have a period:

/**
 * splits single name string into salutation, first, last, suffix
 * 
 * @param string $name
 * @return array
 */
public static function doSplitName($name)
{
    $results = array();

    $r = explode(' ', $name);
    $size = count($r);

    //check first for period, assume salutation if so
    if (mb_strpos($r[0], '.') === false)
    {
        $results['salutation'] = '';
        $results['first'] = $r[0];
    }
    else
    {
        $results['salutation'] = $r[0];
        $results['first'] = $r[1];
    }

    //check last for period, assume suffix if so
    if (mb_strpos($r[$size - 1], '.') === false)
    {
        $results['suffix'] = '';
    }
    else
    {
        $results['suffix'] = $r[$size - 1];
    }

    //combine remains into last
    $start = ($results['salutation']) ? 2 : 1;
    $end = ($results['suffix']) ? $size - 2 : $size - 1;

    $last = '';
    for ($i = $start; $i <= $end; $i++)
    {
        $last .= ' '.$r[$i];
    }
    $results['last'] = trim($last);

    return $results;
}

Here's the PHPUnit test:

public function testDoSplitName()
{
    $array = array(
        'FirstName LastName',
        'Mr. First Last',
        'First Last Jr.',
        'Shaqueal O\'neal',
        'D’angelo Hall',
        'Václav Havel',
        'Oscar De La Hoya',
        'АБВГҐД ЂЃЕЀЁЄЖЗ', //cyrillic
        'דִּיש מַחֲזֹור', //yiddish
    );

    $assertions = array(
            array(
                    'salutation' => '',
                    'first' => 'FirstName',
                    'last' => 'LastName',
                    'suffix' => ''
                ),
            array(
                    'salutation' => 'Mr.',
                    'first' => 'First',
                    'last' => 'Last',
                    'suffix' => ''
                ),
            array(
                    'salutation' => '',
                    'first' => 'First',
                    'last' => 'Last',
                    'suffix' => 'Jr.'
                ),
            array(
                    'salutation' => '',
                    'first' => 'Shaqueal',
                    'last' => 'O\'neal',
                    'suffix' => ''
                ),
            array(
                    'salutation' => '',
                    'first' => 'D’angelo',
                    'last' => 'Hall',
                    'suffix' => ''
                ),
            array(
                    'salutation' => '',
                    'first' => 'Václav',
                    'last' => 'Havel',
                    'suffix' => ''
                ),
            array(
                    'salutation' => '',
                    'first' => 'Oscar',
                    'last' => 'De La Hoya',
                    'suffix' => ''
                ),
            array(
                    'salutation' => '',
                    'first' => 'АБВГҐД',
                    'last' => 'ЂЃЕЀЁЄЖЗ',
                    'suffix' => ''
                ),
            array(
                    'salutation' => '',
                    'first' => 'דִּיש',
                    'last' => 'מַחֲזֹור',
                    'suffix' => ''
                ),
        );

    foreach ($array as $key => $name)
    {
        $result = Customer::doSplitName($name);

        $this->assertEquals($assertions[$key], $result);
    }
}
1
  • What if the user enters First Middle Last, and the output expects "First Middle" separate from "Last"?
    – 000
    Commented Feb 4, 2014 at 0:23
6

You won't find a safe way to solve this problem. Not even a human can always tell which parts belong to the first name and which belong to the last name, especially when one of them contains several words, as in Andrea Frank Gutenberg. The middle part Frank can be a second first name, or the last name followed by the maiden name Gutenberg.

The best you can do is provide separate input fields for first name and last name and save them separately in the database. You can avoid a lot of problems this way.

2
  • 2
    To be a pedant, that assumes that people have two parts to their name, which isn't always the case. kalzumeus.com/2010/06/17/…
    – dsas
    Commented Jan 18, 2014 at 2:01
  • @dsas - This is only an example to show one possible problem. Of course not every person has several names, but that doesn't help if you are writing a software that should be able to handle all possibilities. Commented Jan 18, 2014 at 11:56
4

Don't split names. Always store people's names in full; if you want to use something shorter, add a “What should we call you?” field instead.

The reason: you cannot reliably split names. Different nations put their names in different orders anyway (e.g. in France, surname typically comes first; the same is true in some far-Eastern countries as well, but you can’t use language to detect that because emigres from those countries often interchanged their names to avoid confusion… but not all emigres.)

And some nations don’t have the expected name structure at all; e.g. in Iceland people still use patronymics rather than family names, and Russian names carry a patronymic in addition to the family name.

Even in English, there are people with double-barrelled surnames that don’t have hyphens; then there are people with Mac, Mc, De, de, Van, van and other prefix words as part of their name. It’s much better just to ignore the problem and ask more sensible questions.

If you're forced to split names for e.g. credit card processing, I'd go with something simple, like splitting at the last whitespace, rather than trying to be clever and get the split correct. It’s much more likely that the card company, if it does the splitting, will have used this naïve approach, and the goal there is to match their likely behaviour. Do complain about interfaces that only allow split names, though.
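A minimal sketch of the "split at the last whitespace" fallback described above. The helper name splitAtLastSpace() is hypothetical, and the code assumes the mbstring extension and valid UTF-8 input.

```php
<?php
// Sketch: split a full name at its last run of whitespace.
// splitAtLastSpace() is a hypothetical helper, not taken from the answer.
function splitAtLastSpace(string $fullName): array
{
    // Collapse whitespace runs so the split point is unambiguous.
    // preg_replace() returns null on invalid UTF-8, so fall back to the input.
    $fullName = trim(preg_replace('/\s+/u', ' ', $fullName) ?? $fullName);
    $pos = mb_strrpos($fullName, ' ');

    if ($pos === false) {
        // Single-word name: keep it whole rather than guessing.
        return ['first' => $fullName, 'last' => ''];
    }

    return [
        'first' => mb_substr($fullName, 0, $pos),
        'last'  => mb_substr($fullName, $pos + 1),
    ];
}

print_r(splitAtLastSpace('Oscar De La Hoya'));
print_r(splitAtLastSpace('Madonna'));
```

Everything before the last space lands in 'first', which is deliberately naive: the goal is to match what a card processor probably does, not to be linguistically correct.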

4

If you simply want to split the name by:

  • Everything up until the first "space" character as $firstName
  • Everything after the first "space" character as $lastName

you can use:

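$firstName = strstr($string . ' ', ' ', true); // text before the first space (whole string if there is none)
$lastName = ltrim(substr($string, strlen($firstName))); // the remainder, without the separating space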

It's not the most sophisticated or culturally sensitive method, but it's only two lines of code and can often get the job done on projects that don't require a high degree of precision in name splitting.

1
  • 1
    It's definitely not the most elegant solution here among other solutions, but it's dead simple and effective - in some cases. I was searching for exactly this (not that I can't write it, I was just too lazy), because sometimes you only want to separate 1 field into 2 fields. +1
    – dev_masta
    Commented Nov 6, 2020 at 0:37
3

Great library here that so far has parsed names flawlessly: https://github.com/joshfraser/PHP-Name-Parser

1
2

Not a simple problem, and to a large extent your ability to get a workable solution depends on cultural "norms".

  1. First hive off any "honorifics" using preg_replace, e.g. (extend the list of alternatives as needed):

     $normalized_name = preg_replace('/^(?:(?:Mr|Mrs|Ms|Dr|Justice)\.?\s+)*(.*)$/is', '$1', trim($input_name));
    
  2. Next hive off any trailing suffixes, along with the whitespace before them:

    $normalized_name = preg_replace('/^(.*?)\s+(?:Jr\.?|III|PhD\.?|MD\.?)$/is', '$1', $normalized_name);
    
  3. Finally split at the first blank to get a first name and last name.

Obviously in English alone there are many possible honorifics. I couldn't think of too many suffixes, but there are probably more than I listed.
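The three steps above can be sketched as a single function. The name splitWithHonorifics() is hypothetical, and the honorific and suffix lists are short illustrative samples rather than anything complete.

```php
<?php
// Sketch of the three steps: strip honorifics, strip a suffix, split at the first blank.
// The lists inside the patterns are assumptions and deliberately short.
function splitWithHonorifics(string $input): array
{
    $name = trim($input);

    // 1. Strip leading honorifics (possibly several, e.g. "Mr. Justice").
    $name = preg_replace('/^(?:(?:Mr|Mrs|Ms|Dr|Justice)\.?\s+)+/i', '', $name);

    // 2. Strip one trailing suffix together with the whitespace before it.
    $name = preg_replace('/\s+(?:Jr|Sr|III|PhD|MD)\.?$/i', '', $name);

    // 3. Split at the first blank; a single-word name yields an empty last name.
    $parts = explode(' ', $name, 2);

    return ['first' => $parts[0], 'last' => $parts[1] ?? ''];
}

print_r(splitWithHonorifics('Mr. Justice John Smith Jr.'));
print_r(splitWithHonorifics('Dr. Vincent van Gogh'));
```

Requiring whitespace after each honorific keeps a first name such as "Drew" from losing its leading "Dr", but a person whose first name really is "Justice" would still be mis-parsed.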

2

There is another solution:

// First, just for safety make replacement '.' for '. '
$both = str_replace('.', '. ', $both);

// Now delete titles
$both = preg_replace('/[^ ]+\./', '', $both);

// Collapse any run of whitespace into a single space
$both = trim(preg_replace('/\s+/', ' ', $both));

// Explode
$split = explode(" ", $both, 2);
if( count($split) > 1 ) {
    list($name, $surname) = $split;
} else {
    $name = $split[0];
    $surname = '';
}
1

First split the name into FIRST/LAST, then concatenate the prefix.

Using the example above:

Vincent van Gogh

The first name is the first element of the array. Whatever comes after the first name is the last name, so you just need to join the remaining array elements.

After that, you concatenate the prefix/suffix.

Mr. Vincent van Gogh
Vincent van Gogh jr.

1
  • 1
    So the first name of "Jan Willem van Gogh" would be "Jan" and the last name "Willem van Gogh"? Commented Dec 10, 2014 at 9:58
0

If you have a database, I'd create columns called prefix and suffix, then run queries to extract those portions from the text.

UPDATE names SET prefix = 'mr.' WHERE name LIKE 'mr. %';
UPDATE names SET name = substring(name, 5) WHERE name LIKE 'mr. %';

This way you keep the different prefixes in the database. It works well because it is a batch statement, you can add as many prefixes or suffixes to the scan as you like, and it doesn't take long to build.

Once all prefixes and suffixes have been removed this way, you can split on the first space.

0

Assuming you don't care about the Mr. or Jr. part and that $text contains the name:

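$textarray = explode(" ", $text);

// Drop any part that contains a period (e.g. "Mr." or "Jr.")
foreach($textarray as $key => $value)
{
    if (preg_match("/\./", $value))
    {
        unset($textarray[$key]);
    }
}

// Re-index so the remaining parts start at 0
$first_last = array_values($textarray);

$firstname = $first_last[0];
$lastname = $first_last[1];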

$firstname will be the first name and $lastname will be the last name. Not the cleanest way to do it, but it's a possibility.

0

Another Solution:

function getFirstLastName($fullName) {
    $fullName = $firstLast = trim($fullName);
    if (preg_match('/\s/', $fullName)) {
        $first = mb_substr($fullName, 0, mb_strpos($fullName, " "));
        // Everything after the last space; mb_strrpos() is multibyte-safe, unlike strrev()
        $last = mb_substr($fullName, mb_strrpos($fullName, " ") + 1);
        $firstLast = $first . " " . $last;
    }
    return $firstLast;
}

Hope that is useful!

0

I always suggest capturing as much independent data from the user, as possible, while only requiring data that is needed for functions to work properly. Using this method allows for multiple formatting and name construction scenarios.

Independently capturing the following fields, at the end-user level, will likely remove the need for parsing, or, at least, weed out parsing issues with special characters or split names, such as ... "St. John", "de la Hoya", and "Jr. III".

  • salutation  (e.g. Mr., Ms., Dr., etc.)
  • givenname  (e.g. John, Mary-Catherine, Mary Lou, etc.)
  • middlename  (e.g. Davis, Alysia-Anne, D'Marco, etc.)
  • surname  (e.g. de la Hoya, Smith-Peters, St. John, etc.)
  • suffix  (e.g. Sr., Jr., Jr. III, etc.)

Once captured, these names can be rearranged, constructed, or formatted dynamically as the programmer or end-user (option provided by programmer) sees fit.
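As a rough sketch of the "rearrange or format dynamically" step, the function below joins whichever fields were filled in. The name formatName() is hypothetical, and the array keys simply mirror the field list above.

```php
<?php
// Sketch: build a display name from separately captured fields.
// formatName() is a hypothetical helper; keys follow the field list above.
function formatName(array $fields): string
{
    $order = ['salutation', 'givenname', 'middlename', 'surname', 'suffix'];
    $parts = [];

    foreach ($order as $key) {
        $value = trim($fields[$key] ?? '');
        if ($value !== '') {
            $parts[] = $value; // skip fields the user left blank
        }
    }

    return implode(' ', $parts);
}

echo formatName(['givenname' => 'Oscar', 'surname' => 'de la Hoya']), PHP_EOL;
echo formatName(['salutation' => 'Dr.', 'givenname' => 'Mary Lou', 'surname' => 'St. John', 'suffix' => 'Jr.']), PHP_EOL;
```

Because each field is stored as entered, multi-word values such as "de la Hoya" or "St. John" never need to be parsed back out.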

0

This is my function using regular expressions. It can easily be extended with further rules, e.g. more academic titles.

function names($name) {
    $replace = [
        '/[,:]/' => '',
        // Academic degrees Czech Republic
        '/(doc\.|Ing\.|Ph\.D\.|Bc\.|Dr\.|RNDr\.|PhDr\.|JUDr\.|MDDr\.|MVDr\.|DiS\.|Dr\.|prof\.)/i' => '',
        // Academic degrees USA
        '/(B\.A\.|B\.S\.|M\.A\.|M\.S\.|Ed\.D\.|Ph\.D\.)/i' => '',
        '/^(mr|mrs|ms|miss|sr|sir)\.? /i' => '',
        '/ (jr|sr)\.?$/i' => '',
        // multi spaces, new lines etc.
        '/\s+/mu' => ' ',
    ];
    $n = preg_replace(array_keys($replace), $replace, trim($name));
    if (strpos($n, ' ') !== false) {
        $names = preg_split('/[\s,]+/', trim($n));
        return ['first' => reset($names), 'last' => end($names)];
    }
}

Here are some tests:

foreach (
    [
        'Robert Downey Jr.',
        'Billy Bob Thornton',
        'John O\'Shea',
        'Sir Nicholas George Winton',
        'Billy el Niño',
        'Mr. Bean',
        'Miss Eve Moneypenny',
        'Miss Moneypenny',
        'D’angelo Hall',
        'Garry        Longhurst    Spaces',
        'doc. Ing. Ota Plk, Ph.D.',
        'J. J. Abrams',
        'Bruce A Johnson',
    ] as $name
) {
    echo 'Name: ' . $name . PHP_EOL . var_export(names($name), true) . PHP_EOL . str_repeat('-', 35) . PHP_EOL;
}

And the results:

Name: Robert Downey Jr.
array (
  'first' => 'Robert',
  'last' => 'Downey',
)
-----------------------------------
Name: Billy Bob Thornton
array (
  'first' => 'Billy',
  'last' => 'Thornton',
)
-----------------------------------
Name: John O'Shea
array (
  'first' => 'John',
  'last' => 'O\'Shea',
)
-----------------------------------
Name: Sir Nicholas George Winton
array (
  'first' => 'Nicholas',
  'last' => 'Winton',
)
-----------------------------------
Name: Billy el Niño
array (
  'first' => 'Billy',
  'last' => 'Niño',
)
-----------------------------------
Name: Mr. Bean
NULL
-----------------------------------
Name: Miss Eve Moneypenny
array (
  'first' => 'Eve',
  'last' => 'Moneypenny',
)
-----------------------------------
Name: Miss Moneypenny
NULL
-----------------------------------
Name: D’angelo Hall
array (
  'first' => 'D’angelo',
  'last' => 'Hall',
)
-----------------------------------
Name: Garry        Longhurst    Spaces
array (
  'first' => 'Garry',
  'last' => 'Spaces',
)
-----------------------------------
Name: doc. Ing. Ota Plk, Ph.D.
array (
  'first' => 'Ota',
  'last' => 'Plk',
)
-----------------------------------
Name: J. J. Abrams
array (
  'first' => 'J.',
  'last' => 'Abrams',
)
-----------------------------------
Name: Bruce A Johnson
array (
  'first' => 'Bruce',
  'last' => 'Johnson',
)
-----------------------------------
0

How about this one -- sans regex:

function explode_name($name)
{
    $honorifics = "Mr. Mister Mrs. Misses Ms. Miss Mademoiselle Mlle Madam Fräulein Justice Sir. Dr. Lady Lord";
    $lastname_prefixes = "Van Von Mc";
    $suffixes = "Sr. Snr. Jr. Jnr. I II III IV V PhD PhyD Ph.D. AB A.B. BA B.A. BE B.E. B.F.A. BS B.S. B.Sc. MS M.S. M.Sc. MFA M.F.A. MBA M.B.A. JD J.D. MD M.D. DO D.O. DC D.C. EdD Ed.D. D.Phil. DBA D.B.A. LLB L.L.B. LLM L.L.M. LLD L.L.D. CCNA OBE MMFT DMFT MSC MSW DSW MAPC MSEd LPsy LMFT LCSW LMHC LCMHC CMHC LMSW LPCC LPC LCPC LPC-S LCAT";
    $name_parts = explode(' ', $name);
    $name_array = ['honorific'=>'', 'first'=>'', 'middle'=>'', 'last'=>'', 'suffix'=>''];

    // Look for Honorifics
    if (stripos($honorifics, $name_parts[0]) !== false)
    {
        // Shift the honorific off the front of the name_parts array.
        // This also has the effect that the honorific isn't there to
        // confuse things later.
        $name_array['honorific'] = array_shift($name_parts);
    }

    // Look for name suffixes
    if (stripos($suffixes, $name_parts[count($name_parts)-1]) !== false)
    {
        // Pop the suffix off the end of the name_parts array, with the
        // added benefit that the suffix won't be there to muck things
        // up later on.
        $name_array['suffix'] = array_pop($name_parts);
    }

    $num_parts = count($name_parts);

    if ($num_parts == 0)
    {
        $name_array['first'] = $name;
        return $name_array;
    }
    else if ($num_parts == 1)
    {
        $name_array['first'] = $name;
        return $name_array;
    }
    else if ($num_parts == 2)
    {
        $name_array['first'] = $name_parts[0];
        $name_array['last'] = $name_parts[1];
        return $name_array;
    }
    else if ($num_parts == 3)
    {
        // Well then, things are a bit more dodgy, what?
        if (stripos("LLC Inc Store", $name_parts[2]) !== false)
        {
            // Then we assume this is a business name, so put it all in the
            // first name
            $name_array['first'] = $name;
            return $name_array;
        }
        else if (stripos($lastname_prefixes, $name_parts[1]) !== false)
        {
            // Assume the last two parts are all part of the last name (and
            // there's no middle name)
            $name_array['first'] = $name_parts[0];
            $name_array['last'] = $name_parts[1].' '.$name_parts[2];
            return $name_array;            
        }
        else
        {
            // Assume it's a first, middle, last affair
            $name_array['first'] = $name_parts[0];
            $name_array['middle'] = $name_parts[1];
            $name_array['last'] = $name_parts[2];
            return $name_array;            
        }
    }
    else
    {
        if (stripos($lastname_prefixes, $name_parts[2]) !== false)
        {
            // Assume it's a first, middle, last with one of those two part
            // last names.
            $name_array['first'] = $name_parts[0];
            $name_array['middle'] = $name_parts[1];
            // Concatenate the rest (restoring the stripped-out spaces)
            // into the last name.
            for ($i=2; $i<$num_parts; ++$i)
            {
                $name_array['last'] .= $name_parts[$i].' ';
            }
            // Trim off that trailing space; trim() returns the result, so assign it back
            $name_array['last'] = trim($name_array['last']);
            return $name_array;            
        }
        else
        {
            // Not sure what is going on, so just put it all in the first name!
            $name_array['first'] = $name;
            return $name_array;
        }
    }
}

Test Code:

<table>
<tr><th>Full Name</th><th>Honorific</th><th>First</th><th>Middle</th>
    <th>Last</th><th>Suffix</th></tr>

<?php

$names = [
    "Gorzik von Gribblesnatch",
    "Dr. Philip Plimpton",
    "Dr Phil Dorselfin",
    "Reginald Klompkite III",
    "Dumpquip Higganog PhD",
    "SlumpGlum Muganerk",
    "Mr. Poon Noon",
    "Sir Geldin Blotchflooper",
    "Betsy Burger MMFT",
    "Dr. Grodd Mc Doogle",
    "Dr. Wilken Mc Dermott II",
    "Karen Debbie Donk",
    "Ferg Fleerper Fiddlenonk IV",
    "Quinten K. Flonk",
    "Dr Klonk Xiggle Bronhopper PhD",
    "Dr Blenton Flupp Yonkflibber",
];

foreach ($names as $name)
{
    echo "<tr>\n";
    $name_ex = explode_name($name);
    echo "<td>$name</td><td>{$name_ex['honorific']}</td><td>{$name_ex['first']}</td><td>{$name_ex['middle']}</td><td>{$name_ex['last']}</td><td>{$name_ex['suffix']}</td>\n";
    echo "</tr>\n";
}
?>
</table>    

And Results:

    table {
        background-color: #ccc;
        border: 2px solid black;
    }
    td, th {
        padding: 4px 8px;
    }
    td {
        background-color: #0ff;
    }
<table>
            <tr><th>Full Name</th><th>Honorific</th><th>First</th><th>Middle</th><th>Last</th><th>Suffix</th></tr>

<tr>
<td>Gorzik von Gribblesnatch</td><td></td><td>Gorzik</td><td></td><td>von Gribblesnatch</td><td></td>
</tr>
<tr>
<td>Dr. Philip Plimpton</td><td>Dr.</td><td>Philip</td><td></td><td>Plimpton</td><td></td>
</tr>
<tr>
<td>Dr Phil Dorselfin</td><td>Dr</td><td>Phil</td><td></td><td>Dorselfin</td><td></td>
</tr>
<tr>
<td>Reginald Klompkite III</td><td></td><td>Reginald</td><td></td><td>Klompkite</td><td>III</td>
</tr>
<tr>
<td>Dumpquip Higganog PhD</td><td></td><td>Dumpquip</td><td></td><td>Higganog</td><td>PhD</td>
</tr>
<tr>
<td>SlumpGlum Muganerk</td><td></td><td>SlumpGlum</td><td></td><td>Muganerk</td><td></td>
</tr>
<tr>
<td>Mr. Poon Noon</td><td>Mr.</td><td>Poon</td><td></td><td>Noon</td><td></td>
</tr>
<tr>
<td>Sir Geldin Blotchflooper</td><td>Sir</td><td>Geldin</td><td></td><td>Blotchflooper</td><td></td>
</tr>
<tr>
<td>Betsy Burger MMFT</td><td></td><td>Betsy</td><td></td><td>Burger</td><td>MMFT</td>
</tr>
<tr>
<td>Dr. Grodd Mc Doogle</td><td>Dr.</td><td>Grodd</td><td></td><td>Mc Doogle</td><td></td>
</tr>
<tr>
<td>Dr. Wilken Mc Dermott II</td><td>Dr.</td><td>Wilken</td><td></td><td>Mc Dermott</td><td>II</td>
</tr>
<tr>
<td>Karen Debbie Donk</td><td></td><td>Karen</td><td>Debbie</td><td>Donk</td><td></td>
</tr>
<tr>
<td>Ferg Fleerper Fiddlenonk IV</td><td></td><td>Ferg</td><td>Fleerper</td><td>Fiddlenonk</td><td>IV</td>
</tr>
<tr>
<td>Quinten K. Flonk</td><td></td><td>Quinten</td><td>K.</td><td>Flonk</td><td></td>
</tr>
<tr>
<td>Dr Klonk Xiggle Bronhopper PhD</td><td>Dr</td><td>Klonk</td><td>Xiggle</td><td>Bronhopper</td><td>PhD</td>
</tr>
<tr>
<td>Dr Blenton Flupp Yonkflibber</td><td>Dr</td><td>Blenton</td><td>Flupp</td><td>Yonkflibber</td><td></td>
</tr>
        </table>
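One caveat with the stripos() lookups in explode_name(): they match substrings, so a first name such as "Lo" is found inside "Lord" and would be treated as an honorific. A sketch of an exact, case-insensitive token lookup follows; isListedToken() is a hypothetical helper and the list is a shortened sample of the one above.

```php
<?php
// Sketch: exact, case-insensitive token lookup instead of substring search.
// isListedToken() is a hypothetical helper, not part of explode_name().
function isListedToken(string $token, string $spaceSeparatedList): bool
{
    // Compare without a trailing period so "Dr" and "Dr." are treated alike.
    $normalize = function (string $s): string {
        return mb_strtolower(rtrim($s, '.'));
    };

    $listed = array_map($normalize, explode(' ', $spaceSeparatedList));

    return in_array($normalize($token), $listed, true);
}

$honorifics = 'Mr. Mrs. Ms. Miss Dr. Sir Lady Lord';

var_dump(isListedToken('Dr', $honorifics)); // listed, with or without the period
var_dump(isListedToken('Lo', $honorifics)); // only a substring of "Lord", so not listed
```

The same helper could stand in for the suffix check, where short tokens such as "Ed" would otherwise match inside "Ed.D.".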

0

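If your PHP version is >= 7.1, you can get the first_name and last_name using array destructuring. Limiting explode() to two parts and padding the result keeps single-word names from raising an undefined-offset warning:

[$first_name, $last_name] = array_pad(explode(' ', $full_name, 2), 2, '');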
