Python program to properly format North American phone numbers

Question

I made this program to properly format a North American phone number. If its a 10-digit phone number (9999999999) it adds a 1 in the beginning and adds in dashes (1-999-999-9999). If its 11 digits (with a 1 at the beginning, 19999999999) we add in dashes (1-999-999-9999). Otherwise, its not a valid North American phone number.

def NorthAmerica(phone_number):
    if phone_number.isdigit():
        phone_number = phone_number
    else:
        raise Exception("Error in setting phone number:\n     Phone number has to be numbers only")

    if len(phone_number) == 10:
        pass
    elif len(phone_number) == 11 and str(phone_number)[0] == '1':
        pass
    else:
        raise Exception("Error in formatting phone number as North American:\n     North American phone numbers have to be 10 digits, or 11 digits with the digit before the area code being a 1")
    
    number_list = list(phone_number)

    if len(number_list) == 10:
        number_list.insert(0, "1")
        number_list.insert(1, "-")
        number_list.insert(5, "-")
        number_list.insert(9, "-")
                
    elif len(number_list) == 11:
        number_list.insert(1, "-")
        number_list.insert(5, "-")
        number_list.insert(9, "-")

    final_number = "".join(number_list)

    return final_number

Any comments or ideas on how to prove it?

There is literally nothing accomplished by saying phone_number = phone_number — UpAndAdam, Commented Sep 28, 2023 at 18:30
You are calling str() on the input value. Presumably, that's meant to handle integer inputs. But the first line of the function would fail on integers. Just commit to str or str-like input. — JimmyJames, Commented Sep 28, 2023 at 19:30
Note that every so often you see letters in phone numbers (still seen on touchtone phone buttons). It used to be much more common, with your number being something like "MU8-1234" using the local prefix MUrrey-8. You have a number of assumptions built into your function. — Jon Custer, Commented Sep 29, 2023 at 17:51
OP - If you want "proper" formatting for US phone numbers its going to be something like (609) 555-1212. We don't use dashes except for separating the last two parts, and we typically only include the leading 1 under protest as it isn't part of someone's actual phone number. Its just something the phone company makes us key in sometimes — StingyJack, Commented Sep 30, 2023 at 7:46
To tack on to @JonCuster's point, there are some businesses that have advertise phone numbers that are longer than 10 characters. An example would be "1 (800) BUY MY JUNK". The actual number would only be the "1 (800) 289 6958" (BUY MYJU). This is OK for them to do because the last "65" (NK) is ignored when dialing. — StingyJack, Commented Sep 30, 2023 at 7:52

Reinderien · Accepted Answer · 2023-09-28 12:21:16Z

Your reference specification for this implementation should be based on something like Wikipedia's conventions for writing telephone numbers. NANP asks for (NPA) NXX-XXXX but Canada asks for 1-NPA-NXX-XXXX; some applications may also need a + trunk prefix. So if your format was chosen with this in mind, then fine; but if it was chosen based on assumption then this assumption might not hold true.

Don't raise a bare Exception; probably raise a ValueError (or an exception class of your own).

Your if;pass should be replaced with a single if not .... For long predicates you can surround them in parens and split them over multiple lines.

Rather than coercing to a list, inserting and then re-joining, I find it more natural to slice on the original string, something like

f'{phone_number[0]}-{phone_number[1:4]}- ' ...

I think this function would be more useful if it unconditionally chose one format: trunk code or no trunk code, instead of allowing the original format to be preserved.

Add unit tests; doctests are easiest.

Suggested

def format_canada(phone_number: str) -> str:
    """
    :param phone_number: String of 10 or 11 digits
    :return: Phone number string in form 1-XXX-XXX-XXXX

    >>> format_canada('2343456789')
    '1-234-345-6789'

    >>> format_canada('12345678910')
    '1-234-567-8910'
    """
    if not phone_number.isdigit():
        raise ValueError(f'Phone number "{phone_number}" has to be numbers only')

    if len(phone_number) == 10:
        phone_number = '1' + phone_number
    elif len(phone_number) == 11:
        if phone_number[0] != '1':
            raise ValueError(f'Invalid trunk code {phone_number[0]}')
    else:
        raise ValueError(f'Phone number "{phone_number}" has invalid length')

    return f'{phone_number[0]}-{phone_number[1:4]}-{phone_number[4:7]}-{phone_number[7:]}'

\$\begingroup\$ The f-string use is much nicer \$\endgroup\$
– qwr
Commented Sep 28, 2023 at 18:31 — qwr, Commented Sep 28, 2023 at 18:31

QuasiStellar · Accepted Answer · 2023-09-28 12:15:39Z

I don't know your exact use-case, but if you want to be able to format any type of input users may consider valid, your validation is too strict.

For example these are not valid numbers according to your code:

999-999-9999 +19999999999 +1-999-999-9999

since they contain symbols you're not expecting to find there.

One way to solve it is to simply remove all non-numeric symbols from the string before we process it:

''.join(char for char in phone_number if char.isdigit())

Or if you're a regular expressions enjoyer:

import re

re.sub('[^0-9]', '', phone_number)

This will consider 9999999999abcde a valid number, but do we really care? No one will be deliberately adding those symbols there as phone numbers never include them. If there are any extra symbols that shouldn't be there, we can tolerate them.

But if you want to deem such numbers as invalid anyway, we can remove only + and - symbols:

phone_number.replace('+', '').replace('-', '')

Or if you're a regular expressions enjoyer:

import re

re.sub('\+|-', '', phone_number)

Nitpicks:

Functions in python are named using snake_case, not CamelCase:

def north_america(...)

Functions' names (and any other names for that matter) should tell us what the purpose of the object is. In this case "north america" doesn't tell us anything - it's only obvious to the author of the code.

def parse_number_as_north_american(...)

Long names are not a problem as long as you can tell what these names actually mean.

final_number = "".join(number_list)

return final_number

can be shortened to

return "".join(number_list)

Creating a new name doesn't help us understand the code, since it's clear that the number we're returning is "final".

Mostly OK. But in your function naming you're conflating parsing and formatting, if they're to be done in the same step. If you really wanted to keep them in the same function, perhaps "normalize". — Reinderien, Commented Sep 28, 2023 at 11:54
"No one will be deliberately adding those symbols there as phone numbers never include them" Well, they actuallly did it ;) — Bob__, Commented Sep 29, 2023 at 7:50
I would check for/remove any non-digit character before parsing it; that would take care of most of the formatting concerns and also help weed out the "adding letters to the phone number" issue. You say people won't do that—it would be more correct to say that correct, rational, valid users wouldn't do that intentionally. If there's any chance that letters could get into the number (i.e. if it's ever exposed to users), you should probably assume it's possible that they will, and take the safest route: filter out non-digits, then just format/mask it however you like once you're done. — Aos Sidhe, Commented Sep 29, 2023 at 14:07

Stephan Samuel · Accepted Answer · 2023-09-29 14:28:32Z

If you are serious about spending significant resources to parse and manage phone numbers, your library should probably be a class or even module.

Better yet, use the existing library for this. However, your requirements may not allow this for various reasons.

Some items to consider for why:

Others have (convincingly) pointed out that you are conflating parsing and formatting. Have an object that can parse multiple formats to a common internal format, then format for output. This variously splits these two things for consumers of your library to use sensibly. Otherwise you hold them at your mercy in terms of (somewhat arbitrary) design decisions.
This pattern is common in Python, for instance pathlib versus os.path. One may use one-off string functions in os.path or manage multiple successive modifications with a pathlib.Path. You may not need as many manipulations (are there many that are applicable to phone numbers, that you plan to implement?) but another common example is the core library datetime.
Using this model allows the possibility of having an object that has been parsed (perhaps in a failure condition) that can be queried for validity. You have one semi-limbo state after your first contract check and before the second; with the current code it is fine but if your requirements change (specifically, if a requirement is added to allow non-numeric characters, which is not outlandish if you are parsing input from humans) you may end up having an indeterminate state between the first two blocks. Having an object allows you to manage this state gracefully.
Even if you aren't so fussy about that sort of state, you're missing a great number of subtleties both from Python and the parsing of NA phone numbers. To the former: does your function take a string or an int, or both? Your code (using .isdigit() up front) seems to require a string yet the string your require must be a number encoded as a string. What if you're just given the number because some upstream library realizes it's one and does a quiet conversion for you? To the latter: is 312-000-0000 a valid phone number? Your function thinks, "yes," but the actual answer is, "no." Using a class allows you to make a distinction between, "it looks like a valid phone number," and, "it's a valid phone number."
A class would allow you to manage the difference between American and Canadian formatting (as specified in another answer), as well as easily grow to other formats.

If you are not interested in providing richer functionality (variable parsing and formatting), such as if you really only need to figure out a single boolean yes/no as to whether or not a particular string may possibly represent a North American phone number, consider using a single regex, that you probably don't need to write yourself (or an SO example). This is an exceedingly well-solved problem that you have invented a new and, "creative," solution to.

If you absolutely must use this solution, consider writing test cases in a test framework:

import unittest

class TestNorthAmericaPhoneNumberValidator(unittest.TestCase):

    # These cases should all pass.
    HAPPY = [
        8305551212,   # Just an ordinary phone number.
        "2120000000"  # Looks ordinary, but completely invalid. Should pass though.
    ]

    # These cases should not pass.
    SAD = [
        None, 0, "", False, # Degenerate cases should fail.
        "800ABCDEFG",       # Actual real phone number, but not valid per our reqs.
        12345               # Not enough digits.
    ]

    def test_happy_paths(self) -> None:
        from [my_code_file] import NorthAmerica as _NA
        for test_case in self.HAPPY:
            with self.subTest(test_case):
                self.assertEqual(test_case, _NA(test_case))

    def test_sad_paths(self) -> None:
        from [my_code_file] import NorthAmerica as _NA
        for test_case in self.SAD:
            with self.subTest(test_case):
                try:
                    _NA(test_case)
                    self.fail(test_case)
                except:
                    pass

if __name__ == "__main__":
    unittest.main()

This is an extremely simplistic test framework for your function but it does allow you to add as many more tests and run the full set every time. Make sure you test for things like Unicode input, lots of bad formats, etc. When someone comes to you with a bug report, you can add a test case for it (always add, never remove) then fix around it.

Also: you ought to do all the nitpicks from @QuasiStellar's answer. They're politely framed as minor but they follow PEP 8. At my code reviews, I expect legitimate explanations for every departure from standard. The easiest way to pass my review is by simply always following the standard. If at all possible, use a linter that does SCA and is aware of PEP 8, then follow every recommendation it gives you. I can think of ways in which to code this functionality that may require a departure from standard, however it is highly unlikely that you should need even one single departure for anything in this domain.

Side note: I'm surprised that a little bit of Google searching didn't turn up any good, canonical lists of example passing and failing phone numbers to try on this function for automated testing. Either I don't know where to find these things or someone needs to write a website to collect these.

Stack Exchange Network

Python program to properly format North American phone numbers

3 Answers 3

Suggested

Nitpicks:

Not the answer you're looking for? Browse other questions tagged
python
or ask your own question.

Hot Network Questions

Python program to properly format North American phone numbers

3 Answers 3

Suggested

Nitpicks:

Not the answer you're looking for? Browse other questions tagged python or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
python
or ask your own question.