5
\$\begingroup\$

I gave problem #22 on Project Euler a try in Python3.

Other than suggested in the instructions, I used requests to fetch the file to work on.

The code below does produce the correct result but takes no measures to catch exceptions if something breaks, e.g. when the download is not successful.

I managed to split a string, in which names in double quotes are separated by a comma and turn it to a list, but I'm wondering if there isn't any better or more pythonic way.

As far as the summation of int in a list is concerned, I'm aware that reduce from functools might be an alternative, but for my first contribution here, I decided to keep it (more) simple.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''Project Euler - Problem 22 - Name scores
URL: https://projecteuler.net/problem=22

Resource (names.txt): https://projecteuler.net/project/resources/p022_names.txt

Instructions: Using names.txt (right click and 'Save Link/Target As...'), 
a 46K text file containing over five-thousand first names, begin by 
sorting it into alphabetical order. Then working out the alphabetical value 
for each name, multiply this value by its alphabetical position in the list 
to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is 
worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. 
So, COLIN would obtain a score of 938 × 53 = 49714.

Assignment: What is the total of all the name scores in the file?
'''

import sys
import requests

URL = 'https://projecteuler.net/project/resources/p022_names.txt'

def fetch_names(url):
    '''Get a text file with names and return a sorted list

    Parameters
    ----------
    url : str  
        url of a text file at Project Euler.

    Returns
    -------
        alphabetically sorted list of first names
    '''

    r = requests.get(url)
    # r.text is a string with names
    # the names are in capital letters, enclosed in double quotes
    # and separated by commas
    names = [name.strip('"') for name in r.text.split(',')]
    return sorted(names)


def calculate_name_score(pos, name):
    ''' 
    Calculate the "name sore" of a name at a specific position in a list

    Parameters
    ----------
    pos : int 
        zero-based index of a name in a list
    name : str
        name in CAPS

    Returns
    -------
    name_score : int

    '''

    # letter scores: 'A' = 1, 'B' = 2, etc.!
    ch_scores = [ord(ch) - 64 for ch in name]
    score = sum(ch_scores)
    # Project Euler: index of the names starts with 1!!
    name_score = score * (pos + 1)
    return name_score


def calculate_total_score(names):
    '''Return a "name score" for a list of names

    Parameters
    ---------- 
        names : list (of strings)

    Returns
    -------
        total_score : int

    '''
    scores = [calculate_name_score(pos, name) for (pos, name) in enumerate(names)]
    total_score = sum(scores)
    return total_score    


def main():
    names  = fetch_names(URL)
    result = calculate_total_score(names)
    print(result)

if __name__ == '__main__':
    sys.exit(main())

I'm grateful for any improvement, regardless whether it concerns the structuring of code, naming, documentation, or anything you can think of.

\$\endgroup\$
1
  • 1
    \$\begingroup\$ just curious came across Project Euler first time. Who prepares the problems on this site. I mean no info about creator of the site is given in its About page. Just curious. \$\endgroup\$
    – Mahesha999
    Commented Dec 31, 2015 at 13:57

3 Answers 3

6
\$\begingroup\$

Code looks pretty good. Nicely commented, well-structured, seems to do what it’s supposed to do. Just a few small suggestions from me:

  • Don’t use magic numbers. In this line, it’s not immediately obvious what 64 means:

    ch_scores = [ord(ch) - 64 for ch in name]
    

    If I think a little, and find that ord('A') = 65, I can work out what’s happening here. But it would be even clearer to write:

    ch_scores = [ord(char) - ord('A') + 1 for char in name]
    

  • Cache the results from ord()? You’ll be making a call to ord() at least once for every letter in every name; twice if you adopt my strategy. It might be better to precompute the scores in a global dictionary, e.g.:

    import string
    
    CHAR_SCORES = {char: ord(char) - ord('A') + 1
                   for char in string.ascii_uppercase}
    

    and then just look up scores in this dictionary. It probably won’t make much of a difference for this problem, but it’s something to bear in mind.

  • Use generator comprehensions unless you need a list. A generator comprehension is like a list comprehension, but instead of precomputing all the elements upfront and storing them in a list, it computes-and-yields one element at a time. That can have big savings in memory.

    Both of your list comprehensions get passed straight to sum(), which can just get one element at a time – it isn’t jumping around the list.

    The change is minimal – just replace the square brackets of a list comprehension with parentheses instead, i.e.

    scores = (calculate_name_score(pos, name) for (pos, name) in enumerate(names))
    
\$\endgroup\$
1
  • 1
    \$\begingroup\$ You may go further and don't call ord at all using enumerate(string.ascii_uppercase, start=1) \$\endgroup\$
    – RiaD
    Commented Dec 31, 2015 at 16:02
5
\$\begingroup\$

Your program is neat: concise, readable, following conventions and well documented. Good job. Everything I’ll say, except for sys.exit is more or less nitpicking or subject to personal taste.

Exiting the program

When the control flow reaches the end of the file, the program exits with a status of 0. The same occurs when calling sys.exit(0) or sys.exit(None). Since main does not have a return statement, it returns None. Thus the call to exit is not needed and you can remove it.

The last line should simply be:

    main()

Docstrings

The docstring for calculate_name_score contains a typo ("sore") and should start on the same line than the '''.

Generator expressions vs. list-comprehensions

You tend to build lists out of list-comprehensions as an intermediate step of an other computation (sorted, sum, ...). Since you do not re-use this list, it is more memory-friendly to use a generator expression directly in the last step. For instance:

sum(ord(ch) - 64 for ch in name)
sorted(name.strip('"') for name in r.text.split(','))

Constants

64 is kind of a magic number here. It would have been better to define

A_OFFSET = ord('A') - 1

at the top of the file and used that instead.

Separation of concerns

Given their names, I would have computed only the score for a name in calculate_name_score and compute it's product with its position in calculate_total_score.

Something along the lines of:

def calculate_name_score(name):
    # letter scores: 'A' = 1, 'B' = 2, etc.!
    return sum(ord(ch) - 64 for ch in name)    

def calculate_total_score(names):
    return sum(calculate_name_score(name) * (pos + 1) for (pos, name) in enumerate(names))

Even better, you can use the second (optional) argument of enumerate to avoid computing pos + 1 each time. You also don't really need the parenthesis around pos, name to define the tuple:

def calculate_total_score(names):
    return sum(calculate_name_score(name) * pos for pos, name in enumerate(names, 1))
\$\endgroup\$
1
  • \$\begingroup\$ I was totally unaware of the optional argument in enumerate. Thanks a lot! \$\endgroup\$ Commented Dec 31, 2015 at 10:56
2
\$\begingroup\$

Your code looks readable in general.

Avoid useless variables

score = sum(ch_scores)
# Project Euler: index of the names starts with 1!!
name_score = score * (pos + 1)
return name_score

Becomes:

return sum(ch_scores) * (pos + 1)

The single expression is more immediate to understand. The same applies to other parts of the code.

Static typing

I see you like static typing, instead of writing it in the docstring you should take advantage of the new typing module and function annotation.

\$\endgroup\$
3
  • 1
    \$\begingroup\$ Your remark on useless variables is very interesting! Actually, my return was exactly as you have suggested and I only introduced the additional variable to have something more catchy to refer to in the docstring :D But I totally agree that the additional variable is in fact useless. Am I right that typing refers to PEP-0484? That looks interesting, thanks a lot! \$\endgroup\$ Commented Dec 31, 2015 at 9:35
  • \$\begingroup\$ @KlausWarzecha Yes I meant pep 0484 \$\endgroup\$
    – Caridorc
    Commented Dec 31, 2015 at 9:43
  • \$\begingroup\$ I disagree, I think adding the extra variables makes it more clear, especially when debugging. \$\endgroup\$ Commented Dec 31, 2015 at 16:41

Not the answer you're looking for? Browse other questions tagged or ask your own question.