3
\$\begingroup\$

This is a simple exercise to find the count of digits of all numerals in a given base up to a given limit (not including the limit), I wrote it to determine the ratio of bytes saved when the numbers are stored as integers rather than strings.

The logic is simple, in base-10, there are 10 natural numbers that can be expressed with only one digit: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. To express ten, you need two decimal digits. To express one hundred, three digits are needed. There are exactly 90 two-digit numerals.

So the sequence of count of numerals with n decimal digits is 10, 90, 900, 9000 and so on.

In binary, the count of one-bit numerals is 2, the count of two-bit numerals is also 2, and the count of three-bit numerals is 4. In general, the count of n digit numerals in base b is Sn - Sn - 1, where Si = bi for all i > 0, and S0 = 0, i must be a natural number.

So I just calculated the count of numerals of each length category below the limit, and summed their product with their one-based index.

def digits_total(limit: int, base: int = 10) -> int:
    counts = []
    power = 1
    last = 0
    while (power := power * base) < limit:
        counts.append(power - last)
        last = power
    limit -= last
    counts.append(limit)
    return sum(i * e for i, e in enumerate(counts, start=1))


def bytes_over_digits(limit: int) -> float:
    n = 1 << (8 * limit)
    return digits_total(n, 256) / digits_total(n)

I calculated the ratio of count of bytes of natural numbers up to 280000 when they are stored as integers (255 -> 1 byte, 65535 -> 2 bytes, 16777215 -> 3 bytes...) to the count of bytes when they are stored as decimal strings ('256' -> 3 bytes), and the result is somewhat close to what I obtained using math:

In [289]: bytes_over_digits(10000)
Out[289]: 0.4152381307217551

In [290]: math.log(10, 256)
Out[290]: 0.41524101186092033

How can this be improved?

\$\endgroup\$

1 Answer 1

4
\$\begingroup\$

To make the first function run faster, you can sum as you go instead of doing it all at the end:

def digits_total2(limit: int, base: int = 10) -> int:
    total_sum = 0
    last = 0
    power = base
    i = 1
    while power < limit:
        total_sum += (power - last) * i
        last = power
        power *= base
        i += 1
    return total_sum + (limit - last) * i

This converts the space complexity from O(log(limit, base = base)) to O(1). Over all limits in range(100000) and all bases in range(2, 1000), here is the difference:

original: 123.10s
revised: 52.56s

Also, your code is very clear (which is why I couldn’t think of any other improvements). There is likely a closed form with geometric series, but I expect exponentiation is computationally expensive unless you make your own just for this purpose. So more math wouldn’t necessarily speed this up!

\$\endgroup\$
1
  • 1
    \$\begingroup\$ I was expecting something better, but your method does improve the efficiency and after some analysis I determined that the algorithm can't be better. So your answer is ok. \$\endgroup\$ Commented Aug 2, 2023 at 15:23

Not the answer you're looking for? Browse other questions tagged or ask your own question.