\$\begingroup\$

Simple tool I wrote in an hour or two (I'm not very fast, but I eventually get there - also for the "spending 6 hours to save 6 seconds" meme).

Would've written this in shellscript, given that I'm calling external programs to do the actual measuring, but shellscript doesn't math very well. So I figured I might as well write the entire thing in Python and do the measuring and mathing in a single language (though I suppose it is arguable whether shelling out to call an external program still counts as "single language").

It may be a simple tool but I'm sure holes can still be poked in it. Tested on Python 3.11.9 (Cygwin provides the stat binary for Windows), but given its simplicity it probably works on any version with string interpolation (f-strings), and it should also be pretty simple to modify it to not require that.

#!/usr/bin/env python

import os
import subprocess
import sys
from typing import List

def main(args: List[str]) -> int:
    argc = len(args)
    if argc != 2 and argc != 3:
        print(f"Usage: {args[0]} [file]")
        print(f"       {args[0]} [oldfile] [newfile]")
        return 1

    if argc == 2:
        result = subprocess.run(f"stat {args[1]} --printf=%s", shell=True, capture_output=True)
        if result.returncode != 0:
            print(f"Bad file: {args[1]}")
            return 2
    
        print(f"{args[1]} is {int(result.stdout)} bytes")
        return 0

    if argc == 3:
        result_oldfile = subprocess.run(f"stat {args[1]} --printf=%s", shell=True, capture_output=True)
        if result_oldfile.returncode != 0:
            print(f"Bad file: {args[1]}")
            return 2
        result_newfile = subprocess.run(f"stat {args[2]} --printf=%s", shell=True, capture_output=True)
        if result_newfile.returncode != 0:
            print(f"Bad file: {args[2]}")
            return 2

        size_oldfile = int(result_oldfile.stdout)
        size_newfile = int(result_newfile.stdout)
        delta = size_newfile / size_oldfile * 100
        if delta > 100:
            print(f"{args[2]} is {round(delta - 100, 2)}% larger than {args[1]}")
        elif delta == 100:
            print(f"{args[2]} is the same size as {args[1]}")
        else:
            print(f"{args[2]} is {round(100 - delta, 2)}% smaller than {args[1]}")

        return 0

    return 3

if __name__ == "__main__":
    args = sys.argv
    args[0] = os.path.basename(args[0])
    sys.exit(main(args))
\$\endgroup\$
  • No need to call an external process. Discover os.stat
    – vnp
    Commented May 29 at 4:30
  • @vnp Thanks, I probably should've used that instead of shelling out; that would've been more efficient.
    Commented May 29 at 4:48

2 Answers

\$\begingroup\$

modern annotation

We used to have to do this:

from typing import List

But in a modern interpreter like 3.11 we prefer to just say

def main(args: list[str]) -> int:

(Note the lowercase "l".)

Anyway, thank you for the helpful annotations.

cracking argv

You're doing it the hard way.

    if argc != 2 and argc != 3:

Recommend you let typer sweat the details for you. Plus, it will offer --help to the hapless shell user.

stat()

        result = subprocess.run(f"stat {args[1]} --printf=%s", shell=True, capture_output=True)

You're definitely doing it the hard way.

First, use Path:

from pathlib import Path
...
        file = Path(args[1])

Now you're set up to ask questions like whether file.exists(). And most importantly, you can assign

        size = file.stat().st_size
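As a minimal sketch of that approach (the helper name `file_size` is hypothetical, not something from the original script), the size lookup and the bad-file case might be folded together like this:

```python
from pathlib import Path
from typing import Optional


def file_size(name: str) -> Optional[int]:
    """Return the size of `name` in bytes, or None if it cannot be stat'ed.

    `file_size` is an illustrative helper, not part of the reviewed code.
    """
    try:
        # Path.stat() wraps os.stat(); st_size is the size in bytes.
        return Path(name).stat().st_size
    except OSError:
        # Covers missing files, permission errors, and similar failures.
        return None
```

Because no shell is involved, this also copes with file names containing spaces or shell metacharacters, which the `shell=True` command string does not.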

ULP

It's not obvious to me that this always triggers when you want it to:

        elif delta == 100:

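We did an FP divide and a multiply to obtain that. Oh, wait! Not just any divide -- we only care about equality, and unity, FTW. When the two sizes are equal the quotient is exactly 1.0, and multiplying that by 100 gives exactly 100.0. Yes, this triggers when you want it to. I was going to suggest elif size_oldfile == size_newfile:, but this computes the same thing.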

However, in general, after certain FP operations such as divide, plan on obtaining a result which is \$\pm \epsilon\$ from what you thought you should be getting. So you might phrase it elif abs(delta - 100) < epsilon:, for some small \$\epsilon\$, maybe 1e-9.

Or as @Greedo observes, better still to use math.isclose(), whose default tolerance has the same magnitude (rel_tol=1e-09) but is relative rather than absolute. The relative tolerance is usually what makes the most sense.
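To illustrate the difference, here is a small sketch contrasting the fixed-epsilon phrasing above with `math.isclose()`. The helper names `same_abs` and `same_rel` are made up for this example:

```python
import math

EPSILON = 1e-9  # fixed absolute tolerance, as in the abs() phrasing above


def same_abs(a: float, b: float) -> bool:
    """Equality test with a fixed absolute tolerance (illustrative helper)."""
    return abs(a - b) < EPSILON


def same_rel(a: float, b: float) -> bool:
    """Equality test with isclose(); rel_tol defaults to 1e-09, abs_tol to 0.0."""
    return math.isclose(a, b)
```

With large operands such as `1e12` and `1e12 + 0.001`, `same_abs` reports a difference while `same_rel` treats them as equal. With tiny operands such as `1e-12` and `2e-12` it is the other way round. For a percentage that hovers around 100, either phrasing should behave about the same.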

docstring

        return 0

    return 3

The four different statuses are very nice. But please give main() a """docstring""" which mentions what they mean.
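One way such a docstring could read is sketched below. The meanings are inferred from the return statements in the posted code, and the body is cut down to the argument check so the sketch stays short:

```python
def main(args: list[str]) -> int:
    """Report the size of one file, or compare the sizes of two files.

    Exit status:
      0  success
      1  wrong number of arguments (usage message printed)
      2  a file could not be stat'ed
      3  fall-through that should be unreachable (indicates a logic error)
    """
    if len(args) not in (2, 3):
        print(f"Usage: {args[0]} [file]")
        print(f"       {args[0]} [oldfile] [newfile]")
        return 1

    # Stub: the size reporting and comparison branches of the original
    # script (which return 0, 2, or 3) are omitted from this sketch.
    return 0
```

The docstring is then available through `help(main)` or `main.__doc__`, so a maintainer does not have to read every branch to learn what a status of 2 means.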

\$\endgroup\$
  • math.isclose is more sane than < epsilon. It scales with the sizes of the files.
    – Greedo
    Commented May 29 at 7:20
\$\begingroup\$

Instead of manually writing a CLI with one mandatory argument and one optional argument, it's pretty easy to require one mandatory reference file and accept zero or more other files to compare against it.

Don't use subprocess when Python gives you filesystem tools, and don't bother traversing your own arguments and generating your own usage when argparse does that for you.

Your program will implode with a ZeroDivisionError if the old file has 0 bytes.

Don't multiply by 100; use the % field format.

Don't round within a formatting field; use a dot-precision specifier.
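The last three points can be sketched in a single helper. The name `describe_change` and its return strings are hypothetical, chosen only for this illustration:

```python
def describe_change(old_size: int, new_size: int) -> str:
    """Describe new_size relative to old_size without dividing by zero.

    `describe_change` is an illustrative helper, not part of the original script.
    """
    growth = new_size - old_size
    if growth == 0:
        return "the same size"
    if old_size == 0:
        # No meaningful percentage exists relative to an empty file.
        return "larger"
    ratio = abs(growth) / old_size
    direction = "larger" if growth > 0 else "smaller"
    # The % presentation type multiplies by 100 and appends the percent sign;
    # the .2 precision fixes the output at two decimal places.
    return f"{ratio:.2%} {direction}"


print(describe_change(200, 150))  # 25.00% smaller
print(describe_change(0, 10))     # larger
```

Note that `round(25.0, 2)` inside an f-string prints `25.0`, whereas the `.2%` specifier always shows two decimals.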

Suggested

#!/usr/bin/env python3

import argparse
import pathlib
import sys
import typing


def get_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description='Compare file sizes')
    parser.add_argument('reference_file', type=pathlib.Path)
    parser.add_argument('other_files', type=pathlib.Path, nargs='*')
    return parser.parse_args()


class FileComparison(typing.NamedTuple):
    reference_file: pathlib.Path
    reference_size: int
    other_files: list[pathlib.Path]

    @classmethod
    def from_args(
        cls,
        reference_file: pathlib.Path,
        other_files: list[pathlib.Path],
    ) -> 'FileComparison':
        return cls(
            reference_file=reference_file,
            reference_size=reference_file.stat().st_size,
            other_files=other_files,
        )

    def dump(self, out: typing.TextIO = sys.stdout) -> None:
        print(f'{self.reference_file} is {self.reference_size} bytes', file=out)

        for other in self.other_files:
            try:
                size = other.stat().st_size
                print(f'{other} is {size} bytes', file=out)

                growth = size - self.reference_size
                if growth < 0:
                    print(f'{other} is {-growth/self.reference_size:.2%} smaller than {self.reference_file}', file=out)
                elif growth == 0:
                    print(f'{other} is the same size as {self.reference_file}', file=out)
                elif self.reference_size == 0:
                    print(f'{other} is larger than {self.reference_file}', file=out)
                else:
                    print(f'{other} is {growth/self.reference_size:.2%} larger than {self.reference_file}', file=out)
            except OSError as e:
                print(f'{other}: {e}', file=out)


def main(args: argparse.Namespace) -> None:
    try:
        comp = FileComparison.from_args(reference_file=args.reference_file, other_files=args.other_files)
    except OSError as e:
        print(e)
        sys.exit(2)
    comp.dump()


if __name__ == '__main__':
    main(get_args())
\$\endgroup\$
  • The % format is nice, didn't know that. TIL :)
    Commented May 30 at 13:01
