Compressed output of set in Python

Question

In order to provide a better output of sets with many consecutive elements (a similar class could be written for a list or tuple, but in this case I stay with a set), I wrote the following class, derived from a set. However, some inspiration has been found in the itertools and operator package documentation. I'm using the code to output ports assigned to a device under test in a test system.

import itertools
import operator


class PrettySet(set[int]):
    """ Subclassed set which redefines __str__ operator so that consecutive elements are compressed in the
        formatted string """
    @staticmethod
    def _seg_to_str(segment: list):
        return f"{segment[0]}..{segment[-1]}" if len(segment) > 2 else ", ".join(map(str, segment))
    def __str__(self):
        groups = itertools.groupby(enumerate(sorted(self)), lambda x: x[1] - x[0])
        segments = [list(map(operator.itemgetter(1), g)) for _, g in groups]
        return "{" + ", ".join(map(self._seg_to_str, segments)) + "}"

Example output:

> data = {1, 2, 3, 6, 7, 8, 10, 11, 14, 15, 16, 18, 20, 21, 22, 23, 24, 25}
> print(PrettySet(data))

> {1..3, 6..8, 10, 11, 14..16, 18, 20..25}

Please review my code in terms of

readability
usability
size

What would you do differently? Thank you!

AJNeufeld · Accepted Answer · 2023-01-23 04:26:37Z

Stop Writing Classes

See Stop writing classes by Jack Diederich. Short summary: if you instantiate a class, use the class once, and then throw it away, (eg, print(PrettySet(data))) you should just write a function.

def pretty_set(data: set[int]) -> str:
    """Create a string representation of a set of integers, with the integers
    in increasing order, and consecutive integers represented by a start value,
    two dots, and the end value.

    >>> pretty_set({1, 5, 12, 2, 3, 13, 6, 7})
    '{1..3, 5..7, 12, 13}'
    """
 
    ... implementation here ...

Minor nit

The empty set will return the string {} which appears to be an empty dictionary, not an empty set.

Readability

Your code needs blank lines before method definitions.

You need to use type hints consistently. In def _seg_to_str(segment: list):, segment should be list[int], and the return type should be -> str.

groups = itertools.groupby(enumerate(sorted(self)), lambda x: x[1] - x[0]) is very dense and opaque. It returns something like ((1, [(0, 1), (1, 2), (2, 3)]), (3, [(3, 6), (4, 7), (5, 8)]), (4, [(6, 10), (7, 11)]), (6, [(8, 14), (9, 15), (10, 16)]), (7, [(11, 18)]), (8, [(12, 20), (13, 21), (14, 22), (15, 23), (16, 24), (17, 25)])), which is an iterable of tuples of an int and a list of tuples of two ints. No comment in sight as to what those mean, so maintainability is low.

Next, you have [list(map(operator.itemgetter(1), g)) for _, g in groups], which is making a nested list of lists ... plus for _, g in groups is extracting the second item of elements of groups, and operator.itemgetter(1) is also extracting the second item of something. Again, no comment in sight.

Different implementation

The segments = [list(...) for ...] statement could be turned into a generator to prevent unnecessary construction of the outer loop, but the inner lists all get unnecessarily created. A lighter weight implementation could be made with a function that takes elements one-at-a-time, collect sequential elements, and yield string of either single elements or ranges. No tuples or lists need to be constructed.

Perhaps:

def pretty_set(data: set[int]) -> str:
    """Create a string representation of a set of integers, with the integers
    in increasing order, and consecutive integers represented by a start value,
    two dots, and the end value.

    >>> pretty_set({1, 5, 12, 2, 3, 13, 6, 7})
    '{1..3, 5..7, 12, 13}'
    """
    
    def segments():
        it = iter(sorted(data))
        start = end = next(it)
        for item in it:
            if item == end + 1:
                end = item
            else:
                if end > start + 1:
                    yield f"{start}..{end}"
                else:
                    yield str(start)
                    if start != end:
                        yield str(end)

                start = item
                end = item

        if end > start + 1:
            yield f"{start}..{end}"
        else:
            yield str(start)
            if start != end:
                yield str(end)

    if len(data) < 2:
        return str(data)
    
    return "{" + ", ".join(segments()) + "}"
            
data = {1, 2, 3, 6, 7, 8, 10, 11, 14, 15, 16, 18, 20, 21, 22, 23, 24, 25}
print(pretty_set(data))

It is longer, but perhaps it might be more understandable, and possibly a little more efficient.

When I first saw the original code I also thought this was yet another class that should actually be just a function. But in this case I see a real benefit of the class because you can pass the PrettySet instance around and it will pretty print everywhere it is printed. — M472, Commented Jan 23, 2023 at 8:51
The actual usecase of this class is to be used as a set. However, I needed some improved output for logging purposes to prevent sth like: WARNING:root:Something happened on ports {1, 2, 3, 4, 5, 6, 7, 8, ..., 20}. Therefor, I replaced the original set containing a subset of ports by the PrettySet class. This way I can log failing ports without thinking about output format on each and every logger.warning(f"blabla {failing_ports}"). I also watched the video, which makes some good points, but I think it's biased as it misses any discussion of the drawbacks of the procedural approach. — Looser User, Commented Jan 25, 2023 at 13:58
@LooserUser Based on the usage shown in the question as posted, with print(PrettySet(data)), the "Stop Writing Classes" feedback is correct. In future questions, post all relevant code (ie, both the utility and usages of the utility) to get feedback relevant to your real question instead of a toy example. — AJNeufeld, Commented Jan 25, 2023 at 23:27
I'm still not completely decided whether I use the class or the proc approach. I's not that I object the proc approach. It's just that I was confronted very often with repetitive code which spreads pod's and dict accesses all over and I wished people used more type sane and reusable code being it a set of classes or even procedures. Anyway, there's a drawback in my derived class I didn't see before. The intersection, union and such do not work as expected, returning the original set instead of the PrettySet. A proc would free me from the need to implement those overloads. — Looser User, Commented Jan 26, 2023 at 9:28

M472 · Accepted Answer · 2023-01-24 08:25:47Z

Useability

I think your code does a good job at outputting the information in a readable and concise way. I also see potential for reuseability in other code.

In my opinion, contrary to AJNeufeld's answer it also makes sense for the pretty set to be a class instead of a function because this enables you to pass a PrettySet around and have it pretty printed everywhere.

Typing

As AJNeufeld already pointed out in their answer there are some issues with your type annotations.

What I would like to add here is that the type checker mypy can detect these errors if you run it in strict mode:

$ mypy --strict pretty_set.py
pretty_set.py:37: error: Function is missing a return type annotation
pretty_set.py:37: error: Missing type parameters for generic type "list"
pretty_set.py:39: error: Function is missing a type annotation
Fund 3 errors in 1 file (checked 1 source file)

Readability

I strongly agree with AJNeufeld's answer that the first two lines of your __str__ methods where the groups and segments are constructed are hard to understand.

I would suggest to use itertools.pairwise instead to compare each element with its predecessor. This can then be used to check if the elements are consecutive or not.

import itertools


class PrettySet(set[int]):
    """Subclassed set which redefines __str__ operator so that consecutive
    elements are compressed in the formatted string"""

    def __str__(self) -> str:
        """Create a string representation of the set.

        Consecutive values are compressed, for example {1, 3, 4, 5, 7}
        is formatted as {1, 3..5, 7}.
        """

        parts: list[str] = []
        if self:

            def make_string(start: int, prev: int) -> str:
                return f"{prev}" if start == prev else f"{start}..{prev}"

            ordered = sorted(self)
            start = ordered[0]
            for prev, elem in itertools.pairwise(ordered):
                if elem - prev > 1:
                    parts.append(make_string(start, prev))
                    start = elem
            parts.append(make_string(start, ordered[-1]))

        return f"{{{', '.join(parts)}}}"

I think this code is easier to understand but I may have a bias because I wrote it.

I am also not sure if I find return f"{{{', '.join(parts)}}}" or return "{" + ", ".join(parts) + "}" more readble.

Size

I do not know what exactly you mean by reviewing the code with regard to "size", but I don't think number of characters, number of bytes, number of lines or anything like that is relevant in this case.

prev is not None is always true, so redundant. The assignment to prev is overwritten by the next iteration of the for prev, elem … loop, so is meaningless, and can be removed. — AJNeufeld, Commented Jan 24, 2023 at 5:03
You are right, these are artifacts from an earlier stage when I did not use itertools.pairwise. I edited the code accordingly. — M472, Commented Jan 24, 2023 at 8:26
For completeness, you might add a from_compressed() class method to reconstruct a set from the compressed string. — RootTwo, Commented Jan 25, 2023 at 3:57
I started to look how I could implement the suggestions given in this thread. When I implemented my PrettySet class I dealt with int set's. However, now I think a type hint like set[Ordered] would be more appropriate, so a set using complex numbers as keys would not suffice. I've seen some suggestions, here: stackoverflow.com/questions/37669222/…. However, it would add more boilerplate code for little benefit. So I would perhaps stay with class PrettySet(set)??? — Looser User, Commented Jan 25, 2023 at 14:34

Toby Speight · Accepted Answer · 2023-01-26 16:26:27Z

Here is my new implementation which adresses the following issues as suggested by AJNeufeld and M472. I just wanted to share this code. I'm not posting just to accept my own answer as a solution.

Readability: Using itertools.groupby is hard to read, therefore using itertools.pairwise
The custom set was implemented as a class called PrettySet so it can be used as a set and being passed around. However, the methods which should produce new PrettySet's from a combination or copy return sets instead. This surprising behaviour contradicts the original justification for writing a class. Therefore, the class has been replaced by a function which takes a normal set[int] as input. The overhead of implementing all those necessary overloads cannot be justified.
The code sections were properly separated by empty lines.
Intermediate creation of data structures (e.g. nested lists) has been avoided in order to format the string directly. This addresses memory usage and performance.
The corner case for formatting an empty set is solved by returning a string which cannot be confused with a dict.
Type issues have been checked using mypy.
The original implementation compressed a section of consecutive elements of length 3 and up. This was considered an implementation detail and was not taken into account here.

import itertools


def pretty_format_set(src_set: set[int]) -> str:
    """ Formats a set into a comma separated string compressing consecutive elements, for example
        {1, 3, 4, 5, 7} is formatted as {1, 3..5, 7}. """

    def make_string(range_start: int, range_end: int) -> str:
        """ Returns a string representation of an integer range, compressing consecutive values """
        return f"{range_start}" + ("" if range_start == range_end else f"..{range_end}")

    if src_set:
        formatted_str = "{"
        ordered = sorted(src_set)
        start = ordered[0]

        # Iterate over the src_set to find consecutive elements and format string on the fly
        for prev, elem in itertools.pairwise(ordered):
            if elem - prev > 1:
                formatted_str += make_string(start, prev) + ", "
                start = elem
        else:
            # Terminate the output after the last loop iteration
            formatted_str += make_string(start, ordered[-1]) + "}"

        return formatted_str

    return "{*[]}"

Usage:

test_array: list[set[int]] = [{1}, {1, 2}, {1, 2, 3}, {1, 3}, {1, 2, 4}, {1, 3, 4},
                              {1, 2, 3, 6, 7, 8, 10, 11, 14, 15, 16, 18, 20, 21, 22, 23, 24, 25}, {*[]}]

for data in test_array:
    print(pretty_format_set(data))

Output:

{1}
{1..2}
{1..3}
{1, 3}
{1..2, 4}
{1, 3..4}
{1..3, 6..8, 10..11, 14..16, 18, 20..25}
{*[]}

Type check:

user@l-user:~/.config/JetBrains/PyCharm2022.3/scratches$ python3 -m mypy --strict pretty_print_set2.py
Success: no issues found in 1 source file

Stack Exchange Network

Compressed output of set in Python

3 Answers 3

Stop Writing Classes

Minor nit

Readability

Different implementation

Useability

Typing

Readability

Size

Not the answer you're looking for? Browse other questions tagged
python
python-3.x
set
or ask your own question.

Hot Network Questions

Compressed output of set in Python

3 Answers 3

Stop Writing Classes

Minor nit

Readability

Different implementation

Useability

Typing

Readability

Size

Not the answer you're looking for? Browse other questions tagged pythonpython-3.xset or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
python
python-3.x
set
or ask your own question.