Removing Elements from an Array Based on a Second Array

Question

I've written a function that takes two arrays of integers as input and removes all the elements in the first array that are also present in the second array. I would like to share my code with you and kindly request your feedback for potential improvements. Please find my code implementation below, and any suggestions or alternative solutions would be greatly appreciated.

def remove_(array1, array2):
    to_remove = []
    for i in array2:
        for j in array1:
            if i == j:
                to_remove.append(i)
    for r in to_remove:
        array1.remove(r)
    return to_remove

# Example usage:
array1 = [1, 2, 3, 4, 5]
array2 = [2, 4]
result = remove_(array1, array2)
print("Removed elements:", result)
print("Updated array1:", array1)

This function works by first iterating through both arrays and appending the elements that need to be removed to a new list called to_remove. It then iterates through the to_remove list and removes each element from array1. The function returns the removed elements.

I'm interested in suggestions for optimizing the code or making it more Pythonic.

\$\begingroup\$ You have used set before. Why did you not use set here? \$\endgroup\$
– Peilonrayz ♦
Commented May 10, 2023 at 22:48 — Peilonrayz, Commented May 10, 2023 at 22:48

FMc · Accepted Answer · 2023-05-11 17:24:25Z

When feasible, use specific names. Your function and variable names are generic. Help your reader out by selecting more communicative names. A second problem is that the current names are easily misread: array1 and array2 look similar when scanned quickly, and nothing intuitive links i to array1 and j to array2. Even in a fully generic context, one can do much better. For example, the following names are more compact, less easily confused, and more meaningful because single values are linked to their collection.

for x in xs:
    for y in ys:
       ...

Use sets to determine what should be removed [but see below]. Selecting appropriate data structures is one of the keys to successful programming. Before writing code, put some real thought behind each collection you need: should it be a tuple, list, dict, or set? Good choices tend to simplify coding logic greatly. By switching to sets in this example, most of the need for our own algorithm disappears.

Reconsider your function's design. There are many advantages that flow from building programs out of functions that don't modify their inputs. Do you really need to modify the original list, or would it be fine to return a new list? If possible, opt for the latter. A second drawback for your current approach is its co-mingling of mutation and return values. In Python, a close analogue for your function is list.remove(), which returns None. It could have returned true/false, but the language designers opted against that -- an approach often taken in Python. These considerations don't lead to hard and fast rules, but in the abstract I would favor a no-mutation approach.

# Mutation.
def remove_all(xs, discard):
    removed = set(xs).intersection(set(discard))
    xs[:] = [x for x in xs if x not in removed]
    return list(removed)

# No mutation.
def remove_all(xs, discard):
    removed = set(xs).intersection(set(discard))
    return (
        [x for x in xs if x not in removed],
        list(removed),
    )

Addendum: the code above does not do exactly what your code does. The set logic eliminates duplicates from removed. If you don't want to do that, you can adjust things along these lines.

# No mutation.
def remove_all(xs, discard):
    discard = set(discard)
    return (
        [x for x in xs if x not in discard],
        [x for x in xs if x in discard],
    )

Kate · Accepted Answer · 2023-05-10 23:16:05Z

This implementation is heavy and a naive approach. In Python you're going to use list comprehensions often. Especially for something that trivial. Rather than update a list in a loop, which is not so great in terms of performance, you would rather generate a new list simply. Don't bother tampering with the original list.

Accordingly, a one-liner could suffice, along these lines:

remaining_items = [item for item in array1 if item not in array2]

Or better yet, use sets eg:

array1 = {1, 2, 3, 4, 5}
array2 = {2, 4}
remaining_items = array1.difference(array2)

Do you really need anything more? When you think of it: two nested for loops just for that? That is already a red flag. What would happen if you had to match more than two sets/lists? Or combine word lists containing thousands of members? How would that approach scale? And you're appending items to a third list (to_remove) while at the same time removing items from array1, that's kinda crazy isn't it.

On a side note, the naming is unfortunate: remove_ is too generic and thus not meaningful. Function names should be more descriptive but in the present case a function is not warranted. A one-liner is enough and should speak for itself.

Suggested reading materials: Sets in Python

Unfortunately the code description is unclear on the required behaviour when duplicate elements appear in one or both lists, so we don't really know whether a set is the correct approach. It probably is, but if not, we might want to use a multiset, aka bag aka Counter instead. Perhaps you might mention that as a possibility (as well as to improve the function documentation!) — Toby Speight, Commented May 11, 2023 at 6:49

Stack Exchange Network

Removing Elements from an Array Based on a Second Array

2 Answers 2

Not the answer you're looking for? Browse other questions tagged
python
algorithm
functional-programming
mathematics
or ask your own question.

Hot Network Questions

Removing Elements from an Array Based on a Second Array

2 Answers 2

Not the answer you're looking for? Browse other questions tagged pythonalgorithmfunctional-programmingmathematics or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
python
algorithm
functional-programming
mathematics
or ask your own question.