Removing duplicates from the Attribute table by expression in QGIS

Question

I have some duplicates in my attribute table, as you can see below:

I used the following formula:

count(1, "Site ID") > 1

based on this question: Identifying duplicate attributes in field using QGIS

But as you can see, this selects every record that has a duplicate, not just one of each pair. When I click on any of them in order to delete it, the rest of the selection is cleared.
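
For illustration only, the effect of that expression can be mimicked in plain Python (this is not QGIS code, and the sample "Site ID" values are made up). It flags every row whose value occurs more than once, which is why both members of each pair end up selected:

```python
from collections import Counter

# Hypothetical sample of "Site ID" values; not taken from the original layer.
site_ids = ["A1", "B2", "A1", "C3", "B2", "D4"]

# Rough equivalent of count(1, "Site ID"): how many rows share each value.
occurrences = Counter(site_ids)

# Rough equivalent of the "> 1" test, evaluated once per row.
selected = [occurrences[value] > 1 for value in site_ids]

print(selected)  # [True, True, True, False, True, False]
```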

Is there an option that would allow me to remove these duplicates using an expression?

A similar problem was considered in other threads, where a Python approach was suggested.

I personally think that the delete-duplicate-features approach from "Removing overlapping/duplicate polygons in the same layer using QGIS" is not the best option here, because my selection has to be based on just one column of the attribute table. How can I solve this?

Sorry, the question is not clear: what is your problem? Do you have a selection and don't know how to delete it? Or do you not know how to generate an appropriate selection? Based on which fields exactly should duplicates be identified? — Babel, Commented Apr 4, 2022 at 9:31
I have the selection and I don't know how to delete just 1 record out of 2. As you can see, some of them are doubled and I need one of them deleted in each case. I have 27 of them. — Geographos, Commented Apr 4, 2022 at 9:38
@Taras I am not the best at reading the references and here is the problem. If you could explain to me how to read these, I would be really grateful. — Geographos, Commented Apr 4, 2022 at 9:55
Please, do not forget about "What should I do when someone answers my question?" — Taras, Commented Dec 2, 2022 at 6:28

Taras · Accepted Answer · 2022-12-02 06:27:58Z

It is not an expression, but there is a tool in QGIS for deleting duplicates called "Delete duplicates by attribute".

Deletes duplicate rows by only considering the specified field / fields. The first matching row will be retained, and duplicates will be discarded.

Optionally, these duplicate records can be saved to a separate output for analysis.

Let's assume there is a polygon layer 'poly_test' with several duplicates, see the image below.

After applying the algorithm with the "id" field set as "Fields to match duplicates by", the output contains only the first feature for each "id" value.
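
As a rough illustration of the behaviour described above (first matching row retained, later duplicates optionally collected), here is a minimal plain-Python sketch. It is not the QGIS implementation; the function name and the sample rows are hypothetical.

```python
def delete_duplicates_by_attribute(rows, fields):
    """Keep the first row for each combination of `fields`; collect the rest."""
    seen = set()
    kept, duplicates = [], []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key in seen:
            duplicates.append(row)   # a later row with an already-seen key
        else:
            seen.add(key)
            kept.append(row)         # first row with this key is retained
    return kept, duplicates


# Hypothetical attribute rows with duplicated "id" values.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 2, "name": "e"},
]

kept, duplicates = delete_duplicates_by_attribute(rows, ["id"])
print([r["name"] for r in kept])        # ['a', 'b', 'd']
print([r["name"] for r in duplicates])  # ['c', 'e']
```

The second list corresponds to the optional separate output of discarded duplicates.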

If duplicates need to be deleted based on how "poor" their data is (i.e. keeping the record with the fewest empty attributes), proceed with the following workflow.

Let's assume there is a polygon layer 'poly_test' with several duplicates, see the image below.

Step 1. Create a field "Quality" that counts the empty values in the data fields, using the following expression:

array_count(array("Data1", "Data2", "Data3"), '')

Step 2. Apply "Extract by expression" with the following expression, which keeps only the features with the lowest "Quality" value for each "id":

"Quality" = minimum("Quality", group_by:="id")

Step 3. Finally, apply the "Delete duplicates by attribute" algorithm (with the "id" field as "Fields to match duplicates by") to get an output with one feature per "id".
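
The three steps can be sketched in plain Python to show how they fit together. This is only an approximation of the workflow, not QGIS code: the helper names and sample rows are hypothetical, and empty values are assumed to be empty strings, as in the Step 1 expression.

```python
def count_empty(row, fields):
    """Step 1: number of empty-string values among the data fields."""
    return sum(1 for f in fields if row[f] == "")


def keep_best_per_id(rows, data_fields, key="id"):
    # Step 1: add a "Quality" value (lower means fewer empty fields).
    scored = [dict(row, Quality=count_empty(row, data_fields)) for row in rows]

    # Step 2: keep only the rows whose Quality equals the minimum of their group.
    best = {}
    for row in scored:
        k = row[key]
        best[k] = min(best.get(k, row["Quality"]), row["Quality"])
    extracted = [r for r in scored if r["Quality"] == best[r[key]]]

    # Step 3: if several rows tie, keep only the first one per key.
    seen, result = set(), []
    for r in extracted:
        if r[key] not in seen:
            seen.add(r[key])
            result.append(r)
    return result


# Hypothetical rows; "" marks a missing value.
rows = [
    {"id": 1, "Data1": "x", "Data2": "", "Data3": ""},
    {"id": 1, "Data1": "x", "Data2": "y", "Data3": "z"},
    {"id": 2, "Data1": "", "Data2": "", "Data3": "q"},
    {"id": 2, "Data1": "p", "Data2": "", "Data3": "q"},
    {"id": 3, "Data1": "m", "Data2": "n", "Data3": ""},
]

result = keep_best_per_id(rows, ["Data1", "Data2", "Data3"])
print([(r["id"], r["Quality"]) for r in result])  # [(1, 0), (2, 1), (3, 1)]
```

Step 3 is still needed after Step 2 because two duplicates can share the same minimum "Quality" value.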

Taras · Accepted Answer · 2022-12-02 07:44:30Z


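Let's say you have duplicate values in the field "Site ID" and want to keep just one feature from each set of duplicates: the one with the smallest feature ID ($id). Use "Select by expression" with the expression below. It selects every feature except the first one in each "Site ID" group, so the selected features can then be deleted: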

array_contains(
    with_variable(
        'array',
        array_agg(
            $id,
            group_by:="Site ID",
            order_by:=$id
        ),
        array_remove_all(
            @array,
            array_first(@array)
        )
    ),
    $id
)
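
To make the logic of the expression easier to follow, here is a minimal plain-Python sketch of the same idea: group the feature IDs by "Site ID", sort each group, and select everything except the first ID. It is not QGIS code; the function name and the sample features are hypothetical.

```python
from collections import defaultdict


def ids_to_select(features, field):
    """features maps a feature id to its attribute dict."""
    groups = defaultdict(list)
    for fid in sorted(features):              # plays the role of order_by:=$id
        groups[features[fid][field]].append(fid)

    selected = set()
    for fids in groups.values():
        selected.update(fids[1:])             # drop the first (smallest) id
    return selected


# Hypothetical features keyed by feature id.
features = {
    1: {"Site ID": "A1"},
    2: {"Site ID": "B2"},
    3: {"Site ID": "A1"},
    4: {"Site ID": "C3"},
    5: {"Site ID": "B2"},
    6: {"Site ID": "A1"},
}

print(sorted(ids_to_select(features, "Site ID")))  # [3, 5, 6]
```

Deleting the selected IDs leaves exactly one feature per "Site ID" value.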

answered Apr 4, 2022 at 9:52 by Babel; edited Dec 2, 2022 at 7:44 by Taras

I tried: array_contains ( with_variable ( 'array', array_agg( 'Site ID', group_by:='Site ID', order_by:='Site ID' ), array_remove_all( @array, array_first( @array) ) ), 'Site ID' ) and it selects nothing, as I get "false" in the result
– Geographos
Commented Apr 4, 2022 at 9:58
That's why I asked based on what you want to identify duplicates - it seems only based on Site Id. Try: array_contains ( with_variable ( 'array', array_agg( 'Site ID', group_by:='Site ID', order_by:=$id ), array_remove_all( @array, array_first( @array) ) ), $id)
– Babel
Commented Apr 4, 2022 at 10:45
Still no reaction. I even created an id column, which did not exist before, but the result is the same.
– Geographos
Commented Apr 4, 2022 at 11:02
Sorry, I did not have access to QGIS when I wrote my last comment. I mean: take my initial expression and replace every id with $id and value with "Site ID", so the expression should look like: array_contains ( with_variable ( 'array', array_agg( $id, group_by:="Site ID", order_by:=$id ), array_remove_all( @array, array_first( @array) ) ), $id )
– Babel
Commented Apr 4, 2022 at 11:39
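
A note on why the earlier attempts returned "false": in QGIS expressions, single quotes denote a string literal, while double quotes refer to a field. With 'Site ID' in single quotes, every row contributes the same constant string, so removing the first element empties the array. The plain-Python sketch below mimics this; it is not QGIS code, the `evaluate` helper and sample rows are hypothetical, and the rows are assumed to be already sorted by feature ID.

```python
def evaluate(value_of, group_of, rows, current):
    """Mimic array_contains(array without its first value, value) for one row."""
    group = [value_of(r) for r in rows if group_of(r) == group_of(current)]
    first = group[0]
    remainder = [v for v in group if v != first]   # like array_remove_all
    return value_of(current) in remainder


# Hypothetical rows, assumed sorted by feature id.
rows = [
    {"fid": 1, "Site ID": "A1"},
    {"fid": 2, "Site ID": "B2"},
    {"fid": 3, "Site ID": "A1"},
]

# 'Site ID' in single quotes: the same constant string for every row.
literal = [evaluate(lambda r: "Site ID", lambda r: "Site ID", rows, r) for r in rows]

# $id and "Site ID" in double quotes: real per-row values.
field = [evaluate(lambda r: r["fid"], lambda r: r["Site ID"], rows, r) for r in rows]

print(literal)  # [False, False, False]
print(field)    # [False, False, True]
```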
