Finding Duplicate Records in MongoDB

Looking for duplicate records in your MongoDB database? In this article I will explain how to find duplicate documents (records) using the aggregate method.

The Database

Let's suppose we've bulk-loaded data from a list of users and want to find out how many records share the same CPF (the Brazilian individual taxpayer number).
We will use the following document structure as an example:

db.list.findOne();
{
    "_id" : ObjectId("8902a01b2ec12a2383328b61"),
    "name" : "Henrique Marques Fernandes",
    "site" : "https://marquesfernandes.com",
    "city" : "SP",
    "cpf" : "182.983.460-68"
}

Finding Duplicate Data with Aggregate

To do this, we will use the aggregate method with the $group and $match stages to group and filter the results. First, $group groups the documents by the CPF field and adds two new fields to each group: "idsUnicos", which collects the _id of every document that shares that CPF, and "total", which counts how many documents share it:

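db.list.aggregate([
    {$group: {
        _id: {cpf: "$cpf"},               // group key: one group per CPF
        idsUnicos: {$addToSet: "$_id"},   // _id of every document in the group
        total: {$sum: 1}                  // number of documents in the group
        }
    }
]);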
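
To make the grouping step easier to picture, here is a minimal in-memory sketch in plain JavaScript of what $group computes with $addToSet and $sum. It does not connect to MongoDB. The groupByCpf function and the sample documents are hypothetical, and plain strings stand in for ObjectId values.

```javascript
// In-memory sketch of the $group stage; no MongoDB connection is needed.
// groupByCpf and sampleDocs are hypothetical names used only for illustration.
function groupByCpf(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.cpf)) {
      groups.set(doc.cpf, { _id: { cpf: doc.cpf }, idsUnicos: new Set(), total: 0 });
    }
    const group = groups.get(doc.cpf);
    group.idsUnicos.add(doc._id); // plays the role of $addToSet: "$_id"
    group.total += 1;             // plays the role of $sum: 1
  }
  // Convert each Set to an array so the shape resembles the aggregation output.
  return [...groups.values()].map((g) => ({ ...g, idsUnicos: [...g.idsUnicos] }));
}

const sampleDocs = [
  { _id: "a1", name: "User A", cpf: "111.111.111-11" },
  { _id: "a2", name: "User B", cpf: "222.222.222-22" },
  { _id: "a3", name: "User A (again)", cpf: "111.111.111-11" },
];

const grouped = groupByCpf(sampleDocs);
console.log(JSON.stringify(grouped, null, 2));
```

Note that this sketch keeps ids in insertion order, while MongoDB does not guarantee any particular order for the array built by $addToSet.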

The query above returns every CPF with its respective count, including CPFs that appear only once. To return only the CPFs that have duplicates, we add a $match stage that keeps only the groups whose total field is greater than 1:

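db.list.aggregate([
    {$group: {
        _id: {cpf: "$cpf"},               // group key: one group per CPF
        idsUnicos: {$addToSet: "$_id"},   // _id of every document in the group
        total: {$sum: 1}                  // number of documents in the group
        }
    },
    {$match: {
        total: {"$gt": 1}                 // keep only CPFs that appear more than once
        }
    }
]);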
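
Once the duplicate groups are in hand, a short summary answers the original question of how many duplicates exist. The sketch below is plain JavaScript and assumes the pipeline results were already collected into an array (for instance with the cursor's toArray() method). The summarizeDuplicates function and the sample results are hypothetical, with plain strings standing in for ObjectId values.

```javascript
// Hypothetical output of the $group + $match pipeline.
const duplicateGroups = [
  { _id: { cpf: "111.111.111-11" }, idsUnicos: ["a1", "a3", "a7"], total: 3 },
  { _id: { cpf: "333.333.333-33" }, idsUnicos: ["b2", "b5"], total: 2 },
];

// summarizeDuplicates is a hypothetical helper, not part of MongoDB.
function summarizeDuplicates(groups) {
  return {
    // number of distinct CPFs that appear more than once
    duplicatedCpfs: groups.length,
    // number of documents that share a CPF with at least one other document
    documentsInvolved: groups.reduce((sum, g) => sum + g.total, 0),
    // documents beyond the first one for each CPF
    surplusDocuments: groups.reduce((sum, g) => sum + (g.total - 1), 0),
  };
}

const summary = summarizeDuplicates(duplicateGroups);
console.log(summary);
```

Here surplusDocuments counts every document after the first for each CPF, which is one reasonable way to read "how many duplicate records there are".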

Written by: Henrique Marques Fernandes
Posted on: 29/07/2019
