Finding Duplicate Records in MongoDB

Need to find duplicate records in your MongoDB database? In this article I will explain how to find duplicate documents (records) using the aggregate method.

The Database

Let’s suppose we’ve bulk-loaded data from a list of users and want to find out how many records share the same CPF (the Brazilian individual taxpayer number).
We will use the following document structure as an example:

db.list.findOne();
{
  "_id" : ObjectId("8902a01b2ec12a2383328b61"),
  "name" : "Henrique Marques Fernandes",
  "site" : "http://marquesfernandes.com",
  "city" : "SP",
  "cpf" : "182.983.460-68"
}

Finding Duplicate Data with Aggregate

For this we will use the aggregate method with the $group and $match stages to group and filter our results. We group by the CPF field and add two new fields to each group: "idsUnicos", containing the _id of every document that shares that CPF, and "total", containing the number of documents found for that CPF:

db.list.aggregate([
  {
    $group: {
      _id: { cpf: "$cpf" },
      idsUnicos: { $addToSet: "$_id" },
      total: { $sum: 1 }
    }
  }
]);
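
To make the effect of the $group stage more concrete, here is a minimal in-memory sketch in plain JavaScript that mimics what the stage computes. It does not touch MongoDB at all: the sample documents, their short string ids, and the helper name groupByCpf are made up for illustration. Note also that MongoDB does not guarantee the order of the groups or of the values collected by $addToSet, whereas this sketch simply keeps insertion order.

```javascript
// Illustrative sample data (hypothetical, not from a real collection).
const docs = [
  { _id: "a1", name: "User A", cpf: "182.983.460-68" },
  { _id: "b2", name: "User B", cpf: "182.983.460-68" },
  { _id: "c3", name: "User C", cpf: "529.982.247-25" },
];

// Hypothetical helper that mimics the $group stage in memory.
function groupByCpf(documents) {
  const groups = new Map();
  for (const doc of documents) {
    if (!groups.has(doc.cpf)) {
      // One output entry per distinct CPF, like _id: { cpf: "$cpf" }.
      groups.set(doc.cpf, { _id: { cpf: doc.cpf }, idsUnicos: [], total: 0 });
    }
    const group = groups.get(doc.cpf);
    // Like $addToSet: only add the id if it is not already present.
    if (!group.idsUnicos.includes(doc._id)) {
      group.idsUnicos.push(doc._id);
    }
    // Like $sum: 1, count every document in the group.
    group.total += 1;
  }
  return [...groups.values()];
}

const grouped = groupByCpf(docs);
console.log(JSON.stringify(grouped, null, 2));
```

With this sample data the first CPF ends up with a total of 2 and the second with a total of 1, which is why a further filtering step is needed to isolate the duplicates.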

The $group stage on its own returns one entry per CPF with its document count, including CPFs that appear only once. To return only the CPFs that have duplicates, we add a $match stage that keeps only the groups whose total field is greater than 1:

db.list.aggregate([
  {
    $group: {
      _id: { cpf: "$cpf" },
      idsUnicos: { $addToSet: "$_id" },
      total: { $sum: 1 }
    }
  },
  { $match: { total: { $gt: 1 } } }
]);
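
Since the original goal was to find out how many duplicate records exist, the result of this pipeline can be summarized in a few lines of JavaScript. The sketch below is only an illustration under stated assumptions: duplicateGroups is hand-written sample data in the shape the pipeline returns (in the shell you could obtain a real array by calling toArray() on the aggregate cursor), and summarizeDuplicates is a hypothetical helper name. It treats one document per CPF as the original and counts the rest as extras.

```javascript
// Hypothetical pipeline output, in the shape produced by $group + $match.
const duplicateGroups = [
  { _id: { cpf: "182.983.460-68" }, idsUnicos: ["a1", "b2", "d4"], total: 3 },
  { _id: { cpf: "529.982.247-25" }, idsUnicos: ["c3", "e5"], total: 2 },
];

// Hypothetical helper: counts duplicated CPFs and surplus documents.
function summarizeDuplicates(groups) {
  let extraDocuments = 0;
  for (const group of groups) {
    // Every document beyond the first one for a CPF is counted as an extra.
    extraDocuments += group.total - 1;
  }
  return { duplicatedCpfs: groups.length, extraDocuments };
}

const summary = summarizeDuplicates(duplicateGroups);
console.log(summary);
```

Here two CPFs are duplicated and three documents are surplus (two for the first CPF, one for the second).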