In this recipe, we will look at creating unique indexes on a collection. As the name suggests, a unique index ensures that the value of the indexed field is unique across all the documents in the collection. What if the collection already has data, and the field we want the unique index on is not unique in the existing documents? Obviously, the index creation will fail. There is, however, a way to drop the duplicates and create the index. Curious how this can be achieved? Keep reading this recipe.
For this recipe, we will create a collection called userDetails. We will need the server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the UniqueIndexData.js script loaded. This script is available on the book's website for download. To find out how to start the shell with a script preloaded, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.
Load the data in the collection by invoking the loadUserDetailsData method:
> loadUserDetailsData()
Check the number of documents in the collection:
> db.userDetails.count()
Now, try to create a unique index on the login field on the userDetails collection:
> db.userDetails.ensureIndex({login:1}, {unique:true})
{ "err" : "E11000 duplicate key error index: test.userDetails.$login_1 dup key: { : \"bander\" }", "code" : 11000, "n" : 0, "connectionId" : 6, "ok" : 1 }
Create the index again, this time dropping the duplicate documents:
> db.userDetails.ensureIndex({login:1}, {unique:true, dropDups:true})
Count the number of documents left in the collection:
> db.userDetails.count()
Query for a user by the login field and look at the query plan:
> db.userDetails.find({login:'mtaylo'}).explain()
We initially loaded our collection with 100 documents using the loadUserDetailsData function from the UniqueIndexData.js file. We looped 100 times and loaded the same data over and over again; thus, we got duplicate documents.
We then try to create a unique index on the login field in the userDetails collection as follows:
> db.userDetails.ensureIndex({login:1}, {unique:true})
This creation fails and indicates the duplicate key it first encountered during index creation; it is bander in this case. Can you guess why the error was first encountered for this particular user ID? It is not even the first ID we saw in the loaded data.
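The failing build can be mimicked in plain JavaScript outside the mongo shell. This is a toy sketch, not the server's actual implementation; the sample logins and the function name are made up for illustration:

```javascript
// Toy simulation of building a unique index: the build aborts on the
// first duplicate key it encounters. Note that the order in which the
// real server visits keys during the build need not match the order in
// which the documents were inserted, so the duplicate it reports need
// not belong to the first login we loaded.
function buildUniqueIndex(docs, key) {
  const index = new Set();
  for (const doc of docs) {
    if (index.has(doc[key])) {
      // Mimics the shape of the server's E11000 duplicate key error.
      throw new Error(
        `E11000 duplicate key error index: login_1 dup key: { : "${doc[key]}" }`);
    }
    index.add(doc[key]);
  }
  return index;
}

// With duplicate logins present, the build throws on the first repeat.
try {
  buildUniqueIndex([{ login: 'bander' }, { login: 'bander' }], 'login');
} catch (e) {
  console.log(e.message);
}
```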
In such a scenario, we are left with two options: we can fix the data manually and then re-create the index, or we can ask the server to delete the duplicates while building the index. Along with the {unique:true} option used to create a unique index, we provide an additional dropDups:true option (or dropDups:1, if you wish) that will blindly delete all the duplicate data it encounters during index creation. Note that there is no guarantee of which document will be retained and which one will be deleted, but just one will be retained. In this case, there are 20 unique login IDs. During unique index creation, if the value of the login ID is not already present in the index, it is added. Subsequently, when a login ID that is already present in the index is encountered, the corresponding document is deleted from the collection; this explains why we were left with just 20 documents in the userDetails collection.
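The dropDups behavior can be sketched the same way in plain JavaScript. One simplification to note: this sketch retains the first document seen per key, whereas the server makes no guarantee about which duplicate survives:

```javascript
// Simulate a unique index build with dropDups:true over a given field:
// a document whose key is already in the index is deleted from the
// collection; exactly one document per distinct key is retained.
function dropDupsOnKey(docs, key) {
  const seen = new Set();   // stands in for the unique index being built
  const retained = [];
  const dropped = [];
  for (const doc of docs) {
    if (seen.has(doc[key])) {
      dropped.push(doc);    // duplicate key: this document gets deleted
    } else {
      seen.add(doc[key]);
      retained.push(doc);
    }
  }
  return { retained, dropped };
}

// 100 documents spanning 20 distinct (hypothetical) logins: after the
// build, 20 documents are retained and 80 are dropped.
const docs = [];
for (let i = 0; i < 100; i++) {
  docs.push({ login: 'user' + (i % 20) });
}
const result = dropDupsOnKey(docs, 'login');
console.log(result.retained.length, result.dropped.length); // 20 80
```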