Dataset

> use temp_index_issue
> db.createCollection("mycollection")
> for (let i=0; i<900000; i++) { db.mycollection.insertOne({"name": "Place - "+ (i+1), "location": {"country": "India"}})};
> for (let i=900000; i<1000000; i++) { db.mycollection.insertOne({"name": "Place - "+ (i+1), "location": {"country": "USA"}})};
> for (let i=1000000; i<1050000; i++) { db.mycollection.insertOne({"name": "Place - "+ (i+1), "location": {"country": null}})};
> for (let i=1050000; i<1100000; i++) { db.mycollection.insertOne({"name": "Place - "+ (i+1)})};

I would like to count the number of documents where location.country is "India", or the field does not exist, or the field has a null value.

> db.mycollection.countDocuments({"location.country": {"$in": ["India", null]}})
1000000 (0.493 sec)

Now if I create an index on the "location.country" field, the same query takes much more time.

> db.mycollection.createIndex({"location.country": 1})
> db.mycollection.countDocuments({"location.country": {"$in": ["India", null]}})
1000000 (1.42 sec)
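
To see why, the chosen plan can be inspected with explain (the standard shell helper; executionStats reports how many index keys and documents were examined):

> db.mycollection.find({"location.country": {"$in": ["India", null]}}).explain("executionStats")
> // queryPlanner.winningPlan shows which plan won (IXSCAN + FETCH with the index, COLLSCAN without);
> // executionStats.totalKeysExamined and totalDocsExamined show how much work that plan did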

What am I missing here?

  • @D.SM, yes, the explain plan shows that Mongo scans the index (if one exists), but it does not answer the question of why it is slower! Commented Mar 17, 2021 at 9:57
  • @WernfriedDomscheit yes, correct. Indexing makes it much slower. It should have optimised the query time instead of making it slower. Commented Mar 17, 2021 at 10:47
  • Yes, but this is not trivial. In earlier days Oracle used a "Rule Based Optimizer", which should be similar to MongoDB's principles. The "Cost Based Optimizer" is much more complex; the book [Cost-Based Oracle Fundamentals](amazon.de/Cost-Based-Oracle-Fundamentals-Experts-Voice/dp/…) has 536 pages. It took Oracle 10 years to develop a useful Cost-Based Optimizer! Maybe read "What are CBO and RBO" to get an idea of it. So I think we have to wait a few years until MongoDB is as sophisticated. Commented Mar 17, 2021 at 12:43
  • I think the problem lies in how the count operation uses the index.
    – John
    Commented Mar 17, 2021 at 14:22

1 Answer

It's a common misunderstanding that an index is always faster.

Check the statistics with db.mycollection.stats(1024) (the scale factor of 1024 reports sizes in KiB); here is an extract:

{ 
    "ns" : "so.mycollection", 
    "size" : 825.0, 
    "count" : 11000.0, 
    "storageSize" : 172.0, 
    "nindexes" : 2.0, 
    "totalIndexSize" : 164.0, 
    "totalSize" : 336.0, 
    "indexSizes" : {
        "_id_" : 112.0, 
        "location.country_1" : 52.0
    }, 
    "scaleFactor" : 1024.0, 
}

Without an index, MongoDB has to read all documents from disk, i.e. 172 KiB of data.

With the index, MongoDB first reads the matching part of the index from disk, approx. 52 KiB × (10000/11000) ≈ 47 KiB, and then fetches the matching documents, approx. 172 KiB × (10000/11000) ≈ 156 KiB.

So, with the index MongoDB reads a total of 47 + 156 = 203 KiB from disk, compared to 172 KiB without it.
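
The same back-of-the-envelope estimate can be reproduced in the shell (a sketch only, assuming the stats extract above, where 10/11 of the entries match the query):

> const s = db.mycollection.stats(1024)           // sizes in KiB, as in the extract above
> const fraction = 10000 / 11000                  // share of index entries / documents that match
> s.indexSizes["location.country_1"] * fraction   // ≈ 47 KiB of index to read
> s.storageSize * fraction                        // ≈ 156 KiB of documents to fetch afterwards
> s.storageSize                                   // 172 KiB for a plain collection scan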

An index on a field with low cardinality (cardinality being the number of distinct values in relation to the total number of values) usually does not improve performance.
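
You can check the cardinality quickly with distinct (safe here, since the result set is tiny):

> db.mycollection.distinct("location.country").length
> // 3 distinct values (India, USA, null) across 1.1 million documents - very low cardinality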

This is not specific to MongoDB; it applies to any database. The exception is bitmap indexes, which are optimized for low-cardinality fields. To my knowledge, among the major database systems only Oracle supports bitmap indexes.

By contrast, this query db.mycollection.find({ "location.country": { "$in": ["USA", null] } }) will most likely be much faster, as you would expect: it matches only a small fraction of the documents, so the index is selective for it.
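
To verify, compare both counts; for the non-selective query you can also bypass the index with a $natural hint (hint is a supported option of countDocuments) and force the cheaper collection scan:

> // selective: "USA" plus null/missing matches only about 200k of 1.1M documents, so the index pays off
> db.mycollection.countDocuments({"location.country": {"$in": ["USA", null]}})
> // non-selective: force a full collection scan instead of going through the index
> db.mycollection.countDocuments({"location.country": {"$in": ["India", null]}}, {hint: {$natural: 1}})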
