I have Windows 10 on my system, with enough storage.

I have a database that is 208 GB in a file with the .agz extension.

When I import the database into MongoDB, I get the following error:

2021-09-12T20:00:49.930+0430
2021-09-12T20:00:52.622+0430  smartshark_2_1.clone_instance  383GB
2021-09-12T20:00:52.622+0430  finished restoring smartshark_2_1.clone_instance (989924000 documents, 0 failures)
2021-09-12T20:00:52.622+0430  Failed: smartshark_2_1.clone_instance: error restoring from archive 'D:\MSRChallenge2022\smartshark_2_1.agz': (InvalidBSON) incorrect BSON length in element with field name 'clone_class_metrics.CE' in object with _id: ObjectId('5cbad340504acf99a43e3724')
2021-09-12T20:00:52.622+0430  989924000 document(s) restored successfully. 0 document(s) failed to restore.

The error appears after 383 GB of data has already been imported from the archive.

To import the database, I ran mongorestore --gzip --archive=D:\my-directory\smartshark_2_1.agz in cmd.

smartshark_2_1.agz is my database.

How can I fix the error? I downloaded the database from the following link: https://smartshark.github.io/dbreleases/

1 Answer

You have hit the maximum BSON document size, which is 16 megabytes, but you have successfully imported close to a billion documents.

You might try to download the data again in case you got a corrupted document, or you might just ignore the error if the database seems correct. One record among almost a billion doesn't seem like much.
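If you want to sanity-check the restored data before deciding, here is a minimal sketch using PyMongo (assuming a default local MongoDB instance; the database and collection names are taken from the error message above):

    from bson import ObjectId
    from pymongo import MongoClient

    # Connection string is an assumption; adjust for your own setup.
    client = MongoClient("mongodb://localhost:27017")
    db = client["smartshark_2_1"]

    # How many documents actually made it into the collection that failed?
    print(db.clone_instance.estimated_document_count())

    # Is the document named in the error message present at all?
    print(db.clone_instance.find_one({"_id": ObjectId("5cbad340504acf99a43e3724")}))

If the count looks plausible and only that one document is missing, ignoring the error is probably fine.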

If you truly need to store documents (or files) larger than 16 MB, you can use the GridFS API, which will automatically break the data up into segments and stream them back to you (thus avoiding the issue with size limits/RAM).

Instead of storing a file in a single document, GridFS divides the file into parts, or chunks, and stores each chunk as a separate document.

GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.

You can use this method to store images, files, videos, etc. in the database, with essentially no limit on size.
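For illustration, a minimal GridFS sketch with PyMongo (the file and database names here are placeholders; note this is a separate API for storing large files, not something mongorestore does for you):

    import gridfs
    import shutil
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # assumed local instance
    db = client["my_database"]                          # placeholder database name
    fs = gridfs.GridFS(db)                              # backed by the fs.files and fs.chunks collections

    # Store a file larger than 16 MB; GridFS splits it into 255 kB chunks.
    with open("big_file.bin", "rb") as f:
        file_id = fs.put(f, filename="big_file.bin")

    # Read it back; the GridOut object is file-like, so it can be streamed chunk by chunk.
    with open("big_file_copy.bin", "wb") as out:
        shutil.copyfileobj(fs.get(file_id), out)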

  • Thank you for the answer. But how can I import the dataset into MongoDB using GridFS without unzipping it? Unzipped, the dataset is very large, and I don't have enough storage for it.
    – Balive13
    Commented Sep 13, 2021 at 19:13
  • It's either that or just ignoring the problem.
    – harrymc
    Commented Sep 13, 2021 at 19:15
  • Thank you so much for your answer. It worked for me.
    – Balive13
    Commented Sep 26, 2021 at 14:41
