I imported some sort-of sloppy XML data into a Mongo database. Each Document has nested sub-documents to a depth of around 5-10. I would like to find() documents that have a particular value of a particular field, where the field may appear at any depth in the sub-documents (and may appear multiple times).

I am currently pulling each Document into Python and then searching that dictionary, but it would be nice if I could state a filter prototype where the database would only return documents that have a particular value of the field name somewhere in their contents.

Here is an example document:

{
    "foo": 1,
    "bar": 2,
    "find-this": "Yes!",
    "stuff": {
        "baz": 3,
        "gobble": [
            "wibble",
            "wobble",
            {
                "all-fall-down": 4,
                "find-this": "please find me"
            }                
        ],
        "plugh": {
            "plove": {
                "find-this": "Here too!"
            }
        }
   }
}

So, I'd like to find documents that have a "find-this" field, and (if possible) to be able to find documents that have a particular value of a "find-this" field.

  • Holy server-side scripting, Batman! I had no idea you could run JS in the database! That's really cool, and your solution makes perfect sense. Thanks very much!
    – Dave M.
    Commented Jul 3, 2015 at 22:38
  • Oh, you know what? I bet you could do a find() with an "or" $where clause: let the database find "key-to-search" and "value-to-search" using its own (fast) mechanisms for where the key is at the top level, and provide the recursive-searching JS function to be used on all the nodes where "key-to-search" isn't at the top level. I'll add that to this question if I can get it working.
    – Dave M.
    Commented Jul 3, 2015 at 22:43

2 Answers

You are right that a BSON document is not an XML document. XML is loaded into a tree structure made up of "nodes", so searching on an arbitrary key is quite easy.

A MongoDB document is not so simple to process. MongoDB is a "database" in many respects, so it generally expects a certain "uniformity" of data locations in order to make it easy to both "index" and search.

Nonetheless, it can be done. It does mean a recursive process executing on the server, which in turn means JavaScript processing with $where.

Here is a basic shell example. Everywhere else, the function is generally passed as a string argument to the $where operator:

db.collection.find({
  $where: function () {
    var findKey = "find-this",
      findVal = "please find me";

    // Returns true if findKey holds findVal anywhere inside doc.
    function inspectObj(doc) {
      return Object.keys(doc).some(function (key) {
        // typeof null is also "object", so guard against it before recursing.
        if (typeof doc[key] == "object" && doc[key] !== null) {
          return inspectObj(doc[key]);
        } else {
          return key == findKey && doc[key] == findVal;
        }
      });
    }
    return inspectObj(this);
  },
});

So basically, test the keys present in the object to see if they match the desired "field name" and content. If the value of one of those keys happens to be an "object", then recurse into the function and inspect again.

JavaScript's .some() stops at the "first" match found, so the search function returns true as soon as that "key/value" turns up at some depth, and the document containing it is returned.
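
To try the recursion outside the database, the same check can be run in plain JavaScript against the example document from the question. This is only a sketch: containsKeyValue and sampleDoc are hypothetical names, it compares with === rather than ==, and it skips null values because typeof null is also "object".

```javascript
// Hypothetical helper: true if `key` holds `value` at any depth of `doc`.
function containsKeyValue(doc, key, value) {
  return Object.keys(doc).some(function (k) {
    var v = doc[k];
    if (typeof v === "object" && v !== null) {
      // Nested object or array: search inside it.
      return containsKeyValue(v, key, value);
    }
    return k === key && v === value;
  });
}

// The example document from the question.
var sampleDoc = {
  foo: 1,
  bar: 2,
  "find-this": "Yes!",
  stuff: {
    baz: 3,
    gobble: ["wibble", "wobble", { "all-fall-down": 4, "find-this": "please find me" }],
    plugh: { plove: { "find-this": "Here too!" } }
  }
};

console.log(containsKeyValue(sampleDoc, "find-this", "please find me")); // true
console.log(containsKeyValue(sampleDoc, "find-this", "not present"));    // false
```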

Note that $where essentially means traversing your whole collection unless there is some other valid query filter that can be applied to an "index" on the collection.
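
One way to limit that scan is to put an ordinary, indexable condition alongside $where, so that the JavaScript only has to run on documents that pass the first condition. A minimal sketch, assuming the documents share some top-level field worth indexing (here the foo field from the example document). The buildDeepMatchFilter helper and the index are hypothetical, and the function is passed as a string, as mentioned above for use outside the shell.

```javascript
// Hypothetical helper: combines an ordinary filter with a recursive $where test.
function buildDeepMatchFilter(prefilter, key, value) {
  // JSON.stringify embeds the key and value as literals in the function source.
  var body =
    "function () {" +
    "  var findKey = " + JSON.stringify(key) + ", findVal = " + JSON.stringify(value) + ";" +
    "  function inspectObj(doc) {" +
    "    return Object.keys(doc).some(function (k) {" +
    "      var v = doc[k];" +
    "      if (typeof v === 'object' && v !== null) { return inspectObj(v); }" +
    "      return k === findKey && v === findVal;" +
    "    });" +
    "  }" +
    "  return inspectObj(this);" +
    "}";
  // Copy the ordinary conditions, then add $where next to them.
  var filter = {};
  Object.keys(prefilter).forEach(function (k) { filter[k] = prefilter[k]; });
  filter.$where = body;
  return filter;
}

// Assumes an index on "foo", e.g. db.collection.createIndex({ foo: 1 })
var filter = buildDeepMatchFilter({ foo: 1 }, "find-this", "please find me");
// In the shell or a driver: db.collection.find(filter)
```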

So use it with care, or not at all, and instead restructure the data into a more workable form.
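
As one possible restructuring (an assumption, not something the data requires), every leaf value could be copied into a flat array of key/value pairs at import time, so that an ordinary query can replace the recursive scan. The flattenLeaves helper and the leaves, k, and v field names below are hypothetical, and the sketch assumes plain JSON-like data.

```javascript
// Hypothetical helper: collects every non-object leaf as { k: key, v: value }.
function flattenLeaves(doc, out) {
  out = out || [];
  Object.keys(doc).forEach(function (key) {
    var value = doc[key];
    if (typeof value === "object" && value !== null) {
      flattenLeaves(value, out); // descend into sub-documents and arrays
    } else {
      out.push({ k: key, v: value });
    }
  });
  return out;
}

var doc = {
  "find-this": "Yes!",
  stuff: { plugh: { plove: { "find-this": "Here too!" } } }
};

// Flatten first, then store the result on the document itself.
doc.leaves = flattenLeaves(doc);
// doc.leaves is now:
// [ { k: "find-this", v: "Yes!" }, { k: "find-this", v: "Here too!" } ]
```

With such an array stored, a query like db.collection.find({ leaves: { $elemMatch: { k: "find-this", v: "Here too!" } } }) could then be backed by an index on leaves.k and leaves.v instead of server-side JavaScript.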

But this will give you your match.

Here is an example that I use to search recursively for a key/value pair anywhere in the document structure:

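db.getCollection('myCollection').find({
    "$where" : function(){
        var searchKey = 'find-this';
        var searchValue = 'please find me';

        // Inside $where, the current document is available as `this` or `obj`.
        // searchInObj is a function declaration, so it is hoisted and can be called here.
        return searchInObj(obj);

        function searchInObj(obj){
          for(var k in obj){
            if(typeof obj[k] == 'object' && obj[k] !== null){
              // Nested sub-document or array: search inside it.
              if(searchInObj(obj[k])){
                return true;
              }
            } else {
              if(k == searchKey && obj[k] == searchValue){
                return true;
              }
            }
          }
          return false;
        }
    }
})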
  • I know it's an old answer, but where does that obj come from in return searchInObj(obj);?
    – Keselme
    Commented Oct 2, 2019 at 11:49
  • As the documentation describes: "Reference the document in the JavaScript expression or function using either this or obj"
    Commented Oct 16, 2019 at 10:29