This is a question about the best way to add up values stored in an array of key/value elements, where each value has to be selected by matching its key. I'm trying to use the MongoDB 2.2 aggregation framework, and it's possible I can do this with a simple group.

So for a given set of documents I'm trying to get output like this:

{
    "result" : [
        {
            "_id" : null,
            "numberOf" : 2,
            "sales" : 468000,
            "profit" : 246246
        }
    ],
    "ok" : 1
}

Now, I originally had a list of documents containing values assigned to named properties, e.g.:

[
{
    _id : 1,
    finance: {
        sales: 234000,
        profit: 123123,
    }
}
,
{
    _id : 2,
    finance: {
        sales: 234000,
        profit: 123123,
    }
}
]

This was easy enough to add up, but the structure didn't work for other reasons. For instance, there are many other columns like "finance" and I want to be able to index them without creating thousands of indexes, so I need to convert to a structure like this:

[
{
    _id : 1,
    finance: [
        {
            "k": "sales",
            "v": {
                "description":"sales over the year",
                v: 234000,
            }
        },
        {
            "k": "profit",
            "v": {
                "description":"money made from sales",
                v: 123123,
            }
        }
    ]
}
,
{
    _id : 2,
    finance: [
        {
            "k": "sales",
            "v": {
                "description":"sales over the year",
                v: 234000,
            }
        },
        {
            "k": "profit",
            "v": {
                "description": "money made from sales",
                v: 123123,
            }
        }
    ]
}
]

I can index finance.k if I want, but now I'm struggling to build an aggregate query that adds up all the numbers matching a particular key. Easy summing was the reason I originally went for named properties, but this really needs to work in a situation where there are thousands of "k" labels.

Does anyone know how to build an aggregate query for this using the new framework? I've tried this:

db.projects.aggregate([
    {
        $match: {
            // QUERY
            $and: [
                // main query
                {},
            ]
        }
    },
    {
        $group: {
            _id: null,
            "numberOf": { $sum: 1 },
            "sales":    { $sum: "$finance.v.v" },
            "profit":   { $sum: "$finance.v.v" },
        }
    },
])

but I get:

{
    "errmsg" : "exception: can't convert from BSON type Array to double",
    "code" : 16005,
    "ok" : 0
}

** For extra kudos, I'll need to be able to do this in a MapReduce query as well.

  • why do you need to do this in MapReduce? Aggregation framework will be faster than M/R and simpler to read usually. Commented Aug 29, 2012 at 1:09
  • Well besides the fact that the agg fx isn't in production yet :), the MR will be quicker for the majority of cases because I can pre-cache the output. The agg fx solution will be for when we don't have a MR cached version.
    – cirrus
    Commented Aug 29, 2012 at 7:00
  • 2.2 has been released into production so agg framework is in production. Commented Aug 29, 2012 at 16:56
  • True, but actually only just within the time between my last comment and yours :) It'll still be a short while before it shows up as a standard offering at places like MongoLab and MongoHQ etc, but being able to pre-calculate and store the MR results is still a requirement unless the $out feature is working.
    – cirrus
    Commented Aug 29, 2012 at 18:12

2 Answers

You can use the aggregation framework to get sales and profit and any other value you may be storing in your key/value pair representation.

For your example data:

var pipeline = [
    {
        "$unwind" : "$finance"
    },
    {
        "$group" : {
            "_id" : "$finance.k",
            "numberOf" : {
                "$sum" : 1
            },
            "total" : {
                "$sum" : "$finance.v.v"
            }
        }
    }
]

R = db.tb.aggregate( pipeline );
printjson(R);

{
        "result" : [
            {
                "_id" : "profit",
                "numberOf" : 2,
                "total" : 246246
            },
            {
                "_id" : "sales",
                "numberOf" : 2,
                "total" : 468000
            }
        ],
        "ok" : 1
}

If you have additional k/v pairs, you can add a "$match" stage that only passes through the k values you want, such as "sales" and "profit".
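A minimal sketch of that filter, assuming the same `tb` collection as above. The name `filteredPipeline` is only illustrative. The first "$match" is optional: it skips documents that contain none of the wanted keys and may be able to use an index on finance.k. The second "$match" drops the unwanted array elements once "$unwind" has split them out. The `typeof db` guard is there only so the snippet also loads outside the mongo shell.

```javascript
// Keys to keep; elements with any other finance.k value are filtered out.
var wanted = ["sales", "profit"];

var filteredPipeline = [
    // Optional: skip documents that have none of the wanted keys.
    { "$match": { "finance.k": { "$in": wanted } } },
    // Produce one document per element of the finance array.
    { "$unwind": "$finance" },
    // Drop the unwound elements whose key is not wanted.
    { "$match": { "finance.k": { "$in": wanted } } },
    // Count and total per key, as in the pipeline above.
    { "$group": {
        "_id": "$finance.k",
        "numberOf": { "$sum": 1 },
        "total": { "$sum": "$finance.v.v" }
    } }
];

// Runs only inside the mongo shell, where db and printjson are defined.
if (typeof db !== "undefined") {
    printjson(db.tb.aggregate(filteredPipeline));
}
```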

  • Great answer by the way! Shame I can't mark both answers correct.
    – cirrus
    Commented Aug 30, 2012 at 17:53
  • I think this is just what I need - how should the $match look if I want to filter out other fields that might be there, and just return "sales" and "profit"?
    – cirrus
    Commented Aug 31, 2012 at 14:30
  • $match syntax is identical to the first argument to find(), so {$match:{"finance.k":{$in:["sales","profit"]}}}, and you would put it after the unwind. You can actually put it both before and after :) Commented Aug 31, 2012 at 19:06

You will have to use "$unwind" to break out the values in the array. After unwinding, each document carries a single key/value pair, so the simplest approach is to sum one key per aggregation command rather than sales and profit together. Given that, the query itself is easy:

var pipeline = [
    { "$unwind": "$finance" },
    { "$match": { "finance.k": "sales" } },
    {
        "$group": {
            "_id": null,
            "numberOf": { "$sum": 1 },
            "sales": { "$sum": "$finance.v.v" }
        }
    }
];

R = db.tb.aggregate( pipeline );
printjson(R);

{
        "result" : [
                {
                        "_id" : null,
                        "numberOf" : 2,
                        "sales" : 468000
                }
        ],
        "ok" : 1
}

You can run a similar query for profit, just substitute "profit" for "sales" in the "$match" operator.
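If both totals are wanted in one result document, one possible variation (a sketch, not part of the query above) is to drop the "$match" and let "$cond" decide, per unwound element, whether its value counts toward a given total. The name `combinedPipeline` is illustrative, and the sketch assumes each document carries at most one "sales" and one "profit" entry, so that counting "sales" entries stands in for counting documents.

```javascript
var combinedPipeline = [
    // One document per element of the finance array.
    { "$unwind": "$finance" },
    { "$group": {
        "_id": null,
        // Counts "sales" entries: one per document under the assumption above.
        "numberOf": { "$sum": { "$cond": [ { "$eq": [ "$finance.k", "sales" ] }, 1, 0 ] } },
        // Each total adds the value only when the element's key matches, else 0.
        "sales":  { "$sum": { "$cond": [ { "$eq": [ "$finance.k", "sales" ] }, "$finance.v.v", 0 ] } },
        "profit": { "$sum": { "$cond": [ { "$eq": [ "$finance.k", "profit" ] }, "$finance.v.v", 0 ] } }
    } }
];

// Runs only inside the mongo shell, where db and printjson are defined.
if (typeof db !== "undefined") {
    printjson(db.tb.aggregate(combinedPipeline));
}
```

With thousands of keys this gets unwieldy, since every key needs its own "$cond" field; grouping on "$finance.k" scales better in that case.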

Oh, and here's the map/reduce example:

map = function() {
    var ret = { sales: 0.0 , profit: 0.0, count: 1 };

    // iterate over 'finance[]' array
    this.finance.forEach( function (i) { 
        if ( i.k == "sales" ) ret.sales =  i.v.v ;
        if ( i.k == "profit" ) ret.profit =  i.v.v ;
    } );

    emit( 1, ret );
}

reduce = function(key, values) {
    var ret = { sales: 0.0 , profit: 0.0, count: 0 };

    values.forEach(function(v) {
        ret.sales += v.sales;
        ret.profit += v.profit;
        ret.count += v.count;
    });

    return ret;
};
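//
// run map/reduce (db.tb stands in for your source collection;
// the shell requires an output option, here the results are returned inline)
//
res = db.tb.mapReduce( map, reduce, { out: { inline: 1 } } );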
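To sanity-check the map and reduce logic without a server, here is a rough stand-alone sketch that runs the same logic over the two sample documents from the question in plain JavaScript (for example under Node.js). The names `mapDoc`, `reduceValues`, `emitted` and `totals` are made up for this harness, and `emit` is passed in as a parameter rather than being the global that the server provides. The real server may call reduce several times per key; this harness calls it once.

```javascript
// Sample documents from the question.
var docs = [
    { _id: 1, finance: [
        { k: "sales",  v: { description: "sales over the year",   v: 234000 } },
        { k: "profit", v: { description: "money made from sales", v: 123123 } }
    ] },
    { _id: 2, finance: [
        { k: "sales",  v: { description: "sales over the year",   v: 234000 } },
        { k: "profit", v: { description: "money made from sales", v: 123123 } }
    ] }
];

// Same logic as the map function above, taking the document and emit as arguments.
function mapDoc(doc, emit) {
    var ret = { sales: 0.0, profit: 0.0, count: 1 };
    doc.finance.forEach(function (i) {
        if (i.k == "sales") ret.sales = i.v.v;
        if (i.k == "profit") ret.profit = i.v.v;
    });
    emit(1, ret);
}

// Same logic as the reduce function above.
function reduceValues(key, values) {
    var ret = { sales: 0.0, profit: 0.0, count: 0 };
    values.forEach(function (v) {
        ret.sales += v.sales;
        ret.profit += v.profit;
        ret.count += v.count;
    });
    return ret;
}

// Collect the emitted values per key, then reduce the single key 1.
var emitted = {};
docs.forEach(function (doc) {
    mapDoc(doc, function (key, value) {
        (emitted[key] = emitted[key] || []).push(value);
    });
});
var totals = reduceValues(1, emitted[1]);

console.log(JSON.stringify(totals)); // prints {"sales":468000,"profit":246246,"count":2}
```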
  • Great answer, thanks! The map reduce works well. Shame I can't mark both answers correct.
    – cirrus
    Commented Aug 30, 2012 at 17:55
