MongoDB - Using the aggregation framework to compare array element overlap
I have a collection of documents structured as below:
{
  carrier: "abc",
  flightnumber: 123,
  dates: [
    ISODate("2015-01-01T00:00:00Z"),
    ISODate("2015-01-02T00:00:00Z"),
    ISODate("2015-01-03T00:00:00Z")
  ]
}
I want to search the collection to see if there are any documents with the same carrier and flightnumber that also have dates in the dates array that overlap. For example:
{
  carrier: "abc",
  flightnumber: 123,
  dates: [
    ISODate("2015-01-01T00:00:00Z"),
    ISODate("2015-01-02T00:00:00Z"),
    ISODate("2015-01-03T00:00:00Z")
  ]
},
{
  carrier: "abc",
  flightnumber: 123,
  dates: [
    ISODate("2015-01-03T00:00:00Z"),
    ISODate("2015-01-04T00:00:00Z"),
    ISODate("2015-01-05T00:00:00Z")
  ]
}
If the above records were present in the collection I would like to return them, because they both have carrier: abc and flightnumber: 123, and they also both have the date ISODate("2015-01-03T00:00:00Z") in the dates array. If this date were not present in the second document then neither should be returned.
Typically I would do this by grouping and counting, as below:
db.flights.aggregate([
  { $group: {
    _id: { carrier: "$carrier", flightnumber: "$flightnumber" },
    uniqueids: { $addToSet: "$_id" },
    count: { $sum: 1 }
  }},
  { $match: { count: { $gt: 1 } } }
])
But I'm not sure how I would modify this to look for array overlap. Can anyone suggest how to achieve this?
You $unwind the array if you want to look at the contents as "grouped" within them:
db.flights.aggregate([
  { "$unwind": "$dates" },
  { "$group": {
    "_id": { "carrier": "$carrier", "flightnumber": "$flightnumber", "date": "$dates" },
    "count": { "$sum": 1 },
    "_ids": { "$addToSet": "$_id" }
  }},
  { "$match": { "count": { "$gt": 1 } } },
  { "$unwind": "$_ids" },
  { "$group": { "_id": "$_ids" } }
])
That does in fact tell you in which documents the "overlap" resides, because the "same dates", along with the other grouping key values you are concerned about, have a "count" that occurs more than once, which indicates the overlap.
Anything after the $match is really just for "presentation", as there is no point reporting the same _id value for multiple overlaps if you just want to see the overlaps. In fact, if you want to see them together it would probably be best to leave the "grouped set" alone.
Now you could add a $lookup to that if retrieving the actual documents is important to you:
db.flights.aggregate([
  { "$unwind": "$dates" },
  { "$group": {
    "_id": { "carrier": "$carrier", "flightnumber": "$flightnumber", "date": "$dates" },
    "count": { "$sum": 1 },
    "_ids": { "$addToSet": "$_id" }
  }},
  { "$match": { "count": { "$gt": 1 } } },
  { "$unwind": "$_ids" },
  { "$group": { "_id": "$_ids" } },
  { "$lookup": {
    "from": "flights",
    "localField": "_id",
    "foreignField": "_id",
    "as": "_ids"
  }},
  { "$unwind": "$_ids" },
  { "$replaceRoot": { "newRoot": "$_ids" } }
])
And even do a $replaceRoot or $project to make it return the whole document. Or you could have just done $addToSet with $$ROOT if size is not a problem.
But the overall point is covered in the first three pipeline stages, or mostly in just the "first". If you want to work with arrays "across documents", then the primary operator is still $unwind.
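To see why the first three stages are enough, here is a minimal sketch in plain JavaScript that mimics $unwind, $group and $match in memory. It is only an illustration, not how the server implements these stages. The function name findOverlaps, the numeric _id values, the third sample document and the use of date strings instead of ISODate values are all assumptions made for the example.

```javascript
// Hypothetical sample data modelled on the question; _id values and the
// third document are invented, and dates are plain strings for simplicity.
const flights = [
  { _id: 1, carrier: "abc", flightnumber: 123,
    dates: ["2015-01-01", "2015-01-02", "2015-01-03"] },
  { _id: 2, carrier: "abc", flightnumber: 123,
    dates: ["2015-01-03", "2015-01-04", "2015-01-05"] },
  { _id: 3, carrier: "abc", flightnumber: 456,
    dates: ["2015-01-03"] }
];

// findOverlaps is an illustrative helper, not a MongoDB API.
function findOverlaps(docs) {
  // Like "$unwind": one row per (document, date) pair.
  const rows = docs.flatMap(doc =>
    doc.dates.map(date => ({
      _id: doc._id,
      carrier: doc.carrier,
      flightnumber: doc.flightnumber,
      date: date
    })));

  // Like "$group": key on carrier + flightnumber + date,
  // counting rows and collecting the distinct _id values.
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify([row.carrier, row.flightnumber, row.date]);
    if (!groups.has(key)) {
      groups.set(key, {
        carrier: row.carrier,
        flightnumber: row.flightnumber,
        date: row.date,
        count: 0,
        ids: new Set()
      });
    }
    const group = groups.get(key);
    group.count += 1;
    group.ids.add(row._id);
  }

  // Like "$match": keep only the keys that occur more than once.
  return [...groups.values()]
    .filter(group => group.count > 1)
    .map(group => ({ ...group, ids: [...group.ids] }));
}

const overlaps = findOverlaps(flights);
console.log(overlaps);
```

With this sample data only one group survives: carrier "abc", flightnumber 123 on 2015-01-03, holding both _id values. Note that the count is taken over unwound rows, so a document that listed the same date twice would also pass the count test. Comparing the number of collected _id values instead would guard against that.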
Alternately, for a more "reporting" like format:
db.flights.aggregate([
  { "$addFields": { "copy": "$$ROOT" } },
  { "$unwind": "$dates" },
  { "$group": {
    "_id": { "carrier": "$carrier", "flightnumber": "$flightnumber", "dates": "$dates" },
    "count": { "$sum": 1 },
    "_docs": { "$addToSet": "$copy" }
  }},
  { "$match": { "count": { "$gt": 1 } } },
  { "$group": {
    "_id": { "carrier": "$_id.carrier", "flightnumber": "$_id.flightnumber" },
    "overlaps": { "$push": { "date": "$_id.dates", "_docs": "$_docs" } }
  }}
])
Which would report the overlapped dates within each group and tell you which documents contained the overlap:
{
  "_id" : { "carrier" : "abc", "flightnumber" : 123.0 },
  "overlaps" : [
    {
      "date" : ISODate("2015-01-03T00:00:00.000Z"),
      "_docs" : [
        {
          "_id" : ObjectId("5977f9187dcd6a5f6a9b4b97"),
          "carrier" : "abc",
          "flightnumber" : 123.0,
          "dates" : [
            ISODate("2015-01-03T00:00:00.000Z"),
            ISODate("2015-01-04T00:00:00.000Z"),
            ISODate("2015-01-05T00:00:00.000Z")
          ]
        },
        {
          "_id" : ObjectId("5977f9187dcd6a5f6a9b4b96"),
          "carrier" : "abc",
          "flightnumber" : 123.0,
          "dates" : [
            ISODate("2015-01-01T00:00:00.000Z"),
            ISODate("2015-01-02T00:00:00.000Z"),
            ISODate("2015-01-03T00:00:00.000Z")
          ]
        }
      ]
    }
  ]
}