
Comparing documents between two MongoDB collections

Ask Time:2013-11-07T00:22:09         Author:TechDawg270


I have two existing collections and need to populate a third collection based on the comparison between the two existing.

The two collections that need to be compared have the following schema:

// Settings collection:
{
  "Identifier":"ABC123",
  "C":"1",
  "U":"V",
  "Low":116,
  "High":124,
  "ImportLogId":1
}

// Data collection
{
  "Identifier":"ABC123",
  "C":"1",
  "U":"V",
  "Date":"11/6/2013 12AM",
  "Value":128,
  "ImportLogId": 1
}

I am new to MongoDB and NoSQL in general so I am having a tough time grasping how to do this. The SQL would look something like this:

SELECT s.Identifier, r.ReadValue, r.U, r.C, r.Date
FROM Settings s
JOIN Reads r
  ON s.Identifier = r.Identifier
  AND s.C = r.C
  AND s.U = r.U
WHERE (r.Value <= s.Low OR r.Value >= s.High)

In this case, using the sample data, I would want to return a record because the value from the Data collection is greater than the high value from the Settings collection. Is this possible using Mongo queries or map-reduce, or is this bad collection structure (i.e. maybe all of this should be in one collection)?

A few additional notes: the Settings collection should only have one record per "Identifier", while the Data collection will have many records per "Identifier". This process could potentially scan hundreds of thousands of documents at a time, so resource consumption is an important consideration.
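For reference, the join-plus-filter semantics of the SQL above can be sketched in plain JavaScript against in-memory arrays (sample values taken from the question; this illustrates the predicate only, it is not a Mongo query):

// Sample documents mirroring the two collections in the question.
const settings = [
  { Identifier: "ABC123", C: "1", U: "V", Low: 116, High: 124, ImportLogId: 1 }
];
const data = [
  { Identifier: "ABC123", C: "1", U: "V", Date: "11/6/2013 12AM", Value: 128, ImportLogId: 1 }
];

// Join on (Identifier, C, U) and keep readings outside the (Low, High) band.
function outOfRange(settings, data) {
  return data.filter(d =>
    settings.some(s =>
      s.Identifier === d.Identifier &&
      s.C === d.C &&
      s.U === d.U &&
      (d.Value <= s.Low || d.Value >= s.High)
    )
  );
}

console.log(outOfRange(settings, data));
// Value 128 >= High 124, so the ABC123 reading is returned.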

Author: TechDawg270, reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/19817275/comparing-documents-between-two-mongodb-collections
Xavier Guihot :

Starting with Mongo 4.4, we can achieve this type of "join" with the new $unionWith aggregation stage coupled with a classic $group stage:

// > db.settings.find()
// { "Identifier" : "ABC123", "C" : "1", "U" : "V", "Low" : 116 }
// { "Identifier" : "DEF456", "C" : "1", "U" : "W", "Low" : 416 }
// { "Identifier" : "GHI789", "C" : "1", "U" : "W", "Low" : 142 }
// > db.data.find()
// { "Identifier" : "ABC123", "C" : "1", "U" : "V", "Value" : 14 }
// { "Identifier" : "GHI789", "C" : "1", "U" : "W", "Value" : 43 }
// { "Identifier" : "ABC123", "C" : "1", "U" : "V", "Value" : 45 }
// { "Identifier" : "DEF456", "C" : "1", "U" : "W", "Value" : 8 }
db.data.aggregate([
  { $unionWith: "settings" },
  { $group: {
    _id: { Identifier: "$Identifier", C: "$C", U: "$U" },
    Values: { $push: "$Value" },
    Low: { $mergeObjects: { v: "$Low" } }
  }},
  { $match: { "Low.v": { $lt: 150 } } },
  { $out: "result-collection" }
])
// > db["result-collection"].find()
// { _id: { Identifier: "ABC123", C: "1", U: "V" }, Values: [14, 45], Low: { v: 116 } }
// { _id: { Identifier: "GHI789", C: "1", U: "W" }, Values: [43], Low: { v: 142 } }

This:

- Starts with a union of both collections into the pipeline via the new $unionWith stage.
- Continues with a $group stage that:
  - Groups records based on Identifier, C and U
  - Accumulates Values into an array
  - Accumulates Lows via a $mergeObjects operation in order to get a value of Low that isn't null. Using $first wouldn't work, since it could pick up null first (for elements from the data collection), whereas $mergeObjects discards null values when merging with an object containing a non-null value.
- Then discards joined records whose Low value is bigger than, say, 150.
- And finally outputs the resulting records to a third collection via an $out stage.
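The $mergeObjects step is the subtle part: documents coming from the data collection contribute no Low field, and the merge keeps whichever value is non-null. A rough in-memory JavaScript analogue of the union-group-merge steps above (the unionGroup helper is made up for illustration; it is not how Mongo implements the pipeline):

// Union both collections, then group by (Identifier, C, U),
// pushing Values and keeping the non-null Low, mimicking the
// behaviour $mergeObjects gives in the pipeline above.
function unionGroup(settings, data) {
  const groups = new Map();
  for (const doc of [...data, ...settings]) {
    const key = JSON.stringify([doc.Identifier, doc.C, doc.U]);
    const g = groups.get(key) ??
      { _id: { Identifier: doc.Identifier, C: doc.C, U: doc.U }, Values: [], Low: { v: null } };
    if (doc.Value !== undefined) g.Values.push(doc.Value);
    if (doc.Low !== undefined) g.Low.v = doc.Low;  // non-null wins, like $mergeObjects
    groups.set(key, g);
  }
  // Keep only groups whose Low is below the threshold used in the answer (150).
  return [...groups.values()].filter(g => g.Low.v !== null && g.Low.v < 150);
}

const settings = [
  { Identifier: "ABC123", C: "1", U: "V", Low: 116 },
  { Identifier: "DEF456", C: "1", U: "W", Low: 416 },
  { Identifier: "GHI789", C: "1", U: "W", Low: 142 }
];
const data = [
  { Identifier: "ABC123", C: "1", U: "V", Value: 14 },
  { Identifier: "GHI789", C: "1", U: "W", Value: 43 },
  { Identifier: "ABC123", C: "1", U: "V", Value: 45 },
  { Identifier: "DEF456", C: "1", U: "W", Value: 8 }
];

console.log(unionGroup(settings, data));
// ABC123 and GHI789 survive the Low < 150 filter; DEF456 (Low 416) is dropped.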
2020-03-15T16:04:09
zero323 :

There is no good way of performing an operation like this in MongoDB. If you want the BAD way, you can use code like this:

db.settings.find().forEach(
  function(doc) {
    var data = db.data.find({
      Identifier: doc.Identifier,
      C: doc.C,
      U: doc.U,
      $or: [{Value: {$lte: doc.Low}}, {Value: {$gte: doc.High}}]
    }).toArray();
    // Do what you need
  }
)

but don't expect it to perform even remotely as well as any decent RDBMS.

You could rebuild your schema and embed documents from the data collection like this:

{
  "_id" : ObjectId("527a7f4b07c17a1f8ad009d2"),
  "Identifier" : "ABC123",
  "C" : "1",
  "U" : "V",
  "Low" : 116,
  "High" : 124,
  "ImportLogId" : 1,
  "Data" : [
    {
      "Date" : ISODate("2013-11-06T00:00:00Z"),
      "Value" : 128
    },
    {
      "Date" : ISODate("2013-10-09T00:00:00Z"),
      "Value" : 99
    }
  ]
}

It may work if the number of embedded documents is low, but to be honest, working with arrays of documents is far from a pleasant experience. Not to mention that you can easily hit the document size limit as the Data array grows.

If this kind of operation is typical for your application, I would consider using a different solution. As much as I like MongoDB, it works well only with certain types of data and access patterns.
2013-11-06T17:52:25
dbra :

Without the concept of JOIN, you must change your approach and denormalize.

In your case, it looks like you're doing data log validation. My advice is to loop over the settings collection and, for each settings document, use the findAndModify operator to set a validation flag on the matching data collection records; after that, you can just use find on the data collection, filtering by the new flag.
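This two-pass idea can be sketched in plain JavaScript over in-memory arrays (the outOfRange flag name is made up; in Mongo the first pass would be an update/findAndModify per settings document, and the second pass a plain find on the flag):

// Pass 1: for each settings document, flag matching data records
// whose Value falls outside the (Low, High) band.
function flagOutOfRange(settings, data) {
  for (const s of settings) {
    for (const d of data) {
      if (d.Identifier === s.Identifier && d.C === s.C && d.U === s.U &&
          (d.Value <= s.Low || d.Value >= s.High)) {
        d.outOfRange = true;  // hypothetical flag field
      }
    }
  }
}

const settings = [{ Identifier: "ABC123", C: "1", U: "V", Low: 116, High: 124 }];
const data = [
  { Identifier: "ABC123", C: "1", U: "V", Value: 128 },
  { Identifier: "ABC123", C: "1", U: "V", Value: 120 }
];

// Pass 2: a plain filter on the flag, like db.data.find({ outOfRange: true }).
flagOutOfRange(settings, data);
const flagged = data.filter(d => d.outOfRange === true);
console.log(flagged.length);  // 1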
2013-11-06T16:52:45