Join 2 Collections
Overview
A join operation is usually used in many relational database management systems (RDBMS) for the analysis of data across multiple tables. Mongodb, being a NoSQL database uses a denormalized data model in which data that are related can be stored together in a single document, rather than being split across multiple tables or collections. MongoDB also provides join features through its famous aggregation framework for complex data processing and manipulation.
Introduction
Joining data from multiple collections is a common requirement in many database applications, as it allows for more complex queries and analysis of data. MongoDB's denormalized data model, which stores related data together in a single document, provides advantages in terms of performance, scalability, and flexibility, as it allows for efficient retrieval of related data and eliminates the need for costly joins in many cases.
However, there are situations where joining collections in MongoDB is still necessary, such as when integrating with external data sources. MongoDB provides a powerful Aggregation Framework that allows for joining data from multiple collections using the lookup operator and other aggregation pipeline stages to join collections in MongoDB and how to optimize the join process for better performance and scalability.
The denormalized data model represents the ability of a document to contain nested structures and arrays, allowing for the representation of complex and hierarchical data in a single document.
How To Join Two Collections In MongoDB?
The $lookup operator in the aggregation pipeline of MongoDB can be used to perform a left outer join between two collections. The Aggregation Pipeline in MongoDB is a powerful data processing framework that allows for the transformation, grouping, and filtering of data within a collection. It consists of multiple stages and at each stage, an operation is performed on the data and is passed to the next stage. The pipeline provides a flexible and efficient way to query and analyze data in MongoDB. Let us look into the syntax of the $lookup operator,
- The from field specifies the name of the collection to join.
- The localField field specifies the field from the document in which the aggregation is performed or the input documents that will be used to match against the foreignField.
- The foreignField field specifies the field from the document's value of the from collection that will be used to match against the localField.
- The as field specifies the name of the output array field that will contain the joined documents. This field will be created with the joined document in the "from" collection.
The $lookup operator can be used as a stage of the aggregation pipeline and can also be used with subqueries and expressions to perform more complex joins and transformations of the data.
We can also perform join operations recursively using the $graphLookup stage of the aggregation pipeline. We can also join two results of two separate computations using the $facet stage of the aggregation framework.
Steps for Joining Two Collections In MongoDB
Let us consider the following example to explore the steps for joining two collections in MongoDB using the $lookup operator in the aggregation pipeline.
Let us consider two collections, orders and customers. The order collection contains order information such as the order ID, order date, customer ID, and total amount.
The customer collection contains information about customers, including their ID, name, and contact information.
We can use the $lookup operator to join the orders collection with the customer's collection to have the corresponding customer information for each order in the same collection. The query will be,
- The from parameter describes that we have to join the orders collection with the customer's collection.
- The localField is selected from the orders collection and in our example thecustomer_id field is used as the localField.
- The foreignField is selected from the customers collection and the _id field is the foreignField in our example.
- The resulting documents will be stored as an array in the customer field in the orders collection. This is specified by the as parameter. If there are no customers with _id that matches the customer_id field in the order, we will get a customer field with a null value.
- We can also use the $project operator in the next stage of the aggregation pipeline to reshape the output of the query to only include the fields we want to see in the output.
Output
How MongoDB Joins Two Collections With Specific Conditions using Pipeline & $Lookup?
The aggregation pipeline has a series of stages that can be used to modify data. To apply conditions for Join 2 Collections in MongoDB, we can use the $match operator. The syntax of the $match operator is,
The field is the name of the field in the collection in which the condition is to be applied. The condition is the rule that the field must meet for the document to be included in the output.
Let us consider the example in the last section in which we want to join collections in which the order is placed after a certain date and show only the specific details like customer name, email,order_date, and total money. The following query can be used to achieve the above requirements,
- The $match stage filter all the orders made after January 1st, 2022. The output of this stage has to include documents where the order_date field is greater than January 1st, 2022.
- The $lookup stage joins the customer collection based on the values of customer_id of the orders collection and _id of the customer collection. The output is stored as a customer field in the
- The $project stage is used to reshape the output including only the fields that are required. The 0 value represents that the field will not be included and the 1 represents that the field will be included.
A sample output of the above query will be,
We can see the customer field has only name and email because of the projection stage of the aggregation framework. We can also use other stages of the pipeline like limit, and sort to further process the data.
The performance considerations for the lookup operator to Join 2 Collections in MongoDB are,
- The size of the collections being joined can have a significant impact on performance. Large collections may slow down performance.
- Indexing is crucial for improving the performance of the $lookup stage. Both the local field and the foreign field should be indexed to speed up the join process.
- The distribution of data across the collections can also impact performance. If the data being joined is uneven, then it can impact query performance.
- Optimizing the overall aggregation pipeline by reshaping the data before performing the join can improve performance.
- Data modeling and denormalization techniques can be used to minimize the need for frequent joins.
How Does MongoDB Join Two Collections With The Where Clause?
In MongoDB, the $lookup operator acts as a pipeline stage in the aggregation framework and can be used to join data from two collections together based on a common field. We can also use a where clause with the $lookup operator to filter the joined documents based on a specified condition. This can be done by adding a $match stage after the $lookup stage in the aggregation pipeline.
Consider the same example collections orders and customers. Let us add a field age to the customer's documents and if we want to perform join on both documents and get only the documents where the age of the customer who has placed the order is greater than 30 and less than 50, the following query can be used,
The output will be the document from the orders collection in which the customers will have ages between 30 and 50.
Using Compass To Join Two Collections In MongoDB
Compass is a graphical user interface (GUI) for MongoDB that provides an easy and intuitive way to interact with databases and collections. We can perform queries, and aggregation and also join two collections using mongoDB compass. The following steps have to be followed to Join 2 Collections in MongoDB using the compass,
- Launch Compass and connect to your MongoDB instance. The connection is done using the connection string.
- Select the database that contains the collections you want to join.
- Click on the "Aggregations" tab in the left sidebar.
- Click on the "Create New" button to create a new aggregation pipeline.
- In the aggregation pipeline editor, click on the "+ Add stage" icon to add a new stage.
- Select the $lookup stage from the list of available stages.
-
Configure the $lookup stage as follows:
- Set the from field to the name of the collection you want to join.
- Set the localField field to the name of the field in the current collection that should match the collection to be joined.
- Set the foreignField field to the name of the field in the joined collection that you want to join with.
- Set the as field to the name of the field that will contain the result joined documents.
We can also see the preview in the right tab dynamically as we update the values in the stage.
- We can add multiple stages using the + Add stage button.
- Finally, click the Run button to run the created aggregation pipeline. The results will be displed in the same tab.
- If you are satisfied with the results, click on the Export button to save the aggregation pipeline.
Conclusion
- Joining two collections in MongoDB involves combining data from multiple collections into a single result set.
- Mongodb follows a denormalized data model, which stores related data together in a single document.
- This $lookup aggregation stage can be used to perform a left outer join between two MongoDB Collections.
- MongoDB's powerful aggregation framework provides multiple stages like sort, limit, and match to perform complex data transformation operations.
- We can use the $match stage to filter the joined collections using a condition. The $project stage can be used to show only the required fields in the output.
- The MongoDB compass provides a graphical user interface through which the joining of collections and other aggregation operations can be performed.