Aggregation in MongoDB

Overview

Aggregation in MongoDB is the process of running a set of documents through one or more operations, such as filtering and grouping, to produce a computed result. This article covers the basics of transforming documents with aggregations.

Pre-requisites

To follow along with this lesson, you will need :

  • A server with a UFW-configured firewall and a regular, non-root user with sudo access.
  • MongoDB installed on that server.
  • A MongoDB instance secured by enabling authentication and creating an administrative user.
  • Experience with filtering and querying MongoDB collections.

Alternatively, you can create a MongoDB Atlas account and connect to it from your command-line terminal.

What is Aggregation in MongoDB?

MongoDB's aggregation features are used heavily when querying and analyzing data. Aggregation is the process of passing a collection of documents through a series of stages that process them. Together, these stages form a pipeline. As documents move through the pipeline, they can be filtered, sorted, grouped, reshaped, and modified.

There are three ways to perform aggregation in MongoDB :

  1. Map-reduce function.
  2. Single-purpose aggregation.
  3. Aggregation pipeline.

One of the most common uses of aggregation is calculating aggregate values for groups of documents. This is comparable to the basic aggregation offered by the GROUP BY clause and the COUNT, SUM, and AVG functions in SQL. Aggregation in MongoDB goes a step further, however: it can also perform joins that resemble those in a relational database, reshape documents, create new collections and update existing ones, and more.

Although there are several ways to obtain aggregate data in MongoDB, the aggregation framework is the recommended option for the majority of tasks. Single-purpose methods, such as estimatedDocumentCount(), distinct(), and count(), are quick to use but have a narrow range of applications because each performs a single, fixed operation on one collection. Note that count() is deprecated in recent versions of the MongoDB shell in favor of countDocuments().

Before the aggregation pipeline was available, map-reduce was the way to run complex aggregations in MongoDB, but it was far more difficult to use. Map-reduce has been deprecated since MongoDB 5.0 in favor of the aggregation pipeline.

How Does Aggregation Work in MongoDB?

The aggregation pipeline can be used when you need to perform more complicated aggregation. An aggregation pipeline is a sequence of steps that, combined with the MongoDB query syntax, lets you obtain an aggregated result.

Let's look at how the aggregation pipeline works before getting into the code. In an aggregation pipeline, you list a sequence of instructions, each of which is called a "stage". MongoDB runs the stages in the order in which they are defined, passing the output of each stage to the next, to produce the final result.

Let's examine an illustration of the aggregate command in action :
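As a minimal sketch, assume a hypothetical orders collection whose documents have status, cust_id, and amount fields (the collection name and sample values are made up for illustration). The pipeline constant is what would be passed to db.orders.aggregate() in the shell; the code after it imitates the two stages in plain JavaScript so that the data flow can be followed without a server.

```javascript
// Hypothetical sample documents standing in for an "orders" collection.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// The pipeline as it would be passed to db.orders.aggregate(pipeline).
const pipeline = [
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
];

// Plain-JavaScript imitation of the two stages.
const matched = orders.filter((doc) => doc.status === "A"); // $match
const totals = new Map();
for (const doc of matched) {                                // $group with $sum
  totals.set(doc.cust_id, (totals.get(doc.cust_id) ?? 0) + doc.amount);
}
const result = [...totals].map(([_id, total]) => ({ _id, total }));

console.log(result);
// [ { _id: 'A123', total: 750 }, { _id: 'B212', total: 200 } ]
```

On a real server, $group does not guarantee the order of its output documents, so the two groups may come back in either order.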

The pipeline in the example above begins with a stage named $match. Once that stage completes, its output is passed to the $group stage.

The $match stage takes the documents in the collection and keeps only those whose status value is A.

The $group stage then groups the remaining documents by the cust_id field and, for each group, adds up the amount fields.

Besides $sum, MongoDB provides many other operators that you can use in your aggregations.

Map-Reduce Function

MongoDB supports the map-reduce programming model for processing large data sets and producing aggregated results. To run map-reduce operations, MongoDB offers the mapReduce() function, which takes two main arguments: a map function and a reduce function.

The map function emits a key-value pair for each input document, and MongoDB groups the emitted values by key. The reduce function is then applied to the values of each key to condense them into a single result. Mapping and reducing can run independently on different portions of the data before the partial results are combined, and the final result is saved to the output collection you specify.

Syntax

Examples

Consider the document structure for user posts in the example below. Each document stores the user's user name and the status of the post.

The following code selects all of the active posts from our posts collection, groups them by user name, and then calculates the total number of posts made by each user.
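A minimal sketch of that map-reduce job follows. The sample posts, the user_name and status field names, and the output collection name post_total are assumptions made for illustration. The comment shows roughly how the call would look in the shell; the rest imitates the map and reduce phases locally in plain JavaScript.

```javascript
// Hypothetical documents standing in for the "posts" collection.
const posts = [
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "ana", status: "active" },
  { user_name: "ana", status: "disabled" },
];

// In the shell, the call would look roughly like this ("post_total" is a placeholder):
//   db.posts.mapReduce(
//     function () { emit(this.user_name, 1); },
//     function (key, values) { return Array.sum(values); },
//     { query: { status: "active" }, out: "post_total" }
//   )

// Local imitation: emitted values are collected per key, then reduced.
const emitted = new Map();
const emit = (key, value) => {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
};

function mapFn() { emit(this.user_name, 1); }  // runs once per matching document
const reduceFn = (key, values) => values.reduce((a, b) => a + b, 0);

posts
  .filter((p) => p.status === "active")        // the "query" option
  .forEach((p) => mapFn.call(p));              // map phase

const postTotals = [...emitted].map(([key, values]) => ({
  _id: key,
  value: reduceFn(key, values),                // reduce phase
}));

console.log(postTotals);
// [ { _id: 'mark', value: 2 }, { _id: 'ana', value: 1 } ]
```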

The above mapReduce query outputs the following result :

Single-Purpose Aggregation

Single-purpose aggregation is used when we need a quick answer to a simple question about one collection, such as counting its documents or finding all distinct values of a field. It lacks the flexibility and capabilities of the pipeline because it only exposes common aggregation operations through the distinct(), estimatedDocumentCount(), and count() methods.

By contrast, the following accumulator expressions are used inside the $group stage of an aggregation pipeline :

| Expression | Description | Example |
|---|---|---|
| $sum | Adds up the given value across all documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}]) |
| $max | Returns the highest value of the given field across all documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}]) |
| $min | Returns the lowest value of the given field across all documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}]) |
| $avg | Computes the mean of the given values across all documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}]) |
| $push | Appends the value to an array in the resulting document. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}]) |
| $first | Returns the value from the first document in each group. This usually only makes sense after a preceding $sort stage. | db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}]) |
| $last | Returns the value from the last document in each group. This usually only makes sense after a preceding $sort stage. | db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}]) |
| $addToSet | Adds the value to an array in the resulting document without creating duplicates. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}]) |
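To make the difference between several of these accumulators concrete, here is a minimal plain-JavaScript sketch that groups by by_user and computes what $sum, $max, $min, $avg, $push, and $addToSet would each produce per group. The sample documents and URLs are hypothetical; only the field names by_user, likes, and url come from the examples above.

```javascript
// Hypothetical documents standing in for "mycol".
const mycol = [
  { by_user: "alice", likes: 10, url: "https://example.com/a" },
  { by_user: "alice", likes: 30, url: "https://example.com/b" },
  { by_user: "alice", likes: 20, url: "https://example.com/a" },
  { by_user: "bob", likes: 5, url: "https://example.com/c" },
];

// Group documents by by_user, as { $group: { _id: "$by_user", ... } } would.
const groups = new Map();
for (const doc of mycol) {
  if (!groups.has(doc.by_user)) groups.set(doc.by_user, []);
  groups.get(doc.by_user).push(doc);
}

// Apply each accumulator to the documents of one group.
const summary = [...groups].map(([user, docs]) => {
  const likes = docs.map((d) => d.likes);
  const urls = docs.map((d) => d.url);
  const sum = likes.reduce((a, b) => a + b, 0);
  return {
    _id: user,
    sum,                           // $sum
    max: Math.max(...likes),       // $max
    min: Math.min(...likes),       // $min
    avg: sum / likes.length,       // $avg
    push: urls,                    // $push keeps duplicates
    addToSet: [...new Set(urls)],  // $addToSet drops duplicates
  };
});

console.log(summary);
```

For alice, push holds three URLs while addToSet holds two, because one URL appears twice in her documents.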

Examples

Consider a collection called records that contains only the following documents.

Use the insertMany() method to insert the following data in the database.

The following operation counts all of the documents in the collection and returns that number : db.records.count()

Output :

The following operation counts only the documents in which the field a has the value 1 and returns that number : db.records.count( { a: 1 } )

Output :
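To make the two calls above concrete, here is a minimal sketch that uses made-up sample documents (an assumption, not the article's data) and computes locally what count() with and without a filter would return. A distinct() equivalent is included for comparison.

```javascript
// Hypothetical documents standing in for the "records" collection.
const records = [
  { a: 1, b: 0 },
  { a: 1, b: 1 },
  { a: 1, b: 4 },
  { a: 2, b: 2 },
];

// db.records.count()          -> number of documents in the collection
const total = records.length;

// db.records.count({ a: 1 })  -> number of documents matching the filter
const withA1 = records.filter((doc) => doc.a === 1).length;

// db.records.distinct("a")    -> distinct values of field a
const distinctA = [...new Set(records.map((doc) => doc.a))];

console.log(total, withA1, distinctA); // 4 3 [ 1, 2 ]
```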

Aggregation Pipeline

The aggregation pipeline in MongoDB consists of multiple stages, each of which transforms the documents. In other words, each stage takes a set of documents as input and produces a resulting set of documents, which becomes the input of the next stage (if there is one). This process continues until the last stage.

The basic pipeline stages provide filters that act like queries and document transformations that change the form of the output documents. Other stages provide tools for grouping and sorting documents. The aggregation pipeline can also be used on sharded collections.
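The idea that each stage consumes the previous stage's output can be modelled in a few lines of plain JavaScript. This is only a conceptual sketch: runPipeline and the three stage functions are hypothetical names, not part of MongoDB, and the sample data is made up.

```javascript
// Each stage is modelled as a function from an array of documents to a new array.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const stages = [
  (docs) => docs.filter((d) => d.qty > 0),                   // filter, like $match
  (docs) => docs.map((d) => ({ item: d.item, qty: d.qty })), // reshape, like $project
  (docs) => [...docs].sort((a, b) => b.qty - a.qty),         // order, like $sort
];

const input = [
  { item: "pen", qty: 3, note: "x" },
  { item: "ink", qty: 0, note: "y" },
  { item: "pad", qty: 7, note: "z" },
];

const output = runPipeline(input, stages);
console.log(output);
// [ { item: 'pad', qty: 7 }, { item: 'pen', qty: 3 } ]
```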

Stages

The commonly used stages of the aggregation framework are as follows :

  • $sort − Sorts the documents.
  • $skip − Skips a given number of documents and passes the rest to the next stage.
  • $project − Selects a subset of the fields of each document.
  • $limit − Limits the number of documents passed to the next stage to the given number, starting from the current position.
  • $match − Filters the documents, so that fewer documents may be passed as input to the next stage.
  • $group − Performs the actual aggregation described above.
  • $unwind − Deconstructs an array field. Storing data in an array pre-joins it in a sense, and $unwind undoes this so that each array element becomes a separate document again. The next stage therefore usually receives more documents than this stage was given.
  • $out − Writes the documents produced by the pipeline to a collection.
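Several of the stages listed above can be followed step by step in the sketch below. The articles collection, its fields, and the stage arguments are assumptions chosen for illustration; the commented pipeline shows the shell form, and the plain-JavaScript lines imitate each stage in memory.

```javascript
// Hypothetical input documents; "tags" is an array field.
const articles = [
  { _id: 1, title: "Intro", tags: ["mongodb", "basics"], views: 40 },
  { _id: 2, title: "Pipelines", tags: ["mongodb"], views: 90 },
  { _id: 3, title: "Indexes", tags: [], views: 10 },
];

// Equivalent shell pipeline (collection name is a placeholder):
// db.articles.aggregate([
//   { $unwind: "$tags" },
//   { $sort: { views: -1 } },
//   { $skip: 1 },
//   { $limit: 2 },
//   { $project: { _id: 0, title: 1, tags: 1 } },
// ])

// $unwind: one output document per array element; documents with an empty array are dropped.
const unwound = articles.flatMap((doc) =>
  doc.tags.map((tag) => ({ ...doc, tags: tag }))
);
const sorted = [...unwound].sort((a, b) => b.views - a.views);         // $sort, descending
const skipped = sorted.slice(1);                                       // $skip: 1
const limited = skipped.slice(0, 2);                                   // $limit: 2
const projected = limited.map(({ title, tags }) => ({ title, tags })); // $project

console.log(unwound.length); // 3
console.log(projected);
```

The two remaining documents both come from the "Intro" article, one per tag. Because they have equal views values, a real server does not guarantee their relative order.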

Expressions

An expression refers to the name of a field in the input documents. For example, in { $group : { _id : "$id", total : { $sum : "$fare" } } }, $id and $fare are expressions that refer to the id and fare fields.
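As a rough illustration of how such a field path is evaluated against one document, the sketch below resolves strings like "$fare" in plain JavaScript. The resolveFieldPath helper and the sample document are hypothetical, and the helper is deliberately simplified: it ignores arrays and system variables.

```javascript
// Resolve a field path such as "$fare" or "$trip.km" against one document.
// Simplified: arrays and system variables are not handled.
function resolveFieldPath(doc, expr) {
  if (typeof expr !== "string" || !expr.startsWith("$")) return expr; // literal value
  return expr
    .slice(1)
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const ride = { id: "r1", fare: 12.5, trip: { km: 8 } }; // hypothetical document

console.log(resolveFieldPath(ride, "$fare"));    // 12.5
console.log(resolveFieldPath(ride, "$trip.km")); // 8
console.log(resolveFieldPath(ride, "$missing")); // undefined
```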

Accumulators

Accumulators are operators that are mostly used in the $group stage :

  • $last: Returns the value from the last document in each group.
  • $count: Counts the number of documents in each group.
  • $avg: Computes the mean of the given values across the documents in each group.
  • $min: Returns the lowest value across the documents in each group.
  • $sum: Adds up the numeric values across the documents in each group.
  • $max: Returns the highest value across the documents in each group.
  • $first: Returns the value from the first document in each group.
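Because the first and last accumulators depend on the order in which documents reach the $group stage, they are usually paired with a preceding $sort. The sketch below shows this with a hypothetical scores collection (the names and values are made up) and counts documents with { $sum: 1 }, a common way to express a per-group count.

```javascript
// Hypothetical documents: scores per player.
const scores = [
  { player: "kim", score: 70 },
  { player: "lee", score: 55 },
  { player: "kim", score: 95 },
  { player: "kim", score: 80 },
];

// Equivalent shell pipeline (collection name is a placeholder):
// db.scores.aggregate([
//   { $sort: { score: 1 } },
//   { $group: { _id: "$player",
//               lowest: { $first: "$score" },
//               highest: { $last: "$score" },
//               games: { $sum: 1 } } },
// ])

const byScore = [...scores].sort((a, b) => a.score - b.score); // $sort, ascending
const perPlayer = new Map();
for (const { player, score } of byScore) {
  // "lowest" is fixed when the group is first seen, which is what $first does.
  const group = perPlayer.get(player) ?? { _id: player, lowest: score, highest: score, games: 0 };
  group.highest = score; // $last: overwritten until the final document of the group
  group.games += 1;      // $sum: 1 counts the documents
  perPlayer.set(player, group);
}

console.log([...perPlayer.values()]);
```

Without the sort, the first and last values would simply reflect whatever order the documents happened to arrive in.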

Examples

Consider the following collection in our database :

Use the insertMany() method to insert the records.

Let us now calculate the total order quantity:
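A minimal sketch of such a two-stage pipeline follows. The collection name orders, the field names, and the sample values are assumptions for illustration. The commented part shows the shell form, and the loop reproduces the same calculation in plain JavaScript.

```javascript
// Hypothetical order documents (names, sizes, and quantities are made up).
const pizzaOrders = [
  { name: "Pepperoni", size: "small", quantity: 10 },
  { name: "Pepperoni", size: "medium", quantity: 20 },
  { name: "Cheese", size: "medium", quantity: 50 },
  { name: "Cheese", size: "medium", quantity: 5 },
  { name: "Vegan", size: "large", quantity: 10 },
];

// Shell form, assuming the collection is named "orders":
// db.orders.aggregate([
//   { $match: { size: "medium" } },
//   { $group: { _id: "$name", totalQuantity: { $sum: "$quantity" } } },
// ])

const totalsByName = {};
for (const order of pizzaOrders) {
  if (order.size !== "medium") continue; // stage 1: $match
  totalsByName[order.name] =
    (totalsByName[order.name] ?? 0) + order.quantity; // stage 2: $group with $sum
}

console.log(totalsByName); // { Pepperoni: 20, Cheese: 55 }
```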

The two-stage aggregation pipeline example above returns the total order quantity of medium-sized pizzas, grouped by pizza name.

Output :

FAQs

Q. What is Aggregation in MongoDB?

A. Aggregation in MongoDB is the process of passing a collection of documents through a series of stages that process them. Together, these stages form a pipeline, in which documents can be filtered, sorted, grouped, reshaped, and modified.

Q. What Types of Aggregation Does MongoDB Provide?

A. There are three ways to perform aggregation in MongoDB :

  1. Map-reduce function.
  2. Aggregation pipeline.
  3. Single-purpose aggregation.

Conclusion

  • Aggregation in MongoDB is the process of passing a collection of documents through a series of stages that process them.
  • Before the aggregation pipeline was available, map-reduce was the way to run complex aggregations in MongoDB, but it was far more difficult to use.
  • MongoDB supports the map-reduce programming model for processing large data sets and producing aggregated results.
  • With the mapReduce() function, the map function emits key-value pairs that are grouped by key, and the reduce function is applied to the grouped values.
  • The aggregation pipeline in MongoDB has several stages, each of which transforms the documents.
  • The basic pipeline stages provide filters that act like queries and document transformations that change the form of the output documents, while other stages provide tools for grouping and sorting documents.