Index Analysis using Explain Command

Learn via video courses
Topics Covered

Overview

Index analysis plays a crucial role in optimizing the performance of MongoDB queries. By utilizing the explain() command, developers can gain valuable insights into query plans formed by the query optimizer and make the best decisions on index usage in MongoDB. This article explains all details regarding Index analysis using Explain command in MDB.

Introduction

The Explain command in MongoDB is a powerful tool used to obtain insights into the query execution plan and performance. It is used mostly for optimization. MongoDB provides different types of Explain commands based on the object it is used on, including the db.collection.explain() method, the cursor.explain() method, and the explain command itself.

Let us consider a simple collection with documents having fields such as name and price as examples for illustrating the queries.

  • The db.collection.explain() method is used to analyze the query plan for a specific query executed on a collection. By prefixing the find(), aggregate(), or other query methods with this explain the method, you can retrieve information about the query execution.
  • The cursor.explain() method is used to examine the query plan for a specific cursor. It provides information about the query execution plan for a query that has already been executed and stored in a cursor object.
  • The explain command is a MongoDB shell command that provides information about the query execution plan without actually executing the query. It helps in analyzing and optimizing query performance.

The output for the explain() method will be discussed in the following sections along with MongoDB Index analysis using Explain command.

Explain Output Structure

  • The MongoDB Index analysis using Explain command gives an output structure that provides detailed information about the query execution plan, including the information of the query planner.
  • The Explain output is presented as a document in JSON format and the structure is based on the usage of either the classic query engine or the slot-based execution query engine.
  • Starting from MongoDB 4.4, a new slot-based execution query engine called the Atlas Query Accelerator (AQA) was introduced. The AQA enhances query performance by utilizing optimized execution strategies.
  • On using the slot-based query engine, the query explain plan output is modified by introducing new fields in the output.

Explain Output for MongoDB 5.1 and Later

The output for Index analysis using Explain command in MongoDB 5.1 and later provides a comprehensive and detailed analysis of the query execution plan and performance. Let us study the output using a products collection with fields such as name, price, and rating. Let us run the following query in this collection,

When we run the above query using the explain() command, we get an output that provides insights into the query execution plan. The output is a large JSON object and has multiple sections. These sections will be explained in consecutive sections.

queryPlanner

The queryPlanner section in the output of the MongoDB Index analysis using Explain command, provides valuable information about the query planning process and the chosen query plan.

The queryPlanner section from the output has the following fields and values,

  • The plannerVersion field indicates the version of the query planner used to generate the query plan.
  • The namespace field specifies the collection on which the query is being executed.
  • The indexFilterSet field indicates whether any index filters were applied during the query planning. Index filters are additional conditions that can be utilized to further optimize the query execution.
  • The parsedQuery field represents the query expressed in a structured form.
  • The winningPlan field describes the selected query plan that is deemed most efficient by the query optimizer. It provides information about the following stages,
    • The stage field represents the current stage in the query plan. The value FETCH indicates that the plan involves retrieving documents from the collection.
    • The filter field specifies the filtering conditions applied during the fetch stage.
    • The inputStage field represents the initial stage of the query plan. The IXSCAN indicates that an index scan is performed.
      • The stage sub-field represents the type of scan being performed, which is an index scan in this case.
      • The keyPattern specifies the index key pattern used for the index scan. In our example, an index is created on the price and rating fields in ascending order.
      • The indexName field provides the name of the index being utilized
      • The isMultiKey field indicates whether the index scan involves multi-key index entries.
      • The isUnique, isSparse, isPartial fields determine the type of index.
      • This direction field indicates the scan direction of the index.
      • The indexBounds field specifies the ranges of values being scanned for each field. The price is scanned from 50 (exclusive) to infinity, and the rating is scanned from 4 (exclusive) to infinity.
  • The rejectedPlans field indicates any alternative query plans that are considered by the query optimizer but are rejected.

executionStats

The output of index analysis using Explain command in MongoDB also has the executionStats section that provides detailed statistics and information about the query execution.

The executionStats section continues as the output after the queryPlanner section and has the following field with values,

  • The executionSuccess field indicates whether the execution of the query was successful.
  • The nReturned gives the number of documents returned by the query.
  • The executionTimeMillis field has the time taken to execute the query, measured in milliseconds.
  • The totalKeysExamined and totalDocsExamined shows the total number of index keys and documents examined during the query execution.
  • The executionStages is an object that provides detailed information about the execution stages involved in the query execution.
    • The nReturned field has the number of documents returned at the current execution stage.
    • The executionTimeMillisEstimate field specifies an estimate of the time taken to execute the current stage.
    • The works field shows the total number of work items processed by the current stage. A work item is a unit of task performed during the execution.
    • The advanced field has the count of the number of documents advanced to the next stage.
    • The needTime field has the actual time required by the current stage to complete its work.
    • The saveState field shows the number of states saved in the current stage.
    • The restoreState field shows the number of states restored in the current stage.
    • The isEOF is used to identify whether the current stage has reached the end of the result set.
    • The invalidates field specifies whether the current stage invalidates any of the documents it reads.
    • The alreadyHasObj field shows whether the current stage has encountered any documents that have already been processed.
    • The inputStage has fields similar to the previous section and we will discuss only the new fields here.
    • The dupsTested field has the number of duplicate documents tested during the index scan stage.
    • The dupsDropped field has the number of duplicate documents dropped during the index scan stage.
    • The seenInvalidated field indicates whether the index scan stage has seen invalidated documents.
  • The allPlansExecution field is an array containing information about the execution plan for all plans considered during query optimization.

NOTE: A document becomes an invalid document if a document is modified or deleted after the query planner has accessed it but before it is fully processed.

serverInfo

The serverInfo section of the MongoDB Index analysis using Explain command provides detailed information about the MongoDB server, including multiple configurations on the server.

The serverInfo section continues as the output after the executionStats section and has the following field with values,

  • The host field has the hostname or IP address of the MongoDB server where the query was executed.
  • The port is the port number on which the MongoDB server is listening.
  • The version specifies the version of MongoDB running on the server.
  • The gitVersion field indicates the git commit identifier associated with the MongoDB server version.
  • The modules field has information about any additional modules or plugins loaded by the MongoDB server.
  • The openssl field depicts the OpenSSL library version used by the MongoDB server for secure connections.
    • The running field specifies the OpenSSL version currently in use at runtime.
    • The compiled field indicates the version used during compilation.
  • The buildEnvironment field has details such as the operating system, architecture, compiler paths (cc and cxx), and compilation flags (ccflags, cxxflags, linkflags).
  • The bits field indicates the number of bits(32-bit or 64-bit) in the MongoDB server.
  • The debug field indicates if the server was built in debug mode or not
  • The maxBsonObjectSize field represents the maximum size, in bytes, of a BSON object that MongoDB can store.
  • The storageEngines field has a list of storage engines supported by the MongoDB server.
  • The operationTime field has the timestamp of the most recent operation performed on the server.
  • The $clusterTime field has the cluster timestamp and signature associated with the most recent operation performed on the server in a clustered environment.

Note: In a clustered environment, multiple servers, known as nodes, work together to form a cluster for higher availability, fault tolerance, and scalability. Therefore, the clusterTime represents a global timestamp for the latest operation across the entire cluster and helps to check coordination and consistency whereas. The operationTime field represents the local timestamp of the latest operation.

Execution Plan Statistics for Query with $lookup Pipeline Stage

The$lookup stage in MongoDB's aggregation framework is used to perform a left outer join between two collections based on the join conditions. The execution plan for such an operation provides insights into how the query with the $lookup stage is executed and the performance statistics associated with it.

The syntax to obtain the Execution Plan Statistics for a Query with the $lookup stage using the explain() method in MongoDB is,

The output contains the Winning Plan and Stage Statistics with the fields that are discussed in the previous sections along with the Join Statistics which has information about the number of documents joined, the number of matches found, the time taken for the join operation, and any additional information specific to the join process.

Overall, the execution plan Statistics provide information about the overall execution time and resource usage of the query with the $lookup stage.

Collection Scan

  • Collection Scan, also known as a Collection Scan Stage(COLLSCAN), is an operation in MongoDB that involves scanning the entire collection to fulfill a query without utilizing any indexes.
  • The stage occurs when there are no suitable indexes available for the query, or when the query optimizer determines that a collection scan is more efficient than using indexes.
  • In this stage, MongoDB examines every document in the collection to determine if it matches the query conditions.

Starting from MongoDB 5.3, a new feature has been introduced where the query planner can select a clustered index, which is a type of index that stores the physical arrangement of the data on disk for a clustered collection. In such cases, the CLUSTERED_IXSCAN stage from the output of Index analysis using Explain command in MDB is used and this stage includes details about the clustered index key and the index bounds used during the query processing.

A Collection Scan is considered to be less efficient than utilizing indexes because it requires reading and evaluating every document in the collection, which can result in higher resource usage and slower query execution.

Covered Queries

  • A covered query is a query in which all the requested fields can be fulfilled by scanning the index without the need to access the actual documents in the collection.
  • A query is considered a covered query when the index contains all the necessary data to satisfy the query's projection and filtering criteria.

Consider a collection with documents having fields such as name, email, and age. Let's say we want to retrieve all documents where the age field is greater than 25, and we only want thename field as output. To make this query a covered query, we need an index that covers both the age and name fields,

Now, when we execute the query, only the elements in the index can be considered a covered query,

In the output of the explain() command, if the stage field is indicated as IXSCAN (index scan) in the output of MongoDB Index analysis using Explain command, and the indexOnly field is true, it signifies that the query is covered. In our output, such conditions will satisfy and also the keysExamined and docsExamined fields will have a lower value as the index has also the projected data which makes no need for accessing the actual documents.

Index Intersection

  • Index Intersection is a feature in MongoDB that allows the query optimizer to combine multiple indexes to satisfy a query efficiently.
  • Index Intersection is particularly useful when a query involves multiple fields, and no single index can be used for all the queries optimally.

By using the intersecting of indexes, MongoDB can significantly reduce the number of documents that need to be scanned or examined.

We can utilize we use the hint() method in MongoDB to specify the indexes we want to intersect in a query. The syntax of hint() method is,

It is important to remember that index1 and index2 has to be set as indexes using the db.users.createIndex() command before using the hint() method.

While using the explain() method, we have additional Candidate Indexes Selection, Intersection Planning, and Index Intersection stages. The stage is named either AND_SORTED or AND_HASH with a series of inputStages(child stages) for the indexes.

$or Expression

The $or expression of MongoDB allows the usage of multiple conditions within a single query. It is specified by the OR stage in the query planner. When the query planner encounters the $or expression, it generates an execution plan that includes separate evaluation stages in the inputStages, similar to the previous method for handling each expression in the $or operator. When using the explain() command the output structure will be,

Please note that excessive use of the $or expression with a large number of conditions can impact query performance, as each condition requires individual evaluation.

sortandsort and group Stages

The $sort operation is used to sort the documents in a specified order. In the explain command output, the $sort stage is represented as a separate stage in the executionStats section. There are two additional fields for the sort stage,

  • The totalDataSizeSortedBytesEstimate field has the data regarding the estimated number of bytes processed in the sort stage.
  • We can find if the sort stage has written the data to the disk through the usedDisk field.

The $group stage is used to group documents together based on specified field data. In the explain command output, the group stage is represented as a separate stage in the executionStats section. It provides information about the key used for grouping. There are also two additional fields for the group operation,

  • The totalOutputDataSizeBytes field gives the estimate of the computed total size of all documents that are outputted by the group stage. The data will be in bytes.
  • The usedDisk field to find if the grouped data is written to disk.

Sort Stage

  • The SORT stage is used when a query requires sorting the results based on one or more fields. + The SORT stage is represented in the executionStages section in the output of MongoDB Index analysis using Explain command along with details about the sorting operation performed during the query execution.
  • In MongoDB, a blocking sort operation is an in-memory sorting of the query results. It can be used to sort query results that cannot be efficiently sorted using an index.
  • The blocking sort operation loads all the query results into memory and performs the sort operation on them.
  • If more than 100 megabytes of memory is required by the blocking sort operation to load the data, MongoDB returns an error.
  • To prevent such error we can use the cursor.allowDiskUse() method which is introduced in MongoDB 4.4 and allows MongoDB to write data to temporary files on disk during the execution of a query when the system's available memory is insufficient.

The syntax of using this method is,

Conclusion

  • The Index analysis using Explain command in MDB is a powerful tool in MongoDB that provides insights into query execution plans and performance.
  • The output of MongoDB Index analysis using Explain command in MongoDB 5.1 and later versions offer a detailed analysis of query execution plans.
  • The output includes information about the query planner, execution statistics, and index usage, among other information.
  • A covered query is a high-performance query in which all the requested fields can be fulfilled by index.
  • Index Intersection is a feature in MongoDB that allows the query optimizer to combine multiple indexes.
  • Based on the operation used in queries such as $sort, $or, and $group, the Explain output includes information about respective stages.
  • MongoDB provides an in-memory sorting feature with a better performance called blocking sort operation.
  • By understanding the Explain output structure and interpreting its various sections, developers can improve query performance.