Amazon DynamoDB Query

Topics Covered

Overview

DynamoDB is a hosted NoSQL database provided by AWS (Amazon Web Services). The Amazon DynamoDB query is a very powerful operation in DynamoDB that finds items based on the primary key values. It uses the KeyConditionExpression parameters to return all the items with the matching given partition key attribute and a single value for that attribute. DynamoDB query enables us to select multiple items that have different sort ("RANGE") keys but with the same partition ("HASH") key.

What is Dynamo DB

It is a hosted NoSQL database provided by AWS (Amazon Web Services). Amazon DynamoDB offers a small, relatively simple API that supports both basic key access and more advanced query patterns; a managed experience, preventing the need to SSH-ing into servers for upgrading the crypto libraries; also provides reliable performance even as it scales.

For the following use cases, in particular, DynamoDB works well:

  • Data Sets with Simple, Known Access Patterns: DynamoDB is a quick, dependable option if you create recommendations and serve them to users because of its straightforward key-value access patterns.
  • Serverless Applications Using AWS Lambda: In response to event triggers, AWS Lambda offers auto-scaling, stateless, ephemeral computing. Building Serverless applications are a perfect fit for DynamoDB query because it performs authentication & authorization via IAM roles and is accessible via an HTTP API.
  • Applications with Large Amounts of Data and Strict Latency Requirements: JOINs and complex SQL operations can make queries take longer as your data volume grows. Your queries will execute with predictable latency with DynamoDB up to and including 100 TBs in size!.

Amazon DynamoDB is a fully managed NoSQL database service that offers quick and dependable performance with seamless scalability. You can free yourself from having to worry about hardware provisioning, setup, replication, software patching, and cluster scaling by using DynamoDB to offload the administrative tasks associated with operating and scaling a distributed database.

Additionally, DynamoDB provides encryption at rest, which removes the complexity and operational hassle associated with protecting sensitive data. See DynamoDB encryption at rest for more information.

Concepts

There are some key concepts that we should know in the Amazon DynamoDB query:

  • Read and Write Capacity;
  • Primary Keys;
  • Secondary Indexes;
  • Tables, Items, and Attributes.

Read and Write Capacity

  • A specific server is provisioned to host your database when you use a database like MySQL, Postgres, or MongoDB.
  • You must decide on the size of your instance, including the number of CPUs, RAM, storage GBs, and other factors.
  • You can automatically scale your read and write capacity units with DynamoDB.
  • As a result, it is much simpler to scale your application up during busy periods and down during off-peak hours to save money.
  • Unlike DynamoDB query, though. You should provision read and write capacity units instead. The number of operations per second allowed by these units is fixed.
  • Pricing can more closely reflect actual usage in this environment, which is fundamentally different from the instance-based world.

Primary Key

A primary key serves as the unique identifier for every item in a table. when adding a new item to the table, the primary key must be provided and the primary key must be defined at the creation of the table.

There are two different kinds of primary keys: a composite primary key composed of a partition key and a sort key and a simple primary key composed only of a partition key.

  1. Composite Primary Key: It is comparatively more difficult to use the composite primary key. The sort key is used to (wait for it) sort items with the same partition. Both a partition key and a sort key are specified when using a composite primary key. An Orders table for tracking customer orders on an e-commerce website is one example. CustomerId would serve as the sort key and OrderId as the partition key.
  2. Simple Primary Key: Using a Simple primary key or Accessing rows in a SQL table by a primary key are both common key-value stores like Memcached. One example would be a Users table with a Username as the primary key.

Keep in mind that, even with a composite key, each item in a table has a primary key that uniquely identifies it. The same partition key may be used for multiple items with different sort keys when using a table with a composite primary key. With a specific partition key and sort key pairing, only one item is allowed.

Secondary Indexes

A table's primary key uniquely identifies each item, and you can use the primary key to perform queries on the table. You may occasionally have additional access patterns, though, that would make your primary key ineffective. Secondary indexes are a concept in the DynamoDB query that allows for these extra access patterns.

There are two types of secondary indexes: a local secondary index and a global secondary index:

  1. Local Secondary Index: An index with a different sort key than the base table's partition key. Every partition of a local secondary index is "local" in the sense that it is restricted to a base table partition with the same partition key value.
  2. Global Secondary Index: A key for an index that can be different from the base table's partition key and sort key. Because queries on the index can cover all of the data in the base table, across all partitions, it is referred to as a global secondary index. A global secondary index scales independently from the base table and is stored in its own partition space separate from the base table.

Tables, Items, and Attributes

The core building blocks of DynamoDB query are Tables, items, and attributes.

  1. Item: An item is a single data record. The table's declared primary key acts as a unique identifier for each item in the table. A specific User would be an item in your Users table. A row in a relational database or a document in MongoDB is similar to an item in a table.
  2. Table: A table is a collection of information records. You might, for example, have a Users table for user information and an Orders table for user order information. A table in a relational database or a collection in MongoDB is similar to this idea.
  3. Attributes: A single item can have multiple attributes. An age attribute that simply stores a user's age could be used here. A column in a relational database or a field in MongoDB is similar to an attribute. Only the attributes that make up your primary key must be present on an item for DynamoDB to be used.

How to Use Query?

Request Syntax

Request Parameters

ParametersTypeConstraintsBriefRequired
TableNameStringMinimum length of 3. The maximum length of 255.the name of the table where the requested items are located.Yes
AttributesToGetArray of stringsMaximum length of 65535.This parameter is a legacy parameter. Instead, use ProjectionExpression.No
ConditionalOperatorStringAND or ORThis parameter is a legacy parameter. Instead, use FilterExpression.No
ConsistentReadBooleanTrue or FalseThe operation uses strongly consistent reads if this value is true; otherwise, it uses eventually consistent reads. This value determines the read consistency model.No
ExclusiveStartKeyString to AttributeValue object mapMaximum length of 65535.This operation will evaluate the first item's primary key. Use the result that LastEvaluatedKey's previous operation's return code provided.No
ExpressionAttribute-NamesString to string mapMaximum length of 65535.One or more tokens that can be used in an expression to replace attribute names.No
ExpressionAttribute-ValuesString to AttributeValue object mapMaximum length of 65535.one or more substitutable values for an expression. Dereference an attribute value in an expression by using the : (colon) character.No
FilterExpressionStringMinimum length of 3. The maximum length of 255.a string that DynamoDB query uses to apply conditions after the Query operation but before returning the data to you. The FilterExpression criteria are not met by items, so they are not returned.No
IndexNameStringMinimum length of 3. The maximum length of 255.a search query's index name. This index may be a global secondary index or a local secondary index on the table. Notably, you must also supply TableName if you use the IndexName parameter.No
KeyConditionExpressionStringMinimum length of 3. The maximum length of 255.the condition that details the key values for the items that the Query action will retrieve. A single partition key value must be subject to an equality test in the condition.No
KeyConditionsString to Condition object mapMaximum length of 65535This parameter is a legacy parameter. Instead, use KeyConditionExpression.No
LimitIntegerMinimum value of 1The most items possible for evaluation (not necessarily the number of matching items).No
ProjectionExpressionStringMinimum length of 3. The maximum length of 255.the name of one or more attributes to retrieve from the table as a string. Scalars, sets, or JSON document elements can be included in these attributes. Commas must be used to separate each attribute in the expression.No
QueryFilterString to Condition object mapMaximum length of 65535.This parameter is a legacy parameter. Instead, use FilterExpression.No
ReturnConsumed-CapacityStringINDEXES, TOTAL, or NONEdetermines how much information is provided in the response about throughput consumption, either provisioned or on demand.No
ScanIndexForwardBooleanTrue or Falsedefines the index traversal order: The traversal is carried out in ascending order if TRUE (the default); FALSE if it is carried out in descending order.No
SelectStringMinimum length of 3. The maximum length of 255.The attributes that will be included in the output. All the number of matching items and particular item attributes can be retrieved.No

Response Syntax

Response Elements

ParametersTypeBrief
ConsumedCapacityConsumedCapacity objectthe number of capacity units that the Query operation used. The information returned includes statistics for the table and any involved indexes, as well as the total provisioned throughput consumed.
CountIntegerCount is the number of items returned following the application of the filter, and ScannedCount is the number of matching items before the application of the filter if a QueryFilter was used in the request.
ItemsArray of string to AttributeValue object mapsAn array of item attributes that meet the search criteria. This array's elements each contain an attribute name and its associated value.
LastEvaluatedKeyString to AttributeValue object mapthe item's primary key, inclusive of the previous result set, at which the operation terminated Start a new operation using this value, but leave it out of the new request.
ScannedCountIntegerBefore any QueryFilter is applied, the number of items that were evaluated. A Query operation is inefficient when the ScannedCount value is high but there are few or no Count results.

Errors

  • ResourceNotFoundException - HTTP Status Code: 400, An attempt was made to access a table or index that didn't exist. The resource may not be properly specified or its status may not be ACTIVE.
  • RequestLimitExceeded - HTTP Status Code: 400, The throughput amount for your account has been exceeded. To request a quota increase, please get in touch with AWS Support.
  • ProvisionedThroughputExceededException- HTTP Status Code: 400, Your request volume is extreme. Requests that encounter this exception are automatically retried by the AWS SDKs for the DynamoDB query. Unless your retry queue is too big to finish, your request is eventually successful. Limit the number of requests per second and employ exponential backoff. Visit Error Retries and Exponential Backoff in the Amazon DynamoDB query Developer Guide for more details.
  • InternalServerError - HTTP Status Code: 500, An error occurred on the server side.

Examples

1. Count Items

This example returns the number of items for a specific forum in the Thread table.

Sample Request

Sample Response

2. Retrieve a Range of Items

The example that follows searches a forum's Reply table for comments made by specific users. For quick lookups on these attributes, the Reply table has a local secondary index called PostedBy-Index.

Which attributes are returned is determined by the ProjectionExpression parameter.

Sample Request

Sample Response

Querying Tables and Indexes: Java

You can query a table or a secondary index in Amazon DynamoDB query using the Query operation. A partition key value and an equality test must be supplied. By including a sort key value and a condition, you can narrow the results if the table or index has a sort key.

The steps to retrieve an item using the AWS SDK for Java Document API are as follows.

  1. Create a `DynamoDB class instance.
  2. To represent the table you want to work with, create an instance of the Table class.
  3. Call the Table instance's query method. Along with any optional query parameters, you must specify the partition key value of the items you want to retrieve.

An ItemCollection object containing all of the items the query returned is included in the response.

The tasks mentioned earlier are shown in the following Java code example. The example presumes that your system has a Reply table that stores forum thread replies.

Each forum thread has a distinct ID and is capable of receiving one reply or more. Therefore, the forum name and forum subject are both included in the Id attribute of the Reply table. The composite primary key for the table is made up of the partition key Id and the sort key ReplyDateTime.

The following query returns all comments for a particular thread subject. The table name and the Subject value are both necessary for the query.

Querying Tables and Indexes: .NET

You can query a table or a secondary index in Amazon DynamoDB query using the Query operation. A partition key value and an equality test must be supplied. By including a sort key value and a condition, you can narrow the results if the table or index has a sort key.

The steps to query a table using the basic AWS SDK for .NET API are listed below.

  1. Create an AmazonDynamoDBClient class instance.
  2. Create a new instance of the QueryRequest class and enter the parameters for the query operation...
  3. Use the QueryRequest object you created in the previous step to call the Query method. The QueryResult object, which offers all the results of the query, is part of the response.

The below example of C# code shows the preceding tasks. The code assumes that you have a Reply table with replies to forum threads.

Each forum thread has a distinct ID and is capable of receiving one reply or more. As a result, the Id (partition key) and ReplyDateTime make up the primary key (sort key).

The following search returns all comments for a particular thread subject. Both the table name and the Subject value are necessary for the query.

How to Query Amazon DynamoDB?

Hash Key in DynamoDB

The main reason for that complexity is that the DynamoDB query requires the hash key to be queried. Consequently, it is prohibited to query the entire database. which means you cannot perform what would be known as a full table scan in another database.

The sort key is a second attribute that can be included in the primary (partition) key. The essential search requirement is = (equals). However, the sort key operators include:

  • 〉=
  • 〈=
  • _
  • between
  • begins_with

Each query consists of two separate parts:

  1. The filter expression
  2. The partition key hash, the key condition query, and optionally the sort key.

Load Sample Data

Below are some examples, but first, some data is required:

  1. Install and run DynamoDB locally.
  2. Install node to enable JavaScript execution.
  3. Using the npm module from the node, install the Amazon SDK using npm install aws-sdk

Run these programs from the Amazon JavaScript examples:

  1. Run MoviesCreateTable.js to create the Movies table.
  2. Unzip the sample data downloaded from this page.
  3. Run MoviesLoadData to load some data.

Inspect the Data

Take a look at the data file's header. It is a movie database with arrays and nested JSON fields. It is therefore intended for use as a teaching exercise.

Describe Table

You can check to see if the data was loaded by using:

Query Structure

For query execution, we'll use the Amazon CLI command line interface. The principles apply whether you're using one of the programming languages SDKs or another. Only the syntax varies throughout.

There are two components to queries:

  1. Filter expression
  2. Key condition expression

Filter Expressions for Query

Keep in mind that the key condition is used to choose a set of records based on how they are divided. A subset of the database is pulled, and that subset is then queried in the filter expression.

In the example below, we want to display all 2010 movies with a rating of at least 8.5. The year is obtained by the (1) key condition, and the (2) filter expression enables you to query by rating all 2010 movies. Because there are less data to query, it is designed this way for speed.

Similar to the key condition, the filter expression consists entirely of an attribute on the left, an operator in the middle, and a placeholder on the right. In other words, it isn't JSON, even though JSON is used elsewhere.

Key Condition Expressions for Query

The partition key and, optionally, the sort key may both be present in the key condition expression. DynamoDB refers to this primary key as the partition key. A second key, referred to as the sort key, is an additional option. Year is the partition key name and Title is the sort key in the movies database. Only equals to (=) are supported in the partition key query expression. However, the sort key operators include:

  • 〉=
  • 〈=
  • _
  • between
  • begins_with

The word "year" is a reserved word in the Amazon sample data. Therefore, we must address that complexity right away.

Below is the query. As the continuation character in the bash shell, we use backslashes (\) to end each line. Similarly, we enclose JSON in a single quote (') to allow it to span multiple lines.

This expression is formatted as partition key name = placeholder.

Additionally, a sort key could be added, which in this database is the title. However, since we are looking for a title rather than searching by one, we don't want it here.

Retrieving All Items With a Given Partition Key

As discussed earlier, Whether retrieving a User's profile or changing a User's name, we typically manipulate one User at a time.

In other circumstances, such as when working with Orders, it is less beneficial. Sometimes we want to grab a specific Order, but other times we might want to show all the Orders for a specific User. For each User's Orders, it would be inefficient to store the various partition keys and then query those items separately.

Let's see how we can fulfill the latter request by making a Query API call. Our daffy duck User's Orders will first be fully retrieved. What matters to notice here is the —key-condition-expression option. We decide which options to choose in this manner.

DynamoDB Scan VS Query – When to Use What?

DynamoDB query provides two methods for accessing stored data: Query and Scan.

DynamoDB QueryDynamoDB Scan
A query can quickly and effectively retrieve a specific item (or set of items) by pointing directly to it.As the name implies, scanning will browse every item on the table. The direction of the scanning order can be determined using the sort key.
In a query, a direct lookup to a chosen partition is carried out using the primary or secondary partition's hash key.While Scan "scans" the entire table in search of elements that meet the criteria,
The most effective way to retrieve data from a DynamoDB table is by querying by the primary keyRunning a Scan is costly and ineffective, so it should almost always be avoided.
While Query usually returns results within 100msScan might even take a few hours to find the relevant piece of data.
KeyConditionExpression parameter which is required in Query operationFilterExpression can be used to narrow down the results.

When to Use What?

In general, you should always choose Query over Scan. If it's a frequently used pattern and it's not possible (for instance, when you're trying to find a piece of data with a key you don't know), think about adding a GSI (Global Secondary Index) to index that attributes and enable Query. Use Scan as a last resort.

dynamodb scan and query operations

The use of DynamoDB Scan and Query operations is fundamentally different, despite their closer resemblance. You can retrieve all of your data using scans, but they can be expensive and perform poorly. Queries, which are much better than scans because they let you retrieve data by partition key, can, however, restrict how you can access your data.

For the reasons listed above, it's important to carefully design your table so you can take advantage of Query or GetItem operations as much as possible. By doing this, you can keep costs to a minimum while still getting the best results, regardless of scale.

Conclusion

  • The Amazon DynamoDB query is a very powerful operation in DynamoDB that finds items based on the primary key values.
  • DynamoDB query enables us to select multiple items that have different sort (“RANGE”) keys but with the same partition (“HASH”) key.
  • DynamoDB is a quick, dependable option if you create recommendations and serve them to users because of its straightforward key-value access patterns.
  • You can query a table or a secondary index in Amazon DynamoDB query using the Query operation. A partition key value and an equality test must be supplied.
  • The main reason for that complexity is that the DynamoDB query requires the hash key to be queried. Consequently, it is prohibited to query the entire database.
  • The most effective way to retrieve data from a DynamoDB table is by querying by the primary key and we use the Amazon CLI command line interface for query execution.
  • Queries, which are much better than scans because they let you retrieve data by partition key, can, however, restrict how you can access your data.