AWS DocumentDB
Overview
AWS introduced AWS DocumentDB as a fast, reliable, and fully managed database service that makes it easy to store, query, and index JSON data. With AWS DocumentDB, you can easily set up, operate, and scale MongoDB-compatible databases in the cloud. Because it is a fully managed document database service, it scales JSON workloads without the burden of running the database yourself.
What is AWS DocumentDB?
You may have noticed the growing popularity of document databases. Many developers want to store data in the same document-oriented format that they use in their application code. When a data model does not map naturally onto a normalized rows-and-columns format, document databases are in demand: data is represented as JSON documents, which makes it more expressive, and document databases offer intuitive APIs for fast, flexible, and agile development.
To keep up with these changing needs, AWS introduced AWS DocumentDB, a fully managed, MongoDB-compatible document database service for storing, querying, and indexing JSON data at scale.
A few of the key features are listed below:
- Designed for 99.999999999% durability, with automatic replication, strict network isolation, and continuous backup.
- Works with existing MongoDB drivers and tools, because it implements the Apache 2.0 open-source MongoDB 3.6 and 4.0 APIs.
- A fully managed database service for operating mission-critical MongoDB workloads.
- Scales compute and storage independently, supporting millions of document read requests per second.
- Automates hardware provisioning, setup, patching, and other database management tasks.
- Offers new customers a one-month free trial that includes 750 hours of db.t3.medium instance usage at no extra cost.
Benefits of AWS DocumentDB
Listed below are the benefits that AWS DocumentDB offers its customers, which you can put to work wherever your use case permits:
MongoDB-compatible:
You can work with your existing MongoDB drivers and tools in Amazon DocumentDB, as it implements the Apache 2.0 open-source MongoDB 3.6 and 4.0 APIs. It does this by emulating the responses that a MongoDB server sends to a MongoDB client. As a result, moving an application to DocumentDB can be as simple as updating its database endpoint to point at the new Amazon DocumentDB cluster.
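Since switching usually comes down to the connection string, here is a minimal Python sketch of building one for a DocumentDB cluster. The endpoint, user name, and password are hypothetical placeholders, and the options reflect common DocumentDB settings (TLS with a downloaded CA bundle named `global-bundle.pem`, replica set `rs0`, and retryable writes turned off because DocumentDB does not support them); check them against the connection details shown for your cluster in the console.

```python
from urllib.parse import quote_plus, urlencode


def docdb_uri(user: str, password: str, endpoint: str, port: int = 27017,
              ca_file: str = "global-bundle.pem") -> str:
    """Build a MongoDB-style connection string for a DocumentDB cluster endpoint."""
    # Percent-encode credentials so characters such as '@' or ':' cannot break the URI.
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    options = {
        "tls": "true",                          # TLS is enabled on clusters by default
        "tlsCAFile": ca_file,                   # CA bundle downloaded from AWS beforehand
        "replicaSet": "rs0",                    # DocumentDB presents the cluster as replica set rs0
        "readPreference": "secondaryPreferred", # send reads to replicas when available
        "retryWrites": "false",                 # DocumentDB does not support retryable writes
    }
    return f"mongodb://{creds}@{endpoint}:{port}/?{urlencode(options)}"


# Hypothetical values; substitute your own cluster endpoint and credentials.
uri = docdb_uri("appuser", "example-password",
                "sample-cluster.cluster-example.us-east-1.docdb.amazonaws.com")
print(uri)
```

The resulting string can be passed to `pymongo.MongoClient(uri)` just as you would for a self-managed MongoDB deployment.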
Fully managed:
With Amazon DocumentDB as your database, you don't have to worry about management tasks such as hardware provisioning, setup, configuration, patching, backups, or scaling. The service continuously monitors your cluster and automatically backs up the database to Amazon S3, enabling point-in-time recovery to any moment within the backup retention period of up to 35 days.
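Point-in-time recovery restores into a new cluster. The sketch below assumes a boto3 `docdb` client (for example `boto3.client("docdb")`) is passed in as `client`; it checks that the requested time falls inside the retention window and then calls `restore_db_cluster_to_point_in_time`. The window check is a simplification: the authoritative bounds are the cluster's `EarliestRestorableTime` and `LatestRestorableTime` reported by `describe_db_clusters`. The restored cluster starts without instances, so you would add at least one with `create_db_instance` before connecting. Cluster names in the test are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_RETENTION_DAYS = 35  # longest backup retention period DocumentDB allows


def restore_to_point_in_time(client, source_cluster: str, new_cluster: str,
                             restore_time: datetime, retention_days: int,
                             now: Optional[datetime] = None):
    """Restore `source_cluster` as of `restore_time` into a brand-new cluster."""
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention period must be between 1 and 35 days")
    now = now or datetime.now(timezone.utc)
    # Approximate restorable window: the last `retention_days` days up to now.
    if restore_time > now or restore_time < now - timedelta(days=retention_days):
        raise ValueError("restore time is outside the backup retention window")
    # The source cluster is left untouched; the restore creates `new_cluster`.
    return client.restore_db_cluster_to_point_in_time(
        DBClusterIdentifier=new_cluster,
        SourceDBClusterIdentifier=source_cluster,
        RestoreToTime=restore_time,
    )
```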
Performance at scale:
Amazon DocumentDB is built to deliver performance at scale; AWS states that it achieves twice the throughput of currently available managed MongoDB services. As traffic grows, Amazon DocumentDB relies on a fault-tolerant, distributed, self-healing storage system that can scale up to 64 TB per cluster. Because the architecture separates storage from compute, each can scale independently: you can scale read capacity to millions of requests per second by adding up to 15 low-latency read replicas across three Availability Zones in minutes.
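Adding a read replica amounts to adding another instance to the cluster. The following sketch assumes a boto3 `docdb` client is passed in as `client` and that the cluster already has its primary instance; the `-replica-N` naming scheme is hypothetical. It enforces the 15-replica limit mentioned above before calling `create_db_instance` for each new replica.

```python
MAX_REPLICAS = 15  # per-cluster read replica limit


def add_read_replicas(client, cluster_id: str, instance_class: str,
                      existing_instances: int, count: int, prefix: str = "replica"):
    """Add `count` read replicas to a cluster that has `existing_instances` instances."""
    if existing_instances < 1:
        raise ValueError("cluster needs a primary instance before replicas are added")
    # One existing instance is the primary; the rest are already replicas.
    current_replicas = existing_instances - 1
    if current_replicas + count > MAX_REPLICAS:
        raise ValueError(f"cluster would exceed {MAX_REPLICAS} replicas")
    created = []
    for i in range(count):
        name = f"{cluster_id}-{prefix}-{current_replicas + i + 1}"
        # Replicas read from the shared cluster storage volume, so no data copy is needed.
        client.create_db_instance(
            DBInstanceIdentifier=name,
            DBInstanceClass=instance_class,
            Engine="docdb",
            DBClusterIdentifier=cluster_id,
        )
        created.append(name)
    return created
```

Applications can then spread reads across the replicas by connecting with `readPreference=secondaryPreferred`.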
How Does AWS DocumentDB Work?
Now let us learn how AWS DocumentDB works. We discuss it in terms of two scenarios:
- Migrating self-managed MongoDB workloads to Amazon DocumentDB
- Storing, querying, and indexing JSON data
Let us dive deep into each of them.
Migration of the self-managed MongoDB workloads to Amazon DocumentDB:
As the diagram above shows, self-managing MongoDB databases can be a difficult, expensive, and time-consuming process. With Amazon DocumentDB, you can set up, secure, and scale MongoDB-compatible databases without having to configure backups, run cluster management software, or monitor production workloads yourself.
Store, query, and index JSON data:
As shown in the diagram, Amazon DocumentDB (a NoSQL document database) makes it much easier to insert, query, index, and aggregate JSON data. Modeling application data as JSON has become popular and intuitive for developers because JSON is the de facto format for data exchange. By storing and querying JSON data in its natural format, developers can iterate faster and be more productive.
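To make this concrete, here is a minimal Python sketch of the insert, index, query, and aggregate workflow. It assumes `collection` is a PyMongo `Collection` connected to a DocumentDB cluster, for example `MongoClient(uri)["shop"]["products"]`, where the `shop` database and `products` collection are hypothetical; the catalog documents are made-up sample data.

```python
def load_and_query_products(collection):
    """Insert sample JSON documents, index a field, then query and aggregate them."""
    # Hypothetical catalog documents; nested values such as arrays are stored as-is.
    collection.insert_many([
        {"sku": "A1", "category": "books", "price": 12.5, "tags": ["paperback"]},
        {"sku": "B2", "category": "books", "price": 30.0, "tags": ["hardcover", "gift"]},
        {"sku": "C3", "category": "games", "price": 45.0, "tags": ["board"]},
    ])
    # Ascending index on category so filters on that field need not scan every document.
    collection.create_index([("category", 1)])
    # Query: books cheaper than 20, returning only sku and price.
    cheap_books = list(collection.find(
        {"category": "books", "price": {"$lt": 20}},
        {"_id": 0, "sku": 1, "price": 1},
    ))
    # Aggregation: average price per category, sorted by category name.
    averages = list(collection.aggregate([
        {"$group": {"_id": "$category", "avgPrice": {"$avg": "$price"}}},
        {"$sort": {"_id": 1}},
    ]))
    return cheap_books, averages
```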
What is NoSQL?
In this section, let us look at what a NoSQL database is, to understand the concepts around DocumentDB a bit more deeply. A NoSQL database is a nonrelational database that you query through APIs, declarative structured query languages, or query-by-example languages. NoSQL databases offer ease of development, functionality, scalability, and better price-performance at scale. They are widely used in real-time web applications and big data.
Benefits
NoSQL databases are used by many modern applications, such as mobile, web, and gaming applications. These applications take advantage of the benefits of NoSQL databases discussed below:
- Highly functional:
Ideal for big data, online shopping, gaming, IoT, customer 360, social networks, real-time web apps, and online advertising applications, NoSQL databases offer highly functional APIs designed for distributed data stores with large data storage needs.
- Scalability:
NoSQL databases scale out when traffic increases, meeting demand with zero downtime. Using distributed clusters of hardware, they add more servers to support the increased traffic rather than relying on a single larger server.
- Flexibility:
What makes NoSQL databases one of a kind is their ability to handle any type of data structure (structured, semi-structured, and unstructured) in a single data store. Their flexible schemas also enable faster, more iterative development.
- High Availability:
NoSQL databases provide high availability with reduced latency by replicating data across multiple data centers, servers, or cloud resources.
- High-performance:
Their scale-out architecture expands by adding more nodes as data volume or traffic increases. With response times in milliseconds, NoSQL databases can ingest data and deliver it quickly and reliably, which matters most in applications that collect terabytes of data every day.
Types
Let us take our discussion of NoSQL databases a bit deeper by looking at the different types that exist. They are shown in the diagram below and explained in the following discussion:
- Key-Value:
Key-value databases are among the most flexible NoSQL databases: they are highly partitionable and allow horizontal scaling. The application fully controls what is stored in the value field, without restrictions from the database. AWS DynamoDB, which is based on the key-value data model, offers single-digit millisecond latency.
- Search:
Search databases were developed for scenarios such as near-real-time visualizations and analytics on machine-generated data, achieved by indexing, searching, and aggregating semi-structured logs. AWS Elasticsearch Service (Amazon ES, since renamed Amazon OpenSearch Service) is a good example of a search database.
- In Memory:
In-memory databases are a widely adopted choice when applications need ultra-fast performance; modern microservice applications are among their best-known use cases. AWS offers Amazon ElastiCache, a fully managed in-memory caching service built on this concept.
- Graph:
Graph databases suit use cases such as social networks, reservation systems, and fraud detection. Data is organized as nodes, and the links between nodes are represented as relationships, which makes it easier to run applications over highly connected datasets. Amazon Neptune, a fully managed graph database service, is a prime example of this data model.
- Document:
Document databases are best suited to scenarios with many semi-structured datasets. They let you store, retrieve, and manage semi-structured data, with application data stored and represented as JSON-like documents. MongoDB and Amazon DocumentDB are widely known document databases that offer flexible, iterative development with powerful and intuitive APIs.
The diagram below shows a few of the most prominent types of NoSQL databases:
Relational vs NoSQL Database
Below are a few of the key differences between a relational database and a NoSQL database:
Key Properties | NoSQL Database | Relational Database |
---|---|---|
ACID (atomicity, consistency, isolation, and durability) properties | Many NoSQL databases relax some ACID guarantees in favor of performance and horizontal scaling, although some (for example, AWS DocumentDB 4.0 and DynamoDB) support ACID transactions. | Relational databases provide ACID properties. |
Read/Write operations | Typically scale both reads and writes by adding nodes. | Typically scale reads through replicas; write scaling is usually limited to a single primary server. |
APIs | NoSQL databases use APIs to store and retrieve data. Using partition keys, applications can look up key-value pairs, column sets, or semi-structured documents. | Relational databases use SQL to store and retrieve data. |
Workloads | Designed for data access patterns that require low latency, and for applications with semi-structured data. | Designed for Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) applications. |
Scale | Highly scalable, typically by scaling out across nodes. | Less scalable, typically by scaling up to larger hardware. |
Performance | Depends on the underlying hardware cluster size, network latency, and the calling application. | Depends heavily on the optimization of indexes, queries, and table structures. |
Types of data | Can manage structured, semi-structured, and unstructured data. | Manages only structured data. |
Data Model | Uses various data models, such as document, key-value, and graph, each optimized for performance and scale. | Uses the relational model, which normalizes data into tables of rows and columns and enforces referential integrity in relationships between tables. |
Schema | No fixed schema is required. | Follows a fixed schema that gives each table its structure. |
AWS DocumentDB vs DynamoDB vs MongoDB
Below we compare AWS DocumentDB with AWS DynamoDB and MongoDB to understand the differences among the three in detail.
Amazon DocumentDB is a fully managed NoSQL document database that is compatible with the open-source MongoDB APIs, which reduces both development and scaling effort. DynamoDB is a fully managed NoSQL database service that offers fast, predictable performance with high scalability; it can store and retrieve very large amounts of data, organized into tables.
Both AWS DocumentDB and MongoDB use JSON-like documents to store schema-free data, so you can create documents without first defining their structure. DynamoDB, by contrast, uses a primary key to uniquely identify each item in a table, with secondary indexes providing additional query flexibility. Its core components are tables, items, and attributes.
In both DocumentDB and MongoDB, queries perform best when a suitable index exists. Without one, the database must scan every document in the collection to find those that match the query, which slows down reads. In DynamoDB, queries use the primary key by default, and you can optionally create secondary indexes on other attributes to query or scan the data by those attributes.
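One way to check whether a query is hitting an index is to inspect its query plan. The sketch below assumes the dictionary returned by PyMongo's `Cursor.explain()`, with a `queryPlanner.winningPlan` whose stages form a chain linked through `inputStage`; a full scan shows up as a `COLLSCAN` stage and an index lookup as `IXSCAN`. Plans with several child stages are not walked here, and the exact plan layout can vary by engine version.

```python
def plan_stages(plan: dict) -> list:
    """List stage names in a winning plan, following the chain of 'inputStage' links."""
    stages = []
    while plan:
        stages.append(plan.get("stage"))
        plan = plan.get("inputStage")  # None when the chain ends
    return stages


def uses_collection_scan(explain_output: dict) -> bool:
    """True if the planner chose to read every document instead of using an index."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning)
```

For example, `uses_collection_scan(collection.find({"category": "books"}).explain())` returning `True` is a hint that the query could benefit from an index on `category`.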
In summary, AWS DocumentDB makes it easy to store, query, retrieve, and index JSON data.
Use Cases of AWS DocumentDB
A few of the popular use cases of AWS DocumentDB are described below:
Mobile and web applications:
With AWS DocumentDB, you can reduce your operational burden and build high-performance mobile and web applications that scale to millions of user requests per second with millisecond latency, enabling unique customer experiences. Development time also shrinks, because DocumentDB's flexible document model, data types, and indexing let you iterate quickly on your applications.
Content and catalog management:
Online publications, point-of-sale terminals, digital archives, shopping sites, and self-service kiosks depend on content and catalog management systems to serve their customers. These systems require fast, reliable access to product information, images, ratings, user reviews, comments, and more, which AWS DocumentDB's flexible document model, data types, and indexing handle efficiently. You can intuitively store and quickly query content (such as user reviews and demo videos on shopping websites) and catalogs (such as inventory lists for point-of-sale terminals and financial trades for trading platforms).
Profile management:
As the number of users grows, user profile data becomes more complex and user experience expectations rise, so companies need high performance, scalability, and data flexibility. User profile management now spans user preferences, online transactions, and user authentication. With AWS DocumentDB's document data model, you can efficiently manage profiles and preferences for millions of users and serve millions of requests per second with millisecond latency.
Pricing Structure of AWS DocumentDB
Now let us dive deep into the pricing structure of AWS DocumentDB. AWS DocumentDB uses pay-as-you-go pricing with no upfront costs.
Charges fall into four major categories, which you can weigh against your requirements:
- On-Demand instances
- Database storage and IOs
- Backup storage
- Data transfer
Let us discuss each one in detail.
On-Demand instances:
With On-Demand Instances, you pay by the hour with no long-term commitments or upfront fees. For development, testing, and other short-lived workloads, On-Demand Instances free you from the cost and complexity of planning and purchasing database capacity ahead of your needs. Pricing is per instance-hour consumed, from the time an instance is launched until it is stopped or deleted. Partial instance-hours are billed in one-second increments, with a 10-minute minimum charge following a billable status change such as creating, modifying, or deleting an instance.
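The arithmetic for a single run can be sketched as follows. The hourly rate in the usage example is a hypothetical figure, not an actual DocumentDB price, and the model is simplified: it treats one continuous run after a single billable status change, so any run shorter than 10 minutes is billed as 10 minutes.

```python
from decimal import Decimal, ROUND_HALF_UP

MIN_BILLABLE_SECONDS = 10 * 60  # 10-minute minimum after a billable status change


def on_demand_charge(hourly_rate: str, seconds_running: int) -> Decimal:
    """Estimate the On-Demand charge for one instance run of `seconds_running` seconds."""
    # Per-second billing, but never less than the 10-minute minimum.
    billable_seconds = max(seconds_running, MIN_BILLABLE_SECONDS)
    cost = Decimal(hourly_rate) * billable_seconds / 3600
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
```

At a hypothetical $0.30 per hour, a 5-minute test run is billed as 10 minutes (`on_demand_charge("0.30", 300)` gives `0.0500`), while a 90-minute run (`5400` seconds) gives `0.4500`.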
Below is the pricing for On-Demand instances:
Database storage and IOs:
Database storage is charged per GB-month of storage consumed by the AWS DocumentDB cluster, and I/Os are charged per million I/O requests. Reads from and writes to the storage volume count as billable I/O requests, including those generated by change streams and TTL indexes. Billable storage includes your data, indexes, and change stream data. You don't need to provision storage or I/Os in advance; you pay only for the storage and I/Os that your AWS DocumentDB cluster consumes.
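A monthly estimate combines the two components. In this sketch, the storage amount, the I/O count, and both unit prices are hypothetical inputs to be replaced with your own figures and the published rates for your Region.

```python
from decimal import Decimal


def storage_and_io_charge(avg_storage_gb: str, io_requests: int,
                          price_per_gb_month: str, price_per_million_ios: str) -> Decimal:
    """Estimate one month of storage plus I/O charges for a cluster."""
    # Storage: GB-months consumed by data, indexes, and change stream data.
    storage_cost = Decimal(avg_storage_gb) * Decimal(price_per_gb_month)
    # I/O: billed per million requests, including change stream and TTL activity.
    io_cost = Decimal(io_requests) / Decimal(1_000_000) * Decimal(price_per_million_ios)
    return storage_cost + io_cost
```

For instance, 100 GB-months at a hypothetical $0.10 per GB-month plus 50 million I/Os at a hypothetical $0.20 per million gives `storage_and_io_charge("100", 50_000_000, "0.10", "0.20")`, which is 20.00.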
Below is the price chart for Database storage and IOs:
Backup storage:
Backup storage is billed per Region. Backup storage is the storage associated with automated cluster backups and any manual cluster snapshots. Backup storage of up to 100% of your total AWS DocumentDB cluster storage in a Region incurs no additional cost. Increasing the backup retention period or taking more manual cluster snapshots consumes more backup storage. Your total backup storage is the sum of the storage for all backups in that Region, so copying a snapshot from one Region to another increases the billable backup storage in the target Region.
Also, no additional charge is incurred for backup storage when the backup retention period is 1 day and you have not kept any manual snapshots beyond that retention period.
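The free allowance works out to a simple subtraction per Region. In the sketch below, the per-cluster storage sizes and per-backup sizes (automated backups plus manual snapshots, including copies received from other Regions) are hypothetical inputs in GB.

```python
from typing import Iterable


def billable_backup_gb(cluster_storage_gb: Iterable[float],
                       backup_storage_gb: Iterable[float]) -> float:
    """Return backup storage (GB) in one Region that exceeds the free allowance."""
    # Free allowance: 100% of the total DocumentDB cluster storage in the Region.
    free_allowance = sum(cluster_storage_gb)
    # Total backup storage: automated backups plus manual snapshots in the Region.
    total_backup = sum(backup_storage_gb)
    return max(0, total_backup - free_allowance)
```

Two clusters holding 100 GB and 50 GB, with 180 GB of backups in total, leave 30 GB of billable backup storage.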
Backup storage beyond the free allowance is billed at the rates detailed below, regardless of whether the cluster is active or has been deleted:
Data Transfer
Data transfer pricing is based on the amount of data transferred in to and out of AWS DocumentDB.
No charge is incurred when the data is transferred between AWS DocumentDB and AWS EC2 Instances in the same Availability Zone. Also, the data transfer between Availability Zones for replication of Multi-AZ deployments is free.
The data transfer pricing for AWS DocumentDB database instances outside and inside a VPC is as follows:
- Outside VPC:
If data is transferred in or out between an AWS EC2 instance and an AWS DocumentDB database instance in different Availability Zones of the same Region, no charge is incurred for traffic in or out of the DocumentDB instance; you pay only the standard Amazon EC2 Regional Data Transfer charges for data transferred in or out of the EC2 instance.
- Inside VPC:
If data is transferred in or out between an AWS EC2 instance and an AWS DocumentDB database instance in different Availability Zones of the same Region, AWS EC2 Regional Data Transfer charges apply on both sides of the transfer.
Below is the pricing for data transfer in to and out of AWS DocumentDB:
Below is the pricing for data transfer from AWS DocumentDB to the internet:
FAQs
Below are a few frequently asked questions about AWS DocumentDB:
Q. Does AWS DocumentDB have a free tier?
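A. AWS DocumentDB is not part of the standard AWS Free Tier. However, new customers can use a one-month free trial that includes 750 hours of db.t3.medium instance usage.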
Q. What happens to the backups when we delete the cluster?
A. If you create a final snapshot while deleting the cluster, you can later use that snapshot to restore the deleted cluster. AWS DocumentDB retains this final snapshot, along with all other manually created snapshots, even after the cluster is deleted.
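With the AWS SDK for Python (boto3), the final snapshot is requested as part of the delete call. This sketch assumes a `docdb` client is passed in as `client`, that the cluster's instances have already been deleted and deletion protection is turned off, and that the snapshot and cluster names are hypothetical.

```python
def delete_cluster_with_final_snapshot(client, cluster_id: str, snapshot_id: str):
    """Delete a DocumentDB cluster, keeping a final snapshot to restore from later."""
    # SkipFinalSnapshot=False asks the service to take the named snapshot first.
    return client.delete_db_cluster(
        DBClusterIdentifier=cluster_id,
        SkipFinalSnapshot=False,
        FinalDBSnapshotIdentifier=snapshot_id,
    )


def restore_from_snapshot(client, snapshot_id: str, new_cluster_id: str):
    """Recreate a cluster from a retained snapshot (instances are added afterwards)."""
    return client.restore_db_cluster_from_snapshot(
        DBClusterIdentifier=new_cluster_id,
        SnapshotIdentifier=snapshot_id,
        Engine="docdb",
    )
```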
Q. Explain the working of per-second billing.
A. Instance pricing is calculated from the time an instance is created until it is stopped or deleted. Instances are charged in one-second increments, with a 10-minute minimum charge following a billable status change such as creating, modifying, or deleting an instance.
Q. Can we automatically share snapshots?
A. No, sharing of automatic cluster snapshots is not supported. You need to manually create a copy of the snapshot, and then explicitly share the copy.
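The copy-then-share workflow can be sketched with boto3 as below. It assumes a `docdb` client passed in as `client`; the snapshot identifiers and the 12-digit account ID are hypothetical. In practice you would wait for the copied snapshot to become available before sharing it.

```python
def share_snapshot_copy(client, automated_snapshot_id: str, copy_id: str, account_id: str):
    """Copy an automated snapshot into a manual one, then share the copy."""
    # Step 1: automated snapshots cannot be shared directly, so make a manual copy.
    client.copy_db_cluster_snapshot(
        SourceDBClusterSnapshotIdentifier=automated_snapshot_id,
        TargetDBClusterSnapshotIdentifier=copy_id,
    )
    # Step 2: allow the other account to restore from the manual copy.
    return client.modify_db_cluster_snapshot_attribute(
        DBClusterSnapshotIdentifier=copy_id,
        AttributeName="restore",
        ValuesToAdd=[account_id],
    )
```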
How to Develop with AWS DocumentDB?
Now that we have learned about AWS DocumentDB, let us develop with it by following the steps below.
Creating the AWS DocumentDB Clusters:
- Start by navigating to Amazon DocumentDB in the AWS Management Console, then open the Dashboard, where you will find the Create Cluster button, as shown below.
- Enter the necessary details, such as the cluster identifier and engine version, and choose a suitable instance class and the number of instances your scenario requires, as shown below.
- Finally, confirm that the cluster has been created, as shown below.
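The same cluster can also be created programmatically. This is a minimal boto3 sketch, assuming a `docdb` client is passed in as `client`; the cluster identifier, user name, and password are hypothetical, and real code should load the password from a secret store rather than hard-coding it. The cluster itself holds the shared storage, so instances are created separately and attached to it; the first becomes the primary and the rest are replicas.

```python
from typing import Optional


def create_cluster(client, cluster_id: str, username: str, password: str,
                   instance_class: str = "db.t3.medium", instance_count: int = 1,
                   engine_version: Optional[str] = None):
    """Create a DocumentDB cluster and attach `instance_count` instances to it."""
    cluster_args = {
        "DBClusterIdentifier": cluster_id,
        "Engine": "docdb",
        "MasterUsername": username,
        "MasterUserPassword": password,
    }
    if engine_version is not None:
        cluster_args["EngineVersion"] = engine_version  # e.g. "4.0.0"; omit for the default
    client.create_db_cluster(**cluster_args)
    # Compute comes from instances; each one is created and joined to the cluster.
    names = []
    for i in range(1, instance_count + 1):
        name = f"{cluster_id}-{i}"
        client.create_db_instance(
            DBInstanceIdentifier=name,
            DBInstanceClass=instance_class,
            Engine="docdb",
            DBClusterIdentifier=cluster_id,
        )
        names.append(name)
    return names
```

Usage might look like `create_cluster(boto3.client("docdb"), "sample-cluster", "docdbadmin", password, instance_count=3)`.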
Programmatically Connecting to AWS DocumentDB:
TLS is enabled by default, but before connecting to AWS DocumentDB programmatically, verify that TLS is enabled on your cluster.
First, determine the cluster's parameter group by running the following command from the AWS CLI:
The output you should see:
Next, check the TLS parameter in the cluster's parameter group by running the following command from the AWS CLI:
In the output given below, the tls parameter should show "ParameterValue": "enabled":
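The same two-step check can be scripted. This sketch assumes a boto3 `docdb` client passed in as `client` and a hypothetical cluster identifier; it reads the cluster's `DBClusterParameterGroup` from `describe_db_clusters`, then pages through `describe_db_cluster_parameters` (following `Marker`) until it finds the `tls` parameter.

```python
def cluster_tls_value(client, cluster_id: str) -> str:
    """Return the value of the 'tls' parameter for a cluster's parameter group."""
    # Step 1: look up which cluster parameter group the cluster uses.
    cluster = client.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    group = cluster["DBClusterParameterGroup"]
    # Step 2: page through the group's parameters until 'tls' turns up.
    request = {"DBClusterParameterGroupName": group}
    while True:
        page = client.describe_db_cluster_parameters(**request)
        for param in page["Parameters"]:
            if param["ParameterName"] == "tls":
                return param.get("ParameterValue", "")
        if not page.get("Marker"):
            raise LookupError(f"no 'tls' parameter found in {group}")
        request["Marker"] = page["Marker"]  # continue with the next page
```

A result of `"enabled"` means clients must connect over TLS, for example with `tls=true` and the CA bundle in the connection string.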
Conclusion
- AWS DocumentDB is a fast, reliable, and fully managed database service that makes it easy to store, query, and index JSON data. With AWS DocumentDB, you can easily set up, operate, and scale MongoDB-compatible databases in the cloud, and its fully managed service scales JSON workloads without the burden of running the database yourself.
- Key-value databases are among the most flexible NoSQL databases, as they are highly partitionable and allow horizontal scaling. The application fully controls what is stored in the value field, without restrictions from the database.
- Both AWS DocumentDB and MongoDB use JSON-like documents to store schema-free data, so you can create documents without first defining their structure. DynamoDB, by contrast, uses a primary key to uniquely identify each item in a table, with secondary indexes providing additional flexibility; its core components are tables, items, and attributes.