In this blog, we describe the different types of AWS managed databases along with their features and merits. By the end, you should have the information you need to choose the AWS database that matches your application’s needs.
Evolution of Database Servers
The earliest database servers stored data in flat files, which made access cumbersome and slow. Relational databases emerged to improve data access and to support different types of relationships between stored data and different types of access. The two main types of databases traditionally used in the IT world are relational (SQL) and NoSQL database management systems. While both are equally valuable, a few essential distinctions may make one preferable over the other as the case demands. Relational database systems had challenges dealing with unstructured data, and NoSQL database technology emerged to handle large amounts of it.
4 Major Types of NoSQL Databases:
- Key-value databases – Simplest of all NoSQL architectures. The model is based on keys (identifiers for looking up data) and values (data that is associated with keys).
- Document databases – Organize groups of key-values into collections of data items known as a document. Documents are stored together in a flexible structure.
- Column family databases – Share superficial similarities to relational databases such as rows and columns. They trade off some of the RDBMS functionality, such as the ability to link or join tables, for improved performance.
- Graph databases – Well suited to model objects and relationships between objects. Instead of using columns and rows, a graph database uses specialized structures called nodes and relationships. They are usually optimized for OLTP (transactional performance) and built for ensuring transactional integrity and operational availability.
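As a rough illustration of how the four models differ in shape, the sketch below uses plain Python structures as stand-ins. All names and values are made up for illustration; real NoSQL engines add indexing, persistence, and distribution on top of these shapes.

```python
# Illustrative only: plain Python structures standing in for each NoSQL model.

# Key-value: an opaque value looked up by a unique key.
key_value = {"session:42": "user=alice;cart=3"}

# Document: a self-describing, nested record; fields may differ per document.
documents = [
    {"_id": 1, "name": "Alice", "addresses": [{"city": "Austin"}]},
    {"_id": 2, "name": "Bob", "phone": "555-0100"},
]

# Column family: each row key maps to families of columns;
# rows do not have to share the same columns.
column_family = {
    "user#1": {"profile": {"name": "Alice"}, "activity": {"last_login": "2024-01-01"}},
    "user#2": {"profile": {"name": "Bob"}},
}

# Graph: nodes plus explicit relationships (edges) between them.
nodes = {"alice": {"type": "person"}, "bob": {"type": "person"}, "acme": {"type": "company"}}
edges = [("alice", "FRIEND_OF", "bob"), ("alice", "WORKS_AT", "acme")]


def neighbors(node, relationship):
    """Return the nodes reachable from `node` over edges of the given type."""
    return [dst for src, rel, dst in edges if src == node and rel == relationship]
```

For example, `neighbors("alice", "WORKS_AT")` follows a relationship directly, where a relational design would typically need a join.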
Database Service Choices with Amazon
If you are migrating to or hosting applications in the Amazon cloud, you have many different options for cloud-based database solutions. The three most common ones are:
- Amazon Relational Database Service (RDS).
- Amazon Aurora.
- Amazon DynamoDB.
In this blog, we will discuss each of these services in detail and recommend what situations each service may fit best.
Amazon RDS
Amazon Relational Database Service (RDS) is a managed SQL database service provided by Amazon Web Services (AWS). Amazon RDS automates all the time-consuming administration tasks such as provisioning, setup, patching, and backups and makes it easy to set up, operate, and scale a relational database in the cloud.
RDS supports the most popular relational database engines, including Oracle, Microsoft SQL Server, MariaDB, MySQL, and PostgreSQL, which makes it easier to move on-premises database workloads to the cloud.
Many organizations look to adopt RDS because Amazon handles high availability, recovery, backups, and patching.
- With Multi-AZ deployment, Amazon RDS automatically creates a primary instance and a synchronously replicated standby in a different availability zone. If an unplanned outage happens, RDS can automatically fail over to the standby. This provides several benefits: data redundancy, failover support, and minimal latency impact during system backups.
- AWS RDS read replicas provide multiple read-only copies of your database instance within the same or a different AWS region. Any updates made to the source database are asynchronously copied to the read replicas. You get multiple benefits: fault-tolerant availability, load balancing for high volumes of read traffic, and scalability for read-heavy database workloads. You also have multiple DB support: read replicas are available for MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server as well as Aurora.
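One common way applications take advantage of read replicas is to send writes to the primary endpoint and spread reads across the replica endpoints. The following is a minimal sketch of that idea, not an AWS API: the `EndpointRouter` class and the endpoint names are hypothetical, and a real application would hand the chosen endpoint to its database driver.

```python
import itertools


class EndpointRouter:
    """Send writes to the primary and spread reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        """Pick an endpoint based on the first keyword of the statement."""
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._readers)  # round-robin over the replicas
        return self.primary


# Hypothetical endpoint names, for illustration only.
router = EndpointRouter(
    primary="mydb.example.internal",
    replicas=["mydb-ro-1.example.internal", "mydb-ro-2.example.internal"],
)
```

Because replication to read replicas is asynchronous, a read routed this way may briefly return stale data. Reads that must see a write that just happened are safer against the primary.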
Amazon Aurora
Amazon Aurora is a relational database engine that can be run on RDS or as Aurora Serverless. Aurora is MySQL and PostgreSQL compatible, and according to AWS it delivers up to five times the throughput of MySQL and up to three times the throughput of PostgreSQL. Aurora databases can be set up quickly, and thanks to this compatibility, applications can access them using existing code, drivers, and programs with minimal changes. Aurora is designed for fault tolerance, availability, and storage elasticity and can be set up with cross-region read replicas.
Amazon DynamoDB
DynamoDB is Amazon’s NoSQL database solution. Like other NoSQL databases, DynamoDB is commonly used to handle big data – large volumes of unstructured or semi-structured data.
Amazon DynamoDB is a fully managed NoSQL database service that supports document and key-value data models. Availability and fault tolerance are built in with automatic backup and restore, security, and multiregion, multimaster distribution along with in-memory caching.
DynamoDB is an ideal fit for internet-scale mobile, web, gaming, IoT, retail, media, and entertainment applications that require single-digit millisecond data access latency and need to support petabytes of data. DynamoDB can automatically scale up or down, and it supports ACID transactions. Your DBAs do not need to provision, patch, or manage servers, and there is no software to install, maintain, or operate.
Two key differentiators help DynamoDB scale much better than a traditional RDBMS:
- Schema flexibility lets DynamoDB store complex hierarchical data within a single item.
- Composite key design lets it store related items close together in the same table.
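The two differentiators above can be sketched with a toy in-memory table. This is an illustration of the data model only, not the DynamoDB API: `CompositeKeyTable`, the `pk`/`sk` attribute names, and the `CUSTOMER#`/`ORDER#` key formats are assumptions chosen for the example.

```python
from collections import defaultdict


class CompositeKeyTable:
    """Toy table: items grouped by partition key and ordered by sort key."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # pk -> {sk: item}

    def put(self, pk, sk, **attributes):
        # Items in one table may carry different (and nested) attributes.
        self._partitions[pk][sk] = {"pk": pk, "sk": sk, **attributes}

    def query(self, pk, sk_prefix=""):
        """Return one partition's items in sort-key order, filtered by prefix."""
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


table = CompositeKeyTable()
# A hierarchical profile stored as a single item.
table.put("CUSTOMER#42", "PROFILE", name="Alice",
          address={"city": "Austin", "tags": ["billing", "shipping"]})
# Related orders share the partition key, so they sit next to the profile.
table.put("CUSTOMER#42", "ORDER#2024-03-02", total=15, lines=[{"sku": "A1", "qty": 2}])
table.put("CUSTOMER#42", "ORDER#2024-01-15", total=40)
table.put("CUSTOMER#7", "PROFILE", name="Bob")

orders = table.query("CUSTOMER#42", sk_prefix="ORDER#")
```

Here `orders` holds only customer 42's orders, sorted by date, retrieved by touching a single partition rather than joining separate tables.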
If you use AWS as the primary cloud infrastructure for your cloud-native applications, DynamoDB is a good choice for workloads that fit the low-latency/ high-traffic use case.
When is DynamoDB not a good fit?
- You need ad hoc query access: It is cumbersome to implement entity relationships across DynamoDB tables.
- You are building a data warehouse: You need a normalized (relational) view of your data. This use case is best implemented using a relational database.
- You need to store documents, images, or music files: You need Binary Large Object (BLOB) storage. While it can store binary items up to 400 KB, DynamoDB is not generally suited to storing documents or images.
How to Choose the Right Database Service Option?
The following table lists the key differences between RDS and DynamoDB. The right option depends on your scalability, storage, and pricing needs.
Features | RDS | DynamoDB |
---|---|---|
Database type | Relational database management system (RDBMS) | Non-relational |
Structure | Tables with rows and columns | Key-value pairs and JavaScript Object Notation (JSON)-style documents |
Schema | Predefined | Dynamic |
Scale | Vertical | Horizontal |
Language | Structured Query Language (SQL) | API access through SDKs for Java, .NET, Node.js, and JavaScript in the browser |
Performance | Suitable for online analytical processing (OLAP) | Built for online transaction processing (OLTP) at scale |
Optimization | Optimized for storage | Optimized for read/write |
Scalability/ Replication:
- If you have heavy write workloads and require more than five read replicas, Aurora is a better choice. Because Aurora uses shared storage for the writer and readers, replica lag is minimal. RDS allows only up to five replicas, and its replication is slower than Aurora’s.
- If your scaling needs are for standard, general-purpose applications, RDS is the better option. You can auto-scale the database to max capacity with just a few clicks in the AWS console.
- You also have the option of Aurora Serverless, which scales up and down well, but you have to be aware of several restrictions that apply in Serverless mode.
- If you have to handle a very high volume of read/write requests, DynamoDB is a better choice. It scales seamlessly with no impact on performance. You can run DynamoDB tables in on-demand or provisioned capacity mode.
Storage:
- RDS allows up to 64 TB for most engines but only 16 TB for SQL Server.
- Aurora’s max capacity is 128 TB.
- DynamoDB has no practical limit on table storage capacity.
Pricing:
- In general, given the same workloads, Aurora might be more expensive than RDS. With RDS, you have the same predictable and easy-to-understand model as EC2: you pay a fixed per-hour cost for every instance. The Aurora pricing model is harder to predict because you pay for I/O operations (billed per million requests) in addition to the hourly cost.
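To make the difference between the two pricing models concrete, here is a minimal sketch. The function names are made up, the rates in the usage lines are placeholders rather than AWS prices, and storage, backup, and data-transfer charges are left out.

```python
HOURS_PER_MONTH = 730  # common approximation for a month of continuous use


def rds_monthly_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Instance-only model: a fixed per-hour charge, as described for RDS."""
    return hourly_rate * hours


def aurora_monthly_cost(hourly_rate, io_requests, price_per_million_io,
                        hours=HOURS_PER_MONTH):
    """Hourly charge plus an I/O charge billed per million requests."""
    io_charge = (io_requests / 1_000_000) * price_per_million_io
    return hourly_rate * hours + io_charge


# Placeholder rates for illustration only; check current prices before budgeting.
fixed = rds_monthly_cost(0.25)
quiet_month = aurora_monthly_cost(0.25, io_requests=10_000_000,
                                  price_per_million_io=0.25)
busy_month = aurora_monthly_cost(0.25, io_requests=2_000_000_000,
                                 price_per_million_io=0.25)
```

With the same placeholder hourly rate, the instance-only figure stays constant while the I/O-based figure swings with the month's request volume, which is why the second model is harder to forecast.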
- DynamoDB has two pricing models: pay-per-request (i.e. on-demand) and provisioned capacity.
Here are the key differences:
- In the on-demand model, you are charged per read and write request. At the time of writing, AWS charges $1.25 per million writes and $0.25 per million reads.
- In the provisioned model, you get a certain throughput for your DynamoDB table. Example: For 100 Read Capacity Units (RCUs), you get 100 strongly consistent 4 KB reads per second. There is also a variation of the provisioned model called ‘Reserved Provisioned Capacity’, which is similar to AWS reserved instances in that you pay a discounted rate for committing to a certain amount of read/write usage upfront.
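The arithmetic behind both models can be sketched as follows. The helper names are hypothetical. The capacity rule used is that one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads. The default prices are the figures quoted above and may differ by region or change over time, and the cost function assumes every request consumes exactly one request unit (small items).

```python
import math


def read_capacity_units(reads_per_second, item_size_kb, strongly_consistent=True):
    """RCUs needed to sustain a read rate for items of a given size."""
    # Each read consumes one unit per 4 KB of item size, rounded up.
    units_per_read = max(1, math.ceil(item_size_kb / 4))
    if not strongly_consistent:
        units_per_read /= 2  # eventually consistent reads cost half
    return math.ceil(reads_per_second * units_per_read)


def on_demand_cost(reads, writes,
                   price_per_million_reads=0.25,
                   price_per_million_writes=1.25):
    """On-demand charge, assuming one request unit per read and per write."""
    return (reads / 1_000_000) * price_per_million_reads \
        + (writes / 1_000_000) * price_per_million_writes
```

For instance, `read_capacity_units(100, 4)` returns 100, matching the example above, while 6 KB items at the same rate need 200 RCUs because each read spans two 4 KB units.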
Note that the table design in DynamoDB is key since poor design can have cost implications.
For example, table scans to find all items of a specific type can get expensive on large tables if the items you need are sparse, because a scan reads every item in the table. It is important to understand application access patterns and design accordingly.
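A back-of-the-envelope sketch shows why. It assumes uniform item sizes and eventually consistent reads, and it ignores per-request rounding and pagination, so treat the numbers as an approximation rather than a billing calculation. The `read_units` helper is hypothetical.

```python
import math


def read_units(items_examined, item_size_kb=1.0, strongly_consistent=False):
    """Approximate read units consumed for the data an operation examines."""
    units = math.ceil(items_examined * item_size_kb / 4)  # one unit per 4 KB read
    return units if strongly_consistent else units / 2


table_items = 1_000_000   # total items in the table (assumed)
matching_items = 100      # items the application actually needs (assumed)

# A scan is charged for every item it examines, even those a filter discards.
scan_units = read_units(table_items)
# A key-based query examines only the items under the requested key.
query_units = read_units(matching_items)
```

Under these assumptions the scan consumes 125,000 read units and the query 12.5, a 10,000-fold difference for the same 100 useful items.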
Conclusion
- If you only need a managed solution for database instances, choose RDS to use existing applications or architecture models with minimal changes.
- If you are looking for a native high-availability (HA) solution for a read-intensive workload, Aurora is a good choice.
- If you require a low-latency response to high-traffic queries and use AWS as the primary cloud infrastructure, strongly consider DynamoDB since it makes technical and economic sense.
Which database you use depends on the needs of your applications. Many organizations use a combination of different database types to handle different application workloads.
Irrespective of which database engine you use, performance monitoring will be important once your applications are deployed in production. Database monitoring has to be done from two perspectives:
- The database administrator’s perspective to make sure that the database server is sized correctly and tuned for maximum performance.
- The application operations team’s perspective to identify the slowest queries that are causing application slowness.
Using an integrated application and infrastructure monitoring approach, eG Enterprise addresses the key needs of both these perspectives from a single console.