Compare databases on AWS
These are my notes from the Udemy course Ultimate AWS Certified Solutions Architect Associate SAA-C03, taught by Stephane Maarek.
AWS offers many database services. Choosing the right one depends on several factors:
- The characteristics of your workload (read-heavy or write-heavy, whether the traffic load fluctuates, etc.)
- The size of your data and desired retention period
- What the source of truth for the data is
- Latency and concurrency requirements
- Data schema requirements
- Licensing cost
Depending on those different factors, AWS generally provides these options:
- RDBMS (Relational Database Management System) = SQL / OLTP, such as RDS or Aurora
- NoSQL databases, such as DynamoDB, ElastiCache
- Object Store, such as S3
- Data Warehouse (SQL analytics / OLAP), such as Redshift, Athena, EMR
- Search type, such as OpenSearch
- Graph type, such as Amazon Neptune
- Ledger type, such as Amazon Quantum Ledger Database (QLDB)
- Time series type, such as Amazon Timestream
General comparison
Database | Type | Use Case |
---|---|---|
RDS | Managed SQL DB (PostgreSQL, MySQL etc.) | Relational DB |
Aurora | PostgreSQL / MySQL compatible Amazon SQL DB | Same as RDS but less maintenance |
ElastiCache | Managed Redis or Memcached | In-memory data store with extremely low latency |
DynamoDB | Amazon Proprietary NoSQL DB | Great to rapidly evolve schemas |
DocumentDB | Amazon version of MongoDB | Similar to DynamoDB, but MongoDB-compatible |
S3 | Key value object storage | Static files, big files, website hosting |
Neptune | Amazon graph database | Fraud Detection, recommendation, social network… |
QLDB | Managed serverless immutable DB | Reviewing change history, good for financial regulation |
Keyspaces | Managed serverless version of Apache Cassandra (NoSQL) | IoT devices info, time-series data |
Timestream | Managed serverless time series database | Trillions of events per day, IoT, real-time analytics… |
- All databases above support auto-scaling, though the capability varies: some scale only within limits, while others scale virtually without limit
- All databases above support Multi-AZ, but S3 also provides a One Zone-IA option, which stores data in only one availability zone
More details of important options
RDS
- Requires provisioning beforehand
- Automated backups with point-in-time restore (retention up to 35 days)
- Manual DB snapshot for longer-term recovery
- IAM authentication supported
- Managed maintenance (to patch the underlying instances) involves downtime, but can be scheduled
- The RDS Custom option is available for Oracle and SQL Server, and allows users to access the underlying instances (not allowed for other RDS types)
- Performance: RDS Proxy reuses DB connections and decreases the fail-over time
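To make the 35-day restore window concrete, here is a minimal sketch that checks a requested restore time against the retention period and builds the arguments for a restore request. The function name and the instance identifiers are hypothetical, and the check is simplified: in practice the latest restorable time usually lags a few minutes behind the current time.

```python
from datetime import datetime, timedelta, timezone

MAX_BACKUP_RETENTION_DAYS = 35  # upper bound for automated backups, per the notes above


def build_pitr_restore_kwargs(source_id, target_id, restore_time,
                              retention_days=MAX_BACKUP_RETENTION_DAYS, now=None):
    """Build keyword arguments for a point-in-time restore request.

    Raises ValueError if restore_time falls outside the retention window.
    """
    if not 0 < retention_days <= MAX_BACKUP_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 35")
    now = now or datetime.now(timezone.utc)
    earliest = now - timedelta(days=retention_days)
    if not earliest <= restore_time <= now:
        raise ValueError("restore_time is outside the automated backup window")
    # A restore always creates a new instance, hence the separate target name.
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time,
    }


# Hypothetical identifiers, for illustration only.
kwargs = build_pitr_restore_kwargs(
    "prod-db",
    "prod-db-restored",
    datetime.now(timezone.utc) - timedelta(days=3),
)
# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("rds").restore_db_instance_to_point_in_time(**kwargs)
```

Anything older than the retention window is only recoverable from a manual snapshot, which is why the two backup types are listed separately.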
Aurora
- Mostly the same as RDS, but only compatible with PostgreSQL and MySQL
- Storage and compute are separated
- Same security, monitoring and maintenance feature as RDS
- For unpredictable workloads, there is an option Aurora Serverless
- For continuous write availability during fail-over, Aurora Multi-Master
- For global availability, Aurora Global (up to 16 read instances in each region and < 1 s replication)
- To perform ML using SageMaker and Comprehend on Aurora, Aurora Machine Learning
- To create a new cluster from an existing one (for example, for testing), Aurora Database Cloning
ElastiCache
- Must provision an EC2 instance type
- Backup / Snapshot / Point in time restore supported
- Managed and scheduled maintenance
- Requires some application code changes
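The "application code changes" usually mean adding a caching pattern around database reads. Below is a minimal sketch of the cache-aside pattern. To keep it runnable without a Redis server, `InMemoryCache` is a hypothetical dict-backed stand-in for a real cache client, and `get_user` and `fake_db` are made-up names.

```python
import time


class InMemoryCache:
    """Dict-backed stand-in for a Redis/Memcached client (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_user(user_id, cache, load_from_db, ttl_seconds=300):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value, ttl_seconds)  # populate the cache for later reads
    return value


db_calls = []


def fake_db(user_id):
    """Pretend database lookup that records how often it is called."""
    db_calls.append(user_id)
    return {"id": user_id, "name": "example"}


cache = InMemoryCache()
get_user(42, cache, fake_db)
get_user(42, cache, fake_db)  # served from the cache, no second DB call
print(len(db_calls))  # 1
```

The TTL is the trade-off to tune: a longer value saves more database reads but serves stale data for longer.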
DynamoDB
- With TTL feature, DynamoDB can replace ElastiCache as a key/value store
- Millisecond latency; a DAX cluster (read cache) further reduces read latency to the microsecond level
- Event processing: DynamoDB Streams or Kinesis Data Streams
- Global Table feature: active-active
- Automated backup up to 35 days with PITR (restore to new table)
- Manual backup for longer-term recovery
- Export to S3 is supported within the PITR window and does not consume RCU; import from S3 does not consume WCU
- No native SQL query language (PartiQL offers a SQL-compatible syntax)
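To illustrate the TTL point, here is a minimal sketch of a session item with an expiry attribute. DynamoDB TTL expects a Number attribute holding a Unix epoch timestamp in seconds. The attribute name `expires_at` and both function names are my own assumptions. Expired items are deleted by a background process rather than immediately, so the reader side still filters on the timestamp.

```python
import time


def session_item(session_id, payload, ttl_seconds, now=None):
    """Build a DynamoDB-JSON item whose `expires_at` attribute holds an epoch timestamp."""
    now = int(time.time()) if now is None else int(now)
    return {
        "session_id": {"S": session_id},
        "payload": {"S": payload},
        # Number attributes are sent as strings in the low-level item format.
        "expires_at": {"N": str(now + ttl_seconds)},
    }


def is_live(item, now=None):
    """Treat items past their expiry as gone, even if they have not been deleted yet."""
    now = int(time.time()) if now is None else int(now)
    return int(item["expires_at"]["N"]) > now


item = session_item("abc123", "cart=3", ttl_seconds=3600)
print(is_live(item))  # True
# With boto3 installed and TTL enabled on the table for `expires_at`,
# the item could be written with:
#   boto3.client("dynamodb").put_item(TableName="sessions", Item=item)
# where "sessions" is a hypothetical table name.
```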
S3
- Good for bigger objects, not so good for many small objects
- Special features: versioning, encryption, replication, MFA delete, access logs
- Security control: IAM, bucket policy, ACL, access points, Object Lambda, CORS, Object Lock / Vault Lock
- Encryption: SSE-S3, SSE-KMS, SSE-C, client-side, TLS in transit and optional default encryption
- Batch operations supported
- Performance: multi-part upload, S3 Transfer Acceleration, S3 Select
- Automation: S3 Event Notification (SNS, SQS, Lambda, EventBridge)
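For multi-part upload, the part size has to respect a few limits: at most 10,000 parts per upload, and each part between 5 MiB and 5 GiB (the last part may be smaller). The sketch below picks a part size under those limits. The function name and the 8 MiB default are my own choices, and the limits are the documented values as I understand them at the time of writing.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this big
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000


def ceil_div(a, b):
    """Integer ceiling division, exact for large byte counts."""
    return -(-a // b)


def plan_multipart(object_size, preferred_part_size=8 * MIB):
    """Return (part_size, part_count) for a multipart upload of object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(preferred_part_size, MIN_PART_SIZE)
    # Grow the part size if the preferred size would need more than 10,000 parts.
    part_size = max(part_size, ceil_div(object_size, MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large for a single multipart upload")
    return part_size, ceil_div(object_size, part_size)


print(plan_multipart(1 * GIB))    # (8388608, 128)
print(plan_multipart(100 * GIB))  # part size grows so that 10,000 parts suffice
```

Parts can be uploaded in parallel, which is where the performance gain for large objects comes from.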
DocumentDB
- MongoDB is used to store, query and index JSON data
- Same deployment concepts as Aurora
- Storage automatically grows in increments of 10 GB, up to 64 TB