Amazon DocumentDB - Overview
- Fully-managed (non-relational) document database for MongoDB workloads
- JSON documents (nested key-value pairs) stored in collections (~tables)
- Compatible w/ majority of MongoDB applications, drivers, and tools
- High performance, scalability, and availability
- Support for flexible indexing, powerful ad-hoc queries, and analytics
- Storage and compute can scale independently
- Supports up to 15 low-latency read replicas (Multi-AZ)
- Auto scaling of storage from 10GB to 64TB
- Fault-tolerant and self-healing storage
- Automatic, continuous, incremental backups and PITR
Document Database
- Stores JSON documents (semi-structured data)
- Key-value pairs can be nested
Relational DB (SQL) | DocumentDB (MongoDB) |
---|---|
Table | Collection |
Rows | Documents |
Columns | Fields |
Primary key | Object ID |
Nested table / object | Embedded documents |
Index / view / array | Index / view / array |
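To make the mapping above concrete, here is a minimal sketch of one record expressed as a document with an embedded document and an array. The field names and values are hypothetical, and a plain string stands in for the Object ID.

```python
import json

# Hypothetical order record; field names are illustrative only.
document = {
    "_id": "order-1001",                             # plays the role of the primary key
    "customer": {"name": "Ana", "city": "Lisbon"},   # embedded document (~nested table)
    "items": [                                       # array of embedded documents
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}


def get_path(doc, dotted_path):
    """Walk a nested document using a dotted path such as 'customer.city'."""
    value = doc
    for key in dotted_path.split("."):
        value = value[key]
    return value


# Documents are JSON-like, so they survive a JSON round trip unchanged.
serialized = json.dumps(document)
restored = json.loads(serialized)
```

A relational design would typically split `customer` and `items` into separate tables joined by keys; the document model keeps them together in one record.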
Why document database?
- JSON is the de-facto format for data exchange
- DocumentDB makes it easy to insert, query, index, and perform aggregations over JSON data
- Store JSON output from APIs straight into DB and start analysing it
- Flexible document model, data types, and indexing
- Add / remove indexes easily
- Run ad hoc queries for operational and analytics workloads
- For known access patterns, use DynamoDB instead
DocumentDB Architecture
- 6 copies of your data across 3 AZ (distributed design)
- Lock-free optimistic algorithm (quorum model)
- 4 copies out of 6 needed for writes (4/6 write quorum: data is considered durable when at least 4 of the 6 copies acknowledge the write)
- 3 copies out of 6 needed for reads (3/6 read quorum)
- Self-healing with peer-to-peer replication; storage is striped across 100s of volumes
- One DocumentDB Instance takes writes (master)
- Compute nodes on replicas do not need to write/replicate (=improved read performance)
- Log-structured distributed storage layer: passes incremental log records from the compute layer to the storage layer (=faster)
- Master + up to 15 Read Replicas serve reads
- Data is continuously backed up to S3 in real time, using storage nodes (compute node performance is unaffected)
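The quorum numbers above can be checked with a little arithmetic. The sketch below is only a model of the 4/6 write and 3/6 read rule described in these notes, not DocumentDB's actual storage code; the function names are made up for illustration.

```python
TOTAL_COPIES = 6   # copies of the data across 3 AZs (from the notes above)
WRITE_QUORUM = 4   # acknowledgements needed for a write
READ_QUORUM = 3    # copies needed for a read


def quorums_overlap(n, w, r):
    """True if every read quorum shares at least one copy with every write quorum."""
    return w + r > n


def tolerated_failures(n, quorum):
    """Number of copies that can be unavailable while the quorum is still reachable."""
    return n - quorum


def write_is_durable(acks, w=WRITE_QUORUM):
    """A write counts as durable once at least w copies have acknowledged it."""
    return acks >= w
```

With these numbers, 4 + 3 > 6, so a read quorum always includes at least one copy that saw the latest durable write. Writes survive the loss of 2 copies and reads survive the loss of 3.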
DocumentDB Cluster
- Recommended to connect using the cluster endpoint in replica set mode (enables your SDK to auto-discover the cluster arrangement as instances get added or removed from the cluster)
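A connection string for replica set mode might be assembled as in the sketch below. The option values (`replicaSet=rs0`, `readPreference=secondaryPreferred`, `retryWrites=false`) follow commonly published DocumentDB connection examples and should be treated as assumptions to verify against your own cluster; the function name and the endpoint used in the usage note are hypothetical.

```python
from urllib.parse import quote_plus, urlencode


def build_replica_set_uri(user, password, cluster_endpoint, port=27017):
    """Build a MongoDB URI that targets the cluster endpoint in replica set mode."""
    options = {
        "replicaSet": "rs0",                     # assumed replica set name
        "readPreference": "secondaryPreferred",  # send reads to replicas when possible
        "retryWrites": "false",                  # assumed setting from published examples
    }
    return "mongodb://{}:{}@{}:{}/?{}".format(
        quote_plus(user),       # percent-encode reserved characters in credentials
        quote_plus(password),
        cluster_endpoint,
        port,
        urlencode(options),
    )
```

For example, `build_replica_set_uri("appuser", "p@ss", "my-cluster.example.internal")` yields a URI whose password is encoded as `p%40ss`. A driver given this URI can discover the cluster members instead of being pinned to a single instance.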
DocumentDB Replication
- Up to 15 read replicas
- ASYNC replication
- Replicas share the same underlying storage layer
- Replication lag is typically 10s of milliseconds
- Minimal performance impact on the primary due to replication process
- Replicas double up as failover targets (standby instance is not needed)
DocumentDB HA failovers
- Failovers occur automatically
- A replica is automatically promoted to be the new primary during DR
- DocumentDB flips the CNAME of the DB instance to point to the replica and promotes it
- Failover to a replica typically takes 30 seconds (minimal downtime)
- Creating a new instance takes about 8-10 minutes (post failover)
- Failover to a new instance happens on a best effort basis and can take longer
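Since a failover to a replica typically takes around 30 seconds, client code usually needs to retry for at least that long. The sketch below shows one generic way to do that with exponential backoff. It is an illustration, not a DocumentDB or driver API: `ConnectionError` stands in for whatever exception your driver raises, and the function name and defaults are assumptions.

```python
import time


def call_with_retry(operation, attempts=6, base_delay=1.0, sleep=time.sleep):
    """Run operation(); on ConnectionError, wait and retry with exponential backoff.

    With the defaults, the waits are 1, 2, 4, 8, and 16 seconds (31 seconds total),
    which roughly covers a typical 30-second failover.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is injectable so the backoff schedule can be inspected without actually waiting.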
DocumentDB Backup and Restore
- Supports automatic backups
- Continuously backs up your data to S3 for PITR (max retention period of 35 days)
- Latest restorable time for a PITR can be up to 5 mins in the past
- The first backup is a full backup. Subsequent backups are incremental
- Take manual snapshots to retain beyond 35 days
- Backup process does not impact cluster performance
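The PITR limits above translate into a simple time window. The sketch below is a conservative model: it assumes the latest restorable time is a full 5 minutes behind "now" (the notes only say it can be up to 5 minutes), and the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 35                 # maximum retention period from the notes
MAX_RESTORE_LAG = timedelta(minutes=5)  # worst-case lag of the latest restorable time


def restorable_window(now, retention_days):
    """Return (earliest, latest) timestamps that are safely restorable."""
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 35")
    earliest = now - timedelta(days=retention_days)
    latest = now - MAX_RESTORE_LAG
    return earliest, latest


def can_restore_to(target, now, retention_days):
    """True if target falls inside the conservative restorable window."""
    earliest, latest = restorable_window(now, retention_days)
    return earliest <= target <= latest
```

Anything older than the retention period requires a manual snapshot, which is why the notes recommend taking one to retain data beyond 35 days.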
DocumentDB Backup and Restore
- Can only restore to a new cluster
- Can restore an unencrypted snapshot to an encrypted cluster (but not the other way round)
- To restore a cluster from an encrypted snapshot, you must have access to the KMS key
- Can only share manual snapshots (to share an automated snapshot, copy it first and share the copy)
- Can't share a snapshot encrypted using the default KMS key of the account
- Snapshots can be shared across accounts, but within the same region
DocumentDB Scaling
- MongoDB sharding not supported (instead offers read replicas / vertical scaling / storage scaling)
- Vertical scaling (scale up / down) - by resizing instances
- Horizontal scaling (scale out / in) - by adding / removing up to 15 read replicas
- Can scale up a replica independently from other replicas (typically for analytical workloads)
- Automatic scaling storage - 10GB to 64TB (no manual intervention needed)
DocumentDB Security - IAM & Network
- You use IAM to manage DocumentDB resources
- Supports MongoDB default auth SCRAM (Salted Challenge Response Authentication Mechanism) for DB authentication
- Supports built-in roles for DB users with RBAC (role-based access control)
- DocumentDB clusters are VPC-only (use private subnets)
- Clients (MongoDB shell) can run on EC2 in public subnets within VPC
- Can connect to your on-premises IT infra via VPN
DocumentDB Security - Encryption
- Encryption at rest - with AES-256 using KMS
- Applied to cluster data / replicas / indexes / logs / backups / snapshots
- Encryption in transit - using TLS
- To enable TLS, set tls parameter in cluster parameter group
- To connect over TLS:
- Download the certificate (public key) from AWS
- Pass the certificate key while connecting to the cluster
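On the client side, "passing the certificate" usually means adding TLS options to the connection string. The sketch below appends the standard MongoDB URI options `tls` and `tlsCAFile` to an existing URI. The helper name is hypothetical, and the certificate file name in the usage note is an assumption; use the path of the file you actually downloaded.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_tls(uri, ca_file):
    """Return uri with TLS enabled and the CA certificate file set."""
    parts = urlsplit(uri)
    options = dict(parse_qsl(parts.query))  # keep any options already present
    options["tls"] = "true"
    options["tlsCAFile"] = ca_file          # path to the downloaded certificate
    return urlunsplit(parts._replace(query=urlencode(options)))
```

For example, `with_tls("mongodb://user:pw@host.example:27017/?replicaSet=rs0", "global-bundle.pem")` keeps the existing `replicaSet` option and appends `tls=true&tlsCAFile=global-bundle.pem`.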
DocumentDB Pricing
- On-demand instances - pricing per second with a 10-minute minimum
- IOPS - per million IO requests
- Each DB page read operation from the storage volume counts as one IO (one page = 8KB)
- Write IOs are counted in 4KB units
- DB Storage - per GB per month
- Backups - per GB per month (backups up to 100% of your cluster's data storage is free)
- Data transfer - per GB
- Can temporarily stop compute instances for up to 7 days
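The billing rules above reduce to a few small calculations. In the sketch below, the constants come from the notes, while the prices passed in are placeholders rather than real AWS rates. Rounding partial write units up is an assumption.

```python
import math

WRITE_IO_BYTES = 4 * 1024      # write IOs are counted in 4KB units
MIN_BILLED_SECONDS = 10 * 60   # per-second billing with a 10-minute minimum


def write_ios(bytes_written):
    """Number of write IOs for a given volume of written bytes (rounded up, assumed)."""
    return math.ceil(bytes_written / WRITE_IO_BYTES)


def billed_seconds(run_seconds):
    """Instance time billed, applying the 10-minute minimum."""
    return max(run_seconds, MIN_BILLED_SECONDS)


def io_cost(total_ios, price_per_million):
    """IO charge given a (placeholder) price per million requests."""
    return total_ios / 1_000_000 * price_per_million


def billable_backup_gb(backup_gb, cluster_storage_gb):
    """Backup storage beyond 100% of the cluster's data storage is billable."""
    return max(0.0, backup_gb - cluster_storage_gb)
```

For instance, an instance that runs for 90 seconds is still billed for 600 seconds, and 150 GB of backups against 100 GB of cluster storage leaves 50 GB billable.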
DocumentDB Monitoring
- API calls logged with CloudTrail
- Common CloudWatch metrics
- CPU or RAM utilization - CPUUtilization / FreeableMemory
- IOPS metrics - VolumeReadIOPS / VolumeWriteIOPS / WriteIOPS / ReadIOPS
- Database connections - DatabaseConnections
- Network traffic - NetworkThroughput
- Storage volume consumption - VolumeBytesUsed
- Two types of logs can be published/exported to CloudWatch Logs
- Profiler logs
- Audit logs
DocumentDB Profiler (profiler logs)
- Logs (into CloudWatch Logs) the details of ops performed on your cluster
- Helps identify slow operations and improve query performance
- Accessible from CloudWatch Logs
- To enable profiler:
- Set the parameters - profiler, profiler_threshold_ms, and profiler_sampling_rate
- Enable Logs Exports for Profiler logs by modifying the cluster
- Both the steps above are mandatory
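The two tuning parameters can be thought of as a filter on which operations get logged. The sketch below is a simplified model under stated assumptions, not the profiler's real implementation: it assumes an operation is a candidate when it runs longer than `profiler_threshold_ms`, and that `profiler_sampling_rate` is the fraction (0.0 to 1.0) of candidates that are actually logged. The example settings are illustrative values.

```python
import random

# Example cluster parameter settings (illustrative values).
profiler_settings = {
    "profiler": "enabled",
    "profiler_threshold_ms": 100,
    "profiler_sampling_rate": 1.0,
}


def should_log(duration_ms, threshold_ms, sampling_rate, rng=random.random):
    """Model of the profiler decision: slow enough, then sampled."""
    if duration_ms <= threshold_ms:
        return False               # faster than the threshold: never logged
    return rng() < sampling_rate   # rng() is in [0, 1), so 1.0 logs everything
```

Lowering the sampling rate reduces log volume on busy clusters at the cost of missing some slow operations.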
DocumentDB audit logs
- Records DDL statements, authentication, authorization, and user management events to CloudWatch Logs
- Exports your cluster's auditing records (JSON documents) to CloudWatch Logs
- Accessible from CloudWatch Logs
- To enable auditing:
- Set parameter audit_logs=enabled
- Enable Logs Exports for Audit logs by modifying the instance
- Both the steps above are mandatory
DocumentDB Performance Management
- Use explain command to identify slow queries
db.runCommand({explain: {<query document>}})
- Can use db.adminCommand to find and terminate queries
- Example - to terminate long running / blocked queries
db.adminCommand({killOp: 1, op: <opid of the query>});
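As a rough sketch of how the two commands above fit together, the helpers below build the command documents as plain Python dictionaries. They assume you already have a list of in-progress operations with `opid` and `secs_running` fields, as reported by MongoDB's current-operation output; the sample data and function names are hypothetical, and nothing here talks to a real cluster.

```python
def explain_command(collection, query_filter):
    """Build an explain command document for a find on the given collection."""
    return {"explain": {"find": collection, "filter": query_filter}}


def kill_commands(inprog, min_secs_running):
    """Build one killOp command per operation running at least min_secs_running."""
    return [
        {"killOp": 1, "op": op["opid"]}
        for op in inprog
        if op.get("secs_running", 0) >= min_secs_running
    ]


# Hypothetical in-progress operations, shaped like current-operation output.
sample_inprog = [
    {"opid": 101, "secs_running": 2},
    {"opid": 102, "secs_running": 95},
    {"opid": 103},                      # no running time reported: skipped
]

to_kill = kill_commands(sample_inprog, min_secs_running=60)
```

Each resulting document can then be passed to the driver's admin command call, mirroring the `db.adminCommand({killOp: 1, op: ...})` form shown above.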