[AWS Certificate]-DocumentDB

Amazon DocumentDB - Overview

  • Fully-managed (non-relational) document database for MongoDB workloads
  • JSON documents (nested key-value pairs) stored in collections (~tables)
  • Compatible with the majority of MongoDB applications, drivers, and tools
  • High performance, scalability, and availability
  • Support for flexible indexing, powerful ad-hoc queries, and analytics
  • Storage and compute can scale independently
  • Supports up to 15 low-latency read replicas (Multi-AZ)
  • Auto scaling of storage from 10GB to 64TB
  • Fault-tolerant and self-healing storage
  • Automatic, continuous, incremental backups and PITR

Document Database

 

  • Stores JSON documents (semi-structured data)
  • Key-value pairs can be nested

Relational DB (SQL)    | DocumentDB (MongoDB)
Table                  | Collection
Rows                   | Documents
Columns                | Fields
Primary key            | Object ID
Nested table / object  | Embedded documents
Index / view / array   | Index / view / array
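
To make the mapping above concrete, here is a minimal sketch of a single document as it might sit in a collection. The "orders" collection, every field name, and the getPath helper are hypothetical and chosen only for illustration.

```javascript
// A hypothetical document in an "orders" collection.
// _id plays the role of the primary key, "customer" is an embedded
// document, and "items" is an array of embedded documents.
const order = {
  _id: "order-1001",
  status: "shipped",
  customer: { name: "Kim", address: { city: "Seoul", zip: "04524" } },
  items: [
    { sku: "A-1", qty: 2 },
    { sku: "B-7", qty: 1 },
  ],
};

// Queries address nested fields with dot notation, for example
// { "customer.address.city": "Seoul" }. This helper resolves the same
// style of path against a plain JavaScript object.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

console.log(getPath(order, "customer.address.city"));
console.log(getPath(order, "items.1.sku"));
```

A relational design would typically spread the same data over an orders table, a customers table, and an order_items table joined by keys.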

 


Why document database?

 

  • JSON is the de-facto format for data exchange
  • DocumentDB makes it easy to insert, query, index, and perform aggregations over JSON data
  • Store JSON output from APIs straight into DB and start analysing it
  • Flexible document model, data types, and indexing
  • Add / remove indexes easily
  • Run ad hoc queries for operational and analytics workloads
  • For known access patterns, use DynamoDB instead

DocumentDB Architecture

 

  • 6 copies of your data across 3 AZ (distributed design)
    • Lock-free optimistic algorithm (quorum model)
    • 4 copies out of 6 needed for writes (4/6 write quorum: data is considered durable once at least 4 of 6 copies acknowledge the write)
    • 3 copies out of 6 needed for reads (3/6 read quorum)
    • Self-healing with peer-to-peer replication; storage is striped across 100s of volumes
  • One DocumentDB Instance takes writes (master)
  • Compute nodes on replicas do not need to write/replicate (=improved read performance)
  • Log-structured distributed storage layer: compute passes incremental log records to the storage layer (=faster)
  • Master + up to 15 Read Replicas serve reads
  • Data is continuously backed up to S3 in real time, using storage nodes (compute node performance is unaffected)
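
The quorum numbers above can be checked with a little arithmetic. The sketch below is only a model of the 4/6 and 3/6 rules, not the service's implementation, and it assumes the 6 copies are spread evenly as 2 per AZ, which the notes do not state explicitly.

```javascript
// Quorum sizes taken from the notes above.
const COPIES = 6;
const WRITE_QUORUM = 4;
const READ_QUORUM = 3;
const AZ_COUNT = 3;

// A write counts as durable once at least 4 of 6 copies acknowledge it.
function isWriteDurable(acks) {
  return acks >= WRITE_QUORUM;
}

// A read can be served once at least 3 of 6 copies respond.
function canRead(responses) {
  return responses >= READ_QUORUM;
}

// Because WRITE_QUORUM + READ_QUORUM > COPIES, every read quorum shares
// at least this many copies with every write quorum.
function minOverlap() {
  return WRITE_QUORUM + READ_QUORUM - COPIES;
}

// Assumption: copies are spread evenly, so losing one AZ removes 2 copies.
const copiesAfterAzLoss = COPIES - COPIES / AZ_COUNT;

console.log("writes possible after losing one AZ:", isWriteDurable(copiesAfterAzLoss));
console.log("reads possible after losing one AZ plus one copy:", canRead(copiesAfterAzLoss - 1));
console.log("guaranteed read/write overlap:", minOverlap());
```

The overlap of at least one copy is what lets a read quorum always include a copy that saw the latest durable write.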

DocumentDB Cluster

  • Recommended to connect using the cluster endpoint in replica set mode (enables your SDK to auto-discover the cluster arrangement as instances are added to or removed from the cluster)
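
As a rough sketch of what replica set mode looks like in practice, the helper below assembles a connection URI for the cluster endpoint. The function name, endpoint host, user, and password are hypothetical placeholders. The options replicaSet=rs0, readPreference=secondaryPreferred, and retryWrites=false follow the connection strings AWS commonly documents for DocumentDB, so treat them as assumptions to verify against your own cluster.

```javascript
// Builds a MongoDB connection URI that targets the cluster endpoint in
// replica set mode. All argument values used below are placeholders.
function buildClusterUri({ user, password, clusterEndpoint, port = 27017 }) {
  // Credentials are percent-encoded so special characters survive in the URI.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  const options = [
    "replicaSet=rs0", // replica set mode, lets the driver discover instances
    "readPreference=secondaryPreferred", // prefer replicas for reads
    "retryWrites=false",
  ].join("&");
  return `mongodb://${credentials}@${clusterEndpoint}:${port}/?${options}`;
}

console.log(
  buildClusterUri({
    user: "appuser",
    password: "p@ss",
    clusterEndpoint: "my-cluster.cluster-example.us-east-1.docdb.amazonaws.com",
  })
);
```

Connecting to an individual instance endpoint would bypass this discovery, so the client would not follow topology changes.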

DocumentDB Replication

 

  • Up to 15 read replicas
  • ASYNC replication
  • Replicas share the same underlying storage layer
  • Replication lag is typically 10s of milliseconds
  • Minimal performance impact on the primary due to replication process
  • Replicas double up as failover targets (standby instance is not needed)

 

 

 

 

 


DocumentDB HA failovers

 

  • Failovers occur automatically
  • A replica is automatically promoted to be the new primary when the primary fails
  • DocumentDB flips the CNAME of the DB instance to point to the replica and promotes it
  • Failover to a replica typically takes 30 seconds (minimal downtime)
  • Creating a new instance takes about 8-10 minutes (post failover)
  • Failover to a new instance happens on a best effort basis and can take longer

 

 

 

 

 

 


DocumentDB Backup and Restore

 

  • Supports automatic backups
  • Continuously backs up your data to S3 for PITR (max retention period of 35 days)
  • The latest restorable time for a PITR can be up to 5 minutes in the past
  • The first backup is a full backup. Subsequent backups are incremental
  • Take manual snapshots to retain beyond 35 days
  • Backup process does not impact cluster performance
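
The retention and lag figures above define a restore window, which the following sketch computes. The function name is hypothetical, and the 5-minute lag is treated as a worst case, so the real latest restorable time may be closer to now.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const MINUTE_MS = 60 * 1000;
const MAX_RETENTION_DAYS = 35; // from the notes above
const WORST_CASE_LAG_MINUTES = 5; // from the notes above

// Returns the earliest and (worst-case) latest point in time that a
// PITR could target, given the current time and the retention period.
function restorableWindow(now, retentionDays) {
  if (retentionDays <= 0 || retentionDays > MAX_RETENTION_DAYS) {
    throw new RangeError(`retentionDays must be between 1 and ${MAX_RETENTION_DAYS}`);
  }
  return {
    earliest: new Date(now.getTime() - retentionDays * DAY_MS),
    latest: new Date(now.getTime() - WORST_CASE_LAG_MINUTES * MINUTE_MS),
  };
}

console.log(restorableWindow(new Date(), 35));
```

Anything older than the earliest value is only recoverable from a manual snapshot.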

 


DocumentDB Backup and Restore

 

  • Can only restore to a new cluster
  • Can restore an unencrypted snapshot to an encrypted cluster (but not the other way round)
  • To restore a cluster from an encrypted snapshot, you must have access to the KMS key

  • Can only share manual snapshots (to share an automated snapshot, copy it first and share the copy)
  • Can't share a snapshot encrypted using the default KMS key of the account
  • Snapshots can be shared across accounts, but within the same region
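
The sharing rules above can be written down as a small checklist function. This is only an encoding of the three bullets for study purposes. The function and field names are hypothetical and do not correspond to any AWS API.

```javascript
// Checks a snapshot description against the sharing rules listed above.
// Returns { ok, reason } so the first violated rule is visible.
function canShareSnapshot({ type, encrypted, usesDefaultKmsKey, sourceRegion, targetRegion }) {
  if (type !== "manual") {
    return { ok: false, reason: "automated snapshot: copy it first, then share the copy" };
  }
  if (encrypted && usesDefaultKmsKey) {
    return { ok: false, reason: "encrypted with the account's default KMS key" };
  }
  if (sourceRegion !== targetRegion) {
    return { ok: false, reason: "sharing works only within the same region" };
  }
  return { ok: true, reason: "" };
}

console.log(
  canShareSnapshot({
    type: "manual",
    encrypted: true,
    usesDefaultKmsKey: false,
    sourceRegion: "us-east-1",
    targetRegion: "us-east-1",
  })
);
```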

 


DocumentDB Scaling

  • MongoDB sharding not supported (instead offers read replicas / vertical scaling / storage scaling)
  • Vertical scaling (scale up / down) - by resizing instances
  • Horizontal scaling (scale out / in) - by adding / removing up to 15 read replicas
  • Can scale up a replica independently from other replicas (typically for analytical workloads)
  • Automatic scaling storage - 10GB to 64TB (no manual intervention needed)


DocumentDB Security - IAM & Network

 

  • You use IAM to manage DocumentDB resources
  • Supports MongoDB default auth SCRAM (Salted Challenge Response Authentication Mechanism) for DB authentication
  • Supports built-in roles for DB users with RBAC (role-based access control) 
  • DocumentDB clusters are VPC-only (use private subnets)
  • Clients (MongoDB shell) can run on EC2 in public subnets within VPC
  • Can connect to your on-premises IT infra via VPN

DocumentDB Security - Encryption

 

  • Encryption at rest - with AES-256 using KMS
    • Applied to cluster data / replicas / indexes / logs / backups / snapshots
  • Encryption in transit - using TLS
    • To enable TLS, set tls parameter in cluster parameter group
  • To connect over TLS:
    • Download the certificate (public key) from AWS
    • Pass the certificate key while connecting to the cluster
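
For a Node.js client, the TLS steps above might translate into driver options roughly as follows. This is a sketch under assumptions: tls and tlsCAFile are option names of the Node.js MongoDB driver, the helper name is hypothetical, and the certificate path is a placeholder for wherever you saved the bundle downloaded from AWS.

```javascript
// Returns connection options that turn on TLS and point the driver at
// the downloaded CA certificate file.
function tlsOptions(caFilePath) {
  if (typeof caFilePath !== "string" || caFilePath.length === 0) {
    throw new TypeError("caFilePath must be a non-empty string");
  }
  return { tls: true, tlsCAFile: caFilePath };
}

console.log(tlsOptions("./global-bundle.pem"));

// Usage sketch (not executed here; needs the "mongodb" package and a
// reachable cluster, and "uri" is your own connection string):
//   const { MongoClient } = require("mongodb");
//   const client = new MongoClient(uri, tlsOptions("./global-bundle.pem"));
```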

DocumentDB Pricing

 

  • On-demand instances - pricing per second with a 10-minute minimum
  • IOPS - per million IO requests
  • Each DB page read operation from the storage volume counts as one IO (one page = 8KB)
  • Write IOs are counted in 4KB units
  • DB Storage - per GB per month 
  • Backups - per GB per month (backups up to 100% of your cluster's data storage is free)
  • Data transfer - per GB
  • Can temporarily stop compute instances for up to 7 days
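
The billing rules above reduce to simple arithmetic, sketched below. The price used in the usage line is a made-up placeholder, not a real AWS rate, and rounding write sizes up to whole 4KB units is an interpretation of the notes rather than something they spell out.

```javascript
const WRITE_UNIT_BYTES = 4 * 1024; // write IOs are counted in 4KB units
const MIN_BILLED_SECONDS = 10 * 60; // 10-minute minimum for on-demand

// Per-second instance billing with a 10-minute minimum.
function billableSeconds(runSeconds) {
  return Math.max(runSeconds, MIN_BILLED_SECONDS);
}

// Each 8KB page read from the storage volume counts as one read IO.
function readIOs(pagesRead) {
  return pagesRead;
}

// Assumption: a write is rounded up to whole 4KB units.
function writeIOs(bytesWritten) {
  return Math.ceil(bytesWritten / WRITE_UNIT_BYTES);
}

// IO cost given a price per million requests (placeholder price).
function ioCost(totalIOs, pricePerMillion) {
  return (totalIOs / 1e6) * pricePerMillion;
}

// Backup storage up to 100% of the cluster's data storage is free.
function billableBackupGb(backupGb, clusterStorageGb) {
  return Math.max(0, backupGb - clusterStorageGb);
}

const totalIOs = readIOs(3e6) + writeIOs(8e6 * 1024); // 3M page reads, 8M KB written
console.log("total IOs:", totalIOs);
console.log("IO cost at a placeholder price:", ioCost(totalIOs, 0.2));
console.log("billed seconds for a 90 s run:", billableSeconds(90));
console.log("billable backup GB:", billableBackupGb(120, 100));
```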

DocumentDB Monitoring

 

  • API calls logged with CloudTrail
  • Common CloudWatch metrics
    • CPU or RAM utilization - CPUUtilization / FreeableMemory
    • IOPS metrics - VolumeReadIOPs / VolumeWriteIOPs / WriteIOPS / ReadIOPS
    • Database connections - DatabaseConnections
    • Network traffic - NetworkThroughput
    • Storage volume consumption - VolumeBytesUsed
  • Two types of logs can be published/exported to CloudWatch Logs
    • Profiler logs
    • Audit logs

DocumentDB Profiler (profiler logs)

 

  • Logs (into CloudWatch Logs) the details of ops performed on your cluster
  • Helps identify slow operations and improve query performance
  • Accessible from CloudWatch Logs
  • To enable profiler:
    • Set the parameters - profiler, profiler_threshold_ms, and profiler_sampling_rate
    • Enable Logs Exports for Profiler logs by modifying the cluster
    • Both the steps above are mandatory
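
To show how the three profiler parameters might interact, here is a conceptual model. It is not the service's actual implementation. The parameter values are example settings, and the rule "log operations that run longer than the threshold, sampled at the given rate" is my reading of how these parameters are described.

```javascript
// Example values for the cluster parameter group (illustrative only).
const profilerParams = {
  profiler: "enabled",
  profiler_threshold_ms: 100,
  profiler_sampling_rate: 1.0,
};

// Conceptual model: an operation is logged when the profiler is enabled,
// the operation ran longer than the threshold, and it falls inside the
// sampled fraction. randomValue is injectable so the check is testable.
function wouldBeProfiled(durationMs, params, randomValue = Math.random()) {
  if (params.profiler !== "enabled") return false;
  if (durationMs <= params.profiler_threshold_ms) return false;
  return randomValue < params.profiler_sampling_rate;
}

console.log(wouldBeProfiled(250, profilerParams)); // slow op, sampling rate 1.0
console.log(wouldBeProfiled(20, profilerParams)); // fast op, below threshold
```

Lowering the sampling rate reduces log volume on busy clusters at the cost of missing some slow operations.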

DocumentDB audit logs

 

  • Records DDL statements, authentication, authorization, and user management events to CloudWatch Logs
  • Exports your cluster's auditing records (JSON documents) to CloudWatch Logs
  • Accessible from CloudWatch Logs
  • To enable auditing:
    • Set parameter audit_logs=enabled
    • Enable Logs Exports for Audit logs by modifying the cluster
    • Both the steps above are mandatory

DocumentDB Performance Management

 

  • Use explain command to identify slow queries
db.runCommand({explain: {<query document>}})
  • Can use db.adminCommand to find and terminate queries
  • Example - to terminate long-running / blocked queries
db.adminCommand({killOp: 1, op: <opid of the query>});
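
The two commands above take plain command documents, so they can be assembled and inspected outside the shell. The sketch below builds them and picks long-running operation ids from a currentOp-style result. The helper names, the "orders" collection, and the sample inprog array are hypothetical, and the opid and secs_running fields are assumed to match what currentOp reports on your cluster.

```javascript
// Command document for explaining a find query on a collection.
function explainFind(collection, filter) {
  return { explain: { find: collection, filter } };
}

// Command document for terminating one operation by its opid.
function killOpCommand(opid) {
  return { killOp: 1, op: opid };
}

// Picks the opids of operations running at least minSeconds, given the
// "inprog" array of a currentOp-style result.
function longRunningOpIds(inprog, minSeconds) {
  return inprog
    .filter((op) => typeof op.secs_running === "number" && op.secs_running >= minSeconds)
    .map((op) => op.opid);
}

// Hypothetical sample of what a currentOp result might contain.
const sampleInprog = [
  { opid: 101, secs_running: 2 },
  { opid: 102, secs_running: 45 },
  { opid: 103, secs_running: 300 },
];

console.log(JSON.stringify(explainFind("orders", { status: "shipped" })));
console.log(longRunningOpIds(sampleInprog, 30).map(killOpCommand));

// In the mongo shell these documents would be passed on as:
//   db.runCommand(explainFind("orders", { status: "shipped" }))
//   db.adminCommand(killOpCommand(102))
```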
