[AWS Certificate]-Amazon RDS Replication & DR

Multi-AZ Deployments in RDS

For high availability, data durability and fault-tolerance (not used for scaling)
Offers SYNC replication to standby instance in another AZ over low latency links
Performs automatic failover to standby instance in another AZ in case of planned or unplanned outage
Uses DNS routing to point to the new master (no need to update connection strings)
Failover time (RTO) is typically 60-120 seconds (minimal downtime)
Backups are taken from standby instead of primary to ensure performance level during backup activity
Recommended for production use cases
To force a failover or simulate AZ-failure, reboot the master instance and choose Reboot with failover
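
The forced failover above could be scripted roughly as follows. This is a minimal sketch that assumes a boto3 RDS client and its reboot_db_instance call; the helper name force_failover and the instance identifier in the usage comment are hypothetical.

```python
def force_failover(rds_client, instance_id):
    """Reboot a Multi-AZ instance and ask RDS to fail over to the standby.

    rds_client is assumed to be a boto3 RDS client, e.g. boto3.client("rds").
    """
    return rds_client.reboot_db_instance(
        DBInstanceIdentifier=instance_id,
        ForceFailover=True,  # only meaningful for Multi-AZ deployments
    )

# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   force_failover(boto3.client("rds"), "my-prod-db")  # hypothetical identifier
```

Because the endpoint is re-pointed through DNS, the application should only need to reconnect after the reboot.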

RDS Read Replicas

Read-only copies of the master (primary) DB instance
Up to 5 Read Replicas per source instance (RDS later raised this limit to 15 for MySQL, MariaDB and PostgreSQL)
Within AZ, Cross AZ or Cross Region
Replication is ASYNC, so reads are eventually consistent
Applications must update the connection string to leverage read replicas

RDS Read Replicas

Boost DB performance and durability
Useful for scaling of read-heavy workloads
Can be promoted to primary (complements Multi-AZ)
To create a replica, you must enable automatic backups with at least one day retention period
Replica can be Multi-AZ (= a replica with its own standby instance)
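
As an illustration of the backup prerequisite and the Multi-AZ option, here is a hedged sketch that checks the source's retention period before asking RDS for a replica. It assumes a boto3 RDS client; the helper name create_read_replica and all identifiers are made up for the example.

```python
def create_read_replica(rds_client, source_id, replica_id, multi_az=False):
    """Create a read replica after checking that automated backups are on."""
    desc = rds_client.describe_db_instances(DBInstanceIdentifier=source_id)
    source = desc["DBInstances"][0]
    # A retention period of 0 means automated backups are disabled.
    if source.get("BackupRetentionPeriod", 0) < 1:
        raise ValueError(
            f"{source_id}: enable automated backups (retention >= 1 day) first"
        )
    return rds_client.create_db_instance_read_replica(
        DBInstanceIdentifier=replica_id,
        SourceDBInstanceIdentifier=source_id,
        MultiAZ=multi_az,  # True gives the replica its own standby instance
    )

# Usage (assumes boto3 and configured credentials):
#   import boto3
#   create_read_replica(boto3.client("rds"), "my-prod-db", "my-prod-db-ro")
```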

Multi-AZ Replicas in RDS

RDS Read Replicas as Multi-AZ

Supported for MySQL / MariaDB / PostgreSQL / Oracle
Works as a DR target. When promoted to primary, it works as Multi-AZ
There's added network cost when data goes from one AZ to another

RDS Read Replicas - Use Case

You have a production database that is taking on normal load
You want to run a reporting application to run some analytics
You create a Read Replica to run the new workload there
The production application is unaffected
Read replicas are used for SELECT (=read) only kind of statements (not INSERT, UPDATE, DELETE)
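
Since replicas have their own endpoints, the application has to decide where each statement goes. The sketch below shows one simple way to do that; the class name EndpointRouter and the endpoint strings are hypothetical, and the SELECT check is deliberately naive.

```python
import itertools


class EndpointRouter:
    """Send plain SELECTs to replicas (round-robin), everything else to primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        text = sql.strip().lower()
        first_word = text.split(None, 1)[0] if text else ""
        # Locking reads such as SELECT ... FOR UPDATE must run on the primary.
        is_plain_read = first_word == "select" and "for update" not in text
        if is_plain_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = EndpointRouter("primary.example", ["ro-1.example", "ro-2.example"])
    print(router.endpoint_for("SELECT * FROM orders"))
    print(router.endpoint_for("UPDATE orders SET status = 'paid'"))
```

Reads routed this way are eventually consistent, so a query that must see its own recent write should still go to the primary.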

Promoting a Read Replica to a Standalone DB Instance

Promoted instance is rebooted and becomes an independent DB instance (separate from its source)
Will no longer work as a replica. Does not affect other replicas of the original DB instance
You cannot promote a replica to standalone instance while a backup is running
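
A promotion could be scripted along these lines. This is only a sketch assuming a boto3 RDS client; the helper name promote_replica is hypothetical, and the check on ReadReplicaSourceDBInstanceIdentifier is an extra guard, not something RDS requires you to do.

```python
def promote_replica(rds_client, replica_id, backup_retention_days=1):
    """Promote a read replica to a standalone DB instance."""
    desc = rds_client.describe_db_instances(DBInstanceIdentifier=replica_id)
    instance = desc["DBInstances"][0]
    # Only instances that currently replicate from a source can be promoted.
    if not instance.get("ReadReplicaSourceDBInstanceIdentifier"):
        raise ValueError(f"{replica_id} is not a read replica")
    return rds_client.promote_read_replica(
        DBInstanceIdentifier=replica_id,
        # Turn on automated backups for the new standalone instance.
        BackupRetentionPeriod=backup_retention_days,
    )

# Usage (assumes boto3 and configured credentials):
#   import boto3
#   promote_replica(boto3.client("rds"), "my-prod-db-ro")  # hypothetical identifier
```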

Promoting a Read Replica to a Standalone DB Instance - Use cases

Use as a DR strategy
Avoid performance penalty of DDL operations (like rebuilding indexes)
- perform DDL ops on a read replica and promote it to a standalone instance. Then point your app to this new instance.
Sharding (splitting a large DB into multiple smaller DBs)

Enabling writes on a read replica

For MySQL / MariaDB read replica, set the parameter read_only = 0 for the read replica to make it writable
You can then perform DDL operations on the read replica as needed without affecting the source DB
Actions taken on the read replica don't affect the performance of the source DB instance
You can then promote the replica to a standalone DB
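
One way to flip read_only is through the replica's DB parameter group. The sketch below assumes a boto3 RDS client and assumes the replica uses its own custom parameter group (not shared with the source); the helper name and group name are hypothetical.

```python
def set_replica_read_only(rds_client, parameter_group, read_only):
    """Set read_only to 1 or 0 in the replica's custom parameter group."""
    return rds_client.modify_db_parameter_group(
        DBParameterGroupName=parameter_group,
        Parameters=[
            {
                "ParameterName": "read_only",
                "ParameterValue": "1" if read_only else "0",
                "ApplyMethod": "immediate",  # read_only is a dynamic parameter
            }
        ],
    )

# Usage (assumes boto3 and configured credentials):
#   import boto3
#   rds = boto3.client("rds")
#   set_replica_read_only(rds, "replica-params", read_only=False)  # allow DDL
#   ... run maintenance on the replica ...
#   set_replica_read_only(rds, "replica-params", read_only=True)   # lock again
```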

RDS Read Replica Capabilities

Can create multiple read replicas in quick succession
Can use DB snapshot to perform PITR of a Read Replica
Can create a replica from an existing replica
- reduces replication load from the master DB instance
- second-tier replica can have higher replication lag

Cross-Region Read Replicas in RDS

Supported for MariaDB, MySQL, Oracle, and PostgreSQL
Not supported for SQL Server
Advantages
- Enhanced DR capability
- Scale read operations closer to the end-users
Limitations
- Higher replica lag times
- AWS does not guarantee more than five cross-region read replica instances

RDS replicas with an external database

Replication between an external DB and an RDS replica
Supported for MySQL / MariaDB engines
Two ways
- Binlog replication
- GTID based Replication

RDS Disaster Recovery Strategies

To ensure business continuity despite unexpected failures/events
Multi-AZ is not enough (it can't protect from logical DB corruption, malicious attacks etc.)
Key metrics for DR plan - RTO and RPO
RDS PITR offers RPO of 5 minutes (typically)
RTO (Recovery time objective)
- How long it takes you to recover after a disaster
- Expressed in hours
RPO (Recovery point objective)
- How much data you could lose due to a disaster
- Expressed in hours (e.g. RPO of 1 hour means you could lose an hour worth of data)
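
To make the two metrics concrete, here is a small worked example that measures an incident against its objectives. The function name recovery_report and the timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def recovery_report(last_recoverable, failure, service_restored, rpo, rto):
    """Compare the actual data loss and downtime of an incident with RPO/RTO."""
    data_loss = failure - last_recoverable   # work done after the last restorable point
    downtime = service_restored - failure    # how long the service was unavailable
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


if __name__ == "__main__":
    report = recovery_report(
        last_recoverable=datetime(2022, 1, 6, 11, 56),
        failure=datetime(2022, 1, 6, 12, 0),
        service_restored=datetime(2022, 1, 6, 12, 45),
        rpo=timedelta(minutes=5),
        rto=timedelta(hours=1),
    )
    print(report)
```

In this made-up incident, 4 minutes of data are lost and the service is down for 45 minutes, so both a 5-minute RPO and a 1-hour RTO are met.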

Comparing RDS DR Strategies

Strategy	RTO	RPO	Cost	Scope
Automated backups	Good	Better	Low	Single Region
Manual snapshots	Better	Good	Medium	Cross-Region
Read replicas	Best	Best	High	Cross-Region

Replica lag - the amount of time that the replica is behind the source DB
Replica lag can impact your recovery
Failover to an RDS read replica is a manual process (not automated)
A good DR plan should include a combination of backups, replicas and Multi-AZ/Multi-region deployment

Troubleshooting high replica lag

Asynchronous logical replication typically results in replica lag
You can monitor ReplicaLag metrics in CloudWatch
For MySQL / MariaDB, the ReplicaLag metric reports the Seconds_Behind_Master value
Replication delays can happen due to:
- Long-running queries on the primary instance (slow query log can help)
- Insufficient instance class size or storage
- Parallel queries executed on the primary instance
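
Pulling the ReplicaLag metric out of CloudWatch might look like the sketch below. It assumes a boto3 CloudWatch client and its get_metric_statistics call; the helper name max_replica_lag_seconds and the replica identifier are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def max_replica_lag_seconds(cloudwatch_client, replica_id, minutes=15):
    """Return the worst ReplicaLag (seconds) over the last N minutes, or None."""
    end = datetime.now(timezone.utc)
    resp = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="ReplicaLag",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": replica_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,                 # one datapoint per minute
        Statistics=["Maximum"],
    )
    points = resp.get("Datapoints", [])
    # No datapoints means the metric was not reported in this window.
    return max((p["Maximum"] for p in points), default=None)

# Usage (assumes boto3 and configured credentials):
#   import boto3
#   lag = max_replica_lag_seconds(boto3.client("cloudwatch"), "my-prod-db-ro")
```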

Troubleshooting replication errors

Recommendations:

Size the replica to match the source DB (storage size and DB instance class)
Use compatible DB parameter group settings for source DB and replica
Ex. max_allowed_packet on the read replica must be the same as on the source DB instance
Monitor the Replication State field of the replica instance
If Replication State = Error, then see error details in the Replication Error field
Use RDS event notifications to get alerts on such replica issues
Writing to tables on a read replica
- Set read_only=0 to make read replica writable
- Use only for maintenance tasks (like creating indexes only on replica)
- If you write to tables on read replica, it might make it incompatible with source DB and break the replication
- So set read_only=1 immediately after completing maintenance tasks
Replication is only supported with transactional storage engines like InnoDB. Using engines like MyISAM will cause replication errors
Using unsafe nondeterministic queries such as SYSDATE() can break replication
You can either skip replication errors (if it's not a major one) or delete and recreate the replica
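
The replication state shown in the console is also available through the API. The sketch below scans a describe_db_instances response (boto3 RDS client assumed) for replicas whose read replication status is not normal; the helper name replication_problems is hypothetical.

```python
def replication_problems(describe_response):
    """List (instance id, status, message) for replicas with unhealthy replication."""
    problems = []
    for inst in describe_response.get("DBInstances", []):
        for info in inst.get("StatusInfos", []):
            if info.get("StatusType") != "read replication":
                continue
            if not info.get("Normal", True):
                problems.append(
                    (inst["DBInstanceIdentifier"], info.get("Status"), info.get("Message"))
                )
    return problems

# Usage (assumes boto3 and configured credentials):
#   import boto3
#   resp = boto3.client("rds").describe_db_instances()
#   for instance_id, status, message in replication_problems(resp):
#       print(instance_id, status, message)
```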

Troubleshooting MySQL read replica issues

Errors or data inconsistencies between source instance and replica
- Can happen when binlog events or InnoDB redo logs aren't flushed during a replica or source instance failure
- Must manually delete and recreate the replica
Preventive recommendations:
- sync_binlog=1
- innodb_flush_log_at_trx_commit=1
- innodb_support_xa=1
These settings might reduce performance (so test before moving to production)
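
A quick way to audit these settings is to compare the values in effect against the recommended ones. This is a sketch only: the helper name durability_gaps is hypothetical, and the input dict is assumed to come from something like SHOW VARIABLES or the parameter group. Note that innodb_support_xa no longer exists in MySQL 8.0, so it would need to be dropped from the dict on newer engines.

```python
# Recommended values from the notes above, stored as strings.
RECOMMENDED = {
    "sync_binlog": "1",
    "innodb_flush_log_at_trx_commit": "1",
    "innodb_support_xa": "1",
}


def durability_gaps(current):
    """Return {parameter: (current value, recommended value)} for mismatches."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in RECOMMENDED.items()
        if str(current.get(name)) != wanted
    }


if __name__ == "__main__":
    # Hypothetical values read from a replica.
    print(durability_gaps({"sync_binlog": 0, "innodb_flush_log_at_trx_commit": 1}))
```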

Performance hit on new read replicas

RDS snapshots are EBS snapshots stored in S3
When you spin up a new replica, its EBS volume loads lazily in the background
This results in first-touch penalty (when you query any data, it takes longer to retrieve it for the first time)
Suggestions:
- If DB is small, run "SELECT * FROM <table>" query on each table on the replica
- Initiate a full table scan with VACUUM ANALYZE (in PostgreSQL)
Another reason could be an empty buffer pool (cache for table and index data)

Scaling in RDS

Vertical Scaling (Scaling up)
- Single-AZ instance will be unavailable during scaling op
- Multi-AZ setup offers minimal downtime during scaling op: the standby DB gets upgraded first, then the primary fails over to the upgraded instance

Horizontal Scaling (Scaling out)
- Useful for read-heavy workloads
- Use read-replicas
- Replicas also act as a DR target
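
Vertical scaling comes down to changing the instance class. A minimal sketch, assuming a boto3 RDS client and its modify_db_instance call; the helper name scale_instance and the instance class in the usage comment are only examples.

```python
def scale_instance(rds_client, instance_id, new_class, apply_immediately=False):
    """Request a new instance class for a DB instance (scaling up or down)."""
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        DBInstanceClass=new_class,
        # False defers the change to the next maintenance window.
        ApplyImmediately=apply_immediately,
    )

# Usage (assumes boto3 and configured credentials):
#   import boto3
#   scale_instance(boto3.client("rds"), "my-prod-db", "db.r5.xlarge")
```

Leaving apply_immediately at False is the cautious default here, since a Single-AZ instance is unavailable while the change is applied.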

Sharding in RDS

Sharding = horizontal partitioning
Split and distribute data across multiple DBs (called shards)
Mapping / routing logic maintained at application tier
Offers additional fault tolerance (since no single point of failure)
If any shard goes through failover, other shards are not impacted
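
The application-tier routing logic can be as simple as hashing the shard key. The sketch below shows one possible scheme (hash modulo shard count); it is not the only option, and the function name shard_for and the endpoint names are hypothetical.

```python
import hashlib


def shard_for(key, shard_endpoints):
    """Map a shard key (e.g. a customer id) to one of the shard endpoints."""
    if not shard_endpoints:
        raise ValueError("at least one shard is required")
    # Use a stable hash so every application server picks the same shard.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shard_endpoints)
    return shard_endpoints[index]


if __name__ == "__main__":
    shards = ["shard-0.example", "shard-1.example", "shard-2.example"]
    for customer_id in (101, 102, 103):
        print(customer_id, shard_for(customer_id, shards))
```

A plain modulo scheme remaps most keys when the number of shards changes, so a lookup table or consistent hashing may be a better fit if shards are added often.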

Clark's IT Container