Amazon ElastiCache - Overview
- Fully-managed in-memory data store (caching service, to boost DB read performance)
- It is a remote caching service, or a side cache i.e. separate dedicated caching instance
- Provides rapid access to data across distributed nodes
- Two flavors (both are open-source key-value stores)
- Amazon ElastiCache for Redis
- Amazon ElastiCache for Memcached
- Sub-millisecond latency for real-time applications
- Redis supports complex data types, snapshots, replication, encryption, transactions, pub/sub messaging, transactional Lua scripting, and support for geospatial data
- Memcached is suitable for relatively simple applications like static website caching
Database caches
- Store frequently accessed data (read operations)
- Improve DB performance by taking most of the read load off the DB
- Three types - integrated / local / remote caches
- Database integrated cache (stores data within DB)
- Typically limited by available memory and resources
- Example - Integrated cache in Aurora
- integrated and managed cache with built-in write-through capabilities
- enabled by default and no code changes needed
- Local cache (stores data within application)
- Remote cache (stores data on dedicated servers)
- Typically built upon key/value NoSQL stores
- Example - Redis and Memcached
- Support up to a million requests per second per cache node
- Offer sub-millisecond latency
- Your application is responsible for caching the data and managing its validity
ElastiCache use cases
Caching Strategies
Lazy loading - loads data into the cache only when necessary
- Reactive approach
- Only the queried data is cached (small size)
- There is a cache miss penalty
- Can contain stale data (use appropriate TTL)
Write through - loads data into the cache as it gets written to the DB
- Proactive approach
- Data is always current (never stale)
- Results in cache churn (most data is never read, use TTL to save space)
Lazy loading with write through
- Get the benefits of both strategies
- Always use appropriate TTL
Lazy loading illustrated
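The lazy loading flow can be sketched in a few lines of Python. This is a minimal sketch, not ElastiCache client code: `TTLCache` is a hypothetical in-process stand-in for a remote cache, `db` is a plain dict standing in for the database, and the `now` parameter exists only to make expiry easy to demonstrate.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a remote cache such as Redis or Memcached."""

    def __init__(self):
        self._items = {}  # key -> (value, expiry timestamp)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._items[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds, now=None):
        now = time.monotonic() if now is None else now
        self._items[key] = (value, now + ttl_seconds)


def get_user(user_id, cache, db, ttl_seconds=300, now=None):
    """Lazy loading: check the cache first, fall back to the DB on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key, now=now)
    if cached is not None:
        return cached, "hit"
    record = db[user_id]  # cache miss penalty: an extra DB round trip
    cache.set(key, record, ttl_seconds, now=now)
    return record, "miss"
```

The first read of a key is a miss and populates the cache; later reads are hits until the TTL expires. If the DB row changes in between, the cached copy stays stale until expiry, which is why the TTL matters.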
Write through illustrated
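A minimal sketch of write through, assuming plain dicts as stand-ins for both the cache and the database (the function names and key format are hypothetical). The read path includes a lazy-loading fallback to show how the two strategies combine; TTL handling is left out to keep the sketch short.

```python
def save_user(user_id, record, cache, db):
    """Write through: update the DB, then refresh the cache in the same code path."""
    db[user_id] = record               # the database remains the source of truth
    cache[f"user:{user_id}"] = record  # the cache is updated on every write


def read_user(user_id, cache, db):
    """Read from the cache; fall back to lazy loading if the key is missing."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    record = db[user_id]
    cache[key] = record  # lazy-loading fallback for data written before caching began
    return record
```

Because every write goes through `save_user`, a read right after a write sees the current value. The cost is that records nobody reads still occupy cache memory, which is the churn a TTL is meant to limit.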
User session store illustrated
- User logs into any instance of the application
- The application writes the session data into ElastiCache
- The user hits another instance of our application
- The instance retrieves the data and the user is already logged in
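The session flow above can be sketched as follows. This is an illustration only: `SessionStore` is a hypothetical in-memory stand-in for the shared cache, and `AppInstance` models two application servers that hold no session state of their own.

```python
import secrets


class SessionStore:
    """Stand-in for a shared cache; every app instance talks to the same store."""

    def __init__(self):
        self._sessions = {}

    def put(self, session_id, data):
        self._sessions[session_id] = data

    def get(self, session_id):
        return self._sessions.get(session_id)


class AppInstance:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared store, not local to this instance

    def login(self, username):
        session_id = secrets.token_hex(16)
        self.store.put(session_id, {"user": username})
        return session_id

    def handle_request(self, session_id):
        session = self.store.get(session_id)
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: welcome back, {session['user']}"
```

A session created through one instance is visible to the other because both read from the same store, so the load balancer is free to route the user anywhere.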
Redis Architecture - Cluster mode disabled
- Redis clusters are generally placed in private subnets
- Accessed from EC2 instance placed in a public subnet in a VPC
- Cluster mode disabled - single shard
- A shard has a primary node and 0-5 replicas
- A shard with replicas is also called a replication group
- Replicas can be deployed as Multi-AZ
- Multi-AZ replicas support Auto-Failover capability
- Single reader endpoint (auto updates replica endpoint changes)
Redis Architecture - Cluster mode enabled
- Cluster mode enabled - multiple shards
- Data is distributed across the available shards
- A shard has a primary node and 0-5 replicas
- Multi-AZ replicas support Auto-Failover capability
- Max 90 nodes per cluster (from 90 shards with no replicas to 15 shards with 5 replicas each)
- Minimum 3 shards recommended for HA
- Use Nitro system-based node types for higher performance (e.g. M5 / R5 etc)
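To make "data is distributed across the available shards" concrete: Redis Cluster maps every key to one of 16384 hash slots using CRC16 of the key modulo 16384, and each shard owns a set of slots. The sketch below computes the slot with Python's `binascii.crc_hqx` (the same CRC-CCITT polynomial with initial value 0). The equal, contiguous slot ranges in `shard_for_slot` are an assumption for illustration; a real cluster may use a different slot layout.

```python
import binascii

TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """Return the hash slot for a key (CRC16 of the key, modulo 16384)."""
    data = key.encode()
    # Hash tags: if the key has a non-empty {...} section, only that part is hashed,
    # which lets related keys land in the same slot.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end > start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % TOTAL_SLOTS


def shard_for_slot(slot: int, num_shards: int) -> int:
    """Map a slot to a shard index, ASSUMING equal contiguous slot ranges."""
    return slot * num_shards // TOTAL_SLOTS
```

For example, `{user1000}.following` and `{user1000}.followers` hash to the same slot, so they live on the same shard and can be used together in multi-key operations.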
Redis Multi-AZ with Auto-Failover
- Fails over to a replica node on outage
- Minimal downtime (typically 3-6 minutes)
- ASYNC replication (=can have some data loss due to replication lag)
- Manual reboot does not trigger auto-failover (other reboots/failures do)
- You can simulate/test a failover using the AWS console / CLI / API
- During planned maintenance for auto-failover enabled clusters
- If cluster mode enabled - no write interruption
- If cluster mode disabled - brief write interruption (few seconds)
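Why asynchronous replication can lose data on failover is easier to see with a toy model. The class below is a hypothetical simulation, not how ElastiCache is implemented: the primary acknowledges writes immediately, the replica receives them later, and anything still pending when the replica is promoted is gone.

```python
class ReplicatedShard:
    """Toy model: one primary and one replica with asynchronous replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes acknowledged by the primary, not yet replicated

    def write(self, key, value):
        self.primary[key] = value  # acknowledged to the client right away
        self._pending.append((key, value))

    def replicate(self, max_writes=None):
        """Ship up to max_writes pending writes to the replica (all if None)."""
        count = len(self._pending) if max_writes is None else max_writes
        for key, value in self._pending[:count]:
            self.replica[key] = value
        del self._pending[:count]

    def fail_over(self):
        """Promote the replica; writes still pending replication are lost."""
        lost = [key for key, _ in self._pending]
        self.primary, self.replica = self.replica, {}
        self._pending = []
        return lost
```

The size of `_pending` at the moment of failure plays the role of replication lag: the larger it is, the more acknowledged writes disappear after promotion.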
Redis Backup and Restore
- Supports manual and automatic backups
- Backups are point-in-time copies of the entire Redis cluster; individual nodes can't be backed up
- Can be used to warm start a new cluster (=preloaded data)
- Can backup from primary node or from replica
- Recommended to backup from a replica (ensures primary node performance)
- Backups (also called snapshots) are stored in S3
- Can export snapshots to your S3 buckets in the same region
- Can then copy the exported snapshot to other region / account using S3 API
Redis Scaling
Cluster Mode Disabled
- Vertical Scaling
- Scale up / scale down node type
- minimal downtime
- Horizontal Scaling
- add/remove replica nodes
- if Multi-AZ with automatic failover is enabled, you cannot remove the last replica
Cluster Mode Enabled
- Vertical Scaling (Online)
- scale up / scale down node type
- no downtime
- Horizontal scaling (=resharding and shard rebalancing)
- allows partitioning across shards
- add/remove/rebalance shards
- resharding = change the number of shards as needed
- shard rebalancing = ensure that data is equally distributed across shards
- two modes - offline (with downtime) and online (no downtime)
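Shard rebalancing can be pictured as moving hash slots from shards that hold too many to shards that hold too few. The sketch below plans such moves from per-shard slot counts. It is one simple greedy approach under stated assumptions (16384 total slots as in Redis Cluster, only counts tracked rather than specific slot numbers) and does not claim to reproduce what ElastiCache does internally.

```python
def plan_rebalance(slot_counts):
    """Return (from_shard, to_shard, slots) moves that even out the slot counts."""
    total = sum(slot_counts)
    base, extra = divmod(total, len(slot_counts))
    # Target is an equal share; the first `extra` shards take one additional slot.
    targets = [base + (1 if i < extra else 0) for i in range(len(slot_counts))]
    surplus = [[i, c - t] for i, (c, t) in enumerate(zip(slot_counts, targets)) if c > t]
    deficit = [[i, t - c] for i, (c, t) in enumerate(zip(slot_counts, targets)) if c < t]
    moves = []
    while surplus and deficit:
        src, dst = surplus[0], deficit[0]
        n = min(src[1], dst[1])
        moves.append((src[0], dst[0], n))
        src[1] -= n
        dst[1] -= n
        if src[1] == 0:
            surplus.pop(0)
        if dst[1] == 0:
            deficit.pop(0)
    return moves


def apply_moves(slot_counts, moves):
    """Apply planned moves to a copy of the slot counts."""
    counts = list(slot_counts)
    for src, dst, n in moves:
        counts[src] -= n
        counts[dst] += n
    return counts
```

Scaling out from three shards to four can be modeled as `plan_rebalance([5462, 5461, 5461, 0])`: each existing shard hands roughly a quarter of its slots to the new, empty shard, so about 25% of the keyspace moves in total.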
Horizontal scaling - resharding / rebalancing
Capability | Online Mode (=no downtime) | Offline Mode (=downtime) |
Cluster availability during scaling up | YES | NO |
Can scale out / scale in / rebalance | YES | YES |
Can scale up / down (change node type) | NO | YES |
Can upgrade engine version | NO | YES |
Can specify the number of replica nodes in each shard independently | NO | YES |
Can specify the keyspace for shards independently | NO | YES |
Redis Replication
Cluster Mode Disabled | Cluster Mode Enabled |
1 Shard | Up to 90 shards |
0-5 replicas | 0-5 replicas per shard |
If 0 replicas, primary failure = total data loss | If 0 replicas, primary failure = total data loss in that shard |
Multi-AZ supported | Multi-AZ required |
Supports scaling | Supports partitioning |
If primary load is read-heavy, you can scale the cluster (though up to 5 replicas max) | Good for write-heavy workloads (you get additional write endpoints, one per shard) |
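The shard and replica limits interact through the 90-node cap quoted in these notes: every shard costs one primary plus its replicas. A small worked calculation, with the limits taken from the notes and the function name chosen for illustration:

```python
MAX_NODES_PER_CLUSTER = 90   # limit quoted in these notes
MAX_REPLICAS_PER_SHARD = 5


def max_shards(replicas_per_shard: int) -> int:
    """Largest shard count that fits under the node cap for a given replica count."""
    if not 0 <= replicas_per_shard <= MAX_REPLICAS_PER_SHARD:
        raise ValueError("replicas per shard must be between 0 and 5")
    nodes_per_shard = 1 + replicas_per_shard  # one primary plus its replicas
    return MAX_NODES_PER_CLUSTER // nodes_per_shard


print(max_shards(0))  # 90
print(max_shards(5))  # 15
```

With two replicas per shard the same formula gives 30 shards, so adding replicas for read scaling and HA directly reduces how far writes can be partitioned.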
Redis - Global Datastore
- Allows you to create cross region replicas for Redis
- Single writer cluster (primary cluster), multiple reader clusters (secondary clusters)
- Can replicate to up to two other regions
- Improves local latency (bring data closer to your users)
- Provides for DR (you can manually promote a secondary cluster to be primary, not automatic)
- Not available for single node clusters (must convert it to a replication group first)
- Security for cross-region communication is provided through VPC peering
- Cluster cannot be modified / resized as usual
- you scale the clusters by modifying the global datastore
- all member clusters will get scaled
- To modify a global datastore's parameters
- modify the parameter group of any member cluster
- Change gets applied to all member clusters automatically
- Data is replicated cross-region in < 1 sec (typically, not an SLA)
- RPO (typical) < 1 sec (amt of data loss due to disaster)
- RTO (typical) < 1 min (time taken for DR)
Redis - Good things to know
- Replica lag may grow and shrink over time. If a replica is too far behind the primary, reboot it
- In case of latency/throughput issues, scaling out the cluster helps
- In case of memory pressure, scaling out the cluster helps
- If the cluster is over-scaled, you can scale in to reduce costs
- In case of online scaling
- cluster remains available, but with some performance degradation
- level of degradation would depend on CPU utilization and amount of data
- You cannot change Redis cluster mode after creating it (can create a new cluster and warm start it with existing data)
- All nodes within a cluster are of the same instance type
Redis best practice
- Cluster mode enabled - connect using the configuration endpoint (allows for auto-discovery of shard and keyspace (slot) mapping)
- Cluster mode disabled - use primary endpoint for writes and reader endpoint for reads (always kept up to date with any cluster changes)
- Set the parameter reserved-memory-percent=25% (for background processes, non-data)
- Keep socket timeout = 1 second (at least)
- Too low => numerous timeouts on high load
- Too high => application might take longer to detect connection issues
- Keep DNS caching timeout low (TTL = 5-10 seconds recommended)
- Do not use the "cache forever" option for DNS caching
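The effect of `reserved-memory-percent` on capacity planning is simple arithmetic: the reserved share is not available for data. A minimal sketch, where the 16 GiB node size is a hypothetical example rather than a specific node type:

```python
def usable_data_memory(node_memory_gib: float, reserved_memory_percent: int = 25) -> float:
    """Memory left for data after setting aside the reserved percentage."""
    if not 0 <= reserved_memory_percent <= 100:
        raise ValueError("reserved_memory_percent must be between 0 and 100")
    return node_memory_gib * (100 - reserved_memory_percent) / 100


print(usable_data_memory(16))  # 12.0
```

So a node sized for 16 GiB should be planned as holding about 12 GiB of data when 25% is reserved for background processes such as snapshots and replication.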
Redis use cases - Gaming Leaderboards
- Use Redis sorted sets - automatically stores data sorted
- Example - top 10 scores for a game
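The leaderboard pattern can be imitated in plain Python to show the operations involved. This is a sketch, not a Redis client: the method comments name the Redis commands they loosely correspond to, and unlike a real sorted set this version sorts on read rather than keeping members ordered on insert. The tie-breaking order is an arbitrary choice made here.

```python
class Leaderboard:
    """In-memory imitation of a Redis sorted set (member -> score)."""

    def __init__(self):
        self._scores = {}

    def add(self, member, score):
        """Roughly ZADD: insert the member or overwrite its score."""
        self._scores[member] = score

    def increment(self, member, amount):
        """Roughly ZINCRBY: add to the member's score and return the new score."""
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def top(self, n):
        """Roughly ZREVRANGE 0 n-1 WITHSCORES: highest scores first."""
        ranked = sorted(self._scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:n]
```

A "top 10" query is then `board.top(10)`. In Redis the equivalent read stays cheap as the set grows because ordering is maintained at write time.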
Redis use cases - Pub/sub messaging or queues
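A minimal sketch of the publish/subscribe pattern, using a hypothetical in-process broker rather than Redis itself. It mirrors one property of Redis pub/sub worth remembering: messages are delivered only to subscribers connected at publish time and are not stored for later.

```python
from collections import defaultdict


class PubSubBroker:
    """Toy broker: channels map to lists of subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to current subscribers only; return how many received it."""
        receivers = self._subscribers.get(channel, [])
        for callback in receivers:
            callback(message)
        return len(receivers)
```

A message published before anyone subscribes reaches zero receivers and is dropped. Workloads that need durable queues usually rely on other structures (for example lists) instead of pub/sub.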
Redis use cases - Recommendation Data
- Uses INCR or DECR in Redis
- Using Redis hashes, you can maintain a list of who liked / disliked a product
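The counter-plus-hash pattern can be sketched with two dicts: one standing in for keys updated with INCR / DECR, one for a per-product hash recording each user's vote. The key names and the rule that a user's new vote replaces the old one are assumptions made for this illustration.

```python
class ProductFeedback:
    def __init__(self):
        self.counters = {}  # stands in for Redis keys used with INCR / DECR
        self.votes = {}     # stands in for one Redis hash per product: user -> vote

    def vote(self, product_id, user_id, liked):
        """Record a like or dislike and return the product's net score."""
        hash_key = f"product:{product_id}:votes"
        score_key = f"product:{product_id}:score"
        votes = self.votes.setdefault(hash_key, {})
        previous = votes.get(user_id)
        new = "like" if liked else "dislike"
        if previous == new:
            return self.counters.get(score_key, 0)  # repeated vote changes nothing
        if previous is not None:
            # Undo the earlier vote before applying the new one.
            self.counters[score_key] -= 1 if previous == "like" else -1
        self.counters[score_key] = self.counters.get(score_key, 0) + (1 if liked else -1)
        votes[user_id] = new
        return self.counters[score_key]
```

The hash answers "who liked or disliked this product", while the counter gives the aggregate without scanning the hash.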
Memcached Overview
- Simple in-memory key-value store with sub-millisecond latency
- Automatic detection and recovery from cache node failures
- Typical applications
- Session store (persistent as well as transient session data store)
- DB query results caching (relational or NoSQL DBs - RDS / DynamoDB etc.)
- Webpage caching
- API caching
- Object caching (images/files/metadata)
- Well suited for web / mobile apps, gaming, IoT, ad-tech, and e-commerce
Memcached Architecture
- Memcached cluster is generally placed in a private subnet
- Accessed from EC2 instance placed in a public subnet in a VPC
- Allows access only from EC2 network (apps should be hosted on whitelisted EC2 instances)
- Whitelist using security groups
- Up to 20 nodes per cluster
- Data is distributed across the available nodes
- Replicas are not supported
- Node failure = data loss
- Nodes can be deployed as Multi-AZ (to reduce data loss)
Memcached Auto Discovery
- Allows client to automatically identify nodes in your Memcached cluster
- No need to manually connect to individual nodes
- Simply connect to any one node (using configuration endpoint) and retrieve a list of all other nodes
- The metadata (list of all nodes) gets updated dynamically as you add/remove nodes
- Node failures are automatically detected, and nodes get replaced
- Enabled by default (you must use Auto Discovery capable client)
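What an Auto Discovery client does with the node list can be sketched as a small parser. The payload layout assumed here (a version number on the first line, then one line of space-separated `hostname|ip|port` entries) should be checked against the AWS documentation before relying on it; the hostnames in the usage note are made up.

```python
from typing import List, NamedTuple, Tuple


class CacheNode(NamedTuple):
    hostname: str
    ip: str
    port: int


def parse_cluster_config(payload: str) -> Tuple[int, List[CacheNode]]:
    """Parse an ASSUMED config payload: version line, then 'host|ip|port' entries."""
    lines = [line.strip() for line in payload.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a version line and a node list line")
    version = int(lines[0])  # the version changes whenever the node list changes
    nodes = []
    for entry in lines[1].split():
        hostname, ip, port = entry.split("|")
        nodes.append(CacheNode(hostname, ip, int(port)))
    return version, nodes
```

For a payload such as `"12\nnode1.example.internal|10.0.0.1|11211 node2.example.internal|10.0.0.2|11211\n"` the parser returns version 12 and two nodes. A client would poll the configuration endpoint periodically and rebuild its connection pool when the version changes.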
Memcached Scaling
- Vertical scaling not supported
- can resize by creating a new cluster and migrating your application
- Horizontal scaling
- allows you to partition your data across multiple nodes
- up to 20 nodes per cluster and 100 nodes per region (soft limit)
- no need to change endpoints post scaling (if you use auto-discovery)
- must re-map at least some of your keyspace post scaling (evenly spread cache keys across all nodes)
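How much of the keyspace gets re-mapped after adding a node depends on how the client picks a node for each key. The sketch below compares naive modulo hashing with consistent hashing, a common technique for limiting re-mapping. Node names, the MD5-based hash, and the choice of 100 virtual nodes are all illustrative assumptions, not a description of any specific Memcached client.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def modulo_node(key, nodes):
    """Naive placement: hash the key and take it modulo the node count."""
    return nodes[_hash(key) % len(nodes)]


class HashRing:
    """Consistent hashing ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        points = []
        for node in nodes:
            for i in range(vnodes):
                points.append((_hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._nodes[index]


def fraction_moved(keys, before, after):
    """Share of keys whose node assignment changes between two placement functions."""
    moved = sum(1 for key in keys if before(key) != after(key))
    return moved / len(keys)
```

Going from three nodes to four, modulo placement re-maps most keys (about three quarters in expectation), while the ring only moves the keys that the new node takes over (about one quarter). Every re-mapped key is a cache miss until it is reloaded, so the difference matters right after scaling.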
Choosing between Redis and Memcached
Redis | Memcached |
Sub-millisecond latency | Sub-millisecond latency |
Supports complex data types (sorted sets, hashes, bitmaps, hyperloglog, geospatial index) | Supports only simple data types (string, objects) |
Multi AZ with Auto-Failover, supports sharding | Multi-node for sharding |
Read Replicas for scalability and HA | Non persistent |
Data durability using AOF persistence | No backup and restore |
Backup and restore features | Multi-threaded architecture |
ElastiCache Security - Encryption
- Memcached does not support encryption
- Encryption at rest for Redis (using KMS)
- Encryption in-transit for Redis (using TLS/SSL)
- Between server and client
- Is an optional feature
- Can have some performance impact
- Supports encrypted replication
- Redis snapshots in S3 use S3's encryption capabilities
ElastiCache Security - Auth and Access Control
- Authentication into the cache
- Redis AUTH - server can authenticate the clients (requires SSL/TLS enabled)
- Server Authentication - clients can authenticate that they are connecting to the right server
- IAM
- IAM policies can be used for AWS API-level security (create cache, update cache etc.)
- ElastiCache doesn't support IAM permissions for actions within ElastiCache (which clients can access what)
ElastiCache Security - Network
- Recommended to use private subnets
- Control network access to ElastiCache through VPC security groups
- ElastiCache Security Groups - allow you to control access to ElastiCache clusters running outside Amazon VPC
- For clusters within Amazon VPC, simply use VPC security groups
ElastiCache Logging and Monitoring
- Integrated with CloudWatch
- Host level metrics - CPU / Memory / Network
- Redis metrics - replication lag / engine CPU utilization / metrics from Redis INFO command
- 60-second granularity
- ElastiCache Events
- Integrated with SNS
- Log of events related to cluster instances / SGs / PGs
- Available within ElastiCache console
- API calls logged with CloudTrail
ElastiCache Pricing
- Priced per node-hour consumed for each node type
- Partial node-hours consumed are billed as full hours
- Can use reserved nodes for upfront discounts (1- or 3-year terms)
- Data transfer
- No charge for data transfer between EC2 and ElastiCache within AZ
- All other data transfer chargeable
- Backup storage
- For automated and manual snapshots (per GB per month)
- Space for one snapshot is complimentary for each active Redis cluster
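The node-hour rule is easy to get wrong in estimates because partial hours round up. A small worked calculation follows; the hourly rate used in the usage note is a made-up number, not an AWS price.

```python
import math


def node_hour_charge(hours_used: float, nodes: int, hourly_rate: float) -> float:
    """Compute the charge when partial node-hours are billed as full hours."""
    billable_hours = math.ceil(hours_used)  # 2.25 hours is billed as 3 hours
    return billable_hours * nodes * hourly_rate
```

For example, three nodes running for 2 hours 15 minutes at a hypothetical rate of 0.10 per node-hour are billed for 3 hours each, so `node_hour_charge(2.25, 3, 0.10)` charges 9 node-hours rather than 6.75.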