[AWS Certificate]-ElastiCache

Amazon ElastiCache - Overview


  • Fully managed in-memory data store (a caching service that boosts DB read performance)
  • It is a remote cache, also called a side cache, i.e. a separate, dedicated caching instance
  • Provides rapid access to data across distributed nodes
  • Two flavors (both are open-source key-value stores)
    - Amazon ElastiCache for Redis
    - Amazon ElastiCache for Memcached
  • Sub-millisecond latency for real-time applications
  • Redis supports complex data types, snapshots, replication, encryption, transactions, pub/sub messaging, transactional Lua scripting, and geospatial data
  • Memcached is suitable for relatively simple applications, such as static website caching


Database caches


  • Store frequently accessed data (read operations)
  • Improve DB performance by taking most of the read load off the DB
  • Three types - integrated / local / remote caches
  • Database-integrated cache (stores data within the DB)
    • Typically limited by available memory and resources
    • Example - Integrated cache in Aurora
      • integrated and managed cache with built-in write-through capabilities
      • enabled by default and no code changes needed
  • Local cache (stores data within application)
  • Remote cache (stores data on dedicated servers)
    • Typically built upon key/value NoSQL stores
    • Example - Redis and Memcached
    • Support up to a million requests per second per cache node
    • Offer sub-millisecond latency
    • Your application is responsible for caching the data and managing its validity

ElastiCache use cases



Caching Strategies


Lazy loading - loads data into the cache only when necessary

  • Reactive approach
  • Only the queried data is cached (small size)
  • There is a cache-miss penalty
  • Can contain stale data (use appropriate TTL)
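
A minimal lazy-loading sketch in plain Python, not ElastiCache's API. To keep it self-contained, a small dict-backed `TTLCache` class stands in for the remote cache and a plain dict stands in for the database; `TTLCache`, `fetch_user`, the `user:<id>` key format, and the 300-second TTL are all hypothetical. Against a real Redis node the same flow would be a GET, and on a miss a DB query followed by a SET with an expiry.

```python
import time


class TTLCache:
    """Dict-backed stand-in for a remote cache, with a per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at)
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # an expired entry behaves like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def fetch_user(user_id, cache, db, ttl_seconds=300):
    """Lazy loading: try the cache first, fall back to the DB on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit, the DB is not touched
    # Cache-miss penalty: one DB read plus one cache write
    value = db[user_id]
    cache.set(key, value, ttl_seconds)  # the TTL bounds how long stale data can live
    return value
```

The injectable `clock` only exists so expiry can be exercised without waiting: if the DB row changes after the first read, `fetch_user` keeps returning the old cached value until the TTL passes, which is the staleness the bullets above describe.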

Write through - loads data into the cache as it gets written to the DB

  • Proactive approach
  • Data is always current (never stale)
  • Results in cache churn (most data is never read; use a TTL to save space)
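
The write-through path, sketched under similar assumptions: dicts stand in for the cache and the database, and `save_user`, `read_user`, and the one-hour TTL are hypothetical. The cache entry is written in the same code path as the DB write, so reads see current data, and the TTL lets entries that are never read expire instead of accumulating.

```python
import time


def save_user(user_id, value, cache, db, ttl_seconds=3600, clock=time.monotonic):
    """Write-through: write to the DB, then refresh the cache in the same call."""
    db[user_id] = value  # the DB remains the source of truth
    # Refreshing on every write keeps cached data current; the TTL limits churn
    cache[f"user:{user_id}"] = (value, clock() + ttl_seconds)


def read_user(user_id, cache, clock=time.monotonic):
    """Return the cached value, or None if it is absent or expired."""
    entry = cache.get(f"user:{user_id}")
    if entry is None or clock() >= entry[1]:
        return None
    return entry[0]
```

A `None` from `read_user` (a key that was never written, or one whose TTL has passed) is where a lazy-loading fallback would take over, which is the combination described next.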

Lazy loading with write through

  • Get the benefits of both strategies
  • Always use appropriate TTL

Lazy loading illustrated


Write through illustrated

User session store illustrated


  • The user logs into any instance of the application
  • The application writes the session data into ElastiCache
  • The user hits another instance of our application
  • The instance retrieves the data and the user is already logged in
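
The flow above can be sketched with two application instances sharing one store. A dict-backed `SessionStore` stands in for ElastiCache; `SessionStore`, `AppInstance`, and returning the session ID to the browser as a cookie are illustrative assumptions, not part of the original notes.

```python
import secrets


class SessionStore:
    """Stand-in for a shared ElastiCache session store (a dict here)."""

    def __init__(self):
        self._sessions = {}

    def put(self, session_id, data):
        self._sessions[session_id] = dict(data)

    def get(self, session_id):
        return self._sessions.get(session_id)


class AppInstance:
    """One of several stateless app servers, e.g. behind a load balancer."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # every instance points at the same remote store

    def login(self, username):
        session_id = secrets.token_hex(16)
        self.store.put(session_id, {"user": username})
        return session_id  # handed back to the client, e.g. as a cookie

    def current_user(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None
```

Because no instance keeps session state locally, a request that lands on instance `b` after logging in through instance `a` still finds the session.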



Redis Architecture - Cluster mode disabled


  • Redis clusters are generally placed in private subnets
  • Accessed from EC2 instances placed in a public subnet of the VPC
  • Cluster mode disabled - single shard
  • A shard has a primary node and 0-5 replicas
  • A shard with replicas is also called a replication group
  • Replicas can be deployed as Multi-AZ
  • Multi-AZ replicas support Auto-Failover capability
  • Single reader endpoint (automatically updated when replica endpoints change)




Redis Architecture - Cluster mode enabled


  • Cluster mode enabled - multiple shards
  • Data is distributed across the available shards
  • A shard has a primary node and 0-5 replicas
  • Multi-AZ replicas support Auto-Failover capability
  • Max 90 nodes per cluster (from 90 shards with no replicas to 15 shards with 5 replicas each)
  • Minimum 3 shards recommended for HA
  • Use Nitro System-based node types for higher performance (e.g. M5 / R5 etc.)
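
In cluster mode, keys are distributed using the Redis Cluster hash-slot scheme: each key maps to one of 16,384 slots via the CRC16 (XMODEM variant) of the key modulo 16384, and each shard owns a set of slots. If the key contains a non-empty `{...}` hash tag, only the tag is hashed. The sketch implements that mapping; the even, contiguous split of slots across shards in `shard_for_slot` is an assumption for illustration only, since the real slot map is whatever the cluster reports to clients through the configuration endpoint.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots, honoring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the key
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


def shard_for_slot(slot: int, shard_count: int) -> int:
    """Hypothetical even split of the slot range into contiguous blocks."""
    return slot * shard_count // 16384
```

For example, `{user:1000}.following` and `{user:1000}.followers` both hash only `user:1000`, so they share a slot (and therefore a shard) and can be used together in multi-key operations.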





Redis Multi-AZ with Auto-Failover


  • Fails over to a replica node on outage
  • Minimal downtime (typically 3-6 minutes)
  • Asynchronous replication (some data can be lost due to replication lag)
  • A manual reboot does not trigger auto-failover (other reboots/failures do)
  • You can simulate/test a failover using AWS console / CLI / API
  • During planned maintenance for auto-failover enabled clusters
    • If cluster mode enabled - no write interruption
    • If cluster mode disabled - brief write interruption (few seconds)

Redis Backup and Restore


  • Supports manual and automatic backups
  • Backups are point-in-time copies of the entire Redis cluster; individual nodes can't be backed up
  • Can be used to warm start a new cluster (=preloaded data)
  • Can backup from primary node or from replica
  • Recommended to backup from a replica (ensures primary node performance)
  • Backups (also called snapshots) are stored in S3
  • Can export snapshots to your S3 buckets in the same region
  • Can then copy the exported snapshot to other region / account using S3 API
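
The backup-and-export steps above map to the `CreateSnapshot` and `CopySnapshot` APIs; passing a `TargetBucket` to `CopySnapshot` exports the copy to S3. A boto3 sketch, assuming `client` is `boto3.client("elasticache")`, that the bucket is in the same Region and already grants ElastiCache permission to write to it, and that the group, snapshot, and bucket names are placeholders. The `-export` suffix and the polling limits are arbitrary choices. To back up from a specific replica node instead, `CreateSnapshot` also accepts a `CacheClusterId`.

```python
import time


def backup_and_export(client, replication_group_id, snapshot_name, bucket,
                      poll_seconds=30, max_polls=60):
    """Take a manual snapshot of a replication group, then export it to S3.

    `client` is expected to be boto3.client("elasticache"); all names are
    placeholders. Returns the name of the exported copy.
    """
    client.create_snapshot(
        ReplicationGroupId=replication_group_id,
        SnapshotName=snapshot_name,
    )
    # The snapshot must be "available" before it can be copied/exported
    for _ in range(max_polls):
        snapshots = client.describe_snapshots(SnapshotName=snapshot_name)["Snapshots"]
        if snapshots and snapshots[0]["SnapshotStatus"] == "available":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"snapshot {snapshot_name} did not become available")

    export_name = f"{snapshot_name}-export"
    # With TargetBucket set, CopySnapshot writes the copy to that S3 bucket
    client.copy_snapshot(
        SourceSnapshotName=snapshot_name,
        TargetSnapshotName=export_name,
        TargetBucket=bucket,
    )
    return export_name
```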


Redis Scaling


Cluster Mode Disabled


  • Vertical Scaling
    • Scale up / scale down node type
    • minimal downtime
  • Horizontal Scaling
    • add/remove replica nodes
    • if Multi-AZ with automatic failover is enabled, you cannot remove the last replica



Cluster Mode Enabled


  • Vertical Scaling (Online)
    • scale up / scale down node type
    • no downtime
  • Horizontal scaling (=resharding and shard rebalancing)
    • allows partitioning across shards
    • add/remove/rebalance shards
    • resharding = change the number of shards as needed
    • shard rebalancing = ensure that data is equally distributed across shards
    • two modes - offline (with downtime) and online (no downtime)
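
Online resharding is exposed through the `ModifyReplicationGroupShardConfiguration` API. A minimal boto3 sketch that scales a cluster-mode-enabled group out to a new shard count, assuming `client` is `boto3.client("elasticache")` and a placeholder group ID. `ApplyImmediately` must be `True` for this call; scaling in additionally requires `NodeGroupsToRemove` or `NodeGroupsToRetain`, which this sketch does not handle.

```python
def scale_out_shards(client, replication_group_id, new_shard_count):
    """Online resharding: grow a cluster-mode-enabled group to more shards.

    `client` is expected to be boto3.client("elasticache"); the group ID is a
    placeholder. Scaling in (fewer shards) would also need NodeGroupsToRemove
    or NodeGroupsToRetain, which this sketch does not pass.
    """
    response = client.modify_replication_group_shard_configuration(
        ReplicationGroupId=replication_group_id,
        NodeGroupCount=new_shard_count,
        ApplyImmediately=True,  # the only value this API accepts
    )
    return response["ReplicationGroup"]["Status"]
```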

Horizontal scaling - resharding / rebalancing

| Capability | Online mode (no downtime) | Offline mode |
|---|---|---|
| Cluster availability during scaling | Yes | No |
| Can scale out / scale in / rebalance | Yes | Yes |
| Can scale up / down (change node type) | No | Yes |
| Can upgrade engine version | No | Yes |
| Can specify the number of replica nodes in each shard independently | No | Yes |
| Can specify the keyspace for shards independently | No | Yes |

Redis Replication


| Cluster Mode Disabled | Cluster Mode Enabled |
|---|---|
| 1 shard | Up to 90 shards |
| 0-5 replicas | 0-5 replicas per shard |
| If 0 replicas, primary failure = total data loss | If 0 replicas, primary failure = total data loss in that shard |
| Multi-AZ supported | Multi-AZ required |
| Supports scaling | Supports partitioning |
| If the primary load is read-heavy, you can scale the cluster (up to 5 replicas max) | Good for write-heavy workloads (you get additional write endpoints, one per shard) |

Redis - Global Datastore

  • Allows you to create cross region replicas for Redis
  • Single writer cluster (primary cluster), multiple reader clusters (secondary clusters)
  • Can replicate to up to two other regions
  • Improves local latency (bring data closer to your users)
  • Provides for DR (you can manually promote a secondary cluster to be primary, not automatic)
  • Not available for single node clusters (must convert it to a replication group first)
  • Security for cross-region communication is provided through VPC peering
  • Cluster cannot be modified / resized as usual
    • you scale the clusters by modifying the global datastore
    • all member clusters will get scaled
  • To modify a global datastore's parameters
    • modify the parameter group of any member cluster
    • Change gets applied to all member clusters automatically
  • Data is replicated cross-region in < 1 sec (typically, not an SLA)
  • RPO (typical) < 1 sec (amount of data lost due to a disaster)
  • RTO (typical) < 1 min (time taken for DR)
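
Because promotion is manual, DR for a Global Datastore means calling the `FailoverGlobalReplicationGroup` API (or using the console). A minimal boto3 sketch, assuming `client` is `boto3.client("elasticache")`; the global datastore ID, Region, and replication group ID are placeholders for the secondary cluster being promoted.

```python
def promote_secondary(client, global_datastore_id, region, replication_group_id):
    """Manually promote a secondary cluster of a Global Datastore to primary.

    `client` is expected to be boto3.client("elasticache"); `region` and
    `replication_group_id` identify the secondary cluster being promoted.
    """
    response = client.failover_global_replication_group(
        GlobalReplicationGroupId=global_datastore_id,
        PrimaryRegion=region,
        PrimaryReplicationGroupId=replication_group_id,
    )
    return response["GlobalReplicationGroup"]["Status"]
```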

Redis - Good things to know


  • Replica lag may grow and shrink over time. If a replica is too far behind the primary, reboot it
  • In case of latency/throughput issues, scaling out the cluster helps
  • In case of memory pressure, scaling out the cluster helps
  • If the cluster is over-scaled, you can scale in to reduce costs 
  • In case of online scaling
    • cluster remains available, but with some performance degradation 
    • level of degradation depends on CPU utilization and the amount of data
  • You cannot change Redis cluster mode after creating it (can create a new cluster and warm start it with existing data)
  • All nodes within a cluster are of the same instance type

Redis best practice


  • Cluster mode enabled - connect using the configuration endpoint (allows auto-discovery of shard and keyspace (slot) mappings)
  • Cluster mode disabled - use the primary endpoint for writes and the reader endpoint for reads (always kept up to date with any cluster changes)
  • Set the parameter reserved-memory-percent=25% (for background processes, non-data)
  • Keep socket timeout = 1 second (at least)
    • Too low => numerous timeouts on high load
    • Too high => application might take longer to detect connection issues
  • Keep DNS caching timeout low (TTL = 5-10 seconds recommended)
  • Do not use the "cache forever" option for DNS caching
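
Two of the practices above as a sketch. Setting `reserved-memory-percent` goes through a custom cache parameter group (default parameter groups cannot be modified), here via boto3's `modify_cache_parameter_group`, assuming `client` is `boto3.client("elasticache")` and a hypothetical group name. The socket timeouts are shown as keyword arguments for the redis-py client (`socket_timeout` and `socket_connect_timeout` are `redis.Redis` parameters); the 1-second values follow the guidance above.

```python
# Intended usage: redis.Redis(host=..., port=6379, **REDIS_TIMEOUTS)
# 1 second follows the "at least 1 second" socket-timeout guidance
REDIS_TIMEOUTS = {"socket_timeout": 1.0, "socket_connect_timeout": 1.0}


def set_reserved_memory(client, parameter_group_name, percent=25):
    """Reserve memory for background (non-data) work in a custom parameter group.

    `client` is expected to be boto3.client("elasticache"); the parameter
    group name is a placeholder and must refer to a custom group.
    """
    return client.modify_cache_parameter_group(
        CacheParameterGroupName=parameter_group_name,
        ParameterNameValues=[
            # Parameter values are passed as strings
            {"ParameterName": "reserved-memory-percent", "ParameterValue": str(percent)},
        ],
    )
```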

Redis use cases - Gaming Leaderboards


  • Use Redis sorted sets, which automatically keep data sorted by score
  • Example - top 10 scores for a game
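
A leaderboard needs only two sorted-set commands: ZINCRBY to add points and ZREVRANGE with scores to read the top entries. The functions below call them the way redis-py does (`zincrby(name, amount, value)` and `zrevrange(name, start, end, withscores=True)`); so the sketch runs without a server, `InMemorySortedSets` is a hypothetical stand-in that mimics just those two calls. The board and player names are made up.

```python
from collections import defaultdict


class InMemorySortedSets:
    """Stand-in for the two redis-py sorted-set calls used below.

    Only non-negative start/end indexes are supported. Real redis-py returns
    members as bytes unless the client is created with decode_responses=True.
    """

    def __init__(self):
        self._sets = defaultdict(dict)  # sorted-set name -> {member: score}

    def zincrby(self, name, amount, value):
        scores = self._sets[name]
        scores[value] = scores.get(value, 0.0) + amount
        return scores[value]

    def zrevrange(self, name, start, end, withscores=False):
        # Highest score first; equal scores in reverse lexicographic order, as in Redis
        ranked = sorted(self._sets[name].items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        window = ranked[start:end + 1]  # Redis ranges include the end index
        return window if withscores else [member for member, _ in window]


def add_points(client, board, player, points):
    # ZINCRBY creates the member with score 0 if it is missing, then adds points
    return client.zincrby(board, points, player)


def top_players(client, board, n=10):
    # ZREVRANGE 0..n-1 WITHSCORES: the n highest scores, best first
    return client.zrevrange(board, 0, n - 1, withscores=True)
```

With a real server, a `redis.Redis(host=..., decode_responses=True)` client would replace the stand-in; the server keeps the set ordered, so reading the top 10 does not require sorting in the application.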


Redis use cases - Pub/sub messaging or queues 

Redis use cases - Recommendation Data


  • Use INCR or DECR in Redis (e.g. for like/dislike counters)
  • Using Redis hashes, you can maintain a list of who liked / disliked a product


Memcached Overview


  • Simple in-memory key-value store with sub-millisecond latency
  • Automatic detection and recovery from cache node failures
  • Typical applications
    • Session store (persistent as well as transient session data store)
    • DB query results caching (relational or NoSQL DBs - RDS / DynamoDB etc.)
    • Webpage caching
    • API caching
    • Object caching (images/files/metadata)
  • Well suited for web / mobile apps, gaming, IoT, ad-tech, and e-commerce

Memcached Architecture


  • Memcached cluster is generally placed in a private subnet
  • Accessed from EC2 instances placed in a public subnet of the VPC
  • Allows access only from EC2 network (apps should be hosted on whitelisted EC2 instances)
  • Whitelist using security groups
  • Up to 20 nodes per cluster
  • Data is distributed across the available nodes
  • Replicas are not supported
  • Node failure = data loss
  • Nodes can be deployed as Multi-AZ (to reduce data loss)

Memcached Auto Discovery


  • Allows client to automatically identify nodes in your Memcached cluster
  • No need to manually connect to individual nodes
  • Simply connect to any one node (using configuration endpoint) and retrieve a list of all other nodes
  • The metadata (list of all nodes) gets updated dynamically as you add/remove nodes
  • Node failures are automatically detected, and nodes get replaced
  • Enabled by default (you must use Auto Discovery capable client)
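
Auto Discovery clients obtain the node list by sending `config get cluster` to the configuration endpoint. The sketch only parses a reply, assuming the layout shown in the ElastiCache documentation: a `CONFIG cluster ...` header, a configuration version number, one line of space-separated `hostname|ip|port` entries, and a closing `END`. The network round trip is omitted, and the hostnames and IPs in any sample reply are invented.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    hostname: str
    ip: str
    port: int


def parse_config_get_cluster(response: str) -> Tuple[int, List[Node]]:
    """Parse a `config get cluster` reply into (config_version, nodes)."""
    lines = [line.strip() for line in response.strip().splitlines()]
    # Drop the "CONFIG cluster ..." header, blank lines, and the "END" marker
    body = [line for line in lines
            if line and not line.startswith("CONFIG") and line != "END"]
    version = int(body[0])  # changes whenever nodes are added or removed
    nodes = []
    for entry in body[1].split():
        hostname, ip, port = entry.split("|")
        nodes.append(Node(hostname, ip, int(port)))
    return version, nodes
```

A client can re-run the command periodically and rebuild its node list only when the version number changes.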

Memcached Scaling


  • Vertical scaling not supported
    • can resize by creating a new cluster and migrating your application
  • Horizontal scaling
    • allows you to partition your data across multiple nodes
    • up to 20 nodes per cluster and 100 nodes per region (soft limit)
    • no need to change endpoints post scaling (if you use auto-discovery)
    • must re-map at least some of your keyspace after scaling (to evenly spread cache keys across all nodes)
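
Why some keys must move when nodes are added: with naive `hash(key) % node_count` placement, going from 3 to 4 nodes keeps a key in place only when its hash mod 3 equals its hash mod 4, so about 3/4 of keys move for a uniform hash. A consistent-hash ring (a technique commonly used by Memcached client libraries) moves about 1/4 of keys, all of them to the new node. The sketch compares the two; MD5 and 100 virtual nodes per server are arbitrary choices, and the node names are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def modulo_node(key, nodes):
    """Naive placement: most keys change node whenever len(nodes) changes."""
    return nodes[_hash(key) % len(nodes)]


class HashRing:
    """Consistent-hash ring with virtual nodes for a smoother key spread."""

    def __init__(self, nodes, vnodes=100):
        points = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in points]
        self._owners = [node for _, node in points]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[idx]


def moved_fraction(keys, before, after):
    """Share of keys whose owning node differs between two placements."""
    return sum(before(k) != after(k) for k in keys) / len(keys)
```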



Choosing between Redis and Memcached


| Redis | Memcached |
|---|---|
| Sub-millisecond latency | Sub-millisecond latency |
| Supports complex data types (sorted sets, hashes, bitmaps, HyperLogLog, geospatial indexes) | Supports only simple data types (strings, objects) |
| Multi-AZ with Auto-Failover, supports sharding | Multi-node for sharding |
| Read Replicas for scalability and HA | Non-persistent |
| Data durability using AOF persistence | No backup and restore |
| Backup and restore features | Multi-threaded architecture |


ElastiCache Security - Encryption


  • Memcached does not support encryption
  • Encryption at rest for Redis (using KMS)
  • Encryption in-transit for Redis (using TLS/SSL)
    • Between server and client
    • Is an optional feature
    • Can have some performance impact
    • Supports encrypted replication
  • Redis snapshots in S3 use S3's encryption capabilities


ElastiCache Security - Auth and Access Control


  • Authentication into the cache
    • Redis AUTH - server can authenticate the clients (requires SSL/TLS enabled)
    • Server Authentication - clients can authenticate that they are connecting to the right server
  • IAM
    • IAM policies can be used for AWS API-level security (create cache, update cache etc.)
    • ElastiCache doesn't support IAM permissions for actions within ElastiCache
      (which clients can access what)

ElastiCache Security - Network



  • Recommended to use private subnets
  • Control network access to ElastiCache through VPC security groups
  • ElastiCache Security Groups - allow you to control access to ElastiCache clusters running outside Amazon VPC
  • For clusters within Amazon VPC, simply use VPC security groups







ElastiCache Logging and Monitoring


  • Integrated with CloudWatch
    • Host level metrics - CPU / Memory / Network
    • Redis metrics - replication lag / engine CPU utilization / metrics from Redis INFO command
    • 60-second granularity
  • ElastiCache Events
    • Integrated with SNS 
    • Log of events related to cluster instances / security groups / parameter groups
    • Available within ElastiCache console
  • API calls logged with CloudTrail
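
The CloudWatch metrics above can also be read programmatically. A boto3 sketch that fetches 60-second `ReplicationLag` averages from the `AWS/ElastiCache` namespace, assuming `client` is `boto3.client("cloudwatch")` and a placeholder `CacheClusterId`; for Redis this metric is reported by replica nodes, so the ID should be a replica's.

```python
from datetime import datetime, timedelta, timezone


def recent_replication_lag(client, cache_cluster_id, minutes=15):
    """Return ReplicationLag averages (oldest first) at 60-second granularity.

    `client` is expected to be boto3.client("cloudwatch"); the cluster ID is a
    placeholder for a replica node's cluster ID.
    """
    end = datetime.now(timezone.utc)
    response = client.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName="ReplicationLag",
        Dimensions=[{"Name": "CacheClusterId", "Value": cache_cluster_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,  # matches the 60-second metric granularity
        Statistics=["Average"],
    )
    # Datapoints come back in no guaranteed order, so sort by timestamp
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Average"] for p in points]
```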

ElastiCache Pricing


  • Priced per node-hour consumed for each node type
  • Partial node-hours consumed are billed as full hours
  • Can use reserved nodes for upfront discounts (1-3 year terms)
  • Data transfer
    • No charge for data transfer between EC2 and ElastiCache within the same AZ
    • All other data transfer chargeable
  • Backup storage
    • For automated and manual snapshots (per GB per month)
    • Space for one snapshot is complimentary for each active Redis cluster
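
The rule that partial node-hours are billed as full hours can be expressed as a small calculation. The sketch assumes every node runs for the same duration and takes the hourly price as an input; the price is a hypothetical figure, not a published ElastiCache rate, and data transfer and backup storage are ignored.

```python
import math


def node_hours_cost(node_count, hours_running, price_per_node_hour):
    """Estimate on-demand node cost, billing each partial hour as a full hour.

    Assumes every node ran for `hours_running`; `price_per_node_hour` is a
    hypothetical input, not a published ElastiCache rate.
    """
    billable_hours = math.ceil(hours_running)  # e.g. 2.5 hours -> 3 billed hours
    return node_count * billable_hours * price_per_node_hour
```

For instance, 3 nodes running 2.5 hours at a hypothetical $0.10 per node-hour are billed as 3 nodes x 3 hours = 9 node-hours, or $0.90.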