[AWS Certificate]-Amazon Neptune

Amazon Neptune - Overview

  • Fully managed graph database service (non-relational)
  • Relationships are first-class citizens
  • Can quickly navigate relationships and retrieve complex relations between highly connected datasets
  • Can query billions of relationships with millisecond latency
  • ACID compliant with immediate consistency
  • Supports transaction semantics for highly concurrent OLTP workloads (ACID transactions)
  • Supported graph query languages - Apache TinkerPop Gremlin and RDF/SPARQL
  • Supports up to 15 low-latency read replicas (Multi-AZ)
  • Use cases:
    • Social graph / Knowledge graph
    • Fraud detection
    • Real-time big data mining
    • Customer interests and recommendations (Recommendation engines)

Graph Database

  • Models relationships between data
    • e.g. Subject / predicate / object / graph (quad)
    • Joe likes pizza
    • Sarah is friends with Joe
    • Sarah likes pizza too
    • Joe is a student and lives in London
    • Lets you ask questions like "identify Londoners who like pizza" or "identify friends of Londoners who like pizza"
  • Uses nodes (vertices) and edges (actions) to describe the data and relationships between them
  • DB stores - person / action / object (and a graph ID or edge ID)
  • Can filter or discover data based on the strength, weight, or quality of relationships
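The subject/predicate/object model above can be sketched as a tiny in-memory triple store. This is purely illustrative (not a Neptune API); it answers the "Londoners who like pizza" question from the examples:

```python
# Minimal in-memory triple store illustrating the subject/predicate/object model.
triples = [
    ("Joe", "likes", "pizza"),
    ("Sarah", "is_friends_with", "Joe"),
    ("Sarah", "likes", "pizza"),
    ("Joe", "is_a", "student"),
    ("Joe", "lives_in", "London"),
]

def subjects(predicate, obj):
    """Return all subjects that have the given predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}

# "Identify Londoners who like pizza" = intersection of two relationship lookups
londoners = subjects("lives_in", "London")
pizza_fans = subjects("likes", "pizza")
print(londoners & pizza_fans)  # {'Joe'}
```

A real graph query language (Gremlin or SPARQL) expresses the same traversal declaratively instead of via set intersection.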

Graph query languages

  • Neptune supports two popular modeling frameworks - Apache TinkerPop and RDF/SPARQL
  • TinkerPop uses Gremlin traversal language
  • RDF (W3C standard) uses SPARQL
  • SPARQL is great for multiple data sources, has large variety of datasets available
  • We can use Gremlin or SPARQL to load data into Neptune and then to query it
  • You can store both Gremlin and SPARQL graph data on the same Neptune cluster
  • It gets stored separately on the cluster
  • Graph data inserted using one query language can only be queried with that query language (and not with the other)

Neptune Architecture

 

  • 6 copies of your data across 3 AZs (distributed design)
    • Lock-free optimistic algorithm (quorum model)
    • 4 copies out of 6 needed for writes (4/6 write quorum - data is considered durable once at least 4 of 6 copies acknowledge the write)
    • 3 copies out of 6 needed for reads (3/6 read quorum)
    • Self-healing with peer-to-peer replication
    • Storage is striped across hundreds of volumes
  • One Neptune Instance takes writes (master)
  • Compute nodes on replicas do not need to write/replicate (= improved read performance)
  • Log-structured distributed storage layer - passes incremental log records from the compute layer to the storage layer (= faster)
  • Master + up to 15 Read Replicas serve reads
  • Data is continuously backed up to S3 in real time, using storage nodes (compute node performance is unaffected)
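The 4/6 write and 3/6 read quorum rules above can be sketched in a few lines (illustrative helpers only, not Neptune internals). Note that the two quorums are chosen so they always overlap, so any read quorum sees the latest durable write:

```python
COPIES = 6          # 6 copies of the data across 3 AZs
WRITE_QUORUM = 4    # a write is durable once 4 of 6 copies acknowledge it
READ_QUORUM = 3     # a read needs agreement from 3 of 6 copies

def write_durable(acks: int) -> bool:
    """A write succeeds when enough copies acknowledged it."""
    return acks >= WRITE_QUORUM

def read_served(available: int) -> bool:
    """A read succeeds when enough copies are reachable."""
    return available >= READ_QUORUM

# Losing an entire AZ (2 copies) still leaves 4 reachable copies,
# so both writes and reads keep working:
print(write_durable(6 - 2), read_served(6 - 2))  # True True

# Overlap guarantee: any read quorum intersects any write quorum.
print(WRITE_QUORUM + READ_QUORUM > COPIES)  # True
```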

Neptune cluster

 

  • Loader endpoint - to load the data into Neptune (say, from S3)
    • e.g. https://<cluster_endpoint>:8182/loader
  • Gremlin endpoint - for Gremlin queries
    • e.g. https://<cluster_endpoint>:8182/gremlin
  • SPARQL endpoint - for SPARQL queries
    • e.g. https://<cluster_endpoint>:8182/sparql
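The three endpoints share the same host and port (8182) and differ only in the path, so they can be built with one small helper (the cluster hostname below is a placeholder):

```python
def neptune_url(cluster_endpoint: str, path: str, port: int = 8182) -> str:
    """Build a Neptune HTTPS URL for the loader, Gremlin, or SPARQL path."""
    return f"https://{cluster_endpoint}:{port}/{path}"

# Placeholder cluster endpoint for illustration
host = "my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com"
print(neptune_url(host, "loader"))
print(neptune_url(host, "gremlin"))
print(neptune_url(host, "sparql"))
```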

Demo


 

Bulk loading data into Neptune

 

  • Use the loader endpoint (HTTP POST to the loader endpoint)
    • e.g.
      curl -X POST -H 'Content-Type: application/json' \
      https://<cluster_endpoint>:8182/loader -d \
      '{
          "source": "s3://bucket_name/key_name",
          ...
       }'
  • S3 data can be accessed using an S3 VPC endpoint (allows access to S3 resources from your VPC)
  • Neptune cluster must assume an IAM role with S3 read access
  • S3 VPC endpoint can be created using the VPC management console
  • S3 bucket must be in the same region as the Neptune cluster
  • Load data formats
    • csv (for Gremlin), ntriples / nquads / rdfxml / turtle (for SPARQL)
  • All files must be UTF-8 encoded
  • Multiple files can be loaded in a single job
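The same loader request can be built in Python. The payload fields shown (source, format, iamRoleArn, region, failOnError) follow the Neptune loader API; the bucket name, role ARN, region, and cluster hostname are placeholders:

```python
import json
import urllib.request

# Loader request payload; bucket, role ARN, and region are placeholder values.
payload = {
    "source": "s3://bucket_name/key_name",
    "format": "csv",  # csv for Gremlin data; ntriples/nquads/rdfxml/turtle for SPARQL
    "iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
    "region": "us-east-1",
    "failOnError": "TRUE",
}

def loader_request(cluster_endpoint: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST for the loader endpoint."""
    return urllib.request.Request(
        f"https://{cluster_endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = loader_request("my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com")
print(req.get_method(), req.full_url)
```

Sending the request would require network access to the cluster's VPC plus the S3 VPC endpoint and IAM role described above, so the sketch stops at building it.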

Demo


Neptune Replication

  • Up to 15 read replicas
  • ASYNC replication
  • Replicas share the same underlying storage layer
  • Replication lag is typically tens of milliseconds
  • Minimal performance impact on the primary due to replication process
  • Replicas double up as failover targets (standby instance is not needed)

Neptune High Availability

  • Failovers occur automatically
  • A replica is automatically promoted to be the new primary during DR
  • Neptune flips the CNAME of the DB instance to point to the replica and promotes it
  • Failover to a replica typically completes in 30-120 seconds (minimal downtime)
  • Creating a new instance post-failover takes about 15 minutes
  • Failover to a new instance happens on a best-effort basis and can take longer

Neptune Backup and Restore

 

  • Supports automatic backup
  • Continuously backs up your data to S3 for PITR (max retention period of 35 days)
  • Latest restorable time for a PITR can be up to 5 minutes in the past (RPO = 5 minutes)
  • The first backup is a full backup; subsequent backups are incremental
  • Take manual snapshots to retain beyond 35 days
  • Backup process does not impact cluster performance

 

 


Neptune Backup and Restore

 

  • Can only restore to a new cluster
  • Can restore an unencrypted snapshot to an encrypted cluster (but not the other way round)
  • To restore a cluster from an encrypted snapshot, you must have access to the KMS key
  • Can only share manual snapshots (automated snapshots can be copied, and the copy shared)
  • Can't share a snapshot encrypted using the account's default KMS key
  • Snapshots can be shared across accounts, but only within the same region


Neptune Scaling

  • Vertical scaling (scale up / down) - by resizing instances
  • Horizontal scaling (scale out / in) - by adding / removing up to 15 read replicas
  • Automatic storage scaling - 10GB to 64TB (no manual intervention needed)


Database Cloning in Neptune

  • Different from creating read replicas - clones support both reads and writes
  • Different from replicating a cluster - clones use same storage layer as the source cluster
  • Requires only minimal additional storage
  • Quick and cost-effective
  • Only within region (can be in different VPC)
  • Can be created from existing clones
  • Uses a copy-on-write protocol
    • both source and clone share the same data initially
    • data that changes is then copied at the time it changes, either on the source or on the clone (i.e. stored separately from the shared data)
    • delta of writes after cloning is not shared
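The copy-on-write protocol can be sketched as pages shared with the source until one side writes (purely illustrative, not Neptune's storage format):

```python
class CowClone:
    """Clone that shares pages with its source until a page is written."""

    def __init__(self, source_pages: dict):
        self.source = source_pages  # shared storage layer (not copied)
        self.delta = {}             # pages copied only after a write

    def read(self, page_id):
        # Prefer the clone's own copy; fall back to the shared page.
        return self.delta.get(page_id, self.source[page_id])

    def write(self, page_id, data):
        # Copy-on-write: only the changed page consumes extra storage.
        self.delta[page_id] = data

source = {0: "alpha", 1: "beta"}
clone = CowClone(source)
clone.write(1, "beta-modified")
print(clone.read(0), clone.read(1))  # alpha beta-modified
print(source[1])                     # beta (source unchanged)
```

This is why a clone needs only minimal additional storage: unchanged pages are never duplicated.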

Neptune Security - IAM

  • Uses IAM for authentication and authorization to manage Neptune resources
  • Supports IAM Authentication (with AWS SigV4)
  • Use temporary credentials obtained by assuming an IAM role:
    • Create an IAM role
    • Setup trust relationship
    • Retrieve temp creds
    • Sign the requests using the creds
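The final signing step uses AWS SigV4. The sketch below shows only the SigV4 signing-key derivation (a documented chain of HMACs over date, region, and service); the secret key is a placeholder, and "neptune-db" is the service name Neptune uses for IAM auth:

```python
import hashlib
import hmac

def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key via the HMAC chain: date -> region -> service."""
    def sign(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = sign(("AWS4" + secret_key).encode("utf-8"), date)  # YYYYMMDD
    k_region = sign(k_date, region)
    k_service = sign(k_region, service)
    return sign(k_service, "aws4_request")

# Placeholder secret key; in practice this comes from the assumed role's temp creds.
key = sigv4_signing_key("wJalrXUtnFEMI/EXAMPLEKEY", "20240101", "us-east-1", "neptune-db")
print(len(key))  # 32 (SHA-256 digest)
```

The full SigV4 flow also builds a canonical request and string-to-sign; in practice an AWS SDK or signing library handles all of this for you.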

Neptune Security - Encryption & Network

 

  • Encryption in transit - using SSL/TLS
    • Cluster parameter neptune_enforce_ssl = 1 (default)
  • Encryption at rest - with AES-256 using KMS
    • encrypts data, automated backups, snapshots, and replicas in the same cluster
  • Neptune clusters are VPC-only (use private subnets)
  • Clients can run on EC2 in public subnets within VPC
  • Can connect to your on-premises IT infra via VPN
  • Use security groups to control access

Neptune Monitoring

 

  • Integrated with CloudWatch
  • can use audit log files by enabling the DB cluster parameter neptune_enable_audit_log
  • must restart the DB cluster after enabling audit logs
  • audit log files are rotated when they exceed 100MB (not configurable)
  • audit logs are not stored in sequential order (can be ordered using the timestamp value of each record)
  • audit log data can be published (exported) to a CloudWatch Logs log group by enabling Log exports for your cluster
  • API calls logged with CloudTrail

 

Query Queuing in Neptune

 

  • Max 8192 queries can be queued up per Neptune instance
  • Queries beyond 8192 result in a ThrottlingException
  • Use the CloudWatch metric MainRequestQueuePendingRequests to get the number of queued queries (5-minute granularity)
  • Get the acceptedQueryCount value using the Query Status API
    • For Gremlin, acceptedQueryCount = current count of queries queued
    • For SPARQL, acceptedQueryCount = all queries accepted since the server started
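A small sketch of interpreting a query-status response against the 8192-per-instance queue limit (the response shape below is a simplified illustrative example, not a verbatim Neptune payload):

```python
import json

MAX_QUEUE = 8192  # per-instance queue limit; beyond this -> ThrottlingException

# Simplified example of a Gremlin query-status response (illustrative values).
status_response = json.loads('{"acceptedQueryCount": 9, "runningQueryCount": 2}')

# For Gremlin, acceptedQueryCount reflects queries currently queued/accepted.
queued = status_response["acceptedQueryCount"]
print(f"{queued}/{MAX_QUEUE} queue slots used")
```

Remember the Gremlin/SPARQL difference above: for SPARQL the same field is a cumulative counter since server start, so it cannot be compared against the queue limit this way.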

 


Neptune Service Errors

 

  • Graph engine errors
    • Errors related to cluster endpoints are returned as HTTP error codes
    • Query errors - QueryLimitException / MemoryLimitExceededException / TooManyRequestsException etc.
    • IAM Auth errors - Missing Auth / Missing token / Invalid Signature / Missing headers / Incorrect Policy etc.
  • API errors
    • HTTP errors related to APIs (CLI / SDK)
    • InternalFailure / AccessDeniedException / MalformedQueryString / ServiceUnavailable etc
  • Loader Error
    • LOAD_NOT_STARTED / LOAD_FAILED / LOAD_S3_READ_ERROR / LOAD_DATA_DEADLOCK etc

SPARQL federated query

 

  • Query across multiple Neptune clusters, or external data sources that support the SPARQL protocol, and aggregate the results
  • Supports only read operations
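A federated query uses the SPARQL 1.1 SERVICE keyword to pull in results from a remote endpoint. Shown here as a Python string; the remote endpoint URL and the example.org terms are placeholders:

```python
# SPARQL federated query held as a Python string. The SERVICE clause sends the
# inner pattern to a remote SPARQL endpoint (placeholder URL) and joins the results.
federated_query = """
SELECT ?person ?name WHERE {
  ?person <http://example.org/likes> <http://example.org/pizza> .
  SERVICE <https://other-cluster:8182/sparql> {
    ?person <http://example.org/name> ?name .
  }
}
"""
print("SERVICE" in federated_query)  # True
```

Since federation supports only reads, the outer query must be a SELECT/ASK/CONSTRUCT-style read, never an update.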


Neptune Streams

 

  • Capture changes to your graph (change logs)
  • Similar to DynamoDB streams
  • Can be processed with Lambda (use Neptune Streams API)
  • SPARQL
    • https://<cluster_endpoint>:8182/sparql/stream
  • Gremlin
    • https://<cluster_endpoint>:8182/gremlin/stream
  • Only GET method is allowed
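Reading a stream is a plain GET with paging parameters; iteratorType (e.g. TRIM_HORIZON, LATEST, AT_SEQUENCE_NUMBER) and limit follow the Neptune Streams API, while the cluster hostname below is a placeholder:

```python
from urllib.parse import urlencode

def stream_url(cluster_endpoint: str, engine: str = "gremlin",
               iterator_type: str = "TRIM_HORIZON", limit: int = 100) -> str:
    """Build the GET URL for reading change records from a Neptune stream."""
    params = urlencode({"iteratorType": iterator_type, "limit": limit})
    return f"https://{cluster_endpoint}:8182/{engine}/stream?{params}"

# Placeholder cluster endpoint; engine is "gremlin" or "sparql"
print(stream_url("my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com"))
```

Since only GET is allowed on the stream endpoints, a consumer (e.g. a Lambda function) polls this URL and advances its iterator as it processes records.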

Use Cases

  • Amazon ES Integration
    • To perform full-text search queries on Neptune data
    • Uses Streams + federated queries
    • Supported for both Gremlin and SPARQL
  • Neptune-to-Neptune Replication

 

Neptune Pricing

 

  • You only pay for what you use
  • On-demand instances - per hour pricing
  • IOPS - per million IO requests
    • Every DB page read operation = one IO
    • Each page is 16KB in Neptune
    • Write IOs are counted in 4KB units
  • DB Storage - per GB per month
  • Backups (automated and manual) - per GB per month
  • Data transfer - per GB
  • Neptune Workbench - per instance hour
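The IO accounting rules above (one IO per 16KB page read; writes billed in 4KB units) reduce to simple arithmetic. A quick sketch, showing only the counting rule, not prices:

```python
PAGE_SIZE_KB = 16   # each 16KB DB page read = one read IO
WRITE_UNIT_KB = 4   # write IOs are counted in 4KB units

def read_ios(pages_read: int) -> int:
    """One IO per page read."""
    return pages_read

def write_ios(kb_written: int) -> int:
    """Writes are billed per 4KB unit, rounded up."""
    return -(-kb_written // WRITE_UNIT_KB)  # ceiling division

print(read_ios(10))   # 10 IOs for ten 16KB pages
print(write_ios(9))   # 3 IOs: 9KB rounds up to three 4KB units
```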