In this session we will review Amazon EFS and how it delivers fully managed, petabyte-scale file storage for Amazon EC2 instances. Its large scale and consistent performance make Amazon EFS well suited to web and content serving, enterprise applications, media processing, container storage, and Big Data analytics. Attendees will learn how to identify applications that are a good fit for Amazon EFS, understand its performance characteristics and security model, and hear how established customers are using it in production. The target audience is file system administrators, application developers, and application owners who operate or build file-based applications that require consistent latency at cloud scale.
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
2. What customers are using EFS for today
Web serving
Content management
Analytics
Media and entertainment workflows
Workflow management
Home directories
Container storage
Database backups
3. Shared File Solutions in the Cloud… before EFS
3rd Party Software
Do It Yourself
3rd Party Hardware in AWS Direct Connect locations
4. Do It Yourself – NFS Architecture
(Diagram: three groups of NFS clients, each served by its own NFS server backed by two volumes.)
5. Do It Yourself – NFS Architecture
Launch, patch, monitor, & pay for EC2 instances
Create, attach, monitor, & pay for provisioned EBS volumes
Create, maintain, and monitor an Auto Scaling group
Install, patch, monitor, & pay for* file system software
Configure, maintain, monitor, & pay for intra-/inter-AZ replication of file system data
• Replication I/O still consumes your volumes' IOPS
Configure DNS so clients get highly available access to the inter-AZ NFS fleet
11. Amazon Elastic File System (EFS)
Provides simple, scalable, highly available & durable file storage in the cloud
Petabyte-scale file system distributed across an unconstrained number of storage servers in multiple Availability Zones (AZs)
Elastic capacity, automatically growing & shrinking as you add & remove files
12. Amazon Elastic File System (EFS)
Standard file system interface & semantics
Shared storage
Highly available & highly durable
Consistent low latency
Strong read-after-write consistency
Elastic capacity
Fully managed
13. Cloud Data Migration
Direct Connect
Snow* data transport family
3rd Party Connectors
Transfer Acceleration
Storage Gateway
Kinesis Firehose
The AWS Storage Portfolio
Object: Amazon S3, Amazon Glacier
Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
File: Amazon EFS
14. Do you need an EFS file system?
If you have an application running on EC2, or another use case, that requires a file system…
AND it:
• Requires multi-attach (shared access from many instances) OR
• Needs GBs/s of throughput OR
• Needs multi-AZ availability/durability OR
• Requires automatic scaling (grow/shrink) of storage
15. Where is EFS available today?
• US West (Oregon)
• US East (N. Virginia)
• US East (Ohio)
• EU (Ireland)
• Asia Pacific (Sydney)
More coming soon!
17. Resources for Amazon EFS
File System
• Tags
• Key-value pairs
• Mount Targets
• Subnet ID
• Security Groups
18. Resources for Amazon EFS
File System
• Regional construct
• Ten file systems per account per region (soft limit)
• Default throughput limit 3 GB/s (soft limit)
• Metered size updates approximately every hour
• Accessible from EC2
• VPC, EC2-Classic via ClassicLink
• Accessible from on-premises
• AWS Direct Connect
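Since the metered size is refreshed only about once an hour, any report of it can trail recent writes. The sketch below assumes a configured AWS CLI; `describe-file-systems` and the `SizeInBytes.Value` field come from the EFS API, while the `run` dry-run helper and the `efs_metered_sizes` function are hypothetical names used for illustration. By default it prints the command instead of executing it; set `DRY_RUN=0` to call AWS.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# List file system IDs with their last metered size in bytes.
# The value is updated roughly once an hour, so it may lag actual usage.
efs_metered_sizes() {
  local region="$1"
  run aws efs describe-file-systems --region "$region" \
    --query 'FileSystems[*].[FileSystemId,SizeInBytes.Value]' \
    --output text
}
```

For example, `DRY_RUN=0 efs_metered_sizes us-west-2` prints one tab-separated line per file system.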
19. Resources for Amazon EFS
File System cont…
• Scenarios for on-prem via Direct Connect
Bursting
Migration
Tiering
Backup / DR
20. Resources for Amazon EFS
Tags
• Typical key-value pair
• Create & associate tag with file system
• Up to 50 tags per file system
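A sketch of tagging a file system from the AWS CLI, assuming a configured CLI. `aws efs tag-resource` is the current EFS tagging command; the `efs_tag` function, the `run` dry-run helper, and the `KEY=VALUE` argument convention are hypothetical. The 50-tag check only counts the tags passed in one call, not tags already on the file system.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Usage: efs_tag FS_ID KEY=VALUE [KEY=VALUE ...]
efs_tag() {
  local fs_id="$1"; shift
  # Up to 50 tags per file system; this only checks the tags in this call.
  if [ "$#" -gt 50 ]; then
    echo "error: at most 50 tags per file system" >&2
    return 1
  fi
  local -a tags=()
  local pair
  for pair in "$@"; do
    # Split on the first '=' into the CLI shorthand Key=...,Value=...
    tags+=("Key=${pair%%=*},Value=${pair#*=}")
  done
  run aws efs tag-resource --resource-id "$fs_id" --tags "${tags[@]}"
}
```

For example, `efs_tag fs-12345678 Name=web Env=prod` attaches two key-value pairs in one request.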
21. Resources for Amazon EFS
Mount Targets
• One or more per file system
• Create in a VPC Subnet
• One per Availability Zone
• Must be in the same VPC
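Because there is at most one mount target per Availability Zone, a common pattern is to loop over one subnet per AZ in the same VPC. This sketch assumes a configured AWS CLI and uses the real `aws efs create-mount-target` options; the function name, the `run` dry-run helper, and the example IDs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Usage: efs_create_mount_targets FS_ID SECURITY_GROUP_ID SUBNET_ID...
# Pass one subnet per Availability Zone, all in the same VPC.
efs_create_mount_targets() {
  local fs_id="$1" sg_id="$2"; shift 2
  local subnet
  for subnet in "$@"; do
    run aws efs create-mount-target --file-system-id "$fs_id" \
      --subnet-id "$subnet" --security-groups "$sg_id" || return 1
  done
}
```

For example, `efs_create_mount_targets fs-12345678 sg-0abc subnet-aaa subnet-bbb` issues one request per subnet.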
22. Resources for Amazon EFS
Security Groups
• Standard VPC Security Group
• Same VPC as subnet
• Up to five per mount target
• Allow inbound TCP port 2049 from NFS clients
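The inbound rule on the mount target's security group can reference the clients' security group instead of IP ranges. A sketch assuming a configured AWS CLI; `aws ec2 authorize-security-group-ingress` and its `--group-id`, `--protocol`, `--port`, and `--source-group` options are real, while the function name, the `run` helper, and the group IDs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Usage: efs_allow_nfs MOUNT_TARGET_SG_ID CLIENT_SG_ID
# Allows NFS (TCP 2049) into the mount target's group from the clients' group.
efs_allow_nfs() {
  local mt_sg="$1" client_sg="$2"
  run aws ec2 authorize-security-group-ingress --group-id "$mt_sg" \
    --protocol tcp --port 2049 --source-group "$client_sg"
}
```

Referencing a source group means new client instances gain access simply by joining that group.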
25. Recommended Mount Options
-o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async
Mount using NFSv4.1 (default options)
Specify 1MB read/write buffers
Hard mount
Timeout of 60 seconds (600 tenths of a second)
2 minor timeouts & retransmissions before major timeout
Ensure operations are asynchronous
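The recommended options can be assembled once and reused in a mount command. A sketch assuming an instance with the NFS client installed and root privileges for `mount`; the DNS name follows the `FS_ID.efs.REGION.amazonaws.com` pattern used by EFS, while the function names, the `run` helper, and the mount point are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Print the recommended option string, joined with commas.
efs_mount_opts() {
  local opts=(
    nfsvers=4.1      # NFS protocol version 4.1
    rsize=1048576    # 1 MB read buffer
    wsize=1048576    # 1 MB write buffer
    hard             # keep retrying instead of returning I/O errors
    timeo=600        # 60-second timeout (in tenths of a second)
    retrans=2        # retransmissions before a major timeout
    async            # asynchronous operations
  )
  local IFS=,
  printf '%s\n' "${opts[*]}"
}

# Usage: efs_mount FS_ID REGION MOUNT_POINT
efs_mount() {
  local fs_id="$1" region="$2" mnt="$3"
  run mount -t nfs4 -o "$(efs_mount_opts)" \
    "${fs_id}.efs.${region}.amazonaws.com:/" "$mnt"
}
```

For example, `sudo DRY_RUN=0 bash -c '. ./efs_mount.sh; efs_mount fs-12345678 us-west-2 /mnt/efs'` would perform the mount, assuming the functions are saved in a hypothetical `efs_mount.sh`.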
26. Mount an EFS File System
Launch EC2 instance from EC2 Console
Connect to the instance
Make a directory
Mount EFS file system
Query disk usage and mounted file systems
• df; df -hT; df -h -t nfs4; mount -t nfs4
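In a script, the same check can be made against the kernel's mount table instead of parsing `df` output. A sketch assuming Linux, where `/proc/mounts` lists the device, mount point, and file system type in its first three fields and NFSv4.x mounts report the type `nfs4`; the function name is hypothetical, and the optional second argument exists only so the check can be tried against a sample file.

```shell
#!/usr/bin/env bash
# Usage: efs_is_mounted MOUNT_POINT [MOUNTS_FILE]
# Succeeds if MOUNT_POINT is an nfs4 mount in MOUNTS_FILE (default /proc/mounts).
# /proc/mounts escapes spaces in paths as \040, so paths containing
# spaces will not match this simple comparison.
efs_is_mounted() {
  local mnt="$1" table="${2:-/proc/mounts}"
  awk -v m="$mnt" '$2 == m && $3 == "nfs4" { found = 1 } END { exit !found }' "$table"
}
```

For example, `efs_is_mounted /mnt/efs || echo "EFS is not mounted"` can guard a job that expects the shared file system.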
30. Security
Control network traffic using VPC security groups and network ACLs
Control file and directory access using POSIX permissions
Control administrative access (API access) to file systems using AWS Identity and Access Management (IAM) action-level and resource-level permissions
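File and directory access on the mounted file system follows ordinary POSIX ownership and mode bits, so the usual tools apply. A minimal sketch that creates a group-shared directory with the setgid bit, so new entries inherit the directory's group; the function name, directory, and optional group are hypothetical, and changing the group requires that the caller belongs to it (or is root).

```shell
#!/usr/bin/env bash
# Usage: make_shared_dir DIR [GROUP]
# Creates DIR with mode 2770: rwx for owner and group, nothing for others,
# plus setgid so files created inside inherit DIR's group.
make_shared_dir() {
  local dir="$1" group="${2:-}"
  mkdir -p "$dir" || return 1
  if [ -n "$group" ]; then
    chgrp "$group" "$dir" || return 1
  fi
  # chmod last, because changing ownership can clear the setgid bit.
  chmod 2770 "$dir"
}
```

For example, `make_shared_dir /mnt/efs/projects developers` on the EFS mount gives every member of the hypothetical `developers` group shared access.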
31. Amazon EFS is designed for a wide spectrum of performance needs
From high throughput and parallel I/O (genomics, big data analytics, scale-out jobs)
to low latency and serial I/O (home directories, content management, web serving, metadata-intensive jobs)
32. Performance modes for different workloads
General Purpose (default)
• What it’s for: latency-sensitive applications and general-purpose workloads
• Advantages: lowest latencies for file operations
• Tradeoffs: limit of 7K ops/sec
• When to use: best choice for most workloads
Max I/O
• What it’s for: large-scale and data-heavy applications
• Advantages: virtually unlimited ability to scale out throughput/IOPS
• Tradeoffs: slightly higher latencies
• When to use: consider for large scale-out workloads
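The performance mode is chosen when the file system is created and cannot be changed afterwards, so it is worth deciding before migrating data. A sketch assuming a configured AWS CLI; `aws efs create-file-system` with `--creation-token` and `--performance-mode` (`generalPurpose` or `maxIO`) is the real API, while the function name, the `run` helper, and the token values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Usage: efs_create CREATION_TOKEN [generalPurpose|maxIO]
efs_create() {
  local token="$1" mode="${2:-generalPurpose}"   # General Purpose is the default
  case "$mode" in
    generalPurpose|maxIO) ;;
    *) echo "error: performance mode must be generalPurpose or maxIO" >&2; return 1 ;;
  esac
  # The creation token makes the request idempotent: retrying with the
  # same token does not create a second file system.
  run aws efs create-file-system --creation-token "$token" --performance-mode "$mode"
}
```

For example, `efs_create analytics-fs maxIO` suits a large scale-out job, while `efs_create web-fs` keeps the default.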
33. EFS CloudWatch Metric - PercentIOLimit
Determine whether you’re being constrained by General Purpose mode (PercentIOLimit at or near 100%)
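One way to check this from a script is to fetch the metric's maximum over a time window and compare it against a threshold. A sketch assuming a configured AWS CLI; `get-metric-statistics` with the `AWS/EFS` namespace and the `FileSystemId` dimension is the real CloudWatch interface, while the function names, the `run` helper, and the 95% default threshold are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Usage: efs_percent_io_limit FS_ID START_TIME END_TIME  (ISO 8601 times)
efs_percent_io_limit() {
  local fs_id="$1" start="$2" end="$3"
  run aws cloudwatch get-metric-statistics --namespace AWS/EFS \
    --metric-name PercentIOLimit \
    --dimensions "Name=FileSystemId,Value=${fs_id}" \
    --start-time "$start" --end-time "$end" \
    --period 300 --statistics Maximum \
    --query 'Datapoints[*].Maximum' --output text
}

# Read numbers (tab-, space-, or newline-separated) on stdin, report the
# peak, and flag it when it reaches THRESHOLD percent (default 95).
io_limit_status() {
  local threshold="${1:-95}"
  awk -v t="$threshold" '
    { for (i = 1; i <= NF; i++) { v = $i + 0; if (n++ == 0 || v > max) max = v } }
    END {
      if (n == 0) { print "no data"; exit 1 }
      printf "max=%.1f constrained=%s\n", max, (max >= t ? "yes" : "no")
    }'
}
```

With real calls enabled, the two combine as `DRY_RUN=0 efs_percent_io_limit fs-12345678 START END | io_limit_status`; a `constrained=yes` result suggests evaluating Max I/O mode.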
34. Amazon EFS - distributed data storage design
(Diagram: groups of EC2 instances accessing a single file system.)
• File systems are distributed across an unconstrained number of servers
• Avoids the bottlenecks/constraints of traditional file servers
• Enables high levels of aggregate IOPS/throughput
• Data is also distributed across Availability Zones (durability, availability)
35. How to think about EFS performance relative to EBS (Amazon EFS vs. Amazon EBS PIOPS)
Performance
• Per-operation latency: EFS low, consistent; EBS PIOPS lowest, consistent
• Throughput scale: EFS multiple GBs per second; EBS PIOPS single GB per second
Characteristics
• Data availability/durability: EFS stored redundantly across multiple AZs; EBS PIOPS stored redundantly in a single AZ
• Access: EFS from 1 to 1000s of EC2 instances, from multiple AZs, concurrently; EBS PIOPS from a single EC2 instance in a single AZ
Use cases
• EFS: Big Data and analytics, media processing workflows, content management, web serving, home directories
• EBS PIOPS: boot volumes, transactional and NoSQL databases, data warehousing & ETL
49. GNU parallel
• Tool for executing jobs in parallel
• Similar to xargs
• Can replace loops in shell scripts
• GNU parallel makes sure the output from the commands is the same output you would get had you run the commands sequentially (output is grouped per job; add -k to also keep the input order)
https://www.gnu.org/software/parallel/
For people who live life in the parallel lane
50. Use parallel threads – GNU parallel
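# Example settings: thread count and destination directory on the EFS mount
N_THREADS=${N_THREADS:-32}
DST_DIR=${DST_DIR:?set DST_DIR to the destination directory}
# Run from the root of the source tree.
# Create destination directory tree from source
find . -type d -print0 | parallel -j "$N_THREADS" -0 "mkdir -p ${DST_DIR}/{}" > /dev/null 2>&1
# Copy files (everything that is not a directory)
find . ! -type d -print0 | parallel -j "$N_THREADS" -0 "cp -f {} ${DST_DIR}/{}"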
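Where GNU parallel isn't installed, `xargs -P` from findutils gives a similar fan-out. This is a swapped-in technique, not the one shown on the slide, and unlike GNU parallel it neither groups nor orders output. A minimal sketch with a hypothetical `parallel_copy` function that recreates the directory tree first and then copies files with up to N concurrent `cp` processes.

```shell
#!/usr/bin/env bash
# Usage: parallel_copy SRC_DIR DST_DIR [N_THREADS]
parallel_copy() {
  local src="$1" n="${3:-16}" dst
  # Resolve DST_DIR to an absolute path, since the copies run from SRC_DIR.
  mkdir -p "$2" && dst="$(cd "$2" && pwd)" || return 1
  (
    cd "$src" || exit 1
    # 1. Recreate the directory tree (mkdir -p tolerates concurrent creation).
    find . -type d -print0 | xargs -0 -P "$n" -I{} mkdir -p "$dst/{}"
    # 2. Copy everything that is not a directory, N processes at a time.
    find . ! -type d -print0 | xargs -0 -P "$n" -I{} cp -f {} "$dst/{}"
  )
}
```

For example, `parallel_copy /data/src /mnt/efs/dst 64` runs up to 64 copies at once.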
53. Benchmark different instance types
• Determine the optimal instance size
• What is best? T2, C3, C4, M3, M4, R3, X?
• Transfer a test set of 1000 small files
• Increase the thread count from 1 to 1024 concurrent threads
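The test described above can be scripted as a loop over thread counts. A sketch assuming GNU coreutils (`date +%s%N`) and findutils (`xargs -P`); the function name, the 1 KiB file size, and the doubling sequence of thread counts are hypothetical choices, and the timings it prints are whatever your instance measures. Run it once per instance type with TARGET_DIR on the EFS mount.

```shell
#!/usr/bin/env bash
# Usage: thread_sweep TARGET_DIR [FILE_COUNT] [MAX_THREADS]
# Prints one "threads elapsed_ms" line per thread count, doubling
# from 1 up to MAX_THREADS.
thread_sweep() {
  local target="$1" files="${2:-1000}" max="${3:-1024}"
  local src i t start end
  src="$(mktemp -d)"
  # Build the test set locally: FILE_COUNT files of 1 KiB each.
  for i in $(seq 1 "$files"); do
    head -c 1024 /dev/urandom > "$src/file$i"
  done
  t=1
  while [ "$t" -le "$max" ]; do
    # Fresh destination per run so every run copies the full set.
    rm -rf "$target/run$t" && mkdir -p "$target/run$t"
    start=$(date +%s%N)
    find "$src" -type f -print0 | xargs -0 -P "$t" -I{} cp {} "$target/run$t/"
    end=$(date +%s%N)
    echo "$t $(( (end - start) / 1000000 ))"
    t=$(( t * 2 ))
  done
  rm -rf "$src"
}
```

Capturing the output per instance type (for example `thread_sweep /mnt/efs/bench > m4.xlarge.txt`) makes the results easy to compare afterwards.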
68. Summary / tl;dr
• Parallelize Everything
• Instances
• Threads
• Test, Test, Test
• Capture & Analyze Test Data
• Less than $5/hr for 300 Instances
70. Burst Model
Throughput scales with the size of the file system
New file systems start with 2.1 TiB of burst credits
Minimum burst throughput: 100 MiB/s
Baseline throughput: 50 MiB/s per TiB stored
Burst throughput: 100 MiB/s per TiB stored
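These numbers give a quick estimate of baseline rate, burst rate, and how long a new file system can sustain a full burst. The sketch additionally assumes that credits accrue at the baseline rate while being spent at the rate actually driven, so a full burst drains them at burst minus baseline. It uses only the 2.1 TiB starting balance and treats 1 TiB as 1024 GiB; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Usage: efs_burst_estimate SIZE_GIB
# Prints baseline and burst throughput (MiB/s) and the hours a full burst
# lasts on the initial 2.1 TiB of credits (assumed drain: burst - baseline).
efs_burst_estimate() {
  awk -v gib="$1" 'BEGIN {
    tib = gib / 1024
    baseline = 50 * tib              # 50 MiB/s per TiB
    burst = 100 * tib                # 100 MiB/s per TiB
    if (burst < 100) burst = 100     # minimum burst throughput
    credits = 2.1 * 1024 * 1024      # 2.1 TiB expressed in MiB
    hours = credits / (burst - baseline) / 3600
    printf "baseline=%.1f burst=%.1f hours=%.1f\n", baseline, burst, hours
  }'
}
```

For a 256 GiB file system this gives a 12.5 MiB/s baseline, a 100 MiB/s burst, and about 7 hours of continuous burst from the initial credits, which is why small test file systems can look fast at first and then slow down.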
75. EFS Economics
No minimum commitments or up-front fees
No need to provision storage in advance
No other fees, charges, or billing dimensions
Price: $0.30/GB-Month (US Regions)
$0.33/GB-Month (EU Ireland)
$0.36/GB-Month (AP Sydney)
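Because billing is per GB-month of stored data with no other dimensions, the monthly estimate is a single multiplication. A sketch using the regional prices above; the function name is hypothetical, and the region codes are the standard identifiers for the regions listed earlier.

```shell
#!/usr/bin/env bash
# Usage: efs_monthly_cost STORED_GB REGION
# Prices are the per-GB-month prices quoted on this slide.
efs_monthly_cost() {
  local gb="$1" region="$2" price
  case "$region" in
    us-east-1|us-east-2|us-west-2) price=0.30 ;;  # N. Virginia, Ohio, Oregon
    eu-west-1)                     price=0.33 ;;  # Ireland
    ap-southeast-2)                price=0.36 ;;  # Sydney
    *) echo "error: no price listed for $region" >&2; return 1 ;;
  esac
  awk -v gb="$gb" -v p="$price" 'BEGIN { printf "%.2f\n", gb * p }'
}
```

For example, storing 500 GB in EU (Ireland) comes to $165.00 per month.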
76. EFS TCO example
Let’s say you need to store ~500 GB and require high availability and durability
Using a shared file layer on top of EBS, you might provision 600 GB (with ~85% utilization)
and fully replicate the data to a second Availability Zone for availability/durability
Example comparative cost:
Storage (2x 600 GB EBS gp2 volumes): $120 per month
Compute (2x m4.xlarge instances): $350 per month
Inter-AZ data transfer costs (est.): $129 per month
Total $599 per month
EFS cost is (500GB * $0.30/GB-month) = $150 per month, with no additional charges
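The comparison above is simple arithmetic; the sketch below reproduces it so the inputs can be varied. The DIY line items are the slide's estimates for this 500 GB scenario and are not recomputed from instance or volume prices; the function name and argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Usage: tco_compare [DATA_GB] [EFS_PRICE_PER_GB_MONTH]
tco_compare() {
  local data_gb="${1:-500}" price="${2:-0.30}"
  # DIY estimates from the slide, in $/month (2x gp2 volumes, 2x m4.xlarge,
  # inter-AZ transfer). They are fixed and do not scale with DATA_GB.
  local ebs=120 compute=350 transfer=129
  awk -v gb="$data_gb" -v p="$price" -v e="$ebs" -v c="$compute" -v x="$transfer" '
    BEGIN {
      diy = e + c + x
      efs = gb * p
      printf "diy=%.2f efs=%.2f difference=%.2f\n", diy, efs, diy - efs
    }'
}
```

With the defaults, the DIY stack costs $599 per month against $150 for EFS, a difference of $449.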
77. Key recommendations
• Test your application!
• Use General Purpose mode for lowest latency, Max I/O for scale-out
• Use Linux kernel version 4.0 or newer, mount via NFSv4.1
• To optimize, look for opportunities to:
• Aggregate I/O
• Perform async operations
• Parallelize (demo later)
• Cache (demo later)
• When testing, don’t forget to check your burst credit earn/spend rate, and make sure the file system stores enough data to earn sufficient credits
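The kernel recommendation can be checked before mounting. A sketch assuming GNU `sort -V` for version ordering; the function name is hypothetical, and it compares only the numeric part of the release string before the first hyphen.

```shell
#!/usr/bin/env bash
# Usage: kernel_ok_for_efs [KERNEL_RELEASE]   (default: running kernel)
# Succeeds if the release is 4.0 or newer.
kernel_ok_for_efs() {
  local release="${1:-$(uname -r)}"
  local version="${release%%-*}"   # "4.9.76-3.78.amzn1.x86_64" -> "4.9.76"
  local lowest
  # sort -V orders version strings; if 4.0 sorts first, the kernel is >= 4.0.
  lowest="$(printf '%s\n%s\n' "4.0" "$version" | sort -V | head -n 1)"
  [ "$lowest" = "4.0" ]
}
```

For example, `kernel_ok_for_efs || echo "kernel older than 4.0; consider upgrading before mounting EFS"`.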