AWS EBS storage: balancing size versus throughput

One of my teams is rolling out a new application in AWS on EC2 instances where the operating system runs on an EBS drive. We selected one of the M3 machines that come with SSDs. The application makes moderate use of disk I/O, and our benchmarks were pretty disappointing. It turned out we really didn't understand what kind of I/O we had requested or where we had actually put our data.

The Root Drive

The root drive on an EC2 instance can be SSD or magnetic, depending on the type of machine selected. All additional mounted/persistent disk drives on that machine will probably be of the same type. In our case the root drive is an SSD, but it is still a network-attached drive, not local storage.
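
If you are not sure what was actually provisioned, the volume type and size can be checked from the AWS CLI. This is just a sketch; it assumes the CLI is configured and the instance ID below is a placeholder for your own.

aws ec2 describe-volumes --filters Name=attachment.instance-id,Values=i-0123456789abcdef0 --query 'Volumes[].{ID:VolumeId,Type:VolumeType,SizeGB:Size}' --output table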

EBS disk IOPS and MB/s are provisioned exactly as described in the EC2 documentation. The most common GP2 SSDs have a burst IOPS limit and a sustained IOPS limit, as well as a maximum MB/s transfer rate. Both the sustained IOPS limit and the maximum transfer rate depend on the size of the provisioned volume: larger disks sustain higher IOPS and provide higher throughput.

We sized our disk at 20GB, which gave us a low IOPS credit earning rate and a low MB/s transfer rate. That was a mistake. The sweet spot for disk drive performance is around 214GB. This is the smallest disk that gives you the highest transfer rate along with a strong burst credit acquisition rate.
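
The GP2 baseline is published as 3 IOPS per GB, so a rough sizing check is just multiplication; the figures below line up with the IOPS specs in the benchmark table later in this post.
  • 20GB * 3 IOPS/GB = 60 sustained IOPS, which is what we provisioned
  • 240GB * 3 IOPS/GB = 720 sustained IOPS
  • 1000GB * 3 IOPS/GB = 3000 sustained IOPS, the point at which the baseline matches the burst ceiling and credits stop mattering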

Teams should do their own analysis before picking the more expensive EBS volume types. EBS GP2 burstable SSDs may provide better value than fixed provisioned SSDs (io1).

Burst Credits

Burst credits are a way to store IOPS in a credit bucket so that you can temporarily exceed your provisioned sustained IOPS rate. This lets you reach up to 3000 IOPS (GP2) in short bursts without having to pay for higher-performing drives. New volumes are given 30 minutes of burst credits so that machines can be provisioned and applications warmed up at the fastest speed possible. Burst credits are earned based on the size of the EBS volume and its provisioned sustained IOPS rate; larger disks earn IOPS credits faster than smaller ones.
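
To get a feel for how long a burst can last, assume the full credit bucket holds 5.4 million I/O credits (the 30 minutes at 3000 IOPS mentioned above). While bursting at 3000 IOPS the bucket drains at the burst rate minus the baseline earning rate, so roughly:
  • 20GB volume (60 IOPS baseline): 5,400,000 / (3000 - 60) ≈ 1,840 seconds, a little over 30 minutes of burst
  • 240GB volume (720 IOPS baseline): 5,400,000 / (3000 - 720) ≈ 2,370 seconds, close to 40 minutes of burst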

IOPS vs MB/s

IOPS measures how many individual read or write operations a volume can complete per second, while MB/s measures how much data actually moves. A volume can hit either limit first: small random I/O tends to exhaust the IOPS allowance long before it saturates the bandwidth limit, while large sequential I/O does the opposite. The next section looks at how block size ties the two together.

IOPS, MB/s and Block Sizes

I/O operations per second (IOPS), I/O bandwidth (MB/s) and the data block size interact with each other to limit total throughput. Machines that use the AWS default 16KB block size may not be able to use their full I/O bandwidth. Our test results agree with this:
  • effective bandwidth = number of IOPS * the block size of each operation
Teams may have to do some math to tune their disk drives in I/O-bound applications, as in the worked example below.
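
Plugging our own numbers into that formula shows why the 16KB results in the benchmark table top out well below the advertised 160 MB/s:
  • 3000 IOPS (the GP2 burst limit) * 16KB = 48 MB/s, which is what we measured on the bursting GP2 volumes
  • reaching 160 MB/s with 16KB operations would take 10,000 IOPS; with 256KB operations the same 160 MB/s needs only 625 IOPS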

Ephemeral local SSD

EC2 machines can make use of SSDs attached to the host machine that the EC2 instance is running on. These disks provide significantly higher performance, which must be balanced against their ephemeral nature. Local SSDs cannot be snapshotted, and their data disappears whenever the machine is stopped or terminated. All ephemeral SSD data must be reconstitutable from other data sources, since the VM with its local SSD could disappear at any time.
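
The fio commands later in this post assume the instance-store SSD appears as /dev/xvdb and is mounted at /media/ephemeral0, which is how it showed up on our Amazon Linux M3 instances. Device names vary by AMI and instance type, so verify before formatting anything.

lsblk                      # list attached block devices; the EBS root volume is typically /dev/xvda
df -h /media/ephemeral0    # confirm where the ephemeral volume is mounted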

Benchmarks

The following table describes several benchmark tests against various drive configurations. Regular GP2 SSDs provide exactly the specified speed and throughput, both with burst credits available and with no burst credits. The main areas of interest are latency, the relative performance of EBS versus ephemeral storage, and the impact of disk encryption. I don't understand why we have outliers in the data; different machines sometimes gave different ephemeral performance. Note that Amazon does not seem to publish performance specifications for local SSDs.

Process Reference pages

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/benchmark_procedures.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

Results

The results below show the transfer rates of the various disk drives. They make clear how much of a performance improvement can be obtained by using local SSD (ephemeral) drives over network-attached EBS.


Measured columns are fio random write / random read results (512m, bs=16k); latency is the 95th percentile completion latency (clat) in microseconds.

Disk Size | Disk Type | Write MB/s | Write IOPS | Write clat 95% | Read MB/s | Read IOPS | Read clat 95% | AWS max MB/s | Max MB/s at 16KB | IOPS spec | Test condition
20GB  | SSD (GP2) | 1    | 60    | -     | 1    | 60    | -     | 2.56  | 0.96 | 60/3000  | No Burst
80GB  | SSD (GP2) | 47   | 2991  | 8384  | 49   | 3070  | 12352 | 128   | 48   | 240/3000 | Burst
80GB  | SSD (GP2) | 3.8  | 240   | -     | 3.8  | 240   | -     | 10.24 | 3.84 | 240/3000 | No Burst
240GB | SSD (GP2) | 47.8 | 2990  | 15168 | 48   | 2999  | 15680 | 160   | 48   | 720/3000 | Burst
240GB | SSD (GP2) | 42.4 | 2652  | 16768 | 57.3 | 3585  | 15680 | 160   | 48   | 720/3000 | Burst
32GB  | Ephemeral | 316  | 19792 | -     | 360  | 22527 | -     | n/a   | n/a  | n/a      | n/a
32GB  | Ephemeral | 128  | 8059  | 4320  | 404  | 25272 | 916   | n/a   | n/a  | n/a      | n/a
20GB  | SSD (io1) | 8    | 503   | 13874 | -    | -     | -     | 128   | 8    | 500      | Fixed

Burst = measured and calculated using burst credits
No Burst = measured and calculated with no available burst credits

Disk encryption does not affect disk throughput or IOPS, but it does increase disk latency. Local SSD performance can be affected by other VMs on the same physical host sharing the same drives.
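
Creating an encrypted volume for this kind of comparison is a one-flag change at creation time. This is a sketch using a placeholder availability zone and placeholder IDs, and it assumes the account's default EBS encryption key.

aws ec2 create-volume --availability-zone us-east-1a --size 240 --volume-type gp2 --encrypted
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf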

System configuration and Test Commands

# SSH to the test instance (host name is specific to our test instance)
ssh -i speedtest.pem ec2-user@ec2-54-196-6-33.compute-1.amazonaws.com
# bring the instance up to date and install the fio benchmark tool
sudo yum update
sudo yum install fio
# create a file system on the target volume; /dev/xvdb is the ephemeral disk on our instances
sudo mkfs -t ext4 /dev/xvdb
# 16KB random write test: direct I/O, 16 concurrent jobs, 3 minute run
sudo fio --directory=/media/ephemeral0 --name=randwrite --direct=1 --rw=randwrite --bs=16k --size=1G --numjobs=16 --time_based --runtime=180 --group_reporting --norandommap
# 16KB random read test: direct I/O, 16 concurrent jobs, 3 minute run
sudo fio --directory=/media/ephemeral0 --name=randread --direct=1 --rw=randread --bs=16k --size=1G --numjobs=16 --time_based --runtime=180 --group_reporting --norandommap
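
Because the GP2 results differ so much with and without burst credits, it is worth checking the credit balance before and after a run. This sketch uses the CloudWatch BurstBalance metric, which reports the remaining credit bucket as a percentage; the volume ID and time window are placeholders, and the metric may not be available for every volume or region.

aws cloudwatch get-metric-statistics --namespace AWS/EBS --metric-name BurstBalance --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 --start-time 2017-01-01T00:00:00Z --end-time 2017-01-01T01:00:00Z --period 300 --statistics Average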


Comments

  1. What was the type of the tested instance - r3.large? If so, that may have limited EBS volume speed due to the low network speed for this instance type.
    Did you try it on r3.8xlarge etc.?
    It probably wouldn't change the conclusion, but the numbers could be different.
