VPS Disk Performance, Digital Ocean vs. Linode (irj972.co.uk)
89 points by akerl_ on Feb 2, 2014 | 36 comments


I didn't dig too deep into this blog, but right off the bat I suspect that there is caching going on. Since this is a virtualized environment, that is not surprising: you don't know whether the entire volume you are working with is being cached in the hypervisor, or possibly the disk controller or disk itself.

You don't get fraction-of-a-millisecond random reads from spinning disks without caching... period. So whatever you think you're measuring, you aren't.
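The caching effect is easy to reproduce: timing small reads against a file that is still resident in the page cache gives microsecond latencies that have nothing to do with the disk underneath. A minimal sketch (file size and offsets are arbitrary):

```python
import os
import statistics
import tempfile
import time

# Write a 64 MiB file; as a side effect it lands in the OS page cache.
SIZE = 64 * 1024 * 1024
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(SIZE))
    path = f.name

fd = os.open(path, os.O_RDONLY)
latencies = []
for i in range(1000):
    # Pseudo-random 4 KiB-aligned offsets within the file.
    offset = (i * 7919 * 4096) % (SIZE - 4096)
    t0 = time.perf_counter()
    os.pread(fd, 4096, offset)
    latencies.append(time.perf_counter() - t0)
os.close(fd)
os.unlink(path)

# The median will be microseconds: these are page-cache hits, not seeks.
print(f"median read latency: {statistics.median(latencies) * 1e6:.1f} us")
```

If a "random read" benchmark on spinning disks reports numbers in this range, it is measuring some cache, not the platters.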

My experience with performance measurements is that the vast majority of people don't measure what they think they are measuring, or don't measure what is actually relevant for the use case (and all related use cases that need to be considered). And if you don't know what the expected result is, the measurements can be equally useless, because you can't tell when your results don't fit reality, or the speed of light, or whatever.

I wish I had a great answer for how you can get started doing performance work, but it starts with understanding the orders of magnitude of various operations. Flash seeks are hundreds of microseconds for crappy flash; spindles are ~4 milliseconds, and that is with fast spindles. If you are working in memory you should know the difference between instructions (branching and stalled execution vs. continuous execution) and cache misses, the various kinds of cache misses (different tiers, remote NUMA), and what happens when you have contended mutual exclusion or CAS primitives.
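Those orders of magnitude can be written down as a crude plausibility filter for benchmark output. The figures below are rough ballpark assumptions, not vendor specs, and the function is only a sketch of the kind of sanity check being described:

```python
# Ballpark latency floors for a single *uncached* random read, in seconds.
# Rough orders of magnitude only, not measurements.
LATENCY_FLOOR = {
    "l1_cache": 1e-9,        # ~1 ns
    "main_memory": 100e-9,   # ~100 ns, worse for remote NUMA
    "flash_read": 100e-6,    # hundreds of microseconds for cheap flash
    "spindle_seek": 4e-3,    # ~4 ms even with fast spindles
}

def plausibly_uncached(medium: str, observed_latency_s: float) -> bool:
    """Return False when a result is faster than the medium physically
    allows, which almost always means a cache answered, not the device."""
    return observed_latency_s >= LATENCY_FLOOR[medium]

# A 0.3 ms "random read" from a spinning disk fails the check:
print(plausibly_uncached("spindle_seek", 0.3e-3))  # False: cached
print(plausibly_uncached("spindle_seek", 8e-3))    # True: plausible seek
```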


I agree with your general assessment and skepticism of performance tests, particularly on top of virtualization.

Minor point though: Digital Ocean at least advertises SSD for their servers, not spinning disk. Fraction of a ms random reads seem within the realm of possibility.


I came here to say this; glad to see other Hacker News people picking up on the impossible disk seek times as well.


Looks like -D will make ioping use direct I/O.


That won't disable the disk cache or the RAID controller cache, though.
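That distinction is worth spelling out: direct I/O (what `-D` enables) and cache-dropping hints only bypass the OS page cache. A hedged sketch using `posix_fadvise` to evict a file from the page cache before timing a read; whatever latency you then measure can still be answered by the drive's or controller's own cache:

```python
import os
import tempfile
import time

def uncached_read_latency(path: str, offset: int = 0, size: int = 4096) -> float:
    """Time one read after asking the kernel to evict the file from the
    page cache. POSIX_FADV_DONTNEED affects only the OS cache; the disk
    and RAID controller caches can still serve the read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        t0 = time.perf_counter()
        os.pread(fd, size, offset)
        return time.perf_counter() - t0
    finally:
        os.close(fd)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * (1024 * 1024))
    path = f.name
print(f"{uncached_read_latency(path) * 1e6:.1f} us")
os.unlink(path)
```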


If you're using a RAID controller that's worth what you paid for it, then it has disabled the on-board cache on the disks it's managing. Otherwise, the following scenario becomes possible:

  1. OS issues write to RAID HBA, write is stored to NVRAM (or battery-backed RAM on older cards).
  2. RAID HBA issues write command to disk.
  3. Disk accepts write into onboard buffer, acknowledges it as committed.
  4. RAID controller releases cached pages.
  5. Power loss.
...and you've lost data.

EDIT: Notice that the write never actually touches disk in this scenario. Once a disk drive acknowledges a write, the RAID controller releases the data that was "written" from its cache. Disk writes take milliseconds, while memory writes take microseconds, and usually just nanoseconds. That leaves a relatively huge window during which the power could go out, but before which a write has been safely persisted to disk.
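The size of that window can be put in rough numbers (both latencies are ballpark assumptions, not measurements):

```python
disk_write_s = 5e-3   # a spinning-disk platter write: milliseconds
ram_ack_s = 1e-6      # acknowledging from the drive's RAM buffer: ~microseconds

# The drive reports "committed" after the RAM write, but the data is not
# durable until the platter write finishes thousands of times later.
window_ratio = disk_write_s / ram_ack_s
print(window_ratio)  # 5000.0
```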


The on-disk cache can still cache data for reads, and it may cache data for writes as well if write barriers are being used.

There is no reason for the RAID controller not to let the on-disk cache and scheduling work while it is doing writeback; it only needs an acknowledgement at the end, before it flushes its non-volatile cache.

This could also be something that I don't know about. Maybe in the world of disks, write barriers rank below some disable-write-caching command? Can controllers issue writes large enough to make up for the lack of caching plus write barriers? I have no idea how SATA works.

Again, this gets into why it is hard to know what to expect from a disk I/O benchmark. You have to know how the caches are operating, and there are many of them, and their configuration can vary.


RAID controllers will usually have backup batteries for just this reason.


Yes, or NVRAM, exactly as step 1 in my scenario mentions.

The problem is that the disks they're managing don't. (EDIT: barring SSDs with supercaps, but that's an entirely other discussion.)

If a write has been accepted by the disk and acknowledged as written — but in reality has only been stored in the disk's on-board cache — and you suffer a power loss before the write can be flushed to permanent storage (be it spinning rust or NAND cells), then you have lost that data.

This is exactly why a RAID controller worth using will disable a drive's onboard cache. Because disks lie.

Was my first comment somehow unclear?


As someone who uses virtual machines on both providers (and a few more): benchmarking a single virtual machine and taking that as the sole basis for a conclusion is not going to give you results which accurately reflect the entire fleet of servers either provider has.

Both providers are likely to have certain host servers which are relatively empty, and others where a number of busy servers are going to really skew your results.

The one thing I'd recommend is to install monitoring which can keep an eye on your servers to ensure performance hasn't dropped significantly since you first provisioned them. Munin http://munin-monitoring.org/ graphs disk I/O latency by default; keeping track of this lets you decide whether you want to migrate a virtual machine elsewhere on the same provider, to see if you can avoid the noisy-neighbour problem.
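If you don't want a full monitoring stack, the underlying measurement is simple enough to approximate yourself: periodically time a small synced write and watch the trend. A rough sketch of such a probe (the path and sample count are made up for illustration):

```python
import os
import time

def probe_write_latency(path: str, size: int = 4096) -> float:
    """Time one small write forced to storage with fsync, similar in
    spirit to what ioping measures and Munin graphs over time."""
    data = os.urandom(size)
    t0 = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)  # force the write through the page cache
    finally:
        os.close(fd)
    return time.perf_counter() - t0

# Sample a few times and report the worst case; a monitor would log these.
samples = [probe_write_latency("/tmp/io_probe.dat") for _ in range(5)]
print(f"worst of 5: {max(samples) * 1e3:.2f} ms")
os.unlink("/tmp/io_probe.dat")
```

Graphed over days, a jump in these numbers is the noisy-neighbour signal the parent is describing.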

Never assume that you'll get the same performance on a daily basis.


It seems to me that the workloads on Digital Ocean are not fully utilizing the block device. FIO shows about 50% util for DO and 81% for Linode on the queue depth=1 test, and 70%/82%, respectively, on the queue depth=8 test. That seems to indicate a bottleneck somewhere else (at least on DO), which aligns with the OP's statement at the end that DO appears to be capped. Also, as was pointed out, the sizes of these VPSes are not even mentioned, and the 128MB working-set size is most likely either too small or too big for them, making it even harder to see the value of these results.

Brendan Gregg has excellent info on these topics,

http://www.joyent.com/blog/benchmarking-the-cloud http://dtrace.org/blogs/brendan/2011/05/11/file-system-laten... http://dtrace.org/blogs/brendan/2012/10/23/active-benchmarki...


I found this blog a good example of bad design. I tried to skim-read it, and it seems there is less space between a heading and the paragraph above it than between the heading and the paragraph below it, which is the one it actually applies to. This makes it very difficult to tell at a glance which results apply to which system. I thought the headings were captions under (i.e. after) the results, but from reading the conclusion it seems they are above (i.e. before) the results they label. It's a simple thing to fix, and it would greatly improve the comprehensibility of the post.


Did I miss where it says what size Linode and DO server is being compared, and the price of each?

I'm guessing the smallest DO may have more competition for disk IO than the smallest Linode or the DO box could have had aggressive neighbors while the Linode didn't.


This seems to be a very unscientific approach. It doesn't appear that the tests were run on a multitude of machines. The Digital Ocean machine could've had malfunctioning hardware, or been a rare case of an overloaded server. While that may not be likely, you'd have to be a fool to make a decision based on just one machine from each provider.


Is ioping legit? It obviously looks like it is latency-focused. I must have been living under a rock.

I usually use gnome-disks, hdparm, dd and iozone.


I'm very interested in hearing more about DO vs Linode performance, as someone torn between both providers.


These might help:

DigitalOcean $5/mo plan - http://serverbear.com/1990-2gb-ssd--2-cpu-digitalocean

DO $10/mo plan - http://serverbear.com/1989-1gb-ssd--1-cpu-digitalocean

DO $20/mo plan - http://serverbear.com/1990-2gb-ssd--2-cpu-digitalocean

Linode $20/mo plan - http://serverbear.com/10-linode-1gb-linode

That site also gives each product a score out of 100 which scales depending on how much it costs. DO seems to do really well at the low end.


> I'm very interested in hearing more about DO vs Linode performance, as someone torn between both providers.

For me it wasn't solely performance or price, but additional features. Linode's control panel has a fair few features, and they have an API, but there don't seem to be many major users of that API.

Being able to deploy test machines to DO via Vagrant and then build my real boxes with Packer is a huge, huge plus for me as a solo dev. I'm also sitting on a 4Mbit/.8Mbit DSL pipe here in Australia and not having to upload an entire machine image is obviously a positive. And worst case, if DO somehow falls over, I can change a couple of lines and deploy to EC2 to keep things running. Building images for Linode wouldn't give me anywhere near that flexibility.


It's really hard to generalize. Each provider has different physical hardware with different neighbors and even multiple data centers.


Linode is a clear winner on everything else. Is I/O your number one priority, or do you also need IPv6, a host that doesn't censor, etc.?


1. They should have compared it within the same price range. A $20 DO Droplet will definitely perform better than the $5 one. While the difference should not change the outcome, it would have put DO in a fairer position.

2. I don't think Linode has officially rolled out their new storage system yet, which is SSD + HDD based but doesn't use the SSD as a cache. At first I was skeptical of why they aren't solely using SSDs instead, but the results speak for themselves. One should note this is still in beta and not being used in production, so performance could degrade once the mass majority are on it, whereas DO's numbers reflect real-world performance.

3. There are whole loads of other things Linode does better than DO. Linode has pooled bandwidth, which to this date DO still hasn't got round to implementing. Different Linodes are on different hardware by default, which DO has only just started doing in a few cases. Bandwidth and latency on Linode are far better. Linode has IPv6. Linode has a very decent NodeBalancer.

DO wins on memory/price and the control panel. Other than that, Linode wins. Linode now also offers metered per-hour billing (in beta).


All VPS systems suffer from the "noisy neighbor" problem. You may have been on a DigitalOcean system with a lot of neighbors with other high i/o activity, and on Linode you could have been on a server with no neighbors.

The only way to test this is to get a random sampling of servers (which in and of itself may be difficult) and then re-do the testing.

That is to say, on a VPS you aren't the only customer using i/o.


Linode uses SAS drives (I'm guessing RAID-10), which from my testing perform marginally better than the commodity SSDs I believe DO is using. I don't think it's even an Intel 520. The study indicates to me that DO may also have SSDs on SAN or network drives, which is good for their scalability but bad for performance (think EC2).

Linode on the other hand is direct-attached, similar to Rackspace Cloud (though they don't use SAS drives, which spin at 15k RPM, whereas regular SATA drives are 7200 RPM).

It's hard to beat DAS RAID-10 SAS for performance and reliability. If DO offered DAS RAID-10 SSDs, I'm sure their $5/month plan would increase to at least $30/month.

If DO really wants to be the coolest kid on the block, they should offer DAS RAID-10 Intel S3500 SSDs for $5/month on Ivy Bridge Intel E5-26xxs. Now that would blow away the competition for sure. :D But that setup would cost at least $100/month for perhaps 20GB of data.

TLDR: You get what you pay for.


LOL. Fast SCSI spinning disks do about 200 IOPS. Put ten of them in RAID 10 and you get 2000 IOPS.

Consumer SSDs are now at 50-100K IOPS. How on earth do people think that 2000 IOPS is "marginally better" than 50000? And I am just comparing one SSD to a RAID array.
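The arithmetic behind that, using round numbers (the per-device figures are rough assumptions from the comment above, not benchmarks):

```python
hdd_iops = 200          # roughly one random op per ~5 ms on a fast spindle
spindles = 10           # a 10-disk RAID 10 can spread random reads across members
raid10_read_iops = hdd_iops * spindles

consumer_ssd_iops = 50_000  # low end of the quoted 50-100K range

print(raid10_read_iops)                        # 2000
print(consumer_ssd_iops // raid10_read_iops)   # 25: one SSD vs. the whole array
```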

I think you just must be completely unaware of random access disk performance and the massive disparity there.

The results were close either because the Linode host he was testing on actually had SSD storage, or, more likely, because of some caching, which would indicate that the benchmark did not actually test random reads, writes, or seeks.


Whoever downvoted this, please let me know if my facts are wrong and link to some kind of evidence.


SAS is an interface, not a type of drive. You can easily get SSDs with a SAS interface.

You also write "SAS which are 15k5 RPM", but SAS does not dictate rotational speed (nor, as mentioned, that the drives use spinning platters at all). They are usually higher-RPM if they are regular HDs, as it doesn't make much sense to buy SAS controllers and SAS drives if you're going to opt for cheap consumer-level performance. But my usual source of drives has both 7200 RPM and 10k RPM SAS drives.

Spending on SSDs for SAN/NAS storage would make absolutely zero sense unless your connectivity to said SAN/NAS is extremely low-latency and high-throughput; it'd be like burning money, so I very much doubt that's the bottleneck.


So even Digital Ocean's selling point is actually worse than Linode. Classic.

Between this, no IPv6 (in 2014, really?), censorship, and the security fails, nobody should use Digital Ocean.


I wouldn't say that faster I/O is Digital Ocean's selling point - it's the price. If your bottleneck is I/O then you probably need more RAM, which Digital Ocean offers for 50% cheaper than Linode.

Regarding your other points, personally I'm not affected by the lack of IPv6 support (yet), and I don't know anybody else who is. I also think it's rich mentioning censorship and security considering Linode's history.


> personally I'm not affected by the lack of IPv6 support

Doesn't make it any less embarrassing though.

For me personally, no provider is worth a second look that only provides legacy connectivity.


This really doesn't bother me. What bothers me is how much of my RAM is likely to page out to disk somewhere because the host is oversold, and whether the CPU is being hammered with context switches and cache waits. If that's zero, this argument is moot, as my datasets will run entirely from RAM and I'll pick a VM with the capacity to do that.


While looking for high-performance SSD VPSs last year, I came across a bunch of disk performance benchmarks done on ServerBear and LowEndBox, which had RamNode as a clear winner. We've been very happy with RamNode so far.


I've hosted with MediaTemple for a number of years, but am interested in alternatives since their GoDaddy acquisition. What's everybody's go-to choice for managed-hosting these days?


Can you test http://www.atlantic.net/cloud ? It has slightly lower price than DO and larger SSD Disks. I would be interested in seeing the results.


You should mention that you work there.


Check it on serverbear.com. Disk performance looks similar to DO, but DO seems to have more CPU power and 10x the network bandwidth, with less disk space for the same price.


It doesn't seem to offer 32 bit OS images, so it's actually significantly more expensive, since one has to order larger node to use the same software.

