Hacker News

> Yea, it is definitely a fake HTTP server which I acknowledge in the article

It's not actually an HTTP server though... For these purposes, it's essentially no more useful than netcat dumping out a preconfigured text file. Titling it "HTTP Performance showdown" is doubly misleading, since there are no real-world (or even moderately synthetic) HTTP requests involved; you always get the same static data back, regardless of what the request is. Call it whatever you like, but that isn't HTTP. A key part of the performance equation on the web is the difference in response time involved in returning different kinds (sizes and types) of responses.

A more compelling argument could be made for the improved performance you can get by bypassing the kernel's networking stack, but this article isn't it. What this article demonstrates is that in one very narrow case, where you always want to return the same static data, there are vast speed improvements to be had. That tells you almost nothing about performance in real-world use of the web, and the premise falls down when you consider that kernel interrupt speed is unlikely to be the bottleneck in most servers, even caches.

I'd really love to see this adapted to do actual webserver work and see what the difference is. A good candidate might be an in-memory static cache server of some kind. It would require URL parsing to feed out resources but would better emulate an environment that might benefit from this kind of change and certainly would be a real-world situation that many companies are familiar with. Like it or not, URL parsing is part of the performance equation when you're talking HTTP.



> It's not actually an HTTP server though...

Correct, it is a fake HTTP server, serving a real HTTP workload. This post is about comparing two different networking stacks (kernel vs DPDK) to see how they handle a specific (and extreme) HTTP workload. From the perspective of the networking stack, the networking hardware, and the AWS networking fabric between instances, these are real HTTP requests and responses.

> I'd really love to see this adapted to do actual webserver work and see what the difference is.

Take a look at my previous article[1]. It is still an extreme/synthetic benchmark, but libreactor was able to hit 1.2M req/s while fully parsing the HTTP requests using picohttpparser[3].

From what I recall, when I played with disabling HTTP parsing in libreactor, the performance improvement was only about 5%.

1. https://talawah.io/blog/extreme-http-performance-tuning-one-...


ScyllaDB uses Seastar as its engine, and the DynamoDB-compatible API uses HTTP parsing, so this use case is real. Of course the DB has much more to do than this benchmark with its static HTTP reply, but Scylla also uses many more cores on the server, so it is close to real life. We do use the kernel's TCP stack, both for all of its features and because we don't have the capacity for a deeper analysis.

Some K/V workloads are affected by the networking stack, and we recently saw issues when we picked a non-ideal interrupt mode (multiqueue vs. single queue on small machines).


A few questions, if you will. It's interesting work, and I figure you're on the ScyllaDB team?

1. Is a 5s experiment with a 1s warmup really a representative workload? How about running for several minutes, or tens of minutes? Do you observe the same results?

2. Could 256 connections on 16 vCPUs create contention and skew the experiment results? Aren't they competing for the same resources?

3. Are the experiment results reproducible on different machines (first with the same, then with similar SW+HW configurations)?

4. How many times is the experiment (benchmark) repeated, and what is the statistical significance of the observed results? How do you make sure that what you're observing, and ultimately drawing conclusions from, is really what you thought you were measuring?


I'm at ScyllaDB, but Marc did completely independent work. The client vCPUs don't matter that much: the experiment compares the server side, and the client just shouldn't be the bottleneck. When we test ScyllaDB or other DBs, we run benchmarks for hours and days. This is just a stateless, static HTTP daemon, so short runs are reasonable.

The whole intent is to make it a learning experience; if you wish to reproduce it, try it yourself. It's aligned with past measurements of ours and with earlier Linux optimizations by Marc.


I myself do a lot of algorithmic design, but I also enjoy designing e2e performance testing frameworks to confirm theories that I or others had on paper. The thing is, too many times I fell into the trap of not realizing that the results I was observing weren't what I thought I was measuring. So what I was hoping for is to spark a discussion around the methodologies other people in the field use, and hopefully to learn something new.



