What this article fails to note is that writing C++ code to serve up web pages would be far less efficient in terms of human time. You'd need many more developers, and the average American emits 20 tons of CO2 per year. So if Facebook has 100 developers hacking PHP code, they'd probably need at least 200, possibly 300, developers hacking C++ code. Each extra 100 developers works out to 2,000 more tons of CO2 per year (at average rates). From an economic standpoint, it would make much more business sense to buy carbon offsets. Not to mention that a 10x end-to-end efficiency improvement is AWFULLY optimistic. I'd say 2x or 3x at best.
A "web server" in Facebook's cluster is just Apache+PHP. It doesn't do any disk work. It's not reading anything from disk, as that gets forked over to memcached or MySQL. If you've got thousands of boxes in this arrangement, clearly reducing the CPU usage thru whatever means will get you more efficiency.
Even on a diskless machine, the limit is still not necessarily "CPU usage". The machine might be limited by I/O bandwidth -- both to the clients and to those memcached and MySQL servers. It might be limited by the amount of memory that must be dedicated to each connection or process, even while that process is sleeping on I/O (which is most of the time).
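To put rough, purely illustrative numbers on the memory point: suppose each prefork-style PHP worker holds 30 MB of RSS while it sleeps on a memcached round trip. Then:

    500 concurrent connections x 30 MB/worker ≈ 15 GB of RAM

That memory is pinned by processes doing no computation at all, which is exactly the kind of limit a CPU benchmark never captures.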
Most likely, the overhead of running a server is so large that the power consumption problem is all about saving servers, rather than saving CPU cycles. In which case these other variables are just as likely to be a big concern as the speed of your HTML template's WHILE loops.
I totally agree, which is why a 10x speedup is ludicrous. The C++ code would likely just accelerate things to the point where they were network- and/or memory-bound. Given everything I know about the Facebook architecture, I think a 2-3x improvement is a realistic estimate.
The web servers only do computation on data they get from other services (memcache, etc.). They never fetch data locally, so there is no local bottleneck other than network bandwidth and the efficiency of the code, i.e. the CPU.
The only logical setup, then, is cheap machines for the web servers and very powerful ones for the "data" servers; the network bandwidth between the tiers is only a fraction of the CPU-to-storage bandwidth.
But the Americans who would code in C++ already exist and are emitting CO2 regardless of what they're doing. As long as there is unemployment, there would be no net gain in CO2 from switching to C++.
One could argue that any resource Facebook uses -- human, energy, physical, or otherwise -- is going to get consumed in some form regardless. What the article focuses on is Facebook's operation itself, not the net impact. Also, the unemployment rate among programmers is relatively very low, which is reflected in the mean annual wage of $87,900.
Facebook makes heavy use of, and is one of the major contributors to, APC, the PHP opcode cache. Most of the time Facebook is hitting this cache, so its parse-and-compile overhead is much, much lower than this article estimates.
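For reference, turning APC on takes a few lines of php.ini. The directives below are the standard ones; the values are arbitrary examples, not Facebook's configuration:

    ; minimal APC setup in php.ini
    extension = apc.so      ; load the extension (Linux build shown)
    apc.enabled = 1
    apc.shm_size = 128M     ; shared memory for cached opcodes (older APC takes a bare MB integer)
    apc.stat = 0            ; skip per-request stat() checks once the code is stable

With this in place, a script is parsed and compiled once and then served from shared memory on every subsequent hit.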
In other words, this article's author is railing against some imagined version of Facebook running some imagined implementation of PHP, not the Facebook that actually exists running PHP with opcode caching and memcached.
APC has lots of users. Judging the userbase of a piece of software by the rate of major releases is like judging the speed of your car by the rate at which it breaks down.
It's an opcode cache. It doesn't need to gain features until it can read email.
The reference used to justify the 10:1 superiority of C++ (http://shootout.alioth.debian.org/u32q/benchmark.php?test=al...) seems unlikely to have much bearing in this case. I doubt Facebook is calculating the Mandelbrot set or traversing huge binary trees in PHP. And as others have pointed out, CPU is hardly the only thing going on here.
Useful real-world benchmarks of even simple real-world systems are rare and require a lot of resources (mostly human) to construct. Here's an example benchmark I proposed once: http://jackfoxy.blogspot.com/2009/10/time-for-real-world-lin... At the time I couldn't find any benchmarks anything like this. I haven't tried looking since, but I suspect no one has bothered. It's a lot of work.
It gets you closer to the theoretical maximum of your network and memory buses. When a single machine can generate queries, do lookups, log, and render results faster, you need fewer machines to handle a given load. I know we increased our web front-end capacity by 5x when we migrated from PHP to Java code that runs backend queries asynchronously, so we reallocated some boxes and now we won't need to light up new hardware for a while. The inefficiency of PHP really adds up when you reach hundreds of cores.
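For contrast, here's the serial, blocking shape a typical PHP front end has (the helper names are hypothetical); each backend call blocks the worker for the full round trip, so latency stacks up serially:

    <?php
    // Each call blocks the entire worker process until the backend responds.
    $profile = $mc->get("user:$id");             // round trip #1 to memcached
    $friends = $mc->get("friends:$id");          // round trip #2, strictly after #1
    $feed    = fetch_feed($id);                  // hypothetical service call; blocks too
    echo render_page($profile, $friends, $feed); // hypothetical template step

An async client overlaps those round trips instead of paying for them one after another.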
That sounds like an architectural change, not a problem with PHP's runtime.
It's quite possible that the solution to your particular problem can be better expressed in another language than PHP, or will run faster under a different language's runtime. This doesn't mean that PHP "is slow". It might mean that your problem is an imperfect fit for a PHP-based architecture at your particular scale.
We have made architectural changes, but not that day. We literally had a drop-in replacement that did the same work but sustained 5x the request volume before saturating on the same machines, with the rest of the datacenter unchanged. (The async client was merely for latency, not throughput.)
I don't believe any PHP runtime can ever be the fastest solution for a problem. The language requires runtime type checks and coercions inside the operators and builtin functions, both keyed and ordered access to every collection, ad hoc creation and destruction of variable bindings by name, and dynamically switching to reference semantics for any existing value. Each of these imposes a cost on passing arguments and evaluating expressions, and I never found that any of them paid for itself by making the code much simpler.
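A contrived micro-example of those costs (nothing Facebook-specific, just stock PHP semantics):

    <?php
    function total($items) {           // $items' type is only known at runtime
        $sum = 0;
        foreach ($items as $k => $v) { // every array carries keys AND insertion order
            $sum += $v;                // '+' must coerce "2" and 3.0 on the fly
        }
        return $sum;
    }
    $a = array(1, "2", 3.0);           // heterogeneous values: each op re-checks types
    $b = &$a;                          // an existing value silently gains reference semantics
    echo total($a);                    // prints 6

None of that can be compiled away without whole-program analysis, which is exactly the tax I'm describing.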
On a related note, I suggest everyone run Wattzon.org to see where their carbon footprint comes from. I found that while I was well below average on everything else, heat accounted for 40% of my entire carbon footprint. I immediately ran around the house and fixed the duct vents so I could close them, and now I'll probably have halved that, saving me $15 a month ($180 a year) and 20% of my carbon footprint.
Why rag on Facebook? Smells like linkbait to me. Why not argue the general case? At least Facebook is using its servers -- what about all the unused servers? Old servers that could be consolidated? Servers running even more inefficient code than PHP?
Good point. But I fail to see why C++ is the solution, when there are plenty of other languages that execute just as efficiently but do not have C++'s shortcomings. I would pick Haskell, but even good old Java will do just as well as C++ for Facebook's workload.
If Facebook has 30K servers, I highly doubt 25K of them are web servers. Hyves (a Dutch social network) at some point explained that about 1 in 4 of their servers was a web server. The rest were storage, DB, caching, proxies, chat, whatnot.