
I have to ask: is there even a practical purpose for this? Is there even some remote screwball application that requires one machine to handle 1,000,000 connections? One of the things I like about coming to HN is that the items on the front page are often actionable pieces of advice or clever and interesting hacks. I don't feel that "$LANG can do $LARGE_NUMBER of things" fits the bill. For example:

  C - Handling 1 Million Concurrent Connections 
  Java - Handling 1 Million Concurrent Connections 
  Javascript - Handling 1 Million Concurrent Connections 
  Go - Handling 1 Million Concurrent Connections
If it were 10^10 connections, then we'd be talking about some clever hacks to get that to work.


WhatsApp had over 2 million users connected to their (Erlang) server last year.

http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2...

Here is the same in Erlang for reference (from a few years ago; I would be interested to see whether there is a more efficient way now):

http://www.metabrew.com/article/a-million-user-comet-applica...
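For reference, a minimal sketch of what the Ruby side of "lots of mostly idle connections" can look like; this assumes the eventmachine gem and is not taken from either article:

  require 'eventmachine'   # assumes the eventmachine gem; not from the article

  module HoldOpen
    def receive_data(data)
      send_data "ok\n"     # the point is holding idle sockets open, not doing work
    end
  end

  EM.run do
    # each connection is a small object owned by the reactor; the practical
    # limits are file descriptors (ulimit -n) and per-connection memory
    EM.start_server '0.0.0.0', 8080, HoldOpen
  end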


Thank you so much for that last link especially. Interesting stuff :-)


Start with Ruby as a backend for online games...

Continue with Ruby as a backend for audio/video chats...

Consider Ruby for streaming podcasts...

etc. etc.


What happens when one of the fully loaded 1-million-connection nodes goes bang? That's potentially a million users getting a poor experience.

Re-establishing a million connections at once is going to be hard on the network: the original million were built up gradually over time, yet now they're all being re-established Big Bang style.
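One common way clients soften that (just a sketch of the usual technique, not something from the article) is to reconnect with exponential backoff plus random jitter rather than all at once:

  require 'socket'

  # hypothetical client-side reconnect loop: exponential backoff plus jitter,
  # so a million dropped clients don't all hit the server in the same second
  def reconnect(host, port)
    delay = 1
    begin
      TCPSocket.new(host, port)
    rescue SystemCallError
      sleep(delay * rand)              # jitter spreads the retries out
      delay = [delay * 2, 300].min     # cap the backoff at five minutes
      retry
    end
  end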


For any given user, the probability of the one machine with everyone on it going bang is similar to the probability of failure of the particular server they would be connected to in a horizontally scaled scenario. However, the cost of redundancy may be higher if it means replicating 100% of the main system; on the other hand, a big system may be designed for high uptime.


Would the probability not be less in this case? In general, fewer moving parts = less chance of outage. E.g. if a device is rated for 300,000 hours MTBF and you have 2 of them, their individual MTBF remains the same, but your chance of experiencing an outage in at least one of them has roughly doubled, because there are two that can fail.
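Rough numbers for that (assuming independent failures, and approximating the chance of failing within a year as exposure/MTBF):

  hours_per_year = 24 * 365
  mtbf   = 300_000.0                  # hours, from the example above
  p_one  = hours_per_year / mtbf      # ~0.03: one box failing within a year
  p_any  = 1 - (1 - p_one) ** 2       # ~0.06: at least one of two boxes failing
  puts p_one, p_any                   # roughly double, as described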

It's more the impact side of the risk equation I'm thinking of than the probability.

EDIT: typo


Depends whether you look at it from the ops point of view or the end-user point of view. You expressed concern about 1 million customers simultaneously having a bad experience. For a given end user, if the hardware is equally reliable, the odds of something happening are the same whether they are sharing with 1 million others or 100,000 (or even have the server to themselves). On the ops side there is more to go wrong and failures will be more frequent, but each one affects fewer end users.

The positive in the one-big-machine scenario is that you have the potential to put serious effort into keeping it reliable. The advantage in the many-machines scenario is that there is a better chance you have well-tested failover solutions.

It is the combination of impact and risk that I am discussing.
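Rough numbers on that trade-off (made-up failure rate, purely illustrative):

  users         = 1_000_000
  fails_per_box = 1.0                  # made-up: each box fails once a year
  one_big_box   = fails_per_box * users                 # 1 incident/year, 1,000,000 users hit
  ten_small     = (fails_per_box * 10) * (users / 10)   # 10 incidents/year, 100,000 users each
  puts one_big_box == ten_small        # => true: same expected user-impact per year,
                                       # it just arrives as one big lump or ten small ones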



