
No, it really isn't. It is perfectly normalizable.

Every twitterer is a newsletter. Most hardly ever tweet, and sporadically at that. The followers are subscribers. They hardly even see most tweets they subscribe to, as the whole thing is quite ephemeral. Same as never reading all your emails (very few people are inbox-zero freaks).

The timeline is just that: an email inbox. It is very soft "real-time" at best.

Tweets older than 48 hours can just be archived to a blob store and served as a static website. Most people consume it as such, logged out and from the browser.
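That hot/cold split can be sketched as a periodic archival job. This is a rough illustration of the idea above, not any real system: the store structures, function names, and the in-memory dicts are all invented placeholders (only the 48-hour cutoff comes from the comment).

```python
# Sketch of the hot/cold split described above: tweets older than the
# cutoff move from the hot store to a cheap blob/static store.
# All storage details here are invented placeholders for illustration.
import time

ARCHIVE_AFTER_SECONDS = 48 * 3600

hot_store = {}    # tweet_id -> (timestamp, text); fast, expensive storage
cold_store = {}   # tweet_id -> (timestamp, text); blob store, served statically

def archive_old_tweets(now=None):
    """Move every tweet older than the cutoff to the cold store.
    Returns how many tweets were archived."""
    now = now if now is not None else time.time()
    old = [tid for tid, (ts, _) in hot_store.items()
           if now - ts > ARCHIVE_AFTER_SECONDS]
    for tid in old:
        cold_store[tid] = hot_store.pop(tid)
    return len(old)
```

In a real deployment the cold store would be something like S3 fronted by a CDN, which is what makes logged-out browsing of old tweets cheap to serve.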

You think brainyquote.com is hard to scale? Think of twitter.com as unbrainyquote.com.

All the other froufrou built on top of it is more complicated but 2007 style twitter is dead simple. That is its (completely accidental) genius.

There is no shortage of systems that are 10-100x more "web scale".



The approach you're describing (fully materialized inbox model) simply doesn't scale. I say this as someone who worked on social media databases and scalability for a decade. Otherwise, every time Obama or Musk tweets, you need to do one hundred million writes. That amount of write amplification is completely ludicrous and would crush any system.

At minimum you need to do a hybrid approach which special-cases the more widely-followed users. This problem has been well-known for quite a long time. Yahoo's "Feeding Frenzy" whitepaper came out in 2010, but the concepts were definitely known before that; I remember hearing about hybrid activity feed designs in 2009 from colleagues who formerly worked on LiveJournal.
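The hybrid approach can be sketched roughly like this. To be clear, this is a toy illustration of the push/pull idea from the "Feeding Frenzy" line of work, not Twitter's actual design: the names, in-memory dicts, and the 10,000-follower threshold are all assumptions for illustration.

```python
# Hedged sketch of a hybrid ("push/pull") activity feed: fan out on
# write for ordinary users, fan out on read for widely-followed ones.
# All names and the threshold are illustrative, not any real system.
from collections import defaultdict

FANOUT_THRESHOLD = 10_000      # push below this many followers, pull above

followers = defaultdict(set)   # user -> set of follower ids
inboxes = defaultdict(list)    # follower -> materialized timeline entries
outboxes = defaultdict(list)   # widely-followed user -> their own tweets

def post_tweet(author, tweet):
    if len(followers[author]) < FANOUT_THRESHOLD:
        # Push model: one write per follower (bounded write amplification).
        for f in followers[author]:
            inboxes[f].append((author, tweet))
    else:
        # Pull model: a single write; followers merge at read time.
        outboxes[author].append((author, tweet))

def read_timeline(user, followed_celebs):
    # Merge the materialized inbox with celebrity outboxes on read.
    merged = list(inboxes[user])
    for celeb in followed_celebs:
        merged.extend(outboxes[celeb])
    return merged
```

The point of the hybrid is visible in `post_tweet`: a celebrity tweet costs one write instead of one write per follower, at the price of extra merge work on every read.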


> Otherwise, every time Obama or Musk tweets, you need to do one hundred million writes. That amount of write amplification is completely ludicrous and would crush any system.

I knew somebody would bring that up. Currently there are fewer than 500k "verified users". Not that many people have 1 million+ followers and they don't tweet all that often.

> At minimum you need to do a hybrid approach which special-cases the more widely-followed users.

Great, so we are in agreement that for 99.75% of users it is all quite trivial.

I'll tell you what has happened since 2010: hardware has gotten a lot faster. One million IOPS is not a big deal anymore. Keeping that in mind, you should refresh your assumptions.

https://aws.amazon.com/blogs/storage/aws-san-in-the-cloud-mi...

https://spdk.io/news/2021/05/06/nvme-80m-iops/

  We built a 2U Intel Xeon server system capable of 80 MILLION 512B random read I/O operations by combining the latest 3rd Generation Xeon Scalable Processor (code-named Ice Lake) with Intel Optane SSDs.
https://twitter.com/axboe/status/1554115250588471297

  122M IOPS in 2U, with > 80% of the system idle. Easy.

  Just to put this into perspective, at 4K random reads, this is 144GB/sec of bandwidth from storage, at 36M IOPS.

  Fancy 512b random reads? You now get ~120M IOPS.

  You could saturate streamed network traffic on 11-12 100Gbit NICs.
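The bandwidth figures in that tweet check out with simple arithmetic (using decimal units, as storage vendors conventionally do):

```python
# Sanity check of the quoted numbers: IOPS x block size = bandwidth.
# Uses decimal units (1 GB = 1e9 bytes), as is conventional for storage.
iops = 36_000_000          # 36M IOPS at 4K block size, per the tweet
block_size = 4_000         # "4K" = 4,000 bytes in decimal units
bandwidth_gb_s = iops * block_size / 1e9
print(bandwidth_gb_s)      # 144.0 GB/sec, matching the quoted figure
```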


You can't just hand-wave away those edge cases. That's not how anything works.

If your potential write amplification for a single operation is anywhere remotely near a factor of 1 million, you have a serious problem and need to completely change your approach to the problem, and use different data structures and algorithms.

Hardware hasn't gotten that much faster really. PCIe flash cards started to get used over a decade ago -- yes, modern storage is better than those were, but not by a huge multiple. And meanwhile max CPU frequencies today aren't much higher at all. What we have instead is more cores. And a lot more RAM per box. Faster networking, sure. But none of this lets you get away with massive write amplification from choosing an overly-naive algorithm.

And IOPS aside, even just the storage capacity from full inbox materialization (along with the necessary indexing overhead) will bankrupt you, especially on that blazing fast storage you keep talking about. Keep in mind everything needs to be replicated to multiple regions / data centers for DR/HA, as well as kept close to users to lower latency.

I'm not making "assumptions" that I need to "refresh". I've literally spent the majority of my career working on this stuff at extreme scale, both in 2010 and today, and all times in between.


> And iops aside, even just the storage capacity from full inbox materialization (along with necessary indexing overhead) will bankrupt you, especially on that blazing fast storage you keep talking about.

The twitter firehose is usually below 50MB/s. 200GB a day of tweets will bankrupt no one. A 100TB Nimbus ExaDrive that does 100,000+ IOPS costs about $30,000 and holds more than a year of tweets. Thousands of Twitter employees fired probably saves $3+ billion/year in salaries; I'm sure they have a hefty enough hardware budget.
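A quick back-of-envelope check of those figures (the drive price and firehose volume are this comment's estimates, not vendor quotes):

```python
# Back-of-envelope storage cost for a single copy of the firehose,
# using the comment's own estimates (not authoritative figures).
firehose_gb_per_day = 200        # estimated single copy of all tweets
drive_capacity_tb = 100          # one Nimbus ExaDrive, per the comment
drive_cost_usd = 30_000

days_per_drive = drive_capacity_tb * 1000 / firehose_gb_per_day
print(days_per_drive)            # 500 days -- well over a year on one drive

cost_per_year_usd = drive_cost_usd * 365 / days_per_drive
print(round(cost_per_year_usd))  # ~21,900 USD/year for one raw copy
```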

> Keep in mind everything needs to be replicated to multiple regions / data centers for DR/HA, as well as keeping the data closer to users to lower the latency.

Does it really? I don't think so. Not with tweets.

For DR you can stream into a blob store like s3 in the background and have an automated process that stands up a fresh shadow cluster from it every couple of hours. Hardly costs anything with this volume of data. That is cold data, doesn't need the fancy blazing fast storage.
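That DR scheme can be sketched in a few lines. This is purely illustrative: the store names and functions are invented stand-ins (a real setup would stream to something like S3 and restore onto fresh machines), but the shape of the idea is the same: a continuous append plus a periodic restore that proves the backup works.

```python
# Sketch of the DR scheme described above: continuously append tweets
# to a blob store, and periodically rebuild a shadow cluster from it,
# which doubles as a restore test. All names are invented placeholders.
blob_store = []      # append-only log, standing in for e.g. S3 objects

def stream_to_blob(tweet):
    # In a real system this runs asynchronously in the background.
    blob_store.append(tweet)

def rebuild_shadow_cluster():
    # Stand up a fresh replica purely from the blob store. If this
    # succeeds, the backup is known-good, and its runtime bounds the RTO.
    return list(blob_store)

for i in range(5):
    stream_to_blob(f"tweet-{i}")
shadow = rebuild_shadow_cluster()
```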


> The twitter firehose is usually below 50MB/s

That's a single copy of all tweets. Not fanning out to up to 100 million inboxes. Completely different problems.

You keep citing raw hardware speeds of a single machine, yet we're talking about the feasibility of a distributed system sustaining random bursts with a write amplification factor of 100 million across a decentralized database, ideally with exactly-once write semantics even if a failure occurs midway -- and all of that on top of the normal baseline write activity of all the "normal" users with more reasonable follower counts. Again, completely different problems.

> I don't think so. Not with tweets.

So in your design, if the singular data center that maintains all users' inboxes goes offline for a long period of time, the entire product just goes down. And you think that's acceptable for a business valued in the tens of billions of dollars?

You seem absolutely convinced that a massive social network can be run on a shoestring budget with tiny staff, and no amount of evidence from someone like me (who actually worked on this stuff in depth, and posts with my real name, and expertise in profile) will convince you otherwise, so I suppose I should just stop replying to you.


> yet we're talking about the feasibility of a distributed system being able to sustain random bursts of write amplification factor of 100 million across a decentralized database, with ideally exactly-once write semantics even if a failure occurs mid-way

I think my assumptions are just a lot more relaxed than yours. This isn't a trading platform; I don't see why you need exactly-once write semantics.

> Not fanning out to up to 100 million inboxes.

https://en.wikipedia.org/wiki/List_of_most-followed_Twitter_...

There are only 6 users with 100m+ followers and they average a lot less than one tweet a day. @BBCWorld is #50 and by that point it drops to 38m accounts. #1000 has 2 million followers.

> And you think that's acceptable for a business valued in the tens of billions of dollars?

Elon by his own admission grossly overpaid for it. Twitter has hardly ever eked out a profit; it is not worth tens of billions of dollars. Nothing much would happen if it went down for a bit, except maybe bored journalists would report on it, as ever driving even more users to the website. But that is neither here nor there.

More to the point: if Elon and Obama and Bibster tweeted in the same minute (what are the odds?), you would, gasp, have to stagger the fan-out of the updates. That's alright too, for Twitter. It isn't actually real time.

Those follower counts are also grossly inflated and, as you yourself understand, only a small fraction of them are online using the app at the moment the person tweets. By the time they do check, they might never even see the tweet.

To the people offline you don't need to fan out in a timely manner.

In short I believe the write amplification is much closer to 1 million than 100 million even with the pathological cases. And beefy enough hardware can handle those peaks.

Here's another way to think about it: Elon has 118m followers and just posted that Twitter has 260m daily average users. He is a bit like Tom from MySpace: roughly half the users on the website are subscribed to his updates (not exactly, but close enough for simplicity).

I think it is perfectly alright if it takes a full minute until all those users see his latest meme. It is very unlikely that even a quarter of all his followers are using the app during that exact minute, so we're talking 30m writes in 60 seconds. Big whoop.
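That claimed rate works out like this (the one-quarter-online figure is this comment's assumption, not a measured number):

```python
# Back-of-envelope for the "30m writes in 60 seconds" claim above.
# The online fraction is an assumption from the comment, not data.
followers = 118_000_000          # Elon's follower count, per the comment
online_fraction = 0.25           # assumed upper bound on followers online
window_seconds = 60              # acceptable delivery window

urgent_writes = followers * online_fraction
writes_per_second = urgent_writes / window_seconds
print(int(urgent_writes))        # 29,500,000 -- roughly the "30m writes"
print(int(writes_per_second))    # ~491,666 writes/sec for that one minute
```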

> You seem absolutely convinced that a massive social network can be run on a shoestring budget with tiny staff

I would bet a budget of say <$1 billion/year and 100 engineers for the core functionality as is.

> no amount of evidence from someone like me (who actually worked on this stuff in depth, and posts with my real name, and expertise in profile) will convince you otherwise

Neither one of us presented any evidence, just opinions as outsiders, as part of an informal conversation. An appeal to authority isn't an impressive argument, I am also from this industry and with similar experience.

There is no need to take things personally. I think we just have a very different estimation of just how much activity twitter sees at peak and how strict the requirements are.


> I don't see why you need exactly-once write semantics.

World leaders use Twitter. It's a major international one-to-many communication platform. If tweets are lost or duplicated, it makes the platform look unreliable (because it literally would be) and potentially makes the tweeter look incompetent for posting twice. World leaders don't like to look incompetent; that can cause really bad things to happen...

> @BBCWorld is #50 and it drops to 38m accounts. #1000 has 2 million followers.

Even a write amplification factor of 100,000 is extremely problematic for the fully-materialized inbox model. A lot of prominent twitter users have followings larger than that.

> To the people offline you don't need to fan out in a timely manner.

So now you're adding additional systems on top, in order to scale. That's good, I guess you're starting to see that the problem is more complex than just spraying out every tweet to every follower's inbox. Now consider that when you actually build and scale a system like this, you'll need to keep doing that in a bunch of different areas, and the complexity keeps snowballing.

> And beefy enough hardware can handle those peaks.

There's no way to fit every user's fully-materialized inbox feed on one machine, so we're definitely talking about a large distributed storage tier / database here. Will you use "beefy" hardware for every single shard of your inbox storage tier?

> It is very unlikely that even a quarter of all his followers are using the app during that exact minute, so we're talking 30m writes in 60 seconds. Big whoop.

Once again, this really isn't like doing 30m write ops on a single box. It's queueing the writes via RPCs across a huge storage tier, while also needing some way to handle timeouts, retries, failovers on either side of the operation. All while the "normal" background level of thousands of tweets per second is happening from everyone else.
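The failure-handling point can be made concrete with a toy simulation. Everything here is invented for illustration (shard count, failure rate, retry policy): the lesson is that when an ack can be lost after the write lands, naive retries give you at-least-once delivery, i.e. duplicates, not exactly-once.

```python
# Toy model of distributed fan-out: each inbox write is an RPC to a
# remote shard that can time out *after* the write landed, so the
# caller can't distinguish success from failure. Naive retries then
# produce duplicates. All parameters are invented for illustration.
import random

random.seed(42)                  # make the simulation reproducible

NUM_SHARDS = 16
shards = [dict() for _ in range(NUM_SHARDS)]   # shard -> {follower: [tweets]}

def rpc_write(shard_id, follower, tweet, fail_rate=0.2):
    """Simulated RPC to a storage shard; the write may land and the
    ack still be lost, surfacing as a timeout to the caller."""
    inbox = shards[shard_id].setdefault(follower, [])
    inbox.append(tweet)               # the write lands...
    if random.random() < fail_rate:
        raise TimeoutError            # ...but the ack is lost

def fan_out(tweet_id, follower_ids, max_retries=3):
    for follower in follower_ids:
        shard_id = hash(follower) % NUM_SHARDS
        for _ in range(max_retries):
            try:
                rpc_write(shard_id, follower, tweet_id)
                break                 # ack received
            except TimeoutError:
                continue              # retry -- may duplicate the write

fan_out("tweet-1", [f"user{i}" for i in range(1000)])
delivered = [t for s in shards for inbox in s.values() for t in inbox]
dupes = len(delivered) - 1000
print(dupes)  # > 0: lost acks plus retries produced duplicate writes
```

Getting rid of those duplicates requires idempotent writes (e.g. keyed by tweet id) or a dedup layer, which is exactly the kind of extra machinery being pointed out above.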

> An appeal to authority isn't an impressive argument, I am also from this industry and with similar experience.

> There is no need to take things personally.

I've literally built a reverse-chronological social network activity feed implementation, which successfully scaled to over 110 million posts/day. (For the sake of comparison, Twitter was around 500 million tweets/day at that time, so this was definitely smaller than Twitter, but still quite large.) It did not use an inbox model. It took many months of my life, some of the most rigorous work I've ever done. My teammates and I evaluated several alternative designs, including the fully-materialized inbox, running all the numbers in depth and building several prototypes. The takeaway was that a naive fully-materialized inbox would be completely and ludicrously infeasible in terms of necessary hardware footprint.

Separately, I've also spent years working on database infrastructure at extreme scale, including one of the largest relational database footprints on earth. I have a very good sense of what this requires. Yes, I'm posting "opinions", but they are based on many years of direct personal expertise.

Scaling a social network involves a massive number of challenging problems. Faster hardware doesn't magically make these problems go away. And while I haven't worked at Twitter, up until this month I knew four infra/backend engineers working there, and they're some of the best engineers I've ever known in my 17-year career.

I'm taking your comments personally because your comments are offensive. You're blindly saying I need to "refresh [my] assumptions" about a topic I'm literally an expert in. You're claiming Twitter could use some completely asinine overly-simplistic feed model, as if no one else ever thought of that, which would strongly imply every infra engineer at Twitter must be an idiot. In another subthread on this page, you wrote "The job cuts are clearly justified because of the extremely toxic work culture / cult" and it is necessary to "replace every single person who worked there and the entire tech stack". Seriously, WTF? These are hard-working humans with lives and families, they don't deserve this shit from their employer, and certainly not from offensive pseudonymous randos who have no idea what they're talking about. Have some empathy.



