Digg: 40x Performance Increase by Sorting in PHP Rather than MySQL (highscalability.com)
133 points by there on March 23, 2010 | 59 comments


On Feed Digest (an RSS feed processor I sold a few years back, which was doing ~300m "digests" per month) I had a similar experience, so I think I know why and what they were doing here..

The problem was that even though I denormalized my "posts" (from RSS feeds) table into a "post contents" and a "post metadata" table, MySQL wants to create a "temporary table" to do sorting and pulls every relevant row into it (I found this was even true with indexed columns at the time - I hope they improved it). This can lead to heavy disk access if the number of rows needing to be sorted is too high (consider 100,000 rows of 150 bytes each.. 15 megabytes, not good for a server under heavy load that needs to return a query in < 0.2 seconds).

My solution was to pull out ONLY the post IDs and date (or whatever column I wanted to sort by) from MySQL, do the sort in memory, then pull out the posts from the database by already ordered IDs (e.g. SELECT * FROM x WHERE id IN(9,4,22,38,4,..)). MySQL is crazy fast at giving you back rows X, Y, Z, and so on, if you specify them directly.
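
Roughly, it looked like this (a minimal sketch with made-up table/column names, assuming a mysqli connection in $db; not the actual Feed Digest code):

  // Step 1: fetch only the ids and the sort column for the feeds in question.
  $rows = $db->query(
      "SELECT id, published_at FROM posts WHERE feed_id IN (1, 2, 3)"
  )->fetch_all(MYSQLI_ASSOC);

  // Step 2: sort newest-first in PHP (assumes sortable date strings) and keep
  // the first 10 ids.
  function by_date_desc($a, $b) {
      return strcmp($b['published_at'], $a['published_at']);
  }
  usort($rows, 'by_date_desc');

  $ids = array();
  foreach (array_slice($rows, 0, 10) as $row) {
      $ids[] = (int) $row['id'];
  }

  // Step 3: fetch the full rows by the already-sorted ids.
  $idList = implode(',', $ids);
  $posts = $db->query("SELECT * FROM posts WHERE id IN ($idList)")
              ->fetch_all(MYSQLI_ASSOC);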

I ended up with many more fast queries rather than fewer deathly slow queries, and that was a great tradeoff in the end. I believe the system still runs that way under its new owner.


You can only do this via the index: a query that is satisfied from the index will return rows in index order. You have to be very careful with your index and query design to make this work.

For example if your index is (username, date_posted) then you can do something like

  SELECT * FROM table WHERE username='user' ORDER BY date_posted
but if you do

  SELECT * FROM table WHERE username IN ('user','user2') ORDER BY date_posted
then it will need a temporary table.

You could change your index to (date_posted, username) to make it work, but then it will be slow when querying a small percent of the users.


But that doesn't guarantee the results will be in that order. Just because you put your IDs in a certain order in the IN() doesn't mean you'll get them back in that order.


Interesting. It's been 3 years since I sold the business and touched the code. The basic mechanic worked as I described, but you're right (just tested), we must have had extra workarounds.

Update: I just realized I have the code in an archive here so I loaded it up. I can't believe this wasn't obvious to me! Since we already had the ordered IDs in memory (to build the SELECT query!) we loaded the posts with a SELECT .. IN into a hash (with the id as the key) then just went through the ordered ID list and pulled out the posts from the hash in the right order! :-)
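
Something along these lines (a reconstructed sketch, variable names made up):

  // $orderedIds holds the ids already sorted in PHP; $result is the mysqli
  // result of the SELECT ... WHERE id IN (...) query, in whatever order
  // MySQL chose to return the rows.
  $byId = array();
  while ($row = $result->fetch_assoc()) {
      $byId[$row['id']] = $row;        // index the unordered rows by id
  }

  $posts = array();
  foreach ($orderedIds as $id) {
      if (isset($byId[$id])) {
          $posts[] = $byId[$id];       // walk the sorted id list to restore order
      }
  }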


So then why did you bother with the first query and then the IN() ?

Just get your data directly with a search, then array sort the hash using the hash key.


Just get your data directly with a search, then array sort the hash using the hash key.

If I have a table with 10m rows (it was about that) and I want 10 posts from feeds X, Y and Z in reverse date order, to get them in just one query with local sorting I'd have to retrieve every row for feed X, Y and Z down the wire before doing the local date sort.

If I did the sorting with ORDER BY to avoid that issue, the complete JOIN (meta<->full) is used in the temporary sort table, so a bigass temporary table would result for MySQL to do its sorting.. bringing us back to square one :-)

Hopefully I'm not misunderstanding your point though.. it's been known to happen.


Using it with LIMIT makes sense, and I forgot you were doing that.

BTW, you were doing this to avoid using too much memory at the database, but by doing this (assuming it's PHP; you didn't say) you are loading the result set twice into webserver memory.

Obviously it worked for you, maybe the webserver had more free memory.


It did (though less total memory). Significantly, though, the memory use was ephemeral at the application server end. Do the collation, push to a memcache server, and die :-)


You'd have to have something like SELECT * FROM table WHERE id IN (9,4,22,38) ORDER BY FIND_IN_SET(id, '9,4,22,38') to get them in a predefined order.


I have never used FIND_IN_SET.

But I use FIELD() http://dev.mysql.com/doc/refman/5.0/en/string-functions.html...

SELECT * FROM x WHERE id IN(9,4,22,38,..) ORDER BY FIELD(`id`, 9,4,22,38,..)
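
Building that from PHP might look something like this (a sketch; table name and ids are just examples):

  $ids = array(9, 4, 22, 38);
  $list = implode(',', array_map('intval', $ids));
  // FIELD() makes MySQL return the rows in the same order as the id list.
  $sql = "SELECT * FROM x WHERE id IN ($list) ORDER BY FIELD(id, $list)";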

That said, I don't know if SQL or PHP is quicker.


In MySQL at least, you can add an ORDER BY FIELD(id, 4, 3, 6) to guarantee they are returned in that order.


1) Sometimes MySQL is bad at things that databases should be good at.

2) Moving stuff from your database server to your webserver is actually changing the hardware environment it runs on. Especially relevant if your database server isn't keeping up or is RAM constrained or whatever else is different.


It's also worth considering that most big sites have more web servers than MySQL servers, and I'm sure Digg is in that boat as well. Doing your sorts on 20 front end nodes instead of 2-4 MySQL servers is obviously going to help improve performance and scaling of your MySQL servers.


This quote rubs me the wrong way a little...

"The relational database tool chain is not evolving. It has failed for large scale, real-time environments."

I've been a big user of Postgres for going on five years now and it has made giant leaps in both features and performance. You can't blindly say relational tools are done evolving. I know the other players have made a lot of progress lately too (Oracle, MySQL, etc).


Database systems are optimized to store data, and do real-time transactional processing of data.

You can mirror the DB and configure it to do off-line analysis, or use something more suitable, but if it doesn't need to be transactional (and really, the top 10 comments your friends just made don't need to be transactional) then use a more suitable tool with less strict behavior.


PostgreSQL has gained many nice features (windowing queries, parallel restore, moving towards more standardized master/slave replication, etc), but none of them have substantially affected its suitability as a high-throughput/large-data/(relaxed-consistency)/low-latency datastore. (On the other hand, it has become significantly nicer for BI/OLAP workloads.)

Same for MySQL, modulo (possibly) the Drizzle project.

Oracle has a lot of different irons in the fire, but "cheap scaling" isn't one of them. (If you've got the money to burn... TimesTen?)

That's not a knock on any of the above, it's just a different use case.


Vertica is designed for scaling. It's based on PostgreSQL and under active development to that end.


So is Greenplum (more so, actually.)

Neither is useful for the Digg use case. Edit: That is to say, they're optimized for large datasets, low concurrency, medium latency, medium-to-high consistency and medium-to-low availability.


Old story. Sorry if it's offtopic:

I was running an internal webapp with 25M rows in a MySQL table. The query for search was something like (pseudo-SQL ahead):

  SELECT * FROM `table` WHERE MATCH(title_field) AGAINST('search_term') ORDER BY id DESC LIMIT 20
It took around 10 to 12 seconds (it was a slow machine). When I removed the sorting clause from the SQL query (and, with it, the LIMIT):

  SELECT * FROM `table` WHERE MATCH(title_field) AGAINST('search_term')
and instead sorted the returned results in PHP, taking the first 20, it took only 1 second.


There won't be an index covering both the full-text search and the id, so MySQL will have to create a temporary table with the results, sort that, and apply the limit. If you have a TEXT or BLOB column, or the result is too big, MySQL will do that on disk rather than in memory, so it's slow.

I'd suggest putting "EXPLAIN EXTENDED" before your "SELECT" to see what it's doing, and consider specifying the columns explicitly, though in general I'd do what you did.
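
For example, something like this (a sketch; assumes a mysqli connection in $db, and that "Using temporary; Using filesort" in the Extra column is the tell-tale sign):

  $res = $db->query(
      "EXPLAIN EXTENDED
       SELECT * FROM `table`
       WHERE MATCH(title_field) AGAINST('search_term')
       ORDER BY id DESC LIMIT 20"
  );
  while ($row = $res->fetch_assoc()) {
      echo $row['table'] . ': ' . $row['Extra'] . "\n";
  }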


You probably needed to run analyze table.

What it probably did was sort by id (in memory) and then run the search (i.e. check each row to see if it matched, stopping after 20 rows), not realizing that it should do the search first and only then sort.

EXPLAIN .... is your friend here.

Without the order by, the only possible index was the search, so it used that.

And Digg probably had a similar error, since it makes no sense for PHP to be any faster at sorting than MySQL.


since it makes no sense for PHP to be any faster at sorting than MySQL.

Perhaps MySQL has improved lately, but going on my working knowledge from about 3 years ago, it can make sense to do sorting externally... if you can work with less data to do the sort externally ;-)

MySQL does/used to pull entire rows into the temporary table created for sorting. So if you have a 10,000 row, 100MB table and you're doing a SELECT that grabs, say, 10% (1000) of the rows and sorts them.. MySQL's temporary sort table will be 10MB (I found this to be true regardless of indexes - which only sped up the creation of the temporary table, but did not reduce its size - but as I was doing complex multi column sorts, it didn't matter anyway).

If you do it in PHP/wherever, you can pull out ONLY the "id" and the column you want to sort by (say, a date), and use only about 1000 * 10 bytesish == 10KB of data to do the id sort. Then you pull the full rows out by id (which MySQL is quick at).

Disclaimer: I last did all this about 3 years ago on an early version MySQL 5 deployment doing about 200 queries a second. So I ain't no RDBMS expert and what I say should probably be taken with a pinch of salt.


This could work, but then you'd need to generate 1000 queries one at a time, meaning lots and lots of disk seeks.

As long as your sort table stays in memory you are fine even if it's a lot of data, but if you hit the disk then you could be right. (But even if MySQL says it hits the disk, that doesn't mean it really does, since the OS could cache it.)

But there are better ways of handling this. Number one being an index on that column.

You could also split the table in two, one part for sort data, the other for the large data. But I would only recommend that in very special cases, and only after careful benchmarking.

Most of the time just increase the maximum size of in-memory temporary tables.
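
For example (a sketch, assuming a mysqli connection in $db; 64MB is an arbitrary figure):

  // MySQL uses the smaller of these two limits for in-memory temporary
  // tables, so both should be raised together (here only for this session).
  $db->query("SET SESSION tmp_table_size = 64*1024*1024");
  $db->query("SET SESSION max_heap_table_size = 64*1024*1024");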


This could work, but then you'd need to generate 1000 queries one at a time, meaning lots and lots of disk seeks.

Not with a SELECT .. WHERE id IN (x,x,x,...) type query. But, yeah, sort of. I actually trialled separate SELECTs for every ID at first and it performed surprisingly well.

As long as your sort table stays in memory you are fine even if it's a lot of data

Trueish. At the time, though, it was a 10-12GB database on a 4GB machine and in 2006 I didn't have the cashflow to get anything better. So the most popular parts of the DB were cached OK, but there was little left for temporary tables. I say "trueish", though, because you're still using more memory bandwidth with temporary tables in this situation and that's an issue sometimes.

Number one being an index on that column.

The problem was that there were about 8 metadata columns and sorts were done with varying combinations of these (though usually one). Indexes became a big problem at a certain point because of how difficult they made it to clear stale data from the database (at one point we were doing 6 hour maintenance periods each month - with caches picking up the slack - to delete old records).

If I were doing it again, I'd do a lot differently (and not use MySQL), of course.. so your suggestions are certainly useful to anyone tackling a similar problem with 2010 eyes.


What would you use instead of MySQL?


MongoDB is a running joke with you but, seriously, probably MongoDB along with Redis. It wouldn't be using the same algorithms as it did on MySQL, though. There are better ways to structure that sort of data nowadays.


It's funny that it comes off as a joke since I really do love it. I mean, it powers http://nyhtp.com. I've had nothing but great experience with Redis, mixed experience with Tokyo Cabinet and Tyrant, and not enough real work experience with Mongo.


MongoDB + Redis are proving a winning combination for me so far. Or, rather, the combination of a fast key/value store with ephemeral key support and a document database is a winner. It gives you two levels, and more immediate opportunities for optimization and separation of concerns.


And Digg probably had a similar error, since it makes no sense for PHP to be any faster at sorting than MySQL.

The statement here isn't that a sort in PHP is 4000% faster than a sort in MySQL, all things being equal.

What is left out of the discussion is the fact that all things are likely not equal, their database servers are way more resource constrained / under heavier load than the web servers. Thus, a CPU-bound operation like sorting will perform better if you can move the operation to a server that has more of the resource in question available.


Was the column you were ordering by indexed?


I don't think you can have the fulltext index in the same index as the id since fulltext indexes don't mix with anything.


Yes. Fulltext index (also MyISAM)


You have probably done something wrong if you got figures like that or had a very specific edge case.


Whenever I read about Digg's infrastructure, I think of Markus Frind's comments from back in the day. He's the guy who built plentyoffish.com single-handedly. A lot has changed since 2006, but at the time he said that Digg had the worst infrastructure of any major site. A fascinating read, because his basic point is still very relevant today:

"Digg is doomed unless they fire their tech staff." - http://plentyoffish.wordpress.com/2006/10/08/digg-is-doomed-...


"Scaling practices turn a relational database into a non-relational database."

When will people learn that the 'relation' in relational database comes from the correspondence between database tables and the mathematical notion of relations, not the ability to describe relationships between tables? Unless scaling practices make a relational database stop using tables, the database is still being used as a relational database.


Around the same time they learn that just because MySQL struggles with something, it doesn't mean that all relational databases do.


While you're technically correct, language does evolve, and if 90% of developers think relational means foreign keys and normal forms then that's what it means. Words are used to communicate and sometimes get misused and take on a new meaning; that's just the way it goes. I've met very few developers or database guys who know that 'relation' comes from the relational algebra and has nothing to do with foreign keys or normal forms.


> Words are used to communicate and sometimes get misused and take on a new meaning; that's just the way it goes.

I sure hope that's not always the way it goes. Because then "then" would soon mean "than" and "than" would mean "then".


At my first SA job someone was using the database for some cargo-cult manipulation and sorting - it took eight hours. I rewrote it in a few lines of perl, and it took 17 seconds - 12 after removing the debugging print statements :-)


1337


If the query cache, key buffer, sort buffer, and various engine configs are too small for the dataset then MySQL will take a less optimal approach. MySQL is an RDBMS for persistent data, not a spreadsheet...

"right tool for the job", etc.


The most popular "solution" to all kinds of scalability problems seems to be to simply not support use cases that don't scale well.

You can't have consistency. You can't query and view your data the way you want to. Only prearranged queries are allowed.

It's all very well to do that, but it's not a better solution to the original problem. It's acknowledging that the original problem is unsolvable or, put differently, it's giving up.

Scalability, for me, is making the things scale that I want to do, not doing something that scales.

Sometimes it seems to me that we're heading straight back to the mainframe era where everything that wasn't prearranged, didn't fit the nightly batch window or was otherwise unexpected simply couldn't be done.

The answer to static precomputed reports and highly specialised, highly inflexible data structures was relational database systems and Excel.


And yet we still haven't completely replaced those old, inflexible systems, even as we are inventing new ones.

So, were relational databases and spreadsheets really the answer, or just an alternative?


Nothing is ever completely replaced, but you can answer your own question by asking _why_ the old systems were inflexible and _how_ RDBMS and spreadsheets improved that situation. It's not just some historical freak event like Betamax v VHS. There's logic.

Hard coded access paths and data structures combined with explicit and selective consistency management is always faster and more scalable provided the number of different use cases is small and requirements don't change much. Doing a small number of things on a huge scale justifies the approach that people like twitter are taking.

Unfortunately what scales physically doesn't scale in terms of logical complexity. As soon as a greater variety of tasks, views and analyses have to be supported, productivity plummets and things come to a screeching halt. That's exactly what happened to corporate IT departments back then. Everything took ages to implement because you couldn't just join over or group by some unexpected set of attributes.

As a reaction to that, two things happened. For one, people tried to free their data from the iron grip of centralised IT departments and put it on PCs and into spreadsheets that used very generic data structures to support a very large variety of views and analyses. Nothing had to be predetermined. Of course that also led to utter chaos in terms of consistency and it didn't scale to large amounts of data.

The other idea to cope with hard coded information was to separate the logical representation of information from physical representations and access paths. Relational normalisation doesn't just guarantee consistency; much more importantly, it guarantees the productivity of creating a wide variety of views on data. A separate and independent layer contains all the special casing and optimisations for frequent use cases.

I think the idea of modelling information according to a general, formally well defined logic and separating that model from implementation constraints is a timeless one. It's not just "an alternative". It's a fundamental principle of dealing with information and it's not specific to the relational model. I've been doing this for long enough to know that this kind of purity is hard to achieve and has to be compromised now and then.

Throwing out good design principles is sometimes necessary. I can see why it is necessary for Google, Amazon or Facebook. But it is going to make these behemoths slower and less flexible. It's a price they pay for their size. It makes no sense for small companies to pay that price whilst not benefiting from the same economies of scale.


4000% sounds so much cooler than 40x


41x. One of the reasons I don't like using percentages is that people always make mistakes with them. A 100% increase is doubling, 200% tripling, etc.


A 4000% increase isn't a 41x increase. It's a gain factor of 41, or a 40x increase. (Not that it even matters three percent either way here.)


It's also misleading in the sense that there is no "one number". You can't say something increased in performance after just one run. You need a few to know that it's consistent, and at least 200 runs to be statistically sound, and you're not going to get exactly 40x on each of those runs. Sometimes it will be 40x, sometimes 35x, sometimes 1.5x.

What the article doesn't mention is, was that the highest measured performance increment, or was it the mean or median or most common? Also, what was the margin of error in that reading?


I wish I were running a site big enough for these scaling issues with MySQL to be a problem. I'm sure an SSD or a battery-backed RAM drive would sustain some pretty serious growth for all but the really mega sites like Digg. This makes me wonder if these super-scalable systems will ever be made into products suitable for small and medium sites without a team of uber geeks to support them.


I wonder how many sites there are that couldn't be served by MySQL or Postgres with SSDs in a RAID array and 64 gigs of RAM. Of course it is easier to spin up 10 EC2 instances with Cassandra on them, but for the vast majority of sites I don't think relational databases have reached their limits yet.


What’s different/better about an in-memory database versus STL or Boost collections, or even just creating my own memory-mapped file(s)?

Any database system goes far beyond giving you a set of interfaces to manage collections, lists, etc. This typically includes support for ACID (atomic, consistent, isolated and durable) transactions, multi-user access, a high level data definition language, one or more programming interfaces (including industry-standard SQL), triggers/event notifications, and more.

http://www.mcobject.com/in_memory_database


MySQL is very slow. I've tried MongoDB with the C++ API but I had a lot of "vague" problems and bugs so I gave it up and I think MongoDB is not ready for a production env. However the JSON representation of data is very convenient and very powerful in my case.

I also was playing with Redis and redis-py, but Python is too slow for me. So now at this very moment I'm playing with Memcached and its C API.

Redis and MongoDB look very promising but not reliable yet.


But reliable enough for Craigslist?

No critical bug has been found in 1.0 and 1.2 since their release, apart from a replication bug happening in corner-case conditions.

Please share with us how Redis is not reliable and I'll try to fix this ASAP.


"I couldn't figure out how to make it work, so it must be broken."


I remember reading a while back how Digg had tables with thousands of columns. I know MySQL isn't the fastest, but a speedup for Digg really isn't that impressive.


If sorting in PHP is 4000% faster than sorting in the DB, then it's a failure of the DB. That shouldn't be the case.


I guarantee that sorting in PostgreSQL is faster than doing it in PHP. MySQL is the problem, not databases in general.


I thought Slashdot figured this out like a decade ago.


Old news... We all know that a real PHP coder must write their own DBMS... mostly because sorting is just awful in MySQL.



