None of these algorithms take into account the number of views for each item. Wo...

stratomorph · on Sept 1, 2009

I think that would be a helpful factor in theory, but in practice it would be too inconsistent to rely on. Using Reddit as an example (because I have no familiarity with HN's source), there is client-side JavaScript that intercepts clicks on links and appends story IDs to one of the Reddit cookies. The next time a request goes to Reddit, the server gets a list of recent clicks.

This can fail in a lot of ways, most obviously if JavaScript or cookies are turned off. Also, the cookie isn't sent to Reddit until I load another page, so if I read an entire page of links and then close the browser without refreshing the page, the cookie never gets sent. Plus, the script clips the list at around 20 elements, so even if I did refresh Reddit, it wouldn't know I'd clicked on more than 20.
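
To make those failure modes concrete, here is a rough Python simulation of the scheme as I've described it. This is not Reddit's actual code: the function names, the comma-separated cookie format, and the exact cap of 20 are all assumptions for illustration.

```python
MAX_TRACKED = 20  # assumed cap, based on the "around 20 elements" observation


def record_click(cookie_value: str, story_id: str, limit: int = MAX_TRACKED) -> str:
    """Append story_id to a comma-separated cookie value (hypothetical format),
    keeping only the newest `limit` IDs."""
    ids = [s for s in cookie_value.split(",") if s]
    ids.append(story_id)
    return ",".join(ids[-limit:])


def clicks_seen_by_server(clicked, sends_request_afterwards,
                          cookies_enabled=True, limit=MAX_TRACKED):
    """Return the story IDs the server would learn about.

    Nothing is reported if cookies (or scripting) are off, or if the
    user never makes another request after clicking.
    """
    if not cookies_enabled or not sends_request_afterwards:
        return []
    cookie = ""
    for story_id in clicked:
        cookie = record_click(cookie, story_id, limit)
    return cookie.split(",") if cookie else []


clicked = [f"s{i}" for i in range(25)]
print(len(clicks_seen_by_server(clicked, sends_request_afterwards=True)))   # 20
print(len(clicks_seen_by_server(clicked, sends_request_afterwards=False)))  # 0
```

Twenty-five clicks get reported as 20 in the best case and as zero if I close the tab first, which is the kind of undercounting I mean.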

My point isn't that Reddit's approach has numerous weaknesses. It's that view counts are self-reported information, which must necessarily be suspect and incomplete. If an article on an obscure programming language pops up here, and every single person who reads it uses Lynx with cookies turned off for security, there might be no opportunity to record any views.

roundsquare · on Sept 2, 2009

I would think that would bias your algorithm towards articles with great headlines but not especially great content.