
While I agree with the author in principle, I think there is an implicit criterion they ignore, which is intuitive correctness from the perspective of the user.

Imagine a user chooses "Sort by rating", and they subsequently observe an item with a 4.5 average ranked above one with a 5.0 average because it has a higher Wilson score. Some portion of users will think "Ah, yes, this makes sense because the 4.5 rating is based on many more reviews, therefore its Wilson score is higher." and the vast, vast majority of users will think "What the heck? This site is rigging the system! How come this one is ranked higher than that one?" and lose confidence in the rankings.
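(For reference, the lower bound the linked article sorts by looks roughly like the sketch below for up/down votes; mapping star ratings onto "positive"/"negative" first is one common workaround, but that mapping is my assumption, not something the article prescribes.)

    import math

    def wilson_lower_bound(positive, total, z=1.96):
        # Lower bound of the Wilson score interval for the fraction of
        # positive ratings, at ~95% confidence (z = 1.96).
        if total == 0:
            return 0.0
        phat = positive / total
        denom = 1 + z * z / total
        center = phat + z * z / (2 * total)
        spread = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
        return (center - spread) / denom

    wilson_lower_bound(1, 1)      # one perfect review  -> ~0.21
    wilson_lower_bound(90, 100)   # 90% positive of 100 -> ~0.83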

In fact, these kinds of black-box rankings* frequently land sites like Yelp in trouble, because it is natural to assume that the company has a finger on the scale, so to speak, when it is in their financial interest to do so. In particular, entries with a higher Wilson score are likely to be more expensive, because their ostensibly superior quality commands (or depends upon) a higher cost, exacerbating this effect through perceived higher margins.

So the next logical step is to present the Wilson score directly, but this merely shifts the confusion elsewhere -- the user may find an item they're interested in buying, find it has one 5-star review, and yet its Wilson score is << 5, producing at least the same perception and possibly a worse one.

Instead, providing the statistically-sound score but de-emphasizing or hiding it, such as by making it accessible in the DOM but not visible, allows for the creation of alternative sorting mechanisms via e.g. browser extensions for the statistically-minded, without sacrificing the intuition of the top-line score.

* I assume that most companies would choose not to explain the statistical foundations of their ranking algorithm.



In another article, the author (Evan Miller) recommends not showing the average unless there are enough ratings. You would say "2 ratings" but not show the average, and just sort it wherever it falls algorithmically.

https://www.evanmiller.org/ranking-items-with-star-ratings.h...

In that article, he even includes a formula for how many ratings you'd need:

> If you display average ratings to the nearest half-star, you probably don’t want to display an average rating unless the credible interval is a half-star wide or less
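For concreteness, that check looks roughly like this (my own approximation of the approach, using a uniform add-one prior and a normal approximation; the details aren't a quote of his formula):

    import math

    def star_interval_width(counts, z=1.96):
        # Approximate width of a 95% credible interval for the mean star
        # rating; `counts` lists review counts for 1..5 stars.
        stars = range(1, len(counts) + 1)
        n = sum(counts) + len(counts)   # add-one (uniform) prior
        mean = sum(s * (c + 1) for s, c in zip(stars, counts)) / n
        second = sum(s * s * (c + 1) for s, c in zip(stars, counts)) / n
        half = z * math.sqrt((second - mean * mean) / (n + 1))
        return 2 * half

    def show_average(counts):
        return star_interval_width(counts) <= 0.5   # the half-star rule above

    star_interval_width([0, 0, 0, 0, 1])      # ~2.2 stars wide -> hide the average
    star_interval_width([1, 2, 5, 40, 150])   # ~0.19 stars wide -> show it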

In my experience, the second article is more generally useful, because it's more common to sort by star rating than by thumb-up/thumb-down ranking, which is what the currently linked article is about.

And the philosophical "weight on the scale" problem isn't as bad as you'd think when using these approaches. If you see an item with a perfect 5-star average and 10 reviews ranked below an item with a 4.8-star average and 1,000 reviews, and you call the sort ranking "sort by popularity," it's pretty clear that the item with 1,000 reviews is "more popular."


> Imagine a user chooses "Sort by rating", and they subsequently observe an item with a 4.5 average ranked above one with a 5.0 average because it has a higher Wilson score. Some portion of users will think "Ah, yes, this makes sense because the 4.5 rating is based on many more reviews, therefore its Wilson score is higher." and the vast, vast majority of users will think "What the heck? This site is rigging the system! How come this one is ranked higher than that one?" and lose confidence in the rankings.

It also erodes confidence in ratings when something with one fake 5 star review sorts above something else with 1000 reviews averaging 4.9.

I think you're mainly focusing on the very start of a learning curve, but eventually people get the hang of the new system. Especially if it's named correctly (e.g. "sort by review-count weighted score").


I'd opt for a simpler and less precise name like "Sort by Rating", but then offer the more precise definition via a tooltip or something, to minimize complexity for the typical user but ensure that accurate information is available for those who are interested.


Do you think the average user wants to see the thousands of movies on IMDB with a single 10-star rating shown above The Godfather?

This is already done to a degree on most sites. The author is just describing a better possible way to do it.


No, did I suggest that approach?


Better, in my opinion, not to give an item a rating until it has some number of reviews. You can still show the reviews, but treat it as unrated.


I prefer to call it "Sort by Popularity."


I don’t like that measure because popularity doesn’t translate into “good”.

What’s the most popular office pen? Papermate, Bic? I may be looking for more quality.

What’s the most popular hotel in some city? Maybe I’m looking for location or other aspects other than popularity among college kids.


When you use the OP article's formula, you're sorting by popularity. You may choose not to sort by popularity, but when you use it, you should call it sorting by "popularity."


Popularity could also mean number of purchases/downloads/clicks. With no regard for review scores. Or at least it is unclear to me.


Not having faith in the user is a giant step towards mediocrity. Does a weighted average provide better results? Then use a weighted average! The world isn't split into an elite group of power users and the unwashed masses. There are just people with enough time and attention to fiddle with browser extensions, and everyone else. And all of them want the best result to show up first.

Yelp didn't get dinged because their algorithms were hidden. They lost credibility because they were extorting businesses. Intention matters.


I don't think this is an easy problem to solve.

The inherent problem, to me, is that we're trying to condense reviews into the tiny signal of an integer in the range of 1 to 5.

For many things, this simply doesn't cut it.

2 stars, what does that mean? Was the coffee table not the advertised shade of grey? Does the graphics card overheat on medium load because of a poor cooler design? Was the delivery late (not related to the product, but many people leave these kinds of reviews)? Did you leave a 2 star review because you don't like the price but you didn't actually order the product?

I've seen all of these things in reviews, and I've learned to ignore star ratings because not only can they be gamed, they are essentially useless.

Props to users who take the time to write out detailed reviews of products, which give you an idea of what to expect without having to guess what a star rating means, although sometimes these can be gamed as well, since many sellers on Amazon and such will just give out free products in exchange for favourable reviews.

Being a consumer is not easy these days: you have to be knowledgeable about what you're buying and assume every seller is an adversary.


My go-to method for reading reviews is to sort by negative ratings and only look at detailed reviews, or at least ones that explain what they thought was lacking. Often the 1- or 2-star reviews have fair points, but they might be for a use case I don't care about or similar. This generally gives me an idea of the actual pros and cons of the product, as opposed to just a vague rating.

That's assuming you can trust the reviews themselves, of course.


The problem with having faith in your users is you have to actually do it. If you're sorting by Wilson score when the user clicks a column that displays a ranking out of five, then you're mixing two scores together in a frustrating way because you think your users are too dumb to understand.

There has to be a way to let users choose between "sort by rating, but put items without many reviews lower" and "sort by rating, even items with only one or two reviews" in a way that helps give control back to them.


The way I've seen it done is a single column with avg stars + # reviews, which isn't clickable, because why would you want to sort by minimum ranking?


> why would you want to sort by minimum ranking?

To read reviews of awful products for entertainment, I guess?


This is a fair point, but it's not as if knowing which items are actually good is something that should only be available to power users. The real goal ought to be making sure your customers get access to actually good things, not merely satisfying what might be some customers' naive intuition that things with higher average ratings are actually better.

I think there are better approaches that can be taken here to address possible confusion. E.g., if the Wilson score rating ever places an item below ones with a higher average rating, put a little tooltip next to that item's rating that says something like "This item has fewer reviews than ones higher up in the list." You don't need to understand the full statistical model to have the intuition that things with only a few ratings aren't as "safe".


This is a UX problem, which can be solved by not showing the exact rating, but showing a "rating score" which is the Wilson score.


OP addressed that:

> So the next logical step is to present the Wilson score directly, but this merely shifts the confusion elsewhere -- the user may find an item they're interested in buying, find it has one 5-star review, and yet its Wilson score is << 5, producing at least the same perception and possibly a worse one.

Though I'm not convinced how big of a deal this is. Even if you're worried about this, a further optimization may be to simply not display the score until there are enough reviews that it's unlikely anyone will manually compute the average rating.


You could probably get around this by

A) labelling 1-2 review items with "needs more reviews" message

Or B) not giving an aggregate review score for low review items. Actually replacing the review star bar with "needs more reviews". Then when the user goes from the listing page to the detail page, you can show the reviews next to a message saying "this item only has a few reviews, so we can't be sure they're accurate until more people chime in"


C) normalizing the display of stars to the score


I think this can be solved with better UI: instead of stars, show a sparkline of the distribution of the scores. The user can then see the tiny dot representing the single 5-star review and the giant peak representing the many 4-star reviews.
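Something like this, with unicode block characters, would probably be enough (the rendering here is just my own illustration):

    BLOCKS = " ▁▂▃▄▅▆▇█"

    def sparkline(counts):
        # Render per-star review counts (1..5 stars) as a tiny bar chart.
        peak = max(counts) or 1
        return "".join(BLOCKS[round(8 * c / peak)] for c in counts)

    sparkline([0, 0, 0, 0, 1])       # '    █'  -- the lone 5-star review
    sparkline([2, 5, 30, 400, 60])   # '  ▁█▁'  -- a big peak at 4 stars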


While I agree with your comment in principle, is any shopper happy with the utterly useless ranking system that everybody uses now?

Ranking a 4.9-star item with 500 reviews above a 5-star item is already intuitive to many, and will quickly become intuitive for everyone else because it’s broadly more useful. The average customer doesn’t care that much how the sausage is made; they care about the quality of the results.

Ranking items is basic functionality and it’s broken across the web. It shouldn’t be a feature that’s only available to users willing and able to fiddle with browser extensions.


If you don’t provide a “Sort by rating” option but instead include options like sort by “popularity,” “relevance,” “confidence,” or similar, then it is a more accurate description, more useful to the user, and not so misleading about what is being sorted.

I agree that if I “sort by rating” then an average rating sort is expected. The solution is to simply not make sorting by rating an option, or to keep the bad sorting mechanism but de-emphasize it in favor of the more useful sort. Your users will quickly catch on that you’re giving them a more useful tool than “sort by average rating.”


To me "rating" is pretty clear cut. I expect some sort of ranking based on the ratings provided by users.

"relevance" and "confidence" can mean a lot of different things and I tend to expect those types of sorts to be gamed by the site in order to promote whatever they'd prefer that I buy. For example, assuming an equal number of reviews a site could decide a more expensive item rated at 4 stars is more "relevant" than a cheaper item with a 5 star rating.

If it's not explicitly explained what determines confidence and relevance, and/or users don't have the ability to access the information used to assign those scores, it degrades trust that the results being promoted are genuinely beneficial to the user vs the website/service.

Amazon, for example, uses "Featured", which is transparently gamed in Amazon's favor, and "Avg. Customer Review", which should be clear enough and removes most of the worst items; the number of reviews is easily seen in the list as well (although the legitimacy of reviews still has to be considered, and there are a lot of other problems with the way Amazon handles reviews in general).

Generally I'll sort by rating and look deeper at the reviews for the ones with both high ratings and a high number of reviews. It's not perfect, but it makes a great starting point.


I think you're overemphasizing the confusion that an alternate ranking scheme would cause. We have Rotten Tomatoes as a very obvious example of one that a lot of people are perfectly happy with, even though it's doing something very different from the usual meaning of X% ratings.

I feel like all that's really needed is a clear indicator that it's some proprietary ranking system (for example, "Tomatometer" branding), plus a plain-language description of what it's doing for people who want to know more.


That's a really good point. I wonder if folks would intuitively get it if you provided a little data visualization (visible on hover or whatever). Like:

Result 1: (4.5 )

Result 2: (5.0 )

edit: HN stripped out the unicode characters :(. I was using something like this: https://blog.jonudell.net/2021/08/05/the-tao-of-unicode-spar....


I worked on an e-commerce site that attempted to solve the issue by simply not giving an average rating to an item until it had a certain number of reviews. We still showed the reviews and their scores, but there was no top-level average until it had enough reviews. We spent a lot of time in user testing and with surveys trying to figure out how to effectively communicate that.


In order to deal with that, I would place two sorting options related to the average:

- regular average
- weighted average (recommended, default)

Then the user can pick the regular average if they want, whereas the so-called weighted average (the algorithm described in the article) would be the default choice.
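Something like this is what I have in mind (names and constants are mine; the real "weighted" key would be the article's confidence-adjusted score rather than the simple prior-damped average sketched here):

    def plain_average(item):
        return item["rating_sum"] / max(item["rating_count"], 1)

    def weighted_average(item, prior_mean=3.0, prior_weight=5):
        # Dampen sparsely-reviewed items toward a prior; constants are illustrative.
        return ((item["rating_sum"] + prior_mean * prior_weight)
                / (item["rating_count"] + prior_weight))

    SORTS = {
        "Average rating": plain_average,
        "Weighted rating (recommended)": weighted_average,   # the default
    }

    def sort_items(items, mode="Weighted rating (recommended)"):
        return sorted(items, key=SORTS[mode], reverse=True)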



