
I deal with a lot of business people whose processes rely on the 15th/85th or 25th/75th percentiles. They want to see the median, the low/high percentiles, and the max/min or outliers, and they don't want to see all the data points jittered in between. To them that is just overwhelming, extraneous information. They do in fact like tables with those numbers written down, but they want to compare ten different series (historical price time series for different markets) on one PowerPoint slide. The box plot allows a fast visual comparison of medians and other key percentiles (label the plot with the percentiles if you're doing something non-standard!). With jitter or violin plots they get hung up on weird random stuff and it derails meetings.

Important caveats: the generating processes for all these quantities are the same in a physical sense, so they are comparable. All the distributions are roughly lognormal-ish, so they are single-peaked, as folks are discussing here. The point of the visualization in these cases is not to understand the properties of the distribution per se; it's to show the important percentiles because they have business implications.
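
For anyone who wants to reproduce the table version of this, here is a rough sketch of the per-market summary using only the Python standard library. The market names and the prices are made up (synthetic lognormal draws), `percentile` and `summarize` are hypothetical helper names, and linear interpolation between ranks is just one of several percentile conventions, so treat it as an illustration rather than how any particular shop does it.

```python
import random


def percentile(values, p):
    """Percentile p (0-100) using linear interpolation between sorted ranks."""
    if not values:
        raise ValueError("need at least one value")
    xs = sorted(values)
    rank = (len(xs) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def summarize(values, low=15, high=85):
    """Return the five numbers a box plot with low/high whiskers would show."""
    return {
        "min": min(values),
        f"p{low}": percentile(values, low),
        "median": percentile(values, 50),
        f"p{high}": percentile(values, high),
        "max": max(values),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the synthetic data is repeatable
    # Hypothetical markets with made-up lognormal parameters (mu, sigma).
    markets = {"market_a": (3.0, 0.4), "market_b": (3.2, 0.3), "market_c": (2.8, 0.5)}
    for name, (mu, sigma) in markets.items():
        prices = [rng.lognormvariate(mu, sigma) for _ in range(500)]
        row = summarize(prices, low=15, high=85)
        cells = "  ".join(f"{key}={value:8.2f}" for key, value in row.items())
        print(f"{name}: {cells}")
```

The dictionary keys carry the percentile levels (`p15`, `p85`), which is the same idea as labeling the plot when the whiskers are non-standard.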



That’s a good explanation, thank you!


Who drew those boundaries at 15/85? What makes those boundaries useful or correct?


It sounds like they are business-relevant parameters. They are self-selected and independent of the data or distribution.

The point is that they are parameters of relevance to the observer.

I work in medicine and sometimes work with box plots for this reason. The question "what is the 25th percentile outcome?" is perfectly legitimate.



