> humans have to deal with the consequences of "dumb" actions (e.g. by looking t...

> humans have to deal with the consequences of "dumb" actions (e.g. by looking through their spam folders for false positives)

Email programs generally have a mechanism for reviewing email and changing the classification. I think your "pure-AI" phrase describes a system that doesn't have any mechanism for reviewing and adjusting the machine's classification. The fact that a spam message winds up in your inbox sometimes is probably that low-confidence human-in-the-loop process we've been talking about. I'm sure that the system errs on the side of classifying spam as ham, because the reverse is much worse. Why have two different interfaces for reading emails, one for reading known-ham and one for reviewing suspected-spam, when you can combine the two seamlessly?

Perhaps you've confused bad user interface decisions for bad machine learning system decisions. I'd like to see some kind of likelihood-spam indicator (which the ML system undoubtedly reports) rather than a binary spam-or-not, but the interface designer chose to arbitrarily threshold. I think in this case you should blame the user interface designer for thinking that people are stupid and can't handle non-binary classifications. We're all hip to "they" these days.