
The study, as described in the summaries, sounds very flawed.

1. They only tested 2 radiologists, and compared them to a single model. Thus the results don’t say anything about how radiologists in general perform against AI in general. The most generous thing the study can say is that 2 radiologists outperformed one particular model.

2. The radiologists were only given one type of image, and only for the patients that were missed by the AI. The summaries don’t say whether the test was blind. The study has 3 authors, all of whom appear to be radiologists, and it mentions that 2 radiologists looked at the AI-missed scans. This raises questions about whether the test was blind at all.

Giving humans data they know are true positives and saying “find the evidence the AI missed” is very different from giving a classification task to an AI model that was also trained to reduce false positives.

Humans are very capable of finding patterns (even if they don’t exist) when they want to find a pattern.

Even if the study was blind initially, trained human doctors would likely quickly notice that the data they are analyzing is skewed.

Even if they didn’t notice, humans are highly susceptible to anchoring bias.

Anchoring bias is a cognitive bias where individuals rely too heavily on the first piece of information they receive (the "anchor") when making subsequent judgments or decisions.

The skewed nature of the data has a high potential to amplify any anchoring bias.

If the experiment had controls, any measurement error resulting from human estimation errors could potentially cancel out (a large random sample of either images or doctors should be expected to have the same estimation errors in each group). But there were no controls at all in the experiment, and the sample size was very small. So the influence of estimation biases on the result could be huge.
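
To make that concrete, here’s a rough toy simulation (entirely made-up numbers, nothing from the study) of why a shared reader bias cancels out in a controlled comparison but lands directly in a single-arm estimate:

    # Toy sketch: a shared "anchoring" bias inflates a single-arm estimate,
    # but cancels when compared against a control arm read with the same bias.
    import random

    random.seed(0)

    true_rate = 0.70      # hypothetical detection rate without any bias
    reader_bias = 0.10    # hypothetical inflation from knowing every case is a true positive

    def simulate_arm(n_cases, rate):
        # fraction of cases "detected" in one reading arm
        return sum(random.random() < rate for _ in range(n_cases)) / n_cases

    n = 10_000
    study_arm = simulate_arm(n, true_rate + reader_bias)
    control_arm = simulate_arm(n, true_rate + reader_bias)

    print(f"single-arm estimate:    {study_arm:.3f}   (the bias shows up directly)")
    print(f"difference vs. control: {study_arm - control_arm:+.3f}  (the shared bias cancels)")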

From what I can read in the summary, these results don’t seem reliable.

Am I missing something?



They did NOT test radiologists. There were NO healthy controls. They evaluated the AI's false negative rate and used exclusively unblinded radiologists to grade the level of visibility and other features of the cancers.

The utility of the study is to evaluate potential AI sensitivity if it were used for mass, fully automated screening of mammography data. But it says NOTHING about the CRUCIAL false positive rate (no healthy controls) and NOTHING about AI vs. human performance.
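
For anyone less familiar with the terminology, these are the standard confusion-matrix definitions behind those terms (nothing specific to this study):

    # Standard confusion-matrix definitions (not study-specific):
    def sensitivity(tp, fn):
        # fraction of real cancers that get flagged; equals 1 - false negative rate
        return tp / (tp + fn)

    def specificity(tn, fp):
        # fraction of healthy cases correctly left alone; equals 1 - false positive rate
        return tn / (tn + fp)

    # With only confirmed-cancer cases in the dataset there are no true negatives or
    # false positives at all, so specificity / false positive rate cannot be estimated.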

See my main comment elsewhere in this thread.


Huh? I was commenting that there were no controls and the doctors were given skewed data, so any conclusions about AI ability vs. doctor ability seem misplaced. That seems to be what you just said… so I am confused about what I said that was inaccurate.

Can you clarify?

I also hinted at the fact that I only had access to the posted summary and the original linked article, and not the study. So if there is data I am missing… please enlighten me.


I was just reinforcing that point, as your comment was worded in a way that left room for doubt. Sorry if it came across as critical of you or as implying you held a different interpretation.


This article is about measuring how often an AI missed cancer by giving it data only where we know there was cancer.

> Am I missing something?

Yes. The article is not about AI performance vs human performance.

> Humans are very capable of finding patterns (even if they don’t exist) when they want to find a pattern

Ironic


The article has the headline "AI Misses Nearly One-Third of Breast Cancers, Study Finds".

It also has the following quotes:

1. "The results were striking: 127 cancers, 30.7% of all cases, were missed by the AI system"

2. "However, the researchers also tested a potential solution. Two radiologists reviewed only the diffusion-weighted imaging"

3. "Their findings offered reassurance: DWI alone identified the majority of cancers the AI had overlooked, detecting 83.5% of missed lesions for one radiologist and 79.5% for the other. The readers showed substantial agreement in their interpretations, suggesting the method is both reliable and reproducible."

So, if you are saying that the article is "not about AI performance vs human performance", that's not correct.

The article very clearly makes claims about the performance of AI vs the performance of doctors.

The study doesn't have the ability to state anything about the performance of doctors vs the performance of AI, because of the issues I mentioned. That was my point.

But the study can't state anything about the sensitivity of AI either, because it doesn't compare the sensitivity of AI-based mammography (X-ray) analysis with that of human-reviewed mammography. Instead it compares AI-based mammography with human-read DWI, where the humans knew the results were all true positives. It's both a different task ("diagnose" vs. "find a pattern to verify an existing diagnosis") and different data (X-ray vs. MRI).

So, I don't think the claims from the article are valid in any way. And the study seems very flawed.

Also, attempting to measure sensitivity without also measuring specificity seems doubly flawed, because there are very big tradeoffs between the two.
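
To illustrate that trade-off with a toy threshold model (made-up scores, nothing to do with this particular AI): any tuning that catches more true cancers also flags more healthy patients.

    # Toy illustration of the sensitivity/specificity trade-off (made-up data):
    # lowering the decision threshold raises sensitivity but lowers specificity.
    import random

    random.seed(1)

    # hypothetical "suspicion scores": cancers tend to score higher than healthy cases
    cancer_scores = [random.gauss(0.7, 0.15) for _ in range(1000)]
    healthy_scores = [random.gauss(0.4, 0.15) for _ in range(1000)]

    for threshold in (0.7, 0.5, 0.3):
        sensitivity = sum(s >= threshold for s in cancer_scores) / len(cancer_scores)
        specificity = sum(s < threshold for s in healthy_scores) / len(healthy_scores)
        print(f"threshold {threshold:.1f}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")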

Increasing sensitivity while also decreasing specificity can lead to unnecessary amputations. That's a very high cost. Also, studies have apparently shown that high false positive rates for breast cancer can lead to increased cancer risk, because they deter future screening.

Given that I don't have access to the actual study, I have to assume I am missing something. But I don't think it's what you think I'm missing.



