Fwd: PDF - www.pnas.org

Linda M. Isbell lisbell at PSYCH.UMASS.EDU
Wed Aug 24 01:56:54 UTC 2016


 

Hi Mark and others - 

Ok, I'll take a stab at this... 

Yes, this is a complicated piece methodologically and statistically, but
the implications you suggest (Mark) below are generally correct.  It is
a great paper and one that I am sure psychologists especially like, but
I think there are some important caveats to keep in mind if you were to
apply it in practice.  I'll describe them below and also try to
elaborate a bit on the methods/stats as I understand them.  

First, be careful not to draw wide-ranging conclusions about how this
might actually work in practice.  These were independent judgments from
a group of doctors (for whom meaningful individual scores of diagnostic
accuracy were available) who diagnosed and rated their confidence for
each of many images with correct/known diagnoses (101 radiologists for
the mammograms, 40 dermatologists for the skin images).  The images were
divided into different groups for different docs, so not everyone rated
all of them (due to the large number, I'm sure), and no one ever
communicated with anyone.  Following each diagnosis, docs rated their
confidence.

Virtual groups were created by randomly sampling up to 1000 groups for
each of the two types of images (breast and skin) AND for each of three
different group sizes (2 v. 3 v. 5 doctors).  So what they did is
essentially create a bunch of randomly selected groups of doctors by
repeatedly sampling from their "population" of doctors/diagnoses (for
these 6 "conditions" in what can be thought of as a 2 [skin v. breast
images] x 3 [group size: 2 v. 3 v. 5 doctors] design).  So they created
up to 6000 virtual groups (up to 1000 for each) - something I think is
really cool methodologically!

Each doctor got a sensitivity score (that is, the proportion of truly
positive cases identified as such) and a specificity score (that is, the
proportion of truly negative cases identified as such).  Youden's index
(J) is an accuracy measure that takes both of these into account and is
equal to (sensitivity + specificity) - 1.  The index ranges from -1 to
+1, where a score of 0 means that the proportion of people identified
with the disease is the same regardless of whether they actually have it
or not, and a score of +1 means the test is perfect (no false positives
or negatives).  For each pair of docs in a given group, the difference
in their accuracy was computed as ΔJ, based on all of the cases they
judged.  So, basically, when ΔJ is small, the docs have similar
accuracy. 
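
Just to make the arithmetic concrete, here is a tiny Python sketch with
made-up numbers (my own illustration, not the authors' code or data):

# Toy sketch: Youden's J and delta-J for two hypothetical docs.
def youdens_j(truth, calls):
    # truth/calls: 1 = disease present / "positive" call, 0 = absent / "negative" call
    tp = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 1)
    fn = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 0)
    tn = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 0)
    fp = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 1)
    sensitivity = tp / (tp + fn)   # proportion of true positives caught
    specificity = tn / (tn + fp)   # proportion of true negatives caught
    return sensitivity + specificity - 1

truth = [1, 1, 1, 1, 0, 0, 0, 0]   # hypothetical known diagnoses
doc_a = [1, 1, 1, 0, 0, 0, 0, 1]   # sens 0.75, spec 0.75 -> J = 0.50
doc_b = [1, 1, 0, 0, 0, 0, 1, 1]   # sens 0.50, spec 0.50 -> J = 0.00
delta_j = abs(youdens_j(truth, doc_a) - youdens_j(truth, doc_b))   # 0.50: a very dissimilar pair
print(delta_j)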

The "confidence rule" means that the doctor with the most confidence in
his/her diagnosis in any given group "wins" on a specific diagnostic
assessment - and that becomes the outcome/diagnosis for the group (and
that outcome is compared to the diagnosis of the best doctor in the
group - the one with the highest accuracy score based on all diagnoses
from all images rated).  So, regardless of group size, it turns out that
if you have a group of doctors who generally perform similarly well
across all of their diagnostic assessments, then going with the
diagnosis in any given case/image that is associated with doc who is
most confident with it will be best/most accurate.    For groups of 3 or
5 docs, if they have similar accuracy levels in general, then going with
the majority "vote" (diagnosis) is more accurate than the diagnosis of
the single best/most accurate doc in the group.  As you can see in
Figure 2 in the article, if docs aren't pretty similar in their overall
accuracy in a given group, then they are MUCH better off going with the
diagnosis of the best diagnostician in the group. 
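
For the curious, here is a similarly toy sketch (again mine, with
made-up diagnoses and confidence ratings, not from the paper) of how the
two rules would decide a single case for a group of three docs:

# Toy sketch of the two decision rules for one image and one group of 3 docs.
from collections import Counter

# Each tuple: (doc's diagnosis for this image, doc's confidence rating for this image)
group = [("cancer", 2), ("no cancer", 5), ("cancer", 3)]

# Confidence rule: go with the diagnosis of whichever doc is most confident on THIS case.
confidence_call = max(group, key=lambda doc: doc[1])[0]   # "no cancer"

# Majority rule (groups of 3 or 5): go with the diagnosis most docs gave.
majority_call = Counter(diagnosis for diagnosis, _ in group).most_common(1)[0][0]   # "cancer"

# In the paper, each group call is then scored against the known diagnosis and compared
# with the call of the single best doc in the group (the one with the highest overall J).
print(confidence_call, majority_call)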

SO that's how I read/understand all this.   The tricky part, I think,
about applying this to practice prior to more research is that these
were all independent judgments/diagnoses and accuracy scores were
computed for each doc based on a large number of images that each
evaluated.  This is how it was determined who the docs are that are
similar in accuracy to one another.  In the real world (everyday
practice), I am not sure you would actually know this - would you?  (I'm
an outsider here - a social cognition expert, not an MD or clinician). 
I am guessing you have a sense of who the good docs are who make good
judgments, but I wonder how much more info you need about their general
accuracy in order for the effects reported in this article to emerge in
real clinical practice (in a way that is CLINICALLY significant and not
just statistically significant)?  There is a noteworthy literature in
social psychology that demonstrates that group decisions can sometimes
lead to bad outcomes and in some cases to very good ones - the trick is
to figure out what those conditions are that take you one way or the
other.  If group members can truly add some expertise/understanding to a
problem, outcomes can improve.  However, much work suggests that groups
can lead to situations in which individuals are overly influenced by
others and thinking gets kind of stuck or overly influenced by some
ideas that may well be wrong (which can lead to confirmatory hypothesis
testing around those ideas if people engage in more discussion/thought,
and may ultimately lead to premature closure either with or without the
confirmatory hypothesis testing).  Of course much of this work also has
been done with group discussions and interactions - something that is
noticeably missing in the study reported in the PNAS article (but
appropriately, they do note this in their discussion). 

Overall, it seems that in diagnostic decisions that are relatively
dichotomous (as in this article - though I also wonder how many
decisions really are quite this dichotomous??  If there are few, then
more research is needed to see what happens when there are multiple
possibilities/diagnoses/outcomes), these simple decision rules (majority
and confidence rules) could work out well and be relatively efficient IF
one actually knows the diagnostic accuracy of the group members and
knows that they are similarly good.  Personally, I see that as kind of a
big if --- because if you are wrong about this - ugh - these decision
rules lead to MORE error than if you just went with the best doc! (Again
see figure 2).  I guess this is where I wonder most about applying this
in practice.   SO at the moment at least, this research looks very
promising to me for application down the road, but more work would be
needed to get there and feel confident that the rules actually do lead
to fewer errors in practice (and not to more errors... yikes!).  Plus
that whole issue of communication between docs seems extremely important
for practice too. 

All that said, I like the paper VERY much as an important contribution
to basic research with the strong potential to one day have applied
implications - but I don't think we are there yet.  

Very interested also in others' thoughts, 

Linda 

---

Linda M. Isbell, Ph.D.
 Professor, Psychology
 Department of Psychological and Brain Sciences
 University of Massachusetts
 135 Hicks Way -- 630 Tobin Hall
 Amherst, Massachusetts 01003
 Office Phone:  413-545-5960
 Website:  http://people.umass.edu/lisbell/ 

On 2016-08-23 12:17, graber.mark at GMAIL.COM wrote: 

> Thanks to Nick Argy for bringing this article to attention.   The methods and findings are a bit hard to follow, but if I understand things correctly, the article finds that diagnostic accuracy can be improved by second opinions or larger groups if the diagnosticians have similarly high skill levels, but that accuracy is degraded to the extent that the variability increases.  I'd really like to hear what others get out of this paper, because these findings have important implications for recommendations to move in the direction of getting more second opinions, or using the new group-based diagnosis approaches. 
> 
> Mark
> 