We are all Bayesians? The role of experience in hypothesis verification phase of diagnosis

Bruno, Michael mbruno at PENNSTATEHEALTH.PSU.EDU
Wed Jun 27 21:42:16 UTC 2018

Excellent points & eloquent discussion, Dr. Brush!  Thanks!!

I think that we often err in our attempts at using Bayesian reasoning precisely because we fall prey to this "spectrum bias." In essence, we are simply mis-estimating the pretest (prior) probabilities. It's not the math that fails, it is us.

And the problem is extremely common with regard to radiology tests--perhaps because the images are often so beautiful (at least I think they are), so we tend to "believe our own eyes," and over-estimate imaging's diagnostic power--in our minds grossly underestimating the actual false-negative rate.
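
As a rough illustration of how much disease can hide behind a negative image, here is a minimal Python sketch of the odds form of Bayes' rule applied to a negative result. The function names and the numbers (80% sensitivity, 95% specificity, 50% pretest probability) are assumptions chosen for illustration, not figures from this thread or from any study.

```python
def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR- = false-negative rate / true-negative rate."""
    return (1.0 - sensitivity) / specificity


def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Update a pretest probability with a likelihood ratio (odds form of Bayes' rule)."""
    if not 0.0 <= pretest <= 1.0:
        raise ValueError("pretest must be between 0 and 1")
    if pretest == 1.0:
        return 1.0  # certainty cannot be revised by any finite likelihood ratio
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical imaging test: 80% sensitive, 95% specific, used at a 50% pretest probability.
lr_negative = negative_likelihood_ratio(sensitivity=0.80, specificity=0.95)
residual = post_test_probability(pretest=0.50, likelihood_ratio=lr_negative)
print(f"Probability of disease despite a negative study: {residual:.1%}")
```

Under these assumed numbers, roughly 17% of patients with a negative study would still have the disease, which is easy to underestimate when the images look convincing.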

We all WANT to be Bayesians, at least in spirit, but, as the scripture says, 'the spirit is willing, but the flesh is weak!'

All the best,


Michael A. Bruno, M.D., M.S., F.A.C.R.

Professor of Radiology & Medicine

Vice Chair for Radiology Quality & Safety

Chief, Division of Emergency Radiology

Penn State Hershey - Milton S. Hershey Medical Center

500  University Drive  |  Hershey, PA

email: mbruno at pennstatehealth.psu.edu

From: John Brush <jebrush at ME.COM>
Sent: Wednesday, June 27, 2018 7:29 AM
Subject: Re: [IMPROVEDX] [No SPF Record] Re: [IMPROVEDX] The role of experience in hypothesis verification phase of diagnosis

I think this is why physicians almost all choose to use subjective probabilities, rather than pulling out a nomogram or calculator to compute post-test probabilities. They intuitively take into account the probability and relative strength of a test result, but also the potential consequences of making/missing a particular diagnosis. Still, I think that we can calibrate our intuition with some simple numbers derived from clinical epidemiology that will help us avoid common fallacies. We can’t just wing it. We need to apply the science of medicine to individual patients and there are ways to do that.

It is important to remember two potential pitfalls: base-rate neglect and spectrum bias. It would be foolish to test for pheochromocytoma every time you see a patient with newly diagnosed hypertension. Persistence and follow-up allow recognition of patients with refractory hypertension, weight loss, orthostatic hypotension, tachycardia, and other clues where pheochromocytoma may be a plausible diagnosis, but screening up front would be a strategy for generating false-positive tests.
Spectrum bias occurs when you define the sensitivity and specificity of a test in one setting and use the test in a different setting with a lower prevalence of disease. Most people think that the operating characteristics of a test are fixed characteristics of the test itself, but as Alvin Feinstein showed a long time ago, they are not. If you subsequently use a test in a setting where the prevalence of disease is lower, your false positive rate goes up, decimating your specificity. Your positive likelihood ratio (true positive rate/false positive rate, or hit rate/false alarm rate) plummets. It’s no wonder that we don’t believe troponin levels anymore. They are sent off on everyone who comes through the door, regardless of pre-test probability.
There is a science to the art of medicine.
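
The base-rate point can be sketched with a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, and the 95% sensitivity and specificity are made-up values held fixed across settings. Holding them fixed isolates the effect of the pretest probability alone; spectrum bias, as described above, would shift the test characteristics as well.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability of disease given a positive result, from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Same hypothetical test (95% sensitive, 95% specific) applied at falling pretest probabilities.
for prevalence in (0.5, 0.1, 0.01, 0.001):
    ppv = positive_predictive_value(0.95, 0.95, prevalence)
    print(f"pretest {prevalence:>6.1%} -> post-test after a positive result {ppv:.1%}")
```

With these assumed numbers, a positive result means about a 95% chance of disease at a 50% pretest probability, but only about 2% at a pretest probability of 0.1%. That is the arithmetic behind not testing everyone who comes through the door.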

John E. Brush, Jr., M.D., FACC
Professor of Medicine
Eastern Virginia Medical School
Sentara Cardiology Specialists
844 Kempsville Road, Suite 204
Norfolk, VA 23502
Cell: 757-477-1990
jebrush at me.com

On Jun 26, 2018, at 4:59 PM, Rory Jaffe <rjaffe at CHPSO.ORG> wrote:

This discussion illustrates the general weakness of simple mathematical models for diagnosis. Prioritization is not based on raw likelihood alone (see Peter’s excellent discussion): time and severity are major factors, as is the reversibility of changes not caught beforehand (e.g., the implications of an abdominal aortic aneurysm that may rupture vs a herniated disc that may threaten to press on the cauda equina vs poor lifting habits causing chronic muscle injuries). And common sense needs to be used, as no model can be comprehensive enough to always produce a reasonable prior probability. I would hope that a model would at least handle sex-specific diagnoses, but even there you could be fooled if the patient’s current identified sex (e.g., female) is different from the one she was born with: she may still get prostate cancer.

Models are still useful if sophisticated enough to include these considerations, and statistical analysis of a patient’s probabilities is still useful, but comparing simple odds, whether using Bayesian or frequentist methods, is not.


From: Sittig, Dean F <Dean.F.Sittig at UTH.TMC.EDU>
Sent: Tuesday, June 26, 2018 11:06 AM
To: IMPROVEDX at LIST.IMPROVEDIAGNOSIS.ORG
Subject: [No SPF Record] Re: [IMPROVEDX] The role of experience in hypothesis verification phase of diagnosis

The power of the Bayesian method is in helping put the evidence into context. Take your example of a positive pregnancy test, which I would venture carries a likelihood ratio of 10 or more. If your patient, the purported source of the sample, is male, then the a priori probability of pregnancy is zero. Therefore, using Bayes’ formula, one should treat the positive pregnancy result as most likely an error rather than the first case of male pregnancy in the history of mankind.
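
Written out in the odds form of Bayes' rule, the argument looks like this. It is a sketch only; the symbols $p$ (pretest probability) and $LR$ (likelihood ratio of the positive result) are introduced here for illustration.

```latex
\[
\text{post-test odds} = \frac{p}{1-p} \times LR,
\qquad
p = 0 \;\Rightarrow\; \text{post-test odds} = 0 \times LR = 0
\quad \text{for any finite } LR .
\]
```

No finite likelihood ratio can rescue a hypothesis whose prior probability is truly zero, so the positive result has to be explained some other way.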

Sent from my iPhone

On Jun 26, 2018, at 10:34 AM, Jain, Bimal P.,M.D. <BJAIN at PARTNERS.ORG> wrote:
Thanks Dr. Elias for your comments.

I agree in general with your description of the diagnostic process in an office setting.
The point I am making in my hypothesis verification paper is that if a highly informative test result with likelihood ratio greater than 10 is observed in a patient, then the definitive diagnosis of a disease in this patient is validated by our experience of the accuracy of this diagnosis in practically every patient seen by us regardless of prior probability.

For example, if we suspect hypothyroidism in a patient with fatigue seen in the office and find an elevated TSH, the definitive diagnosis of hypothyroidism would be validated by our experience of the accuracy of this diagnosis in other patients seen by us in the past, regardless of their prior probability.

I would like to point out the correct technical meaning of the term ‘likely’ employed in your opening sentence by substituting ‘disease’ for ‘hypothesis’ and ‘test result’ for ‘evidence’.
The likelihood of a disease given a test result is proportional to the probability of a test result given the disease.

For example, the likelihood of acute MI given acute EKG changes is proportional to the probability of acute EKG changes given acute MI.

Thus it is customary to speak of how likely acute MI is, given acute EKG changes.

As far as I know, the term likely or likelihood is not used to refer to frequency of a test result given a disease.

The correct term would be probability of test result given a disease.

In Bayesian analysis, we are assessing probability of a disease given the test result.
The likelihood of a disease given the test result is known to us, and it is employed as part of the likelihood ratio in Bayesian analysis.
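
In symbols, with $D$ for the disease and $T$ for the test result (notation introduced here for illustration only), the relationships described above can be sketched as:

```latex
\[
L(D \mid T) \propto P(T \mid D),
\qquad
LR = \frac{P(T \mid D)}{P(T \mid \neg D)},
\qquad
P(D \mid T) = \frac{P(T \mid D)\, P(D)}{P(T \mid D)\, P(D) + P(T \mid \neg D)\, P(\neg D)} .
\]
```

The likelihood and the likelihood ratio contain no prior term. The prior $P(D)$ enters only in the last expression, the probability of the disease given the test result.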


From: Elias Peter [mailto:pheski69 at GMAIL.COM]
Sent: Wednesday, June 20, 2018 9:07 PM
Subject: Re: [IMPROVEDX] The role of experience in hypothesis verification phase of diagnosis

I have several comments to make.

First, I see Bayesian assessments as telling us how likely the evidence would be given a hypothesis, not how likely the hypothesis is given the evidence. A recent article offered a wonderful analogy I wish I had heard years ago: given a dog (the hypothesis) we can be pretty certain there are four legs (the evidence), but given four legs (the evidence) we need much more information to know if we are dealing with a dog, camel, or turtle.
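
The dog-and-legs analogy can be made concrete with a small Python sketch. Everything numeric here is a made-up assumption for illustration: the prior mix of animals, the chance each has four legs, and the chance each barks. The function name is hypothetical as well.

```python
def posterior(priors: dict, likelihoods: dict) -> dict:
    """Bayes' theorem over competing hypotheses: P(h | e) is proportional to P(e | h) * P(h)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("the evidence is impossible under every hypothesis")
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical prior mix of animals in the neighborhood.
priors = {"dog": 0.6, "camel": 0.1, "turtle": 0.3}

# P(evidence | animal): four legs is nearly certain for all three, barking is not.
p_four_legs = {"dog": 0.99, "camel": 0.99, "turtle": 0.99}
p_barks = {"dog": 0.9, "camel": 0.001, "turtle": 0.001}

print(posterior(priors, p_four_legs))  # four legs leaves the priors essentially unchanged
print(posterior(priors, p_barks))      # barking points strongly to the dog
```

Four legs is almost certain given a dog, yet observing four legs tells us nothing beyond the priors because it is equally likely under every candidate. Evidence only moves the estimate when it is more likely under one hypothesis than under the others.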

Second, I think it is a mistake - or at least, too narrow a framing - to see diagnosis as an event rather than a process. This may reflect my 40 years in primary care, as I have noticed during my career that clinicians who work in settings like intensive care units or acute trauma centers have a very different process when caring for undiagnosed patients. In primary care (and in many specialties that deal mostly with chronic illness) it is often best and most efficient to make a diagnosis over time rather than during a single encounter. In this setting, it is unusual to be in the position of having a patient with a clinical snapshot (I don’t say picture, because it really IS a snapshot, obtained relatively quickly and in a limited context) and then having to make a diagnosis based on a test result.

Third, at least in primary care, a very substantial number of diagnoses are made by a clinical picture over time (natural history). We don’t have a diagnostic ‘test’ for anxiety, depression, most of the causes of low back pain, most of the causes of headache, most of the causes of fatigue… In these settings the process is something like this:

•    Are there any things I can’t miss, right now, in this visit, without putting the patient at immediate risk?
     o    If so, what can I do to determine their presence or absence?
     o    If not, how do I remember this list and refer back to it if the picture changes?
•    What are the most likely causes of what I am seeing and hearing?
•    Are there any likely causes of what I am seeing and hearing that I can easily and efficiently prove or disprove?
•    Have a conversation with the patient about the diagnostic possibilities, the degree of certainty and uncertainty.
•    Ascertain what part of the clinical picture the patient is most concerned about. (Some patients want a diagnosis and some want treatment and some want both, depending on the setting.)
•    Given that we have a collection of possibilities of varying severity and frequency and likelihood and we know what the patient’s preferences are (because we asked her and listened), what is a reasonable approach to managing the problem, including but not limited to:
     o    Work on diagnosis, hold off on treatment?
     o    Work on diagnosis, treat symptoms?
     o    Treat symptoms and observe the course. (Here natural history is ‘the test’ we are using, but it doesn’t have a ‘result’ in the sense of Q waves or blood sugar or potassium.)
     o    Not treat symptoms and observe the course.
     o    Trial and error - treat something and see if it works.
Very few of my patients with back pain or headaches or fever have any ‘tests’ done. In primary care, depending on the patient context, fatigue may be best diagnosed by history and a brief exam - though there are settings where tests are clearly essential. I doubt that more than 2 patients a day needed a ‘test’ in the sense that is being discussed in these threads. (My daily volume was 18-20 on a bad day, 16 on a good day.)

My point here is that framing the diagnostic process around how one interprets a test result considers a very limited piece of the diagnostic universe. It is important when it is germane - when I present to an ED poorly responsive and hypotensive, I want the test results to be properly and quickly evaluated. However, from my PCP, patient, and caregiver perspectives I think this is a tiny part of the diagnostic universe and not easily generalized across the broad landscape of medicine. I am much more interested in and concerned about ways to improve the diagnostic process in the 90% (or more) of circumstances where the results of ‘a test’ are unlikely to be definitive. Of course, that reflects my 40 years in the primary care front line where ambiguity is part of the air we breathe.


On 2018.06.20, at 12:46 PM, Jain, Bimal P.,M.D. <BJAIN at PARTNERS.ORG> wrote:

In the attached paper, I discuss how experience plays an important role in validating the verification of a diagnostic hypothesis by a test result. As our experience is gained from a heterogeneous population of patients with varying prior probabilities, this validation is represented by a confidence and not by a Bayesian argument.

Please review and comment on this paper.


Bimal P Jain MD
Northshore Medical Center
Lynn MA 01904.



Visit the searchable archives or adjust your subscription at: http://list.improvediagnosis.org/scripts/wa-IMPDIAG.exe?INDEX

Moderator: David Meyers, Board Member, Society to Improve Diagnosis in Medicine

<The role of experience in the hypothesis verification phase of diagnosis.pdf>
