Clinical Prediction Rules: a little primer

Clinical Prediction

How do we judge whether a patient is likely to:

 

a) have the condition/dysfunction we think they have?
b) respond in a meaningful and positive way to a chosen intervention?
c) get better within a particular time-period?

 

Clinical Prediction Rules (CPR)

Algorithmic tools developed to assist clinical decision-making by translating data from original studies into probabilistic statistics.

 

  • Diagnostic CPR (DCPR)

 

  • Interventional CPR (ICPR)

 

  • Prognostic CPR (PCPR)

 

CPR Quality

 

Grading scale: IV (poor) to I (best) (McGinn et al, JAMA, 2000;284:79-84)

IV: Derivation only
III: Validation in a narrow population
II: Validation in a broad population
I: Impact analysis

 

 

Derivation

  • Data derived from primary study (ideally RCTs, but usually prospective cohort studies).
  • Need to clearly define target conditions, reference standards, and potential predictor variables.
  • Need dichotomous outcomes (e.g. “condition present or absent” / “treatment successful or unsuccessful” / “condition persistent or not-persistent”).
  • Study needs to report PROPORTIONS.

 

Validation

  • Separate study with either a narrow or broad population (ideally RCT, ideally block-randomisation).
  • Separate study subjects and therapists to primary study.
  • To confirm that predictor variable effects are not due to chance.

 

Impact analysis

  • To determine meaningful impact on clinical practice (ideally RCT)
  • Multi-centre
  • Is rule implemented?
  • Does it maintain improved care?

 

Formal quality assessment

DCPR: no validated formal assessment tool

ICPR: 18-item tool.  Beneciuk et al, Phys Ther,2009; 89:10-11

PCPR: 18-item tool. Kuijpers et al, Pain, 2005;109:429-430

 

Statistics of interest

The whole world can be represented by a “2 x 2” contingency table!

                            Ref Standard +ve /     Ref Standard -ve /
                            Outcome +ve            Outcome -ve
Test +ve / Control group    a (TP)                 b (FP)                 a+b (TP+FP)
Test -ve / Rx group         c (FN)                 d (TN)                 c+d (FN+TN)
Totals                      a+c (TP+FN)            b+d (FP+TN)

 

 

 

Diagnosis / Intervention

SENSITIVITY (“TP rate”) = a/(a+c)    (SnNOut: a highly Sensitive test, when Negative, rules Out)

SPECIFICITY (“TN rate”) = d/(b+d)    (SpPIn: a highly Specific test, when Positive, rules In)

LIKELIHOOD RATIO + = sensitivity/(1-specificity)

LIKELIHOOD RATIO – = (1-sensitivity)/specificity
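For those who like to see the arithmetic in one place, here is a minimal Python sketch of the four formulae above (illustrative only; the cell counts are invented):

```python
# Minimal sketch of the diagnostic 2x2 arithmetic.
# a = true positives, b = false positives, c = false negatives, d = true negatives.

def diagnostic_stats(a, b, c, d):
    sensitivity = a / (a + c)   # "TP rate"
    specificity = d / (b + d)   # "TN rate"
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "LR+": sensitivity / (1 - specificity),
        "LR-": (1 - sensitivity) / specificity,
    }

# Invented counts, purely for illustration:
print(diagnostic_stats(a=18, b=5, c=9, d=12))
```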

 

 

 

Probability shifts:

 

LR+

1 – 2: small, unimportant

2-5: small but possibly important

5-10: moderate

>10: large, possibly conclusive

 

LR-

0.5 – 1: small, unimportant

0.2 – 0.5: small but possibly important

0.1 – 0.2: moderate

<0.1: large, possibly conclusive

 

Intervention only:

CONTROL EVENT RATE (CER): the number of Control Group people with a +ve outcome divided by the total number of Control Group people, i.e. a/(a+b)

 

EXPERIMENTAL EVENT RATE (EER): the same for the Rx Group, i.e. c/(c+d)

 

RELATIVE RISK, or RISK RATIO (RR): RR = EER/CER (a RR of 1 means there is no difference between groups; >1 means increased rate of outcome in Rx group, and <1 means less chance of outcome)

 

ABSOLUTE RATE REDUCTION (ARR): ARR = CER – EER

 

RELATIVE RISK/RATE REDUCTION (or increase!) (RRR): RRR = (CER-EER)/CER

 

NUMBER NEEDED TO TREAT (NNT): NNT = 1/ARR
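To make the chain of formulae concrete, here is a minimal Python sketch (mine, not from the source text) computing all of the above from the four cells of the table:

```python
# Minimal sketch of the intervention arithmetic.
# Cell labels as in the 2x2 table: a = Control +ve, b = Control -ve,
# c = Rx +ve, d = Rx -ve.

def intervention_stats(a, b, c, d):
    cer = a / (a + b)          # control event rate
    eer = c / (c + d)          # experimental event rate
    arr = cer - eer            # absolute rate reduction (negative = increase)
    return {
        "CER": cer,
        "EER": eer,
        "RR": eer / cer,       # relative risk / risk ratio
        "ARR": arr,
        "RRR": arr / cer,      # relative risk/rate reduction (or increase)
        "NNT": 1 / abs(arr),   # magnitude of 1/ARR, rounded up in practice
    }
```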

 

Ratios:

EXPERIMENTAL EVENT ODDS (EEO): c/d

 

CONTROL EVENT ODDS (CEO): a/b

 

ODDS RATIO (OR): EEO/CEO

 

(The greater above 1, the better)
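The odds versions follow the same pattern; a small illustrative sketch using the same cell labels:

```python
# Odds counterparts of the rates above.
def odds_stats(a, b, c, d):
    eeo = c / d    # experimental event odds
    ceo = a / b    # control event odds
    return {"EEO": eeo, "CEO": ceo, "OR": eeo / ceo}  # odds ratio
```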

 

 

EFFECT SIZE = [(mean score of group 1) – (mean score of group 2)] / SD (of either group, or even pooled data)
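As a sketch of that calculation, here is the pooled-SD variant (just one of the options the formula allows); the numbers are borrowed from the worked neck-pain example further down this page:

```python
import math

# Effect size: difference in group means divided by an SD.
# Pooled SD shown here; the text allows either group's SD instead.
def effect_size(mean1, mean2, sd1, sd2, n1, n2):
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# 28deg vs 16deg, SD 4 in both groups, n = 23 and 21:
print(effect_size(28, 16, 4, 4, 23, 21))  # 3.0
```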

 

Other tests:

Test / Statistic: Purpose (outcome statistic)

T-test: difference between 2 groups (p-value)

Chi-squared: frequency of observations (p-value)

Receiver Operating Characteristic (ROC) curve: identifies the score which maximises TPs and minimises FNs (Youden’s J) – see the sketch below

Logistic Regression: identifies predictor cut-off points, and predictor clusters (beta-values (Expβ))

Recursive Partitioning: repeated sub-group analysis to identify best-fit patients (index of diversity)

Confidence Intervals: describe precision (variance) (%)
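To show what the ROC procedure does mechanically, here is a minimal sketch: scan candidate cut-off scores and keep the one maximising Youden’s J = sensitivity + specificity – 1. The scores and labels are invented for illustration:

```python
# Brute-force ROC-style cut-off search maximising Youden's J.
def best_cutoff(scores, labels):
    """labels: 1 = condition present, 0 = absent (per the reference standard)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1   # Youden's J
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Invented data: test scores and reference-standard labels.
print(best_cutoff([2, 3, 5, 6, 7, 8], [0, 0, 1, 0, 1, 1]))
```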

 

Examples of Lower Quadrant CPRs

 

Diagnosis (Medical Screening – DCPR)

 

Target Condition:  Deep vein thrombosis (lower limb).

 

Test:  Wells’ Score (Wells et al, J Intern Med, 1998;243:15-23)

 

Quality: Level  I (impact analysis: 10 relevant acceptable quality associated studies)

 

Test details (predictor variables)

 

  1. Active cancer: 1
  2. Paralysis, paresis, or recent plaster immobilisation of the lower extremity: 1
  3. Recently bedridden for >3 days and/or major surgery within 4 weeks: 1
  4. Localised tenderness along the distribution of the deep venous system: 1
  5. Thigh and calf swollen: 1
  6. Calf swelling 3cm > asymptomatic side (measured 10cm below tibial tuberosity): 1
  7. Pitting oedema (symptomatic leg only): 1
  8. Dilated superficial veins (non-varicose) in symptomatic leg only: 1
  9. Alternative diagnosis as or more likely than DVT: -2

 

Test scoring

≤ 0 points: Low risk (6% probability of DVT)

1 or 2 points: Moderate risk (28% probability of DVT)

≥ 3 points: High risk (73% probability of DVT)
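To show how such a scoring rule works mechanically, here is a hedged Python sketch: sum the item points, then map the total to the risk bands above. The dictionary keys are my own abbreviations of the listed variables; this illustrates the arithmetic only and is not a clinical tool.

```python
# Illustrative Wells-style scorer (keys are abbreviated item names, not official).
WELLS_POINTS = {
    "active_cancer": 1,
    "paralysis_paresis_or_cast": 1,
    "bedridden_or_recent_major_surgery": 1,
    "deep_vein_tenderness": 1,
    "thigh_and_calf_swollen": 1,
    "calf_swelling_3cm": 1,
    "pitting_oedema": 1,
    "dilated_superficial_veins": 1,
    "alternative_diagnosis_as_likely": -2,
}

def wells_risk(findings):
    score = sum(WELLS_POINTS[f] for f in findings)
    if score <= 0:
        return score, "low risk (6% probability of DVT)"
    if score <= 2:
        return score, "moderate risk (28% probability of DVT)"
    return score, "high risk (73% probability of DVT)"

print(wells_risk({"active_cancer", "pitting_oedema", "calf_swelling_3cm"}))
# (3, 'high risk (73% probability of DVT)')
```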

 

Reference standard(s): Plethysmography and venography

 

Study parameters

Inclusion:

Signs and symptoms for < 60 days

 

Exclusion:

Previous DVT or PE

Renal insufficiency

PE suspected

Pregnancy

Anticoagulation treatment for >48 hours

Below-knee amputation

Strong alternative diagnosis

 

 

Bottom Line

Best quality CPR (Level I): Recommended for clinical use within confines of study parameters.

 

 

 

 

 

 

 

Diagnosis (Orthopaedic Diagnosis – DCPR)

 

Target condition: Lumbar/buttock/leg pain arising from the sacroiliac joint

 

Test: 6-item predictor cluster based on physical examination responses (Laslett et al, Man Ther, 2005;10:207-218)

 

Quality: Level IV (derivation only (poor quality), no validation, no impact analysis); no regression analysis, no recursive partitioning.

 

Test details (predictor variables)

 

  1. Positive SIJ compression test
  2. Positive SIJ distraction test
  3. Positive femoral shear test
  4. Positive sacral provocation
  5. Positive right Gaenslen’s test
  6. Positive left Gaenslen’s test

 

Test scoring

3 or more predictor variables present = LR+ 4.3 (95% CI 2.3–8.6)

 

Reference standard(s): fluoroscopy-guided provocative SIJ injection

 

Study parameters

Mean age 42 (+/- 12.3)

Mean symptom duration (months) 31.8 (+/- 38.8)

 

Inclusion:

Buttock pain +/- lumbar/leg pain

Had imaging

Unsuccessful previous therapeutic interventions

 

Exclusion:

Mid-line or symmetrical pain above L5

Nerve root compression signs

Referred for non-SIJ injection

Too frail for manual therapy

Pain free on day of assessment

Bony obstruction to injection

 

Bottom Line:

Not validated – study findings could be due to chance. Small probability-shift power of LR+. Not recommended for clinical use.

Interventional (ICPR)

 

Target condition: Acute low back pain, manipulation

 

Test: 5-item predictor cluster (Flynn et al, Spine, 2002;27:2835-2843)

 

Quality: Level II (broad validation, no impact analysis. 4 associated high quality validation studies)

 

Test details (Predictor variables)

  1. No pain below knee
  2. Onset ≤ 16 days ago
  3. Lumbar hypomobility
  4. Medial hip rotation > 35deg (either hip)
  5. Fear Avoidance Belief Questionnaire (Work subscale) <19

 

 

Test Scoring

4 or more predictor variables present = LR+ 24.4 (95% CI 4.6 – 139.4)

 

Reference Standard(s) (i.e. definition of success) 

50% or more improvement on modified Oswestry Disability Index

 

Study parameters

Mean age 37.6 (+/- 10.6)

 

Inclusion:

Lumbosacral physiotherapy diagnosis

Pain +/- numbness lumbar/buttock/lower extremity

Modified ODI score ≥ 30%

 

Bottom Line

Due to validation and the very large probability-shift power of LR+, this CPR is recommended for clinical use within the confines of the study parameters.

 

Prognostic (PCPR)

 

Target condition: LBP, recovery

 

Test: 3-item predictor cluster (Hancock et al, Eur J Pain, 2009;13:51-55)

 

Quality: Level III (narrow validation, no impact analysis)

 

Test details (Predictor variables)

  1. Baseline pain ≤ 7/10
  2. Duration of symptoms  ≤5 days
  3. Number of previous episodes ≤1

 

Test Scoring

All 3 predictor variables present = 60% chance of recovery at 3 weeks, 95% chance of recovery at 12 weeks.

 

Reference Standard(s) (i.e. definition of success)

7 days of pain scored at 0–1/10

 

Study parameters

Mean age 40.7 (+/- 15.6)

 

Inclusion:

LBP (between 12th rib and buttock crease) +/- leg pain

Seen GP

< 6 weeks duration

Moderate pain and disability (SF-36)

 

Exclusion:

No pain-free period of at least 1 month before the current episode

Known/suspected serious pathology

Nerve root compression

Taking NSAIDs

Receiving spinal manipulative therapy

Surgery within preceding 6 months

Contraindications to analgesics or manipulation.

 

 

Bottom Line

Due to validation, careful application to clinical practice is recommended, strictly within the confines of the study parameters.

 

 

 

 

Ref: Glynn PE, Weisbach PC 2011 Clinical Prediction Rules: A Physical Therapy Reference Manual. JBL Publishing


How to turn “Stats” into something useful: Diagnosis and Interventions

How to turn “Stats” into something useful 1: Diagnosis

Understanding diagnostic utility

If you have data for diagnostic utility studies, you can use a 2×2 contingency table to calculate the following information (this should have been reported anyway in the study, but often isn’t):

 

                      Gold standard
                      +ve         -ve
Clinical test +ve     a (TP)      b (FP)
Clinical test -ve     c (FN)      d (TN)

 

 

 

Sensitivity (“TP rate”) = a/(a+c)

Specificity (“TN rate”)= d/(b+d)

Likelihood ratio + = sensitivity/(1-specificity)

Likelihood ratio – = (1-sensitivity)/specificity

You can then use something like a nomogram to calculate post-test probability. You will need an estimate of pre-test probability; ideally, this will be the known prevalence of the condition.

Or, get yourself an app on your phone like MedCalc3000 https://itunes.apple.com/us/app/medcalc-3000-ebm-stats/id358054337?mt=8
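If you want to see the arithmetic a nomogram performs, it is only a few lines; a minimal sketch (the 30% pre-test probability and LR of 5 are invented for illustration):

```python
# Pre-test probability -> odds, multiply by the likelihood ratio, back to probability.
def post_test_probability(pre_test_prob, lr):
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# e.g. 30% pre-test probability and an LR+ of 5:
print(post_test_probability(0.30, 5))  # ~0.68
```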

How to turn “Stats” into something useful 2: Interventions  

If a trial or systematic review is reporting DICHOTOMOUS outcomes, we can bring the “research” findings a little bit closer to clinical decision making… “Do you know HOW MANY subjects responded, and HOW they responded? e.g.  how many people in the TREATMENT group got better/worse, and the same for the control/placebo group?”

NO: then you can’t clinically apply findings. Doh.

 

YES: then go and do some evidence-based decision making! Yippee.

 

 

 

Wow, how do we do that? Like this:

1)   Use a 2×2 table (again)

                        Outcome
                        +ve    -ve
Control/Placebo group   a      b
Rx group                c      d

 

 

 

2)   And some simple formulae…

CONTROL EVENT RATE (CER): the number of Control Group people with a +ve outcome divided by the total number of Control Group people, i.e. a/(a+b)

EXPERIMENTAL EVENT RATE (EER): the same for the Rx Group, i.e. c/(c+d)

Now that we know the CER and EER, we can do loads of other useful things…

RELATIVE RISK, or RISK RATIO (RR): RR = EER/CER (a RR of 1 means there is no difference between groups; >1 means increased rate of outcome in Rx group, and <1 means less chance of outcome)

ABSOLUTE RATE REDUCTION (ARR): ARR = CER – EER

RELATIVE RISK/RATE REDUCTION (or increase!) (RRR): RRR = (CER-EER)/CER

NUMBER NEEDED TO TREAT (NNT): NNT = 1/ARR

Some other stats more USEFUL than “p-values”…

 

1)   THINK LIKE A BOOKIE..!

“What are the odds of getting this person better with this treatment?”

EXPERIMENTAL EVENT ODDS (EEO): c/d

CONTROL EVENT ODDS (CEO): a/b

ODDS RATIO (OR): EEO/CEO

The greater above 1, the better.

2)   EFFECT SIZE.

This is a standardised, scale-free measure of the relative size of the effect of an intervention. It is particularly useful for quantifying effects measured on unfamiliar or arbitrary scales and for comparing the relative sizes of effects from different studies.

EFFECT SIZE = [(mean score of group 1) – (mean score of group 2)] / SD (of either group, or even pooled data)

 

EXAMPLE: A study into the effects of manual therapy on neck pain measured a Rx group (n=23) and a Control group (n=21), and considered the “cut-off” point for improved ROM to be an increase of at least 20deg rotation (so results can be dichotomised). Mean Rx score = 28deg (SD 4); mean Control score = 16deg (SD 4). Results were:

 

                        Outcome
                        +ve         -ve         Total
Control/Placebo group   a = 9       b = 12      a+b = 21
Rx group                c = 18      d = 5       c+d = 23
Total                   a+c = 27    b+d = 17    a+b+c+d = 44

CER a/(a+b): 0.43 or 43%

EER c/(c+d): 0.78 or 78%

RR = EER/CER: 1.83

ARR = CER – EER: -0.35 or 35%

RRR = (CER-EER)/CER: -0.83 or 83% (a minus figure, so this would be RR Increase)

NNT = 1/ARR: 2.86, so say 3.

EEO: c/d = 3.6

CEO: a/b = 0.75

OR: EEO/CEO = 4.8

EFFECT SIZE = (28 – 16)/4 = 3
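As a cross-check, the whole example can be re-run in a few lines of Python (same cell values as the table above):

```python
# Re-running the worked example: a=9, b=12, c=18, d=5.
a, b, c, d = 9, 12, 18, 5
cer = a / (a + b)           # 0.43
eer = c / (c + d)           # 0.78
rr = eer / cer              # 1.83
arr = cer - eer             # -0.35 (negative: rate increase with Rx)
rrr = arr / cer             # -0.83
nnt = 1 / abs(arr)          # ~2.8, so say 3
eeo, ceo = c / d, a / b     # 3.6 and 0.75
print(rr, nnt, eeo / ceo)   # OR = 4.8
```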

 

The clinical  story then…

“So, if untreated, my patient would have a 43% chance of getting better anyway. But if treated, his chance of improvement would be 78%. He is 1.83 times more likely to improve if I treat him. The absolute benefit of being treated would be 35%. The treatment would increase the chance of improvement by 83%. I would need to treat 3 people (in the period of time relevant to the study) to achieve 1 positive outcome. The odds of him getting better with no treatment are 0.75, whereas if I treat him, the odds are much better, at 3.6. The odds are therefore 3.6:0.75 (i.e. 4.8) in favour of him improving with treatment.”

However… From a clinical reasoning perspective, we still need to understand what “43%, 78%, 35%, etc etc” MEANS… and that’s where the real fun starts:-)

Here’s a little video for y’all: http://www.youtube.com/watch?v=tsk788hW2Ms


Should cervical manipulations be abandoned?

TRANSFER FROM POSTEROUS: ORIGINALLY POSTED ON POSTEROUS JUNE 8, 2012

This is really for physiotherapists, but please feel free to read whoever you are.

The Chartered Society of Physiotherapy (CSP) today reported on a British Medical Journal (BMJ) article by Wand et al about the practice of cervical manipulations.

The news report can be found here, and the original article can be found here.

There is a formal rapid response to the paper found here, which is from a group of physiotherapists.
This brief blog represents my opinions and not necessarily those of the co-authors of the response.

Wand et al conclude with a proposition that “manipulation of the cervical spine should be abandoned”, and call on professional bodies to adopt this stance as formal policy. This is a strong conclusion which you would hope would be based on firm and conclusive data. However, this is not the case. The data on risk, by the BMJ paper authors’ own admission, are inconsistent and provide nothing but uncertainty. The best of the data suggest that the rate of adverse events from manipulation is extraordinarily low. The benefits of the intervention are, at worst, comparable with alternatives. The maths is simple at this level: extremely low risk versus likely benefit. If the logic used to argue for the abandonment of manipulation in the BMJ paper were taken seriously (which is very unlikely), then most interventions in medicine and allied therapies would be scrapped. Take, for example, oral contraceptives: a risk of thrombosis (much higher than manipulation risks) versus comparable benefit. Let alone non-medical interventions, like hotel hair-dryers, coffee, pencils, and tampons.

Anyway, these are old and tired arguments and not really central to the issues of the current paper and news story. This isn’t about manipulation. Personally, I have no interest in manipulations per se other than wondering, like anything health professionals do, whether they are of likely benefit for patients. The concept of manipulation is, however, for some bizarre reason, the source of disproportionate concern and emotion. And it seems it is this that continues to drive the ‘manipulations’ debate. The evidence is merely (mis)used as a tool to support some fixed, pre-established viewpoint. The ‘Head to Head’ stand-off in today’s BMJ was contrived and non-informative to all involved. In some ways, the authors were victims of some misguided editorial whim.

The real issues at hand though are deeper. This is about the fervent over-enthusiasm to drag something meaningful out of meaningless data in attempts to appear scientific and contribute to evidence-based practice, whilst paradoxically being pseudo-scientific and missing the point of evidence-based practice. The potential harmful effects on associated professions and their patients are irreparable. Physiotherapy is slowly but surely being stripped of all its worth. A hundred-plus years of development and progress based on logic, intellect and science is undergoing painful erosion. Interventions once the realm of the profession are being thrown out and taken up by others: exercise prescription, manual therapy, electro-therapy. We are becoming experts in handing out advice pamphlets. This is NOT because the interventions are ineffective. It is because they are deemed to have questionable efficacy based on sorry scraps of ‘evidence’ salvaged to adhere to the rhetoric of evidence-based practice. Remember, most research findings are false (J.P.A. Ioannidis, PLoS Med 2, e124; 2005), and this is more so the case in physiotherapy (R. Kerry et al, J Eval Clin Prac).

Very rarely is data considered in a scientific manner: in context of a priori beliefs; in context of the professional background for which it is intended; in the context of the dynamism of scientific discovery; in context of what we understand by cause and effect; in context of individual patients. Today’s claim of “abandoning manipulations” is simply another ludicrous reminder of the state we are in.  The CSP’s reporting is another example of a body blind to the deterioration of its very own profession.

POSTEROUS VIEWS: 670

POSTEROUS COMMENTS

Kevin Kelly responded:
I am in total agreement. It is unfortunate that the drugs industry is a multi-billion pound industry with tremendous lobbying power… any treatment that reduces the amount of drugs prescribed and helps patients should be used, not criticised. The cherry-picking of poor quality research needlessly raises alarm in patients and does little to help the people suffering from neck pain and headaches to choose the most appropriate treatment. Neck manipulation has been shown to be safe and effective and benefits thousands of people suffering from neck pain and headaches. In fact, the risk of a stroke after treatment is the same whether you see a GP and get a prescription or see a chiropractor and get your neck adjusted. http://www.ncbi.nlm.nih.gov/pubmed/18204390

Manipulation of the neck is at least as effective as other medical treatments and is safer than many of the drugs used to treat similar conditions. http://www.ncbi.nlm.nih.gov/pubmed/17258728

11 months ago

reid_heather (Twitter) responded:
Hear hear. As a lecturer and teacher of spinal manipulation for 20 years, one can only assume that such articles are fuelled by ignorance. Manipulation is a highly effective method of alleviating pain whose cause or source lies within the cervical spine. It is a skill we as physiotherapists should embrace rather than discard through fear. Whilst it has been shown that manipulation of the thoracic spine can be effective for some patients with cervical symptoms, one must note the SOME. If we don’t continue to use these skills we will lose them. I certainly have not been a member of this profession for over 30 years to become reduced to leaflet giving.

I don’t usually have a rant about such matters but having just revised the history of medicine with my 16 year old daughter it seems that myths about health can be so powerful and disabling that it can prevent further progress. Whilst I do not suggest we reconsider Hippocrates take on the four humours and the balance of opposites he did give us a philosophical base to progress,whilst others thought illness was due to the anger of the Gods or evil spirits.

Benefits can be risky, however physiotherapists are also highly qualified in assessing the risk, and would only act when the risk is absolutely minimal.

11 months ago

TaylorAlanJ (Twitter) responded:
So it all becomes clear! The BMJ have just made the web pages ‘pay per view’ . . . So it truly was a cynical marketing ploy that some folks fell for . . . Hook, line and sinker! But where were the CSP when we needed them? . . . Sucking it all in and regurgitating it for good measure. None of this is helpful to a dying profession! It is ALL cumulative . . . In the last 2 months GPs have been given the impression that whiplash injury doesn’t exist, and now manipulation is harmful . . .

Who is standing up??

I have published my views in more detail on http://alteredhaemodynamics.blogspot.co.uk/

AJT

11 months ago

Howard Turner responded:
Roger: beautifully put, and a grave concern; there is no way to reverse this agenda if it goes too far. Has it gone too far already? My jubilee was a very wet tent in Wales – excellent, thank you!
11 months ago

Roger Kerry responded:
Hi Howard
Thanks for your supportive comments. Has it gone too far already? I think we are starting to see a generation of ‘de-skilled’ practitioners coming through. If there was overwhelming evidence to confidently refute skill-based interventions, this would be fine, but there’s not. Hope your tent has dried out – stick to leafy Cheshire!
10 months ago

Steve Robson responded:
Roger makes many good points here about the erosion of skills within physiotherapy. This discourse regarding the safety of manipulation has come and gone many times over the years without resolution. But are we overlooking some fundamental issues here? Before those interested parties within physiotherapy once again ‘lock horns’ in an intra- and inter-professional struggle to retain their status as manipulators, what is it we as a profession feel is so valuable about manipulation? In circumstances of patient-centred treatment and evidence-based medicine, any treatment or mode of clinical management logically has to engage with the biopsychosocial status of each patient.
In terms of clinical reasoning, isn’t it essential that we attempt to answer some fundamental questions central to the use of manipulation? After all, decisions based on the use of manipulation as a technique come before an estimation of its safety is even necessary. From this perspective:
(1) Essentially, what is manipulation? (a definition if you like)
(2) Why are we using it, in other words, what do we hope to achieve by using manipulation?
(3) What are the mechanisms of manipulation, or quite simply, how does it work?

The answers to the above should drive the clinical reasoning process, and as such, what is best for the patient. Without this information, evidence-based clinical reasoning is not possible.
I would be genuinely interested to hear answers to the three questions above from those of us in physiotherapy who use manipulation.

Essentially, without this information the whole discussion regarding the use of manipulation at all is null and void.

10 months ago

Roger Kerry responded:
Thanks for these comments Steve.
I agree that there are fundamental questions to be asked about all aspects of practice. Perhaps reducing the discussion to this level will help in understanding precisely what we are aiming for in clinical practice and research. Did you used to do an MT course exploring these issues?
10 months ago

Alan responded:
I am a physiotherapist myself and it seems more and more that in order to continue with manual therapy in Britain I should have qualified in osteopathy or another related profession. It is no wonder that most of the private practice jobs in this country are being filled by our colleagues from the southern hemisphere, renowned for their manual therapy skills. I worry that the UK-trained physiotherapist will be having more of an identity crisis in the years ahead!


Scientific Error and the Real-World

UPDATE 12th August 2013: The paper underpinning this blog has now been published in its final form. Here is the pdf if you are interested:

JECP -Truth paper 2013 – FINAL

 

TRANSFER FROM POSTEROUS: ORIGINALLY POSTED ON POSTEROUS MAY 10, 2012

IMPORTANT NOTICE: This blog relates to two academic papers published today. One is a paper on which I am the lead author. These comments are entirely my own and do not necessarily reflect the thoughts and opinions of my wonderful co-authors or the University of Nottingham.

There is a fascinating emerging phenomenon in the field of science: that science might be prone to systematic error. At least, there seems to be more attention to this of late. Scientists have always been aware of the potential for error in their methods. However, today is special in relation to this subject (for me anyway) because first, there is an enlightening World View article published in Nature on this matter by Daniel Sarewitz (D. Sarewitz, Nature 485, 149; 2012), and second, a small team from the University of Nottingham has had a research paper published on this very subject (R. Kerry et al, J Eval Clin Prac) (pdf above).

Sarewitz nudges towards a dimension of this phenomenon which is of utmost interest to me and my field of science, health science: that the systematic error observed in scientific method seems to be revealed only, or at least best, when the science is placed in the context of the real world. In health science we work in a framework known as evidence-based practice (EBP), and this is a living, object example of what Sarewitz is referring to. EBP is solely concerned with the integration of scientific findings from rigorous research processes into the shop-floor decision-making of health care professionals. So is scientific error witnessed in EBP? If so, how does that affect patient care? These are big questions, but here are some thoughts on how their answers might be informed.

First, what does the state of health science look like with regard to this error? John Ioannidis’s mid-noughties high-profile reports on the phenomenon, e.g. ‘Why Most Published Research Findings are False’ (J.P.A. Ioannidis, PLoS Med 2, e124; 2005), caused turbulence in the field of health science, mostly medicine. He provided evidence of systematic bias in ‘gold-standard’ scientific practices. Our paper published today supports these findings: gold-standard research methods are not reliable truthmakers. But this is only the case when ‘truth’ is defined as something outside of the methodological frameworks. We tried to find a definition which was as real-world as possible, yet as tightly related to the scientific methods as possible, i.e. conclusions from systematic reviews or clinical guidelines. Right up to this point, the science looks good: tight control for bias, well-powered, apparently externally valid. However, the moment you step out of the science itself, things look very different. We found that in fact there was no reliably meaningful indication of the truthfulness of a single controlled trial by its internal markers of bias control. So although a trial looks great for internal validity, this quality does not translate to the outside world. We are not the first to question the value of markers of bias control for real-world applicability.

Sarewitz states: “Researchers seek to reduce bias through tightly controlled experimental investigations. In doing so, however, they are moving farther away from the real world complexity in which scientific results must be applied to solve problems”. Voilà. The paradox is clear: the tighter trials get for controlling for bias, the less relevance they have to real-world decision making. Sarewitz also suggested that if biases were random, multiple studies ought to converge on truth. Our findings showed that in the trials examined, throughout time (and given that more recent trials tended to be the higher quality ones), study outcomes tended to diverge from the truth. So, the most recent and highest quality trials were the worst predictors of truth.

There are strong scientific, professional, educational and political drivers surrounding this issue: funders base their decisions on proposals that show the greatest rigour; health scientists get better at constructing trials which are more likely to establish causal relationships (i.e. control better for bias); journals insist on trial adherence to standards of bias control; scientific panels for conferences examine abstracts for bias control; students are taught about evidential hierarchies and why highly controlled studies sit at the peak of health science; health care commissioners seek to make purchasing decisions on sight of the highest-quality evidence.

However, all may not be so bleak. There are a couple of movements which initially appear to offer some light. First, the attempts by researchers to make their trials more representative of the real world: for example, the interventions under investigation being more relevant to common practice and mechanistic principles; the trial sample being more representative of the population; outcomes being more meaningful. Second, universities and funders are becoming more concerned with ‘knowledge transfer’ methods, the idea being to seek ways to get the great internally valid research findings into real-world practice. It is clear that these two strategies are missing the point. More representative trials are still going to be confined to internal constraints optimising internal validity. If not, their defining characteristic – to establish causal relationships – will be diminished. It seems a poor situation: you’re damned if you do, you’re damned if you don’t. Knowledge transfer strategies are, in fact, at risk of further exaggerating the asymmetry between research findings and real-world practice. “Let’s just push our findings harder onto society”.

There is no quick, easy solution. However, Sarewitz alludes to the key for potential advancement with regard to this phenomenon: real-world complexity. This won’t go away. Until the real world is ready and able to absorb abstract knowledge, it is unlikely that simply improving the internal quality of science will make any difference. In fact, it could serve to harm. The relationship between science and society needs to become more symmetrical. Who knows how this can happen? Examples as cases in point might be that the nature of ‘bias’ needs to be reconsidered, or that our notion of what is understood by causation needs investigating. The causal processes in the real world might be very different to those observed in a trial, despite the “external validity” of a trial. Even if the real world could be captured in a trial, the moment the trial finishes, that world might have changed.

Today is a great day for science, and a great day for the real-world – as is every day. Let’s keep science real.

Posterous view: 470

Posterous comments:

Missing-user-35 responded:
Interesting blog, thanks! In the 4th para and second last line, you seem to be suggesting by ‘study outcomes tended to diverge from the truth’ that there is one truth? Are you meaning to say this? I would have thought you would only be able to observe the results diverging and not agreeing.

6 months ago

Roger Kerry responded:
Hi Nikki
Thanks for your comment. OK, so I am using the word “truth” to mean “scientific truth”, i.e. the outcome of scientific investigations. Of course this is variable over space-time, but I am premising this on the broad notion that the core activity of science is to reach some sort of consensus, e.g. physics edges towards an understanding of the origins of the universe. What I mean by “diverge” is that looking BACKWARDS from what we know today as the “truth”, it seems like there is no pattern to how scientific studies relate to this. I think if we were looking FORWARD we can talk in terms of “agreeing”, e.g. multiple studies seem to be agreeing with each other, and the outcome of this agreement could be called scientific truth. If this purpose of science is accepted, then when looking BACKWARDS you would expect a clear pattern whereby studies, particularly the best quality studies, converged towards the “truth”. So, how did we get to this “truth” which doesn’t relate to trial outcomes? We defined it in numerous ways, e.g. systematic review outcomes, clinical guidelines, totality of epidemiological + mechanistic evidence. What was clear is that no matter how you define “truth”, a “scientific” progression / convergence cannot be identified in RCT findings, i.e. RCT outcomes are random, statistical accidents. Assuming our method is valid, I see a number of explanations / ways out of the problem: 1) you simply can’t/shouldn’t attempt to define “truth”: you just roll with study outcomes day-by-day and don’t worry about it; 2) it is not the purpose of science to edge towards truth; 3) truth is constructed, not discovered; 4) health research is not a scientific activity; 5) the purpose of health research is justification, not discovery. I think you would have trouble accepting 1) and 2). Researchers won’t like 3) and 4), which leaves 5). I think this is the best position for health research to hold. However, if this is the case then RESEARCH SHOULD NOT BE USED TO INFORM PRACTICE OR PREDICT OUTCOMES. It should be purely a way of providing evidence for something that happened (in the past), like a certificate is evidence of attending a CPD course, but it does not predict future attendance. This would appease commissioners / our own agenda etc, but turns EBP into rhetoric, and not a science-based activity. Of course I agree that there are many truths, but I have focussed on an interpretation of scientific truth here. Apologies for rambling! Hope all is well
