Scientific Error and the Real-World

UPDATE 12th August 2013: The paper underpinning this blog has just been published proper. Here is the pdf if you are interested:

JECP -Truth paper 2013 – FINAL



IMPORTANT NOTICE: This blog relates to two academic papers published today. One is a paper on which I am the lead author. These comments are entirely my own and do not necessarily reflect the thoughts and opinions of my wonderful co-authors or the University of Nottingham.

There is a fascinating emerging phenomenon in the field of science: that science might be prone to systematic error. At least, there seems to be more attention to this of late. Scientists have always been aware of the potential for error in their methods. However, today is special in relation to this subject (for me anyway) because first, there is an enlightening World View article published in Nature on this matter by Daniel Sarewitz (D. Sarewitz Nature 485, 149), and second, a small team from the University of Nottingham has published a research paper on this very subject (R. Kerry et al J Eval Clin Prac) (pdf above).

Sarewitz nudges towards a dimension of this phenomenon which is of utmost interest to me and my field of science, health science. That is that the systematic error observed in scientific method seems to be revealed only, or at least best, when the science is placed in the context of the real-world. In health science we work in a framework known as evidence-based practice (EBP), and this is a living, object example of what Sarewitz is referring to. EBP is solely concerned with the integration of scientific findings from rigorous research processes into the shop-floor decision-making of health care professionals. So is scientific error witnessed in EBP? If so, how does that affect patient care? These are big questions, but here are some thoughts on how their answers might be informed.

First, what does the state of health science look like with regard to this error? John Ioannidis’s high-profile mid-noughties reports on the phenomenon, e.g. ‘Why Most Published Research Findings are False’ (J.P.A. Ioannidis PLoS Med. 2, e124; 2005), caused turbulence in the field of health science, mostly medicine. He provided evidence of systematic bias in ‘gold-standard’ scientific practices. Our paper published today supports these findings: gold-standard research methods are not reliable truthmakers. But this is only the case when ‘truth’ is defined as something outside of the methodological frameworks. We tried to find a definition which was as real-world as possible, yet as tightly related to the scientific methods as possible, i.e. conclusions from systematic reviews or clinical guidelines. Right up to this point, the science looks good: tight control for bias, well-powered, apparently externally valid. However, the moment you step out of the science itself, things look very different. We found that in fact the internal markers of bias control in a single controlled trial gave no reliably meaningful indication of its truthfulness. So although a trial looks great for internal validity, this quality does not translate to the outside world. We are not the first to question the value of markers of bias control for real-world applicability.

Sarewitz states: “Researchers seek to reduce bias through tightly controlled experimental investigations. In doing so, however, they are moving farther away from the real world complexity in which scientific results must be applied to solve problems”. Voilà. The paradox is clear: the tighter trials get for controlling for bias, the less relevance they have to real-world decision making. Sarewitz also suggested that if biases were random, multiple studies ought to converge on truth. Our findings showed that in the trials examined, throughout time (and given that more recent trials tended to be the higher quality ones), study outcomes tended to diverge from the truth. So, the most recent and highest quality trials were the worst predictors of truth.
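Sarewitz's point about random bias can be illustrated with a toy simulation. The sketch below is only an illustration and is not the method used in the paper: the function name, the "true" effect of 0.5, the noise level and the shared bias of 0.3 are all hypothetical values chosen for the example. It pools many simulated study results and compares purely random error with error that every study shares.

```python
import random
import statistics


def simulate_pooled_estimate(true_effect, n_studies, noise_sd, shared_bias=0.0, seed=0):
    """Return the mean of n_studies simulated study results.

    Each result is true_effect + shared_bias + random error.
    All parameters are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    estimates = [
        true_effect + shared_bias + rng.gauss(0.0, noise_sd)
        for _ in range(n_studies)
    ]
    return statistics.fmean(estimates)


TRUE_EFFECT = 0.5  # assumed "truth" for the toy example

for n in (5, 50, 5000):
    random_only = simulate_pooled_estimate(TRUE_EFFECT, n, noise_sd=1.0)
    systematic = simulate_pooled_estimate(TRUE_EFFECT, n, noise_sd=1.0, shared_bias=0.3)
    print(f"n={n:5d}  random error only: {random_only:.3f}  with shared bias: {systematic:.3f}")
```

With random error alone, the pooled estimate settles near the assumed truth as the number of studies grows. With a bias common to all studies it settles just as firmly, but on the wrong value, and adding more studies does not help.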

There are strong scientific, professional, educational and political drivers surrounding this issue: funders base their decisions on proposals that show greatest rigor; health scientists get better at constructing trials which are more likely to establish causal relationships (i.e. control better for bias); journals insist on trial adherence to standards of bias control; scientific panels for conferences examine abstracts for bias control; students are taught about evidential hierarchies and why highly controlled studies sit at the peak of health science; health care commissioners seek to make purchasing decisions on sight of the highest-quality evidence.

However, all may not seem so bleak. There are a couple of movements which initially appear to offer some light. First, the attempts by researchers to make their trials more representative of the real-world. For example, the interventions under investigation being more relevant to common practice and mechanistic principles; the trial sample being more representative of the population; outcomes being more meaningful. Second, universities and funders are becoming more concerned with ‘knowledge transfer’ methods, the idea being to seek ways to get the great internally valid research findings into real-world practice. It is clear, though, that these two strategies are missing the point. More representative trials are still going to be confined to internal constraints optimising internal validity. If not, their defining characteristic – to establish causal relationships – will be diminished. It seems a poor situation – you’re damned if you do, you’re damned if you don’t. Knowledge transfer strategies are, in fact, at risk of further exaggerating the asymmetry between research findings and real-world practice: “Let’s just push our findings harder onto society”.

There is no quick, easy solution. However, Sarewitz alludes to the key for potential advancement with regard to this phenomenon: real-world complexity. This won’t go away. Until the real-world is ready and able to absorb abstract knowledge, it is unlikely that simply improving the internal quality of science will make any difference. In fact, it could serve to harm. The relationship between science and society needs to become more symmetrical. Who knows how this can happen? Examples as cases in point might be that the nature of ‘bias’ needs to be re-considered. Or our notion of what is understood by causation needs investigating. The causal processes in the real-world might be very different to those observed in a trial, despite the “external validity” of a trial. Even if the real-world could be captured in a trial, the moment the trial finishes, that world might have changed.

Today is a great day for science, and a great day for the real-world – as is every day. Let’s keep science real.


Posterous comments:

Interesting blog, thanks! In the 4th para and second last line, you seem to be suggesting by ‘study outcomes tended to diverge from the truth’ that this is one truth? Are you meaning to say this? I would have thought you would only be able to observe the results diverging and not agreeing.
6 months ago Roger Kerry responded:
Hi Nikki
Thanks for your comment. OK, so I am using the word “truth” to mean “scientific truth”, i.e. the outcome of scientific investigations. Of course this is variable over space-time, but I am premising this on the broad notion that the core activity of science is to reach some sort of consensus. E.g. physics edges towards an understanding of the origins of the universe. What I mean by “diverge” is that looking BACKWARDS from what we know today as the “truth”, it seems like there is no pattern to how scientific studies relate to this. I think if we were looking FORWARD we can talk in terms of “agreeing”, e.g. multiple studies seem to be agreeing with each other, the outcome of this agreement could be called scientific truth. If this purpose of science is accepted, then when looking BACKWARDS you would expect a clear pattern whereby studies, particularly the best quality studies, converged towards the “truth”. So, how did we get to this “truth” which doesn’t relate to trial outcomes? We defined it in numerous ways, e.g. systematic review outcomes, clinical guidelines, totality of epidemiological + mechanistic evidence. What was clear is that no matter how you define “truth”, a “scientific” progression / convergence cannot be identified in RCT findings, i.e. RCT outcomes are random, statistical accidents. Assuming our method is valid, I see a number of explanations / ways out of the problem: 1) you simply can’t/shouldn’t attempt to define “truth”: you just roll with study outcomes day-by-day and don’t worry about it; 2) it is not the purpose of science to edge towards truth; 3) truth is constructed, not discovered; 4) health research is not a scientific activity; 5) the purpose of health research is justification, not discovery. I think you would have trouble accepting 1) and 2). Researchers won’t like 3) and 4), which leaves 5). I think this is the best position for health research to hold.
However, if this is the case then RESEARCH SHOULD NOT BE USED TO INFORM PRACTICE OR PREDICT OUTCOMES. It should be purely a way of providing evidence for something that happened (in the past), like a certificate is evidence of attending a CPD course, but it does not predict future attendance. This would appease commissioners / our own agenda etc, but turns EBP into rhetoric, and not a science-based activity. Of course I agree that there are many truths, but I have focussed on an interpretation of scientific truth here. Apologies for rambling! Hope all is well



5 responses to “Scientific Error and the Real-World”

  1. Bill


    This is certainly an interesting and controversial theory. I saw your related talk at IFOMT and was intrigued. If research should not be used to inform clinical practice, then what do you suggest as an alternative? This seems to me a slippery slope. Are we to return to guru-based physical therapy? Are all sorts of interventions then justified using ‘clinical reasoning’ alone?


    • Hi Bill
      Thanks for your comments. Not at all. I’m certainly not in favour of returning to a pre-EBP way of going about things. I think by and large we are heading in the right direction, but are at risk of going off-course by the value we give to population-based studies. This is not to be harsh on such studies, just considering the evidence about some limitations of them, e.g. Ioannidis etc. If you look at the narrow scope of evidence on which guidelines are formed, or the fundamentalist way people view EBM (see Twitter for example), we may simply be uncritically appealing to another “guru”. This time it’s not a person but an ideology. Science is exactly the correct way to progress, but it should be done with the same level of analysis that we give to other ways of understanding knowledge, e.g. clinical practice. We reject anecdotal evidence based on one core source of bias – perception, whilst we accept population data to inform a clinical decision despite the fact that we know there are at least 50 sources of bias, 20 critical and only 10 we attempt to control for, and even controlling for RoB does not mean the data is useful (e.g. our study). Population studies give a really valuable clue of what the world is like, but need to be scaffolded with the totality of evidence, including mechanistic studies, and other sources of evidence. So the idea of EBP/EBM is awesome, but it seems like some balance is being lost. I don’t think we should use “clinical reasoning” as an excuse to allow in whatever intervention we fancy, but also I don’t think it should be rejected just yet. The way we think about a clinical problem should be done differently now than it was done 10, even 5 years ago. More skills should be developed to integrate sources of evidence in a way that EBP becomes something more than rhetoric. We are currently working on a re-interpretation of what constitutes causation in health as part of a larger project ( and hope this will add to the story a little.
      My rants seem a little opinionated and I am absolutely no-one to tell anyone how the world should work! However, I think we are at a really exciting time in health science / clinical practice / research and we should make every effort to ensure the scientific trajectory is as good as possible to take us into the next stage. Apologies for rambling :) Roger

  2. Bill

    Thanks very much for your reply. I read through your paper and Ioannidis’s; it’s gonna take me a while to digest all of this. In your opinion, for an individual study, how does the consideration of confidence intervals and effect sizes affect the interpretation of the truthfulness of the results?


  3. Hi Bill
    My (skeptical) thoughts are that essentially any frequentist statistical analysis has some issues. So CIs don’t necessarily solve the problems associated with p-values. It is a mistake to think that they somehow inform our ‘confidence’ in the parameter falling within some (arbitrary) intervals. They speak, like p-values, of true (but unknown) parameters being progressively edged towards over time in infinitely repeated trials. So they don’t inform anything about the ‘now’. This is, of course, being a statistical pedant. But the way I see it, there should be no other way.
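    The "repeated trials" reading of a confidence interval can be seen in a small simulation. This is a minimal sketch under simplifying assumptions: normally distributed outcomes, a known standard deviation, and a hypothetical function name and parameter values. It counts how often a 95% interval contains the true mean across many imagined trials.

```python
import random
import statistics


def coverage_of_95ci(true_mean, sd, n_per_trial, n_trials, seed=0):
    """Fraction of simulated trials whose 95% CI contains true_mean.

    Assumes normal data with a known sd, so the interval is
    sample_mean +/- z * sd / sqrt(n). Illustrative values only.
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * sd / n_per_trial ** 0.5
    hits = 0
    for _ in range(n_trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n_per_trial)]
        sample_mean = statistics.fmean(sample)
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / n_trials


print(coverage_of_95ci(true_mean=0.0, sd=1.0, n_per_trial=20, n_trials=2000))
```

    The 95% describes the long-run behaviour of the procedure over many trials. Any single interval either contains the true value or it does not, and the calculation alone cannot say which.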

    However, if we chill out a bit, I do find myself empathising with the thoughts of a guy called Jeremy Howick who works at the Oxford CEBM. He has a suggestion of introducing an ‘effect size threshold’, whereby we only take notice of trial results if they pass a certain threshold. Then the results become meaningful. We sieve out the small effects and the ‘statistical accidents’. Then again, I sometimes wake up and think, nah – there are still too many fundamental problems.

    I’m a fan of Bayesian statistics and I think if trial data were analysed using these, we would have more meaningful interpretations. And it would halt the number of ridiculous physio RCTs being conducted because we would see what magnitude of evidence would be required to influence our a priori probability. This, to me, is much better science than continually conducting meaningless RCTs in the hope that we either ‘eventually gather enough data’, or ‘hope that we find something’, etc. My view is that we have created a research ‘wild-west’. We need a new sheriff..!

    Usual caveat – all my own opinions!


  4. Bill


    Thanks again for more food for thought. This is a bit much to wrap my head around all at once. I will need to ponder for a while how these concepts may affect my application and teaching of EBP.

    I did a quick search on Bayesian statistics and physical therapy and came across this review paper from Canada.
    The concept of determining a prior probability in Bayesian methods reminds me of establishing a pre-test probability when applying likelihood ratios for diagnostic tests. It ain’t an exact science but it is more similar to how we think in the clinic.
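    The pre-test probability and likelihood ratio calculation mentioned above is a direct use of Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. A minimal sketch follows. The function name and the example numbers are hypothetical and not taken from any study.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio (Bayes, odds form)."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical example: a 20% pre-test probability and a positive
# likelihood ratio of 8 for the test result observed.
print(round(post_test_probability(0.20, 8.0), 3))
```

    The same arithmetic applies when the "test" is a trial result and the "pre-test probability" is a prior belief about a treatment: a weak likelihood ratio barely moves a confident prior, which is the sense in which one can ask how much evidence would be needed to change our minds.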

