Entire fields on poor foundations

It has been a long-standing interest of mine to better understand (and find ways to address) how entire fields of scientists can lead each other astray, into ideas that have no solid foundation. A volume of studies can reinforce each other in a positive feedback loop, with no one critically assessing the foundation, and any critical analysis being dismissed.
This is part of what TFI is/was about. https://www.tfi.ucsb.edu/about
Part of this is just the natural course of humans doing science, and the self-correcting process can take time.
Plus, add in all the foibles of human reasoning, relationships, and echo chambers…
Plus, add in incentive structures that discourage disowning one’s own prior work…
It’s not surprising that these things happen. But they waste resources: the time and brain power of smart, hardworking people, along with money, animals, and so on. So if there are ways we can better detect when such dynamics are in play, it could help the overall enterprise of science.
I’m not talking about fraud, whether brazen fabrication or softer p-hacking. That’s important to combat too. I’m talking about insufficient rigor, and not just in individual studies. An individual study can be rigorous, but based on a foundational idea that lacks rigorous evidence. The foundation is poor, and people keep building upon it. The building continues, paper after paper after paper, new PhDs trained, grants, studies, all perpetuated by positive feedback mechanisms, despite the fundamental limitations of the conceptual foundation of the entire field.
This foundational problem is a bit harder to nail down than fraud. Although fraud can be well hidden, and I appreciate the heroic efforts to uncover it, the idea is straightforward: presenting evidence that isn’t real.
When there is a lack of foundational rigor, left in place and built upon by well-intentioned people, it can be in plain sight, but hard to nail down. And since it typically comes from well-intentioned people, who are being honest and open for the most part, it’s not fun to be critical about it. It’s not like dunking on obviously fraudulent image manipulation via an anonymous PubPeer post. Again, I appreciate that work and the people who do it (e.g., E. Bik), but what I’m talking about here is not as easy to identify and address.
So I am reading this paper with great interest.

I’m not a microbiome researcher, but I care about the phenomenon they’re addressing. And they do so masterfully here. The whole thing is very readable, and it’s open access. Check it out. Kevin Mitchell, Darren Dahly, and Dorothy Bishop make very clear points about the problem I describe at the top of this post. Some of their points are specific to the field they’re discussing (microbiome–autism links), but many are more general.
Here I’ll share some of my thoughts, cribbed from my Slack messages this morning as I live-streamed my reactions while reading it. I’m going to stick to the points of general relevance. I’m not going to dig into their criticism of specific studies.
Here’s the abstract: “The idea that the gut microbiome causally contributes to autism has gained currency in the scientific literature and popular press. Support for this hypothesis comes from three lines of evidence: human observational studies, preclinical experiments in mice, and human clinical trials. We critically assessed this literature and found that it is beset by conceptual and methodological flaws and limitations that undermine claims that the gut microbiome is causally involved in the etiology or pathophysiology of autism.”
The authors are calling out the lack of foundational rigor in the field that works to link gut microbiomes with autism spectrum disorder. And they are trying to be precise and clear in identifying the positive feedback mechanisms that are perpetuating the field despite the lack of foundational rigor. They use a couple of terms, “pseudo-triangulation” and “quasi-replication.” They explain what those mean, and I think they’re useful terms. Whether they catch on is up to everyone else.
“pseudo-triangulation,” building an appearance of convergence from incommensurate and individually unconvincing lines of evidence.
“quasi-replication,” where any difference in the microbiome is taken as adding to the evidence base of a real effect, even when the details are inconsistent or even contradictory.
“The sense is that the literature is not cumulative, with later studies replicating and building upon earlier ones; rather, the rationale for a study is that ‘something is going on’ in relation to autism and the gut microbiome, with each study adopting different methods and no consistent, replicable findings emerging.”
First, they discuss alternative hypotheses which are not necessarily well considered in the literature. For example, the genetic differences that lead to a diagnosis of autism can also lead to problems elsewhere in the body. This happens all the time in genetic diseases. So, fundamentally, a field should weigh competing or alternative hypotheses before accepting one.
Next, they discuss issues with large-scale data mining studies. There are methodological details, such as the choice of controls, handling of confounders, effect sizes, and so on. But also, by their nature, these are exploratory studies that can end up looking like strong evidence for a specific idea. But finding differences in data mining studies is easy. And just because multiple studies find differences, are they really reinforcing each other? Are they building on each other? Is this an accumulation of evidence?
“This body of work continues to be cited uncritically, however, deepening the perception of a solid evidence base. The inference seems to be that, even if specific findings are inconsistent, most of the (published) studies that have examined the question have found some differences in the microbiome of autistic people compared with TD [typically developing] controls. This is often couched in more general terms with claims of ‘dysbiosis’ of the gut microbiome in individuals with autism. As pointed out by researchers in the field, this is a vague term, which effectively captures any profile of differences, regardless of consistency. [13,14] Two studies showing directly contradictory profiles of change could both be classed as evidence of dysbiosis.”
Then there is a lot of criticism of mouse models for autism research, and I agree with some of it. I have a bit to say about this topic, but I won’t do it here in this blog post, because I want to stick with the more generally applicable stuff.
Getting back to exploratory work… I appreciate exploratory work. And the rigor bar is lower for those kinds of studies. They’re important, and by their nature they definitely shouldn’t pre-register their hypotheses. They should explore and see what they find.
What sometimes happens is exploratory work evolves into something else. And that’s a problem. Here are two examples of what can happen:
(i) Other people cite an exploratory study with a statistically significant measurement as strong evidence for “A causes B”, even though the original paper is much more muted in its claims. In this case, the problem lies with the people citing the study.
(ii) The desire for a high profile publication pushes the authors to make stronger claims than their study design justifies. In this case, the authors need to be accountable as well as the people who cite it uncritically.
“The big data generated in microbiome studies lend themselves to numerous possible analyses, such that if one looks hard enough, something is bound to emerge, especially in studies with very small samples. Flexibility also characterizes the hypotheses that are considered. A typical study starts with the notion that there is some link between autism and the microbiome but is vague in terms of the specific hypothesis being tested, so that what we see is hypothesizing after results are known (HARKing).[143]”
Here’s a link to that HARKing paper.
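To see how easy it is for “something” to emerge from this kind of flexible analysis, here’s a toy simulation I threw together (mine, not from the paper; the number of taxa and the sample sizes are made up for illustration):

```python
# A toy simulation (my own, not from the paper): test many hypothetical
# microbial taxa for group differences when there is no true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_taxa, n_per_group = 200, 15  # many features, small samples (illustrative numbers)

# Pure noise: "case" and "control" groups drawn from the same distribution
cases = rng.normal(size=(n_taxa, n_per_group))
controls = rng.normal(size=(n_taxa, n_per_group))

p_values = np.array([
    stats.ttest_ind(cases[i], controls[i]).pvalue for i in range(n_taxa)
])

print(f"taxa 'significant' at p < 0.05: {(p_values < 0.05).sum()} of {n_taxa}")
# With alpha = 0.05 we expect roughly 10 false positives, i.e., plenty of
# "something" to report even though nothing is going on.
```

Re-run it with a different seed and a different set of taxa comes out “significant,” which is exactly the quasi-replication pattern the authors describe.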
This pull quote really nails it, and the same can be said about many fields that have a sort of positive feedback loop that keeps them going despite the lack of a strong, rigorous foundation:
“Taking this literature as a whole, there may be a sense that ‘where there’s smoke, there must be fire,’ but experience from other fields (social psychology, candidate gene association studies, neuroimaging association studies, and nutritional epidemiology) has shown that actually there can sometimes just be lots of smoke. Though diverse results from different studies are often presented as a kind of quasi-replication (where the details vary but something was found), in fact, each such study could be taken as a non-replication of the findings of many other studies.”
In this particular case, there is a human cost that needs to be considered and weighed carefully:
“This hype is not without consequences. It feeds on, and feeds into, narratives of an epidemic of autism caused by environmental factors that can be addressed with ‘natural’ remedies promoted by a largely unregulated global wellness industry. The general public is not well equipped to evaluate these claims and are vulnerable to being taken advantage of, especially with respect to autism.”
At the end, they provide some constructive advice. And this is what I hope will have an impact on the field, so it is a pity that it is buried at the end and not even alluded to in the abstract. They say, in part:
“for those who think this topic is worth taking further, it would make sense to adopt some ground rules. In many ways, these would be similar to those adopted by geneticists after the first frenzy of candidate gene associations was found to be a waste of time and money because the area was overwhelmed by nonreplicable false positive results. [147] The solution was an increase in experimental rigor, with requirements for adequate statistical power, standardized protocols, and a clear distinction drawn between exploratory and confirmatory studies.”
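To put a rough number on “adequate statistical power,” here’s a back-of-the-envelope sketch (mine, not the paper’s; it assumes a simple two-group comparison):

```python
# A rough sketch (mine, not from the paper's analysis): per-group sample
# size needed for a two-sample t-test at alpha = 0.05 and 80% power.
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):  # small, medium, large effects (Cohen's d)
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"Cohen's d = {d}: {math.ceil(n)} participants per group")
# Prints ~394, ~64, and ~26 per group, respectively: detecting small
# effects demands samples in the hundreds per group.
```

Small effects require far larger samples than the “very small samples” the authors describe in this literature, which makes the exploratory-versus-confirmatory distinction very concrete.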
[…] In scientific research, I am interested in how we can avoid fooling ourselves. Scientists are, by training and nature, appreciative of evidence, quantified uncertainty, and updating models or beliefs. Still, we aren’t perfect, and there’s room for improvement. For example, how can we avoid letting entire fields wander off into areas that lack a firm, rigorous foundation? […]