When I was in grad school, Ken Burnham gave a seminar in my department about model selection and met with our research group. His book with David Anderson had been out for ~3 years at the time (it now has more than 45,000 citations!), but I had zero idea of what it was or why everyone was so excited about it. My understanding of statistical analysis was so poor that when a professor suggested that I should use model selection in my dissertation, I could only nod silently. In reality, I didn’t even know what a model was.
Sure, I had run t-tests and ANOVAs in SPSS and PROC MIXED in SAS, but they were just names for things that I didn’t really understand. The idea that there were underlying similarities between them, that they were models, was baffling to me. I was happy enough just getting the software to work. Then I’d google how to interpret the output, and try to add the stats to my paper with as little explanation as I could get away with, hoping no one would ask about the stats.
I don’t think I was alone. Like most ecologists I know, especially those of us who use controlled experiments, my training in statistics was limited to a few graduate courses in biostatistics that followed a familiar pattern.
We learned tests.
If you have two groups, then use a t-test.
If you have more than two, then use an ANOVA.
We learned rules
If your data are not normally distributed, then it’s Kruskall-Wallis time. Heterogeneity of variance is something you should really be afraid about.
But we didn’t learn what any of this meant. At least I didn’t. And for a while, that was just fine. My experiments were going well, showing big effects that hardly needed a p-value to convince anyone. So what if they weren’t analyzed with the perfect models (whatever that meant), the science was still sound, and we were replicating the findings. All was OK.
As I moved into postdocs and began to collaborate with a wider group of people, I felt a nagging discontent. Everyone had a unique set of rules to apply or ignore, often couched in folklore. Such as the idea that statistics are only around to make up for poorly designed experiments. Or that you should always use Tukey. Or that you should always use Bonferroni. Or that pseudoreplication was something to be terrified about (rather than just modeled).
But there were cracks in the wall.
Then, in 2011, I came across several papers by Shinichi Nakagawa that blew me away. The papers were critical of ecology’s blind allegiance to Bonferroni-style corrections and emphasized effect sizes as critical measures over p-values (written with Innes Cuthill). These papers were a revelation to me, partly for their sensible approach, but mostly for the simple existence of debate. Before then I had assumed that statisticians agreed on all the rules (even if us non-statisticians didn’t). After all, what were our textbooks and classes in statistics if not a slew of rules to be wary of? Instead, these papers brought a sense of excitement. Perhaps I was not alone in my confusion and frustration with arbitrary cutoffs. Perhaps the weirdness of it all wasn’t just a reflection of my poor math skills. Perhaps there was more to a statistical analysis than whether p was above or below 0.05.
Perhaps there was more…but I didn’t know what. For the next several years, I plodded along, analyzing data with the standard tools, but becoming increasingly disillusioned with them. Then, in 2014 (or maybe 2015), I analyzed some data for the first time using lmer() in R. The output had all the familiar summary statistics that come with any linear analysis, but to my dismay, there were no p-values. StackExchange quickly confirmed that this was no mistake. It also confirmed that I was not alone in wondering where this cornerstone of my statistical understanding had gone. The question has been viewed over 100,000 times.
Here was my opening. I had already committed myself to using R full-time when I started my faculty position, and the model I needed to run was a linear mixed model. I was stuck with lmer(). This was my chance to break free, to embrace effect sizes or confidence intervals or bootstrapping or…something…and let go of p-value shackles. And so, like any good transition from a comfort zone, the first thing I did was scramble straight back to safety. I googled a solution to produce p-values from lmer(), and that was that.
Eventually, I did try to publish a paper without p-values. In that paper, I tried to use some god-forsaken confidence interval approach I’d read in an ecotox journal. Something about comparing overlap with 84% intervals instead of 95%, because 84% was better at replicating the alpha of 0.05. I honestly can’t remember. What I do remember is that reviewer one hated it and refused to read beyond page 9, where I’d introduced the 84% idea. I can’t blame them. It sucked. I eventually abandoned that approach and published it in a different journal with all the traditional statistical approaches.
I needed some help
Clearly, I was not going to learn a better way to analyze my data on my own. I needed help, so I attended a workshop in Fort Collins, Colorado in 2015 (link is for a different year, but is the same workshop). It was targeted at ecologists who wanted to learn Bayesian statistics. I didn’t know what Bayesian statistics was, but I knew it was different than what I’d been doing and it seemed hip (sometimes that matters, too). The workshop was intense, and I was struck at how quickly I reverted to my habits from undergrad – sit in the back, never talk, wait too long to clarify a simple mis-understanding. Even though I was college professor, I was once again just a so-so student. In between classes, I couldn’t make myself ignore the grant I needed to write, the paper I was finishing, or the summer field season my grad students were starting. Plus I was back in Fort Collins, where I’d lived and worked before, and I had lots of reminiscing to do.
Year(s) of the books
When I got back to my office in Vermillion, I half-heartedly tried to run a Bayesian analysis, using the approach I’d learned at the workshop (in rjags). But it failed miserably, and I went straight back to analyzing data with the lmer p-value hack. But 2015/2016 was an incredible year for learning Bayes, due to the publication of several books that offered a fresh way of thinking about data analysis in general:
Bayesian Models: A Statistical Primer for Ecologists, by Tom Hobbs and Mevin Hooten (who taught the workshop I had attended), Statistical Rethinking, by Richard McElreath, and Bayesian Data Analysis in Ecology…, by Franzi Korner-Nievergelt et al.
These books touched on similar topics, such as defining Bayesian analysis, or describing a hierarchical model, but they did so in unique ways. In the year after I attended the workshop, I would constantly shift between them to understand some component of a model. Hobbs and Hooten described how to use the posterior distribution to compute derived quantities, akin to the “post-hoc” tests that had always flummoxed me. It was so simple that I still have to re-read every few months just to make sure I hadn’t missed something. McElreath’s description of hierarchical models, and the underlying structure of all of those “tests” is as good as it gets.
But it was Korner-Nievergelt et al.’s side-by-side comparisons of Bayesian and frequentist results, along with their bare-bones R code, that were most revealing to me. The first Bayesian regression I could run myself and understand came from the introductory chapters of their book. Numerically, the results weren’t any different than what I would have gotten from a frequentist analysis (i.e. the slope and intercept were numerically similar regardless of the method). But their book also contained perfectly understandable descriptions of why numerical similarity is not the point. With Bayes, I could now make direct statements about hypotheses that I couldn’t make otherwise. It is hard to describe what a relief it is to be able to say: “the probability that the slope is greater than zero is 93%”, instead of “the probability of obtaining data as extreme or more extreme than we obtained under the null hypothesis that the slope is exactly zero is 0.03″, which of course is never actually written out, but is instead short-circuited as something like “the slope was positive (p=0.03)“. The straightforward way of putting the results of Bayesian analyses into sentence format is easily one of the best arguments I have when someone asks why they ought to learn Bayes. It’s one less thing to worry about, so you can get on with what is most important, your scientific question and results.
Year of the packages
As if the publication of these books wasn’t helpful enough, rstanarm() was released around this time, and Paul Bürkner released the brms() package soon after. It uses the typical R syntax to specify models that I had become accustomed to, essentially removing any excuse I had left to not run Bayesian models as a default. Now, this frequentist regression in base R
lm(y ~ x, data=data)
became this Bayesian regression in brms()
brm(y ~ x, data=data)
Since ~2016, my graduate students and I have exclusively used Bayesian analysis in our publications. To my surprise, we have had zero trouble convincing reviewers that this approach is acceptable. My transition from frequentist to Bayesian analysis has easily been one of the most intellectually satisfying things I’ve ever done. It only took, uh, 15 or so years.