Three books that helped me learn Bayesian statistics

In a previous post, I wrote about my journey into learning (and continuing to learn) Bayesian statistics. Making the jump into Bayes would have been impossible without some great resources (books, articles, packages, and blogs) that have come out in the last few years. Here’s my quick review of the books that have been most influential for me as a practicing ecologist. In later posts, I’ll talk about packages, articles, and blogs.

Books

Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan, by Franzi Korner-Nievergelt et al.

As someone who used frequentist statistics for over a decade, this book was essential for me to understand Bayesian models. Unlike other Bayesian books I’ve read, this book does a side-by-side comparison of frequentist and Bayesian analysis of the same models, instead of pretending that frequentism doesn’t exist. That approach really helped me understand a fundamental lesson: learning Bayesian statistics did not require learning new model structures. A linear regression y ~ a + bx is a linear regression, whether it’s a Bayesian regression or a frequentist one. The main difference is in how we interpret the parameters, in this case the intercept a and slope b. This book helped me clear up confusion over common questions, such as “Do you think this would work with a Bayesian approach?”. After reading this book, I now know the answer: of course it will work with a Bayesian approach.
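To make the shared structure concrete, here is one way to write that regression down. This is a sketch, not notation taken from the book: the normal error distribution and the independent priors p(a), p(b), and p(σ) are assumptions chosen for illustration.

```latex
\begin{aligned}
y_i &\sim \mathrm{Normal}(\mu_i, \sigma) \\
\mu_i &= a + b\,x_i \\[4pt]
\text{Frequentist:}\quad (\hat{a}, \hat{b}, \hat{\sigma}) &= \arg\max_{a,\,b,\,\sigma}\; p(y \mid a, b, \sigma) \\
\text{Bayesian:}\quad p(a, b, \sigma \mid y) &\propto p(y \mid a, b, \sigma)\; p(a)\; p(b)\; p(\sigma)
\end{aligned}
```

The first two lines are identical in both frameworks. The frequentist fit (here by maximum likelihood) returns point estimates of a and b, while the Bayesian fit returns a full posterior distribution for them.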

The book comes with an R package and well-described R code in lmer() syntax that links to Stan for exploring the posterior. But it starts off by using a simple function, sim(), from the arm package. I really liked this, because it generates a posterior (assuming flat priors) without the need for external programs, and it allowed me to see the power of analyzing things like treatment comparisons using the full posterior (hint: it’s really easy once you get comfortable thinking about the iterations in the posterior).
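Here is a minimal sketch of what "thinking about the iterations in the posterior" looks like for a treatment comparison. It is written in plain Python rather than R, and it is not a reimplementation of sim(): the data, the group names (control, warmed), and the helper posterior_mean_draws() are all hypothetical. Each group mean is drawn from its posterior under a flat prior with normal errors, and the comparison is just a subtraction done iteration by iteration.

```python
import random
import statistics


def posterior_mean_draws(y, n_draws, rng):
    """Draw from the posterior of a group mean, assuming normal errors
    and flat (noninformative) priors on the mean and log(sigma)."""
    n = len(y)
    ybar = statistics.fmean(y)
    s2 = statistics.variance(y)  # sample variance, n - 1 denominator
    draws = []
    for _ in range(n_draws):
        # sigma^2 | y ~ scaled inverse chi-square(n - 1, s^2);
        # a chi-square(k) draw is a gamma draw with shape k / 2 and scale 2
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma^2, y ~ Normal(ybar, sigma^2 / n)
        draws.append(rng.gauss(ybar, (sigma2 / n) ** 0.5))
    return draws


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical measurements for two treatments (made up for illustration)
control = [4.1, 5.0, 4.6, 5.3, 4.8, 4.4]
warmed = [6.2, 5.9, 6.8, 6.1, 7.0, 6.4]

n_draws = 4000
post_control = posterior_mean_draws(control, n_draws, rng)
post_warmed = posterior_mean_draws(warmed, n_draws, rng)

# The treatment comparison: subtract one posterior from the other, draw by draw
diff = [w - c for w, c in zip(post_warmed, post_control)]

# Summarise the posterior of the difference
cuts = statistics.quantiles(diff, n=40)  # cut points every 2.5%
lower, upper = cuts[0], cuts[-1]         # 2.5% and 97.5% quantiles
prob_positive = sum(d > 0 for d in diff) / n_draws

print(f"median difference: {statistics.median(diff):.2f}")
print(f"95% credible interval: {lower:.2f} to {upper:.2f}")
print(f"P(warmed > control): {prob_positive:.3f}")
```

Once the draws exist, any derived quantity (a difference, a ratio, the probability that one treatment exceeds another) is computed the same way: apply the calculation to every iteration, then summarise the result.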

Bayesian Models: A Statistical Primer for Ecologists, by Tom Hobbs and Mevin Hooten.

This was the first Bayesian book I ever read, and I learned Bayesian statistics from the authors at an NSF-funded workshop that they taught with Kiona Ogle and Maria Uriarte.

What I like most about this book are the clear ecological examples, and the emphasis on choosing the right likelihood with clear descriptions of the method of moments. My own work uses the gamma likelihood almost exclusively now, and their examples of the gamma in this book are excellent. In the appendix, there is also an extremely useful table that compares the different likelihoods, and what types of ecological data are relevant for each one. (Also see Sean Anderson’s excellent vignettes for gamma examples in Bayes and non-Bayes.)
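For readers who haven’t met the method of moments (moment matching) with the gamma, the idea can be sketched as follows. This is the standard result for the shape-rate parameterization, not a formula copied from the book, and the symbols μ, σ², α, and β are my own labels for the mean, variance, shape, and rate.

```latex
\begin{aligned}
y &\sim \mathrm{gamma}(\alpha, \beta), \qquad
\mathrm{E}[y] = \mu = \frac{\alpha}{\beta}, \qquad
\mathrm{Var}[y] = \sigma^2 = \frac{\alpha}{\beta^2} \\[4pt]
\alpha &= \frac{\mu^2}{\sigma^2}, \qquad
\beta = \frac{\mu}{\sigma^2}
\end{aligned}
```

This lets you write a model in terms of a mean μ (for example, a regression prediction) and a variance σ², then convert them into the shape and rate that the gamma likelihood actually needs.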

The book does not have any code, instead using detailed mathematical notation and DAGs. For me, this was difficult to digest as a first Bayesian text. I like trying to replicate someone else’s work by trying to code it, failing, trying again, failing, etc… That’s not the most efficient way, but it works for me. However, the authors of this book also rightly point out that adding code or specific software will limit their audience. New packages come out all the time, instantly dating anything that would be in the book. Because of that, this book will be useful regardless of the programming language you use (or will use in the future).

Statistical Rethinking, by Richard McElreath

A lot has been written on this book already (e.g. here), and for good reason. It really is “a pedagogical masterpiece”. When I teach Bayesian statistics to our graduate students, this is the book we use. It comes with its own R package (rethinking), which is used throughout the book.

One of the things I like best about it is the clear description of what the code and formulas mean. Its use of R code and non-mathematical formulas is a godsend for readers who have very little recall of algebra or calculus. In that sense, it provides a nice contrast to the Hobbs and Hooten book, or to other well-known Bayesian books, such as Gelman et al.’s Bayesian Data Analysis.

This book is most helpful if you read the whole thing. That probably sounds obvious, but I say it because, as the name suggests, it really is a new style of thinking and writing about statistics. It is designed as a complement to a semester-long course, in which each chapter builds on the others and references past analyses. It would be difficult to drop in on chapter 12 to only learn multilevel models if you’re not already familiar with the syntax and examples of earlier chapters. Of course, you should plan to learn Bayesian statistics over months to years, anyway. Shortcuts to understanding any new statistical philosophy and re-wiring your statistical workflow don’t exist.

Importantly, as an example of the clarity of writing, McElreath has done away with traditional statistical lexicons that often confuse non-statisticians. If you have to pause every time you see “i.i.d” or “moments” or “jth group”, then this book is for you. Sure, it contains all of those concepts (often as separate “Overthinking” sections), but describes them in fresh ways, without resorting to verbal shortcuts. Brevity is not always a pedagogical friend, and McElreath understands that.

The part of this book that I don’t like as much is that the plots use base R, often with for loops. That’s just a personal preference, as I tend to use tidyverse and ggplot. The good news is that Solomon Kurz earned a lifetime’s worth of good academic karma by recoding everything in this book, from models to figures, with tidyverse, brms, and ggplot.

The other thing I’d hoped to see, but did not, is an example of fitting models with categorical predictors that contain more than two levels. There are lots of examples of models with continuous predictors and with categorical predictors with two levels (i.e. 0 or 1). But I’m an experimental ecologist and we often have treatments with 4-5 levels, typically measured repeatedly over time, where I want to derive the posterior distribution for each treatment and compare them. The rethinking package can actually do this quite easily (hint: look at the end of Chapter 5), using the correct 0/1 matrix of predictors. But if you are used to using a shortcut like y ~ time*treatment to specify an interaction in base R models, there is nothing like that in Rethinking.
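If the "0/1 matrix of predictors" is unfamiliar, here is a minimal sketch of what a shortcut like y ~ time*treatment expands into behind the scenes. It is written in plain Python for illustration and is not the rethinking package’s API: the function design_matrix(), the column names, and the four treatment levels are all hypothetical. Each non-reference treatment level gets its own 0/1 indicator column, and each interaction column is that indicator multiplied by time.

```python
def design_matrix(time, treatment, reference):
    """Build indicator (0/1) columns for a multi-level treatment, plus
    time-by-treatment interaction columns, with `reference` as the baseline."""
    levels = sorted(set(treatment) - {reference})
    names = (
        ["intercept", "time"]
        + [f"trt_{lvl}" for lvl in levels]
        + [f"time:trt_{lvl}" for lvl in levels]
    )
    rows = []
    for t, trt in zip(time, treatment):
        # One 0/1 indicator per non-reference level
        dummies = [1 if trt == lvl else 0 for lvl in levels]
        # Interaction columns: time multiplied by each indicator
        interactions = [t * d for d in dummies]
        rows.append([1, t] + dummies + interactions)
    return names, rows


# Hypothetical experiment: four treatment levels measured at two time points
time = [0, 0, 0, 0, 1, 1, 1, 1]
treatment = ["control", "low", "medium", "high"] * 2

names, X = design_matrix(time, treatment, reference="control")

print(names)
for row in X:
    print(row)
```

Each column then gets its own parameter in the model, so a treatment with four levels needs three indicator columns and, with an interaction, three more. The posterior for any treatment at any time is recovered by adding up the relevant parameters within each posterior iteration.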
