i4i is one of the world’s foremost practitioners of Bayesian statistics.
Bayesian statistics uses the rules of probability to combine data with prior information to deliver conclusions which are better than would be obtained from either source alone. This prior information may be your beliefs, results from previous studies, or even theories of “how the world works.”
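This combination of prior information with data can be made concrete with a small sketch. The numbers below are hypothetical, not from the text: we estimate a conversion rate using a Beta prior and binomial data, a conjugate pairing where Bayes' rule has a closed form.

```python
# Sketch: combining prior information with data via Bayes' rule.
# Hypothetical example: estimating a conversion rate with a Beta prior
# and binomial data (a conjugate update, so no simulation is needed).

def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Return the posterior Beta(a, b) after observing binomial data."""
    return prior_a + successes, prior_b + (trials - successes)

# Prior belief: rates near 10% (Beta(2, 18) has mean 2/20 = 0.10).
post_a, post_b = beta_binomial_update(2, 18, successes=30, trials=200)

# The posterior mean blends the prior (0.10) with the data alone (0.15).
posterior_mean = post_a / (post_a + post_b)  # 32/220, about 0.145
```

The posterior lands between the prior guess and the raw data rate, which is the "better than either source alone" behavior the paragraph describes.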
In contrast, classical statistical methods – often called frequentist methods – avoid prior distributions. For example, you might include a variable or you might exclude it; those are your only choices. In Bayesian modeling, you can still include or exclude a variable, but you must also assign a prior distribution representing the values its coefficient can take.
Because you must choose a prior, Bayesian methods are sometimes called “subjective.” While the prior requires user input, this is no more “subjective” than other aspects of statistical modeling: which method to use, which variables to include, which coefficients should vary over time or across situations.
In a regression equation, for example, setting a coefficient’s prior to be noninformative is equivalent to including the predictor in the model, and setting the prior to a spike at zero is equivalent to excluding it. We can also use informative priors that fall between these extremes. For example, price coefficients should be negative, and a promotion’s effect should be either zero (the promo did not work) or positive (it worked).
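The include/exclude spectrum above can be sketched with a single regression coefficient. The setup and numbers here are hypothetical: the data alone give an estimate (the MLE) with a known standard error, the prior is Normal(0, sd), and the posterior mean is a precision-weighted average of the two.

```python
# Sketch: how the prior interpolates between "include" and "exclude".
# Hypothetical one-coefficient setting with a Normal(0, prior_sd) prior
# and a normal likelihood summarized by its MLE and standard error.

def posterior_mean(mle, se, prior_sd, prior_mean=0.0):
    """Precision-weighted average of the prior mean and the data estimate."""
    data_prec = 1.0 / se**2
    prior_prec = 1.0 / prior_sd**2
    return (prior_prec * prior_mean + data_prec * mle) / (prior_prec + data_prec)

mle, se = 2.0, 0.5
near_mle  = posterior_mean(mle, se, prior_sd=100.0)  # near-noninformative: "include"
near_zero = posterior_mean(mle, se, prior_sd=0.01)   # near-spike at zero: "exclude"
shrunk    = posterior_mean(mle, se, prior_sd=0.5)    # informative: in between
```

A very wide prior reproduces the data estimate, a near-spike at zero forces the coefficient to zero, and an informative prior shrinks the estimate partway, which is the continuum between the two frequentist choices.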
The recent success of Bayesian methods can be attributed to two factors: faster computation and improved algorithms. Except in simple problems, Bayesian models involve integrals that cannot be computed in closed form, so they are approximated using Monte Carlo simulation methods. Since the 1980s, we have seen big jumps in the sophistication of these algorithms, in the capacity of computers to run them in reasonable time, and in the complexity of the statistical models that practitioners can now fit to data.
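To make the Monte Carlo idea concrete, here is a minimal sketch of a Metropolis sampler, one of the simplest algorithms in this family. The target posterior here is a made-up standard normal, chosen only so the draws can be checked against known moments; real applications use far more sophisticated samplers and models.

```python
# Sketch: a minimal random-walk Metropolis sampler.
# Hypothetical target: a standard normal posterior, via its log density.

import math
import random

def metropolis(log_post, start, step, n_draws, seed=0):
    """Draw approximate samples from the distribution with log density log_post."""
    rng = random.Random(seed)
    x, draws = start, []
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, step)  # propose a nearby value
        # Accept with probability min(1, post(proposal) / post(x)).
        if math.log(rng.random()) < log_post(proposal) - log_post(x):
            x = proposal
        draws.append(x)  # keep the current state either way
    return draws

# Standard normal log density, up to a constant.
draws = metropolis(lambda t: -0.5 * t * t, start=0.0, step=1.0, n_draws=20000)
mean = sum(draws) / len(draws)  # should be close to 0
```

The sampler never needs the normalizing constant of the posterior, only ratios of densities, which is why Monte Carlo methods scale to models where the exact integrals are hopeless.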
The essence of Bayesian statistics is the combination of information from multiple sources. It’s about putting together data to understand a larger structure. Big data are inherently messy: scraped data, not random samples; observational data, not randomized experiments; available data, not constructed measurements. So statistical modeling is needed to put data from these different sources on a common footing. That is what Bayesian methods are very good at.
(written with deep respect and admiration for Dr. Andrew Gelman of Columbia University)