# Abstracts

## Monday 14 March

Short Course

Monday 12:00 – 15:30

Gerda Claeskens – Model selection and model averaging

The selection of a suitable model, including the selection of regression variables, is central to any good data analysis. In this course we will learn different criteria for model selection, with a deeper understanding of where they originate, what they intend to optimize, and how they should be understood and used. As an alternative to selecting one single model, we consider model averaging, and discuss the uncertainty involved with model selection. Data examples will be worked out and discussed.

Keynote

Monday 16:00 – 17:00

Peter Grunwald – Model Selection when All Models are Wrong

How to select a model for our data in the realistic situation that all models under consideration are wrong, yet some are useful? Among the myriad existing model selection methods, Bayesian inference stands out as the most general and coherent approach. Unfortunately, it does not always work well when models are wrong, yet useful. I will illustrate this using both practical examples and theoretical results. I will then give an overview of the work in my group, which focusses on methods for model selection, averaging and prediction that provably work well even when the models are wrong. The resulting procedures are still mostly Bayesian, but with an added frequentist *sanity check* that can be understood in terms of Popperian falsification ideas. As such, they shed new light on the age-old discussion between the Bayesian and the frequentist school of statistics.

## Tuesday 15 March

Beyond model fit...

Tuesday 9:00 – 10:30

John Copas – Some models are useful ― for confidence intervals?

In his famous quote “All models are wrong ...” Box went on to say “... but some are useful”. Useful for what? And how is the model selected? We discuss the use of models for finding a confidence interval for a specific parameter of interest. We assume that model selection is in two stages: a weak model based only on information known or assumed about the context of the data, and a stronger (sub-) model selected so that, in some sense, it gives a good description of the data. This suggests a family of models indexed by some criterion of goodness of fit, and hence the corresponding family of confidence intervals for our parameter. We discuss one or two examples, leading to some fairly general (and remarkably simple) asymptotic theory for the outer limits of these intervals. This suggests some questions about how models are, or should be, selected in practice.

Angelika van der Linde – Model Complexity

The talk addresses the problem of formally defining the effective number of parameters in a model which is assumed to be given by a sampling distribution and a prior distribution for the parameters. The problem occurs in the derivation of criteria for model choice which often – like AIC – trade off goodness of fit and model complexity. It also arises in (frequentist) attempts to estimate the error variance in regression models with informative priors on the regression coefficients, for example in smoothing. It is argued that model complexity can be conceptualized as a feature of the joint distribution of the observed variables and the random parameters and hence can be formally described by a measure of dependence. The universal and accurate estimation of the measure of dependence, however, is the most challenging problem in practice. Several well-known criteria for model choice are interpreted and discussed along these lines.

Contributed talks

Tuesday 11:00 – 13:00

A. Philosophical perspectives of model uncertainty

Martin Sewell – Model selection and uncertainty in climate change mitigation research

There are aspects of climate change about which we are almost certain (the physical chemistry), and areas in which uncertainty is rife (effect of clouds, ocean, response of biological processes, climate change mitigation). We're pretty certain of the uncertainty regarding climate sensitivity, and we're more certain of global warming in the future than the past. Where the uncertainty lies is the only theoretical difference between carbon tax and emissions trading.

Joel Katzav – Hybrid models, climate models and inference to the best explanation

I examine the warrants that result from the successes of climate and related models. I argue that these warrants' strengths depend on inferential virtues that aren't just explanatory virtues, contrary to what would be the case if inference to the best explanation (IBE) provided the warrants. I also argue that the warrants in question, unlike IBE's warrants, guide inferences solely to model implications the accuracy of which is unclear.

Henkjan Honing – The role of surprise in theory testing

While for most scientists the limitations of evaluating a model by showing a good fit with the empirical data are clear-cut, a recent discussion (cf. Honing, 2006) shows that this widespread method is still (or again) at the center of scientific debate. An approach to model selection in music cognition is proposed that tries to capture the common intuition that a model's validity should increase when it makes surprising predictions.

Johan Camps – Nuclear emergency management: taking the right decisions with uncertain models

This contribution reflects on the use of models and the related practical difficulties encountered in nuclear/radiological emergency management. It discusses the role of models in estimating effects on humans and the environment and in taking decisions on suitable protective actions. Examples of typical errors that can be made when taking decisions based on wrong model assumptions are given. This is joint work with Catrinel Turcanu.

B. Statistical perspectives of model uncertainty

Max Welling – Learning with Weakly Chaotic Nonlinear Dynamical Systems

We describe a class of deterministic weakly chaotic dynamical systems with infinite memory. These “herding systems” combine learning and inference into one algorithm. They convert moments directly into a sequence of pseudo-samples without learning an explicit model. Using the “perceptron cycling theorem” we can deduce several convergence results.

George A.K. van Voorn – An evaluation list as model selection aid: finding models with a balance between model complexity, data availability and model application

The continuous increase in the complexity of models that are being applied for environmental assessments results in increased uncertainty about the quantitative predictions. Classical criteria to find optimal models, such as the Akaike information criterion, do not consider the application. A list that evaluates the balance between model complexity, data support, and application gives different ‘optimal’ models than classical criteria. This is joint work with P.W. Bogaart.

Ariel Alonso – Model selection and multimodel inference in reliability estimation

Recently, some methods have been proposed to study the reliability of rating scales within a longitudinal context and using clinical trial data (Laenen et al., 2007, 2009). The approach allows the assessment of reliability every time a scale is used in a clinical study, avoiding the need for additional data collection. The methodology is based on linear mixed models and reliability coefficients are derived from the corresponding covariance matrices. However, finding a good parsimonious model to describe complex phenomena in psychiatry and psychology is a challenging task. Frequently, different models fit the data equally well, raising the problem of model selection uncertainty. In this paper we explore the use of different model building strategies, including model averaging, in reliability estimation, via simulations. This is joint work with Annouschka Laenen.

Nick T. Longford – ‘Which model?’ is the wrong question

The paper presents the argument that the search for a valid model, by whichever criterion, is a distraction in the pursuit of efficient inference. This is demonstrated on several generic examples. Composite estimation, in which alternative (single-model based) estimators are linearly combined, is proposed. It resembles Bayes factors, with the crucial difference that the weights accorded to the estimators are target specific.

Bayesian approaches

Tuesday 14:00 – 15:30

Herbert Hoijtink – Objective Bayes factors for inequality constrained hypotheses

This paper will present a Bayes factor for the comparison of an inequality constrained hypothesis with its complement. Equivalent sets of hypotheses form the basis for the quantification of the complexity of an inequality constrained hypothesis. It will be shown that the prior distribution can be chosen such that one of the terms in the Bayes factor is the quantification of the complexity of the hypothesis of interest. The other term in the Bayes factor represents a measure of the fit of the hypothesis. Using a vague prior distribution this fit value is essentially determined by the data. The result is an objective Bayes factor. The procedure proposed will be illustrated using analysis of variance and latent class analysis.

Eric-Jan Wagenmakers – Default Bayesian t-tests

Empirical researchers often use the frequentist t-test to compare statistical models, and assess whether or not their manipulations had an effect. Here we summarize recent work on a default Bayesian alternative to the frequentist t-test and discuss the possibility of a hierarchical extension.

Computational Bayes

Tuesday 16:00 – 17:30

Nial Friel – Computing marginal likelihood and Bayes factors for Bayesian models

Over the past 15 years a variety of different methods have been presented in the literature to estimate the marginal likelihood of a Bayesian model. This talk will present a survey of this area and offer some new perspectives.

Peter J. Green – How to compute posterior model probabilities ... and why that doesn't mean that we have solved the problem of model choice

The generic set-up for model choice in a Bayesian setting puts prior probabilities over the set of models to be entertained, and then conditional on the model follows the usual (possibly hyperprior)-prior-likelihood formulation. It therefore sits in a framework that adds one additional level, the model indicator, into a Bayesian hierarchical model. This makes sense whether the "models" are genuinely distinct hypotheses about data generation, or simply determine degrees of complexity of a functional representation (such as the order of an autoregressive process), or some combination of the two.

I will begin by discussing Markov chain Monte Carlo methods for computing posteriors on model indicators simultaneously with model parameters. Such methods include both within-model methods typically requiring approximation of marginal likelihoods, and across-model methods such as reversible jump, where algorithms are complicated by the facts that different model parameters may be of differing dimension, and that designing efficient across-model jumps may be difficult (but worth doing).

So computational Bayesians can compute posterior probabilities – does that leave us anything else to worry about? I will conclude by discussing why the answer is yes, and by attempting to categorise the different reasons why there are still interesting questions to answer.

## Wednesday 16 March

Contributed talks

Wednesday 9:00 – 10:30

A. Applied philosophical aspects of model uncertainty

Keith Beven – Testing hydrological models as hypotheses: a limits of acceptability approach and the issue of disinformation

The problem in testing hydrological models is that there are always epistemic errors as well as aleatory errors. It cannot then be assured that the nature of errors in prediction will be the same as in calibration, while the value of the information in calibration might be less than that implied by calculating a formal statistical likelihood. Some errors might even be disinformative about what constitutes a good model. This paper reports on a limits of acceptability approach to dealing with epistemic error in hydrological models. This is joint work with Paul Smith.

Sylvia Wenmackers – Models and simulations in material science: two cases without error bars

We present two cases in material science which do not provide a way to estimate the error on the final result. Case 1: experimental results of spectroscopic ellipsometry are related to simulated data, using an idealized optical model and a fitting procedure. Case 2: experimental results of scanning tunneling microscopy are related to images, based on ab initio calculations. The experimental and simulated images are compared visually; no fitting occurs.

Leonard A Smith – All models are wrong, but some are dangerous: Philosophical Aspects of Statistical Model Selection

All models are wrong, some are dangerous; of those a few may prove useful. Outside the classroom, model selection is best done in the context of the (each) question of interest; there need be no optimal approach to model selection, nor decision-relevant probability distribution over models. Nevertheless overconfidence can be reduced, understanding increased, and the qualitative (perhaps quantitative) use of relevant models (from amongst those on the table) can be made less dangerous.

B. Applied statistical aspects of model uncertainty

Anne Presanis – Identifiability and model selection in dynamic transmission models for HIV: Bayesian evidence synthesis

We present a probabilistic dynamic HIV transmission model, embedded in a Bayesian synthesis of multiple data sources, to estimate incidence and prevalence. Incidence is parameterised in terms of prevalence, contact rates and transmission probabilities given contact. We simultaneously estimate these, via a multi-state model described by differential equations. In the context of this application, we discuss issues of model fit, identifiability and model selection.

Setia Pramana – Model averaging in dose-response study in microarray expression

Dose-response studies have recently been integrated with microarray technologies. Within this setting, the response is gene expression measured at a certain dose level. In this study, genes which are not differentially expressed are filtered out using a monotonic trend test. Then, for the genes with a significant monotone trend, several dose-response models are fitted. Afterwards, model averaging is carried out to estimate the target dose, ED50.

Paul H. C. Eilers – Sea level trend estimation by Seemingly Unrelated Penalized Regressions

A probable effect of global warming is a rise in sea levels. The Dutch government operates a large monitoring network, which allows trend estimation. Traditionally, trends have been computed for each monitoring station separately. However, the residuals at different stations show strong correlations. A large increase in the precision of estimated trends can be achieved by combining the P-spline smoother with variants of the seemingly unrelated regression (SUR) model that is popular in econometrics.

Model uncertainty and science

Wednesday 10:45 – 12:30

Arthur Petersen – Model structure uncertainty: a matter of (Bayesian) belief?

What constitutes an ‘appropriate’ model structure, for instance for modelling climate change? Appropriateness can be understood in many different ways: appropriateness in terms of fitness for purpose; appropriateness in terms of reflecting the current knowledge on the most important processes; or appropriateness on the basis of being close to observations. Inevitably there is uncertainty involved when choosing model structure: a model is at best only an approximation of certain aspects of reality. It is important to express the uncertainty which is involved in the model and its outcomes. This paper will address several strategies for dealing with model-structure uncertainty, in particular in the area of climate change. This is collaborative work with Peter Janssen (PBL).

Kenneth Burnham – Data, truth, models, and AIC versus BIC multimodel inference

I explore several model selection issues, especially that model selection ought to mean multimodel inference. A “true” model is often assumed as a necessary theoretical foundation, but is only needed as a concept for criterion-based selection. Such selection allows model uncertainty to be estimated. The often-raised issue of AIC “over-fitting” will be discussed. Moreover, BIC can select a model that does not fit, hence “underfits.” With complex models, AIC seems the more defensible choice. When model-averaged prediction is used, inference is less affected by the choice of selection criterion. Simulation comparison of selection methods is problematic because it can be structured to produce any answer you want.