About the project

Context. Researchers are increasingly turning to machine learning to tackle problems across the sciences, from biology to astrophysics and fluid dynamics. Our project is part of this growing AI4Science movement and focuses on a key challenge in experimental science: identifying which model parameters best explain the data we observe (Figure 1).

Figure 1: Given a stochastic simulator that takes parameters $\theta$ as input and returns simulations $x \sim p(x|\theta)$, the posterior distribution $p(\theta|x_0)$ helps us determine which parameters are most likely to have generated the observation $x_0$. Figure taken from [Deistler et al. 2025]

We use simulation-based inference (SBI) [Cranmer et al. 2020, Deistler et al. 2025], a Bayesian approach that leverages deep generative models, such as conditional normalizing flows, to approximate the posterior distribution, which assigns higher probability to the parameter values most likely to have produced the observed data $x_0$ (Figure 2).
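For reference, the posterior that SBI targets is given by Bayes' rule. The difficulty is that the likelihood $p(x_0|\theta)$ is defined only implicitly by the simulator, which can draw samples but cannot evaluate their density, so the expression below cannot be computed directly. One common way to train the approximation $q_\phi$, used in neural posterior estimation, is to minimize the expected negative log-density over simulated pairs; this is a standard formulation, not necessarily the exact objective used in SBI4C.

$$
p(\theta|x_0) = \frac{p(x_0|\theta)\, p(\theta)}{\int p(x_0|\theta')\, p(\theta')\, \mathrm{d}\theta'},
\qquad
\phi^\star = \arg\min_\phi \; \mathbb{E}_{\theta \sim p(\theta),\; x \sim p(x|\theta)}\big[-\log q_\phi(\theta|x)\big]
$$

Averaged over $x$, this loss differs from the Kullback–Leibler divergence $\mathrm{KL}\big(p(\theta|x) \,\|\, q_\phi(\theta|x)\big)$ only by a constant that does not depend on $\phi$, so it is minimized when $q_\phi$ matches the true posterior, provided the model family is expressive enough.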

Figure 2: SBI consists of four main steps: (i) draw parameters from the prior distribution, $\theta_i \sim p(\theta)$; (ii) run the simulator to generate data $x_i \sim p(x|\theta_i)$; (iii) train, on the dataset $\{(\theta_i, x_i)\}$, a conditional generative model $q_\phi$ that takes $x$ as input and predicts a distribution over parameters $\theta$; and (iv) use $q_\phi(\theta|x_0)$ as an approximation to the posterior $p(\theta|x_0)$. Figure taken from [Deistler et al. 2025]
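The four steps in Figure 2 can be illustrated with a minimal, self-contained sketch under toy assumptions: a standard normal prior, a hypothetical Gaussian simulator $x \sim \mathcal{N}(\theta, 0.5^2)$, and, in place of a conditional normalizing flow, a simple conditional Gaussian $q_\phi(\theta|x) = \mathcal{N}(w x + b, s^2)$ fitted by maximum likelihood. This toy model is conjugate, so the exact posterior $\mathcal{N}(0.8\, x_0, 0.2)$ is known and can be used to check the approximation. All function names are hypothetical; this is not SBI4C code.

```python
import math
import random


def prior_sample(rng: random.Random) -> float:
    # (i) theta ~ p(theta) = N(0, 1)  (toy prior, an assumption)
    return rng.gauss(0.0, 1.0)


def simulator(theta: float, rng: random.Random, noise_std: float = 0.5) -> float:
    # (ii) x ~ p(x | theta) = N(theta, noise_std^2)  (hypothetical toy simulator)
    return rng.gauss(theta, noise_std)


def fit_gaussian_posterior(thetas, xs):
    """(iii) Fit q_phi(theta | x) = N(w * x + b, s^2) by maximum likelihood.

    For this Gaussian family, the maximum-likelihood (w, b) is the ordinary
    least-squares fit of theta on x, and s^2 is the mean squared residual.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(thetas) / n
    cov_xt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, thetas)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    w = cov_xt / var_x
    b = mean_t - w * mean_x
    s2 = sum((t - (w * x + b)) ** 2 for x, t in zip(xs, thetas)) / n
    return w, b, math.sqrt(s2)


def run_sbi(n_sims: int = 20_000, seed: int = 0):
    # Steps (i)-(iii): simulate a training set, then fit q_phi on it.
    rng = random.Random(seed)
    thetas = [prior_sample(rng) for _ in range(n_sims)]
    xs = [simulator(t, rng) for t in thetas]
    return fit_gaussian_posterior(thetas, xs)


if __name__ == "__main__":
    w, b, s = run_sbi()
    x0 = 1.0  # observed data
    # (iv) q_phi(theta | x0) serves as the approximation to p(theta | x0)
    print(f"q(theta | x0={x0}) = N({w * x0 + b:.3f}, {s:.3f}^2)")
    # Exact posterior for this conjugate toy model: N(0.8 * x0, 0.2)
    print(f"exact posterior    = N({0.8 * x0:.3f}, {math.sqrt(0.2):.3f}^2)")
```

In a realistic setting, the Gaussian would be replaced by a more expressive conditional density estimator such as a normalizing flow, and the toy simulator by the expensive model itself; the four-step structure stays the same.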

Overarching goal. SBI has so far been applied mainly to small-scale simulators. With SBI4C, we aim to move from such simplified settings to large-scale models, exemplified by climate simulators that capture complex ocean–atmosphere dynamics.

In this project, such models serve both as a central application and as a source of inspiration for new methodological advances. Indeed, the calibration of model parametrizations is a longstanding issue in climate science: it has been widely discussed in the literature and has gained renewed attention in recent years [Hourdin et al. 2017].

Figure 3: Parametrizations are simplified mathematical representations of physical processes (like cloud formation or turbulence) that occur at scales smaller than the model grid. They are necessary because these fine-scale processes cannot be directly resolved by the model’s horizontal and vertical grids, yet strongly influence large-scale climate dynamics. Figure taken from [Edwards 2010]

Directions. Extending SBI from small-scale to large PDE-based simulators introduces specific difficulties: simulations are computationally expensive, the underlying processes span multiple scales, and reliable uncertainty quantification is essential. SBI4C addresses these issues through three complementary research directions:

  • More with less: develop SBI methods that remain effective even when only a small number of expensive simulations are available.
  • Smarter generative models: design posterior approximators that are lighter, multiscale-aware, and better suited to large scientific simulators.
  • Fast surrogates: leverage climate emulators to replace costly simulations and test their impact on inference quality and trust.

Impact. By advancing the robustness and reproducibility of parameter inference methods in climate science, the SBI4C project aims to make substantial contributions to the practices of climatologists while enhancing the reliability of model-based projections for policymakers. Moreover, the novel SBI methods we develop will be readily applicable to other domains involving large-scale simulators, such as cosmology, computational neuroscience, and civil engineering.

Pedro L. C. Rodrigues
Researcher @ Inria