.. _bayesian_prior:

===================
Prior Distributions
===================

To sample the posterior of the PDF model parameters :math:`\boldsymbol{\theta}`, we must
specify a prior distribution :math:`\pi(\boldsymbol{\theta})`. The current release of
Colibri supports two prior types:

- **Uniform priors**, with individually configurable bounds for each parameter.
- **Gaussian priors**, defined by a mean vector and covariance matrix derived from the
  posterior samples of a previous fit.

The latter implements a Bayesian update (posterior factorisation): when the posterior of
an earlier fit is exactly Gaussian, or well approximated by a Gaussian, and the two
datasets are uncorrelated, the posterior of the first fit can be used as the prior for
the second fit. In the next section we outline the basic idea and domain of validity of
this posterior-factorisation approach; for detailed usage instructions in Colibri, see
(TODO: link to user guide).

Bayesian update
^^^^^^^^^^^^^^^

Suppose that experimental data comprising :math:`N_{\rm data}` datapoints is distributed
according to a multivariate normal distribution,

.. math::

   \mathbf{D} \sim N(FK(\boldsymbol{\theta}), \Sigma),

where :math:`\Sigma` is the :math:`N_{\rm data}\times N_{\rm data}` experimental
covariance matrix. In Bayesian statistics, :math:`\boldsymbol{\theta}` is itself a
random variable with an associated prior probability density
:math:`\pi(\boldsymbol{\theta})`, which in the following we take to be a sufficiently
wide uniform density. Bayes’ theorem then tells us that after an observation
:math:`\mathbf{D}_0` of :math:`\mathbf{D}`, the probability density of
:math:`\boldsymbol{\theta}` is

.. math::
   :label: eq:bayes-theorem

   p(\boldsymbol{\theta}\mid\mathbf{D}_0)
   = \frac{\pi(\boldsymbol{\theta})\,L(\mathbf{D}_0\mid\boldsymbol{\theta})}{Z}
   = \frac{\pi(\boldsymbol{\theta})
       \exp\!\bigl(-\tfrac12 \|\mathbf{D}_0 - FK(\boldsymbol{\theta})\|^2_{\Sigma}\bigr)}
     {\displaystyle \int d\boldsymbol{\theta}\;\pi(\boldsymbol{\theta})
       \exp\!\bigl(-\tfrac12 \|\mathbf{D}_0 - FK(\boldsymbol{\theta})\|^2_{\Sigma}\bigr)},

where we write the generalised :math:`L_2` norm as

.. math::

   \|\vec{x}\|^2_{\Sigma} = \vec{x}^T\,\Sigma^{-1}\,\vec{x}
   \quad\text{for}\quad \vec{x}\in\mathbb{R}^{N_{\rm data}}.

Now assume that :math:`\mathbf{D}_0 = (\mathbf{D}_1, \mathbf{D}_2)^T` with
:math:`\mathbf{D}_1\in\mathbb{R}^{n_1}`, :math:`\mathbf{D}_2\in\mathbb{R}^{n_2}` and
:math:`n_1+n_2 = N_{\rm data}`, and that the two measurements are uncorrelated. That is,
the covariance matrix factorises,

.. math::

   \Sigma = \Sigma_1 \oplus \Sigma_2, \quad
   \Sigma_1\in\mathbb{R}^{n_1\times n_1}, \;\;
   \Sigma_2\in\mathbb{R}^{n_2\times n_2}.

In this case the likelihood :math:`L(\mathbf{D}_0\mid\boldsymbol{\theta})` factorises
over the two blocks, and we can write :eq:`eq:bayes-theorem` as

.. math::
   :label: eq:bayes-uncorr

   p(\boldsymbol{\theta}\mid\mathbf{D}_0) =
   \frac{
     \pi(\boldsymbol{\theta})
     \exp\!\bigl(-\tfrac12\|\mathbf{D}_1-FK_1(\boldsymbol{\theta})\|^2_{\Sigma_1}\bigr)
     \exp\!\bigl(-\tfrac12\|\mathbf{D}_2-FK_2(\boldsymbol{\theta})\|^2_{\Sigma_2}\bigr)
   }{
     \displaystyle \int d\boldsymbol{\theta}\;\pi(\boldsymbol{\theta})
     \exp\!\bigl(-\tfrac12\|\mathbf{D}_1-FK_1(\boldsymbol{\theta})\|^2_{\Sigma_1}\bigr)
     \exp\!\bigl(-\tfrac12\|\mathbf{D}_2-FK_2(\boldsymbol{\theta})\|^2_{\Sigma_2}\bigr)
   },

where we write
:math:`FK(\boldsymbol{\theta}) = (FK_1(\boldsymbol{\theta}), FK_2(\boldsymbol{\theta}))^T`.
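This factorisation is easy to verify numerically. The following is a minimal NumPy/SciPy
sketch (not Colibri's implementation; all names are illustrative) that evaluates the
generalised :math:`L_2` norm via a Cholesky solve and checks that, for a block-diagonal
covariance, the full norm splits into the sum of the two block contributions:

.. code-block:: python

   import numpy as np
   from scipy.linalg import block_diag, cho_factor, cho_solve

   def sq_norm(x, sigma):
       """Generalised L2 norm ||x||^2_Sigma = x^T Sigma^{-1} x, via a Cholesky solve."""
       return float(x @ cho_solve(cho_factor(sigma), x))

   rng = np.random.default_rng(0)
   n1, n2 = 3, 4

   # Two uncorrelated blocks: random symmetric positive-definite covariances.
   a1 = rng.normal(size=(n1, n1))
   a2 = rng.normal(size=(n2, n2))
   sigma1 = a1 @ a1.T + n1 * np.eye(n1)
   sigma2 = a2 @ a2.T + n2 * np.eye(n2)
   sigma = block_diag(sigma1, sigma2)  # Sigma = Sigma_1 (+) Sigma_2

   # Residual vector D - FK(theta), split into the two measurements.
   r = rng.normal(size=n1 + n2)
   r1, r2 = r[:n1], r[n1:]

   # Block-diagonal Sigma => the exponent, and hence the likelihood, factorises.
   assert np.isclose(sq_norm(r, sigma), sq_norm(r1, sigma1) + sq_norm(r2, sigma2))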
Now, by noticing that the posterior for the parameters :math:`\boldsymbol{\theta}` given
only :math:`\mathbf{D}_1` is

.. math::
   :label: eq:pd1-post

   p_{\mathbf{D}_1}(\boldsymbol{\theta}\mid\mathbf{D}_1)
   = \frac{\pi(\boldsymbol{\theta})
       \exp\!\bigl(-\tfrac12\|\mathbf{D}_1-FK_1(\boldsymbol{\theta})\|^2_{\Sigma_1}\bigr)}
     {\displaystyle \int d\boldsymbol{\theta}\;\pi(\boldsymbol{\theta})
       \exp\!\bigl(-\tfrac12\|\mathbf{D}_1-FK_1(\boldsymbol{\theta})\|^2_{\Sigma_1}\bigr)}
   = \frac{\pi(\boldsymbol{\theta})
       \exp\!\bigl(-\tfrac12\|\mathbf{D}_1-FK_1(\boldsymbol{\theta})\|^2_{\Sigma_1}\bigr)}
     {Z_1},

we can rewrite :eq:`eq:bayes-uncorr` as

.. math::

   p(\boldsymbol{\theta}\mid\mathbf{D}_0)
   = \frac{
       Z_1\,p_{\mathbf{D}_1}(\boldsymbol{\theta}\mid\mathbf{D}_1)
       \,\exp\!\bigl(-\tfrac12\|\mathbf{D}_2-FK_2(\boldsymbol{\theta})\|^2_{\Sigma_2}\bigr)
     }{
       \displaystyle \int d\boldsymbol{\theta}\;Z_1\,
       p_{\mathbf{D}_1}(\boldsymbol{\theta}\mid\mathbf{D}_1)
       \,\exp\!\bigl(-\tfrac12\|\mathbf{D}_2-FK_2(\boldsymbol{\theta})\|^2_{\Sigma_2}\bigr)
     }
   = \frac{
       p_{\mathbf{D}_1}(\boldsymbol{\theta}\mid\mathbf{D}_1)
       \,\exp\!\bigl(-\tfrac12\|\mathbf{D}_2-FK_2(\boldsymbol{\theta})\|^2_{\Sigma_2}\bigr)
     }{
       \displaystyle \int d\boldsymbol{\theta}\;
       p_{\mathbf{D}_1}(\boldsymbol{\theta}\mid\mathbf{D}_1)
       \,\exp\!\bigl(-\tfrac12\|\mathbf{D}_2-FK_2(\boldsymbol{\theta})\|^2_{\Sigma_2}\bigr)
     }.

That is, the posterior obtained from the first measurement plays exactly the role of a
prior for the second measurement.

Note that if we have a measurement
:math:`\mathbf{D}\sim N(FK(\boldsymbol{\theta}),\Sigma)` with

.. math::

   \Sigma = \Sigma_1 \oplus \Sigma_2 \oplus \dots \oplus \Sigma_n,

we can apply this result recursively. The posterior after all :math:`n` uncorrelated
blocks is

.. math::

   p(\boldsymbol{\theta}\mid\mathbf{D}_0)
   = \frac{
       p_{\mathbf{D}_{n-1}}(\boldsymbol{\theta}\mid\mathbf{D}_1,\dots,\mathbf{D}_{n-1})
       \,\exp\!\bigl(-\tfrac12\|\mathbf{D}_n-FK_n(\boldsymbol{\theta})\|^2_{\Sigma_n}\bigr)
     }{
       \displaystyle \int d\boldsymbol{\theta}\;
       p_{\mathbf{D}_{n-1}}(\boldsymbol{\theta}\mid\mathbf{D}_1,\dots,\mathbf{D}_{n-1})
       \,\exp\!\bigl(-\tfrac12\|\mathbf{D}_n-FK_n(\boldsymbol{\theta})\|^2_{\Sigma_n}\bigr)
     },

where each intermediate posterior for :math:`k>1` is obtained from the previous one by

.. math::

   p_{\mathbf{D}_k}(\boldsymbol{\theta}\mid\mathbf{D}_1,\dots,\mathbf{D}_k)
   = \frac{
       p_{\mathbf{D}_{k-1}}(\boldsymbol{\theta}\mid\mathbf{D}_1,\dots,\mathbf{D}_{k-1})
       \,\exp\!\bigl(-\tfrac12\|\mathbf{D}_k-FK_k(\boldsymbol{\theta})\|^2_{\Sigma_k}\bigr)
     }{
       \displaystyle \int d\boldsymbol{\theta}\;
       p_{\mathbf{D}_{k-1}}(\boldsymbol{\theta}\mid\mathbf{D}_1,\dots,\mathbf{D}_{k-1})
       \,\exp\!\bigl(-\tfrac12\|\mathbf{D}_k-FK_k(\boldsymbol{\theta})\|^2_{\Sigma_k}\bigr)
     },

starting from the single-dataset posterior :eq:`eq:pd1-post` at :math:`k=1`.
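To illustrate how this result is used in practice, the sketch below builds a Gaussian
prior from the posterior samples of a first fit and adds its log-density to the Gaussian
log-likelihood of a second, uncorrelated dataset, i.e. the unnormalised logarithm of the
final expression above. This is a minimal NumPy illustration with hypothetical names
throughout; it does not use Colibri's actual API:

.. code-block:: python

   import numpy as np
   from scipy.linalg import cho_factor, cho_solve

   def gaussian_prior_from_samples(samples):
       """Mean vector and covariance matrix of posterior samples
       of shape (n_samples, n_params)."""
       return samples.mean(axis=0), np.cov(samples, rowvar=False)

   def log_posterior_fit2(theta, d2, sigma2, fk2, prior_mu, prior_cov):
       """Unnormalised log posterior of the second fit:
       log p_{D_1}(theta | D_1) + log L(D_2 | theta),
       up to theta-independent constants."""
       # Prior term: the posterior of the first fit, approximated as a Gaussian.
       rp = theta - prior_mu
       log_prior = -0.5 * rp @ cho_solve(cho_factor(prior_cov), rp)
       # Likelihood term for the second, uncorrelated dataset.
       r2 = d2 - fk2(theta)
       return log_prior - 0.5 * r2 @ cho_solve(cho_factor(sigma2), r2)

   # Hypothetical usage: stand-in posterior samples from a first fit,
   # a toy forward map and a toy second dataset.
   rng = np.random.default_rng(1)
   samples_fit1 = rng.normal(size=(1000, 2))
   mu1, cov1 = gaussian_prior_from_samples(samples_fit1)

   fk2 = lambda theta: np.array([theta[0] + theta[1], theta[0] - theta[1]])
   print(log_posterior_fit2(np.zeros(2), np.array([1.0, 0.5]), np.eye(2), fk2, mu1, cov1))

Within the Gaussian approximation of the first posterior, sampling this log posterior is
equivalent to a single fit to the combined data, and the same construction can be chained
over further uncorrelated blocks, as in the recursion above.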