
Angular Error Calibration

Motivation and Context

This section provides a detailed introduction to angular error modeling in point-source analyses. Readers interested only in the Lightning Tracks–specific implementation may skip ahead to Lightning Tracks Implementation.

The Full Directional Posterior

For any event, the most fundamental quantity encoding its directional information is a normalized probability density for the true arrival direction, $$ f(\hat{\mathbf{n}}_\mathrm{true} \mid \mathrm{event}), $$ where \(\hat{\mathbf{n}}_\mathrm{true}\) is a unit vector on the sphere. This function is the spatial probability density that enters the signal term of the point-source likelihood: the full signal likelihood \(S\) factorizes into an angular component \(f\) and an energy component. It is \(f\) that specifies, for each event, the probability density assigned to a hypothesized source direction on the sky.

Since any such direction can be parameterized by two angles, this PDF is equivalently a fully two-dimensional function $$ f(\varphi, \vartheta \mid \mathrm{event}), $$ defined over the sphere. In principle, the ideal point-source analysis would evaluate a model of this entire 2D directional posterior directly within the likelihood ratio.

TNF as a 2D Posterior Estimator

Thorsten’s Normalizing Flow reconstruction (TNF) is the first IceCube algorithm that explicitly approximates this full 2D surface, $$ f_\mathrm{TNF}(\varphi,\vartheta \mid \mathrm{pulses}), $$ using conditional normalizing flows. TNF does not merely provide a direction and uncertainty estimate—it produces a full, event-specific posterior over all possible arrival directions on the sphere.

Using \(f_\mathrm{TNF}\) directly in a point-source likelihood would be statistically ideal. However, a typical point-source analysis requires millions of likelihood evaluations (e.g., scrambled background trials, iterative maximizations, grid scans, and stacking evaluations). Evaluating a high-dimensional normalizing-flow model for every event–source pair across millions of likelihood calls is computationally infeasible with current methods.

Reduction to a Parametric PSF

To make point-source searches tractable, the full 2D posterior is replaced by a parametric point-spread function (PSF),

\[ f(\hat{\mathbf{n}}_\mathrm{true} \mid \mathrm{event}) \longrightarrow f(\Delta \Psi \mid \boldsymbol{\theta}_\mathrm{event}), \]

where \(\Delta\Psi = \arccos(\hat{\mathbf{n}}_\mathrm{reco}\cdot \hat{\mathbf{n}}_\mathrm{true})\) is the angular separation, and \(\boldsymbol{\theta}_\mathrm{event}\) is a small set of event-level parameters.
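For concreteness, the opening angle can be computed from the two unit vectors in a few lines of NumPy. This is a minimal sketch; the helper names `opening_angle` and `unit_vector` are illustrative and not part of any analysis framework:

```python
import numpy as np

def opening_angle(n_reco, n_true):
    """Angular separation (radians) between two unit vectors.

    The dot product is clipped to [-1, 1] to guard against
    floating-point values slightly outside the arccos domain.
    """
    dot = np.clip(np.sum(n_reco * n_true, axis=-1), -1.0, 1.0)
    return np.arccos(dot)

def unit_vector(azimuth, zenith):
    """Unit vector from spherical angles (radians)."""
    return np.array([
        np.sin(zenith) * np.cos(azimuth),
        np.sin(zenith) * np.sin(azimuth),
        np.cos(zenith),
    ])

# two directions separated by exactly 1 degree in zenith
n1 = unit_vector(0.3, 1.0)
n2 = unit_vector(0.3, 1.0 + np.radians(1.0))
assert np.isclose(np.degrees(opening_angle(n1, n2)), 1.0)
```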

This step imposes rotational symmetry about the reconstructed direction: once we condition on \(\hat{\mathbf{n}}_\mathrm{reco}\) and the event properties, the PSF is assumed to depend on the source location only through \(\Delta\Psi\). A genuinely two-dimensional directional likelihood on the sphere is thus collapsed into a one-dimensional radial profile, and this reduction is only accurate when the underlying posterior is itself approximately radially symmetric.

However, the detector geometry directly produces cases where this assumption fails. A particularly salient example occurs for steeply up- or down-going tracks: because the detector has roughly ten times finer DOM spacing in \(z\) (along strings) than in the \(x\)-\(y\) plane (string-to-string separation), many such events illuminate only a narrow vertical band of DOMs. The zenith angle can then be tightly constrained while the azimuth remains weakly constrained, producing elongated, and in the most extreme cases ring-like, posteriors on the sphere. A related degeneracy appears for tracks that are close to horizontal: when most of the light is deposited in a single layer of DOMs, the lever arm in the \(x\)-\(y\) plane is large while the vertical sampling is limited, yielding zenith uncertainties that are substantially broader than those in azimuth. In both regimes, the posteriors are intrinsically anisotropic and, in some cases, even multimodal. No radially symmetric function of \(\Delta\Psi\)—regardless of how many parameters it carries—can fully reproduce such shapes. Even if one could choose a scale parameter that perfectly matches the radial width of the posterior (e.g. yielding an ideal Rayleigh pull distribution), a one-dimensional PSF \(f(\Delta\Psi \mid \boldsymbol{\theta}_\mathrm{PSF})\) would still fail to capture the true two-dimensional structure.

This is a fundamental and irreducible approximation introduced by the radial symmetry assumption itself. It is independent of any subsequent calibration: pull correction can improve the radial scaling of the PSF but cannot remove the loss of information caused by compressing an intrinsically 2D directional posterior into a 1D function.

The Standard vMF and Rayleigh PSFs Used in Csky

The default PSF used by Csky remains a radially symmetric von Mises–Fisher (vMF) distribution. In general, the vMF family describes probability densities on the surface of a unit \((p-1)\)-sphere \(S^{p-1}\) embedded in \(\mathbb{R}^p\), with the concentration parameter \(\kappa\) controlling how tightly the distribution clusters around a preferred direction. For the case relevant to directional reconstruction in IceCube—that is, directions on the two-dimensional sphere \(S^2\) embedded in \(\mathbb{R}^3\)—the vMF density reduces to $$ f_\mathrm{vMF}(\Delta\Psi \mid \kappa) = \frac{\kappa}{2\pi (e^\kappa - e^{-\kappa})} \exp\bigl(\kappa\cos\Delta\Psi\bigr), $$ where \(\Delta\Psi\) is the angular separation between the reconstructed and hypothesized source directions.

This is paired with the small-angle Rayleigh approximation, $$ f_\mathrm{Rayleigh}(\Delta\Psi \mid \sigma) = \frac{\Delta\Psi}{\sigma^2} \exp\left(-\frac{\Delta\Psi^2}{2\sigma^2}\right), $$ with the standard correspondence in the small-angle vMF limit, $$ \sigma \approx \frac{1}{\sqrt{\kappa}}. $$ Csky transitions between the Rayleigh and vMF forms at a configurable angular error estimate threshold kent_min (default \(7^\circ\)). Our [coverage studies][psf-coverage] show that for the track samples, which exhibit sufficiently small angular errors, the Rayleigh and vMF models indeed produce indistinguishable coverage. Only the DNN Cascade sample—with its substantially larger directional uncertainties—exhibits the expected slight deviation between the two descriptions.

Intuitively, the vMF distribution is the spherical analogue of a 2D Gaussian: it defines the most symmetric, single-parameter density concentrated around a preferred direction, but crucially it does so on the curved surface of the sphere. The normalization and shape of the distribution explicitly account for spherical geometry, which is why the vMF PDF remains accurate even when the support of the PSF spans several degrees.

The Rayleigh form, by contrast, arises as the radial distribution of a 2D Gaussian on its tangent plane. It ignores curvature entirely, treating small angular displacements as Euclidean distances. This is exactly why it is an excellent approximation in the small-angle regime: for sufficiently small \(\Delta\Psi\), the sphere is locally flat, the difference between geodesic and Euclidean distances is negligible, and the vMF distribution reduces to a Gaussian whose radial part is Rayleigh. In this limit, the Rayleigh and vMF models become nearly indistinguishable.
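This small-angle agreement can be checked numerically. The sketch below, with illustrative helper names, compares the two radial densities in \(\Delta\Psi\) (the solid-angle vMF density times \(2\pi\sin\Delta\Psi\)) for a \(1^\circ\) uncertainty; the vMF normalization is evaluated in log-space because \(e^\kappa\) overflows for the large \(\kappa\) typical of well-reconstructed tracks:

```python
import numpy as np

def vmf_radial_pdf(dpsi, kappa):
    """Radial opening-angle density of a vMF distribution on S^2.

    Solid-angle density: kappa / (2 pi (e^k - e^-k)) * exp(k cos dpsi);
    radial density = 2 pi sin(dpsi) times that.  The normalization is
    computed in log-space so that large kappa does not overflow.
    """
    log_norm = (np.log(kappa) - np.log(2 * np.pi)
                - kappa - np.log1p(-np.exp(-2 * kappa)))
    return 2 * np.pi * np.sin(dpsi) * np.exp(log_norm + kappa * np.cos(dpsi))

def rayleigh_pdf(dpsi, sigma):
    """Small-angle Rayleigh approximation of the radial density."""
    return dpsi / sigma**2 * np.exp(-dpsi**2 / (2 * sigma**2))

sigma = np.radians(1.0)     # a well-reconstructed track
kappa = 1.0 / sigma**2      # small-angle correspondence sigma ~ 1/sqrt(kappa)
x = np.linspace(1e-4, 3 * sigma, 200)
# in the small-angle regime the two radial densities agree to better than 1%
assert np.allclose(vmf_radial_pdf(x, kappa), rayleigh_pdf(x, sigma), rtol=1e-2)
```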

Attempts to Improve the 1D PSF: KDEs and King Functions

The considerations above address the fundamental limitation of using a 1D, radially symmetric PSF: such a model can never reproduce genuinely anisotropic or multimodal posteriors. However, even within the restricted class of 1D radial profiles, the standard vMF/Rayleigh approach is further constrained by using only a single scale parameter (\(\sigma\) or \(\kappa\)) to characterize the entire shape of the angular-error distribution. This can lead to imperfect coverage, especially at low energies where the true \(\Delta\Psi\)-distribution develops broader tails.

Several analyses have therefore explored more flexible 1D-in-\(\Delta\Psi\) PSF models that retain radial symmetry but relax the single-parameter assumption. Two such approaches are the KDE-based PSF and the King function.

KDE-based PSFs

The KDE method developed for Northern Tracks (see NT Documentation) replaced the analytic vMF/Rayleigh spatial term with a non-parametric 1D KDE in the variables \((\Delta\Psi, E_\mathrm{reco}, \sigma)\), conditioned on a spectral index \(\gamma\). Why the angular-error distribution must be conditioned on the assumed spectral index \(\gamma\) in order to remain physically consistent is discussed later in Spectral Dependence of the Full PSF.

Mathematically, the KDE-based PSF is a kernel density estimate of the conditional distribution of the opening angle, $$ f_\mathrm{KDE}(\Delta\Psi \mid E_\mathrm{reco}, \sigma, \gamma) = \sum_{k} w_k\, K_h\bigl(\Delta\Psi - \Delta\Psi_k\bigr), $$ with MC-derived weights \(w_k\), bandwidth \(h\), and a 1D kernel \(K_h\) (Gaussian in the NT implementation), evaluated in the radial variable \(\Delta\Psi\).

Crucially, this KDE is not a 2D angular posterior. It still assumes a radially symmetric PSF and models only the distribution of \(\Delta\Psi\). The KDE approach improves the shape of the radial PSF (especially in the tails) but does not remove the radial-symmetry assumption.
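As a toy illustration of the idea (not the actual NT implementation, which uses its own MC weights, bandwidth choices, and conditioning variables), a weighted 1D Gaussian KDE of the radial opening-angle distribution can be built with `scipy.stats.gaussian_kde`:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# toy MC: opening angles (deg) and per-event weights, standing in for
# weights from reweighting to an assumed spectrum (illustrative only)
dpsi_mc = rng.rayleigh(scale=0.5, size=5000)
weights = rng.uniform(0.5, 1.5, size=5000)

# 1D weighted KDE of the radial distribution of dpsi;
# this models only f(dpsi), not a 2D posterior on the sphere
kde = gaussian_kde(dpsi_mc, weights=weights)

grid = np.linspace(0.0, 3.0, 301)
pdf = kde(grid)
# the KDE integrates to ~1 over the support of the toy sample
dx = grid[1] - grid[0]
assert abs(np.sum(pdf) * dx - 1.0) < 0.05
```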

King PSF

The King function provides a more flexible two-parameter radial model of the form

\[ f_\mathrm{King}(\Delta\Psi \mid \alpha, \beta) = N(\alpha,\beta)\, \left(\,1 + \frac{\Delta\Psi^{2}}{2\beta\,\alpha^{2}} \right)^{-\beta}, \]

where \(\alpha\) controls the core width, \(\beta\) governs the tail slope, and \(N\) is the normalization constant satisfying \(\int f_\mathrm{King}(\Delta\Psi)\, d\Omega = 1\). This form allows a narrow core together with independently adjustable heavy tails and is used, for example, in Fermi-LAT PSF modeling. A demonstration of the King-function approach within the Csky framework is available in PR #207.

Although used in high-energy astrophysics, this functional form is not a standard distribution in mathematics, and the origin of the name “King function” for this particular profile is largely historical; it does not correspond to a canonical definition outside this context.

Like the KDE approach, the King function remains a purely radial one-dimensional model in \(\Delta\Psi\). It can improve coverage by providing a more flexible radial profile than a single-parameter Rayleigh or vMF, while retaining the strong practical advantage that it is much cheaper to fit and evaluate than a full KDE. However, it also cannot resolve the fundamental mismatch that arises when the true directional posterior is anisotropic or otherwise not radially symmetric.
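In the small-angle (flat-sky) limit, the King normalization can be written in closed form, \(N(\alpha,\beta) = \frac{1}{2\pi\alpha^2}\left(1 - \frac{1}{\beta}\right)\) for \(\beta > 1\), matching the Fermi-LAT convention. The sketch below verifies this numerically; the parameter values are illustrative:

```python
import numpy as np
from scipy.integrate import quad

def king_pdf(dpsi, alpha, beta):
    """King profile with flat-sky normalization N = (1 - 1/beta) / (2 pi alpha^2),
    valid for beta > 1 in the small-angle limit (cf. the Fermi-LAT convention)."""
    norm = (1.0 - 1.0 / beta) / (2.0 * np.pi * alpha**2)
    return norm * (1.0 + dpsi**2 / (2.0 * beta * alpha**2)) ** (-beta)

alpha, beta = np.radians(0.3), 2.5   # illustrative core width and tail index
# integrate over solid angle in the flat-sky approximation:
# dOmega ~ 2 pi dpsi d(dpsi)
total, _ = quad(lambda x: king_pdf(x, alpha, beta) * 2.0 * np.pi * x, 0.0, np.inf)
assert abs(total - 1.0) < 1e-6
```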

The Need for Pull Calibration

There are two conceptually distinct reasons why a pull calibration is required, even when using state-of-the-art directional reconstructions.

Imperfect Estimators

First, any angular-error estimator may be imperfect. Classical methods such as Paraboloid (as used in PST) are known to underestimate uncertainties for bright events (see Paraboloid Documentation), and machine-learning–based regressors (as used in ESTES, DNNC, and NT when not using KDEs) tend to absorb biases from the distributional structure of the MC training sample—e.g. in visible energy, zenith, or containment. Even if they avoid event-level overtraining, these models can systematically misestimate the scale parameter (\(\sigma\) or \(\kappa\)) because the loss function is minimized over the ensemble of training events rather than the posterior of each individual event.

In contrast, TNF derives \(\kappa\) from the event’s own reconstructed posterior \(f_\mathrm{TNF}(\varphi,\vartheta\mid\mathrm{pulses})\), not from a global regression task. Because the \(\kappa\) extraction is based on samples from the full TNF posterior, it is inherently immune to biases arising from the overall MC distribution. Thorsten’s coverage studies demonstrate excellent agreement between the estimated and true full posteriors across all observables; correspondingly, the vMF \(\kappa\) obtained by fitting to posterior samples should exhibit near-ideal coverage subject only to the intrinsic limitation that a radially symmetric PSF cannot represent a fully anisotropic posterior. Thus, TNF greatly suppresses estimator-induced bias; any remaining mismatch arises from the intrinsic limitations of the 1D PSF model and whatever residual imperfections remain in the learned posterior itself.

Spectral Dependence of the Full PSF

The second, and more fundamental, reason why pull calibration is required is that even an ideal angular-error estimator would only be ideal with respect to the observable final-state lepton or hadronic shower, not the underlying neutrino. Because the true neutrino energy, flavor, and interaction channel are intrinsically unobservable, the mapping from neutrino kinematics to the outgoing final state introduces an irreducible, energy- and flavor-dependent angular smearing. As a consequence, the conditional angular-error distribution depends on the assumed spectral index and flavor composition, even in the limit of perfect detector response and perfect reconstruction. The calibrated PSF is therefore unavoidably tied to the spectral assumptions used to construct it.

This dependence is discussed in detail in [Spectral Dependence][spectral-dependence]; the key points are repeated here. Any PSF is conditioned on observable quantities, like the reconstructed energy. Because the mapping from true neutrino energy and interaction channel to reconstructed observables is broad and non-bijective, a fixed region of reconstructed space corresponds to a spectrum-dependent mixture of true energies and interaction types. Changing the assumed spectral index therefore changes the underlying mixture of neutrino kinematics populating that region, which in turn changes the distribution of physics-driven scattering angles and inelasticities that set the irreducible neutrino-to-lepton (or hadronic-system) angular smearing. In this sense, a PSF calibration is strictly valid only for the spectral index used to construct it.

Consequently, the effective PSF used in point source likelihood analyses such as in Csky is the convolution of

  1. an irreducible physics PSF, describing the distribution of final-state directions relative to the true neutrino direction (set by weak-interaction kinematics and dependent on true energy, flavor, and interaction channel), and
  2. a detector PSF, describing the distribution of reconstructed directions relative to the true final-state direction (set by detector geometry, light transport, and the reconstruction algorithm).

The detector PSF depends only on the final-state particle and could, in principle, be made spectrum-independent. The physics PSF, however, depends on unobservable neutrino-level quantities and thus on the assumed spectral model and flavor composition used to represent them. As a result, even a perfectly operating detector and a perfectly specified reconstruction cannot produce a spectrum- and flavor-universal angular-error model at the level required by the likelihood. Conditioning the PSF on the assumed spectrum and flavor mix is therefore necessary to enforce correct conditional coverage in reconstructed space.

Ideally, the point-source likelihood would therefore adopt a \(\gamma\)-dependent spatial term, using the same spectral index that appears in the signal hypothesis. If \(\gamma\) is treated as a free parameter, a self-consistent treatment would require either a family of pull-correction surfaces or a smooth interpolation of the correction factor as a function of \(\gamma\).

As discussed earlier, the KDE-based PSF approach for Northern Tracks effectively takes this route: it constructs \(\gamma\)-conditioned KDEs for the angular-error distribution, thereby folding the neutrino-physics contribution into the spatial term in a way that is explicitly dependent on the assumed spectral index. This achieves spectral consistency at the cost of increased model and computational complexity.

The standard IceCube framework does not implement these more elaborate treatments for the default vMF PSF. Instead, a pragmatic compromise is adopted here: the pull correction is derived from MC events reweighted to an effective spectral index \(\gamma = 2.5\), chosen as an intermediate between \(\gamma = 2\) (high-energy samples) and \(\gamma = 3\) (low-energy samples). This choice does not attempt to match every possible spectral index; it simply provides a stable and representative calibration point.

Connecting Back to TNF

It is worth noting that TNF is trained on an approximately uniform true-energy distribution. In the ideal limit TNF would provide a spectrally neutral estimate of the detector-only angular response. However, even a perfect detector-level posterior does not remove the need for pull calibration: the default point-source likelihood in Csky still uses a single PSF, and this PSF must implicitly fold in the additional, irreducible broadening from neutrino–lepton kinematics. That physics-induced component depends on the assumed spectral index and flavor composition and therefore lies outside what TNF can approximate from the detector response alone. As a result, pull correction remains necessary to obtain the correct conditional coverage for Lightning Tracks events, even when TNF provides an excellent estimate of the underlying event-level posterior.

TNF reconstructs the entire 2D likelihood surface \(f_\mathrm{TNF}(\varphi,\vartheta)\), but, at least for now, Csky requires a 1D radially symmetric PSF. For that purpose TNF also provides an effective vMF concentration \(\kappa_\mathrm{TNF}\), obtained by sampling the full estimated posterior \(f_\mathrm{TNF}(\varphi,\vartheta \mid \mathrm{pulses})\) and finding the best-fitting vMF for those samples on an event-by-event basis.

The corresponding scale \(\sigma_\mathrm{TNF} = \frac{1}{\sqrt{\kappa_\mathrm{TNF}}}\) then becomes the input to the pull-correction procedure.

Formal Definition of Pull Calibration

The goal of the calibration is to make the PSF used in the likelihood consistent with the empirical distribution of angular errors seen in Lightning Tracks, conditional on the reconstructed observables and assumed spectrum. To formalize this, we introduce explicit notation.

For each event, let

  • \(\hat{\mathbf{n}}\) be the reconstructed unit direction,
  • \(\mathbf{n}_\mathrm{true}\) be the true unit direction (available only in MC),
  • the true angular error be $$ \Delta\psi \equiv \arccos\left(\hat{\mathbf{n}} \cdot \mathbf{n}_\mathrm{true}\right) \,, $$
  • and the raw TNF-derived angular error estimate (after \(\kappa \to \sigma\) conversion) be $$ \sigma_\mathrm{TNF} \,. $$

For the likelihood, Csky assumes a parametric PSF family; schematically, $$ \Delta\psi \mid \sigma \sim f_\mathrm{PSF}(\Delta\psi;\,\sigma) \,, $$ where \(f_\mathrm{PSF}\) is Rayleigh-like with scale parameter \(\sigma\) for small angles, and for larger angles the same scale is mapped to a vMF concentration parameter \(\kappa(\sigma)\) internally. In other words, all the angular uncertainty is summarized by a single scale parameter (Rayleigh \(\sigma\) or equivalently vMF \(\kappa\)), and the likelihood assumes that the conditional distribution of \(\Delta\psi\) is fully described by that one parameter.

Definition of the Pull

Given an event with true angular error \(\Delta\psi\) and estimated scale \(\sigma_\mathrm{est}\), we define the pull $$ r \equiv \frac{\Delta\psi}{\sigma_\mathrm{est}} \,. $$

Under the idealized assumption that the PSF model is correct and that \(\sigma_\mathrm{est}\) equals the true scale parameter \(\sigma_\mathrm{true}\) for all events in a given population, the small-angle Rayleigh approximation gives $$ \Delta\psi \mid \sigma_\mathrm{true} \sim f_\mathrm{Rayleigh}(\Delta\psi;\,\sigma_\mathrm{true}) \,, $$ where $$ f_\mathrm{Rayleigh}(x;\sigma) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right), \qquad x \ge 0 $$ denotes the Rayleigh PDF, and therefore $$ r = \frac{\Delta\psi}{\sigma_\mathrm{true}} \sim f_\mathrm{Rayleigh}(r;\,1) \,, $$ i.e. the pull follows a Rayleigh distribution with unit scale.

The Rayleigh\((\sigma=1)\) distribution has analytic quantiles; in particular, the median is $$ r_\mathrm{med}^\mathrm{Rayleigh} = \sqrt{2 \ln 2} \approx 1.1774 \,. $$
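This value follows directly from the Rayleigh\((\sigma=1)\) CDF, \(F(r) = 1 - e^{-r^2/2}\), by solving \(F(r) = \tfrac{1}{2}\):

```python
import numpy as np

# Rayleigh(sigma=1) CDF: F(r) = 1 - exp(-r^2 / 2)
# median: F(r) = 1/2  =>  r^2 = 2 ln 2  =>  r = sqrt(2 ln 2)
r_med = np.sqrt(2.0 * np.log(2.0))
assert np.isclose(1.0 - np.exp(-r_med**2 / 2.0), 0.5)
assert np.isclose(r_med, 1.1774, atol=1e-4)
```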

Thus, in the ideal case where \(\sigma_\mathrm{est}\) exactly matches the true PSF scale \(\sigma_\mathrm{true}\), the pull distribution at fixed reconstructed observables would satisfy $$ r \mid (E_\mathrm{reco}, \sin\delta_\mathrm{reco}) \sim f_\mathrm{Rayleigh}(r;\,1) \qquad \forall\, (E_\mathrm{reco}, \sin\delta_\mathrm{reco}) \,, $$ and $$ \mathrm{median}\left[r \mid (E_\mathrm{reco}, \sin\delta_\mathrm{reco})\right] = r_\mathrm{med}^\mathrm{Rayleigh} \qquad \forall\, (E_\mathrm{reco}, \sin\delta_\mathrm{reco}) \,. $$

Correction in Reconstructed Space

For a given region of reconstructed space \(R\) (for example, a cell in \((E_\mathrm{reco}, \sin\delta_\mathrm{reco})\)), we can look at an ensemble of MC events and empirically measure the pull distribution $$ r_R = \frac{\Delta\psi}{\sigma_\mathrm{TNF}} \quad \text{for events in } R \,. $$ Let the empirical median pull in that region be $$ q_{50}(R) \equiv \mathrm{median}\left[r_R\right] \,. $$

If \(q_{50}(R) > r_\mathrm{med}^\mathrm{Rayleigh}\), then in that region the TNF errors are too small on average (overly optimistic); if \(q_{50}(R) < r_\mathrm{med}^\mathrm{Rayleigh}\), they are too large (overly conservative).

Importantly, overestimation is not “safer” than underestimation: both represent a mismatch between the assumed PSF and the true angular-error distribution. Either direction of bias makes the likelihood suboptimal in the Neyman–Pearson sense and therefore reduces analysis power. The goal is not to err high or low, but to match the true scale.

We now introduce a multiplicative correction factor \(\alpha(R)\) such that the corrected angular error $$ \sigma_\mathrm{corr}(R) = \sigma_\mathrm{TNF} \cdot \alpha(R) $$ produces a corrected pull $$ r_\mathrm{corr}(R) = \frac{\Delta\psi}{\sigma_\mathrm{corr}(R)} = \frac{\Delta\psi}{\sigma_\mathrm{TNF}\,\alpha(R)} = \frac{r_R}{\alpha(R)} \,, $$ whose median matches the Rayleigh unit-scale median in that region: $$ \mathrm{median}\left[r_\mathrm{corr}(R)\right] = r_\mathrm{med}^\mathrm{Rayleigh} \,. $$

Since \(\mathrm{median}(r_R) = q_{50}(R)\), choosing $$ \alpha(R) = \frac{q_{50}(R)}{r_\mathrm{med}^\mathrm{Rayleigh}} $$ implies $$ \mathrm{median}\left[r_\mathrm{corr}(R)\right] = \frac{q_{50}(R)}{\alpha(R)} = \frac{q_{50}(R)}{q_{50}(R) / r_\mathrm{med}^\mathrm{Rayleigh}} = r_\mathrm{med}^\mathrm{Rayleigh} \,. $$

In words:

  • we measure (on simulation) how far the median pull in each reconstructed region deviates from the Rayleigh unit-scale median,
  • we rescale \(\sigma_\mathrm{TNF}\) by that ratio,
  • and thereby enforce that the corrected pull has the correct median in that region.
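A toy sketch of this logic on simulated pulls (not analysis data): the true errors are drawn from a Rayleigh distribution while the estimator is made deliberately 20% optimistic, and the median-matching correction recovers the bias:

```python
import numpy as np

rng = np.random.default_rng(42)
R_MED = np.sqrt(2.0 * np.log(2.0))  # Rayleigh unit-scale median

# toy region R: true errors are Rayleigh, but the estimator is
# 20% too optimistic (sigma_est = 0.8 * sigma_true) -- illustrative
sigma_true = np.radians(0.5)
dpsi = rng.rayleigh(scale=sigma_true, size=200_000)
sigma_est = 0.8 * sigma_true

pull = dpsi / sigma_est
alpha = np.median(pull) / R_MED         # correction factor for this region
pull_corr = dpsi / (sigma_est * alpha)  # corrected pull

# alpha recovers the 1/0.8 = 1.25 bias, and the corrected median is Rayleigh
assert np.isclose(alpha, 1.25, atol=0.02)
assert np.isclose(np.median(pull_corr), R_MED, atol=0.02)
```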

In practice, the regions \(R\) are implemented as discrete bins in the reconstructed-observable space, chosen large enough to provide sufficient MC statistics for a stable median estimate. The fitted correction field \(\alpha(E_\mathrm{reco}, \sin\delta_\mathrm{reco})\), however, is constructed as a continuous function by interpolating between the bin-center values; this avoids discontinuities in the likelihood—for example, the artificial jump in PSF scale that would occur when probing a source location very close to a bin boundary, causing nearby events to be evaluated with abruptly different scales on either side of that boundary.

Lightning Tracks Implementation

The correction is constructed in the space of reconstructed observables $$ (E_\mathrm{reco}, \sin\delta_\mathrm{reco}) \,, $$ where \(E_\mathrm{reco}\) is the MuEx energy proxy and \(\delta_\mathrm{reco}\) is the reconstructed declination. We work in \(\sin\delta_\mathrm{reco}\) rather than \(\delta_\mathrm{reco}\) (or equivalently \(\vartheta_\mathrm{reco}\)) itself to obtain approximately uniform coverage in sky coordinates.

All calibration is performed on NuGen events reweighted to an effective \(E^{-2.5}\) spectrum, as discussed in the [Spectral Dependence][spectral-dependence] section. In practice, this is implemented by accumulating event weights instead of counts during the pull quantile computation. The resulting median-pull sample points \(q_{50}(R_{ij})\) are therefore strictly matched to this assumed spectrum.

Starting and through-going tracks are treated separately throughout the procedure, producing two independent calibration fields and two independent correction factors. By not forcing a single calibration surface to absorb both contained-vertex and entering-track regimes simultaneously, the pull model has far less variability to marginalize over (e.g., differences in neutrino interaction kinematics and energy resolution). As a result, each topology can achieve closer-to-ideal PSF coverage on its own than would be possible under a topology-merged calibration.

Step 1: Robust Adaptive Binning in Energy and Declination

We first define a set of declination bands in \(\sin\delta_\mathrm{reco}\) via fixed edges \(\{\sin\delta_i\}_{i=0}^{N_\mathrm{bands}}\), and within each band construct an adaptive binning in \(\log_{10} E_\mathrm{reco}\). For a given declination band \(i\) (with \(\sin\delta_i \leq \sin\delta_\mathrm{reco} < \sin\delta_{i+1}\)), we introduce log-energy bin edges \(\{\log_{10} E_{i,j}\}_{j=0}^{N_{\mathrm{cells},i}}\,,\) and define the corresponding cells (regions) \(R_{i,j}\) as

\[ \begin{aligned} R_{i,j} = \Bigl\{ (\log_{10} E_\mathrm{reco}, \sin\delta_\mathrm{reco}) \,\Big|\;\; &\sin\delta_i \leq \sin\delta_\mathrm{reco} < \sin\delta_{i+1},\\ &\log_{10} E_{i,j} \leq \log_{10} E_\mathrm{reco} < \log_{10} E_{i,j+1} \Bigr\} \,. \end{aligned} \]

Let the number of events from the chosen calibration sample that fall inside \(R_{i,j}\) be \(N_{i,j}\). The adaptive-binning algorithm enforces two constraints for every cell \(R_{i,j}\) inside declination band \(i\):

  1. Minimum number of events per cell: $$ N_{i,j} \;\geq\; N_\mathrm{min} \qquad \qquad \forall i,j \,, $$
  2. Minimum bin width in \(\log_{10} E\): $$ \Delta \log_{10} E_{i,j} \equiv \log_{10} E_{i,j+1} - \log_{10} E_{i,j} \;\geq\; \Delta_{\log E}^{\min} \qquad \forall\, i,j \,. $$

The algorithm in each declination band is:

  1. Select all events with reconstructed \(\sin\delta_\mathrm{reco}\) in the band.
  2. Sort their \(\log_{10} E_\mathrm{reco}\) values.
  3. Sweep from low to high energy, growing bins until both the minimum count and minimum width thresholds are satisfied.
  4. If the final high-energy bin fails the thresholds, iteratively merge it backward with its neighbor until the constraints are met.

This produces non-uniform energy binning for each declination band, with fine granularity where statistics are abundant and coarse bins where statistics are sparse. Only raw event counts are used for binning; MC weights enter only at the later quantile stage.
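A minimal sketch of such a sweep-and-merge binning, with illustrative thresholds in place of the analysis defaults:

```python
import numpy as np

def adaptive_log_e_edges(log_e, n_min=500, min_width=0.2):
    """Sweep sorted log10(E) values low to high, closing a bin once it
    holds >= n_min events and spans >= min_width in log10(E); then merge
    an undersized final bin backward until both constraints hold.
    n_min and min_width are illustrative placeholders."""
    x = np.sort(np.asarray(log_e))
    edges, count = [x[0]], 0
    for v in x:
        if count >= n_min and v - edges[-1] >= min_width:
            edges.append(v)
            count = 0
        count += 1
    if edges[-1] < x[-1]:
        edges.append(x[-1])
    # the final (highest-energy) bin may violate either constraint:
    # merge it backward with its neighbor until both are satisfied
    counts, _ = np.histogram(x, bins=edges)
    while len(edges) > 2 and (counts[-1] < n_min
                              or edges[-1] - edges[-2] < min_width):
        edges.pop(-2)
        counts, _ = np.histogram(x, bins=edges)
    return np.array(edges)

rng = np.random.default_rng(1)
log_e = rng.normal(4.0, 1.0, size=20_000)  # toy log10(E/GeV) values
edges = adaptive_log_e_edges(log_e, n_min=1000, min_width=0.1)
counts, _ = np.histogram(log_e, bins=edges)
assert (counts >= 1000).all()
assert (np.diff(edges) >= 0.1).all()
```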

Step 2: Weighted Per-Cell Pull Quantiles

For each cell \(R_{i,j}\) we compute the weighted median pull. Let \(I_{i,j}\) denote the set of event indices whose reconstructed observables fall inside \(R_{i,j}\), and let \(\{r_k\}_{k \in I_{i,j}}\) be the corresponding pull values with weights \(\{w_k\}_{k \in I_{i,j}}\). The weighted median pull in cell \((i,j)\), which we denote $$ q_{i,j} \;\equiv\; q_{50}(R_{i,j}) \,, $$ is defined as the value \(r^\ast\) satisfying $$ \sum_{k \in I_{i,j} :\, r_k \le r^\ast} w_k \;\ge\; \tfrac{1}{2} \sum_{k \in I_{i,j}} w_k \qquad\text{and}\qquad \sum_{k \in I_{i,j} :\, r_k < r^\ast} w_k \;\le\; \tfrac{1}{2} \sum_{k \in I_{i,j}} w_k \,. $$
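The two inequalities are satisfied by taking the first sorted value at which the cumulative weight reaches half the total weight; a minimal implementation:

```python
import numpy as np

def weighted_median(values, weights):
    """Weighted median: the first sorted value r* at which the cumulative
    weight reaches half the total, satisfying the two-inequality definition."""
    order = np.argsort(values)
    v = np.asarray(values)[order]
    w = np.asarray(weights)[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

# with equal weights this reduces to an ordinary median
vals = np.array([3.0, 1.0, 2.0, 5.0, 4.0])
assert weighted_median(vals, np.ones(5)) == 3.0
# an up-weighted event pulls the median toward its value
assert weighted_median(vals, np.array([1.0, 1.0, 1.0, 10.0, 1.0])) == 5.0
```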

From all valid cells \(R_{i,j}\) we obtain a corresponding set of scattered control points $$ \bigl\{\,(\log_{10} E_{i,j}^\mathrm{ctr},\, \sin\delta_i^\mathrm{ctr},\, q_{i,j})\,\bigr\} \,, $$ where \((\log_{10} E_{i,j}^\mathrm{ctr},\, \sin\delta_i^\mathrm{ctr})\) denotes the geometric center of region \(R_{i,j}\). Each point represents the empirical weighted median pull in a well-populated region of reconstructed space.

Step 3: Smooth 2D Surface Fit via RBF Thin-Plate Spline

To obtain a continuous median-pull field \(\tilde{q}_{50}(E_\mathrm{reco}, \sin\delta_\mathrm{reco})\), we fit a smooth 2D surface to the scattered control points $$ \bigl\{\,(\log_{10} E_{i,j}^\mathrm{ctr},\, \sin\delta_i^\mathrm{ctr},\, q_{i,j})\,\bigr\}. $$

Let each control point be denoted \(\mathbf{x}_k = (x_{1,k}, x_{2,k}) = (\log_{10} E_{i,j}^\mathrm{ctr},\, \sin\delta_i^\mathrm{ctr}),\) and \(f_k = q_{i,j}\) with \(k=1,\dots,M\).

The RBF thin-plate spline interpolant has the form

\[ \tilde{q}_{50}(\mathbf{x}) = \sum_{k=1}^M w_k\,\phi(\|\mathbf{x} - \mathbf{x}_k\|) + a_0 + a_1 x_1 + a_2 x_2 , \]

where

  • \(\phi(r)\) is the thin-plate spline kernel $$ \phi(r) = r^2 \log r , $$
  • \(w_k\) are RBF weights,
  • \((a_0, a_1, a_2)\) are the coefficients of the polynomial tail required for well-posedness in 2D,
  • \(\mathbf{x} = (x_1, x_2)\) is any query point in reconstructed-observable space.

The coefficients are obtained by solving the block linear system

\[ \begin{bmatrix} K + \lambda I & P \\ P^{T} & 0 \end{bmatrix} \begin{bmatrix} \mathbf{w} \\ \mathbf{a} \end{bmatrix} = \begin{bmatrix} \mathbf{f} \\ \mathbf{0} \end{bmatrix}, \]

where

  • \(K\) is the \(M\times M\) kernel matrix $$ K_{jk} = \phi(\|\mathbf{x}_j - \mathbf{x}_k\|), $$
  • \(P\) is the \(M \times 3\) polynomial matrix with entries $$ P_{k0} = 1, \qquad P_{k1} = x_{1,k}, \qquad P_{k2} = x_{2,k}, \qquad k = 1, \dots, M \,, $$ i.e. each row \(k\) is given by \((1, x_{1,k}, x_{2,k})\).
  • \(\mathbf{w} = (w_1,\dots,w_M)^T\),
  • \(\mathbf{a} = (a_0,a_1,a_2)^T\),
  • \(\mathbf{f} = (f_1,\dots,f_M)^T\),
  • \(\lambda\) is a non-negative smoothing parameter that regularizes the fit (\(\lambda = 0\) corresponds to exact interpolation).

Intuitively, the thin-plate spline can be viewed as the smoothest possible surface that passes near the scattered control points while respecting a global notion of curvature. The RBF kernel \(\phi(r) = r^2 \log r\) penalizes rapid changes in second derivatives, so the fitted surface bends only as much as needed to accommodate the data. Each control point contributes a radially symmetric “influence field” whose strength is determined by the solved coefficients \(w_k\), and the polynomial tail accounts for the overall linear trend that cannot be represented by radial functions alone. The smoothing parameter \(\lambda\) controls the trade-off between fidelity to the control points and global smoothness: smaller values force the surface to interpolate the median pulls exactly, while larger values allow the fit to ignore local noise and produce a more regular, well-behaved correction field. In effect, the procedure constructs a smoothly varying approximation of the per-cell median pulls that avoids discontinuities and captures only the large-scale structure supported by the statistics of the calibration sample.
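A fit of this kind can be sketched with `scipy.interpolate.RBFInterpolator`, which implements this kernel (`'thin_plate_spline'`) together with a degree-1 polynomial tail; the control points, trend, and smoothing value below are toy stand-ins, not the analysis configuration:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(7)

# scattered control points: (log10 E, sin(dec)) centers with noisy
# "median pull" values around a smooth trend (toy data only)
centers = rng.uniform([2.0, -1.0], [7.0, 1.0], size=(40, 2))
trend = 1.18 + 0.05 * (centers[:, 0] - 4.5) + 0.03 * centers[:, 1]
q50 = trend + rng.normal(0.0, 0.01, size=40)

# thin-plate spline with linear polynomial tail (scipy's default degree
# for this kernel); smoothing > 0 regularizes the surface
surface = RBFInterpolator(centers, q50,
                          kernel='thin_plate_spline', smoothing=1e-4)

# the fitted field can be queried at any point in reconstructed space
query = np.array([[4.5, 0.0]])
qhat = surface(query)[0]
assert abs(qhat - 1.18) < 0.05
```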

This procedure could—in principle—be generalized to include \(\gamma\) as an additional variable. If the number of control points \(M\) is chosen wisely—that is, sufficiently small—the RBF evaluation might be efficient enough to be performed directly in the likelihood.

The fitted surfaces can be visualized either as

  • cell plots in \((E_\mathrm{reco}, \sin\delta_\mathrm{reco})\) with observed cell-wise \(q_{50}\) values and overlaid spline contours, or
  • 1D slices at fixed declination bands showing the RBF prediction as a function of energy with the underlying cell medians as points.
Figure 1: Cell Plots
Figure 1.1: Median pull values in adaptive 2D bins for Starting tracks. Contours show the fitted RBF thin-plate spline surface.
Figure 2: 1D Slices
Figure 2.1: RBF surface slices at $\sin\delta$ band centers for Starting tracks. Tick marks on the colorbar indicate evaluation points; scatter points show underlying cell medians. Dashed horizontal line marks the Rayleigh median $\sqrt{2\ln 2}$.

These diagnostic plots provide a direct check that the fit is smooth, well behaved, and consistent with the underlying per-cell medians.

Step 4: Applying the Correction

Once the median-pull surface has been constructed, the final correction factor is defined as

\[ \alpha(E_\mathrm{reco}, \sin\delta_\mathrm{reco}) = \frac{\tilde{q}_{50}(E_\mathrm{reco}, \sin\delta_\mathrm{reco})} {r_\mathrm{med}^\mathrm{Rayleigh}} \,. \]

For each event—both in simulation and in data—we evaluate $$ \sigma_\mathrm{corr} = \sigma_\mathrm{TNF} \cdot \alpha(E_\mathrm{reco}, \sin\delta_\mathrm{reco}) \,, $$ and finally use \(\sigma_\mathrm{corr}\) as the PSF scale parameter in Csky. Intuitively:

  • regions where the uncorrected median pull is too large (overly optimistic \(\sigma_\mathrm{TNF}\)) receive a correction factor \(\alpha > 1\) (inflating the errors),
  • regions where the median pull is too small (overly conservative \(\sigma_\mathrm{TNF}\)) receive \(\alpha < 1\) (shrinking the errors).

By construction, this drives the median of the corrected pull distribution towards the Rayleigh unit-scale value in each region of reconstructed phase space, while leaving the detailed shape of the pull distribution (for example tail behavior) as determined by TNF.

Comparison to 1D Corrections and Systematics

For completeness, we also tested a simple 1D correction scheme, as conventionally applied, in which the median pull is corrected as a function of energy only. While this removes the leading energy dependence, it leaves visible declination-dependent structure in the median-pull maps and in the coverage plots, especially near the poles, which effectively receive lower weight in the 1D correction due to their lower statistics. The full 2D \((E_\mathrm{reco}, \sin\delta_\mathrm{reco})\) calibration substantially reduces these residuals and produces more uniform coverage across the sky.

Work in Progress

This section is incomplete. Remaining items:

  • Comparison plots: standard 1D energy-only correction vs. 2D RBF method (pull distributions, coverage maps)
  • Systematics discussion: baseline MC shows slightly better high-energy resolution, leading to overcoverage when calibrating to systematics ensemble—is this TNF overtraining or a real systematics effect? Requires different baseline set to disentangle.
  • Discussion of which NuGen (baseline vs. systematics ensemble) to use and how that changes the interpretation of results. Ideally both should be shown; neither choice is right or wrong, they just mean different things statistically.
  • Sensitivity/DP plots for intentionally wrong “correction” to demonstrate the impact of miscalibration.