
Point-Source Sample Performance

This page presents performance diagnostics for point-source event samples. We include not only Lightning Tracks but also earlier IceCube samples to provide context and facilitate comparison. The diagnostics are organized around the structure of the point-source likelihood, with each section addressing a specific component of the statistical model.

Index of Tables and Figures

This page is designed as a comprehensive introduction to point-source analysis diagnostics in IceCube. It covers fundamental statistical concepts, the likelihood framework, and how these methods are implemented in practice in Csky. For readers who prefer to browse the figures directly, the index below provides links to all tables and figures.

Tables:

  • Table 1: Sample Definitions
  • Table 2: Nominal Analysis Configurations

Figures:

Introduction

Statistical Foundations

A fundamental result in statistical hypothesis testing is the Neyman–Pearson lemma (Neyman & Pearson 1933): for testing a simple null hypothesis \(H_0\) against a simple alternative \(H_1\), the most powerful test at significance level \(\alpha\) rejects \(H_0\) when the likelihood ratio satisfies

\[ \frac{\mathcal{L}(H_0)}{\mathcal{L}(H_1)} \leq k, \]

where \(k\) is chosen such that the false-positive rate equals \(\alpha\). No other test with the same false-positive rate can achieve higher power (probability of correctly rejecting \(H_0\) when \(H_1\) is true). In our context, this means that if the signal and background PDFs are correctly specified, the likelihood ratio test achieves the maximum possible detection power for a given false-positive rate. Conversely, any mismodeling of the PDFs—whether in the PSF, the energy distributions, or the background spatial structure—reduces analysis power relative to this theoretical optimum.

It is important to note, however, that PDF mismodeling does not bias the significance of observed excesses, provided the null hypothesis is correctly calibrated on data. Because we determine TS thresholds empirically from RA-randomized trials using the same (possibly imperfect) likelihood model, the resulting p-values remain valid in the frequentist sense: they correctly reflect the probability of observing a given TS or higher under the background-only hypothesis, regardless of whether the PDFs accurately describe the true underlying distributions.

Mismodeling does, however, affect three related areas: (1) analysis power—sensitivity and discovery potential degrade when the likelihood deviates from the true distributions; (2) signal recovery—the fitted nuisance parameters (\(n_s\), \(\gamma\)) may be biased if the model does not match the injected or real signal; and (3) flux inference—confidence intervals and upper limits on source fluxes are highly sensitive to systematic mismatches, since they rely on the relationship between the fitted \(n_s\) and the physical flux normalization. The last point is particularly important: while p-values remain correctly calibrated, the interpretation of a detected excess in terms of astrophysical flux depends on accurate modeling of both the signal and background distributions.
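The power loss from PDF mismodeling can be illustrated with a toy Monte Carlo. The sketch below (not Csky; a hypothetical 1D setup with a uniform background on [-5, 5] and a Gaussian signal of true width 1) compares the power of the likelihood-ratio test built with the correct signal PDF against one built with a mismodeled, three-times-too-wide PDF, at a fixed false-positive rate determined from background-only trials:

```python
import numpy as np

rng = np.random.default_rng(0)

N_BG, N_SIG, N_TRIALS = 200, 30, 300
HALF_WIDTH = 5.0  # background uniform on [-5, 5]
BG_DENSITY = 1.0 / (2.0 * HALF_WIDTH)

def sig_pdf(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def ts(x, sigma_model, ns_grid=np.linspace(0.0, 60.0, 121)):
    """TS = 2 max_{ns>=0} sum_i ln[ns/N (S_i/B_i - 1) + 1], grid-maximized."""
    sb = sig_pdf(x, sigma_model) / BG_DENSITY
    llr = np.log(ns_grid[:, None] / len(x) * (sb - 1.0) + 1.0).sum(axis=1)
    return 2.0 * llr.max()

def trial(inject, sigma_model):
    x = rng.uniform(-HALF_WIDTH, HALF_WIDTH, N_BG)
    if inject:
        x = np.concatenate([x, rng.normal(0.0, 1.0, N_SIG)])  # true PSF width = 1
    return ts(x, sigma_model)

power = {}
for sigma_model in (1.0, 3.0):  # correct width vs. mismodeled (3x too wide)
    ts_bg = np.array([trial(False, sigma_model) for _ in range(N_TRIALS)])
    ts_sig = np.array([trial(True, sigma_model) for _ in range(N_TRIALS)])
    threshold = np.quantile(ts_bg, 0.9)  # fix the false-positive rate at 10%
    power[sigma_model] = float(np.mean(ts_sig > threshold))
```

Because the thresholds are recalibrated from background-only trials for each model, the false-positive rate is the same in both cases; only the power differs, with the mismodeled template detecting the injected signal far less often.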

The Likelihood

The unbinned point-source likelihood tests for an excess of events clustered around a hypothesized source position \(\mathbf{x}_s\). For a single sample with \(N\) observed events, the likelihood of observing the data given \(n_s\) signal events and \(N - n_s\) background events is

\[ \mathcal{L}(\mathbf{x}_s, n_s, \gamma) = \prod_{i=1}^{N} \left[ \frac{n_s}{N} \mathcal{S}_i(\mathbf{x}_s, \gamma) + \frac{N - n_s}{N} \mathcal{B}_i \right], \]

where \(\mathcal{S}_i(\mathbf{x}_s, \gamma)\) and \(\mathcal{B}_i\) are the signal and background probability densities evaluated for event \(i\). The signal PDF depends on the hypothesized source position \(\mathbf{x}_s\) and spectral index \(\gamma\); the background PDF depends only on the event observables. Each event is modeled as arising from either the signal or background population, with mixing fraction \(n_s / N\).

For hypothesis testing, we compare this to the background-only hypothesis (\(n_s = 0\)) via the likelihood ratio:

\[ \frac{\mathcal{L}(\mathbf{x}_s, n_s, \gamma)}{\mathcal{L}(\mathbf{x}_s, 0)} = \prod_{i=1}^{N} \left[ \frac{n_s}{N} \left( \frac{\mathcal{S}_i(\mathbf{x}_s, \gamma)}{\mathcal{B}_i} - 1 \right) + 1 \right]. \]

Taking the logarithm converts the product over events into a sum, which is more numerically stable and analytically tractable:

\[ \ln \frac{\mathcal{L}(\mathbf{x}_s, n_s, \gamma)}{\mathcal{L}(\mathbf{x}_s, 0)} = \sum_{i=1}^{N} \ln \left[ \frac{n_s}{N} \left( \frac{\mathcal{S}_i(\mathbf{x}_s, \gamma)}{\mathcal{B}_i} - 1 \right) + 1 \right]. \]

The test statistic is defined as:

\[ \mathrm{TS} = -2 \ln \frac{\mathcal{L}(\mathbf{x}_s, 0)}{\mathcal{L}(\mathbf{x}_s, \hat{n}_s, \hat{\gamma})} = 2 \ln \frac{\mathcal{L}(\mathbf{x}_s, \hat{n}_s, \hat{\gamma})}{\mathcal{L}(\mathbf{x}_s, 0)}, \]

where \(\hat{n}_s\) and \(\hat{\gamma}\) are the values that maximize the likelihood, found numerically subject to the constraint \(n_s \geq 0\). By Wilks’ theorem, this test statistic asymptotically follows a \(\chi^2\) distribution under the null hypothesis, which simplifies the interpretation of significance thresholds; because \(n_s = 0\) lies on the boundary of the allowed parameter space, the asymptotic null distribution is in practice a mixture with a point mass at \(\mathrm{TS} = 0\) for roughly half of the background trials.

Both signal and background PDFs factorize into spatial and energy terms:

\[ \mathcal{S}_i(\mathbf{x}_s, \gamma) = \mathcal{S}_\mathrm{space}(\mathbf{x}_i \mid \mathbf{x}_s, \hat{\sigma}_i) \times \mathcal{S}_\mathrm{energy}(E_i \mid \gamma), \quad \mathcal{B}_i = \mathcal{B}_\mathrm{space}(\delta_i) \times \mathcal{B}_\mathrm{energy}(E_i \mid \delta_i). \]

The signal spatial PDF \(\mathcal{S}_\mathrm{space}\) is the point-spread function (PSF), centered on the source position \(\mathbf{x}_s\) and parameterized by a per-event PSF scale estimate \(\hat{\sigma}_i\)—this is where the \(\mathbf{x}_s\) dependence enters. The signal energy PDF \(\mathcal{S}_\mathrm{energy}\) reflects the assumed source spectrum (here a power law \(E^{-\gamma}\)) and is derived from MC simulation—this is where the \(\gamma\) dependence enters. The background spatial PDF \(\mathcal{B}_\mathrm{space}\) describes the declination-dependent event rate under the null hypothesis. The background energy PDF \(\mathcal{B}_\mathrm{energy}\) captures the energy distribution of background events as a function of declination.

In practice, the true background \(\mathcal{B}\) is unknown and must be estimated from data. We denote the data-derived estimate \(\mathcal{D}\); see Background Modeling for details on how these PDFs are constructed and Signal Subtraction for a derivation of the modified likelihood that accounts for signal contamination of \(\mathcal{D}\). We store and evaluate the energy terms as a combined signal-to-data ratio \(\mathcal{S}_\mathrm{energy}(\gamma) / \mathcal{D}_\mathrm{energy}\) rather than maintaining separate signal and data-derived energy PDFs.

Page Structure

The sections that follow address each component of this likelihood:

  • Signal Modeling covers the spatial term (\(\mathcal{S}_\mathrm{space}\), the PSF) and the quantities that determine the expected signal rate: effective area and acceptance.
  • Background Modeling covers the spatial term (\(\mathcal{D}_\mathrm{space}\)) and discusses the energy distribution of background events.
  • Energy Signal-to-Background Ratio presents the combined energy PDF ratio used in the likelihood.
  • Null Hypothesis Calibration examines the test statistic distribution under background-only conditions, which determines significance thresholds.
  • Sensitivity and Discovery Potential presents the bottom-line performance metrics that integrate all of the above.

Samples

Table 1 lists all sample acronyms used throughout this page along with their full names and the corresponding Csky DataSpec and version identifiers.

Terminology: Selection vs. Sample

Throughout this documentation, we use selection to refer to the event-filtering procedure (cuts, reconstructions, quality criteria) and sample to refer to the resulting set of events. The same selection applied to different data periods yields different samples; conversely, a sub-sample of the 12-year dataset still uses the same selection.

Table 1: Sample Definitions

Base Samples

Acronym      Full Name                                 DataSpec          Version
DNNC         DNN Cascades (IceMan)                     DNN_ICEMAN        version-001-p00
DNNC (Old)   DNN Cascades (Old)                        DNNC_12yr         version-001-p02
ESTES        Enhanced Starting Track Event Selection   ESTES_2011_2022   version-001-p04
NT           Northern Tracks                           NTv5p2            version-005-p04
PST          Point Source Tracks                       ps_v4_15yr        version-004-p04
SLT          Starting Lightning Tracks                 SLT_12yr          version-001-p00
TLT          Throughgoing Lightning Tracks             TLT_12yr          version-001-p00

Combined Samples

Acronym             Full Name                                Components          Notes
LT                  Lightning Tracks                         SLT + TLT           Combined starting + throughgoing
LT + DNNC           Lightning Tracks + DNN Cascades          SLT + TLT + DNNC    Primary analysis sample; overlap removed from LT
DNNC + PST          DNN Cascades + Point Source Tracks       DNNC + PST          Background PDFs rebuilt after overlap removal
DNNC + NT + ESTES   DNN Cascades + Northern Tracks + ESTES   DNNC + NT + ESTES   Background PDFs rebuilt after overlap removal
Table 1: Sample acronyms, full names, and corresponding Csky DataSpec and version identifiers used for the performance plots on this page.

Lightning Tracks version-001-p00 corresponds to the final cuts obtained through the procedure described in Sensitivity Optimization. As discussed there, the resulting performance metrics depend on the choice of final selection cut; using an alternative working point would change the effective area, background contamination, and ultimately the sensitivity curves shown in Figure 13.

Nominal Configuration

Several figures on this page include a “Background PDF” selector that controls which signal subtraction configuration is used. The “Nominal” option shows results using each sample’s designated default configuration. Table 2 lists these assignments. The reference track samples PST and NT use spatial-only signal subtraction, while ESTES uses no signal subtraction. Cascades (DNNC) use full signal subtraction (spatial + energy). For Lightning Tracks, the optimal mode differs per component: SLT uses spatial-only signal subtraction, while TLT benefits from full signal subtraction. The combined samples LT and LT + DNNC therefore use a per-component mix.

Table 2: Nominal Analysis Configurations

Base Samples

Sample Nominal Configuration
SLT Spatial Signal Subtraction
TLT Full Signal Subtraction
PST Spatial Signal Subtraction
NT Spatial Signal Subtraction
ESTES No Signal Subtraction
DNNC Full Signal Subtraction

Combined Samples

Sample Nominal Configuration
LT Mixed: SLT spatial, TLT full
LT + DNNC Mixed: SLT spatial, TLT + DNNC full
DNNC + PST Full Signal Subtraction
DNNC + NT + ESTES Full Signal Subtraction
Table 2: Nominal analysis configuration for each sample. The "Nominal" option in the Background PDF selector uses these assignments.

Combining Datasets

When combining multiple samples in a single analysis, the individual sample likelihoods are multiplied together. The signal parameters \(n_s\) and \(\gamma\) are shared across all samples rather than fit independently for each. Each sample receives a fraction of the total signal determined by its relative acceptance:

\[ \mathcal{L}_\mathrm{combined}(\mathbf{x}_s, n_s, \gamma) = \prod_k \mathcal{L}_k\!\left(\mathbf{x}_s,\, n_s \cdot f_k(\gamma),\, \gamma\right), \qquad f_k(\gamma) = \frac{A_k(\gamma)}{\sum_j A_j(\gamma)}, \]

where \(A_k(\gamma)\) is the signal acceptance of sample \(k\) (see Signal Acceptance). A sample with higher acceptance receives a proportionally larger share of the total signal. In practice, the combined log-likelihood ratio takes the same form as the single-sample case—a sum over all events—but each event uses the PDF terms from its own sample and is weighted by the effective signal count \(n_s^{(k)} = f_k \cdot n_s\) for that sample:

\[ \ln \frac{\mathcal{L}_\mathrm{combined}}{\mathcal{L}_0} = \sum_k \sum_{i \in \mathcal{I}_k} \ln \left[ \frac{n_s \cdot f_k}{N^{(k)}} \left( \frac{\mathcal{S}_i^{(k)}}{\mathcal{B}_i^{(k)}} - 1 \right) + 1 \right], \]

where \(\mathcal{I}_k\) denotes the set of event indices belonging to sample \(k\) and \(N^{(k)}\) its total event count.
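A minimal sketch of this combination, assuming a hypothetical per-sample structure of precomputed S/B ratios and acceptances (with \(\gamma\) fixed, so the fractions \(f_k\) are constants):

```python
import numpy as np

def combined_llr(ns, samples):
    """Combined log-likelihood ratio for a shared ns across samples.

    `samples` is a hypothetical list of dicts holding each sample's
    per-event S/B ratios and its acceptance A_k(gamma); the acceptance
    fractions f_k = A_k / sum_j A_j split the total ns."""
    acc = np.array([s["acceptance"] for s in samples])
    f = acc / acc.sum()
    total = 0.0
    for fk, s in zip(f, samples):
        sb = s["sb_ratio"]   # S_i^(k) / B_i^(k) for events in sample k
        nk = len(sb)         # N^(k), the sample's event count
        total += np.log(ns * fk / nk * (sb - 1.0) + 1.0).sum()
    return total

samples = [
    {"acceptance": 3.0, "sb_ratio": np.array([5.0, 0.8, 1.2])},
    {"acceptance": 1.0, "sb_ratio": np.array([0.5, 2.0])},
]
val = combined_llr(ns=1.0, samples=samples)
```

Each event contributes through its own sample's PDFs, while the higher-acceptance sample absorbs the larger share of the shared \(n_s\).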

Event overlap must be handled carefully. As Figure 1 shows, significant overlap exists between several sample pairs—particularly between track selections (PST, NT, ESTES, LT) and between ESTES and DNNC. Overlapping events must be removed to avoid double-counting, which would bias the likelihood and inflate significance.

Figure 1: Sample Overlap Matrix
Figure 1: Overlap matrix for the 12-year NuSources samples. Each cell shows the number of data events appearing simultaneously in the final Csky NumPy files for the corresponding pair of samples across IC86 seasons 2011 to 2022. For DNN Cascades, we distinguish between the original (DNNC (Old)) and the IceMan version (DNNC).

The combined samples shown on this page (DNNC + PST and DNNC + NT + ESTES) are not identical to the combinations used in previous IceCube analyses (SKATE and IceMan, respectively). The differences are twofold:

  1. Component versions—We use the latest available version of each selection, as listed in Table 1. For DNNC + PST in particular, this means using the IceMan version of DNNC rather than the original version used in SKATE.

  2. PDF reconstruction after overlap removal—In the original analyses, Csky constructed the background spatial PDF and energy \(\mathcal{S}/\mathcal{D}\) ratio before overlap removal, leading to minor mismatches between the PDFs and the actual post-overlap data. Our implementation rebuilds these PDFs after overlap removal, ensuring consistency between the likelihood model and the data it evaluates.

However, even with rebuilt PDFs, the treatment is not entirely correct for DNNC + PST and DNNC + NT + ESTES: overlap was removed only from data, not from MC. In the overlapping phase space, this can potentially lead to signal overestimation—the MC used for signal injection and PDF construction includes events that would be removed if the same overlap procedure were applied. This may be particularly problematic for the phase space of contained events, where DNNC, NT, and ESTES have significant overlap (NT includes starting tracks, ESTES includes some cascades, DNNC may include some starting tracks). The magnitude of this effect is unclear but could be addressed by running the selections over shared MC sets and applying the same event-matching procedure used for data.

For LT (SLT + TLT), this issue does not arise: overlap is prevented by construction during the selection process, since the samples are two branches of the same selection pipeline. For LT + DNNC, we processed the same MC that was used for the IceMan DNNC sample, enabling event matching and thus exact overlap removal on both data and MC. The downstream analysis impact of MC overlap is uncertain, but consistent treatment of data and MC is the strictly correct approach, so we applied it where possible.

Signal Modeling

The signal hypothesis assumes a steady point source at position \(\mathbf{x}_s\) emitting neutrinos with a power-law spectrum \(\Phi(E) \propto E^{-\gamma}\). Signal modeling requires two ingredients: a spatial PDF describing where signal events appear relative to the source (the PSF), and an energy-dependent detection efficiency that determines how many events are expected for a given flux (effective area and acceptance).

Spatial: Point-Spread Function

The signal spatial PDF \(\mathcal{S}_\mathrm{space}(\mathbf{x}_i \mid \mathbf{x}_s, \hat{\sigma}_i)\) is the point-spread function (PSF), which describes the probability density for observing an event at position \(\mathbf{x}_i\) given a true source at \(\mathbf{x}_s\). For track-like events with good angular resolution, in Csky we model the PSF as a von Mises–Fisher distribution (or equivalently, a 2D Gaussian on the sphere in the small-angle limit), parameterized by a per-event PSF scale estimate \(\hat{\sigma}_i\).
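In the small-angle limit, both the spatial PDF and the induced distribution of the angular separation itself can be written down directly; a sketch (angles in radians, function names hypothetical):

```python
import numpy as np

def psf_spatial(dpsi, sigma):
    """Small-angle vMF / 2D-Gaussian spatial PDF: probability per unit
    solid angle at angular distance dpsi from the source, given the
    per-event scale estimate sigma."""
    return np.exp(-0.5 * (dpsi / sigma) ** 2) / (2.0 * np.pi * sigma**2)

def psf_radial(dpsi, sigma):
    """Rayleigh density of the angular separation itself; the factor
    2*pi*dpsi is the Jacobian of the solid-angle element."""
    return 2.0 * np.pi * dpsi * psf_spatial(dpsi, sigma)
```

The spatial form enters the likelihood as \(\mathcal{S}_\mathrm{space}\); the radial (Rayleigh) form is what the angular separation \(\Delta\psi\) follows, with median \(\sigma\sqrt{2\ln 2} \approx 1.18\,\sigma\).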

The accuracy of the PSF model directly impacts analysis power. By the Neyman–Pearson lemma, the likelihood ratio constructed from the true signal and background distributions is the most powerful test statistic; any deviation from the true PSF reduces statistical power.

Angular Resolution

Although angular resolution is essential for point-source sensitivity, it is only predictive when other factors are comparable across selections. Angular resolution can even be artificially improved by applying tight cuts on the angular-error estimator; however, as long as the estimator is reasonably consistent with the PSF model used in the likelihood, such cuts discard usable events and therefore reduce sensitivity rather than enhance it.

A more fundamental limitation is that angular resolution cannot be compared meaningfully across reconstruction methods unless all methods are evaluated event-by-event on the same underlying sample. For this reason, comparing angular-resolution curves across selections provides limited insight. The angular error distributions shown in Figure 3 should therefore be interpreted with caution and in conjunction with other metrics.

All plots on this page are generated after loading the data through Csky, so they reflect exactly what the likelihood sees during analysis. For the Lightning Tracks samples we calibrate the PSF to a \(\gamma = 2.5\) spectrum, while NT and PST calibrate for \(\gamma = 2\). For a more detailed discussion, see Spectral Dependence.

PSF Modeling and Calibration

The point-spread function (PSF) modeling and angular error calibration procedure are described in detail on the Angular Error Calibration page. That page covers the formal definition of the pull statistic, the relationship between the TNF estimator and parametric PSF models (von Mises–Fisher and Rayleigh), the pull correction procedure, and the two-dimensional implementation for Lightning Tracks.

Figure 3: Angular Error Distributions
Figure 3: Median true angular error as a function of reconstructed energy for SLT, weighted by an $E^{-\gamma}$ spectrum with $\gamma = 2$. Shaded bands indicate the 20th–80th percentile range. Quantiles are only shown for bins containing at least 100 events. Values shown before application of the $\sigma$ floor (0.2° for all samples).

PSF Coverage

A coverage plot provides a visual diagnostic of whether a probabilistic model—here, the PSF—correctly describes the distribution of the quantity it models. The construction proceeds as follows: For each MC event, we compute the CDF of the assumed PSF evaluated at the true angular error: \[ p_i = F_\mathrm{PSF}(\Delta\psi_i \mid \hat{\sigma}_i), \] where \(F_\mathrm{PSF}\) is the cumulative distribution function of the PSF model (Rayleigh or von Mises–Fisher) with estimated scale parameter \(\hat{\sigma}_i\), and \(\Delta\psi_i\) is the true angular separation between the reconstructed and true directions. If the PSF model is correctly specified—that is, if \(\Delta\psi_i\) is indeed drawn from the distribution \(F_\mathrm{PSF}(\,\cdot \mid \hat{\sigma}_i)\)—then by the probability integral transform, these p-values should be uniformly distributed: \(p_i \sim \mathrm{Uniform}(0, 1)\).

To visualize this, we bin events by some observable (e.g., reconstructed energy) and, within each bin, compute the empirical CDF of the \(p_i\) values. If the model is well-calibrated, the empirical CDF should follow the diagonal \(F_\mathrm{empirical}(p) = p\). Deviations from the diagonal indicate miscalibration: curves lying above the diagonal indicate overcoverage (the PSF is too wide; \(\hat{\sigma}\) is overestimated), while curves below the diagonal indicate undercoverage (the PSF is too narrow; \(\hat{\sigma}\) is underestimated). The magnitude of the vertical deviation at any point \(p\) gives the fraction of events for which the model assigns a CDF value less than \(p\) when the correct fraction should be exactly \(p\).
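The construction can be sketched for a Rayleigh PSF model, whose CDF is \(F(\Delta\psi \mid \sigma) = 1 - e^{-\Delta\psi^2 / 2\sigma^2}\); toy draws with a correct and a two-times-overestimated \(\sigma\) illustrate the diagonal and overcoverage cases:

```python
import numpy as np

def coverage_curve(dpsi, sigma, grid=np.linspace(0.0, 1.0, 101)):
    """Empirical CDF of p_i = F_Rayleigh(dpsi_i | sigma_i) on `grid`.

    For a correctly specified model the curve follows the diagonal;
    values above it indicate overcoverage (sigma overestimated),
    values below it undercoverage (sigma underestimated)."""
    p = 1.0 - np.exp(-0.5 * (dpsi / sigma) ** 2)  # Rayleigh CDF
    return grid, np.searchsorted(np.sort(p), grid, side="right") / p.size

# toy MC: true angular errors drawn from the Rayleigh model itself
rng = np.random.default_rng(2)
sigma = np.full(50_000, 0.5)
dpsi = sigma * np.sqrt(-2.0 * np.log(1.0 - rng.uniform(size=sigma.size)))

grid, ecdf_good = coverage_curve(dpsi, sigma)      # well calibrated -> diagonal
_, ecdf_wide = coverage_curve(dpsi, 2.0 * sigma)   # sigma 2x too large -> overcoverage
```

In a real diagnostic the events would additionally be binned by an observable such as reconstructed energy, producing one such curve per bin.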

Figure 4: PSF Coverage
Figure 4: PSF coverage for SLT using a von Mises–Fisher PSF model, binned by reconstructed energy and weighted by an $E^{-\gamma}$ spectrum with $\gamma = 2$. Values shown before application of the $\sigma$ floor. Deviation from the ideal diagonal indicates miscalibrated angular error estimates. Horizontal lines inside the colorbar mark the bin edges; tick marks inside indicate the bin centers where each curve is colored.

The PSF scale estimator \(\hat{\sigma} = \sigma_\text{TNF}\) used in this selection is calibrated to achieve accurate coverage conditional on the reconstructed observables, namely the reconstructed energy proxy and reconstructed zenith (transformed to \(\sin{\delta}\)). The nominal estimator is subjected to a two-dimensional pull correction in the space \((E_\mathrm{reco}, \sin{\delta_\mathrm{reco}})\), as described in Angular Error Calibration. This correction enforces that, everywhere on this reconstructed-observable plane, the median pull approaches the desired target value. As a consequence, the point-spread function (PSF)—modeled by a von Mises–Fisher distribution (or Rayleigh distribution in the small-angle regime)—is explicitly calibrated in the space that the likelihood function evaluates, ensuring internal self-consistency.
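A one-dimensional sketch of the median-pull correction conveys the idea (the actual procedure is two-dimensional in \((E_\mathrm{reco}, \sin\delta_\mathrm{reco})\) and is described on the Angular Error Calibration page; the toy MC and its energy-dependent mis-estimation are hypothetical):

```python
import numpy as np

RAYLEIGH_MEDIAN = np.sqrt(2.0 * np.log(2.0))  # target median pull, ~1.177

def pull_correction(log_e, dpsi, sigma, n_bins=10):
    """1D sketch of the median-pull correction.  Returns bin edges and
    per-bin factors c_b such that the corrected estimator c_b * sigma
    has median pull equal to the Rayleigh target in every energy bin."""
    edges = np.quantile(log_e, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.digitize(log_e, edges) - 1, 0, n_bins - 1)
    corr = np.ones(n_bins)
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            corr[b] = np.median(dpsi[in_bin] / sigma[in_bin]) / RAYLEIGH_MEDIAN
    return edges, corr

# toy MC: the nominal estimator is increasingly underestimated with energy
rng = np.random.default_rng(3)
log_e = rng.uniform(2.0, 7.0, 100_000)
sigma_true = 10.0 ** (-0.3 * log_e)  # resolution improves with energy
dpsi = sigma_true * np.sqrt(-2.0 * np.log(1.0 - rng.uniform(size=log_e.size)))
sigma_hat = sigma_true / (1.0 + 0.2 * (log_e - 2.0))

edges, corr = pull_correction(log_e, dpsi, sigma_hat)
```

After rescaling \(\hat{\sigma}\) by the per-bin factors, the median pull matches the Rayleigh target everywhere on the binned observable, which is exactly the self-consistency property the likelihood relies on.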

The success of this procedure is illustrated in Figure 4.2.1.1.1.2 and Figure 4.2.1.3.1.2, which show that the PSF is well-calibrated across reconstructed energy and reconstructed declination, respectively.

Limits of Reconstructed-Space Calibration

However, when the resulting angular resolution model is evaluated as a function of true neutrino energy, the coverage degrades noticeably, as can be seen in Figure 4.2.1.2.1.2. This behavior is expected and does not indicate a flaw in the calibration procedure. The pull correction is performed solely with respect to reconstructed observables, since the true neutrino energy is not available for data and therefore cannot be used as a conditioning variable in the angular error model. In an idealized scenario where the mapping between true and reconstructed energy were one-to-one (i.e., bijective over the physically populated domain), conditioning on \(E_\nu^\mathrm{true}\) or on \(E_\mathrm{reco}\) would be equivalent. In that limit, a PSF calibrated in reconstructed space would automatically exhibit identical coverage as a function of true energy, \(f(\Delta\psi \mid E_\nu^\mathrm{true}) = f(\Delta\psi \mid E_\mathrm{reco})\).

In reality, the mapping between true and reconstructed energy is broad and non-bijective: fluctuations in inelasticity, hadronic light production, muon energy-loss stochasticity, and—for through-going tracks—the right-censoring of the pre-detector muon range, all contribute to a many-to-many relationship between \(E_\nu^\mathrm{true}\) and \(E_\mathrm{reco}\). Since the pull correction enforces agreement between the PSF model and the mixture-averaged distribution of angular errors present at each point in reconstructed space (for the utilized MC sample), it does not enforce that the angular error distributions corresponding to individual true-energy subpopulations match the PSF separately. When the sample is regrouped in slices of true energy, the mixture of reconstructed energies is reweighted according to \(p(E_\mathrm{reco}, \text{latent} \mid E_\nu^\mathrm{true})\), which generally differs from the mixture implicit in the calibration sample. As a result, the pull distribution in true-energy slices no longer coincides with the calibrated PSF, and the coverage as a function of true energy degrades even though the model remains correctly calibrated in the reconstructed observables that the likelihood actually uses.

Spectral Dependence

A related consideration concerns the spectral dependence of the angular error calibration. The pull correction is derived from MC events reweighted to a specific assumed spectral index. Since the mapping from true neutrino energy to reconstructed observables is broad and non-bijective, the mixture of true energies that populate each region of \((E_\mathrm{reco}, \sin{\delta_\mathrm{reco}})\) depends on the assumed spectrum. The pull correction therefore implicitly encodes this spectral mixture: changing the underlying spectral index alters the relative contributions of high- and low-energy events to a given reconstructed region, which changes the true distribution of angular errors present there. In this sense, the PSF calibration is strictly correct only for the spectral index used to construct it.

We can see this clearly by comparing, e.g., Figure 3.1.3.1.1.2 to Figure 3.1.3.1.1.1 or Figure 3.4.3.1.1.1 to Figure 3.4.3.1.1.3.
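A toy demonstration of this spectral dependence: MC generated from an \(E^{-1}\) spectrum is reweighted to \(E^{-\gamma}\), and the weighted median angular error within a fixed reconstructed-energy slice shifts with \(\gamma\) because the true-energy mixture feeding that slice shifts (all scalings below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

# toy MC drawn from an E^-1 spectrum (log-uniform in true energy)
log_e_true = rng.uniform(2.0, 8.0, 200_000)
log_e_reco = log_e_true + rng.normal(0.0, 0.5, log_e_true.size)  # broad response
# angular error shrinks with TRUE energy (Rayleigh draws, toy scaling)
scale = 10.0 ** (-0.3 * log_e_true)
dpsi = scale * np.sqrt(-2.0 * np.log(1.0 - rng.uniform(size=scale.size)))

def weighted_median(x, w):
    order = np.argsort(x)
    cdf = np.cumsum(w[order])
    return float(x[order][np.searchsorted(cdf, 0.5 * cdf[-1])])

# median angular error in one fixed reconstructed-energy slice,
# reweighted from the generation spectrum E^-1 to E^-gamma
in_slice = (log_e_reco > 4.0) & (log_e_reco < 4.5)
median_dpsi = {
    gamma: weighted_median(dpsi[in_slice],
                           10.0 ** ((1.0 - gamma) * log_e_true[in_slice]))
    for gamma in (2.0, 3.0)
}
```

The softer spectrum (\(\gamma = 3\)) weights low true energies more heavily within the same reconstructed slice, so the angular errors found there are systematically larger, which is why a single pull correction cannot be exact for all spectral indices.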

The cause of this effect is that the PSF is fundamentally shaped by neutrino–lepton interaction physics, even before detector or reconstruction effects enter. In an idealized detector that perfectly reports the true lepton or hadronic-shower direction and energy, the PSF would still not collapse to a delta function because the mapping from neutrino kinematics to the outgoing lepton or hadronic system is intrinsically broad and energy dependent. The scattering-angle and inelasticity distributions are governed by weak-interaction dynamics and vary with both energy and interaction channel. These energy- and flavor-dependent kernels (e.g., \(\nu_\mu\) CC, \(\nu_e\) CC, or NC interactions) introduce irreducible uncertainty into the inference of the neutrino direction from the observed final state. As the assumed spectral index or flavor composition changes, the distribution of true neutrino energies and interaction channels feeding a given observable region shifts, altering the mixture of these underlying physics-driven kinematic kernels.

In practice, this means that the total PSF is the convolution of two components: an irreducible physics PSF that maps the neutrino direction to the true lepton (or hadronic-shower) direction, and a detector PSF (including the reconstruction algorithm) that maps the true final-state direction to the reconstructed direction. Both components are energy dependent, but only the physics PSF depends on the neutrino energy and flavor, which are unobservable and prior dependent. Thus, even with perfect detector performance, the conditional angular-error distribution—and therefore the PSF calibration—remains intrinsically tied to the assumed spectral index and flavor mix.

Ideally, the point-source likelihood would therefore employ a \(\gamma\)-dependent signal PSF, using the same spectral index as that assumed for the signal hypothesis. If \(\gamma\) is treated as a free parameter in the fit, a fully consistent approach would require either a family of pull corrections spanning a range of spectral indices or an interpolation scheme that adjusts the pull-scaling factor continuously as a function of \(\gamma\). An alternative strategy would explicitly separate the detector PSF from the neutrino-kinematics PSF and perform the convolution inside the likelihood evaluation itself. In that case, the detector PSF would be calibrated with respect to the lepton or shower direction and would thus become spectrum-independent, while the complexity associated with modeling the neutrino kinematics would be shifted into the likelihood. Either approach would eliminate spectrum-induced mismatches in the conditional angular-error distribution, but at the cost of substantially increased algorithmic and computational complexity.

The standard IceCube framework does not implement this more elaborate treatment. Instead, a compromise is adopted for this selection: the pull correction is derived from MC events reweighted to an effective spectral index \(\gamma = 2.5\), chosen as an intermediate between the commonly used \(\gamma = 2\) (for high-energy general-purpose samples) and \(\gamma = 3\) (for low-energy samples). This choice is not intended to match every possible spectral index encountered in likelihood evaluation; rather, it provides a reasonable and stable calibration point.

Analysis Impact

It is important to emphasize that spectral mismatch or other imperfections in the PSF model do not introduce a bias in significance estimation. The test statistic under the null hypothesis is obtained from RA-randomized real data using exactly the same angular error model and therefore remains correctly calibrated in the frequentist sense: The reported p-values correctly reflect the empirical frequency of TS fluctuations under the background hypothesis, even if the PSF model does not perfectly describe the true PSF.

Where PSF mis-modeling does appear is in signal recovery and analysis power. If the angular-error model used in the likelihood does not match the true angular-error distribution of injected signal events, the fitted signal strength \(n_s\) may act as an effective compensator. One plausible mechanism is that an overly narrow PSF over-weights the core region relative to the injected data; the likelihood can reconcile this discrepancy by selecting a smaller \(n_s\), leading to under-recovery (\(\hat{n}_s < n_\mathrm{inj}\)). Depending on the form of the mismatch and the relative contributions of core and tail regions, over-recovery is also possible. The sign and size of such recovery biases are inherently analysis-dependent and should be evaluated with dedicated injection studies rather than inferred from general considerations.

More importantly, improvements in the fidelity of the PSF model directly translate into improved sensitivity and discovery potential. By the Neyman–Pearson lemma, the likelihood constructed with the correct angular-error distribution is the most powerful test statistic for distinguishing signal from background. Any deviation from this optimal PSF makes the test statistic suboptimal and therefore reduces (or at best preserves) analysis power. This is the practical motivation for employing two-dimensional pull corrections and for modeling the angular-error distribution as accurately as is reasonably achievable.

These considerations become more pronounced when the assumed signal spectrum differs significantly from a simple power law. Broken power laws, exponential cutoffs, and more structured spectra induce stronger variations in the true-energy mixture within each reconstructed region than modest changes in spectral index. In such cases, a pull correction tuned to a single effective spectrum may introduce more noticeable spectrum-induced mismodeling of the PSF and therefore a greater loss of analysis power. Quantifying these effects requires targeted injection studies tailored to the specific spectral hypotheses under consideration.

Energy: Effective Area and Acceptance

The signal energy PDF \(\mathcal{S}_\mathrm{energy}\) describes the expected energy distribution of signal events given the assumed source spectrum. Rather than appearing explicitly in the likelihood, this term is combined with the data-derived energy PDF into a signal-to-background ratio (see Energy Signal-to-Background Ratio). Here we focus on the underlying quantities that determine the rate of signal events: effective area and acceptance.

Effective Area

The effective area \(A_\mathrm{eff}\) is a standard measure of detection efficiency: it represents the equivalent cross-sectional area of an idealized detector that would observe the same number of events as the real instrument. Formally, it is defined such that the rate of detected events from a flux \(\Phi(E)\) is

\[ \frac{\mathrm{d}N}{\mathrm{d}t} = \int \Phi(E) \, A_\mathrm{eff}(E, \Omega) \, \mathrm{d}E \, \mathrm{d}\Omega. \]

In general, \(A_\mathrm{eff}\) depends on both energy and arrival direction (zenith and azimuth). For IceCube, located at the geographic South Pole, the detector’s orientation relative to the celestial sky is nearly time-independent, and the hexagonal string layout introduces only weak azimuthal asymmetry. As a result, IceCube’s effective area is well-approximated as a function of neutrino energy and declination alone: \(A_\mathrm{eff}(E_\nu, \delta)\). This effective area encapsulates the entire detection chain: the neutrino–nucleon interaction cross-section, the probability that the resulting lepton or hadronic shower produces detectable Cherenkov light, the geometric acceptance of the photomultiplier array, the trigger efficiency, and the survival probability through all selection cuts.

In Monte Carlo simulation, the effective area is estimated from the ratio of detected to generated events:

\[ A_\mathrm{eff}(E_\nu, \delta) = A_\mathrm{gen} \cdot \frac{N_\mathrm{det}(E_\nu, \delta)}{N_\mathrm{gen}(E_\nu, \delta)}, \]

where \(A_\mathrm{gen}\) is the generation area (the cross-sectional area of the cylindrical or spherical volume over which neutrinos were injected) and the ratio reflects the fraction of generated events that pass all selection criteria. In IceCube’s simulation framework, this information is encoded in the per-event “one-weight” \(w_\mathrm{one}\), which combines generation area, solid angle, and interaction probability into a single weight that allows efficient computation of expected event rates for arbitrary flux models.
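The relation above can be sketched numerically. Assuming the usual one-weight convention (the per-event weight carries generation area, solid angle, and interaction probability, in units of GeV cm² sr), the effective area in each energy–declination bin is the summed one-weight divided by the number of generated events and the bin's phase-space volume. This is a hedged sketch of that convention, not production code.

```python
import numpy as np

def effective_area(e_nu, sin_dec, one_weight, n_gen, e_edges, sindec_edges):
    """Effective area per (E_nu, sin dec) bin from MC one-weights.

    Assumes one_weight has units GeV cm^2 sr, so that
    A_eff = sum(one_weight) / (n_gen * dE * dOmega) in each bin.
    """
    h, _, _ = np.histogram2d(e_nu, sin_dec, bins=(e_edges, sindec_edges),
                             weights=one_weight)
    d_e = np.diff(e_edges)[:, None]                       # GeV
    d_omega = 2.0 * np.pi * np.diff(sindec_edges)[None, :]  # sr
    return h / (n_gen * d_e * d_omega)                    # cm^2
```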

Figure 2 shows the effective areas for all samples, derived from the Csky MC NumPy files as listed in Table 1. For PST and NT, these files contain only \(\nu_\mu\) and \(\nu_\tau\); neither selection has yet been processed on \(\nu_\mathrm{e}\) NuGen. Consequently, their effective-area curves do not exhibit the Glashow resonance, whereas it is clearly visible for Lightning Tracks and, as expected, for DNN Cascades. The resonance would also appear for NT and PST if run on \(\nu_\mathrm{e}\) simulations, but this cannot be confirmed without producing those samples.

While effective area is a useful diagnostic quantity, it should not be interpreted as a direct indicator of point-source performance. Sensitivity is ultimately set by the signal-to-noise ratio and angular resolution, and effective area only correlates with sensitivity when background rates and angular errors are comparable across selections.

This can be seen clearly in Figure 2.2.2, where NT has a slightly smaller effective area than PST. Yet, as shown in Figure 13.1.2.2.2.2, NT achieves better sensitivity. The underlying reason is difficult to attribute uniquely—lower background contamination, improved angular reconstruction, or more accurate angular-error modeling may all contribute—but the conclusion remains: effective area alone is not a meaningful measure of point-source sensitivity.

Figure 2: Effective Area
Figure 2.1.1: Southern Sky neutrino and anti-neutrino effective area for Starting Track samples.

Acceptance

The acceptance \(A(\gamma, \delta)\) integrates the effective area over the assumed source spectrum and observation time to yield an expected event count. For a source with differential flux \(\Phi(E_\nu) = \Phi_0 \, E^{-\gamma}\), the expected number of detected events is

\[ \langle N \rangle = \tau \int \Phi(E_\nu) \, A_\mathrm{eff}(E_\nu, \delta) \, \mathrm{d}E_\nu = \Phi_0 \, \tau \int E^{-\gamma} \, A_\mathrm{eff}(E, \delta) \, \mathrm{d}E \equiv \Phi_0 \, A(\gamma, \delta), \]

where \(\tau\) is the detector livetime. Thus the acceptance is the livetime-weighted spectral integral of the effective area:

\[ A(\gamma, \delta) = \tau \int E^{-\gamma} \, A_\mathrm{eff}(E, \delta) \, \mathrm{d}E. \]

In IceCube’s simulation framework, this integral is computed via a weighted sum over MC events:

\[ A(\gamma, \delta) = \frac{\tau}{\Omega} \sum_i w_{\mathrm{one},i} \, E_{\nu,i}^{-\gamma}, \]

where the sum runs over MC events in a declination band of solid angle \(\Omega\) around the source.

The acceptance provides the crucial link between detected event counts and physical flux. Given a measured or fitted signal strength \(n_s\), the corresponding flux normalization is \(\Phi_0 = n_s / A(\gamma, \delta)\). This relationship underlies the conversion of sensitivity from an expected number of signal events to a flux limit, as discussed in Sensitivity and Discovery Potential.
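The weighted MC sum and the flux conversion can be sketched directly from the equations above. The function name and call signature are illustrative, not the framework's API.

```python
import numpy as np

def acceptance(one_weight, e_nu, livetime, solid_angle, gamma):
    """A(gamma, delta) = (tau / Omega) * sum_i w_one_i * E_i^(-gamma),
    summed over MC events in one declination band of solid angle Omega."""
    return livetime / solid_angle * np.sum(one_weight * e_nu ** (-gamma))

# Flux normalization from a fitted signal strength n_s:
#   Phi_0 = n_s / A(gamma, delta)
def flux_norm(n_s, acc):
    return n_s / acc
```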

Figure 5 shows the signal acceptance for each sample as a function of declination. For combined samples, the plot shows stacked contributions from each component.

Figure 5: Signal Acceptance
Figure 5.1.1: Signal acceptance vs. declination for SLT assuming an $E^{{-\gamma}}$ spectrum with $\gamma$ = 2. Combined selections show stacked contributions from each component.

Background Modeling

The background hypothesis assumes that all observed events arise from diffuse backgrounds—atmospheric muons, atmospheric neutrinos, and any diffuse astrophysical neutrino flux—with no contribution from localized astrophysical point sources. Background modeling requires characterizing both the spatial distribution of events across the sky and their energy distribution as a function of declination.

Estimation from Data

Both background PDFs are constructed directly from the observed data rather than from Monte Carlo simulation. In principle, one could model atmospheric backgrounds using MC, but this approach faces severe practical limitations. Atmospheric neutrino flux predictions carry large systematic uncertainties (of order 10–30%), and producing atmospheric muon simulation with sufficient statistics across all energies is computationally prohibitive. MC-based background estimation is therefore only marginally feasible in the northern sky (where atmospheric muons are subdominant), and even there it requires careful treatment of systematics that can easily dominate over statistical uncertainties.

The data-driven alternative exploits the fact that IceCube’s background is isotropic in right ascension. Atmospheric neutrinos and muons arrive uniformly in azimuth (in local coordinates), and Earth’s rotation smears any fixed local direction into a uniform ring in equatorial right ascension over the course of a sidereal day. Because the background is RA-independent, the background PDFs depend only on declination (and energy): the spatial PDF \(\mathcal{D}_\mathrm{space}(\delta)\) is simply the declination distribution of observed events, and the energy PDF \(\mathcal{D}_\mathrm{energy}(E\mid \delta)\) is the energy distribution within each declination band. We denote these data-derived PDFs with \(\mathcal{D}\) rather than \(\mathcal{B}\) to emphasize that they are empirical estimates constructed from the observed data, not the true (unknown) background distribution.
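A minimal sketch of these data-derived PDFs, assuming RA uniformity so that \(\mathcal{D}_\mathrm{space}\) is the normalized declination histogram (per steradian) and \(\mathcal{D}_\mathrm{energy}\) is the energy histogram normalized within each declination band. Function names and binning are illustrative.

```python
import numpy as np

def background_space_pdf(sin_dec, sindec_edges):
    """D_space(delta): normalized declination histogram of observed
    events, divided by 2*pi since the background is uniform in RA."""
    h, _ = np.histogram(sin_dec, bins=sindec_edges, density=True)
    return h / (2.0 * np.pi)   # density per steradian

def background_energy_pdf(sin_dec, log_e, sindec_edges, loge_edges):
    """D_energy(E | delta): energy histogram normalized within each
    declination band (rows sum to 1 where populated)."""
    h, _, _ = np.histogram2d(sin_dec, log_e, bins=(sindec_edges, loge_edges))
    norm = h.sum(axis=1, keepdims=True)
    return np.divide(h, norm, out=np.zeros_like(h), where=norm > 0)
```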

RA Independence and Detector Location

IceCube’s location at the geographic South Pole is not required for data-driven background estimation—it simply makes the implementation more convenient. All neutrino detectors have zenith-dependent background rates in local coordinates (azimuth-independent, unless built next to a large mountain or similar obstruction). Seasonal variations do introduce time dependence, but for time-integrated searches we assume these average out over the observation period.

The key difference between detector locations is how local coordinates map to equatorial coordinates. For detectors not at a pole, the transformation between equatorial coordinates (right ascension, declination) and local coordinates (azimuth, zenith) is time-dependent: a fixed point in right ascension and declination traces a path across different zenith angles throughout the day. This means the PDFs must be evaluated in local coordinates, and the source position (in local coordinates) is different for every event depending on its timestamp, adding complexity to the likelihood evaluation. At the South Pole, the declination-to-zenith mapping is effectively time-invariant (declination equals zenith minus 90°), and only right ascension rotates with local sidereal time. This allows us to work entirely in equatorial coordinates.

Spatial: Declination Distribution

The background spatial PDF \(\mathcal{D}_\mathrm{space}(\delta)\) describes the expected distribution of events across the sky under the null hypothesis. Because the background is uniform in azimuth, the PDF depends only on declination and reflects the detector’s zenith-dependent acceptance.

Figure 6 shows the background spatial PDF for each sample. All samples peak near the horizon, where the ice overburden provides sufficient shielding to attenuate most atmospheric muons while remaining transparent to neutrinos—this region offers the best signal-to-noise ratio.

We refer to the transition from neutrino-dominated to muon-dominated backgrounds as the muon horizon: the declination where observed atmospheric muon and neutrino rates cross. For the energies most relevant to point-source searches, this lies around 82° zenith (\(\sin\delta \approx -0.14\)). Further south (more downgoing), the atmospheric muon rate increasingly dominates.

Throughgoing track selections must apply progressively stricter energy cuts south of the muon horizon to achieve optimal signal-to-noise. Because throughgoing tracks lack topological information to distinguish single atmospheric muons from neutrino-induced muons—the two are topologically indistinguishable—the only available discriminant is the spectrum. This produces a distinctive feature in the TLT background PDF visible in Figure 6.2: a sharp spike on the rising edge near the muon horizon. This is not an artifact but an energy-dependent effect best understood by examining the energy PDF slices (Figure 8.2.2.2). Just south of the muon horizon, the ice overburden is thin enough to transmit high-energy neutrino-induced muons that are shielded by the thicker ice (and bedrock) at more northern declinations, but still thick enough to filter most low- and medium-energy atmospheric muons. These high-energy neutrinos interacted with the ice far from the detector, so their muon energy is right-censored—they appear as low reconstructed energy events. The result is a signal-to-background ratio above unity at low reconstructed energies in this region, and the data rate temporarily spikes before plummeting as cuts become stricter further south.

This is precisely the regime where TLT requires signal subtraction with the full energy term. Loosening the cut in this region improves signal-to-noise because the data rate increase is dominated by signal-like events (see Sensitivity Optimization), but only if the likelihood can correctly subtract the low-reconstructed-energy contamination from the background estimate via the energy-dependent sigsub weight. Without the energy term, these events degrade sensitivity because the extreme contamination at low reconstructed energies is not accounted for.

This feature is easily washed out by insufficiently fine binning in this region, which leads to a local widening of the background TS distribution due to degraded likelihood agreement and thus a local loss of analysis power. In high-resolution full-sky scans of RA-randomized data, we observed this as a bright band of higher TS values adjacent to a band of lower TS values before optimizing the binning—a consequence of local under- and overestimation of the background where the PDF model interpolated through the local peak structure. Optimizing the binning to capture this structure resolved the issue. A similar pattern appears in PST’s randomized skymaps, albeit more subtly, suggesting that PST has a comparable feature partially obscured by coarser binning.

The Starting Lightning Tracks sample follows a different pattern. Starting tracks can use veto cuts based on event topology to reject atmospheric muons, but these cuts lose discriminating power in the southern sky. As a result, the optimal energy cut is loosened rather than tightened, yielding better overall sensitivity (see Sensitivity Optimization). These declination-dependent changes in the optimal cut produce the characteristic double-peak structure visible in Figure 6.1.

Figure 6: Background Spatial PDF
Background spatial PDF for SLT, showing the expected declination distribution under the null hypothesis.Background spatial PDF for SLT, showing the expected declination distribution under the null hypothesis.
Figure 6.1: Background spatial PDF for SLT, showing the expected declination distribution under the null hypothesis.

Signal Subtraction

As noted in Estimation from Data, the background PDFs are constructed from the observed data. This raises a subtle issue: if the data contain a real astrophysical signal, that signal contaminates the background estimate. The background spatial PDF is constructed by averaging the data over right ascension. If the data contain a point source at position \((\alpha_s, \delta_s)\), the RA-averaging smears the localized signal over all right ascensions while preserving its declination structure. This smeared contribution, \(\mathcal{S}^\mathrm{sigsub}\), is the RA-integrated signal PDF—the same spatial PSF and energy weighting as the full signal, but integrated over right ascension so that only the declination dependence remains.

The signal subtraction (sigsub) method corrects for this contamination. The RA-averaged data PDF \(\mathcal{D}\) is fixed—it is the empirical distribution from the observed data. However, \(\mathcal{D}\) is itself a mixture: if the data contain \(n_s\) signal events out of \(N\) total, then

\[ \mathcal{D} = \frac{n_s}{N}\mathcal{S}^\mathrm{sigsub} + \left(1 - \frac{n_s}{N}\right)\mathcal{B}, \]

where \(\mathcal{B}\) is the assumed background. As \(n_s\) varies during fitting, our assumption of how \(\mathcal{D}\) decomposes into signal versus background changes dynamically—\(\mathcal{B}\) effectively depends on \(n_s\). Solving for \(\mathcal{B}\) and substituting into the standard per-event likelihood contribution \(\frac{n_s}{N}\mathcal{S} + (1 - \frac{n_s}{N})\mathcal{B}\):

\[ \frac{n_s}{N}\mathcal{S} + \left(1 - \frac{n_s}{N}\right) \cdot \frac{\mathcal{D} - \frac{n_s}{N}\mathcal{S}^\mathrm{sigsub}}{1 - \frac{n_s}{N}} = \frac{n_s}{N}\mathcal{S} + \mathcal{D} - \frac{n_s}{N}\mathcal{S}^\mathrm{sigsub}. \]

Rearranging:

\[ \mathcal{L} = \prod_i \left[\mathcal{D}_i + \frac{n_s}{N}(\mathcal{S}_i - \mathcal{S}_i^\mathrm{sigsub})\right]. \]

Factoring out \(\mathcal{D}_i\):

\[ \mathcal{L} = \prod_i \mathcal{D}_i \left[1 + \frac{n_s}{N} \cdot \frac{\mathcal{S}_i - \mathcal{S}_i^\mathrm{sigsub}}{\mathcal{D}_i}\right]. \]

The background-only hypothesis corresponds to \(n_s = 0\), giving \(\mathcal{L}_0 = \prod_i \mathcal{D}_i\). The log-likelihood ratio is then:

\[ \ln \frac{\mathcal{L}}{\mathcal{L}_0} = \sum_i \ln \left[1 + \frac{n_s}{N} \cdot \frac{\mathcal{S}_i - \mathcal{S}_i^\mathrm{sigsub}}{\mathcal{D}_i}\right]. \]

In practice, we factorize the event weights as \(W_i = W_i^\mathrm{space} \times W_i^\mathrm{energy}\), where \(W_i^\mathrm{space} = \mathcal{S}_{\mathrm{space},i}/\mathcal{D}_{\mathrm{space},i}\) is the spatial signal-to-data ratio (PSF divided by data-derived spatial PDF at the event position) and \(W_i^\mathrm{energy}\) is the energy ratio (looked up from precomputed 2D histograms, as described in the next section). The sigsub weight differs from the signal weight only in the spatial factor: the RA-marginalized signal \(\mathcal{S}_i^\mathrm{sigsub}\) is computed at runtime by integrating the PSF over right ascension, then divided by the data-derived spatial PDF to give \(W_i^\mathrm{space,sigsub}\). The energy factor \(W_i^\mathrm{energy}\) is common to both. In the default full signal subtraction configuration, the sigsub weight is \(W_i^\mathrm{sigsub} = W_i^\mathrm{space,sigsub} \times W_i^\mathrm{energy}\). An alternative spatial-only configuration sets \(W_i^\mathrm{sigsub} = W_i^\mathrm{space,sigsub}\) by dropping the energy factor from the correction term entirely, which eliminates certain pathologies associated with poorly constrained energy PDFs at high energies (see signal subtraction investigation slides). The log-likelihood ratio is then:

\[ X_i = \frac{W_i - W_i^\mathrm{sigsub}}{N}, \qquad \ln \frac{\mathcal{L}}{\mathcal{L}_0} = \sum_i \ln(1 + n_s X_i). \]

Signal subtraction (spatial) is enabled for all analyses and all contexts—background trials, signal-injection trials, and real-data unblinding alike. Whether the energy term is included in the sigsub correction varies by component: TLT and DNNC use full signal subtraction, while SLT uses spatial-only signal subtraction. Table 2 lists the nominal per-component configuration.
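The sigsub log-likelihood ratio above can be sketched as follows, assuming the conventional test statistic \(\mathrm{TS} = 2\ln(\mathcal{L}/\mathcal{L}_0)\). The grid scan over \(n_s\) is purely illustrative; production code maximizes with a proper numerical minimizer.

```python
import numpy as np

def ts_sigsub(w, w_sigsub, n_max=1000.0):
    """TS = max over n_s >= 0 of 2 * sum_i ln(1 + n_s * X_i),
    with X_i = (W_i - W_i_sigsub) / N, via a simple grid scan."""
    n = len(w)
    x = (w - w_sigsub) / n
    ns_grid = np.linspace(0.0, n_max, 2001)
    arg = 1.0 + ns_grid[:, None] * x[None, :]
    # guard: the likelihood is only defined where all log arguments are > 0
    llr = np.where((arg > 0).all(axis=1),
                   2.0 * np.sum(np.log(np.clip(arg, 1e-300, None)), axis=1),
                   -np.inf)
    i = np.argmax(llr)
    return llr[i], ns_grid[i]
```

Because \(n_s = 0\) is always in the scan, the returned TS is non-negative by construction, matching the delta-function-at-zero behavior discussed under Background TS Distributions.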

Signal Subtraction Energy Pathology

Including the energy term in the sigsub correction (\(W_i^\mathrm{sigsub} = W_i^\mathrm{space,sigsub} \times W_i^\mathrm{energy}\)) can cause the assumed background density to become negative for events with extreme energy weights, distorting the likelihood surface and biasing the fitted \(n_s\). This pathology is sample-dependent: it is severe for SLT, mild for NT and PST, and largely absent for TLT and DNNC. The per-component configuration in Table 2 reflects these findings. For the full investigation—including likelihood surface visualizations, KDE alternatives, and per-selection signal recovery comparisons—see the signal subtraction investigation slides.

Energy Signal-to-Background Ratio

Recall that the full signal-to-data ratio in the likelihood factorizes as \(\mathcal{S}_i/\mathcal{D}_i = (\mathcal{S}_\mathrm{space}/\mathcal{D}_\mathrm{space}) \times (\mathcal{S}_\mathrm{energy}/\mathcal{D}_\mathrm{energy})\). The spatial ratio is computed at runtime from the PSF and background spatial PDF (discussed in Spatial: Declination Distribution). This section focuses on the energy ratio \(\mathcal{S}_\mathrm{energy}(E \mid \gamma) / \mathcal{D}_\mathrm{energy}(E, \delta)\), which is precomputed as 2D histograms and looked up at each event’s reconstructed \((\sin\delta_i, \log E_i)\) coordinates during likelihood evaluation.

The energy ratio quantifies the relative likelihood that an event at energy \(E\) originates from a power-law source rather than atmospheric backgrounds. This ratio provides discrimination between signal and background based on the reconstructed energy proxy: for astrophysical sources with hard spectra, signal events tend toward higher energies than the softer atmospheric background.

Overestimation Is Not Conservative

A common misconception is that overestimating background rates or angular errors is “conservative” and therefore acceptable. In likelihood-based analyses this is not generally true. Any systematic mismatch—whether an over- or underestimation—degrades the agreement between the likelihood model and reality and therefore reduces analysis power. Moreover, when inference relies on asymptotic assumptions or fixed model expectations, such mismodeling can lead to miscalibrated significances and confidence intervals. In those cases, the results are not conservative but statistically invalid. Only explicit calibration against data can preserve correct coverage in the presence of model imperfections.

Construction

The \(\mathcal{S}/\mathcal{D}\) ratio is constructed as a 2D histogram in \((\sin\delta, \log E_\mathrm{reco})\) for each spectral index \(\gamma\) in a predefined grid. The data-derived histogram is built from observed events (with equal weight per event), while the signal histogram is built from MC weighted by \(w_\mathrm{one} \times E_\nu^{-\gamma}\) for the assumed spectral index. These raw histograms are converted to probability densities and interpolated with cubic splines to provide smooth \(\mathcal{S}/\mathcal{D}\) values at arbitrary \((\sin\delta, \log E)\) positions within each histogram. During likelihood evaluation, the spectral index \(\gamma\) is a fit parameter that can take any value, not just the predefined grid points; we handle this by parabolic interpolation in log-space, identifying the three nearest grid points, fitting a parabola through the corresponding \(\log(\mathcal{S}/\mathcal{D})\) values, and exponentiating the result.
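The off-grid spectral-index handling described above can be sketched as follows: pick the three grid indices nearest the fitted \(\gamma\), fit a parabola through the corresponding \(\log(\mathcal{S}/\mathcal{D})\) values, and exponentiate. A minimal sketch with an illustrative function name; the real code evaluates this per event at each \((\sin\delta, \log E)\) position.

```python
import numpy as np

def interp_ratio(gamma, gamma_grid, log_sd_grid):
    """Parabolic interpolation of log(S/D) across the three spectral-index
    grid points nearest the fitted gamma, then exponentiate."""
    gamma_grid = np.asarray(gamma_grid)
    i = np.argsort(np.abs(gamma_grid - gamma))[:3]   # three nearest points
    coeffs = np.polyfit(gamma_grid[i], np.asarray(log_sd_grid)[i], 2)
    return np.exp(np.polyval(coeffs, gamma))
```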

Because we do not randomize declination or energy in background trials (for track selections), we can never encounter a bin during background trials or unblinding that has zero data events—every observed event falls into a bin that, by definition, contains at least that event. However, empty bins can arise in two ways: (1) regions of phase space where no data events were observed during the livetime, and (2) regions where no MC events exist. In Csky this is handled by filling empty data bins with the global minimum data density and setting signal values in empty bins equal to the corresponding data value. This yields \(\mathcal{S}/\mathcal{D} = 1\) for any bin with no data, effectively treating it as uninformative. Bins with neither data nor MC are irrelevant even during signal injection (since no MC events can land there), but finite values must still be assigned to prevent \(0/0\) in the \(\mathcal{S}/\mathcal{D}\) calculation (these fill values break the normalization of the energy PDF within each \(\sin\delta\) slice, but this is acceptable since the affected bins are never populated).

This filling strategy is why many \(\mathcal{S}/\mathcal{D}\) histograms (e.g., Figure 7.3.1) show large regions with constant \(\mathcal{S}/\mathcal{D}\) ratios. In particular, at extremely high reconstructed energies (\(E_\mathrm{reco} \gtrsim 10^8\) GeV) where we have zero data events and either zero or one MC event per bin, almost all bins fall into one of two categories: \(\mathcal{S}/\mathcal{D} = 1\) (no data, no MC) or \(\mathcal{S}/\mathcal{D} \ll 1\) (no data, one MC event). The latter case yields an arbitrary, binning-dependent but typically tiny value because the flux-weighted MC contribution (for any reasonable spectral assumption) is much smaller than the global minimum data density used to fill empty data bins.
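The bin-filling rules can be reconstructed as a short sketch (illustrative, not Csky's actual implementation): empty data bins receive the global minimum nonzero data density, and empty signal bins take the corresponding (filled) data value so that their ratio is exactly one.

```python
import numpy as np

def fill_empty_bins(data_hist, sig_hist):
    """Empty-bin handling as described in the text (sketch):
    - data bins with no events -> global minimum nonzero data density
    - signal bins with no MC   -> corresponding data value (S/D = 1)"""
    d = data_hist.astype(float).copy()
    s = sig_hist.astype(float).copy()
    d_min = d[d > 0].min()
    d[d == 0] = d_min                      # no data observed in this bin
    s[sig_hist == 0] = d[sig_hist == 0]    # no MC: uninformative, S/D = 1
    return s / d, d, s
```

Note how a bin with no data but a tiny flux-weighted MC density ends up with \(\mathcal{S}/\mathcal{D} \ll 1\), exactly the second failure mode described above.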

More broadly, the \(\mathcal{S}/\mathcal{D}\) ratio is poorly constrained at high energies wherever data and MC statistics are sparse—including bins with only one or a few data events, where the ratio is highly sensitive to binning choices and statistical fluctuations. Of the two failure modes described above, the zero-data/nonzero-MC case (\(\mathcal{S}/\mathcal{D} \ll 1\)) affects only sensitivity and discovery potential calculations, since those bins are populated only during signal injection and never when evaluating real data; they are therefore irrelevant for unblinding p-values. The low-statistics case, however, affects all analyses. Signal subtraction with the energy term exacerbates both modes: arbitrary bin-filling artifacts and poorly constrained ratios are amplified into extreme sigsub weights that can distort the likelihood surface and bias the best-fit \(n_s\) (see the signal subtraction investigation slides).

Alternatives to Fixed-Binning Energy PDFs

These issues are compounded by a fundamental limitation of csky: it enforces fixed binning in both \(\sin\delta\) and \(\log E\) independently—we cannot use statistically optimized adaptive binning as we do for the PSF calibration (see Robust Adaptive Binning). This means we either lose detail in high-statistics regions or suffer large fluctuations in low-statistics regions. One potential remedy that addresses both problems is to replace the fixed-binning histogram approach entirely with adaptive-bandwidth kernel density estimates of the full 2D \((\sin\delta, \log E)\) densities for signal and background, with explicit regularization (e.g., Laplace smoothing) replacing the ad hoc bin-filling. This eliminates the fixed-binning trade-off and makes all smoothing assumptions transparent model choices rather than implicit consequences of binning. However, tuning the KDE hyperparameters—bandwidth selection, boundary handling, and regularization strength—is a nontrivial optimization problem in its own right, and the resulting PDFs are sensitive to these choices. For the present analysis we retain the standard histogram approach with the per-component sigsub configuration described above, but KDE-based energy PDFs remain a viable direction for future work.
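For concreteness, a KDE-based ratio might look like the sketch below. The text proposes adaptive-bandwidth KDEs; scipy's `gaussian_kde` uses a single global bandwidth, so this is a simplified stand-in, with a density floor playing the role of the explicit regularization. All names and the `floor` value are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_ratio(data_pts, sig_pts, sig_weights, eval_pts, floor=1e-12):
    """Sketch of a KDE-based S/D energy ratio over (sin dec, log E).

    data_pts, sig_pts: arrays of shape (2, n) of event coordinates;
    sig_weights: per-event MC weights (e.g. w_one * E^-gamma);
    eval_pts: shape (2, m) positions at which to evaluate the ratio.
    """
    d_kde = gaussian_kde(data_pts)
    s_kde = gaussian_kde(sig_pts, weights=sig_weights)
    d = np.maximum(d_kde(eval_pts), floor)   # floor = explicit regularization
    s = np.maximum(s_kde(eval_pts), floor)
    return s / d
```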

Implications for Hard Spectra

These limitations are most acute for hard spectra (\(\gamma \lesssim 2\)), where the signal is dominated by the highest-energy events. These probe precisely the regions of phase space where we have zero or very few data events, so the \(\mathcal{S}/\mathcal{D}\) ratio is driven by bin-filling artifacts rather than physical considerations. Sensitivity and discovery potential estimates for hard spectra should therefore be interpreted with care: they are highly model-dependent and strictly valid only for the exact assumptions made during the corresponding signal injection trials. The issue is not that the results are wrong, but that they depend sensitively on how the analysis handles regions with insufficient data to constrain the background. Other analysis frameworks (e.g., SkyLLH) address this by allowing a configurable default value for empty data bins, making the prior background assumption explicit. In practice, this is unlikely to significantly affect results since regions with zero data events are also unlikely to see substantial numbers of injected signal events for any reasonable source hypothesis.

This is further exacerbated by a subtle inconsistency in the TS calibration (see Null Hypothesis Calibration below). The background trials used to define the TS threshold under \(H_0\) are computed from the original data, which may contain zero or very few high-energy events in these sparsely populated bins. Those events can therefore never cluster by chance during background trials, and the resulting TS distribution reflects this. During signal injection trials, however, injected MC events populate these previously empty bins. Although the background PDFs are updated for each injection trial, the TS threshold is still calibrated against the original null distribution—which never saw such events. A fully self-consistent treatment would require rerunning background trials for every individual signal injection trial, recalibrating the null TS distribution with the specific injected events from that trial included in the data. This is computationally infeasible. Sensitivity, discovery potential, and upper limit estimates for such extreme spectra are therefore fundamentally limited by these uncorrectable approximations, and there is a reasonable argument for not publishing them at all.

2D Signal-to-Background Ratio

Figure 7 shows the 2D signal-to-background ratio as a function of reconstructed energy and declination; values above 1 indicate regions where signal is favored, values below 1 favor background. Astrophysical signal should dominate at high energies, and the northern sky typically has lower atmospheric background, increasing \(\mathcal{S}/\mathcal{D}\). Softer assumed spectra (\(\gamma \gtrsim 2.5\)) shift the signal distribution to lower energies, reducing the discrimination power of the energy term.

Figure 7: Energy $\mathcal{S}/\mathcal{D}$ Ratio (2D)
Figure 7.1.1: 2D signal-to-background energy PDF ratio for SLT with $\gamma$ = 2. Colorscale shows $\log(\mathcal{S}/\mathcal{D})$; contours indicate $\mathcal{S}/\mathcal{D} = 1$.

Energy PDF Slices

Since reading 2D color histograms can be difficult, we also provide 1D slice plots. Figure 8 shows \(\mathcal{S}/\mathcal{D}\) ratio curves extracted at fixed values of one variable, plotted as a function of the other. Selecting “Declination” shows \(\mathcal{S}/\mathcal{D}\) as a function of \(\log E_\mathrm{reco}\) for several fixed declination values; selecting “Energy” shows \(\mathcal{S}/\mathcal{D}\) as a function of \(\sin\delta\) for several fixed energies.

Figure 8: Energy $\mathcal{S}/\mathcal{D}$ Ratio (Slices)
Figure 8.1.1.1: Energy signal-to-background ratio sliced by Declination for SLT with $\gamma$ = 2. Each curve corresponds to a fixed Declination value (indicated by color) and shows $\mathcal{S}/\mathcal{D}$ as a function of the other variable. Tick marks inside the colorbar indicate the values at which slices are taken.

Null Hypothesis Calibration

With the likelihood components defined, we now examine how the test statistic behaves under the null hypothesis—that is, when no signal is present. The TS distribution under background-only conditions determines significance thresholds and p-value calibration.

RA Randomization

To determine the significance of any observed excess, we must know the distribution of the test statistic under the null hypothesis. We obtain this empirically by generating background trials—pseudo-experiments that are, by construction, signal-free realizations of the background. Each trial is generated by RA randomization: drawing each event’s right ascension uniformly from \(\mathrm{U}(0, 2\pi)\) while preserving declinations and all other observables. Running the full likelihood analysis on many such RA-randomized trials yields the empirical TS distribution, from which p-values and significance thresholds are derived.

Terminology: Scrambling vs. Randomization

This procedure is often referred to as “scrambling” in IceCube, meaning to destroy the signal structure in the data by breaking the association between each event’s right ascension and its other properties. This should not be confused with permutation (shuffling RA values among events without replacement). What we actually perform is uniform RA resampling: each event’s right ascension is independently drawn from \(\mathrm{U}(0, 2\pi)\), completely discarding the original RA values. We use “RA randomization” throughout for clarity.

The randomized data directly encode the detector’s acceptance, seasonal variations, and all other systematics that affect the observed event distribution, without requiring any simulation. This is a principal advantage of the data-driven approach: the background trials inherit all the complexity of the real detector response automatically.
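As a minimal sketch (not the actual Csky implementation), uniform RA resampling on an event table might look like the following; the column names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def ra_randomize(events, rng):
    """Return a background trial: draw each event's RA uniformly from
    [0, 2*pi), keeping declination, energy, and all other observables
    untouched. Note this is resampling, not a permutation of the
    original RA values, which are discarded entirely."""
    trial = dict(events)  # shallow copy of the column dict
    trial["ra"] = rng.uniform(0.0, 2.0 * np.pi, size=len(events["ra"]))
    return trial

events = {
    "ra": np.array([0.1, 1.2, 2.3]),
    "dec": np.array([-0.5, 0.1, 0.9]),
    "log_e": np.array([3.1, 4.0, 5.2]),
}
trial = ra_randomize(events, rng)
```

Running the full likelihood analysis on each such trial then yields one background TS value per trial.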

Background TS Distributions

For unbinned point-source searches, the TS follows a mixture distribution: a delta function at zero (when the best-fit \(n_s = 0\)) plus a \(\chi^2\) tail (when \(n_s > 0\)). The mixture fraction \(\eta\) is the observed fraction of trials with \(n_s > 0\), while the \(\chi^2\) degrees of freedom are fit to the positive-TS tail.
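The mixture model's survival function can be written down directly; a small sketch, with \(\eta\) and \(n_\mathrm{dof}\) taken as given from the fit:

```python
import numpy as np
from scipy.stats import chi2

def mixture_sf(ts, eta, ndof):
    """P(TS >= ts) under the background mixture model:
    a point mass (1 - eta) at TS = 0 plus an eta-weighted
    chi2(ndof) tail for TS > 0."""
    ts = np.asarray(ts, dtype=float)
    return np.where(ts <= 0.0, 1.0, eta * chi2.sf(ts, ndof))
```

With this, the model-based p-value of an observed test statistic is simply `mixture_sf(ts_obs, eta, ndof)`.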

Figure 9 shows the TS distribution at representative declinations for each sample. The histogram shows the empirical distribution from \(\mathcal{O}(10^{6})\) background trials; the dashed line shows the fitted \(\chi^2\) mixture model. Vertical lines mark the median TS and the \(3\sigma\)/\(5\sigma\) significance thresholds. Because the \(\chi^2\) approximation can deviate from the true distribution—particularly in the tails—and because we generate sufficient trials to obtain accurate empirical coverage, we use the empirically determined quantiles for the median and \(3\sigma\) thresholds rather than relying on the \(\chi^2\) fit. The \(5\sigma\) threshold is extrapolated from the fitted \(\chi^2\) model because direct empirical estimation would require \(\mathcal{O}(10^{9})\) trials: with \(p_{5\sigma} \approx 2.87 \times 10^{-7}\), even \(10^9\) trials yield only \(\sim300\) events above threshold, corresponding to a \(\sim6\%\) statistical uncertainty on the quantile.
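The trial-count arithmetic behind that statement is quick to verify:

```python
import numpy as np

p_5sigma = 2.87e-7                 # one-sided 5-sigma p-value
n_trials = 1e9
n_above = p_5sigma * n_trials      # expected trials above the 5-sigma TS threshold
rel_unc = 1.0 / np.sqrt(n_above)   # Poisson relative uncertainty on that count
# n_above is roughly 290, giving a relative uncertainty of roughly 6%
```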

The "\(\sigma\)" Convention

The \(n\sigma\) notation is a historical convention inherited from particle physics: we quote the p-value that would correspond to an \(n\)-sigma one-sided fluctuation in a standard normal distribution, even though the TS distribution is not Gaussian. That is, \(3\sigma\) means \(p = \int_3^\infty \mathcal{N}(0,1)\,dx \approx 1.35 \times 10^{-3}\), and \(5\sigma\) means \(p \approx 2.87 \times 10^{-7}\). The test is one-sided because we are looking for an excess of signal; a deficit would simply yield TS \(= 0\). This convention has no statistical justification—it merely provides a familiar scale for those accustomed to Gaussian intuition. The p-values themselves are the fundamental quantities.
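The conversion is a one-liner with the standard normal survival function; note that these are one-sided upper-tail probabilities:

```python
from scipy.stats import norm

p_3sigma = norm.sf(3.0)  # one-sided tail beyond 3 sigma, ~1.35e-3
p_5sigma = norm.sf(5.0)  # one-sided tail beyond 5 sigma, ~2.87e-7
```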

To assess the quality of the \(\chi^2\) fit, we compute two goodness-of-fit statistics. The Kolmogorov–Smirnov (KS) statistic \(D\) measures the maximum vertical distance between the empirical and fitted cumulative distribution functions; intuitively, \(D\) is the maximum absolute error on the p-value that would result from using the \(\chi^2\) model instead of the empirical distribution. The Anderson–Darling (AD) statistic \(A^2\) is a weighted integral of the squared difference between CDFs, with weights that increase toward the tails of the distribution; it is therefore more sensitive to discrepancies in the high-TS region that determines significance thresholds. Smaller values of both statistics indicate better agreement with the \(\chi^2\) model.
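A sketch of the KS comparison on a synthetic positive-TS sample, using scipy's `kstest` (the AD analogue applies the same idea with tail-emphasizing weights); the sample and degrees of freedom here are illustrative:

```python
import numpy as np
from scipy.stats import chi2, kstest

rng = np.random.default_rng(0)
ndof = 1.2  # hypothetical fitted degrees of freedom

# stand-in for the trials with TS > 0
positive_ts = chi2.rvs(ndof, size=10_000, random_state=rng)

# D is the maximum |empirical CDF - model CDF|: the largest p-value
# error incurred by using the chi2 model in place of the empirical
# distribution.
result = kstest(positive_ts, chi2(ndof).cdf)
D = result.statistic
```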

Figure 9: Background TS Distributions

Background test statistic distribution for SLT at $\sin(\delta)$ = −0.83. The histogram shows the empirical distribution from background trials; the dashed line shows the fitted $\chi^2$ mixture model. Background PDF: Nominal.

Chi-Squared Fit Parameters

Figure 10 shows the fitted \(\chi^2\) parameters as a function of declination: the effective degrees of freedom \(n_\mathrm{dof}\), the mixture fraction \(\eta\), the \(3\sigma\) threshold (both empirical and \(\chi^2\)-extrapolated), the \(5\sigma\) threshold (\(\chi^2\)-extrapolated only), and the goodness-of-fit statistics \(D\) and \(A^2\).

Figure 10: Chi-Squared Fit Parameters

Fitted $\chi^2$ parameters vs. declination for SLT. Panels show degrees of freedom, mixture parameter, significance thresholds, and goodness-of-fit statistics. Background PDF: Nominal.

Fitted Parameters from Background Trials

The spectral index \(\gamma\) and signal count \(n_s\) are free parameters in each likelihood fit, with \(\gamma\) constrained to the interval \([1, 4]\) and \(n_s \geq 0\). In background-only trials, there is no injected signal and hence no “true” \(\gamma\) or \(n_s\); the fitted values instead reflect how background fluctuations project onto the signal template. Figure 11 shows the median and 20–80% quantile range of the fitted \(\gamma\) and \(n_s\) distributions across declination, computed only from trials where the fitter found signal (\(n_s > 0\)). The sharp cutoff at \(\gamma = 4\) is an artifact of the fit constraint, not a physical feature.

Figure 11: Fitted Parameters from Background Trials

Fitted parameters from background trials for SLT, showing only trials where the fitter found signal ($n_s > 0$). Top panel: fitted spectral index $\gamma$ vs. declination. Bottom panel: fitted signal count $n_s$ vs. declination. Shaded bands indicate the 20th--80th percentile range. Background PDF: Nominal.

Signal Injection and Recovery

A correctly specified likelihood should recover the injected signal strength at the median: the fitted \(\hat{n}_s\) should equal the true \(n_\mathrm{inj}\). Systematic deviations indicate model misspecification that can affect both detection power and flux estimation.

To test signal recovery, we inject a known number of signal events \(n_\mathrm{inj}\) into RA-randomized data and fit the likelihood. Repeating this many times for each \(n_\mathrm{inj}\) value yields a distribution of fitted \(\hat{n}_s\) values. If the likelihood is correctly specified, the median fitted \(\hat{n}_s\) should lie on the diagonal \(\hat{n}_s = n_\mathrm{inj}\), with scatter consistent with statistical fluctuations.
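The idea can be demonstrated with a one-dimensional toy likelihood: uniform background on [0, 1), a Gaussian signal at x = 0.5, and the standard point-source likelihood maximized in $n_s$. Everything here (event counts, PSF width) is an illustrative assumption, not the real analysis:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
SIGMA = 0.02  # toy "PSF" width

def make_trial(n_bkg, n_inj):
    """Uniform background plus exactly n_inj signal events near x = 0.5."""
    return np.concatenate([rng.uniform(0.0, 1.0, n_bkg),
                           rng.normal(0.5, SIGMA, n_inj)])

def fit_ns(x):
    """Maximize sum(log(1 + ns/N * (S/B - 1))) over ns >= 0."""
    N = len(x)
    # per-event signal/background PDF ratio (background density is 1)
    sb = np.exp(-0.5 * ((x - 0.5) / SIGMA) ** 2) / (SIGMA * np.sqrt(2 * np.pi))

    def neg_llr(ns):
        return -np.sum(np.log1p(ns / N * (sb - 1.0)))

    return minimize_scalar(neg_llr, bounds=(0.0, N / 2), method="bounded").x

n_inj = 20
fitted = [fit_ns(make_trial(500, n_inj)) for _ in range(100)]
median_ns = np.median(fitted)  # sits near 20, because the PDFs are exact here
```

In the real analysis, the same loop runs over a grid of $n_\mathrm{inj}$ values, and deviations of the median from the diagonal flag PDF mismodeling.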

Figure 12 shows the signal recovery diagnostic: fitted \(n_s\) (y-axis) vs. injected \(n_\mathrm{inj}\) (x-axis). The solid line shows the median fitted value; shaded bands indicate the 68% and 95% intervals across trials. The diagonal dashed line represents perfect recovery. Since \(n_\mathrm{inj}\) events are injected exactly (no Poisson sampling), the median should lie precisely on this line if the likelihood is correctly specified.

Note that the fit recovers both \(\hat{n}_s\) and \(\hat{\gamma}\) simultaneously. At low injected signal strengths, the limited number of signal events provides little spectral information, making the soft degeneracy between \(n_s\) and \(\gamma\) particularly difficult to break.

It is apparent from Figure 12 that signal recovery is far from perfect across all selections. Pure track samples (e.g., NT) tend to perform best, but even they show systematic deviations. The primary culprit is PDF mismodeling—both the spatial PSF (see Angular Error Calibration) and the energy \(\mathcal{S}/\mathcal{D}\) ratio (see Alternatives to Fixed-Binning Energy PDFs above) contribute to discrepancies between the assumed and true signal-to-background distributions.

Ultimately, signal recovery is primarily a performance diagnostic. As long as the TS is calibrated empirically from background trials, accurate significance estimates do not require perfect PDF agreement—though better agreement should yield greater statistical power. Where signal recovery matters critically is in reporting limits on a source’s spectral shape, for which calibration curves are typically fit from these injection studies. Such measurements are therefore highly model-dependent, inheriting all assumptions about the MC used for injection and the PDF construction.

Figure 12: Signal Recovery ($n_s$ Bias)

Signal recovery for SLT at $\sin(\delta)$ = −0.83 with $\gamma$ = 2. The plot shows fitted $n_s$ vs. injected $n_\mathrm{inj}$, with the median (solid line) and 68%/95% intervals (shaded bands). The diagonal dashed line indicates perfect recovery. Background PDF: Nominal.

Sensitivity and Discovery Potential

The primary figures of merit for a point-source selection are sensitivity—the minimum source flux detectable at 90% confidence—and discovery potential—the flux required to achieve a given significance threshold (\(3\sigma\) or \(5\sigma\)) in 50% of experiments. These metrics integrate all aspects of the analysis: effective area, angular resolution, energy resolution, background rate, and the accuracy of the likelihood model. In both cases, lower flux values indicate superior performance. For caveats on the reliability of these estimates at hard spectral indices (\(\gamma \lesssim 2\)), see Implications for Hard Spectra.

Integrated Sensitivity

The standard sensitivity and discovery potential assume a steady point source emitting neutrinos with a power-law energy spectrum \(E^{-\gamma}\) extending over all energies. The calculation proceeds by injecting simulated signal events on top of RA-randomized data and finding the signal strength at which a specified fraction of trials exceeds a TS threshold:

  • Sensitivity (90% CL): the signal strength at which 90% of trials exceed the median background TS. The median is determined empirically from background-only trials.
  • \(3\sigma\) discovery potential (50% CL): the signal strength at which 50% of trials exceed the TS corresponding to a p-value of \(1.35 \times 10^{-3}\). This threshold is determined empirically from background trials.
  • \(5\sigma\) discovery potential (50% CL): the signal strength at which 50% of trials exceed the TS corresponding to \(p = 2.87 \times 10^{-7}\). Because this threshold lies deep in the tail, it is extrapolated from the fitted \(\chi^2\) mixture model rather than determined empirically.

Signal injection proceeds as follows: MC events are pre-filtered to a declination band around the source and assigned weights proportional to \(w_\mathrm{one} \times E_\nu^{-\gamma}\) (see Effective Area for the definition of \(w_\mathrm{one}\)). The injection uses only the spectral shape—no flux normalization is assumed at this stage. To inject \(n_\mathrm{inj}\) signal events, we normalize the weights to probabilities and draw \(n_\mathrm{inj}\) events via weighted rejection sampling. The drawn events are then rotated to the source position and added to the RA-randomized data. The declination band has a fixed full width of \(\approx 0.035\) in \(\sin\delta\), which corresponds to very different angular extents depending on declination: approximately \(\pm 1.0°\) at the horizon, \(\pm 1.2°\) at \(\delta = -30°\), and \(\pm 5.8°\) at \(\delta = -80°\) due to the \(1/\cos\delta\) Jacobian. While the rotation places each injected event at the correct sky position, it retains the detector response—effective area, angular resolution, reconstruction bias—from its original simulated declination. The injected signal therefore does not perfectly represent the detector’s response to a true source at the target position; it is an approximation that relies on detector properties varying slowly across the band. Near the poles, where the band spans more than \(10°\) in declination, this assumption is considerably less justified than near the equator, adding a systematic uncertainty to all injection-based estimates that is difficult to quantify.
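The quoted angular extents follow directly from the Jacobian, \(\mathrm{d}\delta \approx \mathrm{d}(\sin\delta)/\cos\delta\):

```python
import numpy as np

HALF_WIDTH_SINDEC = 0.035 / 2  # half of the fixed full band width in sin(delta)

def band_halfwidth_deg(dec_deg):
    """Angular half-width (degrees) of a band of fixed width in
    sin(delta), using d(delta) ~ d(sin delta) / cos(delta)."""
    return np.degrees(HALF_WIDTH_SINDEC / np.cos(np.radians(dec_deg)))

# ~1.0 deg at the horizon, ~1.2 deg at -30 deg, ~5.8 deg at -80 deg
halfwidths = {dec: band_halfwidth_deg(dec) for dec in (0.0, -30.0, -80.0)}
```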

For combined samples such as LT (which combines SLT and TLT), the trial runner computes the acceptance of each component at the source declination and distributes the requested \(n_\mathrm{inj}\) signal events among components via a multinomial draw weighted by relative acceptance. A component with three times the acceptance of another receives, on average, three times as many injected events.
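A sketch of the component split, with made-up acceptance values:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical SLT : TLT acceptances at the source declination
acceptance = np.array([3.0, 1.0])
p = acceptance / acceptance.sum()  # [0.75, 0.25]

n_inj = 100
per_component = rng.multinomial(n_inj, p)  # ~75 to SLT, ~25 to TLT on average
```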

Once the threshold \(n_\mathrm{inj}\) is found, it is converted to a physical flux using the acceptance. Since \(A(\gamma)\) gives the expected event count per unit flux normalization, the flux that produces \(n_\mathrm{inj}\) events is simply \(\Phi_0 = n_\mathrm{inj} / A(\gamma)\). Evaluated at pivot energy \(E_0\):

\[ \left.\frac{\mathrm{d}N}{\mathrm{d}E}\right|_{E_0} = \frac{n_\mathrm{inj}}{A(\gamma)} \, E_0^{-\gamma}. \]

IceCube conventionally reports the energy-weighted flux \(E^2 \, \mathrm{d}N/\mathrm{d}E\) rather than the differential flux itself:

\[ E_0^2 \left.\frac{\mathrm{d}N}{\mathrm{d}E}\right|_{E_0} = \frac{n_\mathrm{inj}}{A(\gamma)} \, E_0^{2-\gamma}. \]

For combined samples, \(A(\gamma)\) is summed over all components.
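The conversion is simple enough to sketch as a hedged helper (not Csky's API); note that for $\gamma = 2$ the pivot energy drops out:

```python
def e2_flux(n_inj, acceptance, gamma, e0):
    """E0^2 dN/dE at the pivot energy e0:
    (n_inj / A(gamma)) * e0**(2 - gamma)."""
    return n_inj / acceptance * e0 ** (2.0 - gamma)
```

For $\gamma = 2$ the result equals `n_inj / acceptance` regardless of the pivot energy; for other indices the same physical flux shifts numerically with the choice of `e0`.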

The \(E^2\) Flux Convention

The \(E^2 \, \mathrm{d}N/\mathrm{d}E\) convention is widespread in astronomy but somewhat arbitrary. For \(\gamma = 2\), the pivot-energy dependence vanishes: \(E_0^{2-\gamma} = 1\), so the reported value is simply \(n_\mathrm{inj} / A\) regardless of \(E_0\). For \(\gamma \neq 2\), changing the pivot energy shifts the numerical value by a factor of \(E_0^{2-\gamma}\). In log-space, this is a constant vertical offset of \((2 - \gamma) \log E_0\). The choice of pivot energy is therefore meaningful only when comparing fluxes at different spectral indices, and one should always verify that the same \(E_0\) was used before comparing numerical values across analyses. A more natural choice would be to report \(E^\gamma \, \mathrm{d}N/\mathrm{d}E\), which equals \(n_\mathrm{inj} / A\) for any spectral index and eliminates the pivot-energy ambiguity entirely—but this is not the convention that has taken hold.

Figure 13 shows sensitivity and discovery potential as a function of source declination for different sample groupings; the ratio subplot below each curve compares our sample against the reference samples, where values above 1 indicate improvement. Combined samples (LT + DNNC, PST + DNNC, etc.) generally outperform individual samples by combining track and cascade channels. In the Northern sky (\(\sin\delta > 0\)), propagation through the Earth suppresses atmospheric muons while simultaneously introducing energy-dependent attenuation of the neutrino flux at the highest energies. Southern-sky performance depends more heavily on starting-track vetoes and self-veto effects. Softer assumed spectra (\(\gamma \gtrsim 2.5\)) shift the signal distribution to lower energies, which have poorer angular resolution but higher statistics; they also reduce the signal-to-background ratio because the atmospheric background spectrum is softer than a typical astrophysical signal.

Figure 13: Sensitivity and Discovery Potential

90% Sensitivity for Starting Track samples assuming a point source at declination $\delta$ emitting steadily with an $E^{-\gamma}$ spectrum with $\gamma$ = 2. Y-axis shows Flux at 1 TeV. Flux values are per neutrino flavor ($\nu + \bar{\nu}$, single flavor). Background PDF: Nominal.

Declination Randomization

Early comparison runs applied the same trial configuration to all selections, including Gaussian declination randomization with \(\sigma = 3°\). Since declination randomization is required for cascades (DNNC), applying it uniformly to all samples seemed like the most direct comparison. However, declination randomization artificially inflates significance because it smooths the background in declination, making the unrandomized signal appear sharper. For cascades, as long as the angular resolution is much larger than the chosen smearing width, this effect is negligible; for track selections, the effect is substantial. The curves shown here use the corrected configuration: RA-only randomization for tracks, declination randomization for cascades.

Crucially, declination randomization introduces an asymmetry between the background trials and real data: during unblinding, the data are not declination-randomized. The TS distribution calibrated from randomized pseudo-experiments may therefore not accurately represent the null distribution of the actual data. This is the only component of the analysis pipeline that can systematically bias the empirical null calibration, and declination randomization must consequently be treated with extreme caution—even for cascades, where the effect is expected to be small.

For cascades, however, declination randomization is effectively unavoidable. The data are too sparse: at some declinations there may be only a few dozen events, so RA-only randomization produces a very limited number of distinct background realizations. Running millions of trials is meaningless when the combinatorial space of possible RA permutations is exhausted after a few hundred. Declination randomization alleviates this by drawing from a broader pool of events, but at the cost of the asymmetry described above.

Differential Sensitivity

Integrated sensitivity assumes a power-law spectrum extending over all energies, which is convenient for comparing selections but does not reveal how sensitivity varies with energy. Differential sensitivity addresses this by computing the sensitivity within restricted energy bins, yielding a spectrum-like curve that shows the minimum detectable flux as a function of energy.

The key question answered by differential sensitivity is: how much does each energy range contribute to the analysis’s detection power? This diagnostic reveals the energy dependence of the effective area, background contamination, and overall performance—information that is averaged away in the integrated metric.

The procedure mirrors integrated sensitivity, but signal injection is restricted to a specific energy range. For each bin \([E_\mathrm{min}, E_\mathrm{max}]\), we construct a flux hypothesis that is nonzero only within that bin:

\[ \Phi(E) = \begin{cases} \Phi_0 \, E^{-\gamma} & \text{if } E \in [E_\mathrm{min}, E_\mathrm{max}] \\ 0 & \text{otherwise} \end{cases} \]

Signal injection then draws only from MC events whose true neutrino energy falls within the bin, and the bin-restricted acceptance is

\[ A(\gamma; E_\mathrm{min}, E_\mathrm{max}) = \frac{\tau}{\Omega} \sum_{E_\mathrm{min} \le E_{\nu,i} \le E_\mathrm{max}} w_{\mathrm{one},i} \, E_{\nu,i}^{-\gamma}. \]
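A direct transcription of the bin-restricted acceptance (with \(w_\mathrm{one}\), livetime \(\tau\), and solid angle \(\Omega\) as defined earlier; the arrays are illustrative):

```python
import numpy as np

def binned_acceptance(e_nu, w_one, gamma, e_min, e_max, tau, omega):
    """A(gamma; Emin, Emax) = (tau / Omega) * sum over events with
    Emin <= E_nu <= Emax of w_one * E_nu**(-gamma)."""
    in_bin = (e_nu >= e_min) & (e_nu <= e_max)
    return tau / omega * np.sum(w_one[in_bin] * e_nu[in_bin] ** (-gamma))
```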

Crucially, the likelihood function is unchanged—it still uses the full energy PDF spanning all energies, exactly as in integrated sensitivity. Only the signal injection is restricted; the analysis itself remains agnostic to the energy range of the injected signal. Events from any energy can contribute to the test statistic, regardless of which bin is being tested. This means background trials can be reused across all energy bins, since the TS distribution under the null hypothesis depends only on the RA-randomized data and the likelihood model, neither of which changes between bins.

For each bin, the threshold \(n_\mathrm{inj}\) is converted to flux using the bin-restricted acceptance:

\[ E_0^2 \left.\frac{\mathrm{d}N}{\mathrm{d}E}\right|_{E_0} = \frac{n_\mathrm{inj}}{A(\gamma; E_\mathrm{min}, E_\mathrm{max})} \, E_0^{2-\gamma}. \]

As in the integrated case, the pivot energy \(E_0\) is the reference energy at which the flux is evaluated. For \(\gamma = 2\), the factor \(E_0^{2-\gamma} = 1\), so the pivot energy drops out entirely and the reported value is simply \(n_\mathrm{inj} / A\). For \(\gamma \neq 2\), the choice of pivot energy affects the numerical value (though not the physical content—it is just a vertical shift in log-space).

We adopt the convention of evaluating the flux at the lower bin edge: \(E_0 = E_\mathrm{min}\). This is a common choice in IceCube analyses and aligns with the interpretation that the reported value is an upper limit on the flux “starting from” that energy. For narrow bins (quarter-decade, spanning a factor of \(\sim 1.78\) in energy), the choice of pivot energy within the bin makes little practical difference.

Figure 14 shows the differential sensitivity and discovery potential, comparing selections within each group at a fixed declination. We use quarter-decade (0.25 dex) energy bins from 10 GeV to 10⁸ GeV, assuming a fixed spectral index \(\gamma\) within each bin.

Figure 14: Differential Sensitivity and Discovery Potential

Differential 90% Sensitivity for Starting samples assuming an $E^{-\gamma}$ spectrum with $\gamma$ = 2 and $\sin(\delta)$ = −0.83, showing Flux per energy bin. Flux values are per neutrino flavor ($\nu + \bar{\nu}$, single flavor). Background PDF: Nominal.

What Differential Sensitivity Measures

Differential sensitivity answers the question: if a source emitted a power-law flux confined entirely to this energy bin, what is the minimum flux we could detect? This is a hypothetical scenario—real astrophysical sources have broad spectra—but the resulting curve reveals the energy-dependent performance of the analysis.

The differential sensitivity curve is most useful for:

  1. Visualizing the sensitive energy range: Where does the analysis have the most detection power? The minimum of the curve indicates the “sweet spot” energy range.
  2. Comparing to peaked flux predictions: Some theoretical models (e.g., line emission, narrow spectral features) predict flux concentrated in a specific energy range. Differential sensitivity shows whether such a signal would be detectable.
  3. Understanding energy-dependent tradeoffs: Low energies have high event rates but poor angular resolution and high atmospheric background; high energies have excellent angular resolution but low statistics. The differential curve reveals how these factors balance.

A real source with a broad \(E^{-2}\) spectrum is detected via the integrated sensitivity, not the sum of differential sensitivities across bins. The integrated metric accounts for the cumulative signal from all energies simultaneously.