License: CC BY 4.0
arXiv:2604.04785v1 [math.ST] 06 Apr 2026

High Dimensional Bootstrap and Asymptotic Expansion for the $k$-th Largest Coordinate

Long Feng
School of Statistics and Data Science, LEBPS, KLMDASR,
AAIS and LPMC, Nankai University
Abstract

We study bootstrap inference for the $k$th largest coordinate of a normalized sum of independent high-dimensional random vectors. Existing second-order theory for maxima does not directly extend to order statistics, because the event $\{T_{n,[k]}\leq t\}$ is not a rectangle and its local structure is governed by exceedance counts rather than by a single boundary. We develop an approach based on factorial moments and weighted inclusion–exclusion that reduces the problem to a collection of rare-orthant probabilities and allows high-dimensional Edgeworth and Cornish–Fisher expansions to be transferred to the order-statistic setting. Under moment, variance, and weak-dependence conditions, we derive a second-order coverage expansion for wild-bootstrap critical values of the $k$th order statistic. In particular, a third-moment matching wild bootstrap achieves coverage error of order $n^{-1}$ up to logarithmic factors, and the same second-order accuracy is obtained for a prepivoted double wild bootstrap. We also show that the maximal-correlation condition can be replaced by a stationary Gaussian exponential-mixing assumption at the price of an explicit dependence remainder $r_{d}$, and this remainder can itself be of order $n^{-1}$ when the dimension is sufficiently large relative to the sample size. These results extend recent second-order Gaussian and bootstrap approximation theory from maxima to the $k$th order statistic in high dimension.

Keywords: bootstrap coverage expansion; high-dimensional Gaussian approximation; $k$th order statistic; second-order accuracy; wild bootstrap.

1 Introduction

High-dimensional Gaussian approximation for maxima and rectangular probabilities is now a basic tool in modern high-dimensional inference. For the maximum of a sum of independent random vectors, the seminal work of Chernozhukov et al. (2013) established Gaussian approximation and Gaussian multiplier bootstrap validity when the dimension is allowed to be much larger than the sample size. This line of work was sharpened substantially by Chernozhukov et al. (2017), who extended the approximation theory to hyperrectangles and improved the first-order rate. Later, Deng and Zhang (2020) showed that third-moment matching bootstrap procedures enjoy a better logarithmic dependence in the first-order bound, and Koike (2021) proved that the same logarithmic rate is already available for normal approximation. Among the general first-order results under mild moment assumptions, Chernozhukov et al. (2022) further improved the error bound to an $n^{-1/4}$-type rate up to logarithmic factors. Under additional nondegeneracy or structural assumptions, nearly parametric $n^{-1/2}$ rates up to logarithmic losses are also available; see, for example, Lopes et al. (2020), Fang and Koike (2021), Chernozhukov et al. (2023), and Fang and Koike (2024).

A decisive recent development for maxima is the asymptotic expansion theory developed by Koike (2026). That paper developed high-dimensional Edgeworth and Cornish–Fisher expansions for maxima and related rectangular probabilities by combining Stein-kernel arguments, smoothing inequalities, and a careful analysis of Gaussian anti-concentration. As a consequence, Koike (2026) obtained a second-order bootstrap coverage expansion and showed that, in several important regimes, the coverage error can be improved from the first-order scale to $O\!\left(\log^{a}(dn)/n\right)$ for a suitable constant $a>0$. In particular, for third-moment matching wild bootstrap, the maximum statistic becomes second-order accurate even without studentization under suitable covariance assumptions.

Compared with the theory for maxima, the literature for the $k$th largest coordinate is still sparse. Classical results on order statistics and extremes, such as those of Fisher and Tippett (1928), Mu (1966), and Watts et al. (1982), do not address high-dimensional Gaussian approximation for sums of random vectors. On the Gaussian side, Kozbur (2021) studied dimension-free anti-concentration inequalities for Gaussian order statistics. In the genuinely high-dimensional setting, Ding et al. (2026) established Gaussian and Gaussian multiplier bootstrap approximations for the $k$th largest coordinate and for more general functionals of the top-$k$ order statistics. For the $k$th largest coordinate, their Kolmogorov bounds are of order

\[ k^{2}\Bigl(\frac{B_{n}^{2}\log^{5}(pn)}{n}\Bigr)^{1/4}, \]

up to universal constants, and the bounds for general top-$k$ functionals are of even larger order. Therefore the currently available theory for the $k$th largest coordinate is still essentially first-order and does not provide a second-order coverage expansion comparable to the one available for maxima.

The purpose of the present paper is to fill this gap. We prove that the $k$th largest coordinate of a high-dimensional normalized sum also admits a Koike-type second-order bootstrap expansion. Our argument starts from the exact exceedance-count representation of the event $\{T_{n,[k]}\leq t\}$ and combines weighted inclusion–exclusion with a local rare-orthant analysis. This allows us to transfer the second-order expansion machinery from maxima to the $k$th order statistic. As a result, we show that third-moment matching wild bootstrap retains second-order accuracy for the $k$th largest coordinate, and we also obtain a second-order result for the prepivoted double wild bootstrap. In this way, the second-order theory that was previously available only for maxima is extended to the $k$th largest coordinate in high dimension.

We also give a complementary dependence formulation based on a stationary Gaussian reference field with exponentially decaying strong-mixing coefficients. This assumption is structurally different from the maximal-correlation condition used in the baseline theory: it exploits one-dimensional dependence and allows local clusters of highly correlated coordinates. In that setting we rework the Gaussian aggregation argument and obtain the same distributional, quantile, and coverage expansions with an explicit additional remainder $r_{d}$ that isolates the effect of local exceedance clustering. The resulting expression is fully explicit and can again be of order $n^{-1}$ when the dimension grows sufficiently quickly relative to the sample size.

The remainder of the paper is organized as follows. Section 2 presents the main theoretical results, including the exponential-mixing alternative in Section 2.3. Section 3 reports simulation results comparing several bootstrap methods. Section 4 concludes. Proofs are collected in Appendices A and B.

Notation. We write $[d]:=\{1,\dots,d\}$. For a vector $\bm{x}\in\mathbb{R}^{m}$, let

\[ \|\bm{x}\|_{2}:=\Bigl(\sum_{j=1}^{m}x_{j}^{2}\Bigr)^{1/2},\qquad\|\bm{x}\|_{\infty}:=\max_{1\leq j\leq m}|x_{j}|. \]

We denote by $\mathbf{1}_{d}=(1,\dots,1)^{\top}\in\mathbb{R}^{d}$ the all-ones vector. For $r\in\mathbb{N}$, $(\mathbb{R}^{m})^{\otimes r}$ denotes the set of real-valued $m$-dimensional $r$-tensors. If $\mathsf{T}\in(\mathbb{R}^{m})^{\otimes q}$ and $\mathsf{U}\in(\mathbb{R}^{m})^{\otimes r}$, then $\mathsf{T}\otimes\mathsf{U}\in(\mathbb{R}^{m})^{\otimes(q+r)}$ denotes their tensor product. When $q=r$, we write

\[ \langle\mathsf{T},\mathsf{U}\rangle:=\sum_{j_{1},\dots,j_{r}=1}^{m}T_{j_{1},\dots,j_{r}}U_{j_{1},\dots,j_{r}}, \]

and

\[ \|\mathsf{T}\|_{1}:=\sum_{j_{1},\dots,j_{r}=1}^{m}|T_{j_{1},\dots,j_{r}}|,\qquad\|\mathsf{T}\|_{\infty}:=\max_{1\leq j_{1},\dots,j_{r}\leq m}|T_{j_{1},\dots,j_{r}}|. \]

For $\bm{x}\in\mathbb{R}^{m}$, $\bm{x}^{\otimes r}$ denotes the $r$th tensor power of $\bm{x}$. Whenever $\bm{X}_{1},\dots,\bm{X}_{n}$ are under discussion, we set

\[ \bar{\bm{X}}^{\,r}:=\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i}^{\otimes r}. \]

Given an $r$-times differentiable function $h:\mathbb{R}^{d}\rightarrow\mathbb{R}$, we set $\nabla^{r}h(x):=\left(\partial^{j_{1},\ldots,j_{r}}h(x)\right)_{1\leq j_{1},\ldots,j_{r}\leq d}\in\left(\mathbb{R}^{d}\right)^{\otimes r}$ for $x\in\mathbb{R}^{d}$, where $\partial^{j_{1},\ldots,j_{r}}=\frac{\partial^{r}}{\partial x_{j_{1}}\cdots\partial x_{j_{r}}}$. For $m\in\mathbb{N}\cup\{\infty\}$, $C_{b}^{m}\left(\mathbb{R}^{d}\right)$ denotes the set of bounded $C^{m}$ functions with bounded derivatives. For a multi-index $\alpha=(\alpha_{1},\dots,\alpha_{m})\in\mathbb{N}_{0}^{m}$, we write

\[ |\alpha|:=\sum_{j=1}^{m}\alpha_{j},\qquad\partial^{\alpha}:=\partial_{1}^{\alpha_{1}}\cdots\partial_{m}^{\alpha_{m}}. \]

For a positive definite matrix $V$, let $\phi_{V}$ denote the density of $N(\bm{0},V)$. We write $\Phi$ for the standard normal distribution function and $\bar{\Phi}:=1-\Phi$ for its survival function. For a distribution function $F:\mathbb{R}\to[0,1]$, its generalized inverse is defined by

\[ F^{-1}(p):=\inf\{t\in\mathbb{R}:F(t)\geq p\},\qquad p\in(0,1). \]

For $\alpha>0$ and a scalar random variable $Y$, let

\[ \|Y\|_{\psi_{\alpha}}:=\inf\Bigl\{C>0:\ \mathbb{E}\exp(|Y|^{\alpha}/C^{\alpha})\leq 2\Bigr\}. \]

For a matrix $\mathbf{A}=(a_{j\ell})$, we set

\[ \|\mathbf{A}\|_{\max}:=\max_{1\leq j,\ell\leq m}|a_{j\ell}|,\qquad R_{j}(\mathbf{A}):=\sum_{\ell\neq j}|a_{j\ell}|. \]

Also, $\mathbb{P}^{*}$ and $\mathbb{E}^{*}$ denote conditional probability and expectation given the data. We assume $d\geq 3$ whenever an expression containing $\log d$ appears, and similarly for $n$.

2 Main Results

2.1 Asymptotic expansion of coverage probability

Let $\bm{X}_{1},\dots,\bm{X}_{n}$ be independent centered random vectors in $\mathbb{R}^{d}$, and define

\[ \bm{S}_{n}:=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bm{X}_{i},\qquad\bm{Z}\sim N(\bm{0},\mathbf{\Sigma}),\qquad\mathbf{\Sigma}:=\mathrm{Var}(\bm{S}_{n}). \]

Write

\[ T_{n,[1]}\geq T_{n,[2]}\geq\cdots\geq T_{n,[d]} \]

for the descending order statistics of the coordinates of $\bm{S}_{n}$, and define $T_{\bm{Z},[k]}$ analogously from $\bm{Z}$. Set

\[ G_{k}(t):=\mathbb{P}(T_{\bm{Z},[k]}\leq t),\qquad f_{k}(t):=G_{k}^{\prime}(t) \]

whenever the derivative exists.

Let $w_{1},\dots,w_{n}$ be i.i.d. multipliers independent of the data. Put

\[ \bar{\bm{X}}:=\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i},\qquad\bm{S}_{n}^{*}:=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}w_{i}(\bm{X}_{i}-\bar{\bm{X}}). \]

Let $T_{n,[k]}^{*}$ denote the $k$th largest coordinate of $\bm{S}_{n}^{*}$, and write

\[ \hat{F}_{n,k}(t):=\mathbb{P}^{*}(T_{n,[k]}^{*}\leq t),\qquad\hat{c}_{p,k}:=\inf\{t\in\mathbb{R}:\hat{F}_{n,k}(t)\geq p\}. \]

For each coordinate, set $\sigma_{j}^{2}:=\Sigma_{jj}$, $\underline{\sigma}:=\min_{1\leq j\leq d}\sigma_{j}$, and $\overline{\sigma}:=\max_{1\leq j\leq d}\sigma_{j}$. For $p\in(0,1)$, define the Gaussian quantile $c^{G}_{p,k}:=G_{k}^{-1}(p)$. Fix $\epsilon\in(0,1/2)$ and define the quantile window

\[ \mathcal{T}_{k,\epsilon}:=\{c^{G}_{p,k}:\ p\in[\epsilon/2,1-\epsilon/2]\}. \]

For $t\in\mathbb{R}$, define the exceedance counts

\[ N_{n}(t):=\sum_{j=1}^{d}\mathbf{1}\{S_{n,j}>t\},\qquad N_{n}^{*}(t):=\sum_{j=1}^{d}\mathbf{1}\{S_{n,j}^{*}>t\},\qquad N_{Z}(t):=\sum_{j=1}^{d}\mathbf{1}\{Z_{j}>t\}. \]

Then

\[ T_{n,[k]}\leq t\iff N_{n}(t)\leq k-1. \]
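The equivalence between the order-statistic event and the exceedance count can be checked numerically. The following is a minimal sketch on simulated Gaussian coordinates; the function names, the dimension, and the threshold range are illustrative assumptions and do not come from the paper.

```python
import random


def kth_largest(values, k):
    """Return the k-th largest entry (k = 1 gives the maximum)."""
    return sorted(values, reverse=True)[k - 1]


def exceedance_count(values, t):
    """Count the coordinates that are strictly above the threshold t."""
    return sum(1 for v in values if v > t)


def check_equivalence(d=50, k=3, trials=200, seed=0):
    """Check {T_[k] <= t} == {N(t) <= k - 1} on random draws."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = [rng.gauss(0.0, 1.0) for _ in range(d)]   # stand-in for S_n
        t = rng.uniform(-1.0, 3.0)                    # arbitrary threshold
        if (kth_largest(s, k) <= t) != (exceedance_count(s, t) <= k - 1):
            return False
    return True
```

Calling `check_equivalence()` compares the two events on every draw; the events agree because the $k$th largest coordinate is at most $t$ exactly when at most $k-1$ coordinates exceed $t$.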

For every integer $s\geq 1$, define

\[ V_{n,s}(t):=\mathbb{E}\binom{N_{n}(t)}{s},\qquad V^{*}_{n,s}(t):=\mathbb{E}^{*}\binom{N_{n}^{*}(t)}{s},\qquad V_{Z,s}(t):=\mathbb{E}\binom{N_{Z}(t)}{s}. \]

For each nonempty $I\subset[d]$, write

\[ B_{I}(t):=\{\bm{z}\in\mathbb{R}^{d}:\ z_{j}>t,\ \forall j\in I\} \]

and

\[ \pi_{I}(t):=\mathbb{P}(Z_{j}>t,\ \forall j\in I). \]

Let $\phi_{\mathbf{\Sigma}}$ denote the density of $N(\bm{0},\mathbf{\Sigma})$, and abbreviate $\phi:=\phi_{\mathbf{\Sigma}}$. The first-order Edgeworth density for $\bm{S}_{n}$ is

\[ p_{n}(\bm{z}):=\phi(\bm{z})-\frac{1}{6\sqrt{n}}\bigl\langle\mathbb{E}[\bar{\bm{X}}^{\,3}],\nabla^{3}\phi(\bm{z})\bigr\rangle. \]

Let $\gamma:=\mathbb{E}(w_{1}^{3})$. The bootstrap Edgeworth density is defined by

\[ \hat{p}_{n,\gamma}(\bm{z}):=\phi(\bm{z})+\frac{1}{2}\bigl\langle\bar{\bm{X}}^{\,2}-\mathbf{\Sigma},\nabla^{2}\phi(\bm{z})\bigr\rangle-\frac{\gamma}{6\sqrt{n}}\bigl\langle\bar{\bm{X}}^{\,3},\nabla^{3}\phi(\bm{z})\bigr\rangle. \]

For each integer $s\geq 1$, define

\[ M_{n,s}(t):=\sum_{\substack{I\subset[d]\\ |I|=s}}\int_{(t,\infty)^{s}}p_{n,I}(\bm{u})\,d\bm{u},\qquad\hat{M}_{n,s,\gamma}(t):=\sum_{\substack{I\subset[d]\\ |I|=s}}\int_{(t,\infty)^{s}}\hat{p}_{n,\gamma,I}(\bm{u})\,d\bm{u}, \]

where $p_{n,I}$ and $\hat{p}_{n,\gamma,I}$ denote the corresponding projected densities defined later. Also set

\[ M_{Z,s}(t):=\sum_{\substack{I\subset[d]\\ |I|=s}}\pi_{I}(t)=V_{Z,s}(t). \]

We now present the assumptions underlying our analysis.

Assumption 2.1.

The vectors $\bm{X}_{1},\dots,\bm{X}_{n}$ are independent and centered. Each $\bm{X}_{i}$ admits a Stein kernel $\bm{\tau}_{i}(\bm{X}_{i})$ in the sense that

\[ \mathbb{E}\bigl[\bm{X}_{i}^{\top}f(\bm{X}_{i})\bigr]=\mathbb{E}\bigl[\mathrm{tr}\{\bm{\tau}_{i}(\bm{X}_{i})\nabla f(\bm{X}_{i})^{\top}\}\bigr] \]

for every smooth vector-valued test function for which both sides are finite. There exist constants $b>0$ and $\sigma_{*}>0$ such that

  (i) $\lambda_{\min}(\mathbf{\Sigma})\geq\sigma_{*}^{2}$;

  (ii) $\max_{1\leq i\leq n}\max_{1\leq j\leq d}\|X_{ij}\|_{\psi_{1}}\leq b$ and $\max_{1\leq i\leq n}\max_{1\leq j,\ell\leq d}\|\tau_{i,j\ell}(\bm{X}_{i})-\mathbb{E}\tau_{i,j\ell}(\bm{X}_{i})\|_{\psi_{1}}\leq b^{2}$;

  (iii) $\delta_{n}:=\frac{b^{5}}{\sigma_{*}^{5}}\frac{\log^{3}(dn)}{n}$ and $\varepsilon_{n}:=\sqrt{\delta_{n}\log n}$ satisfy $\varepsilon_{n}\to 0$.

Remark 2.1.

Assumption 2.1 is the data-side regularity condition. The Stein identity provides the analytic device behind the projected Edgeworth expansion and is a convenient substitute for classical Cramér-type smoothness conditions in high dimension. As emphasized in Koike (2026, Remark 2.4), in one dimension the existence of a Stein kernel implies a nontrivial absolutely continuous component, and hence Cramér’s condition, whereas in higher dimensions Stein kernels remain available even in situations where a multivariate Cramér condition is not appropriate, such as Gaussian laws with singular covariance matrices. In our setting, part (i) requires a uniform lower bound on $\lambda_{\min}(\mathbf{\Sigma})$, which prevents global degeneracy of the Gaussian comparison law and guarantees that the projected Gaussian densities and their derivatives remain well behaved. Part (ii) imposes sub-exponential control on both the coordinates $X_{ij}$ and the fluctuations of the Stein-kernel entries. Since

\[ \mathbb{E}\{\bm{\tau}_{i}(\bm{X}_{i})\}=\mathbb{E}(\bm{X}_{i}\bm{X}_{i}^{\top}), \]

the centered quantity

\[ \tau_{i,j\ell}(\bm{X}_{i})-\mathbb{E}\tau_{i,j\ell}(\bm{X}_{i}) \]

measures the random fluctuation of the local covariance proxy around its population counterpart; controlling these fluctuations is exactly what allows Koike’s decomposition to be applied uniformly over the low-dimensional projections that enter our inclusion–exclusion argument. Finally, part (iii) is the high-dimensional scaling condition ensuring that the resulting remainder terms vanish. In particular, it specifies the regime in which the projected Edgeworth approximation is accurate enough to deliver a valid second-order expansion for the coverage probability.

Assumption 2.2.

The multipliers $w_{1},w_{2},\dots$ are i.i.d., independent of the data, satisfy

\[ \mathbb{E}w_{1}=0,\qquad\mathbb{E}w_{1}^{2}=1,\qquad\mathbb{E}|w_{1}|^{m}<\infty\quad\text{for all }m\geq 1, \]

and, in addition, satisfy one of the following two conditions:

  (i) $w_{1}\sim N(0,1)$;

  (ii) $w_{1}$ admits a Stein kernel $\tau^{w}(w_{1})$ and there exists a constant $b_{w}\geq 1$ such that

    \[ |w_{1}|\leq b_{w},\qquad|\tau^{w}(w_{1})|\leq b_{w}^{2}\qquad\text{a.s.} \]

The constants in the sequel are allowed to depend on $b_{w}$.

Remark 2.2.

Assumption 2.2 is the bootstrap analogue of Assumption 2.1. It ensures that, conditional on the data, the multiplier statistic admits the same kind of Stein–Edgeworth expansion as the original statistic. The Gaussian case is separated out because it is the canonical multiplier choice and automatically fits the required framework. The alternative bounded Stein-kernel condition covers smooth non-Gaussian multipliers and is particularly useful for moment matching, which is central to the second-order improvement. As discussed in Koike (2026), this framework does not cover two-point multipliers such as Mammen’s weights, since two-point laws do not admit Stein kernels. Thus, the restriction is a limitation of the present proof strategy rather than of the bootstrap principle itself.

Assumption 2.3.

There exist constants $0<\underline{\sigma}\leq\overline{\sigma}<\infty$ such that

\[ \underline{\sigma}^{2}\leq\Sigma_{jj}\leq\overline{\sigma}^{2},\qquad j=1,\dots,d. \]

Remark 2.3.

Assumption 2.3 places all coordinates on a common scale. Because our target is the raw order statistic $T_{n,[k]}$, we are ranking the coordinates of the normalized sum without any coordinatewise rescaling. Uniform upper and lower bounds on the marginal variances therefore rule out the possibility that some coordinates dominate the ranking merely because their variances diverge, or become asymptotically irrelevant because their variances vanish. Without this assumption, the geometry of the $k$th largest coordinate would depend on heterogeneous marginal scales, and the limiting problem would be substantially more complicated. In that regime one would typically need a different normalization or even a different target statistic.

Assumption 2.4.

Let $\rho_{d}:=\max_{1\leq i\neq j\leq d}|\Sigma_{ij}|$. We assume $\rho_{d}\log d\to 0$.

Remark 2.4.

Assumption 2.4 is a weak-dependence condition tailored to our proof of the order-statistic expansion. The key step in the argument is to approximate the event $\{T_{n,[k]}>t\}$ by a finite-order inclusion–exclusion expansion and to show that the probability of having many coordinates simultaneously exceeding $t$ is negligible. For this strategy to work, exceedances above a high threshold must behave as rare events with only weak clustering, and the condition $\rho_{d}\log d\to 0$ enforces exactly this feature. When pairwise correlations are too strong, exceedances can occur in large clusters, and then one can no longer guarantee that the probability of having more than $k_{0}$ coordinates above the threshold (with $k_{0}$ the truncation level defined below) decays fast enough for the truncation argument to be valid. Handling such strongly dependent regimes would require substantial further study.

Fix a constant $A>0$ and define $k_{0}:=\left\lceil A\log(\varepsilon_{n}^{-1})\right\rceil$. Throughout, $k\geq 1$ is fixed. Finally define

\[ Q_{n,k}(t):=-\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}\{M_{n,s}(t)-M_{Z,s}(t)\}, \tag{1} \]

and define $\hat{Q}_{n,\gamma,k}(t)$ analogously with $\hat{M}_{n,s,\gamma}(t)$ in place of $M_{n,s}(t)$.
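The alternating weights in (1) are those of the classical identity $\mathbb{P}(N\geq k)=\sum_{s\geq k}(-1)^{s-k}\binom{s-1}{k-1}\mathbb{E}\binom{N}{s}$, valid for any count $N$ with values in $\{0,\dots,d\}$; since $\{T_{n,[k]}\leq t\}=\{N_{n}(t)\leq k-1\}$, the leading minus sign in (1) converts a difference of tail probabilities into a difference of distribution functions. The sketch below checks the identity exactly on a small probability mass function. The example distribution and all function names are illustrative assumptions, not objects from the paper.

```python
from fractions import Fraction
from math import comb


def binomial_moment(pmf, s):
    """E[C(N, s)] for a pmf given as {value: probability}."""
    return sum(p * comb(n, s) for n, p in pmf.items())


def tail_via_binomial_moments(pmf, k, s_max):
    """Sum_{s=k}^{s_max} (-1)^(s-k) * C(s-1, k-1) * E[C(N, s)]."""
    return sum(
        (-1) ** (s - k) * comb(s - 1, k - 1) * binomial_moment(pmf, s)
        for s in range(k, s_max + 1)
    )


def tail_direct(pmf, k):
    """P(N >= k) computed directly from the pmf."""
    return sum(p for n, p in pmf.items() if n >= k)


# Hypothetical count distribution on {0, ..., 4}, chosen only for illustration.
EXAMPLE_PMF = {
    0: Fraction(1, 2),
    1: Fraction(1, 4),
    2: Fraction(1, 8),
    3: Fraction(1, 16),
    4: Fraction(1, 16),
}
```

With `EXAMPLE_PMF` and `k = 2`, summing up to `s_max = 4` reproduces `tail_direct` exactly. Stopping the sum earlier gives only an approximation, which is the role played by the truncation level $k_{0}$ in (1).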

Theorem 2.1.

Assume Assumptions 2.1–2.4. Then, for $A>0$ large enough,

\[ \sup_{\epsilon<\alpha<1-\epsilon}\left|\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{1-\alpha,k}\bigr)-\left[\alpha-(1-\gamma)Q_{n,k}(c^{G}_{1-\alpha,k})-\mathbb{E}\{R_{n,k}(\alpha)\}\right]\right|\leq C\varepsilon_{n}^{2}, \tag{2} \]

where

\[ R_{n,k}(\alpha):=\frac{f_{k}^{\prime}(c^{G}_{1-\alpha,k})}{2f_{k}(c^{G}_{1-\alpha,k})^{3}}\hat{Q}_{n,\gamma,k}(c^{G}_{1-\alpha,k})^{2}-\frac{\hat{Q}_{n,\gamma,k}^{\prime}(c^{G}_{1-\alpha,k})}{f_{k}(c^{G}_{1-\alpha,k})^{2}}\hat{Q}_{n,\gamma,k}(c^{G}_{1-\alpha,k}). \]

Theorem 2.1 is the main second-order coverage statement for the single wild bootstrap. It shows that the leading coverage distortion is described by the deterministic linear term $(1-\gamma)Q_{n,k}$ together with the quadratic Cornish–Fisher correction $\mathbb{E}\{R_{n,k}(\alpha)\}$, while the remaining error is of order $\varepsilon_{n}^{2}$.

Corollary 2.1 (Third-moment matching).

Under the assumptions of Theorem 2.1, if $\gamma=1$, then

\[ \sup_{\epsilon<\alpha<1-\epsilon}\left|\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{1-\alpha,k}\bigr)-\alpha\right|\leq C\varepsilon_{n}^{2}. \]

Corollary 2.1 shows that matching the third multiplier moment removes the linear coverage distortion identified in Theorem 2.1. The wild bootstrap then becomes second-order accurate on the $\varepsilon_{n}^{2}$ scale without any further correction.

Corollary 2.2 (Persistence of the first-order term).

Under the assumptions of Theorem 2.1,

\[ \sup_{\epsilon<\alpha<1-\epsilon}\left|\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{1-\alpha,k}\bigr)-\alpha+(1-\gamma)Q_{n,k}(c^{G}_{1-\alpha,k})\right|\leq C\varepsilon_{n}^{2}. \]

In particular, if for some $\alpha_{0}\in(\epsilon,1-\epsilon)$ one has

\[ |Q_{n,k}(c^{G}_{1-\alpha_{0},k})|\geq c_{0}\varepsilon_{n}, \]

then

\[ \left|\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{1-\alpha_{0},k}\bigr)-\alpha_{0}\right|\geq|1-\gamma|c_{0}\varepsilon_{n}-C\varepsilon_{n}^{2}. \]

Corollary 2.2 shows that the term $(1-\gamma)Q_{n,k}$ is not an artifact of the proof. Unless the third moment is matched, the single-bootstrap coverage error typically remains of first-order size.

2.2 Double wild bootstrap

Let $v_{1},\dots,v_{n}$ be i.i.d. multipliers, independent of everything else, satisfying

\[ \mathbb{E}v_{1}=0,\qquad\mathbb{E}v_{1}^{2}=1,\qquad\mathbb{E}v_{1}^{3}=1, \]

and the same regularity condition as in Assumption 2.2. Define

\[ \bm{X}_{i}^{*}:=w_{i}(\bm{X}_{i}-\bar{\bm{X}}),\qquad\bar{\bm{X}}^{*}:=\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i}^{*},\qquad\bm{S}_{n}^{**}:=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}v_{i}(\bm{X}_{i}^{*}-\bar{\bm{X}}^{*}). \]

Let $T^{**}_{n,[k]}$ be the $k$th largest coordinate of $\bm{S}_{n}^{**}$, let

\[ \hat{F}^{*}_{n,k}(t):=\mathbb{P}^{**}(T^{**}_{n,[k]}\leq t), \]

and define

\[ \hat{\beta}_{\alpha,k}:=\inf\Bigl\{\beta\in(0,1):\ \mathbb{P}^{*}\bigl(\hat{F}^{*}_{n,k}(T_{n,[k]}^{*})\leq\beta\bigr)\geq 1-\alpha\Bigr\}. \]

The prepivoted double-bootstrap test rejects when

\[ T_{n,[k]}\geq\hat{c}_{\hat{\beta}_{\alpha,k},k}. \]

Theorem 2.2.

Assume Assumptions 2.1–2.4 and the second-level multiplier condition above. Then, for every fixed $\epsilon\in(0,1/4)$,

\[ \sup_{2\epsilon<\alpha<1-2\epsilon}\left|\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{\hat{\beta}_{\alpha,k},k}\bigr)-\alpha\right|\leq C\varepsilon_{n}^{2}. \tag{3} \]

Theorem 2.2 shows that prepivoting removes the leading single-bootstrap distortion and restores second-order accuracy. Thus the double wild bootstrap achieves the same $\varepsilon_{n}^{2}$ coverage scale as the third-moment matching single bootstrap.
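The prepivoting recipe of this subsection can be sketched as a nested Monte Carlo loop. This is a minimal illustration, not the implementation used in the paper: it assumes that $\mathbb{P}^{*}$ and $\mathbb{P}^{**}$ are approximated by `b1` outer and `b2` inner multiplier draws, and that the second-level multipliers follow the standardized Beta law described in the simulation section (mean zero, unit variance, third moment one). All function and parameter names are hypothetical.

```python
import math
import random


def kth_largest(values, k):
    return sorted(values, reverse=True)[k - 1]


def multiplier_stat(rows, weights, k):
    """k-th largest coordinate of n^{-1/2} * sum_i weights[i] * (rows[i] - column mean)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    s = [sum(weights[i] * (rows[i][j] - means[j]) for i in range(n)) / math.sqrt(n)
         for j in range(d)]
    return kth_largest(s, k)


def empirical_quantile(sample, p):
    """inf{t : F_hat(t) >= p} for the empirical distribution of the sample."""
    ordered = sorted(sample)
    idx = min(max(math.ceil(p * len(ordered)), 1), len(ordered))
    return ordered[idx - 1]


def beta_multiplier(rng, nu=0.1):
    """Standardized Beta(alpha_nu, beta_nu) draw (assumed second-level multiplier)."""
    c = nu * nu + 20 * nu + 20
    a = nu / 2 * (1 - (nu + 2) / math.sqrt(c))
    b = nu / 2 * (1 + (nu + 2) / math.sqrt(c))
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return (rng.betavariate(a, b) - mean) / sd


def double_wild_bootstrap(x, k, alpha, b1=199, b2=49, seed=0):
    """Return (critical value, beta_hat) for the prepivoted test at level alpha."""
    rng = random.Random(seed)
    n, d = len(x), len(x[0])
    means = [sum(r[j] for r in x) / n for j in range(d)]
    t_star, u_star = [], []
    for _ in range(b1):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]            # first-level multipliers
        x_star = [[w[i] * (x[i][j] - means[j]) for j in range(d)] for i in range(n)]
        s_star = [sum(x_star[i][j] for i in range(n)) / math.sqrt(n) for j in range(d)]
        t = kth_largest(s_star, k)                              # T*_{n,[k]}
        inner = [multiplier_stat(x_star, [beta_multiplier(rng) for _ in range(n)], k)
                 for _ in range(b2)]                            # draws of T**_{n,[k]}
        t_star.append(t)
        u_star.append(sum(1 for v in inner if v <= t) / b2)     # F*_hat(T*)
    beta_hat = empirical_quantile(u_star, 1 - alpha)
    return empirical_quantile(t_star, beta_hat), beta_hat
```

The test then rejects when the observed $T_{n,[k]}$ is at least the returned critical value. The cost is of order `b1 * b2 * n * d`, so the inner replication count is usually kept small.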

2.3 A stationary exponential-mixing alternative

The maximal-correlation condition in Assumption 2.4 can be replaced by a one-dimensional dependence condition when the Gaussian reference field is generated by a stationary Gaussian sequence. The price is an explicit additional remainder that records the contribution of local clusters of exceedances.

Assumption 2.5 (Stationary Gaussian coordinates with exponential strong mixing).

The Gaussian reference vector $\bm{Z}=(Z_{1},\dots,Z_{d})^{\top}$ is the first $d$ coordinates of a centered stationary Gaussian sequence $\{Z_{j}\}_{j\in\mathbb{Z}}$ with covariance function

\[ \Gamma(h):=\mathrm{Cov}(Z_{0},Z_{h}),\qquad\Gamma(0)=\sigma^{2}\in[\underline{\sigma}^{2},\overline{\sigma}^{2}]. \]

Its strong-mixing coefficients satisfy

\[ \alpha(\ell):=\sup\Bigl\{\bigl|\mathbb{P}(A\cap B)-\mathbb{P}(A)\mathbb{P}(B)\bigr|:A\in\sigma(Z_{j}:j\leq 0),\ B\in\sigma(Z_{j}:j\geq\ell)\Bigr\}\leq C_{\alpha}e^{-a_{\alpha}\ell},\qquad\ell\geq 1, \]

for some constants $C_{\alpha}\geq 1$ and $a_{\alpha}>0$.

Write

\[ \rho(h):=\mathrm{Corr}(Z_{0},Z_{h})=\Gamma(h)/\sigma^{2},\qquad h\in\mathbb{Z},\qquad p(t):=\mathbb{P}(Z_{1}>t)=\bar{\Phi}(t/\sigma),\qquad\lambda(t):=d\,p(t). \]

Because every $2\times 2$ principal submatrix of $\mathbf{\Sigma}$ has diagonal entries at most $\overline{\sigma}^{2}$ and smallest eigenvalue at least $\sigma_{*}^{2}$, we have

\[ \sup_{h\geq 1}|\rho(h)|\leq 1-\frac{\sigma_{*}^{2}}{\overline{\sigma}^{2}}=:\vartheta_{*}<1. \tag{4} \]

Set

\[ \beta_{*}:=\frac{1-\vartheta_{*}}{1+\vartheta_{*}}>0. \tag{5} \]

Let $\Lambda_{k,\epsilon}>0$ be the unique constant satisfying

\[ h_{k}(\Lambda_{k,\epsilon})=\epsilon/8,\qquad h_{k}(\lambda):=e^{-\lambda}\sum_{m=0}^{k-1}\frac{\lambda^{m}}{m!}. \tag{6} \]

Fix

\[ m_{d}:=\left\lceil d^{\beta_{*}/4}\right\rceil,\qquad\ell_{d}:=\left\lceil\frac{8(k_{0}+2)\log(2d)+8\log n}{a_{\alpha}}\right\rceil,\qquad q_{d}:=\left\lfloor\frac{d}{m_{d}+\ell_{d}}\right\rfloor. \tag{7} \]

Define

\[ \eta_{1,d}:=\frac{\ell_{d}}{m_{d}}+\frac{m_{d}+\ell_{d}}{d}+d^{-3\beta_{*}/4}(\log d)^{-1/2}, \tag{8} \]

and

\[ r_{d}:=\eta_{1,d}+q_{d}^{-1}+d^{k_{0}+1}\alpha(\ell_{d})+\frac{(3\Lambda_{k,\epsilon})^{k_{0}+1}}{(k_{0}+1)!}. \tag{9} \]

Since (7) implies

\[ \alpha(\ell_{d})\leq C_{\alpha}n^{-8}(2d)^{-8(k_{0}+2)}, \tag{10} \]

the sequence $r_{d}$ tends to $0$.
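The quantities in (6) to (9) are explicit enough to evaluate numerically. The sketch below solves (6) by bisection, using that $h_{k}$ is the Poisson probability of at most $k-1$ events and therefore decreases from 1 to 0, and then assembles an upper bound on $r_{d}$ in which $\alpha(\ell_{d})$ is replaced by its bound $C_{\alpha}e^{-a_{\alpha}\ell_{d}}$. The function names and every input value in the usage note are assumptions made for illustration.

```python
import math


def h_k(lam, k):
    """exp(-lam) * sum_{m<k} lam^m / m!, the Poisson(lam) probability of at most k-1 events."""
    term, total = 1.0, 1.0
    for m in range(1, k):
        term *= lam / m
        total += term
    return math.exp(-lam) * total


def solve_lambda(k, eps, tol=1e-12):
    """Bisection for h_k(Lambda) = eps / 8."""
    target = eps / 8.0
    lo, hi = 0.0, 1.0
    while h_k(hi, k) > target:          # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h_k(mid, k) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def block_parameters(d, n, k0, beta_star, a_alpha):
    """Block lengths (m_d, ell_d) and block count q_d from (7)."""
    m_d = math.ceil(d ** (beta_star / 4.0))
    ell_d = math.ceil((8 * (k0 + 2) * math.log(2 * d) + 8 * math.log(n)) / a_alpha)
    q_d = d // (m_d + ell_d)
    return m_d, ell_d, q_d


def remainder_upper_bound(d, n, k0, beta_star, a_alpha, c_alpha, lam):
    """Upper bound on r_d in (9), using alpha(ell) <= c_alpha * exp(-a_alpha * ell)."""
    m_d, ell_d, q_d = block_parameters(d, n, k0, beta_star, a_alpha)
    if q_d == 0:
        raise ValueError("d is too small for the chosen block lengths")
    eta_1 = (ell_d / m_d + (m_d + ell_d) / d
             + d ** (-0.75 * beta_star) / math.sqrt(math.log(d)))
    # The last two terms are computed on the log scale to avoid overflow.
    mixing = c_alpha * math.exp((k0 + 1) * math.log(d) - a_alpha * ell_d)
    tail = math.exp((k0 + 1) * math.log(3.0 * lam) - math.lgamma(k0 + 2))
    return eta_1 + 1.0 / q_d + mixing + tail
```

For instance, `block_parameters(10**6, 200, 5, 1.0, 1.0)` gives block lengths 32 and 855, so the ratio $\ell_{d}/m_{d}$ is far above one for these arbitrary inputs: the bound becomes informative only once $d^{\beta_{*}/4}$ dominates $\ell_{d}$.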

Theorem 2.3 (Stationary exponential-mixing alternative).

Assume Assumptions 2.1, 2.2, 2.3, and 2.5. Then, for $A>0$ large enough, there exists a constant $C>0$ such that

\begin{align}
\sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{P}(T_{n,[k]}\leq t)-\bigl(G_{k}(t)+Q_{n,k}(t)\bigr)\right| &\leq C(\varepsilon_{n}^{2}+r_{d}), \tag{11}\\
\sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{P}^{*}(T_{n,[k]}^{*}\leq t)-\bigl(G_{k}(t)+\hat{Q}_{n,\gamma,k}(t)\bigr)\right| &\leq C(\varepsilon_{n}^{2}+r_{d}) \tag{12}
\end{align}

with probability at least $1-C/n$,

\[ \sup_{\epsilon<\alpha<1-\epsilon}\left|\hat{c}_{1-\alpha,k}-\left[c^{G}_{1-\alpha,k}-\frac{\hat{Q}_{n,\gamma,k}(c^{G}_{1-\alpha,k})}{f_{k}(c^{G}_{1-\alpha,k})}+R_{n,k}(\alpha)\right]\right|\leq C(\varepsilon_{n}^{3}+r_{d}), \tag{13} \]

with probability at least $1-C/n$,

\[ \sup_{\epsilon<\alpha<1-\epsilon}\left|\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{1-\alpha,k}\bigr)-\left[\alpha-(1-\gamma)Q_{n,k}(c^{G}_{1-\alpha,k})-\mathbb{E}\{R_{n,k}(\alpha)\}\right]\right|\leq C(\varepsilon_{n}^{2}+r_{d}), \tag{14} \]

and, if $\gamma=1$,

\[ \sup_{\epsilon<\alpha<1-\epsilon}\left|\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{1-\alpha,k}\bigr)-\alpha\right|\leq C(\varepsilon_{n}^{2}+r_{d}). \tag{15} \]

The corresponding double wild bootstrap statement also holds with the same remainder:

\[ \sup_{2\epsilon<\alpha<1-2\epsilon}\left|\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{\hat{\beta}_{\alpha,k},k}\bigr)-\alpha\right|\leq C(\varepsilon_{n}^{2}+r_{d}). \tag{16} \]

Theorem 2.3 replaces the maximal-correlation condition by a one-dimensional dependence assumption on the Gaussian reference field. The price is the explicit remainder $r_{d}$, which isolates the effect of local clustering while leaving the structure of the second-order expansion unchanged.

Remark 2.5.

The remainder $r_{d}$ is driven mainly by the block-length ratio $\ell_{d}/m_{d}$. Since $k_{0}\leq C\log n$, the definition of $\ell_{d}$ yields

\[ \ell_{d}\leq C\log n\,\log(2d). \]

Consequently,

\[ r_{d}\leq C\frac{\log n\,\log(2d)}{d^{\beta_{*}/4}}+Cd^{-1+\beta_{*}/4}+Cd^{-3\beta_{*}/4}(\log d)^{-1/2}+Cn^{-8}d^{-7k_{0}-15}+C\frac{(3\Lambda_{k,\epsilon})^{k_{0}+1}}{(k_{0}+1)!}. \]

In particular, a sufficient condition for $r_{d}=O(n^{-1})$ is

\[ d^{\beta_{*}/4}\geq Cn\log n\,\log(2d). \]

If $d=n^{c}$, then

\[ r_{d}\leq Cn^{-c\beta_{*}/4}\log^{2}n+Cn^{-c(1-\beta_{*}/4)}+Cn^{-3c\beta_{*}/4}(\log n)^{-1/2}+Cn^{-8-c(7k_{0}+15)}+C\frac{(3\Lambda_{k,\epsilon})^{k_{0}+1}}{(k_{0}+1)!}, \]

so $r_{d}=O(n^{-1})$ whenever $c>4/\beta_{*}$.
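As a purely illustrative calculation, suppose (hypothetically) that $\sigma_{*}^{2}/\overline{\sigma}^{2}=1/2$. Then (4) and (5) give

\[ \vartheta_{*}=1-\tfrac{1}{2}=\tfrac{1}{2},\qquad\beta_{*}=\frac{1-\tfrac{1}{2}}{1+\tfrac{1}{2}}=\tfrac{1}{3},\qquad\frac{4}{\beta_{*}}=12, \]

so under polynomial growth $d=n^{c}$ the condition above would read $c>12$, and the block length would be $m_{d}=\lceil d^{1/12}\rceil$. The ratio $1/2$ is an assumption chosen only to make the arithmetic concrete.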

The proof of Theorem 2.3 is given in Appendix B. Only the Gaussian aggregation part of Appendix A needs to be modified; the projected local Edgeworth expansion remains unchanged once the shift/strip bounds are re-established under Assumption 2.5.

3 Simulation

We investigate the finite-sample size of the bootstrap procedures for the $k$th largest coordinate, where $T_{n,[1]}\geq T_{n,[2]}\geq\cdots\geq T_{n,[d]}$ are the ordered coordinates of $\bm{S}_{n}=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bm{X}_{i}$. The simulation design is kept fixed across all experiments, and only the target order statistic is varied. We report results for $k\in\{2,5,10\}$. The case $k=1$ coincides with the maximum and is therefore omitted here.

Throughout the simulation, the dimension is fixed at $d=400$, and the sample size is taken from $n\in\{200,400\}$. For the dependence structure, we consider two correlation designs. In Design I,

\[ \mathbf{R}=\rho\,\mathbf{1}_{d}\mathbf{1}_{d}^{\top}+(1-\rho)\mathbf{I}_{d}, \]

and in Design II,

\[ \mathbf{R}=(\rho^{|j-k|}),\qquad 1\leq j,k\leq d, \]

with $\rho\in\{0.2,0.8\}$.

Let $\Phi$ denote the standard normal distribution function. For $\theta>0$, let $F_{\theta}$ be the distribution function of the gamma distribution with shape parameter $\theta$ and unit scale. For each Monte Carlo repetition, we first generate

\[ \bm{Z}_{i}=(Z_{i1},\dots,Z_{id})^{\top}\sim N(\bm{0},\mathbf{R}),\qquad i=1,\dots,n, \]

independently, and then define

\[ U_{ij}=F_{\theta}^{-1}\bigl(\Phi(Z_{ij})\bigr),\qquad 1\leq i\leq n,\;1\leq j\leq d. \]

This yields a Gaussian-copula model with gamma marginals.
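The data-generating step can be sketched with the standard library alone in the special case $\theta=1$, where the gamma law is the unit exponential and $F_{1}^{-1}(u)=-\log(1-u)$. The Gaussian rows are built from a shared factor (Design I) or an AR(1) recursion (Design II), both of which reproduce the stated correlation matrices. Function names are hypothetical, and general $\theta$ would need a gamma quantile routine that is not shown here.

```python
import math
import random
from statistics import NormalDist

_PHI = NormalDist().cdf


def design1_gaussian_row(d, rho, rng):
    """N(0, R) draw with R = rho * 1 1^T + (1 - rho) * I (equicorrelated)."""
    common = rng.gauss(0.0, 1.0)
    return [math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(d)]


def design2_gaussian_row(d, rho, rng):
    """N(0, R) draw with R_{jk} = rho^{|j-k|}, generated by an AR(1) recursion."""
    z = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(d - 1):
        z.append(rho * z[-1] + scale * rng.gauss(0.0, 1.0))
    return z


def asymmetric_row(z_row):
    """X_j = F_1^{-1}(Phi(Z_j)) - 1 with unit-exponential marginals (theta = 1).

    Uses -log(Phi(-z)) = -log(1 - Phi(z)) to stay accurate in the upper tail.
    """
    return [-math.log(_PHI(-z)) - 1.0 for z in z_row]
```

A sample of size $n$ is then a list of $n$ independent calls such as `asymmetric_row(design2_gaussian_row(d, rho, rng))`; each coordinate has mean zero because the unit exponential has mean one.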

We consider two cases.

  • Asymmetric case. We set $\theta=1$ and define

    \[ \bm{X}_{i}=\bm{U}_{i}-\mathbf{1}_{d},\qquad i=1,\dots,n, \]

    where $\bm{U}_{i}=(U_{i1},\dots,U_{id})^{\top}$. Since each marginal has mean $1$, the vector $\bm{X}_{i}$ is centered.

  • Symmetric case. We set $\theta=\tfrac{1}{2}$. Let $\bm{U}_{i}^{\prime}$ be an independent copy of $\bm{U}_{i}$, and define

    \[ \bm{X}_{i}=\bm{U}_{i}-\bm{U}_{i}^{\prime},\qquad i=1,\dots,n. \]

    This symmetrization removes skewness. The choice $\theta=\tfrac{1}{2}$ keeps the marginal kurtosis on the same scale as in the asymmetric setup: in both cases each coordinate of $\bm{X}_{i}$ has unit variance and excess kurtosis $6$.
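The moment matching between the two cases can be checked from the gamma cumulants $\kappa_{r}=\theta\,(r-1)!$. The sketch below uses exact rational arithmetic; the helper names are hypothetical, and the squared skewness is reported to avoid square roots.

```python
from fractions import Fraction


def gamma_cumulants(theta):
    """Cumulants (kappa_2, kappa_3, kappa_4) of Gamma(shape=theta, scale=1)."""
    theta = Fraction(theta)
    return theta, 2 * theta, 6 * theta


def shape_summary(k2, k3, k4):
    """Return (variance, squared skewness, excess kurtosis)."""
    return k2, k3 ** 2 / k2 ** 3, k4 / k2 ** 2


# Asymmetric case: X = U - 1 with theta = 1 (the shift leaves kappa_2..kappa_4 unchanged)
asym = shape_summary(*gamma_cumulants(1))

# Symmetric case: X = U - U' with theta = 1/2; even cumulants add, odd ones cancel
k2, k3, k4 = gamma_cumulants(Fraction(1, 2))
sym = shape_summary(2 * k2, k3 - k3, 2 * k4)

print(asym)
print(sym)
```

Both summaries have variance $1$ and excess kurtosis $6$; only the squared skewness differs ($4$ in the asymmetric case, $0$ in the symmetric case).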

We consider the following bootstrap methods:

  • Empirical bootstrap (EB): the classical naive bootstrap, which resamples the observations with replacement.

  • Gaussian wild bootstrap (GB): $w_{i}\sim N(0,1)$.

  • Mammen wild bootstrap (MB):

    \[ \mathbb{P}\!\left(w_{i}=\frac{1+\sqrt{5}}{2}\right)=\frac{\sqrt{5}-1}{2\sqrt{5}},\qquad\mathbb{P}\!\left(w_{i}=-\frac{\sqrt{5}-1}{2}\right)=\frac{\sqrt{5}+1}{2\sqrt{5}}. \]

  • Rademacher wild bootstrap (RB): $\mathbb{P}(w_{i}=1)=\mathbb{P}(w_{i}=-1)=\frac{1}{2}$.

  • Beta wild bootstrap (BB): let $\nu=0.1$ and define

    \[ c_{\nu}=\nu^{2}+20\nu+20,\qquad\alpha_{\nu}=\frac{\nu}{2}\left(1-\frac{\nu+2}{\sqrt{c_{\nu}}}\right),\qquad\beta_{\nu}=\frac{\nu}{2}\left(1+\frac{\nu+2}{\sqrt{c_{\nu}}}\right). \]

    Let $\eta_{i}\sim\mathrm{Beta}(\alpha_{\nu},\beta_{\nu})$ i.i.d., and standardize by

    \[ w_{i}=\frac{\eta_{i}-\mathbb{E}[\eta_{i}]}{\sqrt{\mathrm{Var}(\eta_{i})}}. \]

    Then

    \[ \mathbb{E}[w_{i}]=0,\qquad\mathbb{E}[w_{i}^{2}]=1,\qquad\mathbb{E}[w_{i}^{3}]=1. \]

  • Double wild bootstrap (DB): the bootstrap method proposed in Subsection 2.2.
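The moment conditions of the multipliers listed above can be verified numerically. The sketch below computes exact moments of the two-point laws and uses the standard closed-form skewness of the beta distribution, which equals $\mathbb{E}[w_{i}^{3}]$ after standardization; the function names are hypothetical.

```python
import math


def discrete_moments(points):
    """First three raw moments of a discrete law given as (value, probability) pairs."""
    return tuple(sum(p * v ** r for v, p in points) for r in (1, 2, 3))


s5 = math.sqrt(5.0)
mammen = [((1 + s5) / 2, (s5 - 1) / (2 * s5)), (-(s5 - 1) / 2, (s5 + 1) / (2 * s5))]
rademacher = [(1.0, 0.5), (-1.0, 0.5)]


def beta_multiplier_third_moment(nu):
    """E[w^3] for the standardized Beta(alpha_nu, beta_nu) multiplier."""
    c = nu ** 2 + 20 * nu + 20
    a = nu / 2 * (1 - (nu + 2) / math.sqrt(c))
    b = nu / 2 * (1 + (nu + 2) / math.sqrt(c))
    # Skewness of Beta(a, b): 2 (b - a) sqrt(a + b + 1) / ((a + b + 2) sqrt(a b))
    return 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))


print(discrete_moments(mammen))
print(discrete_moments(rademacher))
print(beta_multiplier_third_moment(0.1))
```

Up to rounding, MB and BB match the third moment $1$, whereas RB (like GB) has third moment $0$; this distinction is relevant in the asymmetric designs.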

For the Monte Carlo implementation, we use $B_{1}=499$ first-level bootstrap replications for EB, GB, MB, RB, and BB; for DB we use $B_{1}=499$ and $B_{2}=99$ replications at the first and second bootstrap levels, respectively.
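As a rough illustration of a single first-level wild bootstrap run, the sketch below computes a bootstrap $p$-value for $T_{n,[k]}$ from the multiplier statistic $\bm{S}_{n}^{*}=n^{-1/2}\sum_{i}w_{i}(\bm{X}_{i}-\bar{\bm{X}})$. The upper-tail rejection rule, the $(1+\#)/(B+1)$ convention, and all function names are assumptions made for this example; the precise critical values used in the paper are those defined in Section 2.

```python
import math
import random


def kth_largest(values, k):
    """Return the k-th largest entry (k = 1 gives the maximum)."""
    return sorted(values, reverse=True)[k - 1]


def wild_bootstrap_pvalue(X, k, draw_multiplier, B, rng):
    """Upper-tail bootstrap p-value for the k-th largest coordinate of S_n."""
    n, d = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - xbar[j] for j in range(d)] for row in X]
    # S_n = n^{-1/2} sum_i X_i = sqrt(n) * xbar
    t_obs = kth_largest([math.sqrt(n) * m for m in xbar], k)
    exceed = 0
    for _ in range(B):
        w = [draw_multiplier(rng) for _ in range(n)]
        s_star = [
            sum(w[i] * centered[i][j] for i in range(n)) / math.sqrt(n)
            for j in range(d)
        ]
        exceed += kth_largest(s_star, k) >= t_obs
    return (1 + exceed) / (B + 1)


def rademacher(rng):
    return rng.choice((-1.0, 1.0))


data_rng = random.Random(0)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(20)]
p_value = wild_bootstrap_pvalue(X, 2, rademacher, 99, random.Random(1))
print(p_value)
```

Under this convention, a test at the $10\%$ level would reject when the returned value is at most $0.1$; the empirical size is the rejection frequency over Monte Carlo repetitions.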

Tables 1–3 report the empirical sizes of the different bootstrap methods at the $10\%$ level for $k=2,5,10$, respectively. Across $k\in\{2,5,10\}$, the qualitative ordering of the bootstrap procedures is largely unchanged. The dominant source of finite-sample distortion is the underlying design (most notably asymmetry and the more difficult Design II) rather than the value of $k$ itself. EB is conservative in nearly every configuration, with the under-rejection being especially visible in the asymmetric settings and in symmetric Design II; the distortion is somewhat mitigated as $n$ increases, and EB is close to the nominal level only in symmetric Design I with $\rho=0.8$. GB is more design-sensitive: it is reasonably well calibrated in symmetric Design I, but becomes distinctly liberal under asymmetry, particularly when $n$ is small and $\rho=0.2$. MB and BB display the most stable behavior overall; both are typically mildly conservative, yet they avoid the substantial over-rejection exhibited by GB and, more markedly, RB, and their performance is comparatively robust across designs and values of $k$. RB is the least robust method: it is very accurate, and often closest to the nominal level, in the symmetric experiments, but it becomes severely oversized under asymmetry, especially in Design II. DB is frequently numerically closest to the nominal $10\%$ level in the asymmetric designs, although this occurs through a persistent liberal bias; under symmetry it likewise remains slightly oversized. Larger $n$ generally improves calibration, and increasing $k$ attenuates some distortions, but these effects are quantitative rather than qualitative. Overall, the evidence points to MB and BB as the most reliable choices when uniform size control across heterogeneous designs is the primary concern, whereas RB is competitive only when symmetry is a credible approximation.

Table 1: Empirical sizes at the $10\%$ level for the second largest coordinate $T_{n,[2]}$.
Design $n$ $\rho$ EB GB MB RB BB DB
Panel A: Asymmetric
I 200 0.2 0.0612 0.1211 0.0757 0.1464 0.0743 0.1114
I 200 0.8 0.0731 0.0869 0.0739 0.0907 0.0719 0.1002
I 400 0.2 0.0732 0.1172 0.0809 0.1301 0.0802 0.1044
I 400 0.8 0.0814 0.0954 0.0809 0.0962 0.0821 0.1034
II 200 0.2 0.0619 0.152 0.0888 0.215 0.0884 0.115
II 200 0.8 0.0686 0.137 0.0865 0.174 0.0857 0.107
II 400 0.2 0.0790 0.153 0.0946 0.182 0.0936 0.109
II 400 0.8 0.0826 0.134 0.0927 0.156 0.0921 0.105
Panel B: Symmetric
I 200 0.2 0.0731 0.0829 0.0892 0.1047 0.0907 0.120
I 200 0.8 0.0970 0.1015 0.1002 0.1058 0.0993 0.114
I 400 0.2 0.0868 0.0914 0.0963 0.1035 0.0934 0.109
I 400 0.8 0.0993 0.1016 0.1029 0.1038 0.1026 0.109
II 200 0.2 0.0584 0.0653 0.0830 0.106 0.0814 0.116
II 200 0.8 0.0677 0.0759 0.0859 0.101 0.0848 0.104
II 400 0.2 0.0763 0.0807 0.0902 0.102 0.0893 0.107
II 400 0.8 0.0877 0.0911 0.0977 0.106 0.0984 0.110

Table 2: Empirical sizes at the $10\%$ level for the 5th largest coordinate $T_{n,[5]}$.
Design $n$ $\rho$ EB GB MB RB BB DB
Panel A: Asymmetric
I 200 0.2 0.0645 0.1140 0.0731 0.1298 0.0730 0.107
I 200 0.8 0.0738 0.0866 0.0742 0.0878 0.0735 0.099
I 400 0.2 0.0754 0.1119 0.0823 0.1235 0.0809 0.102
I 400 0.8 0.0835 0.0953 0.0852 0.0955 0.0843 0.103
II 200 0.2 0.0565 0.1510 0.0887 0.2220 0.0854 0.118
II 200 0.8 0.0712 0.1380 0.0883 0.1730 0.0875 0.111
II 400 0.2 0.0731 0.1510 0.0907 0.1880 0.0884 0.111
II 400 0.8 0.0792 0.1290 0.0890 0.1450 0.0886 0.101
Panel B: Symmetric
I 200 0.2 0.0820 0.0891 0.0918 0.1050 0.0914 0.113
I 200 0.8 0.0985 0.1035 0.1006 0.1060 0.0974 0.112
I 400 0.2 0.0933 0.0962 0.0986 0.1050 0.0979 0.109
I 400 0.8 0.1005 0.1009 0.1007 0.1030 0.1017 0.107
II 200 0.2 0.0587 0.0629 0.0826 0.1070 0.0798 0.122
II 200 0.8 0.0711 0.0780 0.0867 0.1010 0.0883 0.109
II 400 0.2 0.0781 0.0802 0.0918 0.1050 0.0917 0.115
II 400 0.8 0.0886 0.0906 0.0960 0.1040 0.0957 0.107

Table 3: Empirical sizes at the $10\%$ level for the 10th largest coordinate $T_{n,[10]}$.
Design $n$ $\rho$ EB GB MB RB BB DB
Panel A: Asymmetric
I 200 0.2 0.0695 0.1073 0.0769 0.1173 0.0757 0.1052
I 200 0.8 0.0745 0.0853 0.0749 0.0871 0.0730 0.0974
I 400 0.2 0.0770 0.1046 0.0798 0.1111 0.0796 0.0983
I 400 0.8 0.0843 0.0944 0.0867 0.0948 0.0856 0.1027
II 200 0.2 0.0537 0.142 0.0812 0.223 0.0782 0.117
II 200 0.8 0.0774 0.132 0.0883 0.159 0.0875 0.110
II 400 0.2 0.0732 0.146 0.0905 0.187 0.0895 0.113
II 400 0.8 0.0842 0.129 0.0932 0.146 0.0919 0.104
Panel B: Symmetric
I 200 0.2 0.0851 0.0894 0.0901 0.0997 0.0900 0.110
I 200 0.8 0.1005 0.1033 0.1014 0.1056 0.1002 0.110
I 400 0.2 0.0945 0.0969 0.0974 0.1019 0.0982 0.106
I 400 0.8 0.1004 0.1014 0.1002 0.1026 0.1017 0.105
II 200 0.2 0.0608 0.0636 0.0822 0.108 0.0789 0.124
II 200 0.8 0.0764 0.0786 0.0890 0.103 0.0904 0.110
II 400 0.2 0.0756 0.0774 0.0894 0.104 0.0891 0.114
II 400 0.8 0.0885 0.0902 0.0951 0.102 0.0960 0.107

4 Conclusion

This paper studies Gaussian and bootstrap approximations for the $k$th largest coordinate statistic $T_{n,[k]}$ in high dimensions. We establish theoretical guarantees that justify bootstrap critical values when the ambient dimension is allowed to grow with the sample size, thereby extending valid inference beyond the maximum to nonmaximal order statistics. The simulation results show that the proposed framework delivers accurate finite-sample inference and clarify the relative robustness of the competing bootstrap procedures across a range of designs.

An important direction for future research is to develop analogous Gaussian approximation results for temporally dependent observations. Doing so would require a theory that accommodates serial dependence, long-run covariance estimation, and resampling schemes that preserve the time-series structure; see, for example, Shao (2010), Zhang and Wu (2017), Zhang and Cheng (2014, 2018), and Chang et al. (2023, 2024, 2025).

Appendix A: Proofs of Theorems

A.1 Combinatorial identities

Lemma A.1 (Finite inclusion–exclusion identity).

For every integer $k\geq 1$ and every nonnegative integer-valued random variable $N$,

\[ \mathbf{1}\{N\geq k\}=\sum_{s=k}^{N}(-1)^{s-k}\binom{s-1}{k-1}\binom{N}{s}. \tag{17} \]

Consequently,

\[ \mathbb{P}(T_{n,[k]}>t)=\sum_{s=k}^{d}(-1)^{s-k}\binom{s-1}{k-1}V_{n,s}(t), \tag{18} \]

and analogously with $N_{n}^{*}(t)$ and $N_{Z}(t)$.

Proof.

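For deterministic $N=m$, define

\[ S_{m,k}:=\sum_{s=k}^{m}(-1)^{s-k}\binom{s-1}{k-1}\binom{m}{s}, \]

so that $S_{m,k}=0$ for $m<k$ (empty sum). Fix $m\geq k-1$. By Pascal's rule $\binom{m+1}{s}=\binom{m}{s}+\binom{m}{s-1}$ together with $\binom{m}{m+1}=0$,

\[ S_{m+1,k}-S_{m,k}=\sum_{s=k}^{m+1}(-1)^{s-k}\binom{s-1}{k-1}\binom{m}{s-1}. \]

Using

\[ \binom{s-1}{k-1}\binom{m}{s-1}=\binom{m}{k-1}\binom{m-k+1}{s-k}, \]

we obtain

\[ S_{m+1,k}-S_{m,k}=\binom{m}{k-1}\sum_{r=0}^{m-k+1}(-1)^{r}\binom{m-k+1}{r}=\binom{m}{k-1}(1-1)^{m-k+1}, \]

which equals $1$ when $m=k-1$ and $0$ when $m\geq k$. Hence, by induction on $m$,

\[ S_{m,k}=\begin{cases}0,&m<k,\\ 1,&m\geq k.\end{cases} \]

This proves (17). Taking expectations with $N=N_{n}(t)$ gives (18); the sum may be extended to $s=d$ because $\binom{N}{s}=0$ for $s>N$. ∎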
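The identity (17) is easy to check by brute force for small arguments. The following sketch evaluates the weighted alternating sum for deterministic $N=m$ and compares it with the indicator $\mathbf{1}\{m\geq k\}$; the function name is hypothetical.

```python
from math import comb


def weighted_inclusion_exclusion(m, k):
    """Right-hand side of (17) for deterministic N = m."""
    return sum(
        (-1) ** (s - k) * comb(s - 1, k - 1) * comb(m, s)
        for s in range(k, m + 1)
    )


mismatches = [
    (m, k)
    for k in range(1, 8)
    for m in range(0, 15)
    if weighted_inclusion_exclusion(m, k) != int(m >= k)
]
print(mismatches)  # []
```

For instance, $m=3$ and $k=2$ gives $\binom{1}{1}\binom{3}{2}-\binom{2}{1}\binom{3}{3}=3-2=1$.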

A.2 Projected quantities

For every nonempty $I=\{i_{1},\dots,i_{s}\}\subset[d]$, let $\bm{P}_{I}:\mathbb{R}^{d}\to\mathbb{R}^{s}$ denote the coordinate projection. Define

\[ \bm{S}_{n,I}:=\bm{P}_{I}\bm{S}_{n},\qquad\mathbf{\Sigma}_{II}:=\bm{P}_{I}\mathbf{\Sigma}\bm{P}_{I}^{\top},\qquad\phi_{I}:=\phi_{\mathbf{\Sigma}_{II}}. \]

Also define

\[ \bar{\bm{X}}_{I}^{\,r}:=\frac{1}{n}\sum_{i=1}^{n}(\bm{P}_{I}\bm{X}_{i})^{\otimes r},\qquad\bm{b}_{i,I}:=\bm{P}_{I}(\bm{X}_{i}-\bar{\bm{X}}),\qquad\bar{\bm{b}}_{I}^{\,r}:=\frac{1}{n}\sum_{i=1}^{n}\bm{b}_{i,I}^{\otimes r}. \]

The projected Edgeworth densities are

\begin{align}
p_{n,I}(\bm{u})&:=\phi_{I}(\bm{u})-\frac{1}{6\sqrt{n}}\bigl\langle\mathbb{E}[\bar{\bm{X}}_{I}^{\,3}],\nabla^{3}\phi_{I}(\bm{u})\bigr\rangle, \tag{19}\\
\hat{p}_{n,\gamma,I}(\bm{u})&:=\phi_{I}(\bm{u})+\frac{1}{2}\bigl\langle\bar{\bm{b}}_{I}^{\,2}-\mathbf{\Sigma}_{II},\nabla^{2}\phi_{I}(\bm{u})\bigr\rangle-\frac{\gamma}{6\sqrt{n}}\bigl\langle\bar{\bm{b}}_{I}^{\,3},\nabla^{3}\phi_{I}(\bm{u})\bigr\rangle. \tag{20}
\end{align}

Lemma A.2 (Projection preserves the data-side assumptions).

Suppose Assumption 2.1 holds. For every nonempty $I\subset[d]$, the projected vectors $\bm{P}_{I}\bm{X}_{1},\dots,\bm{P}_{I}\bm{X}_{n}$ satisfy the same Stein identity with covariance matrix $\mathbf{\Sigma}_{II}$, the same sub-exponential envelope $b$, and

\[ \lambda_{\min}(\mathbf{\Sigma}_{II})\geq\sigma_{*}^{2}. \]

Proof.

Let $\bm{Y}_{i}:=\bm{P}_{I}\bm{X}_{i}$. For any smooth $g:\mathbb{R}^{|I|}\to\mathbb{R}^{|I|}$ define

\[ f(\bm{x}):=\bm{P}_{I}^{\top}g(\bm{P}_{I}\bm{x}). \]

Then

\[ \nabla f(\bm{x})^{\top}=\bm{P}_{I}^{\top}\nabla g(\bm{P}_{I}\bm{x})^{\top}\bm{P}_{I}. \]

Applying the Stein identity for $\bm{X}_{i}$ yields

\begin{align*}
\mathbb{E}[\bm{Y}_{i}^{\top}g(\bm{Y}_{i})]&=\mathbb{E}[\bm{X}_{i}^{\top}f(\bm{X}_{i})]\\
&=\mathbb{E}\Bigl[\mathrm{tr}\{\bm{\tau}_{i}(\bm{X}_{i})\bm{P}_{I}^{\top}\nabla g(\bm{P}_{I}\bm{X}_{i})^{\top}\bm{P}_{I}\}\Bigr]\\
&=\mathbb{E}\Bigl[\mathrm{tr}\{\bm{P}_{I}\bm{\tau}_{i}(\bm{X}_{i})\bm{P}_{I}^{\top}\nabla g(\bm{Y}_{i})^{\top}\}\Bigr],
\end{align*}

where the first equality uses $\bm{X}_{i}^{\top}\bm{P}_{I}^{\top}=\bm{Y}_{i}^{\top}$ and the last uses the cyclic property of the trace. Hence $\bm{P}_{I}\bm{\tau}_{i}(\bm{X}_{i})\bm{P}_{I}^{\top}$ is a Stein kernel for $\bm{Y}_{i}$. The $\psi_{1}$ bounds follow by monotonicity under projection. Finally, for every nonzero $\bm{u}\in\mathbb{R}^{|I|}$,

\[ \bm{u}^{\top}\mathbf{\Sigma}_{II}\bm{u}=(\bm{P}_{I}^{\top}\bm{u})^{\top}\mathbf{\Sigma}(\bm{P}_{I}^{\top}\bm{u})\geq\sigma_{*}^{2}\|\bm{P}_{I}^{\top}\bm{u}\|_{2}^{2}=\sigma_{*}^{2}\|\bm{u}\|_{2}^{2}, \]

where the last equality holds because $\bm{P}_{I}\bm{P}_{I}^{\top}$ is the identity on $\mathbb{R}^{|I|}$. ∎

A.3 External matrix, Gaussian-comparison, and Koike lemmas

Lemma A.3 (Gershgorin interval theorem).

Let $A=(a_{j\ell})\in\mathbb{R}^{s\times s}$ be symmetric, and write $R_{j}(A):=\sum_{\ell\neq j}|a_{j\ell}|$ for the off-diagonal absolute row sums. Then

\[ \lambda_{\min}(A)\geq\min_{1\leq j\leq s}\{a_{jj}-R_{j}(A)\},\qquad\lambda_{\max}(A)\leq\max_{1\leq j\leq s}\{a_{jj}+R_{j}(A)\}. \tag{21} \]

Proof.

Let $\lambda$ be an eigenvalue of $A$ with eigenvector $x=(x_{1},\dots,x_{s})^{\top}\neq 0$. Choose

\[ j_{0}\in\arg\max_{1\leq j\leq s}|x_{j}|. \]

Since $Ax=\lambda x$,

\[ (\lambda-a_{j_{0}j_{0}})x_{j_{0}}=\sum_{\ell\neq j_{0}}a_{j_{0}\ell}x_{\ell}. \]

Hence

\[ |\lambda-a_{j_{0}j_{0}}||x_{j_{0}}|\leq\sum_{\ell\neq j_{0}}|a_{j_{0}\ell}||x_{\ell}|\leq R_{j_{0}}(A)|x_{j_{0}}|, \]

and therefore, since $|x_{j_{0}}|>0$,

\[ |\lambda-a_{j_{0}j_{0}}|\leq R_{j_{0}}(A). \]

This proves that every eigenvalue belongs to at least one Gershgorin interval

\[ [a_{jj}-R_{j}(A),\,a_{jj}+R_{j}(A)],\qquad j=1,\dots,s. \]

Taking the minimum and maximum over these intervals yields (21). ∎
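As an illustration of (21), consider an $s\times s$ equicorrelation block (as arises from projecting the Design I correlation matrix), whose eigenvalues are known to be $1+(s-1)\rho$ and $1-\rho$. The sketch below computes the Gershgorin bounds and checks the eigenpairs directly; the helper names are hypothetical.

```python
def gershgorin_bounds(A):
    """Lower bound for lambda_min and upper bound for lambda_max, as in (21)."""
    lows, highs = [], []
    for j, row in enumerate(A):
        radius = sum(abs(a) for l, a in enumerate(row) if l != j)  # R_j(A)
        lows.append(row[j] - radius)
        highs.append(row[j] + radius)
    return min(lows), max(highs)


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


s, rho = 3, 0.2
A = [[1.0 if j == l else rho for l in range(s)] for j in range(s)]
low, high = gershgorin_bounds(A)

# Eigenpairs of the equicorrelation matrix: (1 + (s-1) rho, ones) and (1 - rho, contrast)
eigenpairs = [(1.0 + (s - 1) * rho, [1.0, 1.0, 1.0]), (1.0 - rho, [1.0, -1.0, 0.0])]
residuals = [
    max(abs(y - lam * x) for y, x in zip(matvec(A, v), v)) for lam, v in eigenpairs
]
print(low, high, residuals)
```

Here the upper bound is attained by the largest eigenvalue, while the lower bound $1-(s-1)\rho$ is strictly below the smallest eigenvalue $1-\rho$.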

Lemma A.4 (Berman–Li–Shao normal comparison inequality).

Let $\xi=(\xi_{1},\dots,\xi_{s})^{\top}$ and $\eta=(\eta_{1},\dots,\eta_{s})^{\top}$ be centered Gaussian vectors with

\[ \mathrm{Var}(\xi_{j})=\mathrm{Var}(\eta_{j})=1,\qquad 1\leq j\leq s. \]

Write $r_{j\ell}^{\xi}:=\mathrm{Corr}(\xi_{j},\xi_{\ell})$ and $r_{j\ell}^{\eta}:=\mathrm{Corr}(\eta_{j},\eta_{\ell})$, and define

\[ \rho_{j\ell}:=\max\{|r_{j\ell}^{\xi}|,|r_{j\ell}^{\eta}|\}. \]

Then for every $u=(u_{1},\dots,u_{s})^{\top}\in\mathbb{R}^{s}$,

\begin{align}
&\left|\mathbb{P}(\xi_{1}\leq u_{1},\dots,\xi_{s}\leq u_{s})-\mathbb{P}(\eta_{1}\leq u_{1},\dots,\eta_{s}\leq u_{s})\right| \notag\\
&\qquad\leq\frac{1}{2\pi}\sum_{1\leq j<\ell\leq s}\left|\arcsin(r_{j\ell}^{\xi})-\arcsin(r_{j\ell}^{\eta})\right|\exp\!\left(-\frac{u_{j}^{2}+u_{\ell}^{2}}{2(1+\rho_{j\ell})}\right). \tag{22}
\end{align}

In particular, if $\eta$ has independent coordinates and

\[ \bar{\rho}:=\max_{1\leq j<\ell\leq s}|r_{j\ell}^{\xi}|<1, \]

then

\begin{align}
&\left|\mathbb{P}(\xi_{1}\leq u_{1},\dots,\xi_{s}\leq u_{s})-\prod_{j=1}^{s}\Phi(u_{j})\right| \notag\\
&\qquad\leq\frac{1}{2\pi\sqrt{1-\bar{\rho}^{2}}}\sum_{1\leq j<\ell\leq s}|r_{j\ell}^{\xi}|\exp\!\left(-\frac{u_{j}^{2}+u_{\ell}^{2}}{2(1+\bar{\rho})}\right). \tag{23}
\end{align}

Proof.

The inequality (22) is Theorem 1 of Li and Shao (2002), which refines Berman’s original comparison argument (Berman, 1964). If $\eta$ has independent coordinates, then $r_{j\ell}^{\eta}=0$ for all $j\neq\ell$, so that $\rho_{j\ell}=|r_{j\ell}^{\xi}|\leq\bar{\rho}$, and the mean-value theorem gives

\[ \left|\arcsin(r_{j\ell}^{\xi})-\arcsin(0)\right|\leq\sup_{|v|\leq\bar{\rho}}\frac{1}{\sqrt{1-v^{2}}}\,|r_{j\ell}^{\xi}|\leq\frac{|r_{j\ell}^{\xi}|}{\sqrt{1-\bar{\rho}^{2}}}. \]

Substituting this estimate into (22), and bounding $1+\rho_{j\ell}$ by $1+\bar{\rho}$ in the exponent, yields (23). ∎
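In the bivariate case $s=2$ with $u=(0,0)$, both sides of (22) can be evaluated exactly, since the classical orthant formula gives $\mathbb{P}(\xi_{1}\leq 0,\xi_{2}\leq 0)=\tfrac{1}{4}+\arcsin(r)/(2\pi)$. The sketch below (hypothetical function names) compares this with the bounds (22) and (23) against an independent $\eta$.

```python
import math


def orthant_gap(r):
    """|P(xi_1 <= 0, xi_2 <= 0) - Phi(0)^2| for correlation r (Sheppard's formula)."""
    joint = 0.25 + math.asin(r) / (2.0 * math.pi)
    return abs(joint - 0.25)


def bound_22(r):
    """Right-hand side of (22) at u = (0, 0) with independent eta; exp(0) = 1."""
    return abs(math.asin(r) - math.asin(0.0)) / (2.0 * math.pi)


def bound_23(r):
    """Right-hand side of (23) at u = (0, 0) with rho_bar = |r|."""
    return abs(r) / (2.0 * math.pi * math.sqrt(1.0 - r * r))


for r in (-0.8, -0.2, 0.2, 0.8):
    print(r, orthant_gap(r), bound_22(r), bound_23(r))
```

At this particular point (22) holds with equality, while (23) is strictly larger for $r\neq 0$ because $\arcsin|r|<|r|/\sqrt{1-r^{2}}$.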

Lemma A.5 (Koike smoothing identity).

Let $A\subset\mathbb{R}^{s}$ be measurable, let $Z_{I}\sim N(0,\Sigma_{II})$, and define, for $u\in(0,1]$,

\[ h_{A,u}(x):=\mathbb{E}\bigl[\mathbf{1}_{A}(\sqrt{1-u}\,x+\sqrt{u}\,Z_{I})\bigr]. \]

Then

\[ h_{A,u}(x)=\int_{A}u^{-s/2}\phi_{I}\!\left(\frac{y-\sqrt{1-u}\,x}{\sqrt{u}}\right)\,dy. \tag{24} \]

Moreover, for every multi-index $\alpha\in\mathbb{N}_{0}^{s}$ with $|\alpha|=r\geq 1$,

\[ \partial^{\alpha}h_{A,u}(x)=(-1)^{r}\left(\frac{1-u}{u}\right)^{r/2}\int_{A}\partial^{\alpha}\phi_{I}\!\left(\frac{y-\sqrt{1-u}\,x}{\sqrt{u}}\right)u^{-s/2}\,dy. \tag{25} \]

Proof.

Formulas (24)–(25) are the projected specialization of equations (4.4)–(4.5) in Koike (2026). From the definition,

\[ h_{A,u}(x)=\int_{\mathbb{R}^{s}}\mathbf{1}_{A}(\sqrt{1-u}\,x+\sqrt{u}\,z)\phi_{I}(z)\,dz. \]

Set

\[ y:=\sqrt{1-u}\,x+\sqrt{u}\,z,\qquad z=\frac{y-\sqrt{1-u}\,x}{\sqrt{u}},\qquad dz=u^{-s/2}\,dy. \]

Then (24) follows. Differentiate (24) under the integral sign. For each derivative $\partial_{x_{j}}$,

\[ \partial_{x_{j}}\phi_{I}\!\left(\frac{y-\sqrt{1-u}\,x}{\sqrt{u}}\right)=-\sqrt{\frac{1-u}{u}}\,\partial_{j}\phi_{I}\!\left(\frac{y-\sqrt{1-u}\,x}{\sqrt{u}}\right). \]

Applying this identity $r=|\alpha|$ times gives (25). ∎
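In one dimension ($s=1$, $\Sigma_{II}=1$) with $A=(-\infty,-t]$, formula (24) reduces to $h_{A,u}(x)=\Phi\bigl((-t-\sqrt{1-u}\,x)/\sqrt{u}\bigr)$, and the integral in (25) with $r=1$ evaluates to the standard normal density at the same endpoint. The sketch below, with hypothetical function names and arbitrarily chosen test values, compares (25) with a central finite difference.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def h(x, t, u):
    """h_{A,u}(x) for A = (-inf, -t], s = 1 and Sigma_II = 1."""
    return N01.cdf((-t - math.sqrt(1.0 - u) * x) / math.sqrt(u))


def dh_formula(x, t, u):
    """Right-hand side of (25) with r = 1; the y-integral of phi' equals phi(z_end)."""
    z_end = (-t - math.sqrt(1.0 - u) * x) / math.sqrt(u)
    return -math.sqrt((1.0 - u) / u) * N01.pdf(z_end)


def dh_numeric(x, t, u, eps=1e-5):
    """Central finite-difference approximation of the derivative of h in x."""
    return (h(x + eps, t, u) - h(x - eps, t, u)) / (2.0 * eps)


x0, t0, u0 = -2.0, 1.5, 0.3
print(dh_formula(x0, t0, u0), dh_numeric(x0, t0, u0))
```

The factor $\sqrt{(1-u)/u}$ is the source of the $u^{-r/2}$ blow-up that appears in the derivative bounds for small $u$.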

Lemma A.6 (Koike orthant derivative bound).

Let

\[ A_{t}^{-}:=(-\infty,-t]^{s}. \]

For every integer $r\geq 1$ there exist constants $c_{r},C_{r}>0$, depending only on $r$, such that for every $u\in(0,1/2]$ and every $x\in\mathbb{R}^{s}$,

\[ \|\nabla^{r}h_{A_{t}^{-},u}(x)\|_{1}\leq C_{r}u^{-r/2}(1+t)^{r}\,\mathbb{P}\!\left(\sqrt{1-u}\,x+\sqrt{u}\,Z_{I}\in A_{t-a_{u}}^{-}\right),\qquad a_{u}:=c_{r}\sqrt{u\log(2s)}. \tag{26} \]

In particular,

\[ \sup_{x\in\mathbb{R}^{s}}\|\nabla^{r}h_{A_{t}^{-},u}(x)\|_{1}\leq C_{r}\left(\frac{\log(2s)}{u}\right)^{r/2}. \tag{27} \]

Proof.

The uniform estimate (27) is exactly Lemma 4.4 in Koike (2026) after replacing $d$ by $s$ and using $\lambda_{\min}(\Sigma_{II})\geq\sigma_{*}^{2}$ from Lemma A.2. The localized bound (26) is obtained by combining (25) with the Anderson–Hall–Titterington bound stated as Lemma D.4 in Koike (2026). Indeed, for each multi-index $\alpha$ with $|\alpha|=r$,

\[ |\partial^{\alpha}h_{A_{t}^{-},u}(x)|\leq\left(\frac{1-u}{u}\right)^{r/2}\int_{A_{t}^{-}}\left|\partial^{\alpha}\phi_{I}\!\left(\frac{y-\sqrt{1-u}\,x}{\sqrt{u}}\right)\right|u^{-s/2}\,dy. \]

If $z=(y-\sqrt{1-u}\,x)/\sqrt{u}$, then $y\in A_{t}^{-}$ implies

\[ \sqrt{1-u}\,x+\sqrt{u}\,z\in A_{t}^{-}. \]

Applying Lemma D.4 to the orthant enlarged by a cube of side length $a_{u}=c_{r}\sqrt{u\log(2s)}$ yields

\[ \int_{A_{t}^{-}}\left|\partial^{\alpha}\phi_{I}\!\left(\frac{y-\sqrt{1-u}\,x}{\sqrt{u}}\right)\right|u^{-s/2}\,dy\leq C_{r}(1+t)^{r}\,\mathbb{P}\!\left(\sqrt{1-u}\,x+\sqrt{u}\,Z_{I}\in A_{t-a_{u}}^{-}\right), \]

and summing over $|\alpha|=r$ proves (26). ∎

Lemma A.7 (Koike projected decomposition).

Let $\xi_{1,I},\dots,\xi_{n,I}$ be independent centered $\mathbb{R}^{s}$-valued random vectors with approximate Stein kernels $(\tau_{i,I},\beta_{i,I})$, and put

\[ W_{I}:=\sum_{i=1}^{n}\xi_{i,I},\qquad\bar{T}_{I}:=\sum_{i=1}^{n}\tau_{i,I}(\xi_{i,I})-\Sigma_{II},\qquad B_{I}:=\sum_{i=1}^{n}\beta_{i,I}(\xi_{i,I}). \]

For a bounded measurable function $h:\mathbb{R}^{s}\to\mathbb{R}$ and $u\in(0,1]$, define

\[ h_{u}(x):=\mathbb{E}\bigl[h(\sqrt{1-u}\,x+\sqrt{u}\,Z_{I})\bigr],\qquad Z_{I}\sim N(0,\Sigma_{II}). \]

Let $p_{W_{I}}$ be the first-order Edgeworth density around $N(0,\Sigma_{II})$ as in equation (4.1) of Koike (2026). Then for every bounded measurable $h:\mathbb{R}^{s}\to\mathbb{R}$ and every $\vartheta\in(0,1]$,

\[ \mathbb{E}[h_{\vartheta}(W_{I})]-\int_{\mathbb{R}^{s}}h_{\vartheta}(z)p_{W_{I}}(z)\,dz=\sum_{\nu=1}^{6}R_{\nu,I}(h,\vartheta), \tag{28} \]

where each $R_{\nu,I}(h,\vartheta)$ is one of the six terms displayed in equation (4.6) of Koike (2026), specialized to the projected dimension $s$. In particular, each $R_{\nu,I}(h,\vartheta)$ is a finite linear combination of iterated integrals involving only the tensors

\[ \bar{T}_{I},\qquad\sum_{i=1}^{n}\xi_{i,I}^{\otimes 3},\qquad\sum_{i=1}^{n}\tau_{i,I}(\xi_{i,I})^{\otimes 2},\qquad\sum_{i=1}^{n}\xi_{i,I}^{\otimes 3}\otimes\tau_{i,I}(\xi_{i,I}),\qquad\sum_{i=1}^{n}\xi_{i,I}^{\otimes 4}, \]

and their $\beta$-counterparts. If $\beta_{i,I}\equiv 0$, then the last four terms in equation (4.6) of Koike (2026) disappear identically.

Proof.

This is exactly Lemma 4.3 in Koike (2026) after replacing the ambient dimension $d$ by the projected dimension $s=|I|$. The statement about the coefficient tensors follows by reading term-by-term the six displays in equation (4.6) of Koike (2026). When $\beta_{i,I}\equiv 0$, every summand involving $\beta_{i,I}$ vanishes. ∎

Lemma A.8 (Koike tensor and concentration bounds).

Under Assumption 2.1, there exists $C>0$ such that, for every $a\geq 1$,

\begin{align}
\mathbb{P}\!\left(\left\|\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i}\bm{X}_{i}^{\top}-\mathbf{\Sigma}\right\|_{\max}>Cb^{2}\sqrt{\frac{a\log(dn)}{n}}\right)&\leq n^{-a}, \tag{29}\\
\mathbb{P}\!\left(\|\bar{\bm{X}}\|_{\infty}>Cb\sqrt{\frac{a\log(dn)}{n}}\right)&\leq n^{-a}, \tag{30}\\
\mathbb{P}\!\left(\sup_{\|V\|_{1}\leq 1}\left|\frac{1}{n}\sum_{i=1}^{n}\bigl\langle\bm{X}_{i}^{\otimes 3}-\mathbb{E}[\bm{X}_{i}^{\otimes 3}],V\bigr\rangle\right|>Cb^{3}a\frac{\log n}{\sqrt{n}}\right)&\leq n^{-a}. \tag{31}
\end{align}

Moreover, for every nonempty $I\subset[d]$ with $|I|\leq k_{0}$, the coefficient tensors appearing in Lemma A.7 satisfy

\[ \mathbb{E}\|\mathsf{A}_{I,\nu}\|_{\infty}\leq C\delta_{n},\qquad\nu=1,\dots,6, \tag{32} \]

where $\mathsf{A}_{I,\nu}$ denotes any coefficient tensor multiplying $\nabla^{2}h_{u}$, $\nabla^{3}h_{u}$, $\nabla^{4}h_{u}$, or $\nabla^{5}h_{u}$ in the six terms of (28).

Proof.

The covariance and mean bounds (29)–(30) follow from Lemma D.10 in Koike (2026) applied to $Y_{i}=\mathrm{vec}(\bm{X}_{i}\bm{X}_{i}^{\top})$ and $Y_{i}=\bm{X}_{i}$, respectively. The third-order tensor bound (31) is Lemma D.11 in Koike (2026) with $r=3$. Finally, the coefficient-tensor estimate (32) is the projected specialization of the bounds obtained in the proof of Theorem 4.1 in Koike (2026, pp. 22–24). Since projection only removes coordinates, every projected $\ell_{\infty}$-tensor norm is bounded by the corresponding full-dimensional norm. ∎

Lemma A.9 (Explicit high-probability event for the first bootstrap array).

Under Assumption 2.1, there exists an event $\Omega_{n,X}$ such that

\[ \mathbb{P}(\Omega_{n,X}^{c})\leq\frac{C}{n^{2}} \]

and, on $\Omega_{n,X}$,

\begin{align}
\|\bar{\bm{X}}\|_{\infty}&\leq Cb\sqrt{\frac{\log(dn)}{n}}, \tag{33}\\
\left\|\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i}\bm{X}_{i}^{\top}-\mathbf{\Sigma}\right\|_{\max}&\leq Cb^{2}\sqrt{\frac{\log(dn)}{n}}, \notag\\
\max_{1\leq i\leq n}\|\bm{X}_{i}\|_{\infty}&\leq Cb\log(dn). \tag{34}
\end{align}

Consequently, if

\[ \hat{\mathbf{\Sigma}}_{X}:=\frac{1}{n}\sum_{i=1}^{n}(\bm{X}_{i}-\bar{\bm{X}})(\bm{X}_{i}-\bar{\bm{X}})^{\top},\qquad\bm{b}_{i}:=\bm{X}_{i}-\bar{\bm{X}}, \]

then on $\Omega_{n,X}$,

\begin{align}
\|\hat{\mathbf{\Sigma}}_{X}-\mathbf{\Sigma}\|_{\max}&\leq Cb^{2}\sqrt{\frac{\log(dn)}{n}}, \tag{35}\\
\max_{1\leq i\leq n}\|\bm{b}_{i}\|_{\infty}&\leq Cb\log(dn), \tag{36}\\
\sup_{\begin{subarray}{c}I\subset[d]\\ 1\leq|I|\leq k_{0}\end{subarray}}\|\bar{\bm{b}}_{I}^{\,2}-\mathbf{\Sigma}_{II}\|_{\max}&\leq Cb^{2}\sqrt{\frac{\log(dn)}{n}}, \tag{37}\\
\sup_{\begin{subarray}{c}I\subset[d]\\ 1\leq|I|\leq k_{0}\end{subarray}}\|\bar{\bm{b}}_{I}^{\,3}\|_{\infty}&\leq Cb^{3}\log(dn). \tag{38}
\end{align}

Proof.

Apply (30) and (29) with $a=2$ to obtain

\[ \mathbb{P}\!\left(\|\bar{\bm{X}}\|_{\infty}>Cb\sqrt{\frac{\log(dn)}{n}}\right)+\mathbb{P}\!\left(\left\|\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i}\bm{X}_{i}^{\top}-\mathbf{\Sigma}\right\|_{\max}>Cb^{2}\sqrt{\frac{\log(dn)}{n}}\right)\leq\frac{C}{n^{2}}. \tag{39} \]

Next, since $\|X_{ij}\|_{\psi_{1}}\leq b$, the tail bound for sub-exponential random variables implies

\[ \mathbb{P}(|X_{ij}|>u)\leq 2\exp\!\left(-\frac{u}{Cb}\right),\qquad u>0. \]

Set $u=C_{1}b\log(dn)$ with $C_{1}$ large enough. Then, by the union bound over the $nd$ entries,

\[ \mathbb{P}\!\left(\max_{1\leq i\leq n}\max_{1\leq j\leq d}|X_{ij}|>C_{1}b\log(dn)\right)\leq 2nd\,(dn)^{-4}\leq\frac{2}{n^{2}}. \tag{40} \]

Define $\Omega_{n,X}$ as the intersection of the complements of the events in (39) and (40). Then $\mathbb{P}(\Omega_{n,X}^{c})\leq C/n^{2}$, and (33)–(34) hold on $\Omega_{n,X}$.

Now

\[ \hat{\mathbf{\Sigma}}_{X}=\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i}\bm{X}_{i}^{\top}-\bar{\bm{X}}\bar{\bm{X}}^{\top}. \]

Hence, on $\Omega_{n,X}$,

\begin{align*}
\|\hat{\mathbf{\Sigma}}_{X}-\mathbf{\Sigma}\|_{\max}&\leq\left\|\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i}\bm{X}_{i}^{\top}-\mathbf{\Sigma}\right\|_{\max}+\|\bar{\bm{X}}\bar{\bm{X}}^{\top}\|_{\max}\\
&\leq Cb^{2}\sqrt{\frac{\log(dn)}{n}}+Cb^{2}\frac{\log(dn)}{n}\\
&\leq Cb^{2}\sqrt{\frac{\log(dn)}{n}},
\end{align*}

which proves (35). Also,

\[ \max_{1\leq i\leq n}\|\bm{b}_{i}\|_{\infty}\leq\max_{1\leq i\leq n}\|\bm{X}_{i}\|_{\infty}+\|\bar{\bm{X}}\|_{\infty}\leq Cb\log(dn), \]

which proves (36). For every $I$,

\[ \bar{\bm{b}}_{I}^{\,2}=\bm{P}_{I}\hat{\mathbf{\Sigma}}_{X}\bm{P}_{I}^{\top}, \]

so (37) follows from (35). Finally, for every coordinate triple $(j_{1},j_{2},j_{3})$ belonging to $I$,

\begin{align*}
\left|\frac{1}{n}\sum_{i=1}^{n}b_{ij_{1}}b_{ij_{2}}b_{ij_{3}}\right|&\leq\left(\max_{1\leq i\leq n}\|\bm{b}_{i}\|_{\infty}\right)\frac{1}{n}\sum_{i=1}^{n}|b_{ij_{1}}b_{ij_{2}}|\\
&\leq\left(\max_{1\leq i\leq n}\|\bm{b}_{i}\|_{\infty}\right)\left(\frac{1}{n}\sum_{i=1}^{n}b_{ij_{1}}^{2}\right)^{1/2}\left(\frac{1}{n}\sum_{i=1}^{n}b_{ij_{2}}^{2}\right)^{1/2}\\
&\leq Cb\log(dn)\cdot\max_{1\leq j\leq d}\hat{\Sigma}_{X,jj}\\
&\leq Cb^{3}\log(dn),
\end{align*}

because $\hat{\Sigma}_{X,jj}\leq\Sigma_{jj}+Cb^{2}\sqrt{\log(dn)/n}\leq Cb^{2}$ on $\Omega_{n,X}$. Taking the maximum over all coordinate triples gives (38). ∎
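Two deterministic steps of the preceding proof can be checked on simulated data: the decomposition of $\hat{\mathbf{\Sigma}}_{X}$ into the raw second-moment matrix minus $\bar{\bm{X}}\bar{\bm{X}}^{\top}$, and the Cauchy–Schwarz chain bounding the empirical third moments. The sketch below uses centered exponential entries purely as hypothetical test data; all names are illustrative.

```python
import math
import random

rng = random.Random(7)
n, d = 50, 4
X = [[rng.expovariate(1.0) - 1.0 for _ in range(d)] for _ in range(n)]
xbar = [sum(row[j] for row in X) / n for j in range(d)]
b = [[row[j] - xbar[j] for j in range(d)] for row in X]  # b_i = X_i - xbar


def sigma_hat(j, l):
    """Entry (j, l) of the centered sample covariance matrix."""
    return sum(bi[j] * bi[l] for bi in b) / n


def raw_second_moment(j, l):
    return sum(row[j] * row[l] for row in X) / n


def third_moment(j1, j2, j3):
    return sum(bi[j1] * bi[j2] * bi[j3] for bi in b) / n


# Identity: Sigma_hat = (1/n) sum X_i X_i^T - xbar xbar^T
identity_gap = max(
    abs(sigma_hat(j, l) - (raw_second_moment(j, l) - xbar[j] * xbar[l]))
    for j in range(d)
    for l in range(d)
)

# Chain: |third moment| <= max|b| * sqrt(Sigma_hat_{j1 j1} * Sigma_hat_{j2 j2})
b_max = max(abs(v) for bi in b for v in bi)
chain_holds = all(
    abs(third_moment(j1, j2, j3))
    <= b_max * math.sqrt(sigma_hat(j1, j1) * sigma_hat(j2, j2)) + 1e-12
    for j1 in range(d)
    for j2 in range(d)
    for j3 in range(d)
)
print(identity_gap, chain_holds)
```

Both checks are algebraic and hold for any data set; the probabilistic content of the lemma lies in the bounds on $\|\bar{\bm{X}}\|_{\infty}$, $\max_{i}\|\bm{X}_{i}\|_{\infty}$, and the covariance deviation on $\Omega_{n,X}$.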

Lemma A.10 (Multiplier maximum event).

Under Assumption 2.2, there exists an event $\Omega_{n,w}$ such that

\[ \mathbb{P}(\Omega_{n,w}^{c})\leq\frac{C}{n^{2}} \]

and

\[ \max_{1\leq i\leq n}|w_{i}|\leq C\log(dn)\qquad\text{on }\Omega_{n,w}. \tag{41} \]

Proof.

If Assumption 2.2(ii) holds, then $|w_{i}|\leq b_{w}$ almost surely, so (41) is trivial for $C\geq b_{w}$. If Assumption 2.2(i) holds, then $w_{i}\sim N(0,1)$ and

\[ \mathbb{P}(|w_{i}|>u)\leq 2e^{-u^{2}/2},\qquad u>0. \]

Choose $u=C_{0}\log(dn)$ with $C_{0}$ large enough. Then, by the union bound,

\[ \mathbb{P}\!\left(\max_{1\leq i\leq n}|w_{i}|>C_{0}\log(dn)\right)\leq 2n\exp\!\left(-\frac{C_{0}^{2}\log^{2}(dn)}{2}\right)\leq\frac{C}{n^{2}}. \]

This proves the claim. ∎

A.4 The projected local input

Proposition A.1 (Projected local orthant expansion).

Suppose Assumptions 2.1, 2.3, and 2.4 hold. Then there exists $C>0$ such that for every nonempty $I\subset[d]$ with $|I|\leq k_{0}$,

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{P}\bigl(\bm{S}_{n,I}\in(t,\infty)^{|I|}\bigr)-\int_{(t,\infty)^{|I|}}p_{n,I}(\bm{u})\,d\bm{u}\right|\leq C\varepsilon_{n}^{2}\pi_{I}(t). \tag{42} \]

Moreover, with probability at least $1-C/n$,

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{P}^{*}\bigl(\bm{S}_{n,I}^{*}\in(t,\infty)^{|I|}\bigr)-\int_{(t,\infty)^{|I|}}\hat{p}_{n,\gamma,I}(\bm{u})\,d\bm{u}\right|\leq C\varepsilon_{n}^{2}\pi_{I}(t) \tag{43} \]

holds simultaneously for all such $I$.

Proof.

Fix a nonempty $I\subset[d]$ and write $s:=|I|$. Set

\[ \bm{Y}_{n,I}:=-\bm{S}_{n,I},\qquad A_{t}^{-}:=(-\infty,-t]^{s}. \]

Then

\[ \mathbb{P}\bigl(\bm{S}_{n,I}\in(t,\infty)^{s}\bigr)=\mathbb{P}\bigl(\bm{Y}_{n,I}\in A_{t}^{-}\bigr). \]

If $\bm{Z}_{I}^{-}\sim N(0,\Sigma_{II})$, then

\[ \pi_{I}(t)=\mathbb{P}(\bm{Z}_{I}^{-}\in A_{t}^{-}). \]

For $u\in(0,1]$, define

\[ h_{t,u}(x):=h_{A_{t}^{-},u}(x)=\mathbb{E}\bigl[\mathbf{1}_{A_{t}^{-}}(\sqrt{1-u}\,x+\sqrt{u}\,\bm{Z}_{I}^{-})\bigr]. \]

By Lemma A.5, for every multi-index $\alpha$ with $|\alpha|=r$,

\[ \partial^{\alpha}h_{t,u}(x)=(-1)^{r}\left(\frac{1-u}{u}\right)^{r/2}\int_{A_{t}^{-}}\partial^{\alpha}\phi_{I}\!\left(\frac{y-\sqrt{1-u}\,x}{\sqrt{u}}\right)u^{-s/2}\,dy. \]

Applying Lemma A.6 with $r\in\{2,4,5\}$ gives

\[ \|\nabla^{r}h_{t,u}(x)\|_{1}\leq C_{r}u^{-r/2}(1+t)^{r}\,\mathbb{P}\!\left(\sqrt{1-u}\,x+\sqrt{u}\,\bm{Z}_{I}^{-}\in A_{t-a_{u}}^{-}\right),\qquad a_{u}:=c_{r}\sqrt{u\log(2s)}. \tag{44} \]

Lemma A.12(i) implies that there exists $c_{0}>0$ such that, uniformly for $t\in\mathcal{T}_{k,\epsilon}$ and $0\leq a\leq c_{0}/t$,

\[ \pi_{I}(t-a)\leq C\pi_{I}(t). \tag{45} \]

Choose

\[ \vartheta_{n}:=\min\left\{\frac{1}{2},\,\frac{\varepsilon_{n}^{4}}{\log(dn)\log(2k_{0})}\right\}. \]

Then

\[ \log(\vartheta_{n}^{-1})\leq C\log n,\qquad t\,a_{\vartheta_{n}}\leq C\varepsilon_{n}^{2}\qquad\text{uniformly on }\mathcal{T}_{k,\epsilon}, \tag{46} \]

because $t^{2}\asymp\log d$ by Lemma A.11. Since $s\leq k_{0}$, (46) implies

\[ a_{u}\leq\frac{c_{0}}{t}\qquad\text{for every }u\in[\vartheta_{n},1/2]. \]

Therefore, integrating (44) with respect to the law of $\bm{Y}_{n,I}$ and using (45),

\begin{align}
\int\|\nabla^{r}h_{t,u}(x)\|_{1}\,d\mathbb{P}_{\bm{Y}_{n,I}}(x)&\leq C_{r}u^{-r/2}(1+t)^{r}\,\mathbb{P}\bigl(\bm{Z}_{I}^{-}\in A_{t-a_{u}}^{-}\bigr) \notag\\
&\leq C_{r}u^{-r/2}(1+t)^{r}\pi_{I}(t),\qquad u\in[\vartheta_{n},1/2]. \tag{47}
\end{align}

Exactly the same estimate holds with $\mathbb{P}_{\bm{Y}_{n,I}}$ replaced by the Gaussian law.

Let

\[ p_{n,I}^{Y}(y):=p_{n,I}(-y). \]

Apply the smoothing inequality in Lemma 4.1 of Koike (2026) to the bounded measurable function $\mathbf{1}_{A_{t}^{-}}$, with

\[ \mu=\mathcal{L}(\bm{Y}_{n,I}),\qquad\nu(dy)=p_{n,I}^{Y}(y)\,dy,\qquad K=N(0,\vartheta_{n}\Sigma_{II}). \]

Using Lemma A.13 (with $a=a_{\vartheta_{n}}$) and Lemma A.12(ii), we obtain

\begin{align}
&\left|\mathbb{P}(\bm{Y}_{n,I}\in A_{t}^{-})-\mathbb{E}[h_{t,\vartheta_{n}}(\bm{Y}_{n,I})]\right|+\left|\int_{A_{t}^{-}}p_{n,I}^{Y}(y)\,dy-\int h_{t,\vartheta_{n}}(y)p_{n,I}^{Y}(y)\,dy\right| \notag\\
&\qquad\leq Ca_{\vartheta_{n}}(1+t)^{4}\pi_{I}(t)\leq C\varepsilon_{n}^{2}\pi_{I}(t). \tag{48}
\end{align}

For each $i$, define

\[ \xi_{i,I}:=-\frac{1}{\sqrt{n}}\bm{P}_{I}\bm{X}_{i}. \]

Then

\[ \sum_{i=1}^{n}\xi_{i,I}=\bm{Y}_{n,I},\qquad\sum_{i=1}^{n}\mathbb{E}[\tau_{i,I}(\xi_{i,I})]=\Sigma_{II},\qquad\beta_{i,I}\equiv 0 \]

with projected Stein kernels inherited from Lemma A.2. Therefore Lemma A.7 gives

\[ \mathbb{E}[h_{t,\vartheta_{n}}(\bm{Y}_{n,I})]-\int h_{t,\vartheta_{n}}(y)p_{n,I}^{Y}(y)\,dy=\sum_{\nu=1}^{6}R_{\nu,I}(t,\vartheta_{n}). \]

Because $\beta_{i,I}\equiv 0$, only the terms involving $\bar{T}_{I}$,

\[ \bar{T}_{I}:=\sum_{i=1}^{n}\tau_{i,I}(\xi_{i,I})-\Sigma_{II}, \]

and $\sum_{i}\xi_{i,I}^{\otimes 3}$ remain. By (32),

\[ \mathbb{E}\|\mathsf{A}_{I,\nu}\|_{\infty}\leq C\delta_{n},\qquad\nu=1,\dots,6. \tag{49} \]

Substituting (47) and (49) into the six explicit terms of equation (4.6) in Koike (2026), and integrating the kernels exactly as they appear there, yields

\[ \sum_{\nu=1}^{6}|R_{\nu,I}(t,\vartheta_{n})|\leq C\delta_{n}\,(1+t)^{4}\bigl\{1+\log(\vartheta_{n}^{-1})\bigr\}\pi_{I}(t). \tag{50} \]

Since $t^{2}\asymp\log d$ on $\mathcal{T}_{k,\epsilon}$ and $\varepsilon_{n}^{2}=\delta_{n}\log n$, (46) implies

\[ \delta_{n}\,(1+t)^{4}\bigl\{1+\log(\vartheta_{n}^{-1})\bigr\}\leq C\varepsilon_{n}^{2}. \]

Hence

\[ \left|\mathbb{E}[h_{t,\vartheta_{n}}(\bm{Y}_{n,I})]-\int h_{t,\vartheta_{n}}(y)p_{n,I}^{Y}(y)\,dy\right|\leq C\varepsilon_{n}^{2}\pi_{I}(t). \tag{51} \]

Combining (48) and (51),

\[ \left|\mathbb{P}(\bm{Y}_{n,I}\in A_{t}^{-})-\int_{A_{t}^{-}}p_{n,I}^{Y}(y)\,dy\right|\leq C\varepsilon_{n}^{2}\pi_{I}(t). \]

Changing variables $y=-u$ gives (42).

Work on the event $\Omega_{n,X}$ from Lemma A.9. Then, for every $I$ with $|I|\leq k_{0}$,

\[ \|\bar{\bm{b}}_{I}^{\,2}-\Sigma_{II}\|_{\max}\leq Cb^{2}\sqrt{\frac{\log(dn)}{n}},\qquad\|\bar{\bm{b}}_{I}^{\,3}\|_{\infty}\leq Cb^{3}\log(dn). \tag{52} \]

Condition on the data. Define

\[ \xi_{i,I}^{*}:=\frac{1}{\sqrt{n}}w_{i}\bm{b}_{i,I},\qquad W_{I}^{*}:=\sum_{i=1}^{n}\xi_{i,I}^{*}=\bm{S}_{n,I}^{*}. \]

If $w_{i}\sim N(0,1)$, then $\xi_{i,I}^{*}$ has exact Stein kernel

\[ \tau_{i,I}^{*}(\xi_{i,I}^{*})=\frac{1}{n}\bm{b}_{i,I}\bm{b}_{i,I}^{\top}. \]

If Assumption 2.2(ii) holds, then

\[ \tau_{i,I}^{*}(\xi_{i,I}^{*})=\frac{1}{n}\tau^{w}(w_{i})\bm{b}_{i,I}\bm{b}_{i,I}^{\top} \]

is again an exact Stein kernel, because for every smooth vector field $g:\mathbb{R}^{s}\to\mathbb{R}^{s}$,

\[
\begin{aligned}
\mathbb{E}^{*}[\xi_{i,I}^{*\top}g(\xi_{i,I}^{*})] &=\frac{1}{\sqrt{n}}\mathbb{E}^{*}\bigl[w_{i}\bm{b}_{i,I}^{\top}g(n^{-1/2}w_{i}\bm{b}_{i,I})\bigr] \\
&=\frac{1}{n}\mathbb{E}^{*}\bigl[\tau^{w}(w_{i})\mathrm{tr}\{\bm{b}_{i,I}\bm{b}_{i,I}^{\top}\nabla g(n^{-1/2}w_{i}\bm{b}_{i,I})^{\top}\}\bigr].
\end{aligned}
\]

Thus, conditionally on $\Omega_{n,X}$,

\[ \sum_{i=1}^{n}\mathbb{E}^{*}[\tau_{i,I}^{*}(\xi_{i,I}^{*})]=\bar{\bm{b}}_{I}^{\,2},\qquad\mathbb{E}^{*}\left[\sum_{i=1}^{n}(\xi_{i,I}^{*})^{\otimes 3}\right]=\frac{\gamma}{\sqrt{n}}\bar{\bm{b}}_{I}^{\,3},\qquad\beta_{i,I}^{*}\equiv 0. \]

To work with the reflected orthant $A_{t}^{-}$, as was done for $\bm{Y}_{n,I}$ in the non-bootstrap case, from now on redefine $W_{I}^{*}$ with the opposite sign and, conditionally on $\Omega_{n,X}$, set

\[ W_{I}^{*}:=-\bm{S}_{n,I}^{*},\qquad\bar{T}_{I}^{*}:=\bar{\bm{b}}_{I}^{\,2}-\Sigma_{II}. \]

Then

\[ \mathbb{P}^{*}(\bm{S}_{n,I}^{*}\in(t,\infty)^{s})=\mathbb{P}^{*}(W_{I}^{*}\in A_{t}^{-}). \]

Moreover, for $r\in\{2,4,5\}$,

\[ \mathbb{E}^{*}\bigl[\|\nabla^{r}h_{t,u}(W_{I}^{*})\|_{1}\bigr]\leq C_{r}u^{-r/2}(1+t)^{r}\pi_{I}(t), \]

uniformly over $u\in[\vartheta_{n},1/2]$, because Lemma A.12 depends only on the Gaussian reference law $N(0,\Sigma_{II})$. This gives the smoothing error

\[ \left|\mathbb{P}^{*}(W_{I}^{*}\in A_{t}^{-})-\mathbb{E}^{*}[h_{t,\vartheta_{n}}(W_{I}^{*})]\right|\leq C\varepsilon_{n}^{2}\pi_{I}(t). \]

Finally, the coefficient tensors in Koike’s decomposition satisfy

\[ \mathbb{E}^{*}[\|\bar{T}_{I}^{*}\|_{\infty}]+\frac{1}{\sqrt{n}}\|\bar{\bm{b}}_{I}^{\,3}\|_{\infty}\leq C\varepsilon_{n} \]

by (52). Substituting these conditional bounds into the six remainder terms in (50) yields, on $\Omega_{n,X}$,

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{P}^{*}(\bm{S}_{n,I}^{*}\in(t,\infty)^{s})-\int_{(t,\infty)^{s}}\hat{p}_{n,\gamma,I}(u)\,du\right|\leq C\varepsilon_{n}^{2}\pi_{I}(t). \]

Since $\mathbb{P}(\Omega_{n,X}^{c})\leq C/n^{2}$, this proves (43). ∎
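The multiplier construction $\bm{S}_{n}^{*}=n^{-1/2}\sum_{i}w_{i}\bm{b}_{i}$ used in this proof can be sketched numerically. The following is a minimal illustration only, assuming Gaussian multipliers $w_{i}\sim N(0,1)$, centered observations $\bm{b}_{i}=\bm{X}_{i}-\bar{\bm{X}}$, and synthetic standard normal data; the function names `wild_bootstrap_draws` and `critical_value` are hypothetical and do not come from the paper.

```python
import math
import random

def kth_largest(values, k):
    """Return the k-th largest entry of a sequence (k = 1 is the maximum)."""
    return sorted(values, reverse=True)[k - 1]

def wild_bootstrap_draws(X, B, rng):
    """Draw B multiplier-bootstrap vectors S* = n^{-1/2} sum_i w_i b_i.

    Assumptions for this sketch: b_i = X_i - Xbar (centered rows) and
    Gaussian multipliers w_i ~ N(0, 1).
    """
    n, d = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(d)]
    b = [[row[j] - xbar[j] for j in range(d)] for row in X]
    draws = []
    for _ in range(B):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = [sum(w[i] * b[i][j] for i in range(n)) / math.sqrt(n) for j in range(d)]
        draws.append(s)
    return draws

def critical_value(draws, k, alpha):
    """Empirical (1 - alpha)-quantile of the k-th largest bootstrap coordinate."""
    stats = sorted(kth_largest(s, k) for s in draws)
    idx = min(len(stats) - 1, math.ceil((1 - alpha) * len(stats)) - 1)
    return stats[idx]

rng = random.Random(0)
n, d, B = 50, 20, 200
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]  # synthetic data
draws = wild_bootstrap_draws(X, B, rng)
c1 = critical_value(draws, k=1, alpha=0.1)
c2 = critical_value(draws, k=2, alpha=0.1)
print(c1, c2)
```

Because the second-largest coordinate never exceeds the maximum in any single draw, the $k=2$ critical value computed from the same draws cannot exceed the $k=1$ value; this gives a quick sanity check on the implementation.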

Lemma A.11 (Gaussian threshold scale).

Assume Assumptions 2.3 and 2.4. Then there exist constants $0<c_{1}<C_{1}<\infty$ such that

\[ c_{1}\log d\leq t^{2}\leq C_{1}\log d\qquad\text{for every }t\in\mathcal{T}_{k,\epsilon} \]

for all sufficiently large $d$.

Proof.

Set

\[ \lambda(t):=\sum_{j=1}^{d}\bar{\Phi}\!\left(\frac{t}{\sigma_{j}}\right). \]

Lemma A.14 below yields

\[ G_{k}(t)=h_{k}(\lambda(t))+O(\eta_{d}),\qquad h_{k}(\lambda):=e^{-\lambda}\sum_{m=0}^{k-1}\frac{\lambda^{m}}{m!}, \]

with $\eta_{d}\to 0$ uniformly on $\mathcal{T}_{k,\epsilon}$. Since $G_{k}(t)\in[\epsilon/2,1-\epsilon/2]$ on that window and $h_{k}$ is continuous and strictly decreasing, there exist constants $0<\lambda_{-}<\lambda_{+}<\infty$ such that

\[ \lambda_{-}\leq\lambda(t)\leq\lambda_{+}\qquad\text{for every }t\in\mathcal{T}_{k,\epsilon} \tag{53} \]

for all sufficiently large $d$. Also,

\[ d\,\bar{\Phi}\!\left(\frac{t}{\underline{\sigma}}\right)\leq\lambda(t)\leq d\,\bar{\Phi}\!\left(\frac{t}{\overline{\sigma}}\right). \tag{54} \]

Applying Mills’ ratio to (54) and using (53) gives the claim. ∎
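The scale $t^{2}\asymp\log d$ can be checked numerically in the simplest case. The sketch below assumes equal variances $\sigma_{j}=1$, so that $\lambda(t)=d\,\bar{\Phi}(t)$, and replaces $G_{k}$ by its Poisson proxy $h_{k}(\lambda(t))$; the helper `threshold` and the choices $k=3$, level $0.5$ are illustrative assumptions, not part of the paper.

```python
import math

def norm_sf(x):
    """Standard normal upper tail Phi-bar(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def h(k, lam):
    """h_k(lam) = e^{-lam} sum_{m<k} lam^m / m! (Poisson CDF at k - 1)."""
    return math.exp(-lam) * sum(lam**m / math.factorial(m) for m in range(k))

def threshold(d, k, p, sigma=1.0):
    """Solve h_k(d * Phi-bar(t / sigma)) = p for t by bisection."""
    lo, hi = 0.0, 20.0 * sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # h_k(lambda(t)) increases in t because lambda(t) decreases in t.
        if h(k, d * norm_sf(mid / sigma)) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

ratios = []
for d in (10**3, 10**4, 10**5, 10**6):
    t = threshold(d, k=3, p=0.5)
    ratios.append(t * t / math.log(d))
print(ratios)
```

In this toy setting the ratio $t^{2}/\log d$ stays between fixed constants over three orders of magnitude in $d$, which is the behaviour the lemma asserts.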

Lemma A.12 (Gaussian shift and strip bounds).

Assume Assumptions 2.3 and 2.4. Then there exists $c_{0}>0$ such that the following hold uniformly over all nonempty $I\subset[d]$ with $|I|\leq k_{0}$ and all $t\in\mathcal{T}_{k,\epsilon}$:

  1. (i)

    if $0\leq a\leq c_{0}/t$, then

    \[ \pi_{I}(t-a)\leq C\pi_{I}(t); \]
  2. (ii)

    if $0\leq a\leq c_{0}/t$, then

    \[ \pi_{I}(t-a)-\pi_{I}(t)\leq Ca(1+t)\pi_{I}(t). \]
Proof.

Let $I=\{j_{1},\dots,j_{s}\}$ and standardize $Y_{r}:=Z_{j_{r}}/\sigma_{j_{r}}$. The covariance matrix of $(Y_{1},\dots,Y_{s})$ has diagonal entries $1$ and off-diagonal entries bounded by $\rho_{d}/\underline{\sigma}^{2}$. Since $s\rho_{d}\to 0$, Lemma A.3 applied to the correlation matrix yields

\[ \lambda_{\max}(\mathrm{Corr}(Y_{1},\dots,Y_{s}))\leq 2 \]

for all sufficiently large $d$. Hence the Gaussian density on $\mathbb{R}^{s}$ is bounded above and below, on the relevant orthant boundary region, by the density of an independent Gaussian vector up to multiplicative constants depending only on $\underline{\sigma},\overline{\sigma}$. Consequently,

\[ \pi_{I}(t)\asymp\prod_{r=1}^{s}\bar{\Phi}\!\left(\frac{t}{\sigma_{j_{r}}}\right) \tag{55} \]

uniformly for $|I|\leq k_{0}$ and $t\in\mathcal{T}_{k,\epsilon}$. By Mills’ ratio,

\[ \frac{\bar{\Phi}((t-a)/\sigma_{j_{r}})}{\bar{\Phi}(t/\sigma_{j_{r}})}\leq\exp\!\left\{\frac{at}{\underline{\sigma}^{2}}\right\}\leq C\qquad\text{if }0\leq a\leq c_{0}/t, \]

which proves part (i) after multiplication over $r$. Also,

\[ \bar{\Phi}\!\left(\frac{t-a}{\sigma}\right)-\bar{\Phi}\!\left(\frac{t}{\sigma}\right)\leq\frac{a}{\sigma}\phi\!\left(\frac{t-a}{\sigma}\right)\leq Ca(1+t)\bar{\Phi}\!\left(\frac{t}{\sigma}\right), \]

again by Mills’ ratio. Multiplying over coordinates and using (55) proves part (ii). ∎
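For a single coordinate, both parts of the lemma reduce to statements about the standard normal tail, which can be inspected directly. The sketch below assumes $\sigma=1$ and takes $c_{0}=1$ purely for illustration; it evaluates the shift ratio $\bar{\Phi}(t-a)/\bar{\Phi}(t)$ and the strip mass normalised by $a(1+t)\bar{\Phi}(t)$ at the largest admissible shift $a=c_{0}/t$.

```python
import math

def norm_sf(x):
    """Standard normal upper tail Phi-bar(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def shift_ratio(t, a, sigma=1.0):
    """Phi-bar((t - a)/sigma) / Phi-bar(t/sigma), the quantity in part (i)."""
    return norm_sf((t - a) / sigma) / norm_sf(t / sigma)

def strip_mass(t, a, sigma=1.0):
    """Phi-bar((t - a)/sigma) - Phi-bar(t/sigma), the one-coordinate strip probability."""
    return norm_sf((t - a) / sigma) - norm_sf(t / sigma)

c0 = 1.0  # illustrative value; the lemma only asserts that some c0 > 0 works
rows = []
for t in (2.0, 3.0, 4.0, 5.0, 6.0):
    a = c0 / t
    ratio = shift_ratio(t, a)
    # Strip mass divided by a (1 + t) Phi-bar(t): bounded if part (ii) holds.
    normalised = strip_mass(t, a) / (a * (1.0 + t) * norm_sf(t))
    rows.append((t, ratio, normalised))
    print(f"t={t:.1f}  ratio={ratio:.3f}  normalised strip={normalised:.3f}")
```

Both columns remain bounded as $t$ grows, consistent with the existence of a constant $C$ in parts (i) and (ii) when $s=1$.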

Lemma A.13 (Gaussian strip bound for the Edgeworth density).

Under Assumptions 2.1–2.4, for every nonempty $I\subset[d]$ with $|I|\leq k_{0}$, every $t\in\mathcal{T}_{k,\epsilon}$, and every $0\leq a\leq c_{0}/t$,

\[ \int_{(t-a,\infty)^{|I|}\setminus(t,\infty)^{|I|}}\phi_{I}(\bm{u})\,d\bm{u}\leq Ca(1+t)\pi_{I}(t), \tag{56} \]

and

\[ \int_{(t-a,\infty)^{|I|}\setminus(t,\infty)^{|I|}}|p_{n,I}(\bm{u})-\phi_{I}(\bm{u})|\,d\bm{u}\leq Ca(1+t)^{4}n^{-1/2}\pi_{I}(t). \tag{57} \]

On the event of Lemma A.17, the same proof with $\bar{\bm{c}}_{I}^{\,2}-\Sigma_{II}$ replaced by $\bar{\bm{b}}_{I}^{\,2}-\Sigma_{II}$ and $\bar{\bm{c}}_{I}^{\,3}$ replaced by $(\gamma/\sqrt{n})\bar{\bm{b}}_{I}^{\,3}$ gives

\[ \int_{(t-a,\infty)^{|I|}\setminus(t,\infty)^{|I|}}|\hat{p}_{n,\gamma,I}(\bm{u})-\phi_{I}(\bm{u})|\,d\bm{u}\leq Ca(1+t)^{4}n^{-1/2}\pi_{I}(t). \]
Proof.

Let $s:=|I|$. For the Gaussian part, write

\[ \mathcal{S}_{t,a}^{I}:=(t-a,\infty)^{s}\setminus(t,\infty)^{s}. \]

Since

\[ \mathcal{S}_{t,a}^{I}\subset\bigcup_{r=1}^{s}(t-a,t]\times(t-a,\infty)^{r-1}\times(t,\infty)^{s-r}, \]

we have

\[
\begin{aligned}
\int_{\mathcal{S}_{t,a}^{I}}\phi_{I}(u)\,du &=\pi_{I}(t-a)-\pi_{I}(t) \\
&\leq Ca(1+t)\pi_{I}(t)
\end{aligned}
\]

by Lemma A.12(ii). This proves (56).

For the Edgeworth correction, (19) gives

\[ p_{n,I}(u)-\phi_{I}(u)=-\frac{1}{6\sqrt{n}}\left\langle\mathbb{E}[\bar{\bm{X}}_{I}^{\,3}],\nabla^{3}\phi_{I}(u)\right\rangle. \]

Because $|I|\leq k_{0}$ and $\|X_{ij}\|_{\psi_{1}}\leq b$, all components of $\mathbb{E}[\bar{\bm{X}}_{I}^{\,3}]$ are bounded by $Cb^{3}$. Also,

\[ |\partial^{\alpha}\phi_{I}(u)|\leq C_{\alpha}(1+\|u\|_{\infty}^{3})\phi_{I}(u),\qquad|\alpha|=3. \]

Hence

\[
\begin{aligned}
\int_{\mathcal{S}_{t,a}^{I}}|p_{n,I}(u)-\phi_{I}(u)|\,du &\leq\frac{C}{\sqrt{n}}\int_{\mathcal{S}_{t,a}^{I}}(1+\|u\|_{\infty}^{3})\phi_{I}(u)\,du \\
&\leq\frac{C(1+t)^{3}}{\sqrt{n}}\int_{\mathcal{S}_{t,a}^{I}}\phi_{I}(u)\,du \\
&\leq Ca(1+t)^{4}n^{-1/2}\pi_{I}(t),
\end{aligned}
\]

which proves (57).

For the bootstrap density, work on the event $\Omega_{n,X}$ from Lemma A.9. Then

\[ \hat{p}_{n,\gamma,I}(u)-\phi_{I}(u)=\frac{1}{2}\left\langle\bar{\bm{b}}_{I}^{\,2}-\Sigma_{II},\nabla^{2}\phi_{I}(u)\right\rangle-\frac{\gamma}{6\sqrt{n}}\left\langle\bar{\bm{b}}_{I}^{\,3},\nabla^{3}\phi_{I}(u)\right\rangle. \]

By (37)–(38),

\[ \|\bar{\bm{b}}_{I}^{\,2}-\Sigma_{II}\|_{\max}\leq Cb^{2}\sqrt{\frac{\log(dn)}{n}},\qquad\|\bar{\bm{b}}_{I}^{\,3}\|_{\infty}\leq Cb^{3}\log(dn). \]

Using

\[ |\partial^{\alpha}\phi_{I}(u)|\leq C_{\alpha}(1+\|u\|_{\infty}^{|\alpha|})\phi_{I}(u),\qquad|\alpha|\in\{2,3\}, \]

and (56), we obtain on $\Omega_{n,X}$,

\[
\begin{aligned}
\int_{\mathcal{S}_{t,a}^{I}}|\hat{p}_{n,\gamma,I}(u)-\phi_{I}(u)|\,du &\leq C\left\{\sqrt{\frac{\log(dn)}{n}}(1+t)^{2}+\frac{\log(dn)}{\sqrt{n}}(1+t)^{3}\right\}\int_{\mathcal{S}_{t,a}^{I}}\phi_{I}(u)\,du \\
&\leq Ca(1+t)^{4}\varepsilon_{n}\,\pi_{I}(t),
\end{aligned}
\]

because $\varepsilon_{n}\geq Cn^{-1/2}\log(dn)$ under Assumption 2.1. This is the bootstrap analogue of (57). ∎

A.5 Gaussian factorial moments, aggregation, and regularity

Define

\[ p_{j}(t):=\mathbb{P}(Z_{j}>t)=\bar{\Phi}\!\left(\frac{t}{\sigma_{j}}\right),\qquad\lambda(t):=\sum_{j=1}^{d}p_{j}(t). \]

Also define the elementary symmetric polynomial

\[ e_{s}(t):=\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\prod_{j\in I}p_{j}(t). \]
Lemma A.14 (Gaussian factorial moments).

Assume Assumptions 2.3 and 2.4. Then there exists a sequence $\eta_{d}\downarrow 0$ such that, uniformly over $t\in\mathcal{T}_{k,\epsilon}$ and $1\leq s\leq k_{0}$,

\[ \left|V_{Z,s}(t)-\frac{\lambda(t)^{s}}{s!}\right|\leq C\eta_{d}, \tag{58} \]

where one may take

\[ \eta_{d}:=C\Bigl(d^{-a_{\sigma}}(\log d)^{-1/2}+\rho_{d}\log d\Bigr),\qquad a_{\sigma}:=\frac{\underline{\sigma}^{2}}{\overline{\sigma}^{2}}. \]

Consequently,

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}|G_{k}(t)-h_{k}(\lambda(t))|\leq C\eta_{d},\qquad h_{k}(\lambda):=e^{-\lambda}\sum_{m=0}^{k-1}\frac{\lambda^{m}}{m!}. \tag{59} \]
Proof.

Fix $s\in\{1,\dots,k_{0}\}$. First compare $V_{Z,s}(t)$ with the elementary symmetric polynomial $e_{s}(t)$. For every $I=\{j_{1},\dots,j_{s}\}$, Lemma A.4 applied repeatedly to the standardized vector $\bigl(Z_{j_{r}}/\sigma_{j_{r}}\bigr)_{r\leq s}$ yields

\[ \left|\pi_{I}(t)-\prod_{j\in I}p_{j}(t)\right|\leq C\rho_{d}(1+t^{2})\prod_{j\in I}p_{j}(t). \]

Summing over $|I|=s$ and using $t^{2}\asymp\log d$ from Lemma A.11 gives

\[ |V_{Z,s}(t)-e_{s}(t)|\leq C\rho_{d}\log d\,e_{s}(t). \tag{60} \]

Next compare $e_{s}(t)$ with $\lambda(t)^{s}/s!$. Writing

\[ \lambda(t)^{s}=\sum_{j_{1},\dots,j_{s}=1}^{d}p_{j_{1}}(t)\cdots p_{j_{s}}(t) \]

and separating the terms with repeated indices, we obtain

\[ \left|\frac{\lambda(t)^{s}}{s!}-e_{s}(t)\right|\leq C_{s}\lambda(t)^{s-2}\sum_{j=1}^{d}p_{j}(t)^{2}. \tag{61} \]

Because $\lambda(t)$ stays in a compact interval by Lemma A.11 and

\[ \max_{j}p_{j}(t)\leq\bar{\Phi}\!\left(\frac{t}{\overline{\sigma}}\right)\leq Cd^{-a_{\sigma}}(\log d)^{-1/2}, \]

we obtain

\[ \sum_{j=1}^{d}p_{j}(t)^{2}\leq\lambda(t)\max_{j}p_{j}(t)\leq Cd^{-a_{\sigma}}(\log d)^{-1/2}. \]

Combining this with (60) and (61) proves (58).

Now apply Lemma A.1 to $N_{Z}(t)$:

\[ G_{k}(t)=1-\sum_{s=k}^{\infty}(-1)^{s-k}\binom{s-1}{k-1}V_{Z,s}(t). \]

The same identity with $V_{Z,s}(t)$ replaced by $\lambda(t)^{s}/s!$ equals $h_{k}(\lambda(t))$. Using (58) for $s\leq k_{0}$ and the Gaussian tail bound from Lemma A.15(ii) below for $s>k_{0}$ gives (59). ∎
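Two steps of this proof lend themselves to a quick numerical check: the comparison of $e_{s}(t)$ with $\lambda(t)^{s}/s!$ in (61), and the fact that the weighted alternating series with $\lambda^{s}/s!$ in place of $V_{Z,s}$ sums to $h_{k}(\lambda)$. The sketch below assumes equal exceedance probabilities $p_{j}=\lambda/d$ and truncates the series at a hypothetical cutoff `s_max`; neither choice comes from the paper.

```python
import math

def elementary_symmetric(p, s_max):
    """Return [e_0, ..., e_{s_max}] of the numbers p via the standard recursion."""
    e = [1.0] + [0.0] * s_max
    for pj in p:
        for s in range(s_max, 0, -1):
            e[s] += pj * e[s - 1]
    return e

def h(k, lam):
    """h_k(lam) = e^{-lam} sum_{m<k} lam^m / m!."""
    return math.exp(-lam) * sum(lam**m / math.factorial(m) for m in range(k))

def inclusion_exclusion(k, lam, s_max=80):
    """1 - sum_{s=k}^{s_max} (-1)^{s-k} C(s-1, k-1) lam^s / s! (truncated series)."""
    total = sum((-1) ** (s - k) * math.comb(s - 1, k - 1) * lam**s / math.factorial(s)
                for s in range(k, s_max + 1))
    return 1.0 - total

d, lam, k = 2000, 1.5, 3
p = [lam / d] * d                      # equal exceedance probabilities (illustrative)
e = elementary_symmetric(p, 5)
gaps = [abs(e[s] - lam**s / math.factorial(s)) for s in range(1, 6)]
print(gaps)
print(inclusion_exclusion(k, lam), h(k, lam))
```

The gaps are of order $\lambda^{s}/d$, in line with the right-hand side of (61) when $\sum_{j}p_{j}^{2}=\lambda^{2}/d$, and the two printed values in the last line agree up to floating-point error.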

Lemma A.15 (Weighted aggregation and Gaussian regularity).

Assume Assumptions 2.3 and 2.4. Then the following hold.

  1. (i)

    There exist constants $0<\lambda_{-}<\lambda_{+}<\infty$ such that

    \[ \lambda_{-}\leq\lambda(t)\leq\lambda_{+}\qquad\text{for every }t\in\mathcal{T}_{k,\epsilon} \]

    for all sufficiently large $d$.

  2. (ii)

    If $A>0$ is large enough in the definition of $k_{0}$, then

    \[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\sum_{s=k}^{k_{0}}\binom{s-1}{k-1}M_{Z,s}(t)\leq C, \tag{62} \]

    and

    \[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\sum_{s=k_{0}+1}^{d}\binom{s-1}{k-1}M_{Z,s}(t)\leq C\varepsilon_{n}^{2}. \tag{63} \]
  3. (iii)

    $G_{k}$ is $C^{2}$ on a neighborhood of $\mathcal{T}_{k,\epsilon}$, and there exist constants $m_{k,\epsilon},B_{k,\epsilon}>0$ such that

    \[ f_{k}(c^{G}_{p,k})\geq m_{k,\epsilon},\qquad\bigl|(G_{k}^{-1})^{\prime\prime}(p)\bigr|\leq B_{k,\epsilon}\qquad\text{for }p\in[\epsilon/2,1-\epsilon/2]. \tag{64} \]
Proof.

Part (i) was already proved in the proof of Lemma A.11. For part (ii), use Lemma A.14:

\[ M_{Z,s}(t)=V_{Z,s}(t)=\frac{\lambda(t)^{s}}{s!}+O(\eta_{d})\qquad(1\leq s\leq k_{0}). \]

Since $\lambda(t)\leq\lambda_{+}$ and $k$ is fixed,

\[ \sum_{s=k}^{\infty}\binom{s-1}{k-1}\frac{\lambda_{+}^{s}}{s!}<\infty. \]

Therefore (62) follows. For the tail, use

\[ M_{Z,s}(t)=V_{Z,s}(t)\leq C\frac{\lambda_{+}^{s}}{s!} \]

for every $s\geq 1$, and then Stirling’s formula gives

\[ \sum_{s=k_{0}+1}^{\infty}\binom{s-1}{k-1}\frac{\lambda_{+}^{s}}{s!}\leq C\exp(-ck_{0}\log k_{0}). \]

Choosing $A$ large enough yields (63).

For part (iii), write

\[ H_{k}(t):=h_{k}(\lambda(t)). \]

On the compact interval $[\lambda_{-},\lambda_{+}]$ one has

\[ h_{k}^{\prime}(\lambda)=-e^{-\lambda}\frac{\lambda^{k-1}}{(k-1)!},\qquad\inf_{\lambda\in[\lambda_{-},\lambda_{+}]}|h_{k}^{\prime}(\lambda)|>0. \]

Moreover,

\[ \lambda^{\prime}(t)=-\sum_{j=1}^{d}\frac{1}{\sigma_{j}}\phi\!\left(\frac{t}{\sigma_{j}}\right),\qquad\lambda^{\prime\prime}(t)=\sum_{j=1}^{d}\frac{t}{\sigma_{j}^{3}}\phi\!\left(\frac{t}{\sigma_{j}}\right). \]

By Mills’ ratio and Lemma A.11,

\[ |\lambda^{\prime}(t)|\asymp t,\qquad|\lambda^{\prime\prime}(t)|\leq C(1+t^{2})\qquad\text{on }\mathcal{T}_{k,\epsilon}. \]

Hence

\[ H_{k}^{\prime}(t)=h_{k}^{\prime}(\lambda(t))\lambda^{\prime}(t),\qquad H_{k}^{\prime\prime}(t)=h_{k}^{\prime\prime}(\lambda(t))(\lambda^{\prime}(t))^{2}+h_{k}^{\prime}(\lambda(t))\lambda^{\prime\prime}(t) \]

with

\[ |H_{k}^{\prime}(t)|\asymp t,\qquad|H_{k}^{\prime\prime}(t)|\leq C(1+t^{2}). \tag{65} \]

Differentiating the factorial expansion termwise and using the same argument as in Lemma A.14,

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}|G_{k}^{\prime}(t)-H_{k}^{\prime}(t)|+\sup_{t\in\mathcal{T}_{k,\epsilon}}|G_{k}^{\prime\prime}(t)-H_{k}^{\prime\prime}(t)|\leq C\eta_{d}(1+t^{2}). \tag{66} \]

Since $\eta_{d}\to 0$ and $t\asymp\sqrt{\log d}$, (65) and (66) imply

\[ f_{k}(t)=G_{k}^{\prime}(t)\geq ct\geq m_{k,\epsilon}>0\qquad\text{for }t\in\mathcal{T}_{k,\epsilon} \]

for all sufficiently large $d$. Finally,

\[ (G_{k}^{-1})^{\prime\prime}(p)=-\frac{f_{k}^{\prime}(G_{k}^{-1}(p))}{f_{k}(G_{k}^{-1}(p))^{3}} \]

and (65)–(66) imply the asserted boundedness. ∎
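The closed form for $h_{k}^{\prime}$ and the super-exponential decay of the weighted tail used in part (ii) can both be checked with a few lines of arithmetic. This is a sketch under illustrative values ($k=3$, $\lambda_{+}=2$) and with the infinite tail truncated at a hypothetical cutoff `s_max`; it does not reproduce the constants of the lemma.

```python
import math

def h(k, lam):
    """h_k(lam) = e^{-lam} sum_{m<k} lam^m / m!."""
    return math.exp(-lam) * sum(lam**m / math.factorial(m) for m in range(k))

def h_prime(k, lam):
    """Closed form h_k'(lam) = -e^{-lam} lam^{k-1} / (k-1)!."""
    return -math.exp(-lam) * lam ** (k - 1) / math.factorial(k - 1)

def weighted_tail(k, lam, k0, s_max=120):
    """sum_{s=k0+1}^{s_max} C(s-1, k-1) lam^s / s! (tail truncated at s_max)."""
    return sum(math.comb(s - 1, k - 1) * lam**s / math.factorial(s)
               for s in range(k0 + 1, s_max + 1))

k, lam, step = 3, 2.0, 1e-6
# Central finite difference as an independent check of the closed form.
fd = (h(k, lam + step) - h(k, lam - step)) / (2 * step)
print(fd, h_prime(k, lam))

tails = [weighted_tail(k, lam, k0) for k0 in (5, 10, 20, 40)]
print(tails)
```

The finite difference matches the closed form, and the tail values collapse rapidly as $k_{0}$ increases, which is the mechanism behind choosing $A$ large enough in (63).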

A.6 Factorial-moment and distribution expansions

Theorem A.1 (Factorial-moment expansion).

Assume Assumptions 2.1–2.4. Then

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\sum_{s=k}^{k_{0}}\binom{s-1}{k-1}|V_{n,s}(t)-M_{n,s}(t)|\leq C\varepsilon_{n}^{2}, \tag{67} \]

and, with probability at least $1-C/n$,

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\sum_{s=k}^{k_{0}}\binom{s-1}{k-1}|V^{*}_{n,s}(t)-\hat{M}_{n,s,\gamma}(t)|\leq C\varepsilon_{n}^{2}. \tag{68} \]

Comment. Theorem A.1 converts the local projected Edgeworth expansions into a weighted approximation for the factorial moments of the exceedance count. This is the combinatorial bridge from rare orthant probabilities to the law of the $k$th largest coordinate.

Proof.

For every integer $s\geq 1$,

\[ V_{n,s}(t)=\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\mathbb{P}\bigl(\bm{S}_{n,I}\in(t,\infty)^{s}\bigr),\qquad M_{n,s}(t)=\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\int_{(t,\infty)^{s}}p_{n,I}(u)\,du. \]

Hence

\[
\begin{aligned}
|V_{n,s}(t)-M_{n,s}(t)| &\leq\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\left|\mathbb{P}\bigl(\bm{S}_{n,I}\in(t,\infty)^{s}\bigr)-\int_{(t,\infty)^{s}}p_{n,I}(u)\,du\right| \\
&\leq C\varepsilon_{n}^{2}\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\pi_{I}(t)=C\varepsilon_{n}^{2}M_{Z,s}(t)
\end{aligned}
\]

by Proposition A.1. Summing with the weights $\binom{s-1}{k-1}$ and using (62) gives (67).

For the bootstrap statement, work on the event from (43). Then

\[ V_{n,s}^{*}(t)=\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\mathbb{P}^{*}\bigl(\bm{S}_{n,I}^{*}\in(t,\infty)^{s}\bigr),\qquad\hat{M}_{n,s,\gamma}(t)=\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\int_{(t,\infty)^{s}}\hat{p}_{n,\gamma,I}(u)\,du, \]

and therefore

\[
\begin{aligned}
|V_{n,s}^{*}(t)-\hat{M}_{n,s,\gamma}(t)| &\leq\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\left|\mathbb{P}^{*}\bigl(\bm{S}_{n,I}^{*}\in(t,\infty)^{s}\bigr)-\int_{(t,\infty)^{s}}\hat{p}_{n,\gamma,I}(u)\,du\right| \\
&\leq C\varepsilon_{n}^{2}\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\pi_{I}(t)=C\varepsilon_{n}^{2}M_{Z,s}(t).
\end{aligned}
\]

Summing again with the weights $\binom{s-1}{k-1}$ and using (62) proves (68). ∎

Theorem A.2 (Distribution expansion).

Assume Assumptions 2.1–2.4. Then

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{P}(T_{n,[k]}\leq t)-\bigl(G_{k}(t)+Q_{n,k}(t)\bigr)\right|\leq C\varepsilon_{n}^{2}, \tag{69} \]

and, with probability at least $1-C/n$,

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{P}^{*}(T_{n,[k]}^{*}\leq t)-\bigl(G_{k}(t)+\hat{Q}_{n,\gamma,k}(t)\bigr)\right|\leq C\varepsilon_{n}^{2}. \tag{70} \]

Moreover,

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\Bigl(|Q_{n,k}(t)|+|Q_{n,k}^{\prime}(t)|+|Q_{n,k}^{\prime\prime}(t)|\Bigr)\leq C\varepsilon_{n}, \tag{71} \]

and, with probability at least $1-C/n$,

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\Bigl(|\hat{Q}_{n,\gamma,k}(t)|+|\hat{Q}_{n,\gamma,k}^{\prime}(t)|+|\hat{Q}_{n,\gamma,k}^{\prime\prime}(t)|\Bigr)\leq C\varepsilon_{n}. \tag{72} \]

Comment. Theorem A.2 upgrades the factorial-moment approximation to a distributional expansion for $T_{n,[k]}$ and its bootstrap analogue. It also shows that the correction term $Q_{n,k}$ is smooth enough for the quantile inversion carried out later.

Proof.

By Lemma A.1,

\[ \mathbb{P}(T_{n,[k]}>t)=\sum_{s=k}^{d}(-1)^{s-k}\binom{s-1}{k-1}V_{n,s}(t). \tag{73} \]

Split the right-hand side at $k_{0}$. For $k\leq s\leq k_{0}$, Theorem A.1 gives

\[ V_{n,s}(t)=M_{n,s}(t)+r_{n,s}(t),\qquad|r_{n,s}(t)|\leq C\varepsilon_{n}^{2}M_{Z,s}(t). \tag{74} \]

Substituting (74) into (73), and using

\[ \sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}M_{Z,s}(t)=\mathbb{P}(T_{\bm{Z},[k]}>t)+O(\varepsilon_{n}^{2}) \]

which follows from Lemma A.1 applied to $\bm{Z}$ together with the tail bound (63), we obtain

\[ \mathbb{P}(T_{n,[k]}\leq t)=G_{k}(t)-\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}\{M_{n,s}(t)-M_{Z,s}(t)\}+O(\varepsilon_{n}^{2}). \]

The sum equals $Q_{n,k}(t)$ by (1), so (69) follows.

For the bootstrap expansion, work on the event in (68). On that event,

\[
\begin{aligned}
\mathbb{P}^{*}(T_{n,[k]}^{*}\leq t) &=G_{k}(t)-\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}\{V_{n,s}^{*}(t)-V_{Z,s}(t)\}+O(\varepsilon_{n}^{2}) \\
&=G_{k}(t)-\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}\{\hat{M}_{n,s,\gamma}(t)-M_{Z,s}(t)\}+O(\varepsilon_{n}^{2}),
\end{aligned}
\]

uniformly on $\mathcal{T}_{k,\epsilon}$, which is exactly (70).

It remains to prove the derivative bounds. Fix $s\in\{k,\dots,k_{0}\}$. By (19),

\[ M_{n,s}(t)-M_{Z,s}(t)=-\frac{1}{6\sqrt{n}}\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\int_{(t,\infty)^{s}}\left\langle\mathbb{E}[\bar{\bm{X}}_{I}^{\,3}],\nabla^{3}\phi_{I}(u)\right\rangle\,du. \tag{75} \]

Since $\|\mathbb{E}[\bar{\bm{X}}_{I}^{\,3}]\|_{\infty}\leq C$ uniformly for $|I|\leq k_{0}$,

\[ |M_{n,s}(t)-M_{Z,s}(t)|\leq\frac{C(1+t)^{3}}{\sqrt{n}}M_{Z,s}(t)\leq C\varepsilon_{n}M_{Z,s}(t), \tag{76} \]

where the first inequality follows from the Gaussian derivative bound

\[ |\partial^{\alpha}\phi_{I}(u)|\leq C_{\alpha}(1+\|u\|_{\infty}^{3})\phi_{I}(u),\qquad|\alpha|=3, \]

and the second uses $t^{2}\asymp\log d$.

Differentiate (75) with respect to $t$. By the fundamental theorem of calculus, each derivative creates a finite sum of boundary integrals over $(s-1)$-dimensional faces. Therefore

\[ \frac{d}{dt}\{M_{n,s}(t)-M_{Z,s}(t)\}=\frac{1}{6\sqrt{n}}\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\sum_{r=1}^{s}\int_{(t,\infty)^{s-1}}\left\langle\mathbb{E}[\bar{\bm{X}}_{I}^{\,3}],\nabla^{3}\phi_{I}(u^{(r,t)})\right\rangle\,du_{-r}, \]

hence

\[ \left|\frac{d}{dt}\{M_{n,s}(t)-M_{Z,s}(t)\}\right|\leq C\varepsilon_{n}M_{Z,s}(t). \]

Differentiating once more produces second-face integrals and diagonal boundary terms. The same Gaussian derivative estimate and the strip estimate of Lemma A.13 imply

\[ \left|\frac{d^{2}}{dt^{2}}\{M_{n,s}(t)-M_{Z,s}(t)\}\right|\leq C\varepsilon_{n}M_{Z,s}(t). \tag{77} \]

Summing (76)–(77) with the weights in (1) and using (62) proves (71).

For the bootstrap derivative bounds, work on $\Omega_{n,X}$ from Lemma A.9. By (20),

\[
\begin{aligned}
\hat{M}_{n,s,\gamma}(t)-M_{Z,s}(t) &=\frac{1}{2}\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\int_{(t,\infty)^{s}}\left\langle\bar{\bm{b}}_{I}^{\,2}-\Sigma_{II},\nabla^{2}\phi_{I}(u)\right\rangle\,du \\
&\qquad-\frac{\gamma}{6\sqrt{n}}\sum_{\begin{subarray}{c}I\subset[d]\\ |I|=s\end{subarray}}\int_{(t,\infty)^{s}}\left\langle\bar{\bm{b}}_{I}^{\,3},\nabla^{3}\phi_{I}(u)\right\rangle\,du.
\end{aligned}
\]

Using (37)–(38), together with the Gaussian derivative estimates for $|\alpha|=2,3$ and the same boundary differentiation as above, yields

\[ |\hat{M}_{n,s,\gamma}(t)-M_{Z,s}(t)|+\left|\frac{d}{dt}\{\hat{M}_{n,s,\gamma}(t)-M_{Z,s}(t)\}\right|+\left|\frac{d^{2}}{dt^{2}}\{\hat{M}_{n,s,\gamma}(t)-M_{Z,s}(t)\}\right|\leq C\varepsilon_{n}M_{Z,s}(t) \]

uniformly on $\Omega_{n,X}$. Summing with the weights in (1) proves (72). ∎
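The weighted inclusion–exclusion identity (73) is exact for any exceedance count, so it can be verified by brute force in a small case. The sketch below assumes independent exceedance indicators with hypothetical probabilities `p` (so that $V_{s}$ is an elementary symmetric polynomial) and compares the right-hand side of (73) with direct enumeration of all $2^{d}$ outcomes.

```python
import itertools
import math

def factorial_moments(p):
    """V_s = sum_{|I|=s} prod_{j in I} p_j for independent exceedance indicators."""
    d = len(p)
    return [sum(math.prod(p[j] for j in I) for I in itertools.combinations(range(d), s))
            for s in range(d + 1)]

def tail_by_inclusion_exclusion(p, k):
    """Right-hand side of (73): sum_{s=k}^d (-1)^{s-k} C(s-1, k-1) V_s."""
    V = factorial_moments(p)
    return sum((-1) ** (s - k) * math.comb(s - 1, k - 1) * V[s]
               for s in range(k, len(p) + 1))

def tail_by_enumeration(p, k):
    """P(at least k exceedances), summing over all 2^d outcomes."""
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(p)):
        if sum(bits) >= k:
            total += math.prod(pj if b else 1.0 - pj for pj, b in zip(p, bits))
    return total

p = [0.05, 0.10, 0.20, 0.15, 0.30, 0.08]   # illustrative exceedance probabilities
for k in (1, 2, 3):
    print(k, tail_by_inclusion_exclusion(p, k), tail_by_enumeration(p, k))
```

The event that the $k$th largest coordinate exceeds $t$ is the event of at least $k$ exceedances, so agreement of the two columns is exactly the content of (73) in this toy setting.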

A.7 Bootstrap centering and Cornish–Fisher inversion

Lemma A.16 (Bootstrap centering).

Assume Assumptions 2.1–2.4. Then

\[ \sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{E}\bigl[\hat{Q}_{n,\gamma,k}(t)\bigr]-\gamma Q_{n,k}(t)\right|\leq C\varepsilon_{n}^{2}. \tag{78} \]
Proof.

Fix $s\in\{k,\dots,k_{0}\}$. For every $I$ with $|I|=s$,

\begin{align}
\mathbb{E}\bigl[\bar{\bm{b}}_{I}^{\,2}\bigr]-\mathbf{\Sigma}_{II} &=-\frac{1}{n}\mathbf{\Sigma}_{II}, \tag{79} \\
\left\|\mathbb{E}\bigl[\bar{\bm{b}}_{I}^{\,3}\bigr]-\mathbb{E}\bigl[\bar{\bm{X}}_{I}^{\,3}\bigr]\right\|_{\max} &\leq\frac{C}{n}. \tag{80}
\end{align}

Indeed, (79) is the usual bias of the sample covariance, and (80) follows by expanding $(\bm{P}_{I}\bm{X}_{i}-\bm{P}_{I}\bar{\bm{X}})^{\otimes 3}$ and observing that every difference term contains at least one factor $\bar{\bm{X}}$.

Now integrate (20) over $(t,\infty)^{s}$, take expectations, subtract $\gamma\{M_{n,s}(t)-M_{Z,s}(t)\}$, and use (79)–(80). By Lemma A.13,

\[ \left|\mathbb{E}\bigl[\hat{M}_{n,s,\gamma}(t)-M_{Z,s}(t)\bigr]-\gamma\{M_{n,s}(t)-M_{Z,s}(t)\}\right|\leq C\left(\frac{(1+t)^{2}}{n}+\frac{(1+t)^{3}}{n^{3/2}}\right)M_{Z,s}(t). \]

Since $t\asymp\sqrt{\log d}$ on $\mathcal{T}_{k,\epsilon}$, the right-hand side is $O(\varepsilon_{n}^{2}M_{Z,s}(t))$. Summing over $s$ with the weights in (1) and using (62) proves (78). ∎
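The bias identity (79) can be confirmed exactly in one dimension by enumerating a small discrete law with rational arithmetic. The sketch below assumes a hypothetical three-point distribution and $n=4$; it computes $\mathbb{E}\bigl[n^{-1}\sum_{i}(X_{i}-\bar{X})^{2}\bigr]-\sigma^{2}$ and compares it with $-\sigma^{2}/n$.

```python
from fractions import Fraction
import itertools

def expected_centered_second_moment(support, n):
    """E[(1/n) sum_i (X_i - Xbar)^2] for i.i.d. X_i uniform on `support`, by enumeration."""
    total = Fraction(0)
    count = 0
    for sample in itertools.product(support, repeat=n):
        xbar = Fraction(sum(sample), n)
        total += sum((Fraction(x) - xbar) ** 2 for x in sample) / n
        count += 1
    return total / count

support = (-1, 0, 2)                       # illustrative three-point law
mean = Fraction(sum(support), len(support))
var = sum((Fraction(x) - mean) ** 2 for x in support) / len(support)
n = 4
bias = expected_centered_second_moment(support, n) - var
print(bias, -var / n)
```

Both printed fractions coincide, which is the scalar case of (79); the matrix statement follows coordinatewise by the same computation.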

Theorem A.3 (Cornish–Fisher expansion).

Assume Assumptions 2.1–2.4. Then, with probability at least $1-C/n$,

\[ \sup_{\epsilon<\alpha<1-\epsilon}\left|\hat{c}_{1-\alpha,k}-\left[c^{G}_{1-\alpha,k}-\frac{\hat{Q}_{n,\gamma,k}(c^{G}_{1-\alpha,k})}{f_{k}(c^{G}_{1-\alpha,k})}+R_{n,k}(\alpha)\right]\right|\leq C\varepsilon_{n}^{3}. \tag{81} \]

Comment. Theorem A.3 identifies the bootstrap critical value as a Gaussian quantile perturbed by an explicit linear term and a quadratic correction. This is the quantile-level expansion needed to turn the distributional approximation into a coverage expansion.

Proof.

Fix $\alpha\in(\epsilon,1-\epsilon)$ and abbreviate

\[ c_{k}:=c^{G}_{1-\alpha,k},\qquad\hat{Q}_{k}:=\hat{Q}_{n,\gamma,k}(c_{k}),\qquad\hat{Q}_{k}^{\prime}:=\hat{Q}_{n,\gamma,k}^{\prime}(c_{k}). \]

On the event of (70) and (72),

\[ \hat{F}_{n,k}(t)=G_{k}(t)+\hat{Q}_{n,\gamma,k}(t)+r_{n}(t),\qquad\sup_{t\in\mathcal{T}_{k,\epsilon}}|r_{n}(t)|\leq C\varepsilon_{n}^{2}. \tag{82} \]

Because $G_{k}(c_{k})=1-\alpha$ and $f_{k}(c_{k})\geq m_{k,\epsilon}>0$, the implicit function theorem yields a unique root $\hat{c}_{1-\alpha,k}=c_{k}+\Delta_{k}$ with $|\Delta_{k}|\leq C\varepsilon_{n}$. Substituting $t=c_{k}+\Delta_{k}$ into (82) and using Taylor’s formula up to order $2$ gives

\[
\begin{aligned}
0 &=\hat{F}_{n,k}(c_{k}+\Delta_{k})-(1-\alpha) \\
&=f_{k}(c_{k})\Delta_{k}+\frac{1}{2}f_{k}^{\prime}(c_{k})\Delta_{k}^{2}+\hat{Q}_{k}+\hat{Q}_{k}^{\prime}\Delta_{k}+\frac{1}{2}\hat{Q}_{n,\gamma,k}^{\prime\prime}(\xi_{k})\Delta_{k}^{2}+r_{n}(c_{k}+\Delta_{k})
\end{aligned}
\tag{83}
\]

for some $\xi_{k}$ between $c_{k}$ and $c_{k}+\Delta_{k}$. Since $\hat{Q}_{n,\gamma,k}^{\prime\prime}(\xi_{k})=O(\varepsilon_{n})$ by (72), the last quadratic term in (83) is $O(\varepsilon_{n}^{3})$. Solving (83) iteratively,

\[ \Delta_{k}=-\frac{\hat{Q}_{k}}{f_{k}(c_{k})}+\frac{f_{k}^{\prime}(c_{k})}{2f_{k}(c_{k})^{3}}\hat{Q}_{k}^{2}-\frac{\hat{Q}_{k}^{\prime}}{f_{k}(c_{k})^{2}}\hat{Q}_{k}+O(\varepsilon_{n}^{3}). \]

This is exactly (81); compare also the classical Cornish–Fisher inversion formulas in Hall (1992, Chapter 2). ∎
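The role of the linear term $-\hat{Q}_{k}/f_{k}(c_{k})$ can be illustrated with a toy inversion. The sketch below is not the paper's setting: it assumes $G=\Phi$ (standard normal) in place of $G_{k}$ and a hypothetical smooth correction $Q(t)=\varepsilon(1-t^{2})\phi(t)$ in place of $\hat{Q}_{n,\gamma,k}$, solves $G(t)+Q(t)=1-\alpha$ by bisection, and measures how far the root is from the linearised quantile.

```python
import math

def Phi(x):
    """Standard normal CDF (stand-in for G_k)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def phi(x):
    """Standard normal density (stand-in for f_k)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Q(t, eps):
    """Toy smooth correction of size eps (hypothetical, Edgeworth-like)."""
    return eps * (1.0 - t * t) * phi(t)

def bisect(fn, lo, hi, iters=200):
    """Root of a function that is increasing on [lo, hi] with a sign change."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def linear_error(eps, level=0.9):
    """|exact perturbed quantile - (Gaussian quantile - Q/f)| for the toy model."""
    c = bisect(lambda t: Phi(t) - level, 0.5, 2.5)
    c_hat = bisect(lambda t: Phi(t) + Q(t, eps) - level, 0.5, 2.5)
    c_lin = c - Q(c, eps) / phi(c)
    return abs(c_hat - c_lin)

err_coarse, err_fine = linear_error(0.02), linear_error(0.01)
print(err_coarse, err_fine, err_coarse / err_fine)
```

Halving $\varepsilon$ reduces the residual by roughly a factor of four, which is the quadratic behaviour that the correction $R_{n,k}(\alpha)$ in (81) is meant to capture.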

A.8 Coverage expansion

Proof of Theorem 2.1.

Fix $\alpha\in(\epsilon,1-\epsilon)$ and write

\[ c_{k}:=c^{G}_{1-\alpha,k},\qquad F_{n,k}(t):=\mathbb{P}(T_{n,[k]}\leq t). \]

By (69) and (71),

\[ F_{n,k}(t)=G_{k}(t)+Q_{n,k}(t)+r_{n}(t),\qquad\sup_{t\in\mathcal{T}_{k,\epsilon}}|r_{n}(t)|\leq C\varepsilon_{n}^{2}. \tag{84} \]

Let $E_{n}$ denote the event on which the Cornish–Fisher expansion (81) holds and $\hat{c}_{1-\alpha,k}\in\mathcal{T}_{k,\epsilon/2}$. Then

\[ \mathbb{P}(E_{n}^{c})\leq\frac{C}{n}\leq C\varepsilon_{n}^{2}. \]

On $E_{n}$ define

\[ \Delta_{k}:=\hat{c}_{1-\alpha,k}-c_{k}. \]

By Theorem A.3,

\[ \Delta_{k}=-\frac{\hat{Q}_{n,\gamma,k}(c_{k})}{f_{k}(c_{k})}+R_{n,k}(\alpha)+\zeta_{n,k}(\alpha),\qquad|\zeta_{n,k}(\alpha)|\leq C\varepsilon_{n}^{3}. \tag{85} \]

Also $|\Delta_{k}|\leq C\varepsilon_{n}$ on $E_{n}$. Since $F_{n,k}$ is deterministic, Taylor’s formula on $E_{n}$ gives

\[ F_{n,k}(\hat{c}_{1-\alpha,k})=F_{n,k}(c_{k})+F_{n,k}^{\prime}(c_{k})\Delta_{k}+\frac{1}{2}F_{n,k}^{\prime\prime}(\xi_{n,k})\Delta_{k}^{2} \tag{86} \]

for some $\xi_{n,k}$ between $c_{k}$ and $\hat{c}_{1-\alpha,k}$. From (84), (71), and (64),

\[ F_{n,k}(c_{k})=(1-\alpha)+Q_{n,k}(c_{k})+O(\varepsilon_{n}^{2}), \]
\[ F_{n,k}^{\prime}(c_{k})=f_{k}(c_{k})+Q_{n,k}^{\prime}(c_{k})+O(\varepsilon_{n}^{2}), \]

and

\[ F_{n,k}^{\prime\prime}(\xi_{n,k})=f_{k}^{\prime}(c_{k})+O(1). \]

Substituting these bounds and (85) into (86), using $Q_{n,k}^{\prime}(c_{k})=O(\varepsilon_{n})$ and $\Delta_{k}=O(\varepsilon_{n})$, yields on $E_{n}$,

\[ F_{n,k}(\hat{c}_{1-\alpha,k})=(1-\alpha)+Q_{n,k}(c_{k})-\hat{Q}_{n,\gamma,k}(c_{k})+R_{n,k}(\alpha)+O(\varepsilon_{n}^{2}). \tag{87} \]

Now take expectations. Since $0\leq F_{n,k}(\hat{c}_{1-\alpha,k})\leq 1$,

\[ \left|\mathbb{E}\bigl[F_{n,k}(\hat{c}_{1-\alpha,k})\mathbf{1}_{E_{n}^{c}}\bigr]\right|\leq\mathbb{P}(E_{n}^{c})\leq C\varepsilon_{n}^{2}. \]

Therefore

\[ \mathbb{P}(T_{n,[k]}\leq\hat{c}_{1-\alpha,k})=\mathbb{E}\bigl[F_{n,k}(\hat{c}_{1-\alpha,k})\mathbf{1}_{E_{n}}\bigr]+O(\varepsilon_{n}^{2}). \tag{88} \]

Taking expectations in (87) and using Lemma A.16,

\[ \mathbb{E}\bigl[Q_{n,k}(c_{k})-\hat{Q}_{n,\gamma,k}(c_{k})\bigr]=(1-\gamma)Q_{n,k}(c_{k})+O(\varepsilon_{n}^{2}). \]

Combining this with (88) gives

\[ \mathbb{P}(T_{n,[k]}\leq\hat{c}_{1-\alpha,k})=(1-\alpha)+(1-\gamma)Q_{n,k}(c_{k})+\mathbb{E}\{R_{n,k}(\alpha)\}+O(\varepsilon_{n}^{2}). \]

Taking complements proves (2). ∎

Proof of Corollary 2.1.

If $\gamma=1$, the first-order term disappears in Theorem 2.1. Also,

$$|R_{n,k}(\alpha)|\leq C\bigl(|\hat{Q}_{n,\gamma,k}(c_{k})|^{2}+|\hat{Q}_{n,\gamma,k}^{\prime}(c_{k})|\,|\hat{Q}_{n,\gamma,k}(c_{k})|\bigr)\leq C\varepsilon_{n}^{2}$$

by (72), hence $\mathbb{E}|R_{n,k}(\alpha)|\leq C\varepsilon_{n}^{2}$ uniformly in $\alpha$. ∎

Proof of Corollary 2.2.

The claim follows immediately from Theorem 2.1 and the uniform bound $\mathbb{E}|R_{n,k}(\alpha)|\leq C\varepsilon_{n}^{2}$. ∎

A.9 Deterministic conditional theorem and double bootstrap

Theorem A.4 (Deterministic-array conditional theorem).

Let $\bm{a}_{1},\dots,\bm{a}_{n}\in\mathbb{R}^{d}$ be deterministic and define

$$\bm{T}_{n}(\bm{a}):=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}v_{i}\bm{a}_{i}.$$

Assume that for some constants $L_{n}$ and $r_{n}$,

$$\max_{1\leq i\leq n}\|\bm{a}_{i}\|_{\infty}\leq L_{n}, \tag{89}$$

and for every $I\subset[d]$ with $1\leq|I|\leq k_{0}$,

$$\lambda_{\min}\!\left(\frac{1}{n}\sum_{i=1}^{n}\bm{P}_{I}\bm{a}_{i}(\bm{P}_{I}\bm{a}_{i})^{\top}\right)\geq\frac{1}{2}\sigma_{*}^{2}, \tag{90}$$
$$\left\|\frac{1}{n}\sum_{i=1}^{n}\bm{P}_{I}\bm{a}_{i}(\bm{P}_{I}\bm{a}_{i})^{\top}-\mathbf{\Sigma}_{II}\right\|_{\max}\leq r_{n}. \tag{91}$$

Then the conclusions of Theorems A.2, A.3, and 2.1 hold for the conditional law $\mathbb{P}_{v}(\cdot)$ of the $k$th order statistic of $\bm{T}_{n}(\bm{a})$, with constants uniform over all deterministic arrays satisfying (89)–(91) and with the same second-order rate $C\varepsilon_{n}^{2}$.

Comment. Theorem A.4 isolates the deterministic conditions needed for the second bootstrap level. Once the first-level resample satisfies these array conditions, the same second-order expansion follows conditionally.

Proof.

Fix a deterministic array $\bm{a}_{1},\dots,\bm{a}_{n}$ satisfying (89)–(91). For every nonempty $I\subset[d]$ with $|I|=s\leq k_{0}$, define

$$\bm{a}_{i,I}:=\bm{P}_{I}\bm{a}_{i},\qquad\bar{\bm{a}}_{I}^{\,2}:=\frac{1}{n}\sum_{i=1}^{n}\bm{a}_{i,I}\bm{a}_{i,I}^{\top},\qquad\bar{\bm{a}}_{I}^{\,3}:=\frac{1}{n}\sum_{i=1}^{n}\bm{a}_{i,I}^{\otimes 3}.$$

Let

$$\bm{T}_{n,I}(\bm{a}):=\bm{P}_{I}\bm{T}_{n}(\bm{a})=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}v_{i}\bm{a}_{i,I}.$$

Because $v_{i}$ satisfies the same regularity condition as Assumption 2.2, the projected summand

$$\xi_{i,I}^{\bm{a}}:=\frac{1}{\sqrt{n}}v_{i}\bm{a}_{i,I}$$

has exact Stein kernel

$$\tau_{i,I}^{\bm{a}}(\xi_{i,I}^{\bm{a}})=\frac{1}{n}\tau^{v}(v_{i})\bm{a}_{i,I}\bm{a}_{i,I}^{\top},$$

where $\tau^{v}$ denotes the scalar Stein kernel of $v_{i}$ (or $\tau^{v}\equiv 1$ in the Gaussian case). Hence

$$\sum_{i=1}^{n}\mathbb{E}_{v}[\tau_{i,I}^{\bm{a}}(\xi_{i,I}^{\bm{a}})]=\bar{\bm{a}}_{I}^{\,2},\qquad\mathbb{E}_{v}\left[\sum_{i=1}^{n}(\xi_{i,I}^{\bm{a}})^{\otimes 3}\right]=\frac{1}{\sqrt{n}}\bar{\bm{a}}_{I}^{\,3},\qquad\beta_{i,I}^{\bm{a}}\equiv 0.$$

Now define the projected deterministic-array Edgeworth density

$$p_{n,\bm{a},I}(u):=\phi_{I}(u)+\frac{1}{2}\left\langle\bar{\bm{a}}_{I}^{\,2}-\Sigma_{II},\nabla^{2}\phi_{I}(u)\right\rangle-\frac{1}{6\sqrt{n}}\left\langle\bar{\bm{a}}_{I}^{\,3},\nabla^{3}\phi_{I}(u)\right\rangle.$$

Fix $I\subset[d]$ with $1\leq|I|\leq k_{0}$ and write $s:=|I|$. Set

$$\bm{T}_{n,I}(\bm{a}):=n^{-1/2}\sum_{i=1}^{n}v_{i}\bm{a}_{i,I},\qquad A_{t}^{-}:=(-\infty,-t]^{s},\qquad h_{t,u}(x):=\mathbb{E}\bigl[\mathbf{1}_{A_{t}^{-}}(\sqrt{1-u}\,x+\sqrt{u}\,Z_{I}^{-})\bigr].$$

Then

$$\mathbb{P}_{v}\bigl(\bm{T}_{n,I}(\bm{a})\in(t,\infty)^{s}\bigr)=\mathbb{P}_{v}\bigl(-\bm{T}_{n,I}(\bm{a})\in A_{t}^{-}\bigr).$$

For the deterministic array $\bm{a}$, the proof of Proposition A.1 uses only the following inputs:

$$\max_{1\leq i\leq n}\|\bm{a}_{i}\|_{\infty}\leq L_{n},\qquad\left\|\bar{\bm{a}}_{I}^{\,2}-\Sigma_{II}\right\|_{\max}\leq r_{n},\qquad\lambda_{\min}(\bar{\bm{a}}_{I}^{\,2})\geq\frac{1}{2}\sigma_{*}^{2},$$

which are exactly (89)–(91). Therefore,

$$\sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{P}_{v}\bigl(\bm{T}_{n,I}(\bm{a})\in(t,\infty)^{s}\bigr)-\int_{(t,\infty)^{s}}p_{n,\bm{a},I}(u)\,du\right|\leq C\varepsilon_{n}^{2}\pi_{I}(t) \tag{92}$$

uniformly over all admissible deterministic arrays.

Starting from (92), the factorial-moment argument gives

$$\sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}\left\{V_{n,s}^{(\bm{a})}(t)-M_{n,s}^{(\bm{a})}(t)\right\}\right|\leq C\varepsilon_{n}^{2},$$

where $V_{n,s}^{(\bm{a})}$ and $M_{n,s}^{(\bm{a})}$ are the conditional factorial moment and its first-order approximation built from $\bm{T}_{n}(\bm{a})$. Substituting this bound into the weighted inclusion–exclusion formula gives the deterministic-array analogue of Theorem A.2. The Cornish–Fisher and coverage expansions then follow from the same algebraic steps as in Sections A.7–A.8 after replacing $Q_{n,k}$ by the corresponding deterministic-array first-order term. All constants remain uniform under (89)–(91). This proves the theorem. ∎
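As a rough illustration of the object treated in Theorem A.4, the following Python sketch simulates the conditional law of the $k$th largest coordinate of $\bm{T}_{n}(\bm{a})$ for a fixed array. The two-point multiplier law used here (mean 0, variance 1, third moment 1) is an illustrative assumption, not a choice prescribed by the text, and the function names `kth_largest` and `wild_bootstrap_quantile` are hypothetical.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)
# Two-point multiplier law with E v = 0, E v^2 = 1, E v^3 = 1 (illustrative choice).
V_VALUES = np.array([-(SQRT5 - 1.0) / 2.0, (SQRT5 + 1.0) / 2.0])
V_PROBS = np.array([(SQRT5 + 1.0) / (2.0 * SQRT5), (SQRT5 - 1.0) / (2.0 * SQRT5)])


def kth_largest(x, k):
    """Return the k-th largest entry along the last axis (k = 1 is the maximum)."""
    return -np.partition(-x, k - 1, axis=-1)[..., k - 1]


def wild_bootstrap_quantile(a, k, level, n_boot=2000, seed=0):
    """Monte Carlo (1 - level)-quantile of the k-th largest coordinate of
    T_n(a) = n^{-1/2} sum_i v_i a_i for a fixed (n, d) array a."""
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    # Each row of v is one independent draw of the multipliers v_1, ..., v_n.
    v = rng.choice(V_VALUES, size=(n_boot, n), p=V_PROBS)
    t_boot = v @ a / np.sqrt(n)  # shape (n_boot, d)
    return np.quantile(kth_largest(t_boot, k), 1.0 - level)
```

The Monte Carlo quantile only approximates the conditional quantile; the number of replications `n_boot` is a tuning choice outside the scope of the theorem.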

Lemma A.17 (The first-level bootstrap array satisfies the deterministic conditions).

Define

$$\bm{a}_{i}:=w_{i}(\bm{X}_{i}-\bar{\bm{X}})-\bar{\bm{X}}^{*},\qquad\bar{\bm{X}}^{*}:=\frac{1}{n}\sum_{r=1}^{n}w_{r}(\bm{X}_{r}-\bar{\bm{X}}).$$

Then there exists an event $\Omega_{n}$ such that

$$\mathbb{P}(\Omega_{n}^{c})\leq\frac{C}{n}$$

and, on $\Omega_{n}$, the deterministic array $\bm{a}_{1},\dots,\bm{a}_{n}$ satisfies (89)–(91) with

$$L_{n}:=C\log^{2}(dn),\qquad r_{n}:=C\log^{2}(dn)\sqrt{\frac{\log(dn)}{n}}.$$
Proof.

Let $\Omega_{n,X}$ be the event from Lemma A.9, and let $\Omega_{n,w}$ be the event from Lemma A.10. Define

$$\Omega_{n,1}:=\Omega_{n,X}\cap\Omega_{n,w}.$$

Then

$$\mathbb{P}(\Omega_{n,1}^{c})\leq\frac{C}{n^{2}}. \tag{93}$$

On $\Omega_{n,1}$,

$$\max_{1\leq i\leq n}\|w_{i}(\bm{X}_{i}-\bar{\bm{X}})\|_{\infty}\leq\left(\max_{1\leq i\leq n}|w_{i}|\right)\left(\max_{1\leq i\leq n}\|\bm{X}_{i}-\bar{\bm{X}}\|_{\infty}\right)\leq C\log^{2}(dn). \tag{94}$$

Set

$$\bm{X}_{i}^{*}:=w_{i}(\bm{X}_{i}-\bar{\bm{X}}),\qquad\bar{\bm{X}}^{*}:=\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i}^{*}.$$

We first bound $\bar{\bm{X}}^{*}$. Conditional on the original data, the vectors $\bm{X}_{i}^{*}$ are independent and centered. On $\Omega_{n,1}$, every coordinate satisfies

$$\left\|\frac{1}{n}X_{ij}^{*}\right\|_{\psi_{1}}\leq\frac{C\log(dn)}{n},$$

because either $w_{i}$ is bounded or $w_{i}$ is Gaussian, hence sub-Gaussian, and (36) holds on $\Omega_{n,X}$. Apply Lemma D.10 of Koike (2026) conditionally with

$$Y_{i}:=\frac{1}{n}\bm{X}_{i}^{*},\qquad K=\frac{C\log(dn)}{n},\qquad\alpha=1,\qquad a=2.$$

Then, on $\Omega_{n,1}$,

$$\mathbb{P}\!\left(\|\bar{\bm{X}}^{*}\|_{\infty}>C\log(dn)\sqrt{\frac{\log(dn)}{n}}\ \middle|\ \bm{X}_{1},\dots,\bm{X}_{n}\right)\leq\frac{1}{n^{2}}. \tag{95}$$

Next, write

$$\hat{\mathbf{\Sigma}}_{w}:=\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i}^{*}\bm{X}_{i}^{*\top}=\frac{1}{n}\sum_{i=1}^{n}w_{i}^{2}(\bm{X}_{i}-\bar{\bm{X}})(\bm{X}_{i}-\bar{\bm{X}})^{\top}.$$

Then

$$\hat{\mathbf{\Sigma}}_{w}-\hat{\mathbf{\Sigma}}_{X}=\frac{1}{n}\sum_{i=1}^{n}(w_{i}^{2}-1)(\bm{X}_{i}-\bar{\bm{X}})(\bm{X}_{i}-\bar{\bm{X}})^{\top}. \tag{96}$$

Conditional on the original data, the summands in (96) are independent and centered. On $\Omega_{n,1}$, each entry of the matrix

$$\frac{1}{n}(w_{i}^{2}-1)(\bm{X}_{i}-\bar{\bm{X}})(\bm{X}_{i}-\bar{\bm{X}})^{\top}$$

has conditional $\psi_{1}$-norm at most

$$\frac{C\log^{2}(dn)}{n}.$$

Apply Lemma D.10 of Koike (2026) conditionally with

$$Y_{i}:=\frac{1}{n}(w_{i}^{2}-1)\mathrm{vec}\!\left((\bm{X}_{i}-\bar{\bm{X}})(\bm{X}_{i}-\bar{\bm{X}})^{\top}\right),\qquad K=\frac{C\log^{2}(dn)}{n},\qquad\alpha=1,\qquad a=2.$$

Then, on $\Omega_{n,1}$,

$$\mathbb{P}\!\left(\|\hat{\mathbf{\Sigma}}_{w}-\hat{\mathbf{\Sigma}}_{X}\|_{\max}>C\log^{2}(dn)\sqrt{\frac{\log(dn)}{n}}\ \middle|\ \bm{X}_{1},\dots,\bm{X}_{n}\right)\leq\frac{1}{n^{2}}. \tag{97}$$

Define the conditional events

$$\Omega_{n,2}:=\left\{\|\bar{\bm{X}}^{*}\|_{\infty}\leq C\log(dn)\sqrt{\frac{\log(dn)}{n}}\right\},$$

and

$$\Omega_{n,3}:=\left\{\|\hat{\mathbf{\Sigma}}_{w}-\hat{\mathbf{\Sigma}}_{X}\|_{\max}\leq C\log^{2}(dn)\sqrt{\frac{\log(dn)}{n}}\right\}.$$

Finally, set

$$\Omega_{n}:=\Omega_{n,1}\cap\Omega_{n,2}\cap\Omega_{n,3}.$$

Using (93), (95), and (97),

$$\begin{aligned}\mathbb{P}(\Omega_{n}^{c})&\leq\mathbb{P}(\Omega_{n,1}^{c})+\mathbb{E}\bigl[\mathbb{P}(\Omega_{n,2}^{c}\cup\Omega_{n,3}^{c}\mid\bm{X}_{1},\dots,\bm{X}_{n})\mathbf{1}_{\Omega_{n,1}}\bigr]\\&\leq\frac{C}{n^{2}}+\frac{C}{n^{2}}\leq\frac{C}{n}.\end{aligned}$$

Now work on $\Omega_{n}$. Recall

$$\bm{a}_{i}=\bm{X}_{i}^{*}-\bar{\bm{X}}^{*}.$$

By (94),

$$\|\bm{a}_{i}\|_{\infty}\leq\|\bm{X}_{i}^{*}\|_{\infty}+\|\bar{\bm{X}}^{*}\|_{\infty}\leq C\log^{2}(dn)=L_{n}.$$

Also,

$$\frac{1}{n}\sum_{i=1}^{n}\bm{a}_{i}\bm{a}_{i}^{\top}=\hat{\mathbf{\Sigma}}_{w}-\bar{\bm{X}}^{*}\bar{\bm{X}}^{*\top}.$$

Hence

$$\begin{aligned}\left\|\frac{1}{n}\sum_{i=1}^{n}\bm{a}_{i}\bm{a}_{i}^{\top}-\mathbf{\Sigma}\right\|_{\max}&\leq\|\hat{\mathbf{\Sigma}}_{w}-\hat{\mathbf{\Sigma}}_{X}\|_{\max}+\|\hat{\mathbf{\Sigma}}_{X}-\mathbf{\Sigma}\|_{\max}+\|\bar{\bm{X}}^{*}\bar{\bm{X}}^{*\top}\|_{\max}\\&\leq C\log^{2}(dn)\sqrt{\frac{\log(dn)}{n}}+Cb^{2}\sqrt{\frac{\log(dn)}{n}}+C\log^{2}(dn)\frac{\log(dn)}{n}\\&\leq Cr_{n}.\end{aligned}$$

Therefore, for every $I$ with $1\leq|I|\leq k_{0}$,

$$\left\|\frac{1}{n}\sum_{i=1}^{n}\bm{P}_{I}\bm{a}_{i}(\bm{P}_{I}\bm{a}_{i})^{\top}-\mathbf{\Sigma}_{II}\right\|_{\max}\leq Cr_{n},$$

which proves (91) after enlarging the constant in $r_{n}$.

It remains to prove (90). Let

$$\mathbf{\Sigma}_{I}(\bm{a}):=\frac{1}{n}\sum_{i=1}^{n}\bm{P}_{I}\bm{a}_{i}(\bm{P}_{I}\bm{a}_{i})^{\top}.$$

Since $\lambda_{\min}(\mathbf{\Sigma}_{II})\geq\sigma_{*}^{2}$ by Lemma A.2,

$$\begin{aligned}\lambda_{\min}(\mathbf{\Sigma}_{I}(\bm{a}))&\geq\lambda_{\min}(\mathbf{\Sigma}_{II})-\|\mathbf{\Sigma}_{I}(\bm{a})-\mathbf{\Sigma}_{II}\|_{\mathrm{op}}\\&\geq\sigma_{*}^{2}-|I|\,\|\mathbf{\Sigma}_{I}(\bm{a})-\mathbf{\Sigma}_{II}\|_{\max}\\&\geq\sigma_{*}^{2}-k_{0}Cr_{n}\end{aligned}\tag{98}$$

by Weyl’s inequality (see, e.g., Horn and Johnson, 2012, Corollary 4.3.2). Since $k_{0}r_{n}\to 0$, (98) implies

$$\lambda_{\min}(\mathbf{\Sigma}_{I}(\bm{a}))\geq\frac{1}{2}\sigma_{*}^{2}$$

for all sufficiently large $n$. This proves (90). Together with the bound for $\max_{i}\|\bm{a}_{i}\|_{\infty}$, the proof is complete. ∎
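The two algebraic facts used in this proof, the Gram identity for the recentred array and the Weyl-type lower bound behind (98), can be checked numerically. The sketch below is only an illustration under assumed Gaussian data and multipliers; the helper names `first_level_array` and `min_eig_lower_bound` are hypothetical.

```python
import numpy as np


def first_level_array(x, w):
    """Build a_i = w_i (X_i - Xbar) - Xbar*, following the definition in Lemma A.17.

    x: (n, d) data matrix, w: (n,) first-level multipliers.
    Returns the (n, d) array a, the matrix Sigma_hat_w, and the vector Xbar*.
    """
    xc = x - x.mean(axis=0)            # rows X_i - Xbar
    x_star = w[:, None] * xc           # rows X_i^*
    x_star_bar = x_star.mean(axis=0)   # Xbar^*
    a = x_star - x_star_bar
    sigma_w = x_star.T @ x_star / x.shape[0]
    return a, sigma_w, x_star_bar


def min_eig_lower_bound(sigma_ii, gram_ii):
    """Lower bound lambda_min(Sigma_II) - |I| * ||gram_ii - Sigma_II||_max,
    i.e. the second line of the chain leading to (98)."""
    size = sigma_ii.shape[0]
    return np.linalg.eigvalsh(sigma_ii)[0] - size * np.abs(gram_ii - sigma_ii).max()
```

For any symmetric `gram_ii`, its smallest eigenvalue is at least `min_eig_lower_bound(sigma_ii, gram_ii)`, because the operator norm of an $|I|\times|I|$ matrix is at most $|I|$ times its max-norm.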

Proof of Theorem 2.2.

Let $\Omega_{n}$ be the event from Lemma A.17. Since $n^{-1}=O(\varepsilon_{n}^{2})$, it is enough to work on $\Omega_{n}$. On that event, the first-level bootstrap array satisfies the deterministic conditions of Theorem A.4. Because the second-level multipliers satisfy $\mathbb{E}v_{1}^{3}=1$, the conditional version of Corollary 2.1 gives

$$\sup_{\epsilon<\alpha<1-\epsilon}\left|\mathbb{P}^{*}\bigl(T_{n,[k]}^{*}\geq\hat{c}_{1-\alpha,k}^{**}\bigr)-\alpha\right|\leq C\varepsilon_{n}^{2}\qquad\text{on }\Omega_{n}. \tag{99}$$

Set $\delta_{n}:=C\varepsilon_{n}^{2}$, with $C$ chosen large enough that both (99) and the first-level second-order accuracy bound hold with the same constant.

Fix $\alpha\in(2\epsilon,1-2\epsilon)$. On $\Omega_{n}$, (99) with nominal level $\alpha-\delta_{n}$ implies

$$\mathbb{P}^{*}\bigl(T_{n,[k]}^{*}>\hat{c}_{1-\alpha+\delta_{n},k}^{**}\bigr)\leq\alpha.$$

Equivalently,

$$\mathbb{P}^{*}\bigl(\hat{F}_{n,k}^{*}(T_{n,[k]}^{*})>1-\alpha+\delta_{n}\bigr)\leq\alpha.$$

By the definition of $\hat{\beta}_{\alpha,k}$,

$$\hat{\beta}_{\alpha,k}\leq 1-\alpha+\delta_{n}\qquad\text{on }\Omega_{n}.$$

Since $p\mapsto\hat{c}_{p,k}$ is nondecreasing,

$$\hat{c}_{\hat{\beta}_{\alpha,k},k}\leq\hat{c}_{1-\alpha+\delta_{n},k}\qquad\text{on }\Omega_{n}.$$

Therefore,

$$\begin{aligned}\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{\hat{\beta}_{\alpha,k},k}\bigr)&\geq\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{1-\alpha+\delta_{n},k},\Omega_{n}\bigr)\\&\geq\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{1-\alpha+\delta_{n},k}\bigr)-\mathbb{P}(\Omega_{n}^{c})\\&\geq(\alpha-\delta_{n})-C\varepsilon_{n}^{2}-\mathbb{P}(\Omega_{n}^{c})\\&\geq\alpha-C\varepsilon_{n}^{2}.\end{aligned}\tag{100}$$

Similarly, applying (99) with level $\alpha+\delta_{n}$ yields

$$\mathbb{P}^{*}\bigl(\hat{F}_{n,k}^{*}(T_{n,[k]}^{*})\leq 1-\alpha-\delta_{n}\bigr)<1-\alpha,$$

which implies

$$\hat{\beta}_{\alpha,k}>1-\alpha-\delta_{n}\qquad\text{on }\Omega_{n}.$$

Hence

$$\hat{c}_{\hat{\beta}_{\alpha,k},k}\geq\hat{c}_{1-\alpha-\delta_{n},k}\qquad\text{on }\Omega_{n},$$

and therefore

$$\begin{aligned}\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{\hat{\beta}_{\alpha,k},k}\bigr)&\leq\mathbb{P}\bigl(T_{n,[k]}\geq\hat{c}_{1-\alpha-\delta_{n},k}\bigr)+\mathbb{P}(\Omega_{n}^{c})\\&\leq(\alpha+\delta_{n})+C\varepsilon_{n}^{2}+\mathbb{P}(\Omega_{n}^{c})\\&\leq\alpha+C\varepsilon_{n}^{2}.\end{aligned}\tag{101}$$

Combining (100) and (101) proves (3). ∎

Appendix B: Proofs for the stationary exponential-mixing alternative

This appendix proves Theorem 2.3. Throughout Appendix B we work under Assumptions 2.1, 2.2, 2.3, and 2.5, and we use the notation introduced in Section 2.3. Only the Gaussian aggregation part of Appendix A needs to be modified; the projected local Edgeworth expansion is unchanged except for the shift/strip estimates established below.

B.1. Correlation decay and Gaussian cluster tails

Lemma B.1.

Under Assumption 2.5, for every $h\geq 1$,

$$|\rho(h)|\leq\sin\{2\pi\alpha(h)\}\leq 2\pi C_{\alpha}e^{-a_{\alpha}h}. \tag{102}$$

Consequently,

$$\sum_{h=1}^{\infty}|\rho(h)|\leq\frac{2\pi C_{\alpha}e^{-a_{\alpha}}}{1-e^{-a_{\alpha}}}<\infty. \tag{103}$$

Moreover, for every integer $m\geq 2$, every index set $I\subset[d]$ with $|I|=m$, and every $t>0$,

$$\mathbb{P}(Z_{j}>t,\ \forall j\in I)\leq\bar{\Phi}\!\left(\sqrt{\frac{m}{1+(m-1)\vartheta_{*}}}\,\frac{t}{\sigma}\right)\leq\frac{\sigma\sqrt{1+(m-1)\vartheta_{*}}}{\sqrt{2\pi m}\,t}\exp\!\left\{-\frac{mt^{2}}{2\sigma^{2}\{1+(m-1)\vartheta_{*}\}}\right\}. \tag{104}$$

In particular, when $m=2$,

$$\mathbb{P}(Z_{0}>t,Z_{h}>t)\leq\bar{\Phi}\!\left(\sqrt{\frac{2}{1+\vartheta_{*}}}\,\frac{t}{\sigma}\right)\leq\frac{\sigma\sqrt{1+\vartheta_{*}}}{\sqrt{4\pi}\,t}\exp\!\left\{-\frac{t^{2}}{\sigma^{2}(1+\vartheta_{*})}\right\}. \tag{105}$$
Proof.

For standard Gaussian variables $U,V$ with correlation $r$, one has

$$\mathbb{P}(U>0,V>0)-\frac{1}{4}=\frac{1}{2\pi}\arcsin(r).$$

Therefore, with

$$\mathcal{F}_{0}:=\sigma(Z_{j}:j\leq 0),\qquad\mathcal{G}_{h}:=\sigma(Z_{j}:j\geq h),$$

we obtain

$$\alpha(h)\geq\left|\mathbb{P}(Z_{0}>0,Z_{h}>0)-\mathbb{P}(Z_{0}>0)\mathbb{P}(Z_{h}>0)\right|=\frac{1}{2\pi}|\arcsin\rho(h)|.$$

Hence

$$|\rho(h)|\leq\sin\{2\pi\alpha(h)\}\leq 2\pi\alpha(h)\leq 2\pi C_{\alpha}e^{-a_{\alpha}h},$$

which proves (102). Summing the geometric series yields (103).

Now fix $I=\{i_{1},\dots,i_{m}\}\subset[d]$ with $|I|=m$ and write

$$U_{r}:=Z_{i_{r}}/\sigma,\qquad 1\leq r\leq m.$$

By (4), every off-diagonal correlation of $(U_{1},\dots,U_{m})^{\top}$ is bounded above by $\vartheta_{*}$. Therefore

$$\mathrm{Var}\!\left(\sum_{r=1}^{m}U_{r}\right)\leq m+m(m-1)\vartheta_{*}=m\{1+(m-1)\vartheta_{*}\}.$$

Since

$$\{U_{r}>u,\ \forall r\leq m\}\subset\left\{\sum_{r=1}^{m}U_{r}>mu\right\},$$

we obtain

$$\mathbb{P}(U_{r}>u,\ \forall r\leq m)\leq\bar{\Phi}\!\left(\sqrt{\frac{m}{1+(m-1)\vartheta_{*}}}\,u\right).$$

Applying Mills’ ratio proves (104), and (105) is the case $m=2$. ∎
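The two right-hand sides of (104) can be compared numerically. The sketch below evaluates both for given $(m,t,\sigma,\vartheta_{*})$ using only the standard library; the function names are hypothetical, and the parameter `theta` stands in for $\vartheta_{*}$.

```python
import math


def gaussian_tail(x):
    """Standard normal upper tail, computed via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def cluster_tail_bounds(m, t, sigma, theta):
    """Return the two right-hand sides of (104) for cluster size m,
    threshold t > 0, scale sigma, and correlation bound theta."""
    scale = 1.0 + (m - 1) * theta
    x = math.sqrt(m / scale) * t / sigma
    first = gaussian_tail(x)
    # Mills-ratio bound phi(x) / x, written out in the variables of (104).
    second = (sigma * math.sqrt(scale) / (math.sqrt(2.0 * math.pi * m) * t)
              * math.exp(-m * t * t / (2.0 * sigma * sigma * scale)))
    return first, second
```

For every $t>0$ the first value is at most the second, which is the Mills-ratio step in the proof; the gap closes as $t$ grows.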

B.2. Bonferroni remainder for the $k$th exceedance event

Lemma B.2.

For every integer $k\geq 1$, every integer $m\geq k$, and every nonnegative integer-valued random variable $N$,

$$\left|\mathbf{1}\{N\geq k\}-\sum_{s=k}^{m}(-1)^{s-k}\binom{s-1}{k-1}\binom{N}{s}\right|\leq\binom{m}{k-1}\binom{N}{m+1}. \tag{106}$$

Consequently,

$$\left|\mathbb{P}(N\geq k)-\sum_{s=k}^{m}(-1)^{s-k}\binom{s-1}{k-1}\mathbb{E}\binom{N}{s}\right|\leq\binom{m}{k-1}\mathbb{E}\binom{N}{m+1}. \tag{107}$$
Proof.

The generalized Bonferroni inequalities for the event $\{N\geq k\}$ imply

$$\sum_{s=k}^{m}(-1)^{s-k}\binom{s-1}{k-1}\binom{N}{s}\leq\mathbf{1}\{N\geq k\}\leq\sum_{s=k}^{m+1}(-1)^{s-k}\binom{s-1}{k-1}\binom{N}{s}$$

when $m-k$ is odd, and the inequalities are reversed when $m-k$ is even (for instance, $k=m=1$ gives $\mathbf{1}\{N\geq 1\}\leq N$). In either case, the difference between the two adjacent truncations has absolute value

$$\binom{m}{k-1}\binom{N}{m+1},$$

which proves (106). Taking expectations yields (107). ∎
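Since (106) is a pointwise statement about integers, it can be checked exhaustively on a small grid. The following sketch does so with exact integer arithmetic; the function names are hypothetical and the grid ranges are arbitrary.

```python
from math import comb


def truncated_sum(n_exceed, k, m):
    """Partial sum sum_{s=k}^{m} (-1)^(s-k) C(s-1, k-1) C(N, s) appearing in (106)."""
    return sum((-1) ** (s - k) * comb(s - 1, k - 1) * comb(n_exceed, s)
               for s in range(k, m + 1))


def remainder_bound(n_exceed, k, m):
    """Right-hand side of (106): C(m, k-1) C(N, m+1)."""
    return comb(m, k - 1) * comb(n_exceed, m + 1)


def check_bonferroni(max_n=15, max_k=4, max_m=8):
    """Return True if (106) holds for all N < max_n, k <= max_k, k <= m <= max_m."""
    for n_exceed in range(max_n):
        for k in range(1, max_k + 1):
            for m in range(k, max_m + 1):
                indicator = 1 if n_exceed >= k else 0
                gap = abs(indicator - truncated_sum(n_exceed, k, m))
                if gap > remainder_bound(n_exceed, k, m):
                    return False
    return True
```

For example, with $N=4$, $k=2$, $m=3$ the truncated sum is $6-8=-2$, so the gap is $3$, which equals the bound $\binom{3}{1}\binom{4}{4}=3$.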

B.3. Block construction and reduction to block exceedances

Let

$$s_{d}:=d-q_{d}(m_{d}+\ell_{d}),\qquad 0\leq s_{d}<m_{d}+\ell_{d}.$$

Define the main blocks and gaps by

$$I_{r}:=\{(r-1)(m_{d}+\ell_{d})+1,\dots,(r-1)(m_{d}+\ell_{d})+m_{d}\},\qquad r=1,\dots,q_{d},$$
$$J_{r}:=\{(r-1)(m_{d}+\ell_{d})+m_{d}+1,\dots,r(m_{d}+\ell_{d})\},\qquad r=1,\dots,q_{d},$$

and define the remainder interval

$$R_{d}:=\{q_{d}(m_{d}+\ell_{d})+1,\dots,d\}$$

when $s_{d}\geq 1$. For $t\in\mathbb{R}$, set

$$B_{r}(t):=\left\{\max_{j\in I_{r}}Z_{j}>t\right\},\qquad Y_{r}(t):=\mathbf{1}\{B_{r}(t)\},\qquad S_{d}(t):=\sum_{r=1}^{q_{d}}Y_{r}(t),\qquad N_{d}(t):=\sum_{j=1}^{d}\mathbf{1}\{Z_{j}>t\}.$$

Also define

$$q(t):=\mathbb{P}(B_{1}(t)),\qquad\mu_{d}(t):=q_{d}q(t).$$
Lemma B.3.

For every $t\in\mathbb{R}$,

$$0\leq m_{d}p(t)-q(t)\leq\binom{m_{d}}{2}\bar{\Phi}\!\left(\sqrt{\frac{2}{1+\vartheta_{*}}}\,\frac{t}{\sigma}\right). \tag{108}$$

Moreover,

$$\mathbb{P}\{N_{d}(t)\neq S_{d}(t)\}\leq(q_{d}\ell_{d}+s_{d})p(t)+q_{d}\binom{m_{d}}{2}\bar{\Phi}\!\left(\sqrt{\frac{2}{1+\vartheta_{*}}}\,\frac{t}{\sigma}\right), \tag{109}$$
$$|\mu_{d}(t)-\lambda(t)|\leq(q_{d}\ell_{d}+m_{d}+\ell_{d})p(t)+q_{d}\binom{m_{d}}{2}\bar{\Phi}\!\left(\sqrt{\frac{2}{1+\vartheta_{*}}}\,\frac{t}{\sigma}\right). \tag{110}$$
Proof.

The first Bonferroni inequality gives

$$q(t)=\mathbb{P}\Bigl(\bigcup_{j\in I_{1}}\{Z_{j}>t\}\Bigr)\leq m_{d}p(t),$$

and the second Bonferroni inequality yields

$$q(t)\geq m_{d}p(t)-\sum_{1\leq a<b\leq m_{d}}\mathbb{P}(Z_{a}>t,Z_{b}>t).$$

Using (105) proves (108).

If $N_{d}(t)\neq S_{d}(t)$, then either at least one exceedance occurs in a gap or in $R_{d}$, or some main block contains at least two exceedances. Therefore

$$\mathbb{P}\{N_{d}(t)\neq S_{d}(t)\}\leq\sum_{r=1}^{q_{d}}\mathbb{P}\left\{\max_{j\in J_{r}}Z_{j}>t\right\}+\mathbb{P}\left\{\max_{j\in R_{d}}Z_{j}>t\right\}+\sum_{r=1}^{q_{d}}\mathbb{P}\left\{\sum_{j\in I_{r}}\mathbf{1}\{Z_{j}>t\}\geq 2\right\}.$$

Now

$$\mathbb{P}\left\{\max_{j\in J_{r}}Z_{j}>t\right\}\leq\ell_{d}p(t),\qquad\mathbb{P}\left\{\max_{j\in R_{d}}Z_{j}>t\right\}\leq s_{d}p(t),$$

and, by the union bound and (105),

$$\mathbb{P}\left\{\sum_{j\in I_{r}}\mathbf{1}\{Z_{j}>t\}\geq 2\right\}\leq\sum_{1\leq a<b\leq m_{d}}\mathbb{P}(Z_{a}>t,Z_{b}>t)\leq\binom{m_{d}}{2}\bar{\Phi}\!\left(\sqrt{\frac{2}{1+\vartheta_{*}}}\,\frac{t}{\sigma}\right).$$

This proves (109). Finally,

$$|\mu_{d}(t)-\lambda(t)|\leq|q_{d}q(t)-q_{d}m_{d}p(t)|+|q_{d}m_{d}-d|\,p(t),$$

and

$$|q_{d}m_{d}-d|\leq q_{d}\ell_{d}+m_{d}+\ell_{d}.$$

Combining these displays with (108) proves (110). ∎
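The block construction of this subsection is easy to get wrong by one index, so a small sketch may help. It assumes $q_{d}=\lfloor d/(m_{d}+\ell_{d})\rfloor$, which is what the constraint $0\leq s_{d}<m_{d}+\ell_{d}$ forces; the function names `build_blocks` and `exceedance_counts` are hypothetical.

```python
import numpy as np


def build_blocks(d, m_d, l_d):
    """Split {1, ..., d} into main blocks I_r, gaps J_r, and a remainder R_d.

    Assumes q_d = floor(d / (m_d + l_d)). Indices are 1-based to match the text.
    """
    period = m_d + l_d
    q_d = d // period
    main = [list(range((r - 1) * period + 1, (r - 1) * period + m_d + 1))
            for r in range(1, q_d + 1)]
    gaps = [list(range((r - 1) * period + m_d + 1, r * period + 1))
            for r in range(1, q_d + 1)]
    remainder = list(range(q_d * period + 1, d + 1))
    return main, gaps, remainder


def exceedance_counts(z, t, main):
    """Return (N_d(t), S_d(t)) for a sample vector z of length d."""
    z = np.asarray(z)
    total = int((z > t).sum())
    # S_d(t) counts main blocks whose maximum exceeds t.
    block_count = sum(int(z[np.array(block) - 1].max() > t) for block in main)
    return total, block_count
```

With $d=23$, $m_{d}=4$, $\ell_{d}=2$ this gives three main blocks, three gaps, and a remainder of five indices. A vector with exceedances at indices 1, 2 (same main block) and 5 (a gap) has $N_{d}(t)=3$ but $S_{d}(t)=1$, illustrating the two ways the counts can differ.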

Lemma B.4.

Let $s\in\{1,\dots,k_{0}+1\}$. Then, for every $t\in\mathbb{R}$,

$$\left|\sum_{1\leq r_{1}<\cdots<r_{s}\leq q_{d}}\mathbb{P}\Bigl(\bigcap_{j=1}^{s}B_{r_{j}}(t)\Bigr)-\binom{q_{d}}{s}q(t)^{s}\right|\leq s2^{s-1}\binom{q_{d}}{s}\alpha(\ell_{d}). \tag{111}$$

Consequently,

$$\left|\mathbb{E}\binom{S_{d}(t)}{s}-\frac{\mu_{d}(t)^{s}}{s!}\right|\leq C_{s}\left\{q_{d}^{-1}\mu_{d}(t)^{s}+d^{s}\alpha(\ell_{d})\right\}, \tag{112}$$

where $C_{s}=s2^{s-1}+s!$ is deterministic.

Proof.

Fix $1\leq r_{1}<\cdots<r_{s}\leq q_{d}$. Put

$$A_{j}(t):=B_{r_{j}}(t)^{c}=\left\{\max_{u\in I_{r_{j}}}Z_{u}\leq t\right\},\qquad j=1,\dots,s.$$

Since the selected main blocks are separated by at least $\ell_{d}$, repeated application of Lemma 3.2.2 of Leadbetter, Lindgren, and Rootzén yields

$$\left|\mathbb{P}\Bigl(\bigcap_{j\in L}A_{j}(t)\Bigr)-\prod_{j\in L}\mathbb{P}\{A_{j}(t)\}\right|\leq(|L|-1)\alpha(\ell_{d}) \tag{113}$$

for every nonempty $L\subset[s]$. The inclusion–exclusion identity gives

$$\mathbb{P}\Bigl(\bigcap_{j=1}^{s}B_{r_{j}}(t)\Bigr)=\sum_{L\subset[s]}(-1)^{|L|}\mathbb{P}\Bigl(\bigcap_{j\in L}A_{j}(t)\Bigr),$$

and the same identity with each probability replaced by the corresponding product equals $q(t)^{s}$, since $\mathbb{P}\{A_{j}(t)\}=1-q(t)$. Therefore

$$\begin{aligned}\left|\mathbb{P}\Bigl(\bigcap_{j=1}^{s}B_{r_{j}}(t)\Bigr)-q(t)^{s}\right|&\leq\sum_{L\subset[s]}\left|\mathbb{P}\Bigl(\bigcap_{j\in L}A_{j}(t)\Bigr)-\prod_{j\in L}\mathbb{P}\{A_{j}(t)\}\right|\\&\leq\sum_{m=2}^{s}\binom{s}{m}(m-1)\alpha(\ell_{d})\leq s2^{s-1}\alpha(\ell_{d}).\end{aligned}$$

Summing over the $\binom{q_{d}}{s}$ choices of $(r_{1},\dots,r_{s})$ gives (111).

Now

$$\mathbb{E}\binom{S_{d}(t)}{s}=\sum_{1\leq r_{1}<\cdots<r_{s}\leq q_{d}}\mathbb{P}\Bigl(\bigcap_{j=1}^{s}B_{r_{j}}(t)\Bigr).$$

Therefore

$$\mathbb{E}\binom{S_{d}(t)}{s}=\binom{q_{d}}{s}q(t)^{s}+R_{d,s}(t),\qquad|R_{d,s}(t)|\leq s2^{s-1}\binom{q_{d}}{s}\alpha(\ell_{d}).$$

Also,

$$\left|\binom{q_{d}}{s}-\frac{q_{d}^{s}}{s!}\right|\leq s!\,q_{d}^{s-1},$$

so

$$\left|\binom{q_{d}}{s}q(t)^{s}-\frac{\mu_{d}(t)^{s}}{s!}\right|\leq s!\,q_{d}^{-1}\mu_{d}(t)^{s}.$$

Finally,

$$\binom{q_{d}}{s}\alpha(\ell_{d})\leq q_{d}^{s}\alpha(\ell_{d})\leq d^{s}\alpha(\ell_{d}).$$

Combining the last three displays proves (112). ∎
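For completeness, the combinatorial constant $s2^{s-1}$ used in the bound for (111) can be obtained from an exact evaluation. The following is a routine verification under the standard identities $\sum_{m}\binom{s}{m}=2^{s}$ and $\sum_{m}m\binom{s}{m}=s2^{s-1}$, and is not part of the original argument.

```latex
\sum_{m=2}^{s}\binom{s}{m}(m-1)
  = \sum_{m=0}^{s}\binom{s}{m}(m-1) + 1
  = s2^{s-1} - 2^{s} + 1
  \leq s2^{s-1}.
```

The first equality holds because the $m=0$ term of the full sum equals $-1$ and the $m=1$ term vanishes.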

B.4. Direct Poisson approximation on the quantile window

Lemma B.5.

If $\lambda(t)\leq 2\Lambda_{k,\epsilon}$, then

$$\frac{t^{2}}{\sigma^{2}}\geq 2\log d-3\log\log d-C_{k,\epsilon}, \tag{114}$$

and hence

$$\bar{\Phi}\!\left(\sqrt{\frac{2}{1+\vartheta_{*}}}\,\frac{t}{\sigma}\right)\leq C_{k,\epsilon}(\log d)^{-1/2}d^{-(1+\beta_{*})}. \tag{115}$$

Consequently,

$$\mathbb{P}\{N_{d}(t)\neq S_{d}(t)\}\leq C_{k,\epsilon}\eta_{1,d}, \tag{116}$$
$$|\mu_{d}(t)-\lambda(t)|\leq C_{k,\epsilon}\eta_{1,d}. \tag{117}$$
Proof.

If $\lambda(t)\leq 2\Lambda_{k,\epsilon}$, then

$$d\,\bar{\Phi}(t/\sigma)\leq 2\Lambda_{k,\epsilon}.$$

Mills’ ratio implies

$$\bar{\Phi}(u)\geq\frac{1}{\sqrt{2\pi}(1+u)}e^{-u^{2}/2},\qquad u>0.$$

Applying this with $u=t/\sigma$ yields

$$\frac{1}{\sqrt{2\pi}(1+t/\sigma)}e^{-t^{2}/(2\sigma^{2})}\leq\frac{2\Lambda_{k,\epsilon}}{d}.$$

Taking logarithms and using $\log(1+t/\sigma)\leq\log(2+t^{2}/\sigma^{2})\leq\log(2+2\log d+C_{k,\epsilon})$ yields (114). Substituting (114) into (105) proves (115).

Now use (109) and (110). Since

$$p(t)=\frac{\lambda(t)}{d}\leq\frac{2\Lambda_{k,\epsilon}}{d},$$

we obtain

$$(q_{d}\ell_{d}+s_{d})p(t)\leq 2\Lambda_{k,\epsilon}\left\{\frac{q_{d}\ell_{d}}{d}+\frac{s_{d}}{d}\right\}\leq 2\Lambda_{k,\epsilon}\left\{\frac{\ell_{d}}{m_{d}}+\frac{m_{d}+\ell_{d}}{d}\right\},$$

because $q_{d}\leq d/m_{d}$ and $s_{d}<m_{d}+\ell_{d}$. Also,

$$q_{d}\binom{m_{d}}{2}\bar{\Phi}\!\left(\sqrt{\frac{2}{1+\vartheta_{*}}}\,\frac{t}{\sigma}\right)\leq C_{k,\epsilon}\frac{d}{m_{d}}m_{d}^{2}(\log d)^{-1/2}d^{-(1+\beta_{*})}=C_{k,\epsilon}d^{-3\beta_{*}/4}(\log d)^{-1/2}.$$

Combining these bounds with (109) and (110) proves (116) and (117). ∎

Lemma B.6.

For every $t$ such that $\lambda(t)\leq 2\Lambda_{k,\epsilon}$,

$$\left|\mathbb{P}\{S_{d}(t)\leq k-1\}-h_{k}\bigl(\mu_{d}(t)\bigr)\right|\leq C_{k,\epsilon}\left\{q_{d}^{-1}+d^{k_{0}+1}\alpha(\ell_{d})+\frac{(3\Lambda_{k,\epsilon})^{k_{0}+1}}{(k_{0}+1)!}\right\}. \tag{118}$$

Consequently,

$$\left|G_{k}(t)-h_{k}\bigl(\lambda(t)\bigr)\right|\leq C_{k,\epsilon}r_{d}. \tag{119}$$
Proof.

Set

$$V_{S,s}(t):=\mathbb{E}\binom{S_{d}(t)}{s}.$$

By Lemma B.2 with $N=S_{d}(t)$ and $m=k_{0}$,

$$\left|\mathbb{P}\{S_{d}(t)\geq k\}-\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}V_{S,s}(t)\right|\leq\binom{k_{0}}{k-1}V_{S,k_{0}+1}(t). \tag{120}$$

By Lemma B.4, for each $s\in\{k,\dots,k_{0}+1\}$,

$$\left|V_{S,s}(t)-\frac{\mu_{d}(t)^{s}}{s!}\right|\leq C_{k,\epsilon}\left\{q_{d}^{-1}+d^{k_{0}+1}\alpha(\ell_{d})\right\},$$

because $\mu_{d}(t)\leq\lambda(t)+C_{k,\epsilon}\eta_{1,d}\leq 3\Lambda_{k,\epsilon}$ for all sufficiently large $d$ by (117). Therefore

$$\left|\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}V_{S,s}(t)-\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}\frac{\mu_{d}(t)^{s}}{s!}\right|\leq C_{k,\epsilon}\left\{q_{d}^{-1}+d^{k_{0}+1}\alpha(\ell_{d})\right\}.$$

Applying Lemma B.2 to a Poisson random variable $\Pi_{\mu_{d}(t)}\sim\mathrm{Poi}(\mu_{d}(t))$ gives

$$\left|\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}\frac{\mu_{d}(t)^{s}}{s!}-\mathbb{P}\{\Pi_{\mu_{d}(t)}\geq k\}\right|\leq\binom{k_{0}}{k-1}\frac{\mu_{d}(t)^{k_{0}+1}}{(k_{0}+1)!}\leq C_{k,\epsilon}\frac{(3\Lambda_{k,\epsilon})^{k_{0}+1}}{(k_{0}+1)!}.$$

Combining the last three displays with (120) proves (118).

Finally,

$$\left|G_{k}(t)-\mathbb{P}\{S_{d}(t)\leq k-1\}\right|=\left|\mathbb{P}\{N_{d}(t)\leq k-1\}-\mathbb{P}\{S_{d}(t)\leq k-1\}\right|\leq\mathbb{P}\{N_{d}(t)\neq S_{d}(t)\}\leq C_{k,\epsilon}\eta_{1,d}$$

by (116), and

$$\left|h_{k}\bigl(\mu_{d}(t)\bigr)-h_{k}\bigl(\lambda(t)\bigr)\right|\leq\sup_{0\leq u\leq 3\Lambda_{k,\epsilon}}|h_{k}^{\prime}(u)|\,|\mu_{d}(t)-\lambda(t)|\leq C_{k,\epsilon}\eta_{1,d}$$

by (117). Combining these bounds with (118) proves (119). ∎
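The Poisson step of this proof can be illustrated numerically: the truncated alternating sum of $\mu^{s}/s!$ approximates the Poisson tail $\mathbb{P}\{\Pi_{\mu}\geq k\}$ with error at most $\binom{k_{0}}{k-1}\mu^{k_{0}+1}/(k_{0}+1)!$. The sketch below uses hypothetical function names and evaluates the Poisson distribution function directly from its definition.

```python
from math import comb, exp, factorial


def poisson_cdf(j_max, mu):
    """P(Poisson(mu) <= j_max), summed directly from the probability mass function."""
    return exp(-mu) * sum(mu ** j / factorial(j) for j in range(j_max + 1))


def truncated_poisson_tail(k, k0, mu):
    """sum_{s=k}^{k0} (-1)^(s-k) C(s-1, k-1) mu^s / s!, the Poisson analogue of
    the truncated inclusion-exclusion sum in (120)."""
    return sum((-1) ** (s - k) * comb(s - 1, k - 1) * mu ** s / factorial(s)
               for s in range(k, k0 + 1))


def truncation_bound(k, k0, mu):
    """C(k0, k-1) mu^(k0+1) / (k0+1)!, the remainder bound from Lemma B.2."""
    return comb(k0, k - 1) * mu ** (k0 + 1) / factorial(k0 + 1)
```

Because the bound decays factorially in $k_{0}$, a moderate truncation level already gives a negligible remainder when $\mu$ stays bounded, which is the regime $\mu_{d}(t)\leq 3\Lambda_{k,\epsilon}$ used above.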

B.5. Threshold scale, shift/strip bounds, and weighted Gaussian bounds

Lemma B.7.

There exist constants $0<c_{1}<C_{1}<\infty$ and an integer $d_{0}$ such that

$$c_{1}\log d\leq t^{2}\leq C_{1}\log d\qquad\text{for every }t\in\mathcal{T}_{k,\epsilon}\text{ and every }d\geq d_{0}. \tag{121}$$
Proof.

Choose $0<\lambda_{-}<\lambda_{+}<\infty$ such that

$$h_{k}(\lambda_{-})=1-\epsilon/4,\qquad h_{k}(\lambda_{+})=\epsilon/4.$$

Since $r_{d}\to 0$, there exists $d_{0}$ such that

$$C_{k,\epsilon}r_{d}\leq\epsilon/4\qquad\text{for every }d\geq d_{0}.$$

If $t\in\mathcal{T}_{k,\epsilon}$ and $\lambda(t)\leq\lambda_{-}$, then (119) gives

$$G_{k}(t)\geq h_{k}(\lambda(t))-C_{k,\epsilon}r_{d}\geq 1-\epsilon/4-\epsilon/4=1-\epsilon/2,$$

which contradicts the definition of $\mathcal{T}_{k,\epsilon}$. Similarly, if $\lambda(t)\geq\lambda_{+}$, then

$$G_{k}(t)\leq h_{k}(\lambda(t))+C_{k,\epsilon}r_{d}\leq\epsilon/4+\epsilon/4=\epsilon/2,$$

again contradicting the definition of $\mathcal{T}_{k,\epsilon}$. Therefore

$$\lambda_{-}\leq\lambda(t)\leq\lambda_{+}\qquad(t\in\mathcal{T}_{k,\epsilon},\ d\geq d_{0}).$$

Since $\lambda(t)=d\bar{\Phi}(t/\sigma)$ and $\sigma\in[\underline{\sigma},\overline{\sigma}]$, Mills’ ratio yields constants $c_{1},C_{1}$ depending only on $(k,\epsilon,\underline{\sigma},\overline{\sigma})$ such that (121) holds. ∎
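To see the threshold scale (121) concretely, one can solve $d\,\bar{\Phi}(t/\sigma)=\lambda$ numerically for a fixed intensity $\lambda$ and compare $t^{2}$ with $\log d$. The sketch below uses bisection with the standard library; the function names, the bracket $[0,40\sigma]$, and the choice $\lambda=1$ in the usage note are illustrative assumptions.

```python
import math


def gaussian_tail(x):
    """Standard normal upper tail via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def threshold_for_intensity(d, lam, sigma=1.0):
    """Solve d * gaussian_tail(t / sigma) = lam for t by bisection.

    Requires 0 < lam < d / 2 so that the root lies in (0, 40 * sigma).
    """
    lo, hi = 0.0, 40.0 * sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The tail is decreasing in t, so move the lower end up while it is too large.
        if d * gaussian_tail(mid / sigma) > lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $\sigma=1$ and $\lambda=1$, the ratio $t^{2}/\log d$ stays strictly between 1 and 2 for $d$ ranging from $10^{3}$ to $10^{8}$, consistent with a two-sided bound of the form (121).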

Lemma B.8 (Shift and strip bounds).

Under Assumption 2.5, the conclusions of Lemmas A.12 and A.13 remain valid. More precisely, there exist constants $c_{0},C_{0}>0$ such that for every nonempty $I\subset[d]$ with $|I|\leq k_{0}+1$, every $t\in\mathcal{T}_{k,\epsilon}$, and every $0\leq a\leq c_{0}/t$,

$$\pi_{I}(t-a)\leq C_{0}\pi_{I}(t), \tag{122}$$

and

$$\pi_{I}(t-a)-\pi_{I}(t)\leq C_{0}a(1+t)\pi_{I}(t). \tag{123}$$
Proof.

Fix a nonempty $I\subset[d]$ with $|I|\leq k_{0}+1$. Since $\mathbf{\Sigma}_{II}$ is a principal submatrix of $\mathbf{\Sigma}$,

$$\lambda_{\min}(\mathbf{\Sigma}_{II})\geq\lambda_{\min}(\mathbf{\Sigma})\geq\sigma_{*}^{2}.$$

On the other hand, by stationarity and (103),

$$\lambda_{\max}(\mathbf{\Sigma}_{II})\leq\overline{\sigma}^{2}\left(1+2\sum_{h=1}^{\infty}|\rho(h)|\right)\leq\overline{\sigma}^{2}\left(1+\frac{4\pi C_{\alpha}e^{-a_{\alpha}}}{1-e^{-a_{\alpha}}}\right)=:C_{\Sigma}.$$

Hence every principal covariance matrix of dimension at most $k_{0}+1$ is uniformly well conditioned and has operator norm bounded by $C_{\Sigma}$. Repeating the proof of Lemmas A.12 and A.13 with these two spectral bounds gives (122) and (123). ∎

Lemma B.9.

There exists a constant $C_{k,\epsilon}>0$ such that, for every $d\geq d_{0}$ and every $t\in\mathcal{T}_{k,\epsilon}$,

$$\sum_{s=k}^{k_{0}}\binom{s-1}{k-1}M_{Z,s}(t)\leq C_{k,\epsilon}, \tag{124}$$

and

$$\binom{k_{0}}{k-1}M_{Z,k_{0}+1}(t)\leq C_{k,\epsilon}r_{d}. \tag{125}$$
Proof.

For $s\in\{1,\dots,k_{0}+1\}$, decompose $M_{Z,s}(t)$ according to the block partition used in Lemmas B.3 and B.4. The contribution of configurations that use only main blocks and place at most one exceedance in each selected block is $\mathbb{E}\binom{S_{d}(t)}{s}$. Every remaining configuration necessarily contains either an exceedance in a gap or in the remainder interval, or at least two exceedances inside one main block. Therefore the same counting argument used in the proof of Lemma B.3, together with the cluster bound (104), yields

$$\left|M_{Z,s}(t)-\mathbb{E}\binom{S_{d}(t)}{s}\right|\leq C_{s,k,\epsilon}\eta_{1,d}. \tag{126}$$

Combining (126) with (112) gives

$$\left|M_{Z,s}(t)-\frac{\lambda(t)^{s}}{s!}\right|\leq C_{s,k,\epsilon}\left\{\eta_{1,d}+q_{d}^{-1}+d^{s}\alpha(\ell_{d})\right\},\qquad 1\leq s\leq k_{0}+1. \tag{127}$$

Since $t\in\mathcal{T}_{k,\epsilon}$ implies $\lambda(t)\in[\lambda_{-},\lambda_{+}]$ by Lemma B.7, summing (127) over $s=k,\dots,k_{0}$ yields

$$\sum_{s=k}^{k_{0}}\binom{s-1}{k-1}M_{Z,s}(t)\leq\sum_{s=k}^{k_{0}}\binom{s-1}{k-1}\frac{\lambda_{+}^{s}}{s!}+\sum_{s=k}^{k_{0}}\binom{s-1}{k-1}C_{s,k,\epsilon}\left\{\eta_{1,d}+q_{d}^{-1}+d^{s}\alpha(\ell_{d})\right\}.$$

The first sum is bounded by a constant depending only on $(k,\epsilon)$ because it is dominated by the convergent series

$$\sum_{s=k}^{\infty}\binom{s-1}{k-1}\frac{\lambda_{+}^{s}}{s!}.$$

The second sum is also bounded because $k_{0}$ is finite for every $n$, $\eta_{1,d}\leq 1$ for large $d$, $q_{d}^{-1}\leq 1$, and (10) implies

$$\sum_{s=k}^{k_{0}}\binom{s-1}{k-1}d^{s}\alpha(\ell_{d})\leq C_{k,\epsilon}d^{k_{0}}\alpha(\ell_{d})\leq C_{k,\epsilon}d^{-7k_{0}-16}n^{-8}.$$

This proves (124).

For $s=k_{0}+1$, (127) yields

$$M_{Z,k_{0}+1}(t)\leq\frac{\lambda_{+}^{k_{0}+1}}{(k_{0}+1)!}+C_{k,\epsilon}\left\{\eta_{1,d}+q_{d}^{-1}+d^{k_{0}+1}\alpha(\ell_{d})\right\}.$$

Multiplying by $\binom{k_{0}}{k-1}$ and enlarging the constant proves (125). ∎

B.6. Regularity of $G_{k}$

Lemma B.10.

There exist constants $m_{k,\epsilon}>0$, $B_{k,\epsilon}>0$, and an integer $d_{1}\geq d_{0}$ such that

$$f_{k}(t)=G_{k}^{\prime}(t)\geq m_{k,\epsilon}\qquad\text{for every }t\in\mathcal{T}_{k,\epsilon}\text{ and every }d\geq d_{1}, \tag{128}$$

and

$$\left|(G_{k}^{-1})^{\prime\prime}(p)\right|\leq B_{k,\epsilon}\qquad\text{for every }p\in[\epsilon/2,1-\epsilon/2]\text{ and every }d\geq d_{1}. \tag{129}$$
Proof.

Set

$$H_{k}(t):=h_{k}(\lambda(t)).$$

By Lemma B.7, there exist constants $c_{\lambda},C_{\lambda}>0$ such that

$$c_{\lambda}t\leq|\lambda^{\prime}(t)|\leq C_{\lambda}t,\qquad|\lambda^{\prime\prime}(t)|\leq C_{\lambda}(1+t^{2}),\qquad t\in\mathcal{T}_{k,\epsilon},\ d\geq d_{0}. \tag{130}$$

Since $\lambda(t)\in[\lambda_{-},\lambda_{+}]$ on $\mathcal{T}_{k,\epsilon}$, the derivatives of $h_{k}$ are bounded on this compact interval. Hence there exist constants $c_{H},C_{H}>0$ such that

$$|H_{k}^{\prime}(t)|\geq c_{H}t,\qquad|H_{k}^{\prime\prime}(t)|\leq C_{H}(1+t^{2}),\qquad t\in\mathcal{T}_{k,\epsilon},\ d\geq d_{0}. \tag{131}$$

Define

$$\delta_{d}:=r_{d}^{1/4}.$$

Since $r_{d}\to 0$, there exists $d_{1}\geq d_{0}$ such that

$$\delta_{d}(1+C_{1}\log d)\leq\frac{c_{H}}{4}\qquad\text{for every }d\geq d_{1}, \tag{132}$$

where $C_{1}$ is the constant from Lemma B.7. For $t\in\mathcal{T}_{k,\epsilon}$ and $d\geq d_{1}$, Taylor’s theorem gives

\begin{align}
\left|\frac{H_{k}(t+\delta_{d})-H_{k}(t-\delta_{d})}{2\delta_{d}}-H_{k}^{\prime}(t)\right| &\leq C_{H}\delta_{d}(1+t^{2}), \tag{133}\\
\left|\frac{H_{k}(t+\delta_{d})-2H_{k}(t)+H_{k}(t-\delta_{d})}{\delta_{d}^{2}}-H_{k}^{\prime\prime}(t)\right| &\leq C_{H}\delta_{d}(1+t^{2}). \tag{134}
\end{align}

By (119),

\begin{align}
\left|\frac{G_{k}(t+\delta_{d})-G_{k}(t-\delta_{d})}{2\delta_{d}}-\frac{H_{k}(t+\delta_{d})-H_{k}(t-\delta_{d})}{2\delta_{d}}\right| &\leq C_{k,\epsilon}\frac{r_{d}}{\delta_{d}}=C_{k,\epsilon}r_{d}^{3/4}, \tag{135}\\
\left|\frac{G_{k}(t+\delta_{d})-2G_{k}(t)+G_{k}(t-\delta_{d})}{\delta_{d}^{2}}-\frac{H_{k}(t+\delta_{d})-2H_{k}(t)+H_{k}(t-\delta_{d})}{\delta_{d}^{2}}\right| &\leq C_{k,\epsilon}\frac{r_{d}}{\delta_{d}^{2}}=C_{k,\epsilon}r_{d}^{1/2}. \tag{136}
\end{align}

Combining (131)–(135), Lemma B.7, and (132) shows that

$$G_{k}^{\prime}(t)\geq\frac{c_{H}}{2}t\qquad(t\in\mathcal{T}_{k,\epsilon},\ d\geq d_{1}).$$

Since $\mathcal{T}_{k,\epsilon}$ is separated away from $0$ by Lemma B.7, this proves (128).

Likewise, (134) and (136) imply

$$|G_{k}^{\prime\prime}(t)|\leq C_{k,\epsilon}(1+t^{2})\qquad(t\in\mathcal{T}_{k,\epsilon},\ d\geq d_{1}).$$

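Finally,

$$(G_{k}^{-1})^{\prime\prime}(p)=-\frac{G_{k}^{\prime\prime}(G_{k}^{-1}(p))}{G_{k}^{\prime}(G_{k}^{-1}(p))^{3}},\qquad p\in[\epsilon/2,1-\epsilon/2].$$

Evaluating at $t=G_{k}^{-1}(p)\in\mathcal{T}_{k,\epsilon}$, the lower bound $G_{k}^{\prime}(t)\geq(c_{H}/2)t$ established above and the bound on $G_{k}^{\prime\prime}$ show that the right-hand side is at most a constant multiple of $(1+t^{2})/t^{3}$ in absolute value. Since $\mathcal{T}_{k,\epsilon}$ is separated away from $0$, this ratio is uniformly bounded, which proves (129). ∎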
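For completeness, the formula for $(G_{k}^{-1})^{\prime\prime}$ used in the last step can be recovered by differentiating the identity $G_{k}(G_{k}^{-1}(p))=p$ twice. The shorthand $g:=G_{k}^{-1}$ is introduced only for this short derivation and is not notation from the paper.

```latex
\begin{align*}
G_{k}(g(p)) &= p, \\
G_{k}'(g(p))\,g'(p) &= 1
  \quad\Longrightarrow\quad g'(p)=\frac{1}{G_{k}'(g(p))}, \\
G_{k}''(g(p))\,g'(p)^{2}+G_{k}'(g(p))\,g''(p) &= 0
  \quad\Longrightarrow\quad g''(p)=-\frac{G_{k}''(g(p))}{G_{k}'(g(p))^{3}}.
\end{align*}
```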

B.7. Completion of the proof of Theorem 2.3

The local projected Edgeworth expansion in Proposition A.1 and its bootstrap version depend on the Gaussian law only through the spectral bounds for principal submatrices and the shift/strip inequalities. By Lemma B.8, the proof of Proposition A.1 remains valid under Assumption 2.5; moreover, with the same argument one may enlarge the range from $|I|\leq k_{0}$ to $|I|\leq k_{0}+1$. Thus, for every nonempty $I\subset[d]$ with $|I|\leq k_{0}+1$,

$$\sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{P}\bigl(\bm{S}_{n,I}\in(t,\infty)^{|I|}\bigr)-\int_{(t,\infty)^{|I|}}p_{n,I}(\bm{u})\,d\bm{u}\right|\leq C\varepsilon_{n}^{2}\pi_{I}(t), \tag{137}$$

and, with probability at least $1-C/n$,

$$\sup_{t\in\mathcal{T}_{k,\epsilon}}\left|\mathbb{P}^{*}\bigl(\bm{S}_{n,I}^{*}\in(t,\infty)^{|I|}\bigr)-\int_{(t,\infty)^{|I|}}\hat{p}_{n,\gamma,I}(\bm{u})\,d\bm{u}\right|\leq C\varepsilon_{n}^{2}\pi_{I}(t) \tag{138}$$

holds simultaneously for all such $I$.

Summing (137) and (138) over $|I|=s$ gives, for every $s\in\{k,\dots,k_{0}+1\}$,

\begin{align}
|V_{n,s}(t)-M_{n,s}(t)| &\leq C\varepsilon_{n}^{2}M_{Z,s}(t), \tag{139}\\
|V_{n,s}^{*}(t)-\hat{M}_{n,s,\gamma}(t)| &\leq C\varepsilon_{n}^{2}M_{Z,s}(t) \tag{140}
\end{align}

uniformly over $t\in\mathcal{T}_{k,\epsilon}$, with the bootstrap bound holding on an event of probability at least $1-C/n$.

To prove (11), apply Lemma B.2 with $N=N_{n}(t)$ and $m=k_{0}$ to obtain

$$\left|\mathbb{P}(T_{n,[k]}>t)-\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}V_{n,s}(t)\right|\leq\binom{k_{0}}{k-1}V_{n,k_{0}+1}(t). \tag{141}$$

Using (139) with $s=k_{0}+1$ and the bound

$$|M_{n,k_{0}+1}(t)-M_{Z,k_{0}+1}(t)|\leq C\varepsilon_{n}M_{Z,k_{0}+1}(t)$$

from the proof of Theorem A.2, we obtain

$$V_{n,k_{0}+1}(t)\leq(1+C\varepsilon_{n}+C\varepsilon_{n}^{2})M_{Z,k_{0}+1}(t)\leq 2M_{Z,k_{0}+1}(t)$$

for all sufficiently large $n$. Hence (125) yields

$$\binom{k_{0}}{k-1}V_{n,k_{0}+1}(t)\leq C_{k,\epsilon}r_{d}. \tag{142}$$

Also, (139) and (124) imply

$$\sum_{s=k}^{k_{0}}\binom{s-1}{k-1}|V_{n,s}(t)-M_{n,s}(t)|\leq C\varepsilon_{n}^{2}.$$

Applying Lemma B.2 with $N=N_{Z}(t)$ gives

$$\left|\mathbb{P}(T_{\bm{Z},[k]}>t)-\sum_{s=k}^{k_{0}}(-1)^{s-k}\binom{s-1}{k-1}M_{Z,s}(t)\right|\leq\binom{k_{0}}{k-1}M_{Z,k_{0}+1}(t)\leq C_{k,\epsilon}r_{d}.$$

Combining the last three displays with the definition of $Q_{n,k}(t)$ proves (11). The bootstrap expansion (12) follows in the same way from (140), and the probability of the exceptional event remains bounded by $C/n$.
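The truncated inclusion–exclusion bound behind (141) can be checked numerically in an idealized case. The sketch below assumes that the exceedance count $N$ is exactly Poisson with mean $\lambda$, so that its factorial moments are $\mathbb{E}\binom{N}{s}=\lambda^{s}/s!$, the leading term in (127). This Poisson idealization, the function names, and the parameter grid are illustrative assumptions; the paper itself works with the non-Poisson counts $N_{n}(t)$ and $N_{Z}(t)$.

```python
from math import comb, exp, factorial


def poisson_tail(lam, k):
    """P(N >= k) for N ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam**j / factorial(j) for j in range(k))


def truncated_sum(lam, k, m):
    """Sum over s = k..m of (-1)^(s-k) * C(s-1, k-1) * lam^s / s!."""
    return sum(
        (-1) ** (s - k) * comb(s - 1, k - 1) * lam**s / factorial(s)
        for s in range(k, m + 1)
    )


def remainder_bound(lam, k, m):
    """C(m, k-1) * lam^(m+1) / (m+1)!, the Poisson analogue of the right side of (141)."""
    return comb(m, k - 1) * lam ** (m + 1) / factorial(m + 1)


def max_violation(lams=(0.5, 1.0, 2.0), ks=(1, 2, 3), extra=8):
    """Largest value of (truncation error - remainder bound) over the grid."""
    worst = float("-inf")
    for lam in lams:
        for k in ks:
            for m in range(k, k + extra + 1):
                err = abs(poisson_tail(lam, k) - truncated_sum(lam, k, m))
                worst = max(worst, err - remainder_bound(lam, k, m))
    return worst


if __name__ == "__main__":
    # A non-positive value (up to rounding) means the bound held on the whole grid.
    print(max_violation())
```

In this idealized setting the truncation error at level $m$ never exceeds the first omitted weighted moment, which mirrors how the choice $m=k_{0}$ pushes the remainder into the term controlled by (125).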

The derivative bounds in the proof of Theorem A.2 use only the derivative estimates for the projected Gaussian densities and the uniform weighted bound (124). Since both inputs are available here, the same argument yields

$$\sup_{t\in\mathcal{T}_{k,\epsilon}}\Bigl(|Q_{n,k}(t)|+|Q_{n,k}^{\prime}(t)|+|Q_{n,k}^{\prime\prime}(t)|\Bigr)\leq C\varepsilon_{n}, \tag{143}$$

and, with probability at least $1-C/n$,

$$\sup_{t\in\mathcal{T}_{k,\epsilon}}\Bigl(|\hat{Q}_{n,\gamma,k}(t)|+|\hat{Q}_{n,\gamma,k}^{\prime}(t)|+|\hat{Q}_{n,\gamma,k}^{\prime\prime}(t)|\Bigr)\leq C\varepsilon_{n}, \tag{144}$$

exactly as in Theorem A.2.

Next, let

$$\hat{F}_{n,k}(t)=G_{k}(t)+\hat{Q}_{n,\gamma,k}(t)+\hat{r}_{n}(t),\qquad\sup_{t\in\mathcal{T}_{k,\epsilon}}|\hat{r}_{n}(t)|\leq C(\varepsilon_{n}^{2}+r_{d}),$$

which follows from (12). Since $G_{k}^{\prime}(t)\geq m_{k,\epsilon}>0$ on $\mathcal{T}_{k,\epsilon}$ by Lemma B.10, the same implicit-function argument as in the proof of Theorem A.3 yields a unique solution

$$\hat{c}_{1-\alpha,k}=c^{G}_{1-\alpha,k}+\Delta_{n,k}(\alpha),\qquad|\Delta_{n,k}(\alpha)|\leq C(\varepsilon_{n}+r_{d}).$$

Substituting $t=c^{G}_{1-\alpha,k}+\Delta_{n,k}(\alpha)$ into the identity $\hat{F}_{n,k}(t)=1-\alpha$ and expanding as in (83) gives

$$\left|\Delta_{n,k}(\alpha)+\frac{\hat{Q}_{n,\gamma,k}(c^{G}_{1-\alpha,k})}{f_{k}(c^{G}_{1-\alpha,k})}-R_{n,k}(\alpha)\right|\leq C(\varepsilon_{n}^{3}+r_{d}),$$

uniformly in $\alpha\in(\epsilon,1-\epsilon)$, which proves (13).

For the coverage expansion, write

$$F_{n,k}(t)=G_{k}(t)+Q_{n,k}(t)+r_{n}(t),\qquad\sup_{t\in\mathcal{T}_{k,\epsilon}}|r_{n}(t)|\leq C(\varepsilon_{n}^{2}+r_{d}),$$

which follows from (11). Insert (13) into the Taylor formula

$$F_{n,k}(\hat{c}_{1-\alpha,k})=F_{n,k}(c^{G}_{1-\alpha,k})+F_{n,k}^{\prime}(c^{G}_{1-\alpha,k})\Delta_{n,k}(\alpha)+\frac{1}{2}F_{n,k}^{\prime\prime}(\xi_{n,k,\alpha})\Delta_{n,k}(\alpha)^{2}.$$

Using (143), Lemma B.10, and Lemma A.16, the same algebra as in the proof of Theorem 2.1 yields

$$\left|\mathbb{P}(T_{n,[k]}\leq\hat{c}_{1-\alpha,k})-\left[(1-\alpha)+(1-\gamma)Q_{n,k}(c^{G}_{1-\alpha,k})+\mathbb{E}\{R_{n,k}(\alpha)\}\right]\right|\leq C(\varepsilon_{n}^{2}+r_{d}).$$

Taking complements proves (14).

If $\gamma=1$, then the linear term disappears and

$$|R_{n,k}(\alpha)|\leq C\left(|\hat{Q}_{n,\gamma,k}(c^{G}_{1-\alpha,k})|^{2}+|\hat{Q}_{n,\gamma,k}^{\prime}(c^{G}_{1-\alpha,k})|\,|\hat{Q}_{n,\gamma,k}(c^{G}_{1-\alpha,k})|\right)\leq C\varepsilon_{n}^{2}$$

by (144). Therefore (15) follows from (14).

Finally, the deterministic-array conditional theorem in Section A.8 is proved from the conditional versions of Theorems A.2, A.3, and 2.1. Repeating that argument with (12), (13), and (14) gives the same deterministic-array statement with $C(\varepsilon_{n}^{2}+r_{d})$ in place of $C\varepsilon_{n}^{2}$. Inserting that conditional bound into the proof of Theorem 2.2 yields (16). This completes the proof of Theorem 2.3.

References

  • S. M. Berman (1964) Limit theorems for the maximum term in stationary sequences. The Annals of Mathematical Statistics 35 (2), pp. 502–516.
  • J. Chang, X. Chen, and M. Wu (2024) Central limit theorems for high dimensional dependent data. Bernoulli 30 (1), pp. 712–742.
  • J. Chang, Q. Jiang, T. S. McElroy, and X. Shao (2025) Statistical inference for high-dimensional spectral density matrix. Journal of the American Statistical Association 120 (551), pp. 1960–1974.
  • J. Chang, Q. Jiang, and X. Shao (2023) Testing the martingale difference hypothesis in high dimension. Journal of Econometrics 235 (2), pp. 972–1000.
  • V. Chernozhukov, D. Chetverikov, K. Kato, and Y. Koike (2022) Improved central limit theorem and bootstrap approximation in high dimensions. The Annals of Statistics 50 (5), pp. 2562–2586.
  • V. Chernozhukov, D. Chetverikov, and K. Kato (2013) Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics 41 (6), pp. 2786–2819.
  • V. Chernozhukov, D. Chetverikov, and K. Kato (2017) Central limit theorems and bootstrap in high dimensions. The Annals of Probability 45 (4), pp. 2309–2353.
  • V. Chernozhukov, D. Chetverikov, and Y. Koike (2023) Nearly optimal central limit theorem and bootstrap approximations in high dimensions. The Annals of Applied Probability 33 (3), pp. 2374–2425.
  • H. Deng and C. Zhang (2020) Beyond Gaussian approximation: bootstrap for maxima of sums of independent random vectors. The Annals of Statistics 48 (6), pp. 3643–3671.
  • Y. Ding, Q. Li, Y. Shi, L. Sun, and L. Zhang (2026) Gaussian multiplier bootstrap procedure for the $k$th largest coordinate of high-dimensional statistics. Note: arXiv:2508.14400v2 [math.ST].
  • X. Fang and Y. Koike (2021) High-dimensional central limit theorems by Stein’s method. The Annals of Applied Probability 31 (4), pp. 1660–1686.
  • X. Fang and Y. Koike (2024) Sharp high-dimensional central limit theorems for log-concave distributions. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 60 (3), pp. 2129–2156.
  • R. A. Fisher and L. H. C. Tippett (1928) Limiting forms of the frequency distribution of the largest or smallest member of a sample. Mathematical Proceedings of the Cambridge Philosophical Society 24 (2), pp. 180–190.
  • P. Hall (1992) The bootstrap and Edgeworth expansion. Springer Series in Statistics, Springer, New York.
  • R. A. Horn and C. R. Johnson (2012) Matrix analysis. 2nd edition, Cambridge University Press, Cambridge.
  • Y. Koike (2021) Notes on the dimension dependence in high-dimensional central limit theorems for hyperrectangles. Japanese Journal of Statistics and Data Science 4 (1), pp. 257–297.
  • Y. Koike (2026) High-dimensional bootstrap and asymptotic expansion. Probability Theory and Related Fields. Note: Published online first.
  • D. Kozbur (2021) Dimension-free anticoncentration bounds for Gaussian order statistics with discussion of applications to multiple testing. Note: arXiv:2107.10766.
  • W. V. Li and Q. Shao (2002) A normal comparison inequality and its applications. Probability Theory and Related Fields 122 (4), pp. 494–508.
  • M. E. Lopes, Z. Lin, and H. Muller (2020) Bootstrapping max statistics in high dimensions: near-parametric rates under weak variance decay and application to functional and multinomial data. The Annals of Statistics 48 (2), pp. 1214–1229.
  • C. Y. Mu (1966) The types of limit distributions for some terms of variational series. Scientia Sinica 15, pp. 749–762.
  • X. Shao (2010) The dependent wild bootstrap. Journal of the American Statistical Association 105 (489), pp. 218–235.
  • V. Watts, H. Rootzen, and M. R. Leadbetter (1982) On limiting distributions of intermediate order statistics from stationary sequences. The Annals of Probability 10, pp. 653–662.
  • D. Zhang and W. B. Wu (2017) Gaussian approximation for high dimensional time series. The Annals of Statistics 45 (5), pp. 1895–1919.
  • X. Zhang and G. Cheng (2014) Bootstrapping high dimensional time series. Note: arXiv:1406.1037.
  • X. Zhang and G. Cheng (2018) Gaussian approximation for high dimensional vector under physical dependence. Bernoulli 24 (4A), pp. 2640–2675.