High Dimensional Probability
High-Dimensional Probability (2018) by Vershynin is a book on concentration inequalities for high-dimensional objects. It covers a wide range of material, moving from random variables to random vectors, random matrices, and random processes. Many thanks to 滕佳烨 for the lecture videos!
Concentration results come in an expectation form (weak) or a tail-probability form (strong).
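For concreteness (standard forms, paraphrased): the weak form bounds \(E|X - E[X]|\), while the strong form bounds the tail,
\[
P\big(|X - E[X]| \ge t\big) \le 2e^{-ct^2} \quad \text{for all } t \ge 0.
\]
The tail form is the stronger one: integrating it via \(E|Z| = \int_0^\infty P(|Z| \ge t)\,dt\) recovers an expectation bound, but not conversely.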
Random Variable
For random variables, the topic is concentration of a summation.
- Adding sub-gaussian / sub-exponential assumptions yields the general Hoeffding / Bernstein inequalities (stated after this list).
- Application: bounding the degrees of a random graph.
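For reference, the general Hoeffding inequality (Theorem 2.6.2 in Vershynin, stated up to absolute constants): if \(X_1,\dots,X_N\) are independent mean-zero sub-gaussian random variables, then for every \(t \ge 0\),
\[
P\Big( \Big|\sum_{i=1}^N X_i\Big| \ge t \Big) \le 2\exp\Big( -\frac{c\,t^2}{\sum_{i=1}^N \|X_i\|_{\psi_2}^2} \Big).
\]
The Bernstein version replaces the sub-gaussian norm \(\|\cdot\|_{\psi_2}\) with the sub-exponential norm \(\|\cdot\|_{\psi_1}\) and yields a mixed tail of the form \(\exp\big(-c\min(t^2/\sigma^2,\; t/K)\big)\).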
Random Vector
- Concentration of the norm (here the dimension tends to infinity, rather than the number of summands)
  - For a random vector with independent, sub-gaussian coordinates, a two-sided bound on the norm holds (Theorem 3.1.1; see the numerical check after this list).
  - The book then defines sub-gaussianity for vectors. For a sub-gaussian vector with independent components, only a one-sided bound is available (Exercise 6.3.5).
  - Application: the Grothendieck inequality and SDP relaxation. In an optimization course we have already seen the idea of relaxing a quadratic optimization problem \(\min_{x} x^\top A x\) (e.g., max-cut) to an SDP; here we go one step further to integer quadratic optimization, where every entry of x is required to be 1 or -1. Via the Grothendieck inequality, the integer constraint can be relaxed to continuous values with a provable relaxation bound.
- Concentration of \(f(X)-E[f(X)]\) (no independence assumption; generalizes the norm case)
  - This requires f to be Lipschitz continuous and X to follow a specific distribution.
- Concentration of quadratic forms
  - The Hanson-Wright inequality gives the concentration result (stated after this list); its proof uses decoupling, i.e., introducing an independent copy X' of the random variable. Aside: the symmetrization used for Rademacher complexity and VC dimension is proved with the same trick of introducing a copy X'.
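A minimal numerical sanity check of the norm concentration in Theorem 3.1.1 (the dimensions and sample count below are arbitrary choices, not from the book): for \(X \sim N(0, I_n)\), \(\|X\|_2\) should sit near \(\sqrt{n}\) with O(1) fluctuations, independent of n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Theorem 3.1.1, informally: for X with independent, unit-variance,
# sub-gaussian coordinates, ||X||_2 = sqrt(n) + O(1) with high probability.
for n in [10, 100, 1000, 10000]:
    X = rng.standard_normal((5000, n))   # 5000 samples of X ~ N(0, I_n)
    norms = np.linalg.norm(X, axis=1)    # ||X||_2 of each sample
    print(f"n={n:6d}  sqrt(n)={np.sqrt(n):8.2f}  "
          f"mean={norms.mean():8.2f}  std={norms.std():.3f}")
```

The point of the theorem shows up in the output: the mean grows like \(\sqrt{n}\) while the standard deviation stays constant (about \(1/\sqrt{2}\) for Gaussian coordinates). As for the quadratic-form result, the Hanson-Wright inequality reads (Theorem 6.2.1, up to constants): for X with independent mean-zero coordinates satisfying \(\|X_i\|_{\psi_2} \le K\) and a fixed matrix A,
\[
P\big( |X^\top A X - E[X^\top A X]| \ge t \big) \le 2\exp\Big( -c \min\Big( \frac{t^2}{K^4 \|A\|_F^2},\; \frac{t}{K^2 \|A\|} \Big) \Big).
\]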
Random Matrix
- Concentration of the Frobenius norm and the operator norm
  - When the entries are independent and sub-gaussian, Theorem 4.4.5 applies (stated after this list);
  - When the rows are independent and sub-gaussian, Theorem 4.6.1 applies;
  - Without sub-gaussianity, only expectation-form results are available.
  - The proof strategy is to discretize via an \(\epsilon\)-net, which brings in covering numbers.
  - Application: community detection in networks.
- Concentration of the norm of a sum
  - Matrix Hoeffding and matrix Bernstein inequalities (see the statements after this list)
  - Applications: the J-L Lemma (dimensionality reduction with approximate distance preservation); covariance estimation.
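For reference, stated up to absolute constants: Theorem 4.4.5 says that if A is an \(m \times n\) matrix with independent, mean-zero, sub-gaussian entries satisfying \(\|A_{ij}\|_{\psi_2} \le K\), then for every \(t \ge 0\), with probability at least \(1 - 2e^{-t^2}\),
\[
\|A\| \le CK(\sqrt{m} + \sqrt{n} + t).
\]
The matrix Bernstein inequality (Theorem 5.4.1) bounds \(\|\sum_i X_i\|\) for independent, mean-zero, symmetric random matrices with \(\|X_i\| \le K\): the tail looks like the scalar Bernstein bound but picks up a dimensional prefactor of order n.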
Random Process
Consider \( E[\sup_{t\in T} X_t] \)
- Gaussian processes
  - Sudakov's inequality: a lower bound (stated after this list)
- General processes with sub-gaussian increments
  - Dudley's inequality (also stated after this list). Proof: chaining, which is a strengthened \(\epsilon\)-net argument. Refinement: the generic chaining bound, which uses adaptive scales \(\epsilon_k\).
  - Application: empirical processes via VC dimension
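Stated up to constants: for a Gaussian process, Sudakov's minoration gives the lower bound
\[
E\Big[\sup_{t\in T} X_t\Big] \ge c\,\epsilon \sqrt{\log N(T, d, \epsilon)} \quad \text{for every } \epsilon > 0,
\]
while Dudley's inequality gives the upper bound for any process with sub-gaussian increments,
\[
E\Big[\sup_{t\in T} X_t\Big] \le C \int_0^\infty \sqrt{\log N(T, d, \epsilon)}\, d\epsilon,
\]
where \(N(T, d, \epsilon)\) is the covering number of T under the canonical metric d. The gap between the two (one fixed scale vs. an integral over all scales) is exactly what the generic chaining bound addresses.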
Others
Vector recovery and matrix recovery; the matrix deviation inequality; the Dvoretzky-Milman theorem.
Different definitions of sub-Gaussian r.v.s
Sources differ on whether the definition of sub-Gaussianity requires centering; a Stack Exchange answer summarizes the situation:
Regarding the centering, there isn’t an established convention.
- Vershynin only states the \(E[e^{\lambda X}] \le e^{\lambda^2 \sigma^2/2}\) definition in the case \(E[X]=0\), but also provides other definitions of sub-Gaussianity that don't assume \(E[X]=0\).
- Rigollet makes \(E[X]=0\) part of the definition of sub-Gaussianity.
- Wikipedia and other texts (like Wainwright) define sub-Gaussianity by applying the MGF condition to the centered random variable \(X-E[X]\) instead of \(X\).
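Side by side, the two MGF conditions (each required for all \(\lambda \in \mathbb{R}\)):
\[
E[e^{\lambda X}] \le e^{\lambda^2 \sigma^2/2} \qquad \text{vs.} \qquad E[e^{\lambda (X - E[X])}] \le e^{\lambda^2 \sigma^2/2}.
\]
Note that the uncentered condition already forces \(E[X]=0\): expanding \(E[e^{\lambda X}] = 1 + \lambda E[X] + O(\lambda^2)\) and letting \(\lambda \to 0\) from both sides gives \(E[X]=0\). This is why Vershynin only states it for mean-zero variables.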
The effect of centering
My take: centering mainly matters for sub-exponentiality and the results that rely on it (e.g., the Hanson-Wright inequality). If sub-Gaussianity is defined through the centered variable, then the corresponding sub-exponentiality is also defined through centering; but since \( (\xi-E[\xi])^2 \neq \xi^2 - E[\xi^2] \), sub-exponentiality and sub-Gaussianity are then no longer related by simple squaring. For a result like Hanson-Wright, which uses both sub-exponential and sub-Gaussian properties, the derivation breaks down because K after centering can't be lower bounded by K before centering.
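One way to see the mismatch explicitly is the identity
\[
\xi^2 - E[\xi^2] = (\xi - E[\xi])^2 - E[(\xi - E[\xi])^2] + 2\,E[\xi]\,(\xi - E[\xi]),
\]
so whenever \(E[\xi] \neq 0\), the centered square differs from the square of the centered variable by an extra linear term, and the clean statement 'sub-Gaussian squared is sub-exponential' no longer transfers directly between the two conventions.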