High Dimensional Probability

High-Dimensional Probability (2018) by Vershynin is a book introducing concentration inequalities for high-dimensional objects, covering a wide range of material from random variables, random vectors, and random matrices to random processes. Thanks to 滕佳烨 for the accompanying video lectures.

Concentration statements come in an expectation form (weak) or a tail-probability form (strong), as contrasted below.
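
For concreteness, here are the two forms for a generic statistic \(S\) (the notation is mine, not the book's):

```latex
% Weak (expectation) form: controls only the average deviation.
\mathbb{E}\,|S - \mathbb{E}S| \;\le\; \varepsilon
% Strong (tail) form: controls every deviation level t at once,
% e.g. with a sub-gaussian tail; integrating over t recovers an
% expectation bound, which is why the tail form is stronger.
\mathbb{P}\bigl\{ |S - \mathbb{E}S| \ge t \bigr\} \;\le\; 2\exp(-c\,t^2)
```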

Random Variable

For random variables, the central topic is the concentration of sums.

  • Adding a sub-gaussian/sub-exponential assumption yields the general Hoeffding/Bernstein inequalities (see the sanity check after this list).
  • Application: a bound on the degrees of a random graph.
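
A minimal numerical sanity check of the tail form, using Hoeffding's inequality for Rademacher sums, \(P(|\sum_i \varepsilon_i| \ge t) \le 2e^{-t^2/2N}\); the values of N, the trial count, and the thresholds t are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 1000, 200_000

# Sum of N i.i.d. Rademacher (+/-1) signs; if B ~ Binomial(N, 1/2)
# counts the +1 entries, the sum equals 2B - N.
sums = 2 * rng.binomial(N, 0.5, size=trials) - N

# Hoeffding's inequality for Rademacher sums:
# P(|sum| >= t) <= 2 exp(-t^2 / (2N)).
for t in [40, 60, 80, 100]:
    empirical = np.mean(np.abs(sums) >= t)
    bound = 2 * np.exp(-t ** 2 / (2 * N))
    print(f"t={t:4d}  empirical tail={empirical:.5f}  Hoeffding bound={bound:.5f}")
```

The empirical tail sits well below the bound at every threshold, as it should.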

Random Vector

  • Concentration of the norm (the dimension tends to infinity, rather than a summation)
    • For a random vector with component-wise independent and component-wise sub-gaussian entries, there is a two-sided bound on the norm (Theorem 3.1.1; see the simulation after this list).
    • We then define sub-gaussianity for vectors. For a sub-gaussian vector with independent components, only a one-sided bound is available (Exercise 6.3.5).
    • Application: Grothendieck's inequality and SDP relaxation. In an optimization course we have already seen the idea of relaxing the quadratic optimization problem \(\max_{x} x^\top A x\) (e.g., the max-cut problem) into an SDP; here we go one step further to integer quadratic optimization, where every entry of x must be 1 or -1. Grothendieck's inequality lets us relax the integer constraint to continuous values and gives a bound on the relaxation gap.
  • Concentration of \(f(X)-E[f(X)]\) (without independence; generalizes the norm case)
    • This requires f to be Lipschitz continuous and X to follow a specific distribution (e.g., Gaussian, or uniform on the sphere).
  • Concentration of quadratic form
    • The Hanson-Wright inequality gives the concentration; its proof uses decoupling, i.e., introducing an independent copy X' of the random variable. Aside: the symmetrization argument used by both Rademacher complexity and VC dimension rests on the same idea of introducing a copy X'.
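
A minimal simulation of the norm concentration in Theorem 3.1.1, in the cleanest case of i.i.d. standard Gaussian coordinates (dimensions and trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 2_000

# Theorem 3.1.1 in spirit: for X with i.i.d. standard normal entries,
# ||X||_2 concentrates around sqrt(n); the fluctuation stays O(1)
# even as the mean grows like sqrt(n).
for n in [10, 100, 1_000, 10_000]:
    norms = np.linalg.norm(rng.standard_normal((trials, n)), axis=1)
    print(f"n={n:6d}  mean={norms.mean():8.2f}  sqrt(n)={np.sqrt(n):8.2f}  std={norms.std():.3f}")
```

The standard deviation hovers around a constant (about \(1/\sqrt{2}\) for Gaussians) while the mean tracks \(\sqrt{n}\), which is exactly the two-sided concentration the theorem describes.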

Random Matrix

  • Concentration of F-norm and operator norm
    • 当 component-wise independent and sub-gaussian 时,有结论 Theorem 4.4.5;
    • 当 row-wise independent and sub-gaussian 时,有结论 Theorem 4.6.1;
    • 对于无 sub-gaussian 时,只有 expectation 形式的结论。
    • 证明的思路是 disceretize as \(\epsilon\)-net, thus involves covering number.
    • 应用:community detection in network.
  • Concentration of the norm of a sum
    • Matrix Hoeffding inequality and matrix Bernstein inequality
    • Applications: the J-L Lemma gives distance-preserving dimension reduction (see the sketch after this list); covariance estimation.
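
A sketch of the J-L Lemma via a random Gaussian projection. The target dimension \(k = O(\log N / \epsilon^2)\) is the standard scaling, but the constant 8 and all sizes below are heuristic choices of mine, not the book's statement:

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_dists(A):
    """All pairwise Euclidean distances via the squared-norm expansion."""
    sq = (A ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * A @ A.T
    return np.sqrt(np.maximum(d2, 0.0))

# Project N points from R^d down to k = O(log N / eps^2) dimensions with
# a random Gaussian matrix; pairwise distances should be preserved up to
# a factor of (1 +/- eps) with high probability.
N, d, eps = 50, 10_000, 0.25
k = int(np.ceil(8 * np.log(N) / eps ** 2))  # the constant 8 is a heuristic choice

X = rng.standard_normal((N, d))             # N arbitrary points in R^d
G = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ G.T                                 # projected points in R^k

iu = np.triu_indices(N, k=1)
ratios = pairwise_dists(Y)[iu] / pairwise_dists(X)[iu]
print(f"k = {k}, distance ratios in [{ratios.min():.3f}, {ratios.max():.3f}]")
```

Note that k depends on the number of points N but not on the ambient dimension d, which is the whole point of the lemma.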

Random Process

Consider bounding \( E[\sup_{t\in T} X_t] \).

  • Gaussian process
    • Sudakov's inequality: a lower bound
  • General process with sub-gaussian increments
    • Dudley's inequality. Proof: chaining, a strengthened version of the \(\epsilon\)-net argument. Refinement: the generic chaining bound, which uses adaptive scales \(\epsilon_k\) (a check of the finite-index baseline follows this list).
    • Application: empirical processes via VC dimension
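
Before chaining, the finite-index baseline is worth checking numerically: for a finite set T of N independent standard Gaussians, \( E[\sup_{t} X_t] \le \sqrt{2\log N} \), tight up to lower-order terms. A quick simulation (trial count arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 2_000

# Finite-set baseline behind chaining: for N independent standard
# Gaussians, E[max_i X_i] <= sqrt(2 log N).
for N in [10, 100, 1_000, 10_000]:
    maxima = rng.standard_normal((trials, N)).max(axis=1)
    print(f"N={N:6d}  E[max]~{maxima.mean():.3f}  sqrt(2 log N)={np.sqrt(2 * np.log(N)):.3f}")
```

Chaining applies this finite-set bound at every scale of an \(\epsilon\)-net and sums the contributions, which is what produces Dudley's entropy integral.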

Others

Vector recovery and matrix recovery; the matrix deviation inequality and the Dvoretzky-Milman theorem.

Different definitions of a sub-Gaussian r.v.

Whether the definition of sub-Gaussianity should involve centering differs across sources; a Stack Exchange answer summarizes the question:

Regarding the centering, there isn’t an established convention.

  • Vershynin only states the \(E[e^{\lambda X}] \le e^{\lambda^2 \sigma^2/2}\) definition in the case \(E[X]=0\), but also provides other definitions of sub-Gaussianity that don't assume \(E[X]=0\).
  • Rigollet makes \(E[X]=0\) part of the definition of sub-Gaussianity.
  • Wikipedia and other texts (like Wainwright) define sub-Gaussianity by applying the MGF condition to the centered random variable \(X-E[X]\) instead of \(X\).

The effect of centering

My take: this mainly affects sub-exponentiality and the results built on it (e.g., the Hanson-Wright inequality). If sub-Gaussianity is defined through the centered variable, then the corresponding sub-exponentiality is also defined after centering. Because \( (\xi-E[\xi])^2 \neq \xi^2 - E[\xi^2] \), sub-exponentiality is then no longer simply the square of sub-Gaussianity. In a result like Hanson-Wright, which needs both a sub-exponential statement and a sub-Gaussian statement, the derivation breaks down because the constant K after centering cannot be lower bounded by the K before centering.
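
To spell out the square relation at stake: the first identity below is Lemma 2.7.6 in Vershynin, stated for the uncentered norms; the rest is my paraphrase of the obstruction.

```latex
% Without centering, squaring maps sub-Gaussian to sub-exponential exactly:
\|\xi^2\|_{\psi_1} \;=\; \|\xi\|_{\psi_2}^2 .
% With centered definitions, the relevant sub-exponential variable is
% \xi^2 - E[\xi^2], while squaring the centered variable gives
(\xi - \mathbb{E}\xi)^2 \;=\; \xi^2 - 2\xi\,\mathbb{E}\xi + (\mathbb{E}\xi)^2
\;\neq\; \xi^2 - \mathbb{E}[\xi^2] \quad \text{unless } \mathbb{E}\xi = 0,
% so the clean square relation between the two norms is lost.
```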