CS236 Deep Generative Models Lecture 5

Course homepage: https://deepgenerativemodels.github.io/

Course materials: https://github.com/Subhajit135/CS236_DGM, https://github.com/deepgenerativemodels/notes

Video: https://www.bilibili.com/video/av81625948?from=search&seid=4636291486683332935

This post reviews Lecture 5 of CS236, which introduces latent variable models.

Main idea in Monte Carlo Estimation

Suppose we want to compute the expectation

$E_{x \sim P}[g(x)]=\sum_{x} g(x) P(x)$

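If computing this directly is hard, we can approximate it by Monte Carlo simulation: draw samples $x^{1}, \ldots, x^{T}$ from the distribution $P$, then approximate the expectation by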

$\hat{g}\left(\mathbf{x}^{1}, \cdots, \mathbf{x}^{T}\right) \triangleq \frac{1}{T} \sum_{t=1}^{T} g\left(\mathbf{x}^{t}\right)$

The Monte Carlo estimate has the following properties (assuming the samples are drawn i.i.d. from $P$):

Unbiasedness:
$E_{P}[\hat{g}]=E_{P}[g(x)]$
Convergence: by the law of large numbers,
$\hat{g}=\frac{1}{T} \sum_{t=1}^{T} g\left(x^{t}\right) \rightarrow E_{P}[g(x)] \text { for } T \rightarrow \infty$
Variance:
$V_{P}[\hat{g}]=V_{P}\left[\frac{1}{T} \sum_{t=1}^{T} g\left(x^{t}\right)\right]=\frac{V_{P}[g(x)]}{T}$
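As a minimal sketch of the estimator above, the following NumPy snippet compares the exact expectation with its Monte Carlo approximation. The distribution `probs`, the function `g(x) = x^2`, and the helper names `monte_carlo_expectation` and `sampler` are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def monte_carlo_expectation(g, sampler, T, rng):
    """Estimate E_{x~P}[g(x)] by averaging g over T samples drawn from P."""
    xs = sampler(rng, T)          # x^1, ..., x^T ~ P
    return float(np.mean(g(xs)))  # (1/T) * sum_t g(x^t)

# Hypothetical discrete distribution P over {0, ..., 5} (illustrative values).
support = np.arange(6)
probs = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1])

def g(x):
    # Illustrative choice of g; any function of x would work here.
    return np.asarray(x, dtype=float) ** 2

def sampler(rng, T):
    # Draw T i.i.d. samples from P.
    return rng.choice(support, size=T, p=probs)

# Exact expectation sum_x g(x) P(x); feasible here because the support is tiny.
exact = float(np.sum(g(support) * probs))

rng = np.random.default_rng(0)
estimate = monte_carlo_expectation(g, sampler, 100_000, rng)
print(f"exact = {exact:.4f}, Monte Carlo estimate = {estimate:.4f}")
```

Because the variance of the estimate shrinks as $1/T$, increasing `T` by a factor of 100 should reduce the typical error by roughly a factor of 10.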

Extending the MLE principle to a Bayesian network

Returning to maximum likelihood estimation, introduced earlier, suppose we are given an autoregressive model over $n$ variables

$P_{\theta}(\mathbf{x})=\prod_{i=1}^{n} p_{\text {neural }}\left(x_{i} | p a\left(x_{i}\right) ; \theta_{i}\right)$

where $pa(x_i)$ denotes the parents of $x_i$.

Given training data $\mathcal{D}=\left\{\mathbf{x}^{(1)}, \cdots, \mathbf{x}^{(m)}\right\}$, the likelihood function is

$L(\theta, \mathcal{D})=\prod_{j=1}^{m} P_{\theta}\left(\mathbf{x}^{(j)}\right)=\prod_{j=1}^{m} \prod_{i=1}^{n} p_{\text {neural }}\left(x_{i}^{(j)} | p a\left(x_{i}\right)^{(j)} ; \theta_{i}\right)$

Our goal is to maximize the log-likelihood

$\ell(\theta)=\log L(\theta, \mathcal{D})=\sum_{j=1}^{m} \sum_{i=1}^{n} \log p_{\text {neural }}\left(x_{i}^{(j)} | p a\left(x_{i}\right)^{(j)} ; \theta_{i}\right)$

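Concretely, we use gradient ascent (the update adds the gradient because we are maximizing $\ell$):

Initialize $\theta^0$ randomly
Compute $\nabla_{\theta} \ell(\theta)$ at $\theta=\theta^{t}$
Update $\theta^{t+1}=\theta^{t}+\alpha_{t} \nabla_{\theta} \ell(\theta^{t})$

where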

$\nabla_{\theta} \ell(\theta)=\sum_{j=1}^{m} \sum_{i=1}^{n} \nabla_{\theta} \log p_{\text {neural }}\left(x_{i}^{(j)} | p a\left(x_{i}\right)^{(j)} ; \theta_{i}\right)$

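If $m$ is large, evaluating this sum at every step is too expensive, so in practice the stochastic (mini-batch) version is commonly used: the sum over all $m$ examples is replaced by a Monte Carlo estimate computed from a small random subset of the data.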
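The mini-batch procedure can be sketched as follows. This is only an illustration under stated assumptions: each conditional $p(x_i | x_{<i})$ is taken to be a logistic regression on the preceding binary variables (so every earlier variable is treated as a parent), the toy data set is synthetic, and the names `mask`, `minibatch_gradient`, and `avg_log_likelihood` are hypothetical. The gradient is averaged over the batch, which gives an unbiased estimate of $\frac{1}{m}\nabla_{\theta}\ell(\theta)$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conditionals(X, W, b, mask):
    # P[j, i] = p(x_i = 1 | x_{<i}) for example j; the mask removes x_i and later variables.
    return sigmoid(X @ (W * mask).T + b)

def avg_log_likelihood(X, W, b, mask):
    P = conditionals(X, W, b, mask)
    eps = 1e-12  # guards against log(0)
    return float(np.mean(np.sum(X * np.log(P + eps) + (1 - X) * np.log(1 - P + eps), axis=1)))

def minibatch_gradient(X_batch, W, b, mask):
    # For a Bernoulli with logit a, d log p(x) / d a = x - sigmoid(a).
    R = X_batch - conditionals(X_batch, W, b, mask)
    grad_W = (R.T @ X_batch) * mask / len(X_batch)
    grad_b = R.mean(axis=0)
    return grad_W, grad_b

rng = np.random.default_rng(0)

# Synthetic binary data (assumption): x2 copies x1 with probability 0.9, x3 is independent.
m, n = 5000, 3
x1 = (rng.random(m) < 0.5).astype(float)
x2 = np.where(rng.random(m) < 0.9, x1, 1.0 - x1)
x3 = (rng.random(m) < 0.2).astype(float)
X = np.stack([x1, x2, x3], axis=1)

mask = np.tril(np.ones((n, n)), k=-1)  # strictly lower triangular: x_i depends only on x_{<i}
W, b = np.zeros((n, n)), np.zeros(n)
initial_ll = avg_log_likelihood(X, W, b, mask)

alpha, batch_size = 0.5, 64
for t in range(1000):
    idx = rng.choice(m, size=batch_size, replace=False)  # random mini-batch
    grad_W, grad_b = minibatch_gradient(X[idx], W, b, mask)
    W += alpha * grad_W  # ascent step: add the gradient
    b += alpha * grad_b

final_ll = avg_log_likelihood(X, W, b, mask)
print(f"average log-likelihood: {initial_ll:.3f} -> {final_ll:.3f}")
```

Each step touches only `batch_size` examples instead of all $m$, at the cost of a noisier gradient.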

Latent Variable Models

Latent Variable Models: Motivation

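Consider images of faces:

Images $\mathrm x$ vary widely because of factors such as gender, eye color, hair color, and pose. Unless the images are annotated, however, these factors of variation are not directly observed.
Idea: model these factors explicitly using latent variables.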

Deep Latent Variable Models

$\mathbf{z} \sim \mathcal{N}(0, I)$
$p(\mathbf{x} | \mathbf{z})=\mathcal{N}\left(\mu_{\theta}(\mathbf{z}), \Sigma_{\theta}(\mathbf{z})\right)$, where $\mu_{\theta}, \Sigma_{\theta}$ are neural networks
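To make the two sampling steps concrete, here is a rough NumPy sketch of ancestral sampling from such a model. Everything beyond the two distributions above is an assumption made for illustration: the one-hidden-layer `decoder`, the random untrained weights in `params`, the layer sizes, and the restriction of $\Sigma_{\theta}(\mathbf{z})$ to a diagonal matrix parameterized by its log-variance.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, hidden_dim, data_dim = 2, 16, 5  # illustrative sizes

# Random, untrained weights standing in for the parameters theta (assumption).
params = {
    "W1": rng.normal(scale=0.5, size=(hidden_dim, latent_dim)),
    "b1": np.zeros(hidden_dim),
    "W_mu": rng.normal(scale=0.5, size=(data_dim, hidden_dim)),
    "b_mu": np.zeros(data_dim),
    "W_logvar": rng.normal(scale=0.1, size=(data_dim, hidden_dim)),
    "b_logvar": np.zeros(data_dim),
}

def decoder(z, params):
    """Map z to the mean and log-variance of a diagonal Gaussian p(x | z)."""
    h = np.tanh(z @ params["W1"].T + params["b1"])
    mu = h @ params["W_mu"].T + params["b_mu"]
    log_var = h @ params["W_logvar"].T + params["b_logvar"]
    return mu, log_var

def sample(num_samples, params, rng):
    # Step 1: z ~ N(0, I)
    z = rng.standard_normal((num_samples, latent_dim))
    # Step 2: x ~ N(mu_theta(z), Sigma_theta(z)) with diagonal Sigma
    mu, log_var = decoder(z, params)
    x = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    return z, x

z, x = sample(4, params, rng)
print(z.shape, x.shape)
```

Even though each $p(\mathbf{x} | \mathbf{z})$ is a simple Gaussian, the marginal over $\mathbf{x}$ can be far more complex because $\mu_{\theta}$ and $\Sigma_{\theta}$ depend nonlinearly on $\mathbf{z}$.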

Mixture of Gaussians: a Shallow Latent Variable Model

Consider a mixture of Gaussians:

$\mathbf{z} \sim \text { Categorical }(1, \cdots, K)$
$p(\mathbf{x} | \mathbf{z}=k)=\mathcal{N}\left(\mu_{k}, \Sigma_{k}\right)$

Generative process

Pick a mixture component $k$ by sampling $\mathrm z$
Generate a data point by sampling from the $k$-th Gaussian
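The two-step generative process can be sketched in NumPy as follows. The number of components, the mixing weights `pi`, and the component parameters `means` and `covs` are made-up illustrative values; the notes do not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture with K = 3 components in 2 dimensions.
pi = np.array([0.5, 0.3, 0.2])  # mixing weights of the categorical over z
means = np.array([[-3.0, 0.0], [0.0, 3.0], [3.0, 0.0]])
covs = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.2], [0.2, 0.5]],
    [[1.0, -0.3], [-0.3, 0.4]],
])

def sample_gmm(num_samples, pi, means, covs, rng):
    # Step 1: pick a component for every sample, z ~ Categorical(pi).
    z = rng.choice(len(pi), size=num_samples, p=pi)
    # Step 2: sample from the chosen Gaussian, x = mu_z + L_z @ eps with L_z L_z^T = Sigma_z.
    L = np.linalg.cholesky(covs)  # batched Cholesky, shape (K, d, d)
    eps = rng.standard_normal((num_samples, means.shape[1]))
    x = means[z] + np.einsum("nij,nj->ni", L[z], eps)
    return z, x

z, x = sample_gmm(5000, pi, means, covs, rng)
print(x.shape, np.bincount(z, minlength=len(pi)) / len(z))
```

The empirical component frequencies printed at the end should be close to `pi`, and discarding `z` leaves samples from the marginal $p(\mathbf{x})$.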