CS231n Assignment 2
Course videos: https://study.163.com/courses-search?keyword=CS231
Course homepage: http://cs231n.stanford.edu/2017/
References:
https://github.com/Halfish/cs231n/tree/master/assignment2/cs231n
My code: https://github.com/Doraemonzzz/CS231n
This post reviews the key points of Assignment 2.
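Preparation
If you get an error when loading the data, modify the following function in data_utils.py:
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000,
                     subtract_mean=True):
Find the line
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
and change it to the directory where you stored the data.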
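1. Fully-connected neural networks
For convenience, we use the following notation: the input x holds N samples, which are reshaped into a matrix $X\in\mathbb R^{N\times D}$; the weights are $W\in\mathbb R^{D\times M}$ and the bias is $b\in\mathbb R^{M}$.
Affine layer: forward
This part is simple; all we need is that the output is
$$\text{out} = XW + b,$$
and the corresponding code is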
N = x.shape[0]
X = x.reshape(N, -1)
out = X.dot(w) + b
Affine layer: backward
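At the end of the previous assignment we derived
$$dX = F W^{T},\qquad dW = X^{T} F,\qquad db = \sum_{i=1}^{N} F_{i,:},$$
where $F$ is the upstream gradient (the input to the backward pass). These can be verified directly from the definitions, and they are easy to remember by matching matrix dimensions. With these formulas, the code follows directly:
# reshape x into an (N, D) matrix
N = x.shape[0]
X = x.reshape(N, -1)
dx = dout.dot(w.T)
# restore the original shape of x
dx = np.reshape(dx, x.shape)
dw = X.T.dot(dout)
db = np.sum(dout, axis=0)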
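The formulas above can be checked numerically. Below is a minimal sketch, assuming re-implemented affine_forward/affine_backward that return and consume a cache of (x, w, b), plus a hypothetical centered-difference helper num_grad; it compares the analytic gradients with finite differences of sum(out * dout).

```python
import numpy as np

def affine_forward(x, w, b):
    # flatten each sample into a row, then out = XW + b
    out = x.reshape(x.shape[0], -1).dot(w) + b
    return out, (x, w, b)

def affine_backward(dout, cache):
    x, w, b = cache
    X = x.reshape(x.shape[0], -1)
    dx = dout.dot(w.T).reshape(x.shape)  # dX = F W^T, restored to the shape of x
    dw = X.T.dot(dout)                   # dW = X^T F
    db = dout.sum(axis=0)                # db = column sums of F
    return dx, dw, db

def num_grad(fn, arr, dout, h=1e-6):
    # centered differences of sum(fn() * dout) w.r.t. arr, which fn reads in place
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        old = arr[idx]
        arr[idx] = old + h
        plus = np.sum(fn() * dout)
        arr[idx] = old - h
        minus = np.sum(fn() * dout)
        arr[idx] = old
        grad[idx] = (plus - minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2, 3))  # N = 4 samples, each flattened to D = 6
w = rng.standard_normal((6, 5))
b = rng.standard_normal(5)
dout = rng.standard_normal((4, 5))

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
dx_num = num_grad(lambda: affine_forward(x, w, b)[0], x, dout)
dw_num = num_grad(lambda: affine_forward(x, w, b)[0], w, dout)
print(np.abs(dx - dx_num).max(), np.abs(dw - dw_num).max())
```

Because the layer is linear, the differences should be at the level of floating-point rounding.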
ReLU layer: forward
Nothing much to say here; just follow the definition:
out = np.copy(x)
out[out < 0] = 0
ReLU layer: backward
Simply set the gradient to $0$ wherever the input is less than $0$:
dx = np.copy(dout)
dx[x < 0] = 0
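A self-contained sketch of both ReLU passes, assuming the forward pass caches its input for the backward pass. One small difference from the code above: here the gradient at exactly x = 0 is 0, whereas dx[x < 0] = 0 passes dout through at 0; both are valid subgradient choices.

```python
import numpy as np

def relu_forward(x):
    out = np.maximum(x, 0)  # elementwise max(x, 0)
    return out, x           # cache the input for the backward pass

def relu_backward(dout, cache):
    x = cache
    # the gradient flows only where the input was positive
    return dout * (x > 0)

x = np.array([[-1.0, 2.0], [0.5, -3.0]])
out, cache = relu_forward(x)
dx = relu_backward(np.ones_like(x), cache)
print(out)
print(dx)
```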
Two-layer network
Implement a two-layer neural network; the architecture is affine - relu - affine - softmax.
Step 1, initialization:
W1 = np.random.randn(input_dim, hidden_dim) * weight_scale
b1 = np.zeros(hidden_dim)
W2 = np.random.randn(hidden_dim, num_classes) * weight_scale
b2 = np.zeros(num_classes)
self.params["W1"] = W1
self.params["b1"] = b1
self.params["W2"] = W2
self.params["b2"] = b2
Step 2, forward pass:
W1 = self.params["W1"]
b1 = self.params["b1"]
W2 = self.params["W2"]
b2 = self.params["b2"]
# hidden layer
z1, cache1 = affine_relu_forward(X, W1, b1)
# output layer
scores, cache2 = affine_forward(z1, W2, b2)
Step 3, backward pass:
# loss and dout
loss, dout = softmax_loss(scores, y)
# add the L2 regularization term
loss += self.reg * (np.sum(W2 ** 2) + np.sum(W1 ** 2)) / 2
# compute dW2, db2
dz1, dW2, db2 = affine_backward(dout, cache2)
# add the gradient of the regularization term
dW2 += self.reg * W2
# compute dW1, db1
dx, dW1, db1 = affine_relu_backward(dz1, cache1)
dW1 += self.reg * W1
# store in the grads dictionary
grads["W1"] = dW1
grads["b1"] = db1
grads["W2"] = dW2
grads["b2"] = db2
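The network relies on softmax_loss from the assignment's layers.py, which is not shown here. The following is a hypothetical standalone version for reference: it averages the cross-entropy over the batch and subtracts the row maximum before exponentiating for numerical stability.

```python
import numpy as np

def softmax_loss(scores, y):
    """Average cross-entropy loss and its gradient w.r.t. scores of shape (N, C)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    N = scores.shape[0]
    loss = -log_probs[np.arange(N), y].mean()
    dscores = np.exp(log_probs)        # softmax probabilities
    dscores[np.arange(N), y] -= 1      # subtract 1 at the correct class
    dscores /= N                       # gradient of the average
    return loss, dscores

scores = np.zeros((2, 3))  # uniform scores, so the loss should be log(3)
y = np.array([0, 2])
loss, dscores = softmax_loss(scores, y)
print(loss, dscores)
```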
Multilayer network
This part generalizes the previous one.
Step 1, initialization. Here we distinguish among the input layer, the output layer, and the hidden layers:
for i in range(self.num_layers):
    if i == 0:
        W = np.random.randn(input_dim, hidden_dims[i]) * weight_scale
        b = np.zeros(hidden_dims[i])
    elif i == self.num_layers - 1:
        W = np.random.randn(hidden_dims[i-1], num_classes) * weight_scale
        b = np.zeros(num_classes)
    else:
        W = np.random.randn(hidden_dims[i-1], hidden_dims[i]) * weight_scale
        b = np.zeros(hidden_dims[i])
    self.params["W"+str(i+1)] = W
    self.params["b"+str(i+1)] = b
Step 2, forward pass. Here we distinguish whether or not a layer is the output layer:
x = X
# store the caches
Cache = {}
Cache_dropout = {}
for i in range(self.num_layers):
    W = self.params["W"+str(i+1)]
    b = self.params["b"+str(i+1)]
    if i < self.num_layers - 1:
        x, cache = affine_relu_forward(x, W, b)
    else:
        x, cache = affine_forward(x, W, b)
    # store in the cache
    Cache["cache"+str(i+1)] = cache
# output
scores = x
Step 3, backward pass, again distinguishing whether a layer is the output layer:
# loss and dout
loss, dout = softmax_loss(scores, y)
# add the L2 regularization term
for i in range(self.num_layers):
    W = self.params["W"+str(i+1)]
    loss += self.reg * (np.sum(W ** 2)) / 2
# compute dWi, from the last layer back to the first
for i in range(self.num_layers, 0, -1):
    cache = Cache["cache"+str(i)]
    W = self.params["W"+str(i)]
    if i == self.num_layers:
        dz, dW, db = affine_backward(dout, cache)
    else:
        dz, dW, db = affine_relu_backward(dz, cache)
    # add the gradient of the regularization term
    dW += self.reg * W
    # store in the grads dictionary
    grads["W"+str(i)] = dW
    grads["b"+str(i)] = db
This part only requires care; it is not very difficult.
SGD+Momentum
The update rule used here is:
v = config["momentum"] * v - config['learning_rate'] * dw
next_w = w + v
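Wrapped into a function, the update looks like the sketch below. The function name, the "velocity" config key and the default hyperparameters are assumptions for illustration; the usage runs it on $f(w)=w^2/2$, whose gradient is $w$ itself.

```python
import numpy as np

def sgd_momentum(w, dw, config=None):
    # default hyperparameters are assumptions for this sketch
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-2)
    config.setdefault("momentum", 0.9)
    v = config.get("velocity", np.zeros_like(w))
    v = config["momentum"] * v - config["learning_rate"] * dw  # update the velocity
    next_w = w + v                                           # move along the velocity
    config["velocity"] = v
    return next_w, config

# first step from w = 1 with gradient 1: v = -0.01, so w becomes 0.99
w1, _ = sgd_momentum(np.array([1.0]), np.array([1.0]))

# minimize f(w) = w^2 / 2 by feeding w back in as its own gradient
w, config = np.array([1.0]), None
for _ in range(500):
    w, config = sgd_momentum(w, w, config)
print(w1, w)
```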
RMSProp
config['cache'] = config['decay_rate'] * config['cache'] + (1 - config['decay_rate']) * dx * dx
next_x = x - config['learning_rate'] * dx / (np.sqrt(config['cache']) + config['epsilon'])
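The point of dividing by the square root of the running average of squared gradients is that each parameter gets its own step size. In this sketch (function name and default hyperparameters assumed), two gradients that differ by four orders of magnitude produce almost the same first step.

```python
import numpy as np

def rmsprop(x, dx, config=None):
    # default hyperparameters are assumptions for this sketch
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-2)
    config.setdefault("decay_rate", 0.99)
    config.setdefault("epsilon", 1e-8)
    config.setdefault("cache", np.zeros_like(x))
    # moving average of the squared gradients
    config["cache"] = config["decay_rate"] * config["cache"] + (1 - config["decay_rate"]) * dx * dx
    # per-parameter step: normalized by the recent gradient magnitude
    next_x = x - config["learning_rate"] * dx / (np.sqrt(config["cache"]) + config["epsilon"])
    return next_x, config

x = np.array([1.0, 1.0])
dx = np.array([100.0, 0.01])  # very different gradient magnitudes
next_x, _ = rmsprop(x, dx)
print(x - next_x)  # both steps are close to 0.1
```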
Adam
config['m'] = config['beta1'] * config['m'] + (1 - config['beta1']) * dx
config['v'] = config['beta2'] * config['v'] + (1 - config['beta2']) * dx * dx
first_unbias = config['m'] / (1 - config['beta1'] ** config['t'])
second_unbias = config['v'] / (1 - config['beta2'] ** config['t'])
next_x = x - config['learning_rate'] * first_unbias / (np.sqrt(second_unbias) + config['epsilon'])
config['t'] += 1
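Note that where config['t'] is incremented must match how t is initialized: incrementing after the update, as above, requires t to start at 1, otherwise the first bias correction divides by $1-\beta_1^0 = 0$. This sketch (defaults assumed) starts t at 0 and increments it first; on the first step the bias-corrected update is about learning_rate times the sign of the gradient.

```python
import numpy as np

def adam(x, dx, config=None):
    # default hyperparameters are assumptions for this sketch
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-3)
    config.setdefault("beta1", 0.9)
    config.setdefault("beta2", 0.999)
    config.setdefault("epsilon", 1e-8)
    config.setdefault("m", np.zeros_like(x))
    config.setdefault("v", np.zeros_like(x))
    config.setdefault("t", 0)
    config["t"] += 1  # increment first so the bias corrections never divide by zero
    config["m"] = config["beta1"] * config["m"] + (1 - config["beta1"]) * dx
    config["v"] = config["beta2"] * config["v"] + (1 - config["beta2"]) * dx * dx
    m_hat = config["m"] / (1 - config["beta1"] ** config["t"])
    v_hat = config["v"] / (1 - config["beta2"] ** config["t"])
    next_x = x - config["learning_rate"] * m_hat / (np.sqrt(v_hat) + config["epsilon"])
    return next_x, config

x = np.array([1.0, 1.0])
next_x, config = adam(x, np.array([5.0, -0.2]))
print(x - next_x, config["t"])
```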
2. Batch normalization
Batch normalization: Forward
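The formulas are as follows, computed per feature over a mini-batch of $m$ samples:
$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)},\qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^2,\qquad \hat x^{(i)} = \frac{x^{(i)}-\mu}{\sqrt{\sigma^2+\epsilon}},\qquad y^{(i)} = \gamma\hat x^{(i)}+\beta.$$
There are two modes. First the training mode; the code is:
# sample mean
sample_mean = np.mean(x, axis=0)
# sample variance
sample_var = np.var(x, axis=0)
# normalization factor
k = np.sqrt(sample_var + eps)
x1 = (x - sample_mean) / k
out = gamma * x1 + beta
# everything the backward pass needs
cache = [x, gamma, k, sample_mean, x1]
running_mean = momentum * running_mean + (1 - momentum) * sample_mean
running_var = momentum * running_var + (1 - momentum) * sample_var
The last two lines maintain running (exponential moving) averages of the mean and variance, which are used at test time. The test-mode code is: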
x1 = (x - running_mean) / np.sqrt(running_var + eps)
out = gamma * x1 + beta
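Note: besides x and gamma, three extra quantities are cached here: k, which is $\sqrt{\sigma^2+\epsilon}$; sample_mean, which is $\mu$; and x1, which is
$$\hat x = \frac{x-\mu}{\sqrt{\sigma^2+\epsilon}}.$$
out is simply the output $y$. The three cached quantities are all used in the backward pass.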
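Putting the training and test branches together gives the self-contained sketch below. The bn_param keys (mode, eps, momentum, running_mean, running_var) and their defaults are assumptions modeled on the code above; the cache layout (x, gamma, k, sample_mean, x1) matches the quantities used in the backward derivation.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, bn_param):
    mode = bn_param["mode"]
    eps = bn_param.get("eps", 1e-5)           # assumed default
    momentum = bn_param.get("momentum", 0.9)  # assumed default
    D = x.shape[1]
    running_mean = bn_param.get("running_mean", np.zeros(D))
    running_var = bn_param.get("running_var", np.zeros(D))
    cache = None
    if mode == "train":
        sample_mean = x.mean(axis=0)
        sample_var = x.var(axis=0)
        k = np.sqrt(sample_var + eps)
        x1 = (x - sample_mean) / k
        out = gamma * x1 + beta
        cache = (x, gamma, k, sample_mean, x1)
        # exponential moving averages, used only at test time
        running_mean = momentum * running_mean + (1 - momentum) * sample_mean
        running_var = momentum * running_var + (1 - momentum) * sample_var
    elif mode == "test":
        # normalize with the running statistics instead of the batch statistics
        out = gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta
    else:
        raise ValueError("invalid batchnorm mode %r" % mode)
    bn_param["running_mean"] = running_mean
    bn_param["running_var"] = running_var
    return out, cache

rng = np.random.default_rng(0)
x = 3.0 + 2.0 * rng.standard_normal((200, 4))
bn_param = {"mode": "train"}
out, cache = batchnorm_forward(x, np.ones(4), np.zeros(4), bn_param)
print(out.mean(axis=0), out.std(axis=0))  # means near 0, standard deviations near 1
```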
Batch Normalization: backward
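Here I computed the gradient directly by differentiation, which also completes the optional part of the assignment. This is one of the hardest parts of the assignment.
Suppose the input is $x\in\mathbb R^{m\times d}$ with rows $x^{(1)},\dots,x^{(m)}$, and the matrix obtained after batch normalization is
$$y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}\in\mathbb R^{m\times d}.$$
Note that $\beta,\gamma$ are actually vectors, namely
$$\beta = (\beta_1,\dots,\beta_d),\qquad \gamma = (\gamma_1,\dots,\gamma_d),$$
where $\beta_j,\gamma_j$ act on the $j$-th component, i.e.
$$y_j^{(i)} = \gamma_j\hat x_j^{(i)} + \beta_j.$$
Suppose our function is $f = f\left(y^{(1)},\dots,y^{(m)}\right)$. The input to the backward pass is dout, whose entries are
$$\text{dout}[i][j] = \frac{\partial f}{\partial y_j^{(i)}}.$$
We now compute the partial derivatives of $f$ with respect to each quantity. First,
$$\frac{\partial f}{\partial \beta_j} = \sum_{i=1}^{m}\frac{\partial f}{\partial y_j^{(i)}},$$
and the corresponding code is
dbeta = np.sum(dout, axis=0)
Next,
$$\frac{\partial f}{\partial \gamma_j} = \sum_{i=1}^{m}\frac{\partial f}{\partial y_j^{(i)}}\,\hat x_j^{(i)},$$
and the corresponding code is
dgamma = np.sum(dout * x1, axis=0)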
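Next we focus on computing $\frac{\partial \hat x_k^{(s)}}{\partial x_k^{(i)}}$. First recall the formulas
$$\mu_k = \frac{1}{m}\sum_{s=1}^{m} x_k^{(s)},\qquad \sigma_k^2 = \frac{1}{m}\sum_{s=1}^{m}\left(x_k^{(s)}-\mu_k\right)^2,\qquad \hat x_k^{(s)} = \frac{x_k^{(s)}-\mu_k}{\sqrt{\sigma_k^2+\epsilon}}.$$
Hence
$$\frac{\partial \mu_k}{\partial x_k^{(i)}} = \frac{1}{m},\qquad \frac{\partial \sigma_k^2}{\partial x_k^{(i)}} = \frac{2}{m}\left(x_k^{(i)}-\mu_k\right),$$
where the second identity uses $\sum_{s=1}^{m}\left(x_k^{(s)}-\mu_k\right) = 0$. With this preparation we can compute $\frac{\partial \hat x_k^{(s)}}{\partial x_k^{(i)}}$:
$$\frac{\partial \hat x_k^{(s)}}{\partial x_k^{(i)}} = \frac{\delta_{si}-\frac{1}{m}}{\sqrt{\sigma_k^2+\epsilon}} - \frac{\left(x_k^{(s)}-\mu_k\right)\left(x_k^{(i)}-\mu_k\right)}{m\left(\sigma_k^2+\epsilon\right)^{3/2}},$$
where $\delta_{si}=1$ if $s=i$ and $0$ otherwise. Since $x_k^{(i)}$ only affects the $k$-th components $y_k^{(s)} = \gamma_k\hat x_k^{(s)}+\beta_k$, it follows that
$$\frac{\partial f}{\partial x_k^{(i)}} = \sum_{s=1}^{m}\frac{\partial f}{\partial y_k^{(s)}}\,\gamma_k\,\frac{\partial \hat x_k^{(s)}}{\partial x_k^{(i)}} = \frac{\gamma_k}{\sqrt{\sigma_k^2+\epsilon}}\frac{\partial f}{\partial y_k^{(i)}} - \frac{\gamma_k}{m\sqrt{\sigma_k^2+\epsilon}}\sum_{s=1}^{m}\frac{\partial f}{\partial y_k^{(s)}} - \frac{\gamma_k\left(x_k^{(i)}-\mu_k\right)}{m\left(\sigma_k^2+\epsilon\right)^{3/2}}\sum_{s=1}^{m}\frac{\partial f}{\partial y_k^{(s)}}\left(x_k^{(s)}-\mu_k\right).$$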
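We compute this in three parts. The first term is
$$\frac{\gamma_k}{\sqrt{\sigma_k^2+\epsilon}}\,\frac{\partial f}{\partial y_k^{(i)}}.$$
Write its numerator in matrix form:
$$\left[\gamma_k\,\frac{\partial f}{\partial y_k^{(i)}}\right]_{i,k}\in\mathbb R^{m\times d}.$$
With numpy broadcasting, this matrix is
gamma * dout
Note that the variable k in the code is the vector
$$\left(\sqrt{\sigma_1^2+\epsilon},\dots,\sqrt{\sigma_d^2+\epsilon}\right),$$
so, using broadcasting again, the first term can be computed as
t1 = gamma * dout / k
Next, the second term:
$$-\frac{\gamma_k}{m\sqrt{\sigma_k^2+\epsilon}}\sum_{s=1}^{m}\frac{\partial f}{\partial y_k^{(s)}}.$$
Again by broadcasting, we get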
m = x.shape[0]
t2 = - gamma / m * np.sum(dout, axis=0).reshape(1, -1) / k
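Finally, the third term:
$$-\frac{\gamma_k\left(x_k^{(i)}-\mu_k\right)}{m\left(\sigma_k^2+\epsilon\right)^{3/2}}\sum_{s=1}^{m}\frac{\partial f}{\partial y_k^{(s)}}\left(x_k^{(s)}-\mu_k\right).$$
This term is more involved, so we first compute
$$\sum_{s=1}^{m}\frac{\partial f}{\partial y_k^{(s)}}\left(x_k^{(s)}-\mu_k\right).$$
Start with the centered matrix:
t3 = x - sample_mean
Next, $\frac{\partial f}{\partial y^{(s)}_k}(x_k^{(s)}-\mu_k)$ is the $(s, k)$ entry of the elementwise product of the gradient matrix and the centered matrix, so $\sum_{s=1}^m\frac{\partial f}{\partial y^{(s)}_k}(x_k^{(s)}-\mu_k)$ is obtained by summing that product over its rows (axis 0). The code is:
t4 = np.sum(dout * t3, axis=0).reshape(1, -1)
Finally, numpy broadcasting gives the whole third term; the code is
t5 = - gamma / m * t3 / (k ** 3) * t4
Adding the three terms gives the full gradient:
dx = t1 + t2 + t5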
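Since this derivation is error-prone, it is worth checking numerically. The sketch below uses hypothetical functions bn_train_forward/bn_backward (training mode only) that implement exactly the terms t1, t2, t5 above, compares them with centered finite differences, and also confirms that the three terms collapse into the compact form $dx = \frac{\gamma}{mk}\left(m\,\text{dout} - \sum\text{dout} - \hat x\cdot\text{dgamma}\right)$, which follows from $t_3/k = \hat x$.

```python
import numpy as np

def bn_train_forward(x, gamma, beta, eps=1e-5):
    # training-mode batch norm; the cache holds what the backward pass needs
    mu = x.mean(axis=0)
    k = np.sqrt(x.var(axis=0) + eps)
    x1 = (x - mu) / k
    return gamma * x1 + beta, (x, gamma, k, mu, x1)

def bn_backward(dout, cache):
    x, gamma, k, sample_mean, x1 = cache
    m = x.shape[0]
    dbeta = dout.sum(axis=0)
    dgamma = (dout * x1).sum(axis=0)
    t1 = gamma * dout / k                                  # first term
    t2 = -gamma / m * dout.sum(axis=0, keepdims=True) / k  # second term
    t3 = x - sample_mean                                   # centered matrix
    t4 = (dout * t3).sum(axis=0, keepdims=True)
    t5 = -gamma / m * t3 / k ** 3 * t4                     # third term
    return t1 + t2 + t5, dgamma, dbeta

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 3))
gamma, beta = rng.standard_normal(3), rng.standard_normal(3)
dout = rng.standard_normal((6, 3))

_, cache = bn_train_forward(x, gamma, beta)
dx, dgamma, dbeta = bn_backward(dout, cache)

# centered finite differences of sum(out * dout) with respect to x
h = 1e-6
dx_num = np.zeros_like(x)
for idx in np.ndindex(x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += h
    xm[idx] -= h
    fp = np.sum(bn_train_forward(xp, gamma, beta)[0] * dout)
    fm = np.sum(bn_train_forward(xm, gamma, beta)[0] * dout)
    dx_num[idx] = (fp - fm) / (2 * h)

# the compact form of the same gradient
m = x.shape[0]
_, _, k, _, x1 = cache
dx_compact = gamma / (m * k) * (m * dout - dout.sum(axis=0) - x1 * dgamma)
print(np.abs(dx - dx_num).max(), np.abs(dx - dx_compact).max())
```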
Fully Connected Nets with Batch Normalization
This part modifies the Fully Connected Nets code. Since each hidden layer is now affine - batchnorm - relu, I wrote the following helper functions:
def affine_batchnorm_relu_forward(x, W, b, gamma, beta, bn_params):
    # affine
    x, cache_affine = affine_forward(x, W, b)
    # batchnorm
    x, cache_batch = batchnorm_forward(x, gamma, beta, bn_params)
    # relu
    x, cache_relu = relu_forward(x)
    return x, (cache_affine, cache_batch, cache_relu)
def affine_batchnorm_relu_backward(dout, cache):
    cache_affine, cache_batch, cache_relu = cache
    # relu
    dx = relu_backward(dout, cache_relu)
    # batchnorm
    dx, dgamma, dbeta = batchnorm_backward(dx, cache_batch)
    # affine
    dx, dw, db = affine_backward(dx, cache_affine)
    return dx, dw, db, dgamma, dbeta
This part only modularizes the code. Next, modify Fully Connected Nets, starting with initialization:
for i in range(self.num_layers):
    if i == 0:
        W = np.random.randn(input_dim, hidden_dims[i]) * weight_scale
        b = np.zeros(hidden_dims[i])
    elif i == self.num_layers - 1:
        W = np.random.randn(hidden_dims[i-1], num_classes) * weight_scale
        b = np.zeros(num_classes)
    else:
        W = np.random.randn(hidden_dims[i-1], hidden_dims[i]) * weight_scale
        b = np.zeros(hidden_dims[i])
    # batchnorm parameters for every layer except the output layer
    if self.use_batchnorm and i != self.num_layers - 1:
        gamma = np.ones(hidden_dims[i])
        beta = np.zeros(hidden_dims[i])
        self.params["gamma"+str(i+1)] = gamma
        self.params["beta"+str(i+1)] = beta
    self.params["W"+str(i+1)] = W
    self.params["b"+str(i+1)] = b
Next, the forward pass:
x = X
# store the caches
Cache = {}
Cache_dropout = {}
for i in range(self.num_layers):
    W = self.params["W"+str(i+1)]
    b = self.params["b"+str(i+1)]
    if i < self.num_layers - 1:
        # batchnorm
        if self.use_batchnorm:
            gamma = self.params["gamma"+str(i+1)]
            beta = self.params["beta"+str(i+1)]
            x, cache = affine_batchnorm_relu_forward(x, W, b, gamma, beta, self.bn_params[i])
        else:
            x, cache = affine_relu_forward(x, W, b)
    else:
        x, cache = affine_forward(x, W, b)
    # store in the cache
    Cache["cache"+str(i+1)] = cache
# output
scores = x
Finally, the backward pass:
# loss and dout
loss, dout = softmax_loss(scores, y)
# add the L2 regularization term
for i in range(self.num_layers):
    W = self.params["W"+str(i+1)]
    loss += self.reg * (np.sum(W ** 2)) / 2
# compute dWi
dz, dW, db, dgamma, dbeta = 0, 0, 0, 0, 0
for i in range(self.num_layers, 0, -1):
    cache = Cache["cache"+str(i)]
    W = self.params["W"+str(i)]
    if i == self.num_layers:
        dz, dW, db = affine_backward(dout, cache)
    else:
        if self.use_batchnorm:
            dz, dW, db, dgamma, dbeta = affine_batchnorm_relu_backward(dz, cache)
            grads["gamma"+str(i)] = dgamma
            grads["beta"+str(i)] = dbeta
        else:
            dz, dW, db = affine_relu_backward(dz, cache)
    # add the gradient of the regularization term
    dW += self.reg * W
    # store in the grads dictionary
    grads["W"+str(i)] = dW
    grads["b"+str(i)] = db
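3. Dropout
Dropout forward pass
For the formulas, see the course notes. Here I give the code directly, starting with the forward pass, which has a training mode and a test mode (in this code, p is the probability of keeping a unit):
if mode == 'train':
    # inverted dropout: keep each unit with probability p and rescale by 1/p
    mask = (np.random.rand(*x.shape) < p) / p
    out = x * mask
elif mode == 'test':
    out = x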
Dropout backward pass
Next, the backward pass, again split into training and test modes:
if mode == 'train':
    dx = dout * mask
elif mode == 'test':
    dx = dout
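A self-contained sketch of both passes, assuming, as in the code above, that p is the keep probability. The optional "seed" key and the cache layout (dropout_param, mask) are assumptions of this sketch. Because the kept units are rescaled by 1/p, the expected output equals the input, so the test branch needs no scaling.

```python
import numpy as np

def dropout_forward(x, dropout_param):
    p, mode = dropout_param["p"], dropout_param["mode"]
    if "seed" in dropout_param:
        np.random.seed(dropout_param["seed"])
    mask = None
    if mode == "train":
        # inverted dropout: keep each unit with probability p, rescale by 1/p
        mask = (np.random.rand(*x.shape) < p) / p
        out = x * mask
    else:
        out = x  # no scaling needed at test time
    return out, (dropout_param, mask)

def dropout_backward(dout, cache):
    dropout_param, mask = cache
    if dropout_param["mode"] == "train":
        return dout * mask  # reuse the mask from the forward pass
    return dout

x = np.ones((500, 500))
out, cache = dropout_forward(x, {"p": 0.3, "mode": "train", "seed": 0})
print(out.mean(), (out == 0).mean())  # roughly 1.0 and 0.7
```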
Fully-connected nets with Dropout
Only an extra check is needed. Forward pass:
x = X
# store the caches
Cache = {}
Cache_dropout = {}
for i in range(self.num_layers):
    W = self.params["W"+str(i+1)]
    b = self.params["b"+str(i+1)]
    if i < self.num_layers - 1:
        # batchnorm
        if self.use_batchnorm:
            gamma = self.params["gamma"+str(i+1)]
            beta = self.params["beta"+str(i+1)]
            x, cache = affine_batchnorm_relu_forward(x, W, b, gamma, beta, self.bn_params[i])
        else:
            x, cache = affine_relu_forward(x, W, b)
        # dropout after each hidden layer
        if self.use_dropout:
            x, cache_dropout = dropout_forward(x, self.dropout_param)
            Cache_dropout["cache"+str(i+1)] = cache_dropout
    else:
        x, cache = affine_forward(x, W, b)
    # store in the cache
    Cache["cache"+str(i+1)] = cache
# output
scores = x
Backward pass:
# loss and dout
loss, dout = softmax_loss(scores, y)
# add the L2 regularization term
for i in range(self.num_layers):
    W = self.params["W"+str(i+1)]
    loss += self.reg * (np.sum(W ** 2)) / 2
# compute dWi
dz, dW, db, dgamma, dbeta = 0, 0, 0, 0, 0
for i in range(self.num_layers, 0, -1):
    cache = Cache["cache"+str(i)]
    W = self.params["W"+str(i)]
    if i == self.num_layers:
        dz, dW, db = affine_backward(dout, cache)
    else:
        # dropout was applied last in the forward pass, so it is undone first
        if self.use_dropout:
            cache_dropout = Cache_dropout["cache"+str(i)]
            dz = dropout_backward(dz, cache_dropout)
        if self.use_batchnorm:
            dz, dW, db, dgamma, dbeta = affine_batchnorm_relu_backward(dz, cache)
            grads["gamma"+str(i)] = dgamma
            grads["beta"+str(i)] = dbeta
        else:
            dz, dW, db = affine_relu_backward(dz, cache)
    # add the gradient of the regularization term
    dW += self.reg * W
    # store in the grads dictionary
    grads["W"+str(i)] = dW
    grads["b"+str(i)] = db
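4. Convolutional networks on CIFAR-10
For convenience, define the following variables: $x$ is the image data, with shape $(N, C, H, W)$; $w$ holds the filters, with shape $(F, C, HH, WW)$; $b$ is the bias, with shape $(F, )$. Here $N$ is the number of images, $C$ the number of channels ($3$ for RGB images), $H, W$ the height and width of the images, $F$ the number of filters, and $HH, WW$ the height and width of the filters. In addition, let stride be the stride and pad the amount of zero padding. The padded data x_ (written $\tilde x$ below) then has shape $(N, C, H_2, W_2)$, where
$$H_2 = H + 2\cdot\text{pad},\qquad W_2 = W + 2\cdot\text{pad}.$$
The output has shape $(N, F, H_1, W_1)$, where
$$H_1 = 1 + \frac{H + 2\cdot\text{pad} - HH}{\text{stride}},\qquad W_1 = 1 + \frac{W + 2\cdot\text{pad} - WW}{\text{stride}}.$$
Call the output out and consider its $(i, j)$ element out[i][j] $\in\mathbb R^{H_1\times W_1}$.
This element is computed from $x_1 = \tilde x[i]\in\mathbb R^{C\times H_2\times W_2}$ and $w_1 = w[j]\in\mathbb R^{C\times HH\times WW}$; its $(s, t)$ entry is computed as follows:
x1 = x_[i]
w1 = w[j]
res[s][t] = np.sum(x1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] * w1) + b[j]
(Note: thanks to numpy broadcasting, in practice $b[j]$ can be added once at the end, to the whole $H_1\times W_1$ map.)
To simplify the discussion of the backward pass, denote the slice
x1[:, s*stride: s*stride+HH, t*stride: t*stride+WW]
by $x'_{s,t}\in\mathbb R^{C\times HH\times WW}$. Recall that $w_1\in\mathbb R^{C\times HH\times WW}$, and let $b[j]\in\mathbb R$ be the corresponding bias. Then
$$\text{res}[s][t] = \sum x'_{s,t}\odot w_1 + b[j],$$
where the sum runs over all entries of the elementwise product $\odot$. Hence
$$\frac{\partial\,\text{res}[s][t]}{\partial w_1} = x'_{s,t},\qquad \frac{\partial\,\text{res}[s][t]}{\partial x'_{s,t}} = w_1,\qquad \frac{\partial\,\text{res}[s][t]}{\partial b[j]} = 1.$$
Denote the input to the backward pass by dout; its $(i, j)$ element is dout1 = dout[i][j] $\in\mathbb R^{H_1\times W_1}$. If $f$ is the function finally applied on top of out, then
$$\frac{\partial f}{\partial w_1} = \sum_{s,t}\text{dout1}[s][t]\,x'_{s,t},\qquad \frac{\partial f}{\partial b[j]} = \sum_{s,t}\text{dout1}[s][t],$$
and each window $x'_{s,t}$ receives the gradient $\text{dout1}[s][t]\,w_1$. Since the windows overlap, these contributions are accumulated at the corresponding positions of the gradient of $x_1$.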
The corresponding code is:
for s in range(H1):
    for t in range(W1):
        dw1 += x1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] * dout1[s][t]
        dx1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] += w1 * dout1[s][t]
        db1 += dout1[s][t]
The rest is just a matter of adding loops over $i$ and $j$.
Convolution: Naive forward pass
First, zero-pad with np.pad:
stride = conv_param["stride"]
pad = conv_param["pad"]
x_ = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), "constant")
Then compute the output dimensions:
# input dimensions
N, C, H, W = x.shape
F, C, HH, WW = w.shape
# output dimensions
H1 = 1 + (H + 2 * pad - HH) // stride
W1 = 1 + (W + 2 * pad - WW) // stride
out = np.zeros((N, F, H1, W1))
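The // above silently rounds down when the filter does not tile the padded input exactly. A small hypothetical helper that computes one spatial output size and rejects such configurations instead; rejecting rather than flooring is a design choice of this sketch.

```python
def conv_output_size(size, filter_size, pad, stride):
    """Output height (or width) of a convolution: 1 + (size + 2*pad - filter_size) / stride."""
    span = size + 2 * pad - filter_size
    if span < 0 or span % stride != 0:
        # the filter does not tile the padded input exactly
        raise ValueError("invalid size/filter_size/pad/stride combination")
    return 1 + span // stride

print(conv_output_size(32, 3, 1, 1))  # a 3x3 filter with pad 1 and stride 1 keeps 32
print(conv_output_size(4, 4, 1, 2))
```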
Then compute according to the definition, here with loops:
for i in range(N):
    for j in range(F):
        x1 = x_[i]
        w1 = w[j]
        res = np.zeros((H1, W1))
        for s in range(H1):
            for t in range(W1):
                res[s][t] = np.sum(x1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] * w1)
        # add the bias once for the whole output map
        res += b[j]
        out[i][j] = res
Aside: Image processing via convolutions
If you get the following error in this part:
cannot import name imread
just install Pillow:
pip install Pillow
Convolution: Naive backward pass
Most of this was covered above; what remains is looping over everything. First, the setup:
x, w, b, conv_param = cache
stride = conv_param["stride"]
pad = conv_param["pad"]
# padding
x_ = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), "constant")
# input dimensions
N, C, H, W = x.shape
F, C, HH, WW = w.shape
H2, W2 = x_.shape[2:]
# gradients (dx is taken with respect to the padded input for now)
dx = np.zeros_like(x_)
dw = np.zeros_like(w)
db = np.zeros_like(b)
Then the loops:
for i in range(N):
    for j in range(F):
        # dout for the current (i, j) and its spatial dimensions
        dout1 = dout[i][j]
        H1, W1 = dout1.shape
        # initialize the gradients
        dx1 = np.zeros((C, H2, W2))
        dw1 = np.zeros((C, HH, WW))
        db1 = 0
        # x and w for the current (i, j)
        x1 = x_[i]
        w1 = w[j]
        for s in range(H1):
            for t in range(W1):
                dw1 += x1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] * dout1[s][t]
                dx1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] += w1 * dout1[s][t]
                db1 += dout1[s][t]
        db[j] += db1
        dx[i] += dx1
        dw[j] += dw1
The loops over $i$ and $j$ simply run over the first dimension of dx and of dw, respectively. Note that what we have computed is actually the gradient with respect to the padded input, so the output should be
dx = dx[:, :, pad: pad+H, pad: pad+W]
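To verify the loops, here is a compact self-contained sketch: hypothetical conv_forward_naive/conv_backward_naive functions that take stride and pad directly (instead of conv_param and a cache), checked against centered finite differences on a small random problem with stride 2 and pad 1.

```python
import numpy as np

def conv_forward_naive(x, w, b, stride, pad):
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    x_ = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), "constant")
    H1 = 1 + (H + 2 * pad - HH) // stride
    W1 = 1 + (W + 2 * pad - WW) // stride
    out = np.zeros((N, F, H1, W1))
    for i in range(N):
        for j in range(F):
            for s in range(H1):
                for t in range(W1):
                    window = x_[i, :, s*stride: s*stride+HH, t*stride: t*stride+WW]
                    out[i, j, s, t] = np.sum(window * w[j]) + b[j]
    return out

def conv_backward_naive(dout, x, w, stride, pad):
    N, _, H1, W1 = dout.shape
    F, _, HH, WW = w.shape
    H, W = x.shape[2:]
    x_ = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), "constant")
    dx_, dw = np.zeros_like(x_), np.zeros_like(w)
    db = dout.sum(axis=(0, 2, 3))  # same as accumulating db1 in the loops
    for i in range(N):
        for j in range(F):
            for s in range(H1):
                for t in range(W1):
                    rows = slice(s*stride, s*stride+HH)
                    cols = slice(t*stride, t*stride+WW)
                    dw[j] += x_[i, :, rows, cols] * dout[i, j, s, t]
                    dx_[i, :, rows, cols] += w[j] * dout[i, j, s, t]
    # drop the padding: keep only the gradient w.r.t. the original input
    return dx_[:, :, pad: pad+H, pad: pad+W], dw, db

rng = np.random.default_rng(2)
x = rng.standard_normal((2, 3, 5, 5))
w = rng.standard_normal((4, 3, 3, 3))
b = rng.standard_normal(4)
stride, pad = 2, 1
dout = rng.standard_normal(conv_forward_naive(x, w, b, stride, pad).shape)
dx, dw, db = conv_backward_naive(dout, x, w, stride, pad)

def num_grad(arr, h=1e-6):
    # centered differences of sum(out * dout) w.r.t. arr (one of x, w, b), modified in place
    g = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        old = arr[idx]
        arr[idx] = old + h
        fp = np.sum(conv_forward_naive(x, w, b, stride, pad) * dout)
        arr[idx] = old - h
        fm = np.sum(conv_forward_naive(x, w, b, stride, pad) * dout)
        arr[idx] = old
        g[idx] = (fp - fm) / (2 * h)
    return g

print(np.abs(dx - num_grad(x)).max(), np.abs(dw - num_grad(w)).max())
```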
Max pooling: Naive forward
This is similar to the convolution forward pass, except that the convolution is replaced by taking the maximum:
pool_height = pool_param["pool_height"]
pool_width = pool_param["pool_width"]
stride = pool_param["stride"]
# input dimensions
N, C, H, W = x.shape
# output dimensions
H1 = 1 + (H - pool_height) // stride
W1 = 1 + (W - pool_width) // stride
out = np.zeros((N, C, H1, W1))
for i in range(N):
    for j in range(C):
        x1 = x[i][j]
        res = np.zeros((H1, W1))
        for s in range(H1):
            for t in range(W1):
                res[s][t] = np.max(x1[s*stride: s*stride+pool_height, t*stride: t*stride+pool_width])
        out[i][j] = res
Max pooling: Naive backward
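Because of how max pooling works, each dout entry flows only to the position of the maximum element in its window, so we just accumulate dout at those positions. The code is similar to the convolution backward pass:
x, pool_param = cache
pool_height = pool_param["pool_height"]
pool_width = pool_param["pool_width"]
stride = pool_param["stride"]
# input dimensions
N, C, H, W = x.shape
# output dimensions
H1 = 1 + (H - pool_height) // stride
W1 = 1 + (W - pool_width) // stride
dx = np.zeros_like(x)
for i in range(N):
    for j in range(C):
        # dout for the current (i, j)
        dout1 = dout[i][j]
        x1 = x[i][j]
        dx1 = np.zeros((H, W))
        for s in range(H1):
            for t in range(W1):
                # flatten the window
                temp = x1[s*stride: s*stride+pool_height, t*stride: t*stride+pool_width].flatten()
                # index of the maximum element
                index = np.argmax(temp)
                # recover its (row, column) position within the window
                m, n = index // pool_width, index % pool_width
                dx1[s*stride + m][t*stride + n] += dout1[s][t]
        dx[i][j] = dx1
I did not find a direct way to get the row and column of the maximum element of a matrix, so I computed them by hand:
# flatten the window
temp = x1[s*stride: s*stride+pool_height, t*stride: t*stride+pool_width].flatten()
# index of the maximum element
index = np.argmax(temp)
# recover its (row, column) position within the window
m, n = index // pool_width, index % pool_width
and then accumulate dout at that position:
dx1[s*stride + m][t*stride + n] += dout1[s][t]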
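NumPy does provide this directly: np.unravel_index converts a flat index into a tuple of coordinates for a given shape. A small sketch on a hand-made window, checked against the manual computation above; like np.argmax, it picks the first occurrence when there are ties.

```python
import numpy as np

window = np.array([[1.0, 7.0, 3.0],
                   [9.0, 2.0, 4.0]])
# convert the flat argmax index into (row, column) for this window's shape
m, n = np.unravel_index(np.argmax(window), window.shape)
print(m, n)

# the manual computation gives the same result
index = np.argmax(window.flatten())
assert (m, n) == (index // window.shape[1], index % window.shape[1])
```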
Fast layers
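This part uses Cython. At first I got the following error:
error: Unable to find vcvarsall.bat
I eventually solved it by following this blog post; in fact, you only need to download an installer (link).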
Three-layer ConvNet
I felt the problem statement for this part was not entirely clear (it may also be my misunderstanding). The architecture is:
conv - relu - 2x2 max pool - affine - relu - affine - softmax
The weights are used by the conv layer and the two affine layers. At first I was unsure about their dimensions, but then I found the following code:
# pass conv_param to the forward pass for the convolutional layer
filter_size = W1.shape[2]
conv_param = {'stride': 1, 'pad': (filter_size - 1) // 2}
# pass pool_param to the forward pass for the max-pooling layer
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
This code shows that, with stride 1 and pad = (filter_size - 1) // 2 (for an odd filter_size), the convolution leaves the spatial dimensions of the data unchanged. So initialization is:
C, H, W = input_dim
F, HH, WW = num_filters, filter_size, filter_size
W1 = np.random.randn(F, C, HH, WW) * weight_scale
b1 = np.zeros(F)
# the convolution keeps the last two (spatial) dimensions unchanged, so the total dimension after it is
n = F * H * W
# W2 is applied after the 2x2 max pool, which halves H and W, so its first dimension is n // 4
W2 = np.random.randn(n // 4, hidden_dim) * weight_scale
b2 = np.zeros(hidden_dim)
W3 = np.random.randn(hidden_dim, num_classes) * weight_scale
b3 = np.zeros(num_classes)
self.params["W1"] = W1
self.params["b1"] = b1
self.params["W2"] = W2
self.params["b2"] = b2
self.params["W3"] = W3
self.params["b3"] = b3
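To make the n // 4 argument concrete, the sketch below traces the shapes for one assumed setting (3 × 32 × 32 inputs and 32 filters of size 7, chosen for illustration) using the output-size formula from earlier. Note that n // 4 relies on H and W being even.

```python
# assumed setting for illustration: 3x32x32 inputs, 32 filters of size 7
C, H, W = 3, 32, 32
F, filter_size, stride = 32, 7, 1
pad = (filter_size - 1) // 2  # 3, as in conv_param above
H_conv = 1 + (H + 2 * pad - filter_size) // stride
W_conv = 1 + (W + 2 * pad - filter_size) // stride
H_pool, W_pool = H_conv // 2, W_conv // 2  # 2x2 max pool with stride 2
n = F * H * W
print(H_conv, W_conv, F * H_pool * W_pool, n // 4)
```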
Forward pass:
X1, cache1 = conv_forward_fast(X, W1, b1, conv_param)
X2, cache2 = relu_forward(X1)
X3, cache3 = max_pool_forward_fast(X2, pool_param)
X4, cache4 = affine_forward(X3, W2, b2)
X5, cache5 = relu_forward(X4)
X6, cache6 = affine_forward(X5, W3, b3)
scores = X6
Backward pass:
loss, dz6 = softmax_loss(scores, y)
# L2 regularization on all three weight matrices
loss += self.reg * (np.sum(W1 ** 2) + np.sum(W2 ** 2) + np.sum(W3 ** 2)) / 2
dz5, dW3, db3 = affine_backward(dz6, cache6)
dz4 = relu_backward(dz5, cache5)
dz3, dW2, db2 = affine_backward(dz4, cache4)
dz2 = max_pool_backward_fast(dz3, cache3)
dz1 = relu_backward(dz2, cache2)
dz, dW1, db1 = conv_backward_fast(dz1, cache1)
# the weight gradients include the gradient of the regularization term
grads["W3"] = dW3 + self.reg * W3
grads["b3"] = db3
grads["W2"] = dW2 + self.reg * W2
grads["b2"] = db2
grads["W1"] = dW1 + self.reg * W1
grads["b1"] = db1
Spatial batch normalization: forward
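This part applies batch normalization separately to each channel: move the channel axis last and flatten to an (N*H*W, C) matrix so that each column holds one channel, then reuse batchnorm_forward. (A plain x.reshape(-1, C) without the transpose would mix values from different channels within each column.) The code is:
N, C, H, W = x.shape
# (N, C, H, W) -> (N, H, W, C) -> (N*H*W, C)
x1 = x.transpose(0, 2, 3, 1).reshape(-1, C)
out, cache = batchnorm_forward(x1, gamma, beta, bn_param)
# undo the reshape and the transpose
out = out.reshape(N, H, W, C).transpose(0, 3, 1, 2)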
Spatial batch normalization: backward
The backward pass works the same way:
N, C, H, W = dout.shape
dout1 = dout.transpose(0, 2, 3, 1).reshape(-1, C)
dx, dgamma, dbeta = batchnorm_backward(dout1, cache)
dx = dx.reshape(N, H, W, C).transpose(0, 3, 1, 2)
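Why the transpose matters can be checked directly. In this sketch on random data (channel c is shifted by c so the channel means clearly differ), the transposed layout yields exactly the per-channel statistics, the inverse transform restores the original array, and a plain reshape does not give per-channel statistics.

```python
import numpy as np

rng = np.random.default_rng(3)
N, C, H, W = 2, 3, 4, 5
# channel c gets an offset of c, so the per-channel means clearly differ
x = rng.standard_normal((N, C, H, W)) + np.arange(C).reshape(1, C, 1, 1)

# move channels last, then flatten: each column now holds exactly one channel
x2d = x.transpose(0, 2, 3, 1).reshape(-1, C)
print(x2d.mean(axis=0), x.mean(axis=(0, 2, 3)))  # same per-channel means (up to rounding)

# the inverse reshape + transpose restores the original layout exactly
x_back = x2d.reshape(N, H, W, C).transpose(0, 3, 1, 2)

# a plain reshape without the transpose mixes channels within each column
x_bad = x.reshape(-1, C)
```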
5. TensorFlow
If reading the data raises an error, change the following path to the directory where you stored the data:
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
TensorFlow Details
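The tricky part here is where the number 5408 comes from. The padding is 'VALID', for which the official documentation gives:
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
With 32×32 inputs, 7×7 filters and stride 2, the formula gives
$$\frac{32 - 7 + 1}{2} = 13,$$
which is already an integer, so the rounded result is 13. The output therefore has shape $13\times 13\times 32$ (32 filters), and each image is flattened into $13\cdot 13\cdot 32 = 5408$ values.
Reference: (link).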
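The same arithmetic in code. The 'VALID' formula is the one quoted above; the 'SAME' formula out = ceil(in / stride) is the corresponding one from the TensorFlow documentation. The layer settings are assumptions: a 7×7 convolution with stride 2 for the calculation above, and, for the model in the next section, a stride-1 'VALID' convolution followed by a 2×2 'SAME' max pool with stride 2. Both routes end at 5408.

```python
import math

def tf_valid_out(in_size, filter_size, stride):
    # 'VALID' padding: ceil((in - filter + 1) / stride)
    return math.ceil(float(in_size - filter_size + 1) / float(stride))

def tf_same_out(in_size, stride):
    # 'SAME' padding: ceil(in / stride)
    return math.ceil(float(in_size) / float(stride))

# 7x7 conv with stride 2 and 'VALID' padding, 32 filters, on 32x32 inputs
h = tf_valid_out(32, 7, 2)
print(h, h * h * 32)

# 7x7 conv with stride 1 ('VALID'), then 2x2 max pool with stride 2 ('SAME')
h2 = tf_same_out(tf_valid_out(32, 7, 1), 2)
print(h2, h2 * h2 * 32)
```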
Training a specific model
Since I am not very familiar with TensorFlow, this part mostly follows someone else's solution. The code is:
# define model
def complex_model(X, y, is_training):
    # conv1
    Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 3, 32])
    bconv1 = tf.get_variable("bconv1", shape=[32])
    # affine layer
    W1 = tf.get_variable("W1", shape=[5408, 1024])
    b1 = tf.get_variable("b1", shape=[1024])
    # affine layer
    W2 = tf.get_variable("W2", shape=[1024, 10])
    b2 = tf.get_variable("b2", shape=[10])
    # conv: 32x32 -> 26x26 ('VALID', stride 1)
    a1 = tf.nn.conv2d(X, Wconv1, strides=[1,1,1,1], padding='VALID') + bconv1
    # relu
    h1 = tf.nn.relu(a1)
    # spatial batch normalization: per-channel statistics over batch, height and width
    # (note: batch statistics are used in both training and testing; is_training is unused)
    axis = [0, 1, 2]
    mean, variance = tf.nn.moments(h1, axis)
    offset = tf.Variable(tf.zeros([32]))
    scale = tf.Variable(tf.ones([32]))
    bn1 = tf.nn.batch_normalization(h1, mean, variance, offset, scale, 0.001)
    # max pooling: 26x26 -> 13x13, so 13 * 13 * 32 = 5408 features
    p1 = tf.nn.max_pool(bn1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
    # affine layers
    p1_flat = tf.reshape(p1, [-1, 5408])
    a2 = tf.matmul(p1_flat, W1) + b1
    # relu
    h2 = tf.nn.relu(a2)
    h2_flat = tf.reshape(h2, [-1, 1024])
    # affine layer
    y_out = tf.matmul(h2_flat, W2) + b2
    return y_out
The Spatial Batch Normalization layer is somewhat more involved; the other parts just follow the same pattern.
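Because TensorFlow uses the NHWC layout here, the axes [0, 1, 2] are batch, height and width, so tf.nn.moments returns one mean and one variance per channel: the same per-channel normalization as the NumPy spatial batch norm in section 4. A NumPy sketch of what the two calls compute together (the function name is hypothetical; tf.nn.batch_normalization evaluates scale * (x - mean) / sqrt(variance + eps) + offset):

```python
import numpy as np

def spatial_bn_nhwc(h, offset, scale, eps=1e-3):
    # NumPy equivalent of tf.nn.moments(h, [0, 1, 2]) followed by
    # tf.nn.batch_normalization(h, mean, variance, offset, scale, eps) on NHWC data
    mean = h.mean(axis=(0, 1, 2))     # one mean per channel
    variance = h.var(axis=(0, 1, 2))  # one (biased) variance per channel
    return scale * (h - mean) / np.sqrt(variance + eps) + offset

rng = np.random.default_rng(4)
h = 5.0 + 2.0 * rng.standard_normal((8, 6, 6, 32))  # a batch of NHWC feature maps
out = spatial_bn_nhwc(h, np.zeros(32), np.ones(32))
print(out.mean(axis=(0, 1, 2))[:3], out.std(axis=(0, 1, 2))[:3])
```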
Summary
This assignment was really hard and is worth revisiting several times. I will probably optimize the convolution code later.