CS224N Natural Language Processing with Deep Learning Assignment 2

Course homepage: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/

Video: https://www.bilibili.com/video/av46216519?from=search&seid=13229282510647565239

This post reviews CS224N Assignment 2; Assignment 1 is fairly basic and is omitted here.

1. Understanding word2vec

(a)

Note that $y_w = 1$ only when $w = o$, and $y_w = 0$ in every other case, so the cross-entropy sum collapses to a single term.
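Writing this out from the definitions (with $\hat{y}_w$ the predicted probability of word $w$):

$$-\sum_{w \in \text{Vocab}} y_w \log(\hat{y}_w) = -\log(\hat{y}_o).$$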

(b)

Differentiating the naive-softmax loss with respect to the center word vector $v_c$, we get the result below.
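A reconstruction of the standard result, assuming the assignment's convention that $U$ stores the outside vectors $u_w$ as columns and $\hat{y} = \mathrm{softmax}(U^\top v_c)$:

$$\frac{\partial J}{\partial v_c} = U(\hat{y} - y) = -u_o + \sum_{w \in \text{Vocab}} \hat{y}_w u_w.$$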

(c)
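The corresponding gradient for the outside vectors, under the same convention (a reconstruction of the standard result):

$$\frac{\partial J}{\partial u_w} = (\hat{y}_w - y_w)\, v_c, \qquad \text{i.e.,} \qquad \frac{\partial J}{\partial U} = v_c\, (\hat{y} - y)^\top.$$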

(d)

Computing the Jacobian matrix of the elementwise sigmoid, we find it is diagonal, which gives the scalar derivative below.
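For $\sigma$ applied elementwise to $x \in \mathbb{R}^n$:

$$\frac{\partial \sigma(x)}{\partial x} = \mathrm{diag}\big(\sigma(x) \odot (1 - \sigma(x))\big), \qquad \sigma'(x) = \sigma(x)\,(1 - \sigma(x)).$$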

(e)

Note that $o$ is not in $\{w_1, \ldots, w_K\}$, i.e., the outside word is never one of the $K$ negative samples.
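A reconstruction of the negative-sampling loss and its gradients (with negative samples $w_1, \ldots, w_K$ and the sigmoid derivative from part (d)):

$$J_{\text{neg-sample}}(v_c, o, U) = -\log \sigma(u_o^\top v_c) - \sum_{k=1}^{K} \log \sigma(-u_{w_k}^\top v_c),$$

$$\frac{\partial J}{\partial v_c} = \big(\sigma(u_o^\top v_c) - 1\big)\, u_o + \sum_{k=1}^{K} \big(1 - \sigma(-u_{w_k}^\top v_c)\big)\, u_{w_k},$$

$$\frac{\partial J}{\partial u_o} = \big(\sigma(u_o^\top v_c) - 1\big)\, v_c, \qquad \frac{\partial J}{\partial u_{w_k}} = \big(1 - \sigma(-u_{w_k}^\top v_c)\big)\, v_c.$$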

The original (naive-softmax) loss requires computing a sum of exponentials, which overflows easily and has to be handled carefully; the negative-sampling loss does not have this problem.
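For reference, the standard fix is the max-shift identity used by numerically stable softmax implementations; it leaves the distribution unchanged:

$$\mathrm{softmax}(x)_i = \mathrm{softmax}(x - \max_j x_j)_i = \frac{e^{x_i - \max_j x_j}}{\sum_k e^{x_k - \max_j x_j}}.$$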

(f)
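A reconstruction of the standard skip-gram result: the window loss is a sum of per-word losses, so every gradient is the corresponding sum:

$$J_{\text{skip-gram}}(v_c, w_{t-m}, \ldots, w_{t+m}, U) = \sum_{\substack{-m \le j \le m \\ j \ne 0}} J(v_c, w_{t+j}, U),$$

$$\frac{\partial J_{\text{skip-gram}}}{\partial U} = \sum_{\substack{-m \le j \le m \\ j \ne 0}} \frac{\partial J(v_c, w_{t+j}, U)}{\partial U}, \qquad \frac{\partial J_{\text{skip-gram}}}{\partial v_c} = \sum_{\substack{-m \le j \le m \\ j \ne 0}} \frac{\partial J(v_c, w_{t+j}, U)}{\partial v_c}, \qquad \frac{\partial J_{\text{skip-gram}}}{\partial v_w} = 0 \;\; (w \ne c).$$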

2. Implementing word2vec

(a)

sigmoid

import numpy as np

def sigmoid(x):
    """
    Compute the sigmoid function for the input here.
    Arguments:
    x -- A scalar or numpy array.
    Return:
    s -- sigmoid(x)
    """
    ### YOUR CODE HERE
    s = 1 / (1 + np.exp(-x))
    ### END YOUR CODE

    return s
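A quick sanity check (illustrative only). Note that for very negative inputs np.exp(-x) overflows and NumPy emits a RuntimeWarning, although the result still rounds to the correct limit of 0; scipy.special.expit is a numerically robust drop-in if that matters.

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # approx. [4.54e-05, 0.5, 0.99995]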

naiveSoftmaxLossAndGradient

Note that the matrices here are the transposes of those in Problem 1.
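Concretely, a sketch of the correspondence, assuming outsideVectors stores the $u_w^\top$ as rows (so $U \in \mathbb{R}^{n \times d}$ in code shapes):

$$\hat{y} = \mathrm{softmax}(U v_c), \qquad \frac{\partial J}{\partial v_c} = (\hat{y} - y)^\top U, \qquad \frac{\partial J}{\partial U} = (\hat{y} - y)\, v_c^\top.$$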

### YOUR CODE HERE

### Please use the provided softmax function (imported earlier in this file)
### This numerically stable implementation helps you avoid issues pertaining
### to integer overflow.
'''
centerWordVec: 1 * d
outsideVectors: n * d
'''
# 1 * n: scores of every outside word against the center word
vec = centerWordVec.dot(outsideVectors.T)
# 1 * n: predicted distribution over outside words
prob = softmax(vec)
# cross-entropy loss for the true outside word
loss = -np.log(prob[outsideWordIdx])
# 1 * d: gradient w.r.t. the center vector, (y_hat - y)^T U
gradCenterVec = -outsideVectors[outsideWordIdx] + prob.dot(outsideVectors)
# n * d: gradient w.r.t. the outside vectors, (y_hat - y) v_c^T
gradOutsideVecs = prob.reshape(-1, 1).dot(centerWordVec.reshape(1, -1))
gradOutsideVecs[outsideWordIdx] -= centerWordVec
### END YOUR CODE
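A minimal finite-difference check of these gradients (a hypothetical standalone harness with its own softmax; the assignment ships its own gradcheck utility for the same purpose):

import numpy as np

def softmax(x):
    # Numerically stable softmax for a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def naive_softmax_loss_and_grad(centerWordVec, outsideWordIdx, outsideVectors):
    vec = outsideVectors.dot(centerWordVec)                     # n
    prob = softmax(vec)                                         # n
    loss = -np.log(prob[outsideWordIdx])
    gradCenter = -outsideVectors[outsideWordIdx] + prob.dot(outsideVectors)
    gradOutside = np.outer(prob, centerWordVec)
    gradOutside[outsideWordIdx] -= centerWordVec
    return loss, gradCenter, gradOutside

rng = np.random.default_rng(0)
n, d, o = 5, 3, 2
v_c = rng.normal(size=d)
U = rng.normal(size=(n, d))
loss, gradCenter, _ = naive_softmax_loss_and_grad(v_c, o, U)

# Central differences on the center vector.
eps = 1e-6
num = np.zeros(d)
for i in range(d):
    e_i = np.zeros(d); e_i[i] = eps
    lp, _, _ = naive_softmax_loss_and_grad(v_c + e_i, o, U)
    lm, _, _ = naive_softmax_loss_and_grad(v_c - e_i, o, U)
    num[i] = (lp - lm) / (2 * eps)
print(np.max(np.abs(num - gradCenter)))  # should be tiny, ~1e-9 or smaller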

negSamplingLossAndGradient

### YOUR CODE HERE

### Please use your implementation of sigmoid in here.
'''
centerWordVec: 1 * d
outsideVectors: n * d
indices = [outsideWordIdx] + negSampleWordIndices, length m = K + 1
'''
# 1 * m: scores; negate the negative-sample rows so that sigmoid(vec)
# gives sigma(u_o . v_c) for the outside word and sigma(-u_k . v_c)
# for each negative sample
vec = centerWordVec.dot(outsideVectors[indices].T)
vec[1:] *= -1
sig = sigmoid(vec)
tmp = np.log(sig)
loss = -tmp[0] - np.sum(tmp[1:])
# 1 * m: common factor 1 - sigma(.) appearing in every gradient term
t1 = 1 - sig
gradCenterVec = t1.dot(outsideVectors[indices]) - 2 * t1[0] * outsideVectors[outsideWordIdx]
# accumulate per-sample contributions (negative samples may repeat)
gradOutsideVecs = np.zeros_like(outsideVectors)
gradOutsideVecs[outsideWordIdx] += -t1[0] * centerWordVec
for i in range(K):
    k = negSampleWordIndices[i]
    gradOutsideVecs[k] += t1[i + 1] * centerWordVec
### END YOUR CODE
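One design note: the explicit loop is needed because a word can be drawn more than once as a negative sample, and fancy-indexed += only counts a repeated row once. A small standalone illustration of the equivalent vectorized form via np.add.at:

import numpy as np

grad = np.zeros((4, 2))
indices = [1, 3, 3]                 # note the repeated index
contrib = np.ones((3, 2))
grad[indices] += contrib            # repeated row counted once: wrong
print(grad[3])                      # [1. 1.]
grad[:] = 0
np.add.at(grad, indices, contrib)   # repeated rows accumulate: right
print(grad[3])                      # [2. 2.]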

(b)

### YOUR CODE HERE
# one SGD step: move x against the gradient of the loss
loss, grad = f(x)
x -= step * grad
### END YOUR CODE
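A tiny standalone run of this update rule, minimizing a simple quadratic (illustrative only; the assignment's sgd wrapper adds the iteration loop, learning-rate annealing, and checkpointing around this step):

import numpy as np

def f(x):
    # returns (loss, gradient) for f(x) = 0.5 * ||x||^2
    return 0.5 * np.sum(x ** 2), x

x = np.array([3.0, -2.0])
step = 0.1
for _ in range(100):
    loss, grad = f(x)
    x -= step * grad
print(loss)  # approaches 0 as x converges to the origin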
