Course homepage: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/

Video: https://www.bilibili.com/video/av46216519?from=search&seid=13229282510647565239

This post reviews CS224N Assignment 2. Assignment 1 is fairly basic and is omitted here.

1.Understanding word2vec

(a)

Note that $y_w = 1$ only when $w = o$, and $y_w = 0$ in all other cases, so only one term of the cross-entropy sum survives.
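A sketch of the step, writing $\hat{y}_w$ for the predicted probability of word $w$ (handout notation): the cross-entropy between the one-hot vector $y$ and $\hat{y}$ collapses to the naive-softmax loss,

$$-\sum_{w \in \text{Vocab}} y_w \log(\hat{y}_w) = -\log(\hat{y}_o) = J_{\text{naive-softmax}}(v_c, o, U).$$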

(b)

Differentiating the loss with respect to the center vector $v_c$ term by term gives the result below.
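A sketch of the result, assuming this part asks for the gradient of $J_{\text{naive-softmax}} = -\log(\hat{y}_o)$ with respect to the center vector $v_c$, written per outside vector so it does not depend on whether the $u_w$ are stored as rows or columns of $U$:

$$\frac{\partial J_{\text{naive-softmax}}}{\partial v_c} = -u_o + \sum_{w \in \text{Vocab}} \hat{y}_w\, u_w = \sum_{w \in \text{Vocab}} (\hat{y}_w - y_w)\, u_w,$$

which is $U(\hat{y}-y)$ when the $u_w$ are the columns of $U$, and $U^\top(\hat{y}-y)$ when they are its rows (as in the code of Part 2).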

(c)
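Assuming this part asks for the gradient of the naive-softmax loss with respect to an outside vector $u_w$ (as in the standard version of the assignment), a sketch of the result, separating the cases $w = o$ and $w \ne o$:

$$\frac{\partial J_{\text{naive-softmax}}}{\partial u_w} = (\hat{y}_w - y_w)\, v_c =
\begin{cases}
(\hat{y}_o - 1)\, v_c, & w = o,\\[2pt]
\hat{y}_w\, v_c, & w \ne o.
\end{cases}$$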

(d)

Since $\sigma$ is applied elementwise, its Jacobian matrix is diagonal; computing it entry by entry gives the result below.
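A sketch of the computation for a vector $x$, with $\sigma$ applied elementwise:

$$\frac{\partial \sigma(x_i)}{\partial x_j} = 0 \ \ (i \ne j),\qquad
\frac{\partial \sigma(x_i)}{\partial x_i} = \frac{e^{-x_i}}{(1+e^{-x_i})^{2}} = \sigma(x_i)\bigl(1-\sigma(x_i)\bigr),$$

so

$$\frac{d\,\sigma(x)}{dx} = \operatorname{diag}\bigl(\sigma(x)\odot(1-\sigma(x))\bigr).$$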

(e)

Note that $o \notin \lbrace w_1,\ldots ,w_K \rbrace$, i.e. the outside word is never one of the $K$ negative samples.
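A sketch of the loss and its gradients, assuming the handout's negative-sampling objective with negative samples $w_1,\ldots,w_K$ and outside vectors $u_1,\ldots,u_K$:

$$J_{\text{neg-sample}}(v_c, o, U) = -\log \sigma(u_o^\top v_c) - \sum_{k=1}^{K} \log \sigma(-u_k^\top v_c),$$

$$\frac{\partial J}{\partial v_c} = \bigl(\sigma(u_o^\top v_c)-1\bigr)\,u_o + \sum_{k=1}^{K}\bigl(1-\sigma(-u_k^\top v_c)\bigr)\,u_k,$$

$$\frac{\partial J}{\partial u_o} = \bigl(\sigma(u_o^\top v_c)-1\bigr)\,v_c,\qquad
\frac{\partial J}{\partial u_k} = \bigl(1-\sigma(-u_k^\top v_c)\bigr)\,v_c \quad (k=1,\ldots,K).$$

Since $1-\sigma(-x)=\sigma(x)$, the coefficients on the negative samples can equivalently be written $\sigma(u_k^\top v_c)$.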

The original (naive-softmax) loss requires a sum of exponentials over the whole vocabulary, which can easily overflow and must be handled carefully; the negative-sampling loss involves only $K+1$ sigmoid terms and has no such problem.

(f)
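Assuming this part asks for the derivatives of the skip-gram loss $J_{\text{skip-gram}}(v_c, w_{t-m},\ldots,w_{t+m}, U) = \sum_{-m \le j \le m,\, j \ne 0} J(v_c, w_{t+j}, U)$ in terms of the per-window-word derivatives (as in the standard version of the assignment), a sketch:

$$\frac{\partial J_{\text{skip-gram}}}{\partial U} = \sum_{\substack{-m \le j \le m \\ j \ne 0}} \frac{\partial J(v_c, w_{t+j}, U)}{\partial U},\qquad
\frac{\partial J_{\text{skip-gram}}}{\partial v_c} = \sum_{\substack{-m \le j \le m \\ j \ne 0}} \frac{\partial J(v_c, w_{t+j}, U)}{\partial v_c},\qquad
\frac{\partial J_{\text{skip-gram}}}{\partial v_w} = 0 \quad (w \ne c).$$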

2.Implementing word2vec

(a)

sigmoid

def sigmoid(x):
    """
    Compute the sigmoid function for the input here.
    Arguments:
    x -- A scalar or numpy array.
    Return:
    s -- sigmoid(x)
    """

    ### YOUR CODE HERE
    s = 1 / (1 + np.exp(-x))

    ### END YOUR CODE

    return s
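A quick illustrative check that the function works elementwise on arrays (the input values below are made up, and numpy is assumed to be imported as np, as in the assignment's word2vec.py):

import numpy as np

print(sigmoid(np.array([-1.0, 0.0, 1.0])))  # approximately [0.269 0.5 0.731]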

naiveSoftmaxLossAndGradient

Note that the matrices here are the transposes of those in Problem 1: the word vectors are stored as rows.

### YOUR CODE HERE

### Please use the provided softmax function (imported earlier in this file)
### This numerically stable implementation helps you avoid issues pertaining
### to integer overflow. 
# Shapes: centerWordVec is (d,), outsideVectors is (n, d),
# with n the vocabulary size and d the embedding dimension.
# scores of the center word against every outside vector: (n,)
vec = centerWordVec.dot(outsideVectors.T)
# predicted distribution y_hat over the vocabulary: (n,)
prob = softmax(vec)
loss = -np.log(prob[outsideWordIdx])
# dJ/dv_c = -u_o + sum_w y_hat_w * u_w, shape (d,)
gradCenterVec = -outsideVectors[outsideWordIdx] + prob.dot(outsideVectors)
# dJ/dU = y_hat v_c^T with v_c subtracted from the outside word's row: (n, d)
gradOutsideVecs = prob.reshape(-1, 1).dot(centerWordVec.reshape(1, -1))
gradOutsideVecs[outsideWordIdx] -= centerWordVec
### END YOUR CODE
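As a sanity check, the gradients above can be compared against finite differences. The sketch below is self-contained: it re-implements softmax and wraps the same lines in a standalone function; the function name, the random test shapes, and the tolerance mentioned in the comment are illustrative assumptions, not part of the assignment code.

import numpy as np

def softmax(x):
    # numerically stable softmax for a 1-D array
    e = np.exp(x - np.max(x))
    return e / e.sum()

def naive_softmax_loss_and_gradient(centerWordVec, outsideWordIdx, outsideVectors):
    # same computation as the assignment snippet above
    prob = softmax(centerWordVec.dot(outsideVectors.T))
    loss = -np.log(prob[outsideWordIdx])
    gradCenterVec = -outsideVectors[outsideWordIdx] + prob.dot(outsideVectors)
    gradOutsideVecs = prob.reshape(-1, 1).dot(centerWordVec.reshape(1, -1))
    gradOutsideVecs[outsideWordIdx] -= centerWordVec
    return loss, gradCenterVec, gradOutsideVecs

# random test problem: vocabulary of 5 words, 3-dimensional vectors, outside word index 2
np.random.seed(0)
d, n, o = 3, 5, 2
v_c = np.random.randn(d)
U = np.random.randn(n, d)
_, grad_vc, _ = naive_softmax_loss_and_gradient(v_c, o, U)

# central finite differences with respect to the center vector
eps = 1e-6
num_grad = np.zeros(d)
for i in range(d):
    v_plus, v_minus = v_c.copy(), v_c.copy()
    v_plus[i] += eps
    v_minus[i] -= eps
    num_grad[i] = (naive_softmax_loss_and_gradient(v_plus, o, U)[0]
                   - naive_softmax_loss_and_gradient(v_minus, o, U)[0]) / (2 * eps)

print(np.max(np.abs(grad_vc - num_grad)))  # should be tiny, around 1e-9 or smaller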

negSamplingLossAndGradient

### YOUR CODE HERE

### Please use your implementation of sigmoid in here.
# Shapes: centerWordVec is (d,), outsideVectors is (n, d).
# indices = [outsideWordIdx] + negSampleWordIndices is built by the scaffold above,
# so vec has K + 1 entries: [u_o^T v_c, u_1^T v_c, ..., u_K^T v_c]
vec = centerWordVec.dot(outsideVectors[indices].T)
# flip the sign of the negative-sample scores so that sigmoid(vec) directly
# gives [sigma(u_o^T v_c), sigma(-u_1^T v_c), ..., sigma(-u_K^T v_c)]
vec[1:] *= -1
sig = sigmoid(vec)
tmp = np.log(sig)
loss = -tmp[0] - np.sum(tmp[1:])
# coefficients 1 - sigma(.) shared by all gradient terms, shape (K + 1,)
t1 = 1 - sig
# t1.dot(...) sums (1 - sigma) * u over all K + 1 rows; the outside word's
# term should enter with a minus sign, hence it is subtracted twice
gradCenterVec = t1.dot(outsideVectors[indices]) - 2 * t1[0] * outsideVectors[outsideWordIdx]
# accumulate into the full (n, d) gradient; a negative word may be sampled more than once
gradOutsideVecs = np.zeros_like(outsideVectors)
gradOutsideVecs[outsideWordIdx] += -t1[0] * centerWordVec
for i in range(K):
    k = negSampleWordIndices[i]
    gradOutsideVecs[k] += t1[i + 1] * centerWordVec
### END YOUR CODE

(b)

### YOUR CODE HERE
# one step of plain SGD: f returns (loss, gradient); move x against the gradient,
# scaled by the learning rate step
loss, grad = f(x)
x -= step * grad

### END YOUR CODE
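A tiny self-contained illustration of the same update rule outside the assignment's sgd.py scaffold (the quadratic objective, starting point, and step size are made up): minimizing $f(x)=\lVert x\rVert^2$, whose gradient is $2x$.

import numpy as np

def f(x):
    # returns (loss, gradient), the interface the update above expects
    return np.sum(x ** 2), 2 * x

x = np.array([3.0, -2.0])
step = 0.1
for _ in range(100):
    loss, grad = f(x)
    x -= step * grad

print(x)  # converges toward [0, 0]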