CS224N Natural Language Processing with Deep Learning Assignment 3

Course homepage: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/

Video: https://www.bilibili.com/video/av46216519?from=search&seid=13229282510647565239

This post reviews CS224N Assignment 3. Reference solution:

https://github.com/ZacBi/CS224n-2019-solutions

1. Machine Learning & Neural Networks

(a)

(i)

This update rule computes a rolling weighted average of past gradients, so the effective step does not change too abruptly from one update to the next; the lower variance reduces oscillation.
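For reference, the update in question (from the assignment handout) keeps a rolling average $m$ of the gradients:

$$m \leftarrow \beta_1 m + (1 - \beta_1)\nabla_{\theta} J_{\text{minibatch}}(\theta), \qquad \theta \leftarrow \theta - \alpha m$$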

(ii)

Parameters whose (accumulated squared) gradients are small receive relatively larger updates, while parameters whose gradients are large receive relatively smaller ones. This keeps the update magnitudes across different directions comparable, which reduces oscillation and lets parameters with small or infrequent gradients still make progress.
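Concretely, the handout's adaptive update divides by the square root of a rolling average of squared gradients ($\odot$ and the division are elementwise):

$$v \leftarrow \beta_2 v + (1 - \beta_2)\left(\nabla_{\theta} J_{\text{minibatch}}(\theta) \odot \nabla_{\theta} J_{\text{minibatch}}(\theta)\right), \qquad \theta \leftarrow \theta - \alpha\, \frac{m}{\sqrt{v}}$$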

(b)

(i)

Write $h_{\text{drop}} = \gamma\, d \circ h$, where each entry of the mask $d$ is $0$ with probability $p_{\text{drop}}$ and $1$ otherwise. Requiring $\mathbb{E}_{p_{\text{drop}}}[h_{\text{drop}}]_i = h_i$ for all $i$ gives $\gamma (1 - p_{\text{drop}}) h_i = h_i$, so

$$\gamma = \frac{1}{1 - p_{\text{drop}}}$$

(ii)

Dropout is applied during training so that the network cannot rely on any single unit, effectively training an ensemble of sub-networks, which improves generalization. During evaluation we want a deterministic, accurate prediction from the full network, so dropout is turned off.
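A quick sketch of how this plays out in PyTorch: `nn.Dropout` is stochastic in `train()` mode and the identity in `eval()` mode (the first printed tensor below is illustrative, since the mask is random):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(4)

drop.train()    # training mode: zero entries w.p. 0.5, scale survivors by 1/(1-0.5)
print(drop(x))  # e.g. tensor([2., 0., 2., 2.])

drop.eval()     # evaluation mode: dropout is a no-op
print(drop(x))  # tensor([1., 1., 1., 1.])
```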

2. Neural Transition-Based Dependency Parsing

(a)

| Stack | Buffer | New dependency | Transition |
| --- | --- | --- | --- |
| [ROOT] | [I, parsed, this, sentence, correctly] | | Initial Configuration |
| [ROOT, I] | [parsed, this, sentence, correctly] | | SHIFT |
| [ROOT, I, parsed] | [this, sentence, correctly] | | SHIFT |
| [ROOT, parsed] | [this, sentence, correctly] | parsed$\to$I | LEFT-ARC |
| [ROOT, parsed, this] | [sentence, correctly] | | SHIFT |
| [ROOT, parsed, this, sentence] | [correctly] | | SHIFT |
| [ROOT, parsed, sentence] | [correctly] | sentence$\to$this | LEFT-ARC |
| [ROOT, parsed] | [correctly] | parsed$\to$sentence | RIGHT-ARC |
| [ROOT, parsed, correctly] | [] | | SHIFT |
| [ROOT, parsed] | [] | parsed$\to$correctly | RIGHT-ARC |
| [ROOT] | [] | ROOT$\to$parsed | RIGHT-ARC |

(b)

$O(n)$: each of the $n$ words is SHIFTed onto the stack exactly once and removed by exactly one arc transition, so a parse takes exactly $2n$ steps. For the 5-word sentence in part (a), that is $5$ SHIFTs $+$ $5$ arcs $= 10$ transitions.

(c)

`PartialParse.__init__`

### YOUR CODE HERE (3 Lines)
### Your code should initialize the following fields:
### self.stack: The current stack represented as a list with the top of the stack as the
### last element of the list.
### self.buffer: The current buffer represented as a list with the first item on the
### buffer as the first item of the list
### self.dependencies: The list of dependencies produced so far. Represented as a list of
### tuples where each tuple is of the form (head, dependent).
### Order for this list doesn't matter.
###
### Note: The root token should be represented with the string "ROOT"
###
self.stack = ["ROOT"]
# copy so that parsing does not mutate the input sentence
# (requires `import copy` at the top of the module; a shallow copy
# `sentence[:]` would also work for a list of strings)
self.buffer = copy.deepcopy(sentence)
self.dependencies = []


### END YOUR CODE

`parse_step`

### YOUR CODE HERE (~7-10 Lines)
### TODO:
### Implement a single parsing step, i.e. the logic for the following as
### described in the pdf handout:
### 1. Shift
### 2. Left Arc
### 3. Right Arc
if transition == "S":
    # SHIFT: move the first word of the buffer onto the stack
    self.stack.append(self.buffer.pop(0))
elif transition == "LA":
    # LEFT-ARC: the second item on the stack becomes a dependent of the first
    self.dependencies.append((self.stack[-1], self.stack[-2]))
    self.stack.pop(-2)
else:
    # RIGHT-ARC: the first item on the stack becomes a dependent of the second
    self.dependencies.append((self.stack[-2], self.stack[-1]))
    self.stack.pop(-1)

### END YOUR CODE
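As a sanity check (hypothetical usage, assuming the `PartialParse` class above with transition codes `"S"`, `"LA"`, `"RA"`), replaying the transition sequence from part (a) recovers the expected dependencies:

```python
sentence = ["I", "parsed", "this", "sentence", "correctly"]
pp = PartialParse(sentence)
for t in ["S", "S", "LA", "S", "S", "LA", "RA", "S", "RA", "RA"]:
    pp.parse_step(t)
print(pp.dependencies)
# [('parsed', 'I'), ('sentence', 'this'), ('parsed', 'sentence'),
#  ('parsed', 'correctly'), ('ROOT', 'parsed')]
```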

(d)

`minibatch_parse`

### YOUR CODE HERE (~8-10 Lines)
### TODO:
### Implement the minibatch parse algorithm as described in the pdf handout
###
### Note: A shallow copy (as denoted in the PDF) can be made with the "=" sign in python, e.g.
### unfinished_parses = partial_parses[:].
### Here `unfinished_parses` is a shallow copy of `partial_parses`.
### In Python, a shallow copied list like `unfinished_parses` does not contain new instances
### of the object stored in `partial_parses`. Rather both lists refer to the same objects.
### In our case, `partial_parses` contains a list of partial parses. `unfinished_parses`
### contains references to the same objects. Thus, you should NOT use the `del` operator
### to remove objects from the `unfinished_parses` list. This will free the underlying memory that
### is being accessed by `partial_parses` and may cause your code to crash.
partial_parses = [PartialParse(sentence) for sentence in sentences]
unfinished_parses = partial_parses[:]  # shallow copy

while len(unfinished_parses) > 0:
    # take a minibatch from the front of the unfinished parses
    minibatch = unfinished_parses[:batch_size]
    transitions = model.predict(minibatch)
    for parse, transition in zip(minibatch, transitions):
        parse.parse_step(transition)
        # a parse is finished once only ROOT remains on the stack
        if len(parse.stack) == 1:
            unfinished_parses.remove(parse)
dependencies = [parse.dependencies for parse in partial_parses]

### END YOUR CODE
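To see the batching in action, here is a hypothetical toy model in the spirit of the assignment's sanity checks (`DummyModel` below is an illustration, not part of the starter code): it SHIFTs while the buffer is non-empty, then RIGHT-ARCs until only ROOT is left.

```python
class DummyModel(object):
    def predict(self, partial_parses):
        return ["S" if pp.buffer else "RA" for pp in partial_parses]

sentences = [["right", "arcs", "only"], ["again"]]
print(minibatch_parse(sentences, DummyModel(), batch_size=2))
# [[('arcs', 'only'), ('right', 'arcs'), ('ROOT', 'right')],
#  [('ROOT', 'again')]]
```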

(e)

`ParserModel.__init__`

# W: (n_features * embed_size) -> hidden_size, Xavier-initialized
self.embed_to_hidden = nn.Linear(self.n_features * self.embed_size, self.hidden_size)
nn.init.xavier_uniform_(self.embed_to_hidden.weight)
self.dropout = nn.Dropout(self.dropout_prob)
# U: hidden_size -> n_classes (one logit per transition)
self.hidden_to_logits = nn.Linear(self.hidden_size, self.n_classes)
nn.init.xavier_uniform_(self.hidden_to_logits.weight)
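For reference, `nn.init.xavier_uniform_` (with its default gain of 1) samples weights from $U(-a, a)$ with

$$a = \sqrt{\frac{6}{\text{fan}_{\text{in}} + \text{fan}_{\text{out}}}}$$

which keeps activation variance roughly constant across layers.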

embedding_lookup

x = self.pretrained_embeddings(t)  # (batch_size, n_features, embed_size)
x = x.view(x.size()[0], -1)        # flatten to (batch_size, n_features * embed_size)

forward

embeddings = self.embedding_lookup(t)      # (batch_size, n_features * embed_size)
hidden = self.embed_to_hidden(embeddings)  # affine layer
hidden = nn.ReLU()(hidden)                 # elementwise non-linearity
hidden = self.dropout(hidden)              # active only in train() mode
logits = self.hidden_to_logits(hidden)     # raw scores; no softmax here
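A hypothetical shape check, assuming the assignment defaults (`n_features=36`, `embed_size=50`, `hidden_size=200`, `n_classes=3`) and a `(vocab_size, 50)` embedding matrix passed to the constructor:

```python
import numpy as np
import torch

embeddings = np.random.randn(5000, 50).astype(np.float32)
model = ParserModel(embeddings)

t = torch.randint(0, 5000, (4, 36))  # a batch of 4 feature index vectors
print(model(t).shape)                # torch.Size([4, 3])
```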

train

optimizer = optim.Adam(parser.model.parameters(), lr=lr)
# CrossEntropyLoss applies log-softmax internally, so the model outputs raw logits
loss_func = nn.CrossEntropyLoss()

train_for_epoch

logits = parser.model.forward(train_x)  # (batch_size, n_classes)
loss = loss_func(logits, train_y)
loss.backward()   # gradients were zeroed earlier in the starter training loop
optimizer.step()

The results are as follows:

dev UAS: 88.38

test UAS: 88.90

(f)

Skipped.
