I recently hit a pitfall involving Parameter and tensor while using PyTorch, so I am writing it down here.

The difference between Parameter and tensor at initialization

When writing an nn.Module, even quantities that do not need gradients should be created as Parameter (with requires_grad=False), because model.to(device) moves the module's Parameters (and registered buffers) to the device, but leaves plain tensor attributes where they are.

The following code demonstrates this:

import torch
import torch.nn as nn

class Net(nn.Module):
	def __init__(self):
		super().__init__()
		self.w1 = torch.nn.Parameter(torch.rand(2, 3))                       # trainable parameter
		self.w2 = torch.nn.Parameter(torch.rand(2, 3), requires_grad=False)  # parameter without gradient
		self.w3 = torch.rand(2, 3)                                           # plain tensor attribute

	def forward(self, i):
		if i == 0:
			x = torch.rand(2, 3)   # x is created on the CPU
			self.w1 = x + self.w1  # out-of-place add

			return self.w1
		elif i == 1:
			x = torch.rand(2, 3)
			self.w1 += x  # in-place add on a leaf that requires grad

			return self.w1
		elif i == 2:
			x = torch.rand(2, 3).to(self.w1)  # move x onto w1's device
			self.w1 += x  # still an in-place add on a leaf that requires grad

			return self.w1
		elif i == 3:
			x = torch.rand(2, 3).to(self.w2)  # move x onto w2's device
			self.w2 += x  # in-place add on a parameter with requires_grad=False

			return self.w2

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = Net().to(device)

print(net.w1.device)
print(net.w2.device)
print(net.w3.device)

Output:

cuda:0
cuda:0
cpu
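
As an aside, PyTorch also offers self.register_buffer for non-gradient state; buffers, like Parameters, are moved by model.to(device) and included in the state_dict. A minimal sketch (BufNet is an illustrative name, not part of the original example):

import torch
import torch.nn as nn

class BufNet(nn.Module):
	def __init__(self):
		super().__init__()
		# a registered buffer is non-gradient state that .to(device) moves with the module
		self.register_buffer("w3", torch.rand(2, 3))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
buf_net = BufNet().to(device)
print(buf_net.w3.device)  # cuda:0 when a GPU is available, otherwise cpu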

Type conversion when adding a Parameter and a tensor

import torch

#### test1: out-of-place add
x = torch.rand(2, 3)
y = torch.rand(2, 3)
z = torch.nn.Parameter(x, requires_grad=False)
print(type(z))
print(type(y + z))  # out-of-place op returns a plain Tensor

#### test2: in-place add
a = torch.rand(2, 3)
b = torch.rand(2, 3)
c = torch.nn.Parameter(a, requires_grad=False)
print(type(c))
c += b  # in-place op writes into c's own storage
print(type(c))  # still a Parameter

Output:

<class 'torch.nn.parameter.Parameter'>
<class 'torch.Tensor'>
<class 'torch.nn.parameter.Parameter'>
<class 'torch.nn.parameter.Parameter'>

Type summary:

  • parameter + tensor = tensor: an out-of-place operation builds a new Tensor, not a Parameter (see the sketch below for keeping the Parameter type)
  • parameter += tensor leaves the Parameter type unchanged: an in-place operation writes into the Parameter's own storage and returns the same object
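
If a value updated out-of-place needs to remain a Parameter (for example, so that a later model.to(device) still moves it), one option is to re-wrap the resulting Tensor; a minimal sketch, assuming the update is not meant to be tracked by autograd:

import torch
import torch.nn as nn

p = nn.Parameter(torch.rand(2, 3), requires_grad=False)
t = torch.rand(2, 3)
p = nn.Parameter(p + t, requires_grad=False)  # p + t is a plain Tensor; wrap it again
print(type(p))  # <class 'torch.nn.parameter.Parameter'>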

Example

Returning to the earlier example (the Net class and the net instance defined above), exercise each branch of forward:


for i in range(4):
	try:
		print(net.forward(i))
	except Exception as e:
		print(e)

Output:

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
a leaf Variable that requires grad is being used in an in-place operation.
a leaf Variable that requires grad is being used in an in-place operation.
Parameter containing:
tensor([[1.5199, 1.1808, 1.5584],
        [0.7060, 1.4738, 1.3582]], device='cuda:0')

Explanation:

  1. x is on the CPU while self.w1 is on cuda:0, so the addition fails with a device-mismatch error;
  2. an in-place operation on a leaf Parameter that requires grad is rejected by autograd;
  3. same as 2: moving x onto w1's device does not help, because the in-place restriction still applies;
  4. an in-place operation on a Parameter that does not require grad succeeds, so w2 is updated and returned.

If an in-place update on a grad-requiring parameter is actually intended, torch.no_grad() can be used, as sketched below.
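
When an in-place update on a grad-requiring parameter is deliberate (a manual weight update, for instance), wrapping it in torch.no_grad() keeps autograd from recording the operation, which avoids the leaf-Variable error above; a minimal sketch:

import torch

w = torch.nn.Parameter(torch.rand(2, 3))  # requires_grad=True by default
x = torch.rand(2, 3)
with torch.no_grad():
	w += x  # allowed: the in-place update is not recorded by autograd
print(type(w), w.requires_grad)  # still a Parameter, still requires grad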