I recently hit a pitfall involving Parameter and tensor while using PyTorch, so I am writing it down here.

The difference between Parameter and tensor at initialization

When writing an nn.Module, even quantities that do not need gradients should be created as Parameter (with requires_grad=False), because model.to(device) moves the module's Parameters (and registered buffers) to the device, but leaves plain tensor attributes where they are.

The following code demonstrates this:

import torch
import torch.nn as nn

class Net(nn.Module):
	def __init__(self):
		super().__init__()
		self.w1 = torch.nn.Parameter(torch.rand(2, 3))                       # trainable parameter
		self.w2 = torch.nn.Parameter(torch.rand(2, 3), requires_grad=False)  # parameter without gradient
		self.w3 = torch.rand(2, 3)                                           # plain tensor attribute

	def forward(self, i):
		if i == 0:
			x = torch.rand(2, 3)   # x is created on the CPU
			self.w1 = x + self.w1  # out-of-place add

			return self.w1
		elif i == 1:
			x = torch.rand(2, 3)
			self.w1 += x  # in-place add on a leaf that requires grad

			return self.w1
		elif i == 2:
			x = torch.rand(2, 3).to(self.w1)  # move x onto w1's device
			self.w1 += x  # still an in-place add on a leaf that requires grad

			return self.w1
		elif i == 3:
			x = torch.rand(2, 3).to(self.w2)  # move x onto w2's device
			self.w2 += x  # in-place add on a parameter with requires_grad=False

			return self.w2

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = Net().to(device)

print(net.w1.device)
print(net.w2.device)
print(net.w3.device)

Output:

cuda:0
cuda:0
cpu
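
As an aside, PyTorch also offers self.register_buffer for non-gradient state; buffers, like Parameters, are moved by model.to(device) and included in the state_dict. A minimal sketch (BufNet is an illustrative name, not part of the original example):

import torch
import torch.nn as nn

class BufNet(nn.Module):
	def __init__(self):
		super().__init__()
		# a registered buffer is non-gradient state that .to(device) moves with the module
		self.register_buffer("w3", torch.rand(2, 3))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
buf_net = BufNet().to(device)
print(buf_net.w3.device)  # cuda:0 when a GPU is available, otherwise cpu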

Type conversion when adding a Parameter and a tensor

import torch

#### test1: out-of-place add
x = torch.rand(2, 3)
y = torch.rand(2, 3)
z = torch.nn.Parameter(x, requires_grad=False)
print(type(z))
print(type(y + z))  # out-of-place op returns a plain Tensor

#### test2: in-place add
a = torch.rand(2, 3)
b = torch.rand(2, 3)
c = torch.nn.Parameter(a, requires_grad=False)
print(type(c))
c += b  # in-place op writes into c's own storage
print(type(c))  # still a Parameter

Output:

<class 'torch.nn.parameter.Parameter'>
<class 'torch.Tensor'>
<class 'torch.nn.parameter.Parameter'>
<class 'torch.nn.parameter.Parameter'>

Type summary:

  • parameter + tensor = tensor: an out-of-place operation builds a new Tensor, not a Parameter (see the sketch below for keeping the Parameter type)
  • parameter += tensor leaves the Parameter type unchanged: an in-place operation writes into the Parameter's own storage and returns the same object
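
If a value updated out-of-place needs to remain a Parameter (for example, so that a later model.to(device) still moves it), one option is to re-wrap the resulting Tensor; a minimal sketch, assuming the update is not meant to be tracked by autograd:

import torch
import torch.nn as nn

p = nn.Parameter(torch.rand(2, 3), requires_grad=False)
t = torch.rand(2, 3)
p = nn.Parameter(p + t, requires_grad=False)  # p + t is a plain Tensor; wrap it again
print(type(p))  # <class 'torch.nn.parameter.Parameter'>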

Example

Returning to the earlier example (the Net class and the net instance defined above), exercise each branch of forward:


for i in range(4):
	try:
		print(net.forward(i))
	except Exception as e:
		print(e)

Output:

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
a leaf Variable that requires grad is being used in an in-place operation.
a leaf Variable that requires grad is being used in an in-place operation.
Parameter containing:
tensor([[1.5199, 1.1808, 1.5584],
        [0.7060, 1.4738, 1.3582]], device='cuda:0')

Explanation:

  1. x is on the CPU while self.w1 is on cuda:0, so the addition fails with a device-mismatch error;
  2. an in-place operation on a leaf Parameter that requires grad is rejected by autograd;
  3. same as 2: moving x onto w1's device does not help, because the in-place restriction still applies;
  4. an in-place operation on a Parameter that does not require grad succeeds, so w2 is updated and returned.

If an in-place update on a grad-requiring parameter is actually intended, torch.no_grad() can be used, as sketched below.
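
When an in-place update on a grad-requiring parameter is deliberate (a manual weight update, for instance), wrapping it in torch.no_grad() keeps autograd from recording the operation, which avoids the leaf-Variable error above; a minimal sketch:

import torch

w = torch.nn.Parameter(torch.rand(2, 3))  # requires_grad=True by default
x = torch.rand(2, 3)
with torch.no_grad():
	w += x  # allowed: the in-place update is not recorded by autograd
print(type(w), w.requires_grad)  # still a Parameter, still requires grad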