Pytorch Notes

1 Tensor

Use PyTorch like NumPy

GPU-accelerated Tensors & dynamically built networks

The main component is the Tensor

import torch
import numpy as np
# CREATE a NUMPY ndarray
numpy_tensor = np.random.randn(10, 20)

Returns: ndarray or float

A (d0, d1, ..., dn)-shaped array of floating-point samples from the standard normal distribution, or a single such float if no parameters were supplied.

1.1 CONVERT NUMPY to TENSOR

# CONVERT NUMPY to TENSOR
pytorch_tensor1 = torch.Tensor(numpy_tensor)
pytorch_tensor2 = torch.from_numpy(numpy_tensor)
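
One practical difference worth noting (not stated in the original notes): torch.from_numpy shares memory with the source ndarray, while torch.Tensor makes a copy and casts to float32. A small sketch with illustrative variable names:

# a self-contained check of the sharing behaviour (names are illustrative)
arr = np.random.randn(3, 3)
t_copy = torch.Tensor(arr)       # copies the data and casts to float32
t_shared = torch.from_numpy(arr) # shares memory with the ndarray (keeps float64)

arr[0, 0] = 100.0
print(t_shared[0, 0])  # reflects the change made to arr
print(t_copy[0, 0])    # unchanged, because the data was copied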

1.2 CONVERT TENSOR to NUMPY

# CONVERT TENSOR to NUMPY
# always call .numpy()
# IF the tensor is on the CPU
numpy_array = pytorch_tensor1.numpy()
# IF the tensor is on the GPU
numpy_array = pytorch_tensor2.cpu().numpy()

Note

A Tensor on the GPU cannot be converted to a NumPy ndarray directly; call .cpu() first to transfer the Tensor from the GPU to the CPU.

1.3 Tensor on GPU

put Tensor on GPU

# VERSION 1: define the CUDA data type first, then convert [default]
dtype = torch.cuda.FloatTensor
gpu_tensor = torch.randn(10, 20).type(dtype)
# VERSION 2: SIMPLE & POPULAR
gpu_tensor = torch.randn(10, 20).cuda(0) # tensor on the first GPU
gpu_tensor = torch.randn(10, 20).cuda(1) # tensor on the second GPU

fetch back on cpu

# fetch back on cpu
cpu_tensor = gpu_tensor.cpu()
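
In newer PyTorch versions (0.4 and later) the same movement can also be written device-agnostically with torch.device and .to(); a minimal sketch, assuming a recent PyTorch:

# pick the first GPU if one is available, otherwise stay on the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

gpu_tensor = torch.randn(10, 20).to(device)  # move to the chosen device
cpu_tensor = gpu_tensor.cpu()                # bring it back to the CPU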

1.4 Tensor Attribute

Size

print(pytorch_tensor1.shape)  # .shape and .size() return the same thing
print(pytorch_tensor2.size())

>>> torch.Size([10, 20])
    torch.Size([10, 20])

Type

print(pytorch_tensor1.type()) 

>>> torch.cuda.FloatTensor

Dimension

print(pytorch_tensor1.dim())

>>> 2

Number

print(pytorch_tensor1.numel())

>>> 200

Try

tensor_init = torch.randn((3, 2))
tensor = tensor_init.type(torch.DoubleTensor)
x_array = tensor.numpy()
print(x_array.dtype)

>>> float64

1.5 Tensor Operation

Just like Numpy

torch.ones

x = torch.ones(2, 2)
print(x) # float tensor

>>>  
tensor([[1., 1.],
        [1., 1.]])

type()

print(x.type())

>>> torch.FloatTensor

long() torch.LongTensor

x = x.long()
# x = x.type(torch.LongTensor)
print(x)

>>>
tensor([[1, 1],
        [1, 1]])

float() torch.FloatTensor

x = x.float()
# x = x.type(torch.FloatTensor)
print(x)

torch.randn(a, b)

x = torch.randn(4, 3)
print(x)

>>>
tensor([[ 0.3291,  2.2839, -0.2401],
        [ 0.5324,  0.9681,  0.2163],
        [ 0.6263, -0.3329,  1.6206],
        [ 0.5429, -1.8231, -1.1917]])

torch.max

max_value, max_idx = torch.max(x, dim=1)
print(max_value)
print(max_idx)

>>>
tensor([2.2839, 0.9681, 1.6206, 0.5429])
tensor([1, 1, 2, 0])

dim = 1 takes the maximum of each row

dim = 0 takes the maximum of each column
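
For comparison, a quick sketch of the dim=0 case on the same 4 x 3 tensor x (the values depend on the random draw above):

# maximum of each column, plus the row index where it occurs
col_max, col_idx = torch.max(x, dim=0)
print(col_max.shape)  # torch.Size([3]), one value per column
print(col_idx.shape)  # torch.Size([3])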

torch.sum

sum_x = torch.sum(x, dim=1)
print(sum_x)

>>>
tensor([ 2.3730,  1.7168,  1.9140, -2.4719])

torch.unsqueeze

x.unsqueeze(i) # insert a new dimension of size 1 at position i

x.squeeze(i) # remove dimension i if it has size 1

x = x.squeeze() # with no argument, every dimension of size 1 is removed

tensor([[-0.0255,  1.3384,  0.5698],
        [ 0.5936, -0.1986,  1.3338],
        [-1.6849,  0.3457,  1.9582],
        [ 1.0653, -0.9994,  0.0824]])
print(x.shape) # torch.Size([4, 3])
x = x.unsqueeze(0) # insert a size-1 dimension at position 0
# torch.Size([1, 4, 3])
print(x)

>>>
tensor([[[-0.0255,  1.3384,  0.5698],
         [ 0.5936, -0.1986,  1.3338],
         [-1.6849,  0.3457,  1.9582],
         [ 1.0653, -0.9994,  0.0824]]])
x = x.unsqueeze(1) # insert another size-1 dimension at position 1
# torch.Size([1, 1, 4, 3])
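
Continuing from the [1, 1, 4, 3] tensor above, a short sketch of squeezing the size-1 dimensions back out:

x = x.squeeze(1)  # remove the size-1 dimension at position 1 -> torch.Size([1, 4, 3])
x = x.squeeze()   # remove every remaining size-1 dimension -> torch.Size([4, 3])
print(x.shape)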

permute & transpose

permute rearranges all the dimensions of a tensor at once

transpose swaps two dimensions of a tensor

x = torch.randn(3, 4, 5) # torch.Size([3, 4, 5])

# Dimensional exchange
x = x.permute(1, 0, 2) # torch.Size([4, 3, 5])

x = x.transpose(0, 2) # torch.Size([5, 3, 4])

view

Use view to reshape a Tensor

x = torch.randn(3, 4, 5)
print(x.shape)

>>>
torch.Size([3, 4, 5])

x = x.view(-1, 5) # torch.Size([12, 5])

x = x.view(3, 20) # torch.Size([3, 20])
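
One caveat not mentioned above: view requires the tensor's memory to be contiguous, and transpose/permute typically return non-contiguous tensors, so a .contiguous() call may be needed before view (newer versions also offer reshape). A minimal sketch:

x = torch.randn(3, 4, 5)
y = x.transpose(0, 2)            # torch.Size([5, 4, 3]), no longer contiguous
# y.view(-1)                     # would raise a RuntimeError
z = y.contiguous().view(-1)      # copy into contiguous memory first
print(z.shape)                   # torch.Size([60])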

add

torch.add(x, y)

x = torch.randn(3, 4)
y = torch.randn(3, 4)

# add two Tensor
z = x + y
# z = torch.add(x, y)

inplace

Review the previous

print(x.shape) # torch.Size([4, 3])
x = x.unsqueeze(0) # out-of-place: returns a new tensor
# torch.Size([1, 4, 3])
x = torch.ones(3, 3) # torch.Size([3, 3])

# IN-PLACE unsqueeze
x.unsqueeze_(0) # same as x = x.unsqueeze(0), torch.Size([1, 3, 3])

# IN-PLACE transpose
x.transpose_(1, 0) # torch.Size([3, 1, 3])

X = torch.ones(3, 3)
Y = torch.ones(3, 3)
X.add_(Y) # X = X + Y || X = torch.add(X, Y)

Try

Create a float32, 4 x 4 all-ones matrix and set its central 2 x 2 block to 2:

$$\left[ \begin{array} { l l l l } { 1 } & { 1 } & { 1 } & { 1 } \\ { 1 } & { 2 } & { 2 } & { 1 } \\ { 1 } & { 2 } & { 2 } & { 1 } \\ { 1 } & { 1 } & { 1 } & { 1 } \end{array} \right]$$
x = torch.ones(4, 4)
x[1:3, 1: 3] = 2

2 Variable

The Tensor is a fine building block in PyTorch, but it is not enough for building a neural network; we need a tensor type that can record a computation graph, and that is the Variable. A Variable wraps a tensor and supports the same operations, but it carries three attributes: .data, the wrapped tensor itself; .grad, the gradient with respect to that tensor; and .grad_fn, which records how the Variable was produced.

Variable-properties

import torch
import numpy as np
from torch.autograd import Variable
x_tensor = torch.randn(10, 5)
y_tensor = torch.randn(10, 5)

2.1 CONVERT TENSOR to Variable

By default a Variable does not require gradients, so we pass requires_grad=True to declare that we want the gradient computed for it.

# by default a Variable does not require gradients,
# so pass requires_grad=True to ask for the gradient
x = Variable(x_tensor, requires_grad = True)
y = Variable(y_tensor, requires_grad = True)

z = torch.sum(x + y) # element-wise addition, then sum to a scalar
print('z', z)
# Variable.data
print('\nz.data', z.data)
# Variable.grad_fn
print('\nz.grad_fn', z.grad_fn)
z tensor(-0.2191, grad_fn=<SumBackward0>)

z.data tensor(-0.2191)

z.grad_fn <SumBackward0 object at 0x7f5e8a084d30>
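
A side note beyond the original text: from PyTorch 0.4 onward, Variable has been merged into Tensor, so the same attributes are available on plain tensors. A minimal sketch under that assumption (fresh names so the Variables above are not reused):

# PyTorch >= 0.4: tensors can track gradients without the Variable wrapper
x_t = torch.randn(10, 5, requires_grad=True)
y_t = torch.randn(10, 5, requires_grad=True)

z_t = torch.sum(x_t + y_t)
print(z_t.data)     # the underlying value
print(z_t.grad_fn)  # how z_t was produced, e.g. <SumBackward0 ...>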

2.2 backward automatic derivation

# Find the gradients of x and y
z.backward()

print(x.grad)
print(y.grad)
# Using the automatic derivation mechanism provided by PyTorch
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

2.3 Try

Try to construct the function y = x^2 and then find its derivative at x = 2

import matplotlib.pyplot as plt
x = np.arange(-3, 3.01, 0.1) # np.arange(BEGIN, END, STEP)
y = x**2
plt.plot(x, y)
plt.plot(2, 4, 'ro') # 'red' & 'o'

(plot of y = x^2 with the point (2, 4) marked in red)

x = Variable(torch.FloatTensor([2]), requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)
tensor([4.])

2.4 Appendix - view

b = torch.arange(4 * 5 * 6).view(4, 5, 6)
b
tensor([[[  0,   1,   2,   3,   4,   5],
         [  6,   7,   8,   9,  10,  11],
         [ 12,  13,  14,  15,  16,  17],
         [ 18,  19,  20,  21,  22,  23],
         [ 24,  25,  26,  27,  28,  29]],

        [[ 30,  31,  32,  33,  34,  35],
         [ 36,  37,  38,  39,  40,  41],
         [ 42,  43,  44,  45,  46,  47],
         [ 48,  49,  50,  51,  52,  53],
         [ 54,  55,  56,  57,  58,  59]],

        [[ 60,  61,  62,  63,  64,  65],
         [ 66,  67,  68,  69,  70,  71],
         [ 72,  73,  74,  75,  76,  77],
         [ 78,  79,  80,  81,  82,  83],
         [ 84,  85,  86,  87,  88,  89]],

        [[ 90,  91,  92,  93,  94,  95],
         [ 96,  97,  98,  99, 100, 101],
         [102, 103, 104, 105, 106, 107],
         [108, 109, 110, 111, 112, 113],
         [114, 115, 116, 117, 118, 119]]])

3 Automatic derivation

Automatic differentiation is a very important feature of PyTorch: it saves us from computing very complicated derivatives by hand, which greatly reduces the time needed to build a model. Its predecessor, the Torch framework, did not have this feature.

import torch
# torch.autograd
from torch.autograd import Variable

3.1 Simple Cases

"Simple" means that the result of the computation is a scalar, i.e. a single number, and we differentiate that scalar automatically.

Verify Function of Automatic Derivation

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x + 2
z = y ** 2 + 3
print(z)
tensor([19.], grad_fn=<AddBackward0>)

Through the above series of operations we obtain the final result, which written as a formula is $$z = ( x + 2 ) ^ { 2 } + 3$$
The derivative of z with respect to x is then $$\frac{ \partial z } { \partial x } = 2 ( x + 2 ) = 2 ( 2 + 2 ) = 8$$

# Use automatic derivation
z.backward()
print(x.grad) # BACKWARD to Z, GRAD to x
tensor([8.])

Convenient, right?

A more complicated example:

x = Variable(torch.randn(10, 20), requires_grad=True)
y = Variable(torch.randn(10, 5), requires_grad=True)
w = Variable(torch.randn(20, 5), requires_grad=True)

out = torch.mean(y - torch.matmul(x, w)) # torch.matmul performs matrix multiplication
out.backward()
# get the gradient of x y w
print('x.grad', x.grad)
print('\ny.grad', y.grad)
print('\nw.grad', w.grad)
x.grad tensor([[ 0.0633,  0.0355, -0.0060, -0.0336, -0.0119,  0.0798,  0.0388,  0.0087,
         -0.0505, -0.0557,  0.0231, -0.0929,  0.0838, -0.0613, -0.0386, -0.0656,
         -0.0167, -0.0023,  0.0108, -0.0152],
        [ 0.0633,  0.0355, -0.0060, -0.0336, -0.0119,  0.0798,  0.0388,  0.0087,
         -0.0505, -0.0557,  0.0231, -0.0929,  0.0838, -0.0613, -0.0386, -0.0656,
         -0.0167, -0.0023,  0.0108, -0.0152],
        [ 0.0633,  0.0355, -0.0060, -0.0336, -0.0119,  0.0798,  0.0388,  0.0087,
         -0.0505, -0.0557,  0.0231, -0.0929,  0.0838, -0.0613, -0.0386, -0.0656,
         -0.0167, -0.0023,  0.0108, -0.0152],
        [ 0.0633,  0.0355, -0.0060, -0.0336, -0.0119,  0.0798,  0.0388,  0.0087,
         -0.0505, -0.0557,  0.0231, -0.0929,  0.0838, -0.0613, -0.0386, -0.0656,
         -0.0167, -0.0023,  0.0108, -0.0152],
        [ 0.0633,  0.0355, -0.0060, -0.0336, -0.0119,  0.0798,  0.0388,  0.0087,
         -0.0505, -0.0557,  0.0231, -0.0929,  0.0838, -0.0613, -0.0386, -0.0656,
         -0.0167, -0.0023,  0.0108, -0.0152],
        [ 0.0633,  0.0355, -0.0060, -0.0336, -0.0119,  0.0798,  0.0388,  0.0087,
         -0.0505, -0.0557,  0.0231, -0.0929,  0.0838, -0.0613, -0.0386, -0.0656,
         -0.0167, -0.0023,  0.0108, -0.0152],
        [ 0.0633,  0.0355, -0.0060, -0.0336, -0.0119,  0.0798,  0.0388,  0.0087,
         -0.0505, -0.0557,  0.0231, -0.0929,  0.0838, -0.0613, -0.0386, -0.0656,
         -0.0167, -0.0023,  0.0108, -0.0152],
        [ 0.0633,  0.0355, -0.0060, -0.0336, -0.0119,  0.0798,  0.0388,  0.0087,
         -0.0505, -0.0557,  0.0231, -0.0929,  0.0838, -0.0613, -0.0386, -0.0656,
         -0.0167, -0.0023,  0.0108, -0.0152],
        [ 0.0633,  0.0355, -0.0060, -0.0336, -0.0119,  0.0798,  0.0388,  0.0087,
         -0.0505, -0.0557,  0.0231, -0.0929,  0.0838, -0.0613, -0.0386, -0.0656,
         -0.0167, -0.0023,  0.0108, -0.0152],
        [ 0.0633,  0.0355, -0.0060, -0.0336, -0.0119,  0.0798,  0.0388,  0.0087,
         -0.0505, -0.0557,  0.0231, -0.0929,  0.0838, -0.0613, -0.0386, -0.0656,
         -0.0167, -0.0023,  0.0108, -0.0152]])

y.grad tensor([[0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200]])

w.grad tensor([[ 0.0055,  0.0055,  0.0055,  0.0055,  0.0055],
        [ 0.0488,  0.0488,  0.0488,  0.0488,  0.0488],
        [-0.0287, -0.0287, -0.0287, -0.0287, -0.0287],
        [-0.0473, -0.0473, -0.0473, -0.0473, -0.0473],
        [ 0.0163,  0.0163,  0.0163,  0.0163,  0.0163],
        [-0.0765, -0.0765, -0.0765, -0.0765, -0.0765],
        [ 0.0227,  0.0227,  0.0227,  0.0227,  0.0227],
        [-0.0443, -0.0443, -0.0443, -0.0443, -0.0443],
        [ 0.0003,  0.0003,  0.0003,  0.0003,  0.0003],
        [ 0.0408,  0.0408,  0.0408,  0.0408,  0.0408],
        [-0.0221, -0.0221, -0.0221, -0.0221, -0.0221],
        [-0.0329, -0.0329, -0.0329, -0.0329, -0.0329],
        [ 0.0009,  0.0009,  0.0009,  0.0009,  0.0009],
        [-0.0223, -0.0223, -0.0223, -0.0223, -0.0223],
        [-0.0301, -0.0301, -0.0301, -0.0301, -0.0301],
        [-0.0046, -0.0046, -0.0046, -0.0046, -0.0046],
        [ 0.0628,  0.0628,  0.0628,  0.0628,  0.0628],
        [-0.0885, -0.0885, -0.0885, -0.0885, -0.0885],
        [-0.0456, -0.0456, -0.0456, -0.0456, -0.0456],
        [-0.0091, -0.0091, -0.0091, -0.0091, -0.0091]])

Automatic differentiation makes it very convenient to compute the updates of a network. As a sanity check on y.grad: out is the mean over 10 x 5 = 50 elements, so each entry of y contributes with weight 1/50 = 0.02, exactly the value shown above.

3.2 Complex Situations

m = Variable(torch.FloatTensor([[2, 3]]), requires_grad=True) # build a 1 x 2 matrix
n = Variable(torch.ones(1, 2)) # build a 1 x 2 matrix of ones with the same shape
print(m)
print(n)
tensor([[2., 3.]], requires_grad=True)
tensor([[1., 1.]])
# update the data in n from m
n[0, 0] = m[0, 0] ** 2
n[0, 1] = m[0, 1] ** 3
print(n)
tensor([[ 4., 27.]], grad_fn=<CopySlices>)

Writing the above operations as a formula:

$$n = \left( n _ { 0 } , n _ { 1 } \right) = \left( m _ { 0 } ^ { 2 } , m _ { 1 } ^ { 3 } \right) = \left( 2 ^ { 2 } , 3 ^ { 3 } \right)$$

Next we back-propagate from n directly, which means computing the derivative of n with respect to m:

$$\frac { \partial n } { \partial m } = \frac { \partial \left( n _ { 0 } , n _ { 1 } \right) } { \partial \left( m _ { 0 } , m _ { 1 } \right) }$$

In PyTorch, to back-propagate from a non-scalar result you must pass a gradient argument to backward(). Its shape is the same as that of n, say $\left( w _ { 0 } , w _ { 1 } \right)$, and the result of the automatic differentiation is:

$$
\begin{array} { c } { \frac { \partial n } { \partial m _ { 0 } } = w _ { 0 } \frac { \partial n _ { 0 } } { \partial m _ { 0 } } + w _ { 1 } \frac { \partial n _ { 1 } } { \partial m _ { 0 } } } \\ { \frac { \partial n } { \partial m _ { 1 } } = w _ { 0 } \frac { \partial n _ { 0 } } { \partial m _ { 1 } } + w _ { 1 } \frac { \partial n _ { 1 } } { \partial m _ { 1 } } } \end{array}
$$

n.backward(torch.ones_like(n)) # (w0, w1) -> (1, 1)
print(m.grad)
tensor([[ 4., 27.]])

Automatic differentiation gives the gradients 4 and 27, which we can verify by hand:

$$\begin{array} { c } { \frac { \partial n } { \partial m _ { 0 } } = w _ { 0 } \frac { \partial n _ { 0 } } { \partial m _ { 0 } } + w _ { 1 } \frac { \partial n _ { 1 } } { \partial m _ { 0 } } = 2 m _ { 0 } + 0 = 2 \times 2 = 4 } \\ { \frac { \partial n } { \partial m _ { 1 } } = w _ { 0 } \frac { \partial n _ { 0 } } { \partial m _ { 1 } } + w _ { 1 } \frac { \partial n _ { 1 } } { \partial m _ { 1 } } = 0 + 3 m _ { 1 } ^ { 2 } = 3 \times 3 ^ { 2 } = 27 } \end{array}$$

The hand calculation gives the same result.

3.3 Multiple automatic derivation

Calling backward performs one automatic differentiation. If we call backward a second time, the program raises an error and the pass cannot be repeated. This is because PyTorch, by default, discards the computation graph after one backward pass, so running backward twice requires setting retain_graph manually.

x = Variable(torch.FloatTensor([3]), requires_grad=True)
y = x * 2 + x ** 2 + 3
print(y)
tensor([18.], grad_fn=<AddBackward0>)
# retain_graph = True
y.backward(retain_graph=True) # Set retain_graph to True to save the graph
print(x.grad)
tensor([8.])
y.backward() # differentiate once more; this time the graph is not retained
print(x.grad)
tensor([16.])

You can see that the gradient of x becomes 16: gradients accumulate, so the 8 from the first backward pass and the 8 from the second add up to 16.
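
Because gradients accumulate across backward calls, it is common to zero them between passes when accumulation is not wanted; a small sketch (my addition, using the same function as above):

x = Variable(torch.FloatTensor([3]), requires_grad=True)
y = x * 2 + x ** 2 + 3

y.backward(retain_graph=True)
print(x.grad)        # tensor([8.])

x.grad.data.zero_()  # reset the accumulated gradient
y.backward()
print(x.grad)        # tensor([8.]) again, instead of 16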

Try

Define
$$\begin{array} { c } { x = \left[ \begin{array} { l } { x _ { 0 } } \\ { x _ { 1 } } \end{array} \right] = \left[ \begin{array} { l } { 2 } \\ { 3 } \end{array} \right] } \\ { k = \left( k _ { 0 } , k _ { 1 } \right) = \left( x _ { 0 } ^ { 2 } + 3 x _ { 1 } , 3 x _ { 0 } + x _ { 1 } ^ { 2 } \right) } \end{array}$$
and we want to obtain the Jacobian
$$j = \left[ \begin{array} { c c } { \frac { \partial k _ { 0 } } { \partial x _ { 0 } } } & { \frac { \partial k _ { 0 } } { \partial x _ { 1 } } } \\ { \frac { \partial k _ { 1 } } { \partial x _ { 0 } } } & { \frac { \partial k _ { 1 } } { \partial x _ { 1 } } } \end{array} \right]$$

x = Variable(torch.FloatTensor([2, 3]), requires_grad=True)
k = Variable(torch.zeros(2))

k[0] = x[0] ** 2 + 3 * x[1]
k[1] = x[1] ** 2 + 3 * x[0]
print(k)
tensor([13., 15.], grad_fn=<CopySlices>)
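
One way to finish this exercise (a sketch, not given in the original notes) is to back-propagate once per row of the Jacobian, passing a one-hot weight vector to backward and clearing the accumulated gradient in between:

j = torch.zeros(2, 2)

# row 0: gradient of k0 with respect to x
k.backward(torch.FloatTensor([1, 0]), retain_graph=True)
j[0] = x.grad.data
x.grad.data.zero_()  # clear the accumulated gradient before the next pass

# row 1: gradient of k1 with respect to x
k.backward(torch.FloatTensor([0, 1]))
j[1] = x.grad.data

print(j)
# expected: [[2*x0, 3], [3, 2*x1]] = [[4., 3.], [3., 6.]]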

4 Dynamic and static graph

Neural network frameworks currently fall into two camps: static-graph frameworks and dynamic-graph frameworks. The biggest difference between PyTorch and frameworks such as TensorFlow and Caffe is how they represent the computation graph. TensorFlow uses a static graph: we define the graph once and then run it repeatedly. In PyTorch, a new computation graph is rebuilt on every run. Below we look at the advantages and disadvantages of each.

The two forms of computation graph feel very different to the user, and each has its own strengths. A dynamic graph is easier to debug -- users can debug in whatever way they like -- and it is very intuitive. A static graph is defined first and run afterwards; on subsequent runs the graph does not need to be rebuilt, so it can be faster than a dynamic graph.

torch-graph

Compare how a while loop is written in TensorFlow and in PyTorch.

TensorFlow

# tensorflow
import tensorflow as tf

first_counter = tf.constant(0)
second_counter = tf.constant(10)

def cond(first_counter, second_counter, *args):
    return first_counter < second_counter

def body(first_counter, second_counter):
    first_counter = tf.add(first_counter, 2)
    second_counter = tf.add(second_counter, 1)
    return first_counter, second_counter

c1, c2 = tf.while_loop(cond, body, [first_counter, second_counter])
with tf.Session() as sess:
    counter_1_res, counter_2_res = sess.run([c1, c2])
print(counter_1_res)
print(counter_2_res)

>>>
20
20

You can see that TensorFlow requires the whole graph to be built statically. In other words, the graph is the same on every run and cannot be changed, so you cannot use Python's while statement directly; you have to express the loop with the helper function tf.while_loop in TensorFlow's own form.

This is very counter-intuitive and the learning cost is relatively high.

Now let's look at PyTorch's dynamic-graph mechanism, which lets us write the loop with a plain Python while, which is very convenient.

Pytorch

# pytorch
import torch

first_counter = torch.Tensor([0])
second_counter = torch.Tensor([10])

while (first_counter < second_counter)[0]:
    first_counter += 2
    second_counter += 1

print(first_counter)
print(second_counter)

>>>
tensor([20.])
tensor([20.])

You can see that the PyTorch version is written exactly like ordinary Python, with no extra learning cost. The examples above show how to build a while loop with a static graph and with a dynamic graph; the dynamic-graph approach is clearly simpler and more intuitive.


《Pytorch Notes》 by David Qiao is licensed under a Creative Commons Attribution 4.0 International License