PyTorch v0.4.1 Release Notes
Release Date: 2018-07-26
Table of Contents
- Breaking Changes
- New Features
  - Neural Networks
    - Adaptive Softmax, Spectral Norm, etc.
  - Operators
    - torch.bincount, torch.as_tensor, ...
  - torch.distributions
    - Half Cauchy, Gamma Sampling, ...
  - Other
    - Automatic anomaly detection (detecting NaNs, etc.)
- Performance
  - Faster CPU ops in a wide variety of cases
- Other improvements
- Bug Fixes
- Documentation Improvements
Breaking Changes
- torch.stft has changed its signature to be consistent with librosa #9497
  - Before: stft(signal, frame_length, hop, fft_size=None, normalized=False, onesided=True, window=None, pad_end=0)
  - After: stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)
  - torch.stft now also uses FFT internally and is much faster.
- torch.slice is removed in favor of the tensor slicing notation #7924
- torch.arange now does dtype inference: any floating-point argument is inferred to be the default dtype; all integer arguments are inferred to be int64. #7016
- torch.nn.functional.embedding_bag's old signature embedding_bag(weight, input, ...) is deprecated; embedding_bag(input, weight, ...) (consistent with torch.nn.functional.embedding) should be used instead
- torch.nn.functional.sigmoid and torch.nn.functional.tanh are deprecated in favor of torch.sigmoid and torch.tanh #8748
- Broadcast behavior changed in a (very rare) edge case: [1] x [0] now broadcasts to [0] (used to be [1]) #9209
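The dtype-inference, deprecation, and broadcasting changes above can be checked with a short script. This is a sketch assuming PyTorch 0.4.1 or later; the tensor values are arbitrary, and `x[:, 1:3]` simply stands in for a slice that would previously have gone through the removed `torch.slice` (whose old call form is not reproduced here).

```python
import torch

# torch.arange infers its dtype from the arguments:
# all-integer arguments give int64, any float argument gives the default dtype.
ints = torch.arange(5)
floats = torch.arange(0, 1, 0.25)
print(ints.dtype)    # torch.int64
print(floats.dtype)  # the default dtype (torch.float32 unless it was changed)

# torch.slice is gone; ordinary slicing notation does the same job.
x = torch.arange(12).view(3, 4)
cols = x[:, 1:3]     # columns 1 and 2 of every row

# Use torch.sigmoid / torch.tanh instead of the deprecated
# torch.nn.functional.sigmoid / torch.nn.functional.tanh.
z = torch.tensor([-1.0, 0.0, 1.0])
s = torch.sigmoid(z)
t = torch.tanh(z)

# Edge case: a size-[1] tensor combined with a size-[0] tensor broadcasts to size [0].
empty = torch.ones(1) + torch.ones(0)
print(empty.shape)   # torch.Size([0])
```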
New Features
Neural Networks
- Adaptive Softmax nn.AdaptiveLogSoftmaxWithLoss #5287
```python
>>> import torch
>>> import torch.nn as nn
>>> in_features = 1000
>>> n_classes = 200
>>> adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs=[20, 100, 150])
>>> adaptive_softmax
AdaptiveLogSoftmaxWithLoss(
  (head): Linear(in_features=1000, out_features=23, bias=False)
  (tail): ModuleList(
    (0): Sequential(
      (0): Linear(in_features=1000, out_features=250, bias=False)
      (1): Linear(in_features=250, out_features=80, bias=False)
    )
    (1): Sequential(
      (0): Linear(in_features=1000, out_features=62, bias=False)
      (1): Linear(in_features=62, out_features=50, bias=False)
    )
    (2): Sequential(
      (0): Linear(in_features=1000, out_features=15, bias=False)
      (1): Linear(in_features=15, out_features=50, bias=False)
    )
  )
)
>>> batch = 15
>>> input = torch.randn(batch, in_features)
>>> target = torch.randint(n_classes, (batch,), dtype=torch.long)
>>> # get the log probabilities of target given input, and mean negative log probability loss
>>> adaptive_softmax(input, target)
ASMoutput(output=tensor([-6.8270, -7.9465, -7.3479, -6.8511, -7.5613, -7.1154, -2.9478, -6.9885,
        -7.7484, -7.9102, -7.1660, -8.2843, -7.7903, -8.4459, -7.2371],
       grad_fn=<ThAddBackward>), loss=tensor(7.2112, grad_fn=<MeanBackward1>))
>>> # get the log probabilities of all targets given input as a (batch x n_classes) tensor
>>> adaptive_softmax.log_prob(input)
tensor([[-2.6533, -3.3957, -2.7069,  ..., -6.4749, -5.8867, -6.0611],
        [-3.4209, -3.2695, -2.9728,  ..., -7.6664, -7.5946, -7.9606],
        [-3.6789, -3.6317, -3.2098,  ..., -7.3722, -6.9006, -7.4314],
        ...,
        [-3.3150, -4.0957, -3.4335,  ..., -7.9572, -8.4603, -8.2080],
        [-3.8726, -3.7905, -4.3262,  ..., -8.0031, -7.8754, -8.7971],
        [-3.6082, -3.1969, -3.2719,  ..., -6.9769, -6.3158, -7.0805]],
       grad_fn=<CopySlices>)
>>> # predict: get the class that maximizes log probability for each input
>>> adaptive_softmax.predict(input)
tensor([8, 6, 6, 16, 14, 16, 16, 9, 4, 7, 5, 7, 8, 14, 3])
```
- Add spectral normalization nn.utils.spectral_norm #6929
```python
>>> import torch.nn as nn
>>> # Usage is similar to weight_norm
>>> convT = nn.ConvTranspose2d(3, 64, kernel_size=3, padding=1)
>>> # Can specify number of power iterations applied each time, or use default (1)
>>> convT = nn.utils.spectral_norm(convT, n_power_iterations=2)
>>>
>>> # apply to every conv and conv transpose module in a model
>>> def add_sn(m):
...     for name, c in m.named_children():
...         m.add_module(name, add_sn(c))
...     if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
...         return nn.utils.spectral_norm(m)
...     else:
...         return m
...
>>> # my_model is a placeholder for any existing nn.Module
>>> my_model = add_sn(my_model)
```
- nn.ModuleDict and nn.ParameterDict containers #8463
- Add nn.init.zeros_ and nn.init.ones_ #7488
- Add sparse gradient option to pretrained embedding #7492
- Add max pooling support to nn.EmbeddingBag #5725
- Depthwise convolution support for MKLDNN #8782
- Add nn.FeatureAlphaDropout (featurewise Alpha Dropout layer) #9073
Operators
- torch.bincount (count frequency of each value in an integral tensor) #6688
```python
>>> import torch
>>> input = torch.randint(0, 8, (5,), dtype=torch.int64)
>>> weights = torch.linspace(0, 1, steps=5)
>>> input, weights
(tensor([4, 3, 6, 3, 4]), tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000]))
>>> torch.bincount(input)
tensor([0, 0, 0, 2, 2, 0, 1])
>>> input.bincount(weights)
tensor([0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.5000])
```
- torch.as_tensor (similar to torch.tensor but never copies unless necessary) #7109
```python
>>> import numpy as np
>>> import torch
>>> tensor = torch.randn(3, device='cpu', dtype=torch.float32)
>>> torch.as_tensor(tensor)  # doesn't copy
>>> torch.as_tensor(tensor, dtype=torch.float64)  # copies due to incompatible dtype
>>> torch.as_tensor(tensor, device='cuda')  # copies due to incompatible device (needs a CUDA device)
>>> array = np.array([3, 4.5])
>>> torch.as_tensor(array)  # doesn't copy, sharing memory with the numpy array
>>> torch.as_tensor(array, device='cuda')  # copies due to incompatible device (needs a CUDA device)
```
- torch.randperm for CUDA tensors #7606
- nn.HardShrink for CUDA tensors #8117
- torch.flip (flips a tensor along specified dims) #7873
- torch.flatten (flattens a contiguous range of dims) #8578
- torch.pinverse (computes SVD-based pseudo-inverse) #9052
- torch.meshgrid #8581
- torch.unique for CUDA tensors #8899
- torch.erfc (complementary error function) https://github.com/pytorch/pytorch/pull/9366/files
- torch.isinf and torch.isfinite #9169 #9487
- Support backward for target tensor in torch.nn.functional.kl_div #7839
- torch.logsumexp #7254
- Add batched linear solver to torch.gesv #6100
- torch.sum now supports summing over multiple dimensions https://github.com/pytorch/pytorch/pull/6152/files
- torch.diagonal and torch.diagflat now take arbitrary diagonals with NumPy semantics #6718
- tensor.any and tensor.all on ByteTensor can now accept dim and keepdim arguments #4627
Distributions
- Half Cauchy and Half Normal #8411
- Gamma sampling for CUDA tensors #6855
- Allow vectorized counts in Binomial distribution #6720
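The new half distributions and the vectorized `Binomial` counts can be tried on CPU as follows. This is a sketch; the scales, counts, and sample sizes are arbitrary values chosen for illustration, and CUDA Gamma sampling is not exercised.

```python
import torch
from torch.distributions import Binomial, HalfCauchy, HalfNormal

torch.manual_seed(0)

# Half Normal / Half Cauchy: distributions over the non-negative reals,
# each parameterized by a scale.
half_normal = HalfNormal(torch.tensor([1.0]))
half_cauchy = HalfCauchy(torch.tensor([2.0]))
hn_samples = half_normal.sample((1000,))   # shape (1000, 1)
hc_samples = half_cauchy.sample((1000,))   # shape (1000, 1)

# log_prob is defined on the support [0, inf).
print(half_normal.log_prob(torch.tensor([0.5])))

# Binomial with a vector of total counts: one count per batch element,
# all sharing the same success probability here.
counts = torch.tensor([5.0, 10.0, 20.0])
binom = Binomial(total_count=counts, probs=torch.tensor(0.5))
draws = binom.sample()                      # shape (3,), each draw in [0, count]
print(binom.mean)                           # total_count * probs
```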
Misc
- Autograd automatic anomaly detection for NaN and errors occurring in backward. Two functions, detect_anomaly and set_detect_anomaly, are provided for this. #7677
- Support reversed(torch.Tensor) #9216
- Support hash(torch.device) #9246
- Support gzip in torch.load #6490
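Several of the operators listed under Operators, together with anomaly detection and the `reversed()` / `hash()` support listed under Misc, can be exercised in one CPU-only session. This is a sketch assuming PyTorch 0.4.1 or later; the inputs are arbitrary, and the comments give the results implied by each operator's documented semantics.

```python
import torch

torch.manual_seed(0)

# torch.flip reverses the order of elements along the given dims.
x = torch.arange(6).view(2, 3)
flipped = torch.flip(x, dims=[1])                 # [[2, 1, 0], [5, 4, 3]]

# torch.flatten merges a contiguous range of dims into one.
flat = torch.flatten(torch.zeros(2, 3, 4, 5), start_dim=1, end_dim=2)  # shape (2, 12, 5)

# torch.sum over several dims at once.
total = torch.ones(2, 3, 4).sum(dim=(0, 2))       # shape (3,), each entry 8

# torch.logsumexp computes log(sum(exp(v))) without overflowing.
lse = torch.logsumexp(torch.tensor([1000.0, 1000.0]), dim=0)  # 1000 + log(2)

# torch.isinf / torch.isfinite (NaN is neither infinite nor finite).
w = torch.tensor([1.0, float('inf'), float('nan')])
inf_mask = torch.isinf(w)                         # [False, True, False]
finite_mask = torch.isfinite(w)                   # [True, False, False]

# torch.diagonal with an offset; torch.diagflat builds a matrix from a vector.
m = torch.arange(9).view(3, 3)
upper = torch.diagonal(m, offset=1)               # [1, 5]
d = torch.diagflat(torch.tensor([1, 2]))          # [[1, 0], [0, 2]]

# any() along a dim on a comparison result.
row_any = (m > 4).any(dim=1)                      # [False, True, True]

# torch.pinverse: SVD-based pseudo-inverse, so A @ A_pinv @ A reproduces A.
A = torch.randn(4, 3, dtype=torch.float64)
A_pinv = torch.pinverse(A)

# reversed() on a tensor walks the first dimension backwards.
rev = list(reversed(torch.tensor([1, 2, 3])))

# torch.device is hashable, so it can be used as a dict key.
devices = {torch.device('cpu'): 'host memory'}

# Anomaly detection: sqrt's backward divides the incoming gradient (0 here)
# by 2 * sqrt(0) = 0, producing NaN; detect_anomaly turns that into an error.
x0 = torch.zeros(1, requires_grad=True)
anomaly_caught = False
with torch.autograd.detect_anomaly():
    out = (torch.sqrt(x0) * 0).sum()
    try:
        out.backward()
    except RuntimeError:
        anomaly_caught = True
```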
Performance
- Accelerate bernoulli number generation on CPU #7171
- Enable cuFFT plan caching (80% speed-up in certain cases) #8344
- Fix unnecessary copying in bernoulli_ #8682
- Fix unnecessary copying in broadcast #8222
- Speed up multidimensional sum (2x to 6x speed-up in certain cases) #8992
- Vectorize CPU sigmoid (>3x speed-up in most cases) #8612
- Optimize CPU nn.LeakyReLU and nn.PReLU (2x speed-up) #9206
- Vectorize softmax and logsoftmax (4.5x speed-up on a single core and 1.8x on 10 threads) #7375
- Speed up nn.init.sparse (10-20x speed-up) #6899
Improvements
Tensor printing
- Tensor printing now includes requires_grad and grad_fn information #8211
- Improve number formatting in tensor print #7632
- Fix scale when printing some tensors #7189
- Speed up printing of large tensors #6876
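The new `requires_grad` and `grad_fn` fields in the tensor repr show up on even a tiny graph. This is a sketch; the exact `grad_fn` node name printed depends on the PyTorch version, so only the field names should be relied on.

```python
import torch

# A leaf tensor that requires grad: its repr reports requires_grad=True.
a = torch.ones(2, requires_grad=True)
a_repr = repr(a)

# The result of a tracked operation: its repr reports the grad_fn that created it.
b = a * 2
b_repr = repr(b)

print(a_repr)  # tensor([1., 1.], requires_grad=True)
print(b_repr)  # e.g. tensor([2., 2.], grad_fn=<MulBackward0>); node name varies by version
```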
Neural Networks
- NaN is now propagated through many activation functions #8033
- Add non_blocking option to nn.Module.to #7312
- Loss modules now allow target to require gradient #8460
- Add pos_weight argument to nn.BCEWithLogitsLoss #6856
- Support grad_clip for parameters on different devices #9302
- Remove the requirement that input sequences to pad_sequence have to be sorted #7928
- The stride argument for max_unpool1d, max_unpool2d, and max_unpool3d now defaults to kernel_size #7388
- Allow calling grad mode context managers (e.g., torch.no_grad, torch.enable_grad) as decorators #7737
- __getstate__ of torch.optim.lr_scheduler._LRScheduler now includes optimizer info #7757
- Add support for accepting a Tensor as input in clip_grad_* functions #7769
- Return NaN in max_pool/adaptive_max_pool for NaN inputs #7670
- nn.EmbeddingBag can now handle empty bags in all modes #7389
- torch.optim.lr_scheduler.ReduceLROnPlateau is now serializable #7201
- Allow only tensors of floating point dtype to require gradients #7034 and #7185
- Allow resetting of BatchNorm running stats and cumulative moving average #5766
- Set the gradient of LP-Pooling to zero if the sum of all input elements to the power of p is zero #6766
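A few of these changes fit in one CPU-only snippet: `pos_weight` in `nn.BCEWithLogitsLoss`, `torch.no_grad` used as a decorator, `pad_sequence` on unsorted input, and a single tensor passed to `clip_grad_norm_`. This is a sketch; `predict`, `layer`, and all tensor values are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

# pos_weight multiplies the loss term of positive targets (one weight per class).
logits = torch.tensor([[0.0, 0.0]])
targets = torch.tensor([[1.0, 1.0]])
plain = nn.BCEWithLogitsLoss()(logits, targets)
weighted = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.0, 3.0]))(logits, targets)
# Every target is positive here, so a pos_weight of 3 triples the loss.

# Grad-mode context managers can also decorate a function.
@torch.no_grad()
def predict(model, inputs):
    return model(inputs)

layer = nn.Linear(4, 2)
preds = predict(layer, torch.randn(3, 4))  # no autograd graph is recorded

# pad_sequence no longer requires the sequences to be sorted by length.
seqs = [torch.ones(2), torch.ones(4), torch.ones(3)]
padded = pad_sequence(seqs, batch_first=True)  # shape (3, 4), zero-padded on the right

# clip_grad_norm_ accepts a single Tensor as well as an iterable of parameters.
w = torch.randn(5, requires_grad=True)
(w * 10).sum().backward()                      # every gradient entry is 10
torch.nn.utils.clip_grad_norm_(w, max_norm=1.0)
```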
Operators
- Add ellipses ('...') and diagonals (e.g. 'ii->i') to torch.einsum #7173
- Add .to() method for PackedSequence #7319
- Add support for __floordiv__ and __rdiv__ for integral tensors #7245
- torch.clamp now has subgradient 1 at min and max #7049
- torch.arange now uses NumPy-style type inference #7016
- Support infinity norm properly in torch.norm and torch.renorm #6969
- Allow passing an output tensor via the out= keyword argument in torch.dot and torch.matmul #6961
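The `einsum`, `clamp`, `norm`, and `out=` changes above can be sketched as follows. The shapes and values are arbitrary; the clamp check relies on the "subgradient 1 at min and max" behavior described in the list.

```python
import torch

torch.manual_seed(0)

# einsum with '...' for leading batch dims: a batched matrix product.
batch = torch.randn(2, 3, 4, 5)
w = torch.randn(5, 6)
proj = torch.einsum('...ij,jk->...ik', batch, w)   # shape (2, 3, 4, 6)

# einsum with a repeated index extracts the diagonal.
sq = torch.arange(9.0).view(3, 3)
diag = torch.einsum('ii->i', sq)                     # [0., 4., 8.]

# clamp: gradient 1 inside [min, max], including at min and max themselves.
x = torch.tensor([-1.0, 0.0, 0.5, 1.0, 2.0], requires_grad=True)
x.clamp(min=0.0, max=1.0).sum().backward()           # x.grad -> [0., 1., 1., 1., 0.]

# Infinity norm: the largest absolute value.
inf_norm = torch.norm(torch.tensor([3.0, -7.0, 2.0]), p=float('inf'))  # 7.

# matmul writing into a preallocated output tensor via out=.
a = torch.randn(3, 4)
b = torch.randn(4, 2)
result = torch.empty(3, 2)
torch.matmul(a, b, out=result)
```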
Distributions
- Always enable grad when calculating lazy_property #7708
Sparse Tensor
- Add log1p for sparse tensor #8969
- Better support for adding zero-filled sparse tensors #7479
Data Parallel
- Allow modules that return scalars in nn.DataParallel #7973
- Allow nn.parallel.parallel_apply to take in a list/tuple of tensors #8047
Misc
- torch.Size can now accept PyTorch scalars #5676
- Move torch.utils.data.dataset.random_split to torch.utils.data.random_split, and torch.utils.data.dataset.Subset to torch.utils.data.Subset #7816
- Add serialization for torch.device #7713
- Allow copy.deepcopy of torch.(int/float/...)* dtype objects #7699
- torch.load can now take a torch.device as map location #7339
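Several of these items appear together in a typical data and checkpoint workflow. This is a sketch that uses an in-memory buffer in place of a file and `pickle` to show `torch.device` serialization; the dataset contents and split sizes are arbitrary.

```python
import copy
import io
import pickle

import torch
from torch.utils.data import TensorDataset, random_split

# random_split (importable from torch.utils.data) splits a dataset into
# non-overlapping Subsets of the requested lengths.
dataset = TensorDataset(torch.arange(10))
train, val = random_split(dataset, [8, 2])

# dtype objects can be deep-copied.
dtype_copy = copy.deepcopy(torch.float32)

# torch.device objects survive a pickle round trip.
device_copy = pickle.loads(pickle.dumps(torch.device('cpu')))

# torch.load accepts a torch.device as map_location.
buffer = io.BytesIO()
torch.save({'weights': torch.ones(3)}, buffer)
buffer.seek(0)
loaded = torch.load(buffer, map_location=torch.device('cpu'))
```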
Bug Fixes
- Fix nn.BCELoss sometimes returning negative results #8147
- Fix tensor._indices on scalar sparse tensor giving wrong result #8197
- Fix backward of tensor.as_strided not working properly when input has overlapping memory #8721
- Fix x.pow(0) gradient when x contains 0 #8945
- Fix CUDA torch.svd and torch.eig returning wrong results in certain cases #9082
- Fix nn.MSELoss having low precision #9287
- Fix segmentation fault when calling torch.Tensor.grad_fn #9292
- Fix torch.topk returning wrong results when input isn't contiguous #9441
- Fix segfault in convolution on CPU with large inputs/dilation #9274
- Fix avg_pool2d/avg_pool3d count_include_pad having default value False (should be True) #8645
- Fix nn.EmbeddingBag's max_norm option #7959
- Fix returning scalar input in Python autograd function #7934
- Fix THCUNN SpatialDepthwiseConvolution assuming contiguity #7952
- Fix bug in seeding random module in DataLoader #7886
- Don't modify variables in-place for torch.einsum #7765
- Make return uniform in lbfgs step #7586
- The return value of uniform.cdf() is now clamped to [0, 1] #7538
- Fix advanced indexing with negative indices #7345
- CUDAGenerator will not initialize on the current device anymore, which avoids unnecessary memory allocation on GPU:0 #7392
- Fix tensor.type(dtype) not preserving device #7474
- Batch sampler should return the same results when used alone or in a DataLoader with num_workers > 0 #7265
- Fix broadcasting error in LogNormal, TransformedDistribution #7269
- Fix torch.max and torch.min on CUDA in presence of NaN #7052
- Fix torch.tensor device-type calculation when used with CUDA #6995
- Fix a missing '=' in nn.LPPoolNd repr function #9629
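Three of these fixes can be checked directly on CPU: the `x.pow(0)` gradient, the `count_include_pad` default, and `topk` on a non-contiguous input. This is a sketch; the inputs are arbitrary, and it only shows the fixed behavior, not the earlier one.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# x.pow(0) is the constant 1, so its gradient should be 0 everywhere,
# including the entry where x == 0 (the case addressed by #8945).
x = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)
x.pow(0).sum().backward()

# avg_pool2d: count_include_pad defaults to True, so zero padding counts
# toward the divisor. Each 2x2 window here covers one real 1 and three pads.
ones = torch.ones(1, 1, 2, 2)
with_pad = F.avg_pool2d(ones, kernel_size=2, padding=1)   # every entry 0.25
without_pad = F.avg_pool2d(ones, kernel_size=2, padding=1,
                           count_include_pad=False)       # every entry 1.0

# topk on a non-contiguous tensor (a transpose) matches its contiguous copy.
m = torch.randn(4, 5).t()
vals, _ = torch.topk(m, k=2, dim=1)
vals_ref, _ = torch.topk(m.contiguous(), k=2, dim=1)
```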
Documentation
- Expose and document torch.autograd.gradcheck and torch.autograd.gradgradcheck #8166
- Document tensor.scatter_add_ #9630
- Document variants of torch.add and tensor.add_, e.g. tensor.add(value=1, other) -> Tensor #9027
- Document torch.logsumexp #8428
- Document torch.sparse_coo_tensor #8152
- Document torch.utils.data.dataset.random_split #7676
- Document torch.nn.GroupNorm #7086
- Many other documentation improvements, including RNNs, ConvTransposeNd, Fold/Unfold, Embedding/EmbeddingBag, loss functions, etc.
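The newly documented `gradcheck`/`gradgradcheck` and `scatter_add_` can be illustrated briefly. This is a sketch; `torch.sigmoid` is an arbitrary choice of differentiable function to check, and double precision is used because finite-difference checks are unreliable in float32.

```python
import torch
from torch.autograd import gradcheck, gradgradcheck

torch.manual_seed(0)

# gradcheck compares analytical gradients with finite differences;
# gradgradcheck does the same for second derivatives.
inp = torch.randn(4, dtype=torch.float64, requires_grad=True)
first_order_ok = gradcheck(torch.sigmoid, (inp,))
second_order_ok = gradgradcheck(torch.sigmoid, (inp,))

# scatter_add_ accumulates src values into self at the given indices:
# positions 0 and 2 of src both land in slot 0.
target = torch.zeros(3)
index = torch.tensor([0, 1, 0, 2])
src = torch.tensor([1.0, 2.0, 3.0, 4.0])
target.scatter_add_(0, index, src)   # tensor([4., 2., 4.])
```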