PyTorch v0.4.1 Release Notes

Release Date: 2018-07-26
  • Table of Contents

    • Breaking Changes
    • New Features
      • Neural Networks: Adaptive Softmax, Spectral Norm, etc.
      • Operators: torch.bincount, torch.as_tensor, ...
      • torch.distributions: Half Cauchy, Gamma sampling, ...
      • Other: automatic anomaly detection (detecting NaNs, etc.)
    • Performance
      • Faster CPU ops in a wide variety of cases
    • Other improvements
    • Bug Fixes
    • Documentation Improvements

    Breaking Changes

    • torch.stft has changed its signature to be consistent with librosa #9497
      • Before: stft(signal, frame_length, hop, fft_size=None, normalized=False, onesided=True, window=None, pad_end=0)
      • After: stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)
      • torch.stft now also uses FFT internally and is much faster.
    • torch.slice is removed in favor of the tensor slicing notation #7924
    • torch.arange now does dtype inference: any floating-point argument is inferred to be the default dtype; all integer arguments are inferred to be int64. #7016 (see the example after this list)
    • torch.nn.functional.embedding_bag's old signature embedding_bag(weight, input, ...) is deprecated; use embedding_bag(input, weight, ...) instead (consistent with torch.nn.functional.embedding)
    • torch.nn.functional.sigmoid and torch.nn.functional.tanh are deprecated in favor of torch.sigmoid and torch.tanh #8748
    • Broadcast behavior changed in a (very rare) edge case: [1] x [0] now broadcasts to [0] (it used to be [1]) #9209
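
    For illustration, a minimal REPL sketch of the new torch.arange inference rule (the printed dtypes assume the default dtype is left at the stock torch.float32):

    >>> import torch
    >>> torch.arange(5)                     # all-integer arguments -> torch.int64
    tensor([0, 1, 2, 3, 4])
    >>> torch.arange(5.0)                   # any floating-point argument -> default dtype
    tensor([0., 1., 2., 3., 4.])
    >>> torch.arange(0, 5, 0.5).dtype
    torch.float32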

    New Features

    Neural Networks

    Adaptive Softmax nn.AdaptiveLogSoftmaxWithLoss #5287

    >>> in_features = 1000
    >>> n_classes = 200
    >>> adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs=[20, 100, 150])
    >>> adaptive_softmax
    AdaptiveLogSoftmaxWithLoss(
      (head): Linear(in_features=1000, out_features=23, bias=False)
      (tail): ModuleList(
        (0): Sequential(
          (0): Linear(in_features=1000, out_features=250, bias=False)
          (1): Linear(in_features=250, out_features=80, bias=False)
        )
        (1): Sequential(
          (0): Linear(in_features=1000, out_features=62, bias=False)
          (1): Linear(in_features=62, out_features=50, bias=False)
        )
        (2): Sequential(
          (0): Linear(in_features=1000, out_features=15, bias=False)
          (1): Linear(in_features=15, out_features=50, bias=False)
        )
      )
    )
    >>> batch = 15
    >>> input = torch.randn(batch, in_features)
    >>> target = torch.randint(n_classes, (batch,), dtype=torch.long)
    >>> # get the log probabilities of target given input, and mean negative log probability loss
    >>> adaptive_softmax(input, target)
    ASMoutput(output=tensor([-6.8270, -7.9465, -7.3479, -6.8511, -7.5613, -7.1154, -2.9478, -6.9885,
            -7.7484, -7.9102, -7.1660, -8.2843, -7.7903, -8.4459, -7.2371],
           grad_fn=<ThAddBackward>), loss=tensor(7.2112, grad_fn=<MeanBackward1>))
    >>> # get the log probabilities of all targets given input as a (batch x n_classes) tensor
    >>> adaptive_softmax.log_prob(input)
    tensor([[-2.6533, -3.3957, -2.7069,  ..., -6.4749, -5.8867, -6.0611],
            [-3.4209, -3.2695, -2.9728,  ..., -7.6664, -7.5946, -7.9606],
            [-3.6789, -3.6317, -3.2098,  ..., -7.3722, -6.9006, -7.4314],
            ...,
            [-3.3150, -4.0957, -3.4335,  ..., -7.9572, -8.4603, -8.2080],
            [-3.8726, -3.7905, -4.3262,  ..., -8.0031, -7.8754, -8.7971],
            [-3.6082, -3.1969, -3.2719,  ..., -6.9769, -6.3158, -7.0805]],
           grad_fn=<CopySlices>)
    >>> # predict: get the class that maximizes log probability for each input
    >>> adaptive_softmax.predict(input)
    tensor([8, 6, 6, 16, 14, 16, 16, 9, 4, 7, 5, 7, 8, 14, 3])
    

    Add spectral normalization nn.utils.spectral_norm #6929

    >>> # Usage is similar to weight_norm
    >>> convT = nn.ConvTranspose2d(3, 64, kernel_size=3, padding=1)
    >>> # Can specify number of power iterations applied each time, or use default (1)
    >>> convT = nn.utils.spectral_norm(convT, n_power_iterations=2)
    >>>
    >>> # apply to every conv and conv transpose module in a model
    >>> def add_sn(m):
    ...     for name, c in m.named_children():
    ...         m.add_module(name, add_sn(c))
    ...     if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
    ...         return nn.utils.spectral_norm(m)
    ...     else:
    ...         return m
    ...
    >>> my_model = add_sn(my_model)
    

    nn.ModuleDict and nn.ParameterDict containers #8463
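
    A minimal sketch of the new dict-style containers; the names and sizes below are made up for illustration:

    >>> import torch
    >>> import torch.nn as nn
    >>> layers = nn.ModuleDict({
    ...     'conv': nn.Conv2d(3, 16, kernel_size=3, padding=1),
    ...     'act': nn.ReLU(),
    ... })
    >>> scales = nn.ParameterDict({'gamma': nn.Parameter(torch.ones(16))})
    >>> x = torch.randn(1, 3, 8, 8)
    >>> y = layers['act'](layers['conv'](x))        # index submodules by key
    >>> y = y * scales['gamma'].view(1, -1, 1, 1)   # parameters are registered like attributes
    >>> list(layers.keys())
    ['conv', 'act']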

    Add nn.init.zeros_ and nn.init.ones_ #7488

    Add sparse gradient option to pretrained embedding #7492

    Add max pooling support to nn.EmbeddingBag #5725
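
    A short sketch of the new 'max' mode (sizes are arbitrary); each bag is reduced with an elementwise max instead of a sum or mean:

    >>> import torch
    >>> import torch.nn as nn
    >>> bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode='max')
    >>> input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
    >>> offsets = torch.tensor([0, 4])      # two bags: input[0:4] and input[4:8]
    >>> bag(input, offsets).shape           # one max-pooled embedding per bag
    torch.Size([2, 4])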

    ๐Ÿ‘ Depthwise convolution support for MKLDNN #8782

    Add nn.FeatureAlphaDropout (featurewise Alpha Dropout layer) #9073
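
    A minimal usage sketch (shapes are arbitrary); like nn.AlphaDropout it targets SELU-style networks, but it masks entire feature maps at a time:

    >>> import torch
    >>> import torch.nn as nn
    >>> m = nn.FeatureAlphaDropout(p=0.2)
    >>> input = torch.randn(20, 16, 4, 32, 32)   # (N, C, D, H, W): whole channels get masked
    >>> output = m(input)
    >>> output.shape
    torch.Size([20, 16, 4, 32, 32])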

    Operators

    torch.bincount (count frequency of each value in an integral tensor) #6688

    >>> input = torch.randint(0, 8, (5,), dtype=torch.int64)
    >>> weights = torch.linspace(0, 1, steps=5)
    >>> input, weights
    (tensor([4, 3, 6, 3, 4]), tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000]))
    >>> torch.bincount(input)
    tensor([0, 0, 0, 2, 2, 0, 1])
    >>> input.bincount(weights)
    tensor([0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.5000])
    

    torch.as_tensor (similar to torch.tensor but never copies unless necessary) #7109

    >>> tensor = torch.randn(3, device='cpu', dtype=torch.float32)
    >>> torch.as_tensor(tensor)                       # doesn't copy
    >>> torch.as_tensor(tensor, dtype=torch.float64)  # copies due to incompatible dtype
    >>> torch.as_tensor(tensor, device='cuda')        # copies due to incompatible device
    >>> array = np.array([3, 4.5])
    >>> torch.as_tensor(array)                 # doesn't copy, sharing memory with the numpy array
    >>> torch.as_tensor(array, device='cuda')  # copies due to incompatible device
    

    torch.randperm for CUDA tensors #7606

    nn.HardShrink for CUDA tensors #8117

    torch.flip (flips a tensor along specified dims) #7873
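
    A quick sketch, mirroring the documented behavior:

    >>> import torch
    >>> x = torch.arange(8).view(2, 2, 2)
    >>> torch.flip(x, dims=[0, 1])          # reverse along dims 0 and 1
    tensor([[[6, 7],
             [4, 5]],

            [[2, 3],
             [0, 1]]])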

    torch.flatten (flattens a contiguous range of dims) #8578
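
    A quick sketch (shapes are arbitrary):

    >>> import torch
    >>> t = torch.randn(32, 3, 28, 28)
    >>> torch.flatten(t).shape                      # flatten everything
    torch.Size([75264])
    >>> torch.flatten(t, start_dim=1).shape         # keep the batch dimension
    torch.Size([32, 2352])
    >>> torch.flatten(t, start_dim=1, end_dim=2).shape
    torch.Size([32, 84, 28])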

    torch.pinverse (computes SVD-based pseudo-inverse) #9052
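
    A short sketch; the reconstruction identity A @ pinv(A) @ A == A should hold up to floating-point error:

    >>> import torch
    >>> A = torch.randn(3, 5)               # non-square, so no exact inverse exists
    >>> A_pinv = torch.pinverse(A)          # Moore-Penrose pseudo-inverse via SVD
    >>> A_pinv.shape
    torch.Size([5, 3])
    >>> torch.dist(A.mm(A_pinv).mm(A), A)   # ~0 up to numerical error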

    torch.meshgrid #8581

    torch.unique for CUDA tensors #8899

    torch.erfc (complementary error function) https://github.com/pytorch/pytorch/pull/9366/files

    torch.isinf and torch.isfinite #9169 #9487
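
    A minimal sketch:

    >>> import torch
    >>> t = torch.tensor([1.0, float('inf'), -float('inf'), float('nan')])
    >>> torch.isinf(t)       # flags the +inf / -inf entries: [0, 1, 1, 0]
    >>> torch.isfinite(t)    # flags entries that are neither inf nor NaN: [1, 0, 0, 0]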

    torch.reshape_as #9452

    Support backward for target tensor in torch.nn.functional.kl_div #7839

    torch.logsumexp #7254
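
    A short sketch (shapes are arbitrary); logsumexp computes log(sum(exp(x))) along a dimension without overflowing in the exp:

    >>> import torch
    >>> scores = torch.randn(4, 10)
    >>> lse = torch.logsumexp(scores, dim=1)    # numerically stable reduction
    >>> lse.shape
    torch.Size([4])
    >>> torch.dist(lse, scores.exp().sum(dim=1).log())   # ~0, but without the overflow risk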

    Add batched linear solver to torch.gesv #6100
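
    A sketch of the batched form, using the torch.gesv API as it exists in this release (it was deprecated in later versions); shapes are arbitrary:

    >>> import torch
    >>> A = torch.randn(2, 3, 3)            # batch of coefficient matrices
    >>> B = torch.randn(2, 3, 4)            # matching batch of right-hand sides
    >>> X, LU = torch.gesv(B, A)            # solves A[i] @ X[i] = B[i] for every i
    >>> X.shape
    torch.Size([2, 3, 4])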

    torch.sum now supports summing over multiple dimensions https://github.com/pytorch/pytorch/pull/6152/files
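
    A quick sketch:

    >>> import torch
    >>> x = torch.ones(2, 3, 4)
    >>> x.sum(dim=(0, 2))                   # reduce over dims 0 and 2 in one call
    tensor([8., 8., 8.])
    >>> x.sum(dim=(0, 2), keepdim=True).shape
    torch.Size([1, 3, 1])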

    torch.diagonal and torch.diagflat now take arbitrary diagonals with NumPy semantics #6718
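
    A quick sketch, following the NumPy-style semantics:

    >>> import torch
    >>> x = torch.arange(9).view(3, 3)
    >>> torch.diagonal(x)                   # main diagonal, like np.diagonal
    tensor([0, 4, 8])
    >>> torch.diagonal(x, offset=1)         # first super-diagonal
    tensor([1, 5])
    >>> torch.diagflat(torch.tensor([1, 2, 3]))   # build a matrix from a diagonal, like np.diagflat
    tensor([[1, 0, 0],
            [0, 2, 0],
            [0, 0, 3]])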

    tensor.any and tensor.all on ByteTensor can now accept dim and keepdim arguments #4627
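
    A short sketch on a small ByteTensor mask:

    >>> import torch
    >>> mask = torch.tensor([[1, 0, 1],
    ...                      [0, 0, 0]], dtype=torch.uint8)
    >>> mask.any(dim=1)                     # per-row "any": [1, 0]
    >>> mask.all(dim=1, keepdim=True)       # per-row "all", keeping the reduced dim: [[0], [0]]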

    Distributions

    • Half Cauchy and Half Normal #8411 (see the sketch after this list)
    • Gamma sampling for CUDA tensors #6855
    • ๐Ÿ‘ Allow vectorized counts in Binomial Distribution #6720

    Misc

    • Automatic anomaly detection (detecting NaNs, etc.)

    ๐ŸŽ Performance

    • Accelerate Bernoulli random number generation on CPU #7171
    • Enable cuFFT plan caching (80% speed-up in certain cases) #8344
    • Fix unnecessary copying in bernoulli_ #8682
    • Fix unnecessary copying in broadcast #8222
    • Speed up multidim sum (2x~6x speed-up in certain cases) #8992
    • Vectorize CPU sigmoid (>3x speed-up in most cases) #8612
    • Optimize CPU nn.LeakyReLU and nn.PReLU (2x speed-up) #9206
    • Vectorize softmax and logsoftmax (4.5x speed-up on single core and 1.8x on 10 threads) #7375
    • Speed up nn.init.sparse (10-20x speed-up) #6899

    Improvements

    Tensor printing

    • Tensor printing now includes requires_grad and grad_fn information #8211 (see the example after this list)
    • Improve number formatting in tensor printing #7632
    • Fix scale when printing some tensors #7189
    • Speed up printing of large tensors #6876
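
    A small illustration of the new printout:

    >>> import torch
    >>> x = torch.tensor([1.0, 2.0], requires_grad=True)
    >>> x
    tensor([1., 2.], requires_grad=True)
    >>> y = (x * 2).sum()
    >>> y                    # prints with the grad_fn that produced it appended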

    Neural Networks

    • NaN is now propagated through many activation functions #8033
    • Add non_blocking option to nn.Module.to #7312
    • Loss modules now allow target to require gradient #8460
    • Add pos_weight argument to nn.BCEWithLogitsLoss #6856
    • Support grad_clip for parameters on different devices #9302
    • Remove the requirement that input sequences to pad_sequence be sorted #7928
    • stride argument for max_unpool1d, max_unpool2d, max_unpool3d now defaults to kernel_size #7388
    • Allow calling grad mode context managers (e.g., torch.no_grad, torch.enable_grad) as decorators #7737 (see the sketch after this list)
    • torch.optim.lr_scheduler._LRScheduler's __getstate__ now includes optimizer info #7757
    • Add support for accepting Tensor as input in clip_grad_* functions #7769
    • Return NaN in max_pool/adaptive_max_pool for NaN inputs #7670
    • nn.EmbeddingBag can now handle empty bags in all modes #7389
    • torch.optim.lr_scheduler.ReduceLROnPlateau is now serializable #7201
    • Allow only tensors of floating point dtype to require gradients #7034 and #7185
    • Allow resetting of BatchNorm running stats and cumulative moving average #5766
    • Set the gradient of LP-Pooling to zero if the sum of all input elements to the power of p is zero #6766
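
    A minimal sketch of the decorator form mentioned above (the tiny model is just for illustration):

    >>> import torch
    >>> @torch.no_grad()                    # torch.enable_grad() works the same way
    ... def evaluate(model, x):
    ...     return model(x)                 # no autograd graph is recorded in here
    ...
    >>> lin = torch.nn.Linear(4, 2)
    >>> evaluate(lin, torch.randn(3, 4)).requires_grad
    False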

    Operators

    Distributions

    • Always enable grad when calculating lazy_property #7708

    Sparse Tensor

    • Add log1p for sparse tensor #8969
    • Better support for adding zero-filled sparse tensors #7479

    Data Parallel

    • ๐Ÿ‘ Allow modules that return scalars in nn.DataParallel #7973
    • ๐Ÿ‘ Allow nn.parallel.parallel_apply to take in a list/tuple of tensors #8047

    Misc

    • torch.Size can now accept PyTorch scalars #5676
    • Move torch.utils.data.dataset.random_split to torch.utils.data.random_split, and torch.utils.data.dataset.Subset to torch.utils.data.Subset #7816
    • Add serialization for torch.device #7713
    • Allow copy.deepcopy of torch.(int/float/...)* dtype objects #7699
    • torch.load can now take a torch.device as map location #7339 (see the example below)
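
    For example (the file name here is made up), a checkpoint can be loaded straight onto the CPU by passing a device object as the map location:

    >>> import torch
    >>> lin = torch.nn.Linear(4, 2)
    >>> torch.save(lin.state_dict(), 'lin.pt')
    >>> state = torch.load('lin.pt', map_location=torch.device('cpu'))
    >>> lin.load_state_dict(state)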

    ๐Ÿ› Bug Fixes

    • Fix nn.BCELoss sometimes returning negative results #8147
    • Fix tensor._indices on scalar sparse tensor giving wrong result #8197
    • Fix backward of tensor.as_strided not working properly when input has overlapping memory #8721
    • Fix x.pow(0) gradient when x contains 0 #8945
    • Fix CUDA torch.svd and torch.eig returning wrong results in certain cases #9082
    • Fix nn.MSELoss having low precision #9287
    • Fix segmentation fault when calling torch.Tensor.grad_fn #9292
    • Fix torch.topk returning wrong results when input isn't contiguous #9441
    • Fix segfault in convolution on CPU with large inputs / dilation #9274
    • Fix avg_pool2/3d count_include_pad having default value False (should be True) #8645
    • Fix nn.EmbeddingBag's max_norm option #7959
    • Fix returning scalar input in Python autograd function #7934
    • Fix THCUNN SpatialDepthwiseConvolution assuming contiguity #7952
    • Fix bug in seeding random module in DataLoader #7886
    • Don't modify variables in-place for torch.einsum #7765
    • Make the return value uniform in the LBFGS step #7586
    • The return value of uniform.cdf() is now clamped to [0, 1] #7538
    • Fix advanced indexing with negative indices #7345
    • CUDAGenerator will not initialize on the current device anymore, avoiding unnecessary memory allocation on GPU:0 #7392
    • Fix tensor.type(dtype) not preserving device #7474
    • Batch sampler now returns the same results whether used alone or in a DataLoader with num_workers > 0 #7265
    • Fix broadcasting error in LogNormal, TransformedDistribution #7269
    • Fix torch.max and torch.min on CUDA in presence of NaN #7052
    • Fix torch.tensor device-type calculation when used with CUDA #6995
    • Fix a missing '=' in the nn.LPPoolNd repr function #9629

    Documentation