Pytorch v1.0.0 release notes (2018-12-07)

« Changelog History

Pytorch v1.0.0 Release Notes

Release Date: 2018-12-07 // over 5 years ago

Table of Contents
- Highlights
  - JIT
  - Brand New Distributed Package
  - C++ Frontend [API Unstable]
  - Torch Hub
- 💥 Breaking Changes
- ➕ Additional New Features
  - N-dimensional empty tensors
  - New Operators
  - New Distributions
  - Sparse API Improvements
  - Additions to existing Operators and Distributions
- 🐛 Bug Fixes
  - Serious
  - Backwards Compatibility
  - Correctness
  - Error checking
  - Miscellaneous
- Other Improvements
- 🗄 Deprecations
  - CPP Extensions
- 🐎 Performance
- 📚 Documentation Improvements
Highlights

JIT

The JIT is a set of compiler tools for bridging the gap between research in PyTorch
⚡️ and production. It allows for the creation of models that can run without a dependency on the Python interpreter and which can be optimized more aggressively. Using program annotations existing models can be transformed into Torch Script, a subset of Python that PyTorch can run directly. Model code is still valid Python code and can be debugged with the standard Python toolchain. PyTorch 1.0 provides two ways in which you can make your existing code compatible with the JIT, using torch.jit.trace or torch.jit.script. Once annotated, Torch Script code can be aggressively optimized and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.
```
# Write in Python, run [email protected] RNN(x, h, W\_h, U\_h, b\_h): y = [] for t in range(x.size(0)): h = torch.tanh(x[t] @ W\_h + h @ U\_h + b\_h) y += [h] return torch.stack(y), h
```
As an example, see a tutorial on deploying a seq2seq model,
📄 loading an exported model from C++, or browse the docs.

📦 Brand New Distributed Package

📦 The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by a brand new re-designed distributed library. The main highlights of the new library are:
- 🆕 New torch.distributed is performance driven and operates entirely asynchronously for all backends: Gloo, NCCL, and MPI.
- 🐎 Significant Distributed Data Parallel performance improvements especially for hosts with slower networks such as ethernet-based hosts
- ➕ Adds async support for all distributed collective operations in the torch.distributed package.
- 📄 Adds the following CPU ops in the Gloo backend: send, recv, reduce, all_gather, gather, scatter
- ➕ Adds barrier op in the NCCL backend
- 📄 Adds new_group support for the NCCL backend
C++ Frontend [API Unstable].

🐎 The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to torch.nn, torch.optim, torch.data and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:

Python C++

import torch

model = torch.nn.Linear(5, 1) ⚡️ optimizer = torch.optim.SGD(model.parameters(), lr=0.1) prediction = model.forward(torch.randn(3, 5)) loss = torch.nn.functional.mse_loss(prediction, torch.ones(3, 1)) loss.backward() ⚡️ optimizer.step() | #include <torch/torch.h>

torch::nn::Linear model(5, 1); torch::optim::SGD optimizer(model->parameters(), /lr=/0.1); torch::Tensor prediction = model->forward(torch::randn({3, 5})); auto loss = torch::mse_loss(prediction, torch::ones({3, 1})); loss.backward(); ⚡️ optimizer.step(); |

We are releasing the C++ frontend marked as "API Unstable" as part of PyTorch 1.0. This means it is ready to be used for your research application, but still has some open construction sites that will stabilize over the next couple of releases. Some parts of the API may undergo breaking changes during this time.

📚 See https://pytorch.org/cppdocs for detailed documentation on the greater PyTorch C++ API as well as the C++ frontend.

Torch Hub

Torch Hub is a pre-trained model repository designed to facilitate research reproducibility.

👀 Torch Hub supports publishing pre-trained models (model definitions and pre-trained weights) to a github repository using a simple hubconf.py file; see hubconf for resnet models in pytorch/vision as an example. Once published, users can load the pre-trained models using the torch.hub.load API.

📚 For more details, see the torch.hub documentation. Expect a more-detailed blog post introducing Torch Hub in the near future!

💥 Breaking Changes
- 📄 Indexing a 0-dimensional tensor will now throw an error instead of warn. Use tensor.item() instead. (#11679).
- 🚚 torch.legacy is removed. (#11823).
- torch.masked_copy_ is removed, use torch.masked_scatter_ instead. (#9817).
- Operations that result in 0 element tensors may return changed shapes.
  - Before: all 0 element tensors would collapse to shape (0,). For example, torch.nonzero is documented to return a tensor of shape (n,z), where n = number of nonzero elements and z = dimensions of the input, but would always return a Tensor of shape _(0,) when no nonzero elements existed.
  - Now: Operations return their documented shape.
  Previously: all 0-element tensors are collapsed to shape (0,)
  
  torch.nonzero(torch.zeros(2, 3)) tensor([], dtype=torch.int64)
  
  Now, proper shape is returned
  
  torch.nonzero(torch.zeros(2, 3)) tensor([], size=(0, 2), dtype=torch.int64)
- Sparse tensor indices and values shape invariants are changed to be more consistent in the case of 0-element tensors. See link for more details. (#9279).
- 🚚 torch.distributed: the TCP backend is removed, we recommend to use Gloo and MPI backends for CPU collectives and NCCL backend for GPU collectives.
- Some inter-type operations (e.g. *) between torch.Tensors and NumPy arrays will now favor dispatching to the torch variant. This may result in different return types. (#9651).
- 🚚 Implicit numpy conversion no longer implicitly moves a tensor to CPU. Therefore, you may have to explicitly move a CUDA tensor to CPU (tensor.to('cpu')) before an implicit conversion. (#10553).
- torch.randint now defaults to using dtype torch.int64 rather than the default floating-point dtype. (#11040).
- 📄 torch.tensor function with a Tensor argument now returns a detached Tensor (i.e. a Tensor where grad_fn is None). This more closely aligns with the intent of the function, which is to return a Tensor with copied data and no history. (#11061,
  #11815).
- torch.nn.functional.multilabel_soft_margin_loss now returns Tensors of shape (N,) instead of (N, C) to match the behavior of torch.nn.MultiMarginLoss. In addition, it is more numerically stable.
  (#9965).
- The result type of a torch.float16 0-dimensional tensor and a integer is now torch.float16 (was torch.float32 or torch.float64 depending on the dtype of the integer). (#11941).
- 📄 Dirichlet and Categorical distributions no longer accept scalar parameters. (#11589).
- 📄 CPP Extensions: Deprecated factory functions that accept a type as the first argument and a size as a second argument argument have been removed. Instead, use the new-style factory functions that accept the size as the first argument and TensorOptions as the last argument. For example, replace your call to at::ones(torch::CPU(at::kFloat)), {2, 3}) with torch::ones({2, 3}, at::kCPU). This applies to the following functions:
  - arange, empty, eye, full, linspace, logspace, ones, rand, randint, randn, randperm, range, zeros.
- 0️⃣ torch.potrf renamed to torch.cholesky. It has a new default (upper=False) (#12699).
- 📇 Renamed elementwise_mean to mean for loss reduction functions (#13419)
➕ Additional New Features

N-dimensional empty tensors
- Tensors with 0 elements can now have an arbitrary number of dimensions and support indexing and other torch operations; previously, 0 element tensors were limited to shape (0,). (#9947). Example:
  
  torch.empty((0, 2, 4, 0), dtype=torch.float64) tensor([], size=(0, 2, 4, 0), dtype=torch.float64)
🆕 New Operators
- 📄 torch.argsort similar to numpy.argsort.
  (#9600).
- 📄 torch.pdist similar to scipy.spatial.distance.pdist. (#10782).
- 📄 torch.tensordot similar to numpy.tensordot. (#10025).
- 📄 torch.broadcast_tensors similar to numpy.broadcast_arrays.
  (#10075).
- 📜 torch.narrow support for sparse tensors.
  (#11342).
- 📄 torch.matrix_rank similar to numpy.linalg.matrix_rank.
  (#10338).
- 📄 torch.matrix_power similar to numpy.linalg.matrix_power. (#11421).
- 📄 torch.nn.CeLU activation. (#8551).
- 📄 torch.nn.CTCLoss. (#9628).
- 📄 torch.diag_embed (#12447).
- 📄 torch.roll operator to match numpy.roll (#13261, #13588, #13874).
- 📄 torch.chain_matmul for performing a chain of matrix multiplies (#12380).
- 📄 torch.finfo,
  📄 torch.info to get more detailed information on a dtype, similar to numpy.finfo and numpy.iinfo (#12472).
- Tensor. __cuda_array_interface__ to provide compatibility with numba and other CUDA projects (#11984).
🆕 New Distributions
- 📄 Weibull Distribution. (#9454).
- 📄 NegativeBinomial Distribution. (#9345).
- 🌲 torch.mvlgamma Multivariate Log-Gamma Distribution. (#9451).
📜 Sparse API Improvements
- 👀 Implemented "sparse gradient" versions of some existing functions, see sparse.mm, sparse.sum, sparse.addmm for details. (#14526, #14661, #12430, #13345).
- 📜 Tensor.to_sparse() allows conversion from a dense tensor to a sparse tensor. (#12171)
- 📜 torch.cat now supports sparse tensors. (#13761, #13577).
- 📜 torch.unsqueeze now works with sparse vectors (this also makes torch.stack work out of the box). (#13760).
- Autograd is now supported on values() and torch.sparse_coo_tensor (with indices and values tensors). E.g., torch.sparse_coo_tensor(i, v).values().sum() is differentiable w.r.t. v. See the updated torch.sparse documentation for details. (#13001).
➕ Additions to existing Operators and Distributions
- 📄 torch.unique now accepts an optional dim argument. (#10423).
- 📄 torch.norm now supports matrix norms.
  (#11261).
- 📄 torch.distributions.kl.kl_divergence now supports broadcasting. (#10533).
- 📄 torch.distributions now support an expand method similar to torch.Tensor.expand. For example: torch.distributions.bernoulli.Bernoulli.expand. (#11341).
- 📄 torch.nn.functional.grid_sample now support nearest neighbor interpolation and reflection padding. (#10051).
- 📄 torch.mean now works across multiple dimensions. (#14252).
- 📄 torch.potrs supports batching (#13453).
- 📄 torch.multiprocessing.spawn helper for spawning processes. (#13518).
- 📄 torch.pow now allows taking derivatives when invoked with a python number as a base. (#12450).
- 📄 Tensor.to now supports a copy keyword argument. (#12571).
- 🌲 torch.softmax and and torch.log_softmax now support a dtype accumulation argument. (#11719).
- 📄 torch.svd supports a compute_uv argument for optionally computing singular vectors (#12517).
- 📄 torch.inverse now supports batches of tensors (#9949).
- 📄 autograd.profiler shows demangled names on nvtx ranges. (#13154).
🐛 Bug Fixes

Serious
- 📄 torch.nn.functional.softmin was using the incorrect formula in 0.4.1 (#10066).
- 📄 torch.as_strided backwards (called via view) was incorrect with overlapping data locations. (#9538).
- 📄 Pointwise losses (e.g. torch.nn.MSELoss) were sometimes using the wrong reduction method. (#10018).
- 📄 torch.from_numpy was not handling big-endian dtypes correctly. (#9508).
- 📄 torch.multiprocessing now correctly handles CUDA tensors, requires_grad settings, and hooks. (#10220).
- __rsub__ now works properly when the CUDA device is not 0. (#12956).
- 🛠 Fix memory leak during packing in tuples (#13305).
- 🛠 DataLoader fixed a couple of issues resulting in hangs. (#11985, #12700).
- 📄 torch.multinomial replacement=False will not properly throw an error message when there are no more categories to select (#12490).
- torch.masked_fill_ now works properly on non-contiguous tensor inputs (#12594).
- Tensor. __delitem__: fixed a segmentation fault on (#12726).
Backwards Compatibility
- torch.nn.Module load_from_state_dict now correctly handles 1-dimensional vs 0-dimensional tensors saved from 0.3 versions. (#9781).
- 🛠 Fix RuntimeError: storages don't support slicing when loading models saved with PyTorch 0.3. (#11314).
- 🛠 BCEWithLogitsLoss: fixed an issue with legacy reduce parameter. (#12689).
Correctness
- 📄 torch.nn.Dropout fused kernel could change parameters in eval mode. (#10621).
- 🛠 torch.unbind backwards has been fixed. (#9995).
- 🛠 Fix a bug in sparse matrix-matrix multiplication when a sparse matrix is coalesced then transposed. (#10496).
- 📄 torch.bernoulli now handles out= parameters correctly, handles expanded tensors correctly, and has corrected argument validity checks on CPU. (#10273).
- 📄 torch.Tensor.normal_ could give incorrect results on CPU. (#10846).
- 📄 torch.tanh could return incorrect results on non-contiguous tensors. (#11226).
- 🌲 torch.log on an expanded Tensor gave incorrect results on CPU. (#10269).
- 🔊 torch.logsumexp now correctly modifies the out parameter if it is given. (#9755).
- 📄 torch.multinomial with replacement=True could select 0 probability events on CUDA. (#9960).
- 📄 torch.nn.ReLU will now properly propagate NaN.
  (#10277).
- 📄 torch.max and torch.min could return incorrect values on input containing inf / -inf. (#11091).
- 🛠 Fixed an issue with calculated output sizes of torch.nn.Conv modules with stride and dilation. (#9640).
- 📄 torch.nn.EmbeddingBag now correctly returns vectors filled with zeros for empty bags on CUDA. (#11740).
- 👉 Use integer math to compute output size of pooling operations (#14405).
- 🛠 Fix sum() on fp16 (#13926).
- Remove CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode for accuracy (#13844).
- 🛠 fix stability in bce with pos_weight formula (#13863).
- 🛠 Fix torch.dist for infinity, zero and minus infinity norms (#13713).
- Give broadcast_coalesced tensors different version counters (#13594).
- 🛠 Fix flip() shape bug in CPU (#13344).
- 🛠 Fix more spectral norm bugs (#13350).
- 🛠 Fix handling of single input in gradcheck (#13543).
- 👀 torch.cuda.manual_seed now also sets the philox seed and offset. (#12677).
- 📄 utils.bottleneck fix ZeroDivisionError(#11987).
- 📄 Disable hook serialization (#11705).
- 📄 torch.norm: fix negative infinity norm (#12722).
- 🛠 Fix torch.isfinite for integer input (#12750).
- 📄 ConvTranspose3d fix output_size calculation (#12952).
- 📄 torch.randperm: properly use RNG mutex on CPU (#13832)
Error checking
- 📄 torch.gesv now properly checks LAPACK errors. (#11634).
- 🛠 Fixed an issue where extra positional arguments were accepted (and ignored) in Python functions calling into C++. (#10499).
- legacy Tensor constructors (e.g. torch.FloatTensor(...)) now correctly check their device argument.
  (#11669).
- Properly check that out parameter is a CPU Tensor for CPU unary ops. (#10358).
- 📄 torch.nn.InstanceNorm1d now correctly accepts 2 dimensional inputs. (#9776).
- torch.nn.Module.load_state_dict had an incorrect error message. (#11200).
- 📄 torch.nn.RNN now properly checks that inputs and hidden_states are on the same devices. (#10185).
- torch.nn.utils.rnn.pack_padded_sequence now properly checks for out-of-order length. (#13933).
- 📄 torch.bmm now properly checks that its Tensor arguments are on compatible devices. (#12434).
- 🛠 Conv2d: fixed incorrect error message for too-large kernel size (#12791).
- 📄 Tensor.expand error message now includes complete sizes. (#13124).
- 👌 Improve CUDA out-of-memory error message. (#13751).
- 📄 torch.arange now properly checks for invalid ranges. (#13915)
Miscellaneous
- 📄 torch.utils.data.DataLoader could hang if it was not completely iterated. (#10366).
- 🛠 Fixed a segfault when grad to a hook function is None. (#12028).
- 🛠 Fixed a segfault in backwards with torch.nn.PReLU when the input does not require grad. (#11758).
- 🛠 dir(torch) has been fixed with Python 3.7. (#10271).
- 🛠 Fixed a device-side assert in torch.multinomial when replacement=False and the input has fewer nonzero elements than num_samples. (#11933).
- Can now properly assign a torch.float16 dtype tensor to .grad. (#11781).
- 🛠 Fixed can only join a started process error with torch.utils.data.DataLoader. (#11432).
- 📄 Prevent unexpected exit in torch.utils.data.DataLoader on KeyboardInterrupt. (#11718).
- 📄 torch.einsum now handles spaces consistently. (#9994).
- 🛠 Fixed a broadcasting bug in torch.distributions.studentT.StudentT. (#12148).
- 🛠 fix a printing error with large non-contiguous tensors. (#10405).
- allow empty index for scatter_* methods (#14077)
- 📄 torch.nn.ModuleList now handles negative indices. (#13102).
- Minor fix to reenable nvtx sequence numbers for the forward methods of custom (Python) autograd functions (#13876)
- 🛠 Fix handling all empty bags in CUDA embedding bag (#13483)
- Fix half_tensor.bernoulli_(double) (#13474)
- 🛠 Fix cuda out of memory test (#13864)
- Implement NaN-propagating max/min on Vec256. (#13399).
- 🛠 Fix refcounting in anomaly metadata (#13249)
- 🛠 Fix pointwise loss broadcast (#12996)
- 🛠 Fix copying a nn.Parameter (#12886)
Other Improvements
- 📄 torch.cuda functions and torch.nn.parallel.data_parallel now accept torch.device objects in addition to integer device ids. (#10833, #10189).
- 📄 torch.nn.parallel.data_parallel now accepts torch.device inputs. (#10189).
- 🌲 torch.nn.functional.log_softmax is now more numerically stable. (#11866).
- 👌 Improve printing of sparse tensors and grad_fns. (#10181).
- Only involve CUDA device in CUDA -> CPU copy. (#11592).
- Accept numpy floating-point scalars as doubles more consistently. (#9659).
- 📄 sparse-to-sparse copy_ is now supported. (#9005).
- 📄 torch.bincount now supports 0 element inputs. (#9757).
- 📄 torch.nn.functional.conv2d error message have been improved. (#11053).
- 👍 Allow conversion of np.int64 to PyTorch scalar. (#9225).
- 📄 torch.einsum now handles varargs.
  (#10067).
- 📄 torch.symeig now returns 0-filled eigenvectors when eigenvectors=False is passed on CUDA rather than uninitialized data. (#10645).
- 📄 torch.utils.checkpoint: added an pption to preserve bitwise accuracy of gradient checkpointed vs non-checkpointed dropout. (#14253).
🗄 Deprecations
- ✂ Removed support for C extensions. Please use cpp extensions. (#12122)
- ✂ Delete torch.utils.trainer (#12487)
CPP Extensions
- 🗄 The torch/torch.h header is deprecated in favor of torch/extension.h, which should be used in all C++ extensions going forward. Including torch/torch.h from a C++ extension will produce a warning. It is safe to batch replace torch/torch.h with torch/extension.h.
- 🗄 Usage of the following functions in C++ extensions is also deprecated:
  - torch::set_requires_grad. Replacement: at::Tensor now has a set_requires_grad method.
  - torch::requires_grad. Replacement: at::Tensor now has a requires_grad method.
  - torch::getVariableType. Replacement: None.
- 🛠 Fix version.groups() (#14505)
- 👍 Allow building libraries with setuptools that dont have abi suffix (#14130)
- Missing .decode() after check_output in cpp_extensions (#13935)
torch.distributed
- 📦 the old (THD-backed) torch.distributed package is deprecated but still available at torch.distributed.deprecated.
- 🗄 The old (THD-backed) torch.nn.parallel.DistributedDataParallel is deprecated but still available at torch.nn.parallel.deprecated.DistributedDataParallel.
🐎 Performance
- 🐎 "Advanced Indexing" performance has improved on both CPU and GPU. (#13420)
- 📄 torch.nn.functional.grid_sample on CPU now uses vectorized operation and is now 2x~7x faster with AVX2 enabled CPUs. (#9961).
- 📄 torch.norm has been vectorized and parallelized on CPU. (#11565).
- 📄 torch.max and torch.min has been parallelized on CPU. (#10343).
- 📄 torch.nn.Threshold and torch.nn.ReLU have been parallelized on CPU. (#13182)
- torch.Tensor.masked_fill_ has been parallelized on CPU. (#11359).
- 📜 Tensor.sparse_mask has been parallelized on CPU. (#13290).
- 📄 torch.nn.PReLU has been sped up on both CPU and GPU.
  (#11758).
- 📄 torch.nn.KLDivLoss has been sped up on both CPU and GPU. (#10336).
- 📄 torch.svd has been sped up on both CPU and GPU.
  (#11194).
- 📄 torch.einsum has been greatly sped up on CPU.
  (#11292).
- 📄 torch.clamp no longer does unnecessary copying. (#10352).
- 📄 torch.add, torch.sub, torch.mul, torch.div are much faster for non-contiguous tensors on GPU. (#8919).
- 📄 torch.nn.RNN and related Modules have been ported to C++ and are more performant. (#10305, #10481).
- 📄 autograd.Profiler now has lower overhead. (#10969, #11773).
- 🖨 Printing large Tensors is now faster. (#14418).
📚 Documentation Improvements
- 📄 Reproducibility note added. (#11329).
- 📚 CPP Extensions have improved online documentation. Authors of C++ extensions may want to consult this documentation when writing new extensions.
- 📄 torch.Tensor.flatten is now documented.
  (#9876).
- 📄 torch.digamma is now documented. (#10967).
- 📄 torch.allclose is now documented. (#11185).
- 📄 torch.eig return format clarified. (#10315).
- 📄 torch.as_tensor now includes a proper example. (#10309).
- torch.sparse_coo_tensor now explains uncoalesced behavior. (#10308).
- 📄 torch.fft equation has been corrected. (#10760).
- 📄 torch.nn.LSTM behavior has been clarified in the multilayer case. (#11896).
- 📚 torch.nn.functional.dropout documentation has been clarified. (#10417).
- 📚 torch.nn.functional.pad documentation has been clarified. (#11623).
- Add documentation about Tensor properties device, is_cuda, requires_grad, is_leaf and grad.
  (#14339)
- 📚 torch.sparse documentation updated (#12221).
- ➕ Added a quick rundown of codebase structure. (#12693)
- 📄 torch.cat corrected parameter name to tensors from seq. (#12741)
- Warn that tensor.resize_() resets strides (#12816)

Pytorch v1.0.0

Version Release Notes from December 07, 2018 (over 5 years ago)

« Changelog History

Pytorch v1.0.0 Release Notes

Table of Contents

Highlights

JIT

📦 Brand New Distributed Package

C++ Frontend [API Unstable].

Torch Hub

💥 Breaking Changes

Previously: all 0-element tensors are collapsed to shape (0,)

Now, proper shape is returned

➕ Additional New Features

N-dimensional empty tensors

🆕 New Operators

🆕 New Distributions

📜 Sparse API Improvements

➕ Additions to existing Operators and Distributions

🐛 Bug Fixes

Serious

Backwards Compatibility

Correctness

Error checking

Miscellaneous

Other Improvements

🗄 Deprecations

CPP Extensions

torch.distributed

🐎 Performance

📚 Documentation Improvements