PyTorch v1.0.rc1 Release Notes

Release Date: 2018-10-02
  • This is a pre-release preview. Do not rely on the tag to have a fixed set of commits, or on the tag for anything practical or important.

    Table of Contents

    • Highlights
      • JIT
      • torch.distributed new "C10D" library
      • C++ Frontend [API Unstable]
    • Breaking Changes
    • Additional New Features
      • N-dimensional empty tensors
      • New Operators
      • New Distributions
      • Additions to existing Operators and Distributions
    • Bug Fixes
      • Serious
      • Backwards Compatibility
      • Correctness
      • Error checking
      • Miscellaneous
    • Other Improvements
    • Deprecations
      • CPP Extensions
      • torch.distributed
    • Performance
    • Documentation Improvements

    Highlights

    JIT

    The JIT is a set of compiler tools for bridging the gap between research in PyTorch
    and production. It includes a language called Torch Script (don't worry, it is a subset of Python,
    so you'll still be writing Python), and two ways in which you can make your existing code compatible with the JIT.
    Torch Script code can be aggressively optimized, and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.

      # Write in Python, run anywhere!
      @torch.jit.script
      def RNN(x, h, W_h, U_h, b_h):
          y = []
          for t in range(x.size(0)):
              h = torch.tanh(x[t] @ W_h + h @ U_h + b_h)
              y += [h]
          return torch.stack(y), h
    

    As an example, see a tutorial on deploying a seq2seq model,
    loading an exported model from C++, or browse the docs.
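
    Alongside the @torch.jit.script decorator shown above, tracing is the other common way to make existing code compatible with the JIT. Below is a minimal, hedged sketch of that workflow: the module, layer sizes, and file name are made up for illustration, and it assumes the standard torch.jit.trace(module, example_inputs) call.

      import torch

      # A plain eager-mode module (hypothetical example for illustration).
      class TwoLayerNet(torch.nn.Module):
          def __init__(self):
              super(TwoLayerNet, self).__init__()
              self.fc1 = torch.nn.Linear(5, 16)
              self.fc2 = torch.nn.Linear(16, 1)

          def forward(self, x):
              return self.fc2(torch.tanh(self.fc1(x)))

      model = TwoLayerNet()

      # Tracing records the operations executed on an example input and
      # produces a Torch Script module.
      traced = torch.jit.trace(model, torch.randn(3, 5))

      # The traced module can be serialized and later loaded without Python,
      # e.g. from the C++ API.
      traced.save("two_layer_net.pt")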

    torch.distributed new "C10D" library

    The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by the new "C10D" library. The main highlights of the new library are:

    • ๐ŸŽ C10D is performance driven and operates entirely asynchronously for all backends: Gloo, NCCL, and MPI.
    • ๐ŸŽ Significant Distributed Data Parallel performance improvements especially for slower network like ethernet-based hosts
    • โž• Adds async support for all distributed collective operations in the torch.distributed package.
    • โž• Adds send and recv support in the Gloo backend
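
    As a brief illustration of the async collective support mentioned above (a hedged sketch: it assumes the process group has already been initialized, and the backend and tensor contents are illustrative), an operation launched with async_op=True returns a work handle that can be waited on later:

      import torch
      import torch.distributed as dist

      # Assumes the process group was initialized earlier on every participating
      # process, e.g. with dist.init_process_group(backend="gloo", ...).

      t = torch.ones(10)

      # Launch the collective asynchronously and receive a work handle.
      work = dist.all_reduce(t, async_op=True)

      # ... overlap other computation here ...

      # Block until the all_reduce completes; t now holds the reduced result.
      work.wait()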

    C++ Frontend [API Unstable]

    ๐ŸŽ The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to torch.nn, torch.optim, torch.data and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:

    Python:

      import torch

      model = torch.nn.Linear(5, 1)
      optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
      prediction = model.forward(torch.randn(3, 5))
      loss = torch.nn.functional.mse_loss(prediction, torch.ones(3, 1))
      loss.backward()
      optimizer.step()

    C++:

      #include <torch/torch.h>

      torch::nn::Linear model(5, 1);
      torch::optim::SGD optimizer(model->parameters(), /*lr=*/0.1);
      torch::Tensor prediction = model->forward(torch::randn({3, 5}));
      auto loss = torch::mse_loss(prediction, torch::ones({3, 1}));
      loss.backward();
      optimizer.step();

    We are releasing the C++ frontend marked as "API Unstable" as part of PyTorch 1.0. This means it is ready to be used for your research application, but still has some open construction sites that will stabilize over the next month or two. Some parts of the API may undergo breaking changes during this time.

    See https://pytorch.org/cppdocs for detailed documentation on the greater PyTorch C++ API as well as the C++ frontend.

    Breaking Changes

    • Indexing a 0-dimensional tensor now throws an error instead of a warning. Use tensor.item() instead (see the sketch after this list). (#11679).
    • torch.legacy is removed. (#11823).
    • torch.masked_copy_ is removed, use torch.masked_scatter_ instead. (#9817).
    • Operations that result in 0 element tensors may return changed shapes.

      • Before: all 0 element tensors would collapse to shape (0,). For example, torch.nonzero is documented to return a tensor of shape (n, z), where n = number of nonzero elements and z = number of dimensions of the input, but would always return a Tensor of shape (0,) when no nonzero elements existed.
      • Now: Operations return their documented shape.

      Previously, all 0-element tensors collapsed to shape (0,):

        >>> torch.nonzero(torch.zeros(2, 3))
        tensor([], dtype=torch.int64)

      Now, the proper shape is returned:

        >>> torch.nonzero(torch.zeros(2, 3))
        tensor([], size=(0, 2), dtype=torch.int64)

    • Sparse tensor indices and values shape invariants have been changed to be more consistent in the case of 0-element tensors. See (#9279) for more details.

    • torch.distributed: the TCP backend is removed. We recommend using the Gloo or MPI backend for CPU collectives and the NCCL backend for GPU collectives.

    • Some inter-type operations (e.g. *) between torch.Tensors and NumPy arrays will now favor dispatching to the torch variant. This may result in different return types. (#9651).

    • Implicit NumPy conversion no longer implicitly moves a tensor to the CPU. Therefore, you may have to explicitly move a CUDA tensor to the CPU (tensor.to('cpu')) before an implicit conversion (see the sketch after this list). (#10553).

    • torch.randint now defaults to using dtype torch.int64 rather than the default floating-point dtype. (#11040).

    • The torch.tensor function with a Tensor argument now returns a detached Tensor (i.e. a Tensor where grad_fn is None). This more closely aligns with the intent of the function, which is to return a Tensor with copied data and no history (see the sketch after this list). (#11061, #11815).

    • torch.nn.functional.multilabel_soft_margin_loss now returns Tensors of shape (N,) instead of (N, C) to match the behavior of torch.nn.MultiMarginLoss. In addition, it is more numerically stable.
      (#9965).

    • The result type of a torch.float16 0-dimensional tensor and an integer is now torch.float16 (it was torch.float32 or torch.float64 depending on the dtype of the integer). (#11941).

    • Dirichlet and Categorical distributions no longer accept scalar parameters. (#11589).

    • CPP Extensions: Deprecated factory functions that accept a type as the first argument and a size as the second argument have been removed. Instead, use the new-style factory functions that accept the size as the first argument and TensorOptions as the last argument. For example, replace your call to at::ones(torch::CPU(at::kFloat), {2, 3}) with torch::ones({2, 3}, at::kCPU). This applies to the following functions:

      • arange, empty, eye, full, linspace, logspace, ones, rand, randint, randn, randperm, range, zeros.
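
    A hedged sketch tying a few of the changes above together (the tensors and values are illustrative, not taken from the release notes):

      import torch

      # 0-dimensional tensors: use .item() instead of indexing.
      scalar = torch.tensor(3.5)
      value = scalar.item()            # indexing such as scalar[0] now raises an error

      # torch.tensor(t) now returns a detached copy (grad_fn is None);
      # an explicit equivalent is to clone and detach yourself.
      t = torch.randn(3, requires_grad=True)
      copied = t.clone().detach()

      # CUDA tensors must be moved to the CPU before NumPy conversion.
      if torch.cuda.is_available():
          gpu_t = torch.randn(2, 2, device='cuda')
          arr = gpu_t.to('cpu').numpy()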

    Additional New Features

    N-dimensional empty tensors

    • Tensors with 0 elements can now have an arbitrary number of dimensions and support indexing and other torch operations; previously, 0 element tensors were limited to shape (0,). (#9947). Example:

      >>> torch.empty((0, 2, 4, 0), dtype=torch.float64)
      tensor([], size=(0, 2, 4, 0), dtype=torch.float64)
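
      Building on the example above, indexing and concatenation also work on 0-element tensors; a brief sketch (the expected shapes are shown as comments):

        import torch

        x = torch.empty((0, 2, 4, 0), dtype=torch.float64)

        # Selecting along a non-empty dimension keeps the remaining (possibly zero) sizes.
        y = x[:, 0]                     # shape (0, 4, 0)

        # Concatenating along a zero-sized dimension is also well defined.
        z = torch.cat([x, x], dim=0)    # shape (0, 2, 4, 0)

        print(y.shape, z.shape)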

    New Operators

    New Distributions

    Additions to existing Operators and Distributions

    Bug Fixes

    Serious

    Backwards Compatibility

    • torch.nn.Module load_from_state_dict now correctly handles 1-dimensional vs 0-dimensional tensors saved from 0.3 versions. (#9781).
    • Fix "RuntimeError: storages don't support slicing" when loading models saved with PyTorch 0.3. (#11314).

    Correctness

    • torch.nn.Dropout fused kernel could change parameters in eval mode. (#10621).
    • torch.unbind backward has been fixed. (#9995).
    • Fix a bug in sparse matrix-matrix multiplication when a sparse matrix is coalesced and then transposed. (#10496).
    • torch.bernoulli now handles out= parameters correctly, handles expanded tensors correctly, and has corrected argument validity checks on CPU. (#10273).
    • torch.Tensor.normal_ could give incorrect results on CPU. (#10846).
    • torch.tanh could return incorrect results on non-contiguous tensors. (#11226).
    • torch.log on an expanded Tensor gave incorrect results on CPU. (#10269).
    • torch.logsumexp now correctly modifies the out parameter if it is given. (#9755).
    • torch.multinomial with replacement=True could select 0-probability events on CUDA. (#9960).
    • torch.nn.ReLU will now properly propagate NaN. (#10277).
    • torch.max and torch.min could return incorrect values on input containing inf / -inf. (#11091).
    • Fixed an issue with calculated output sizes of torch.nn.Conv modules with stride and dilation. (#9640).
    • torch.nn.EmbeddingBag now correctly returns vectors filled with zeros for empty bags on CUDA. (#11740).

    Error checking

    • torch.gesv now properly checks LAPACK errors. (#11634).
    • Fixed an issue where extra positional arguments were accepted (and ignored) in Python functions calling into C++. (#10499).
    • Legacy Tensor constructors (e.g. torch.FloatTensor(...)) now correctly check their device argument. (#11669).
    • Properly check that the out parameter is a CPU Tensor for CPU unary ops. (#10358).
    • torch.nn.InstanceNorm1d now correctly accepts 2-dimensional inputs. (#9776).
    • torch.nn.Module.load_state_dict had an incorrect error message. (#11200).
    • torch.nn.RNN now properly checks that inputs and hidden_states are on the same devices. (#10185).

    Miscellaneous

    Other Improvements

    Deprecations

    CPP Extensions

    • The torch/torch.h header is deprecated in favor of torch/extension.h, which should be used in all C++ extensions going forward. Including torch/torch.h from a C++ extension will produce a warning. It is safe to batch replace torch/torch.h with torch/extension.h.
    • Usage of the following functions in C++ extensions is also deprecated:
      • torch::set_requires_grad. Replacement: at::Tensor now has a set_requires_grad method.
      • torch::requires_grad. Replacement: at::Tensor now has a requires_grad method.
      • torch::getVariableType. Replacement: None.

    torch.distributed

    ๐ŸŽ Performance

    ๐Ÿ“š Documentation Improvements