PyTorch v1.2.0 Release Notes

Release Date: 2019-08-08
  • We have just released PyTorch v1.2.0.

    It has over 1,900 commits and contains a significant amount of effort in areas spanning JIT, ONNX, Distributed, as well as Performance and Eager Frontend Improvements.


    [JIT] New TorchScript API

    Version 1.2 includes a new, easier-to-use API for converting nn.Modules into ScriptModules. A sample usage is:

    class MyModule(torch.nn.Module):
        ...

    # Construct an nn.Module instance
    module = MyModule(args)
    # Pass it to `torch.jit.script` to compile it into a ScriptModule.
    my_torchscript_module = torch.jit.script(module)

    torch.jit.script() will attempt to recursively compile the given nn.Module, including any submodules or methods called from forward(). See the migration guide for more info on what's changed and how to migrate.

    [JIT] Improved TorchScript Python language coverage

    In 1.2, TorchScript has significantly improved its support for Python language constructs and Python's standard library. Highlights include:

    • Early returns, breaks and continues.
    • Iterator-based constructs, like for...in loops, zip(), and enumerate().
    • NamedTuples.
    • math and string library support.
    • Support for most Python builtin functions.

    See the detailed notes below for more information.

    Expanded ONNX Export

    In PyTorch 1.2, working with Microsoft, we've added full support to export ONNX Opset versions 7 (v1.2), 8 (v1.3), 9 (v1.4) and 10 (v1.5). We have also enhanced the constant folding pass to support Opset 10, the latest available version of ONNX. Additionally, users are now able to register their own symbolic functions to export custom ops, and to specify the dynamic dimensions of inputs during export. Here is a summary of the major improvements:

    • Support for multiple Opsets, including the ability to export dropout, slice, flip and interpolate in Opset 10.
    • Improvements to ScriptModule, including support for multiple outputs, tensor factories, and tuples as inputs and outputs.
    • More than a dozen additional PyTorch operators supported, including the ability to export a custom operator.

    The ONNX export docs have been updated, and a refreshed tutorial using ONNXRuntime is also available.

    TensorBoard Is No Longer Considered Experimental

    Read the documentation or simply type from torch.utils.tensorboard import SummaryWriter to get started!

    nn.Transformer

    We include a standard nn.Transformer module, based on the paper "Attention is All You Need". The nn.Transformer module relies entirely on an attention mechanism to draw global dependencies between input and output. The individual components of the nn.Transformer module are designed so they can be adopted independently. For example, the nn.TransformerEncoder can be used by itself, without the larger nn.Transformer. New APIs include:

    • nn.Transformer
    • nn.TransformerEncoder and nn.TransformerEncoderLayer
    • nn.TransformerDecoder and nn.TransformerDecoderLayer

    See the Transformer Layers documentation for more info.
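    To make "relies entirely on an attention mechanism" more concrete, here is a minimal pure-Python sketch of scaled dot-product attention, the building block described in the cited paper. It is not the nn.Transformer implementation; the function names and the list-of-lists layout are assumptions chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Each query attends over all keys and returns a weighted mix of values.

    queries, keys, values are lists of equal-length float vectors
    (hypothetical layout, chosen only for this sketch).
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Two identical keys give equal weights, so the output is the mean of the values.
attended = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
print(attended)
```

    Because every query sees every key, each output position can depend on any input position, which is what "global dependencies" refers to.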

    Breaking Changes

    Comparison operations (lt (<), le (<=), gt (>), ge (>=), eq (==), ne (!=)) return dtype has changed from torch.uint8 to torch.bool (21113)

    Version 1.1:

    >>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
    tensor([1, 0, 0], dtype=torch.uint8)

    Version 1.2:

    >>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
    tensor([True, False, False])

    For most programs, we don't expect that any changes will need to be made as a result of this change. There are a couple of possible exceptions listed below.

    Mask Inversion

    In prior versions of PyTorch, the idiomatic way to invert a mask was to call 1 - mask. This behavior is no longer supported; use the ~ or bitwise_not() operator instead.

    Version 1.1:

    >>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
    tensor([0, 1, 1], dtype=torch.uint8)

    Version 1.2:

    >>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
    RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported.
    If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.
    >>> ~(torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
    tensor([False, True, True])

    sum(Tensor) (Python built-in) does not upcast dtype like torch.sum

    Python's built-in sum returns results in the same dtype as the tensor itself, so it will not return the expected result if the value of the sum cannot be represented in the dtype of the tensor.

    Version 1.1:

    # value can be represented in result dtype
    >>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
    tensor(3, dtype=torch.uint8)
    # value can NOT be represented in result dtype
    >>> sum(torch.ones((300,)) > 0)
    tensor(44, dtype=torch.uint8)
    # torch.sum properly upcasts result dtype
    >>> torch.sum(torch.ones((300,)) > 0)
    tensor(300)

    Version 1.2:

    # value cannot be represented in result dtype (now torch.bool)
    >>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
    tensor(True)
    # value cannot be represented in result dtype
    >>> sum(torch.ones((300,)) > 0)
    tensor(True)
    # torch.sum properly upcasts result dtype
    >>> torch.sum(torch.ones((300,)) > 0)
    tensor(300)

    TL;DR: use torch.sum instead of the built-in sum. Note that the built-in sum() behavior will more closely resemble torch.sum in the next release.

    Note also that masking via torch.uint8 Tensors is now deprecated; see the Deprecations section for more information.
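    The arithmetic behind these results can be reproduced without PyTorch. The sketch below is a pure-Python model, not PyTorch code: it assumes an 8-bit unsigned accumulator wraps modulo 256 and that a boolean accumulator saturates at True. The helper names are hypothetical.

```python
def accumulate_uint8(flags):
    """Add 0/1 flags in an 8-bit unsigned accumulator (wraps at 256)."""
    total = 0
    for flag in flags:
        total = (total + int(flag)) % 256
    return total


def accumulate_bool(flags):
    """Add flags in a boolean accumulator: the result saturates at True."""
    total = False
    for flag in flags:
        total = total or bool(flag)
    return total


mask = [True] * 300  # stands in for torch.ones((300,)) > 0

wrapped = accumulate_uint8(mask)    # models built-in sum() on a uint8 mask
saturated = accumulate_bool(mask)   # models built-in sum() on a bool mask
exact = sum(int(f) for f in mask)   # models torch.sum, which upcasts first

print(wrapped, saturated, exact)    # 44 True 300
```

    In both models the count of 300 is lost as soon as the accumulator keeps the narrow dtype, which is why upcasting before reducing gives the expected answer.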

    __invert__ / ~: now calls torch.bitwise_not instead of 1 - tensor and is supported for all integral and Boolean dtypes instead of only torch.uint8. (22326)

    Version 1.1:

    >>> ~torch.arange(8, dtype=torch.uint8)
    tensor([1, 0, 255, 254, 253, 252, 251, 250], dtype=torch.uint8)

    Version 1.2:

    >>> ~torch.arange(8, dtype=torch.uint8)
    tensor([255, 254, 253, 252, 251, 250, 249, 248], dtype=torch.uint8)
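    The two outputs above follow directly from the two definitions. As a rough pure-Python model (not PyTorch code, with hypothetical helper names), 1 - x in uint8 is subtraction modulo 256, while a bitwise NOT flips all eight bits:

```python
def invert_v1_1(values):
    """Model of the 1.1 behavior: ~x computed as 1 - x in uint8 (modulo 256)."""
    return [(1 - v) % 256 for v in values]


def invert_v1_2(values):
    """Model of the 1.2 behavior: ~x as a bitwise NOT restricted to 8 bits."""
    return [~v & 0xFF for v in values]


values = list(range(8))  # stands in for torch.arange(8, dtype=torch.uint8)
old = invert_v1_1(values)
new = invert_v1_2(values)

print(old)  # [1, 0, 255, 254, 253, 252, 251, 250]
print(new)  # [255, 254, 253, 252, 251, 250, 249, 248]
```

    The two definitions agree only in the sense that matters for masks stored as 0/1 in the old scheme; for general integer data they differ, so code that relied on ~ as "1 - x" needs review.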

    torch.tensor(bool) and torch.as_tensor(bool) now infer torch.bool dtype instead of torch.uint8. (19097)

    Version 1.1:

    >>> torch.tensor([True, False])
    tensor([1, 0], dtype=torch.uint8)

    Version 1.2:

    >>> torch.tensor([True, False])
    tensor([True, False])

    nn.BatchNorm{1,2,3}D: gamma (weight) is now initialized to all 1s rather than randomly initialized from U(0, 1). (13774)

    Version 1.1:

    >>> torch.nn.BatchNorm2d(5).weight
    Parameter containing:
    tensor([0.1635, 0.7512, 0.4130, 0.6875, 0.5496], requires_grad=True)

    Version 1.2:

    >>> torch.nn.BatchNorm2d(5).weight
    Parameter containing:
    tensor([1., 1., 1., 1., 1.], requires_grad=True)

    A number of deprecated Linear Algebra operators have been removed (22841)

    | Removed | Use Instead |
    | --- | --- |
    | btrifact | lu |
    | btrifact_with_info | lu with get_infos=True |
    | btrisolve | lu_solve |
    | btriunpack | lu_unpack |
    | gesv | solve |
    | pstrf | cholesky |
    | potrf | cholesky |
    | potri | cholesky_inverse |
    | potrs | cholesky_solve |
    | trtrs | triangular_solve |

    Sparse Tensors: Changing the sparsity of a Tensor through .data is no longer supported. (17072)

    >>> x = torch.randn(2,3)
    >>> x.data = torch.sparse_coo_tensor((2, 3))
    RuntimeError: Attempted to call `variable.set_data(tensor)`,
    but `variable` and `tensor` have incompatible tensor type.

    Sparse Tensors: in-place shape modifications of Dense Tensor Constructor Arguments will no longer modify the Sparse Tensor itself (20614)

    Version 1.1:

    >>> i = torch.tensor([[0, 1]])
    >>> v = torch.ones(2)
    >>> s = torch.sparse_coo_tensor(i, v)
    >>> i.resize_(1, 1)
    >>> v.resize_(1)
    >>> s.coalesce().indices().shape
    torch.Size([1, 1])
    >>> s.coalesce().values().shape
    torch.Size([1])

    Notice indices() and values() reflect the resized tensor shapes.

    Version 1.2:

    >>> i = torch.tensor([[0, 1]])
    >>> v = torch.ones(2)
    >>> s = torch.sparse_coo_tensor(i, v)
    >>> i.resize_(1, 1)
    >>> v.resize_(1)
    >>> s.coalesce().indices().shape
    torch.Size([1, 2])
    >>> s.coalesce().values().shape
    torch.Size([2])

    Notice indices() and values() reflect the original tensor shapes.

    Sparse Tensors: Accumulating dense gradients into a sparse .grad will no longer retain Python object identity. (17072)

    Version 1.1:

    >>> m = torch.nn.Embedding(10, 3, sparse=True)
    >>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
    >>> assert m.weight.grad.layout == torch.sparse_coo
    >>> m_weight_grad_saved = m.weight.grad
    # accumulate dense gradient into sparse .grad, change sparsity
    >>> m.weight.sum().backward()
    >>> assert m.weight.grad.layout == torch.strided
    # m_weight_grad_saved still refers to the .grad of m's weight
    # even though the sparsity has changed
    >>> assert id(m_weight_grad_saved) == id(m.weight.grad)

    Version 1.2:

    >>> m = torch.nn.Embedding(10, 3, sparse=True)
    >>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
    >>> assert m.weight.grad.layout == torch.sparse_coo
    >>> m_weight_grad_saved = m.weight.grad
    # accumulate dense gradient into sparse .grad, change sparsity
    >>> m.weight.sum().backward()
    >>> assert m.weight.grad.layout == torch.strided
    # m_weight_grad_saved NO LONGER refers to the .grad of m's weight
    >>> assert id(m_weight_grad_saved) == id(m.weight.grad)
    AssertionError

    nn.utils.convert_sync_batchnorm has been replaced with nn.SyncBatchNorm.convert_sync_batchnorm (18787)

    Example of new usage:

    >>> # Network with nn.BatchNorm layer
    >>> module = torch.nn.Sequential(
    ...     torch.nn.Linear(20, 100),
    ...     torch.nn.BatchNorm1d(100)
    ... ).cuda()
    >>> # creating process group (optional); process_ids is assumed to be a list of rank ids
    >>> process_group = torch.distributed.new_group(process_ids)
    >>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group)

    Error Checking: torch.addcmul and torch.lerp operators enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow the output tensor to be resized if it is also used as one of the inputs.

    Version 1.1:

    >>> x = torch.zeros(1)
    >>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
    tensor([[0., 0., 0.],
            [0., 0., 0.]])

    Version 1.2:

    >>> x = torch.zeros(1)
    >>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
    RuntimeError: output with shape [1] doesn't match the broadcast shape [2, 3]

    If you run into this error, please ensure the out parameter is of the correct output shape (post-broadcasting).
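    To work out what "the correct output shape (post-broadcasting)" is for a given call, the usual right-aligned broadcasting rule can be applied to the input shapes. The following is a pure-Python sketch of that rule and of a shape check in the spirit of the error above; broadcast_shape and check_out_shape are hypothetical helpers, not PyTorch functions, and the sketch compares shapes unconditionally rather than only when out aliases an input.

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    """Right-align the shapes and combine them dimension by dimension."""
    result = []
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} are not broadcastable")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


def check_out_shape(out_shape, *input_shapes):
    """Raise if out_shape differs from the broadcast of the input shapes."""
    expected = broadcast_shape(*input_shapes)
    if tuple(out_shape) != expected:
        raise RuntimeError(
            f"output with shape {list(out_shape)} doesn't match "
            f"the broadcast shape {list(expected)}"
        )
    return expected


# Shapes from the addcmul example: x is [1], the third operand is [2, 3].
expected = broadcast_shape((1,), (1,), (2, 3))
try:
    check_out_shape((1,), (1,), (1,), (2, 3))
    message = None
except RuntimeError as exc:
    message = str(exc)

print(expected)
print(message)
```

    Allocating the out tensor with the shape returned by such a helper avoids the error.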

    Error Checking: Improved Variable version tracking (20391, 22821, 21865)

    PyTorch's autograd system uses a version tracking mechanism to ensure that Tensors that are saved for backwards computations retain their correct values when the backward pass is computed (i.e. that they haven't been updated in-place since they were saved). See In Place Correctness Checks in the docs for more information.

    In PyTorch 1.2 we have enhanced the version tracking in a number of cases, which may flag issues that were not caught previously. There is now additional tracking through the Variable() constructor, the nn.Parameter() constructor, after setting .data, and via nn.Module._apply (internal API).

    Track changes through Variable constructor:

    >>> x = torch.ones(1, requires_grad=True)+1
    >>> y = x*x
    # do an in-place update through Variable constructor
    >>> torch.autograd.Variable(x).add_(1)
    >>> y.backward()
    RuntimeError: one of the variables needed for gradient computation has been modified
    by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0

    Track changes on an nn.Parameter:

    >>> x = torch.ones(1)
    >>> p = torch.nn.Parameter(x)
    >>> y = p * p
    # do an in-place update on a saved Parameter
    >>> x.add_(1)
    >>> y.sum().backward()
    RuntimeError: one of the variables needed for gradient computation has been modified
    by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0

    Track changes after setting .data:

    >>> x = torch.zeros(1, requires_grad=True)+1
    >>> y = x * x
    >>> x.data = torch.zeros(1, requires_grad=True)+1
    >>> x.add_(1)
    >>> y.backward()
    RuntimeError: one of the variables needed for gradient computation has been modified
    by an inplace operation: [torch.FloatTensor [1]], which is output 0 of AddBackward0,
    is at version 1; expected version 0 instead.
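    The idea behind these errors can be pictured with a toy model: every in-place update bumps a counter, and anything saved for the backward pass remembers the counter value it saw. The classes below are hypothetical stand-ins written for illustration; they are not how autograd is implemented.

```python
class ToyTensor:
    """Minimal stand-in for a tensor that counts in-place updates."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def add_(self, amount):
        # In-place update: mutate the value and bump the version counter.
        self.value += amount
        self.version += 1
        return self


class SavedForBackward:
    """Remembers a tensor together with the version it had when saved."""

    def __init__(self, tensor):
        self.tensor = tensor
        self.expected_version = tensor.version

    def unpack(self):
        # Refuse to hand back a value that changed since it was saved.
        if self.tensor.version != self.expected_version:
            raise RuntimeError(
                f"saved value is at version {self.tensor.version}; "
                f"expected version {self.expected_version}"
            )
        return self.tensor.value


x = ToyTensor(2.0)
saved = SavedForBackward(x)        # y = x * x would need x for its gradient
grad_before = 2 * saved.unpack()   # d(x*x)/dx while x is untouched

x.add_(1)                          # in-place update after saving
try:
    saved.unpack()
    error = None
except RuntimeError as exc:
    error = str(exc)

print(grad_before)
print(error)
```

    The 1.2 changes described above amount to making sure the counter is shared in more situations (for example when a tensor is wrapped in nn.Parameter), so that updates through one handle are visible to values saved through another.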

    [JIT] Python called from scripted modules must be @ignored

    torch.jit.script now recursively compiles everything it finds in the original function, so if you had Python functions called from your scripted function or module, you must now explicitly @ignore them. See the new API guide for more details.

    Version 1.1

    def my_unscriptable_python_fn():
        # weird stuff
        pass

    @torch.jit.script
    def fn():
        # This gets inserted as a Python call, and only errors on `save()`.
        my_unscriptable_python_fn()

    Version 1.2

    @torch.jit.ignore  # this needs to be added ...
    def my_unscriptable_python_fn():
        ...

    @torch.jit.script
    def fn():
        # ... or else recursive compilation will attempt to compile this call
        my_unscriptable_python_fn()

    NOTE: This is also a change to the behavior of the @torch.jit.ignore decorator. In version 1.1, @ignore tells the compiler to omit compiling a function entirely, to mark Python functions that you know will not be called after export. In version 1.2, @ignore tells the compiler to insert a call back to the Python interpreter instead of trying to compile the function.

    To get the old behavior, use @torch.jit.ignore(drop_on_export=True) (@torch.jit.ignore with no arguments is equivalent to @torch.jit.ignore(drop_on_export=False)).

    [JIT] optimize for ScriptModules is now a context manager

    Whether optimization passes are run is now a thread-local flag. This better reflects how optimization actually happens in the JIT (i.e. it is decided at runtime, not compilation time).

    Version 1.1

    @torch.jit.script(optimize=False)
    def fn(inputs):
        ...

    Version 1.2

    @torch.jit.script
    def fn(inputs):
        ...

    with torch.jit.optimized_execution(False):
        fn(inputs)

    [JIT] script::Module is now a reference type

    To better align with the PyTorch C++ API philosophy, script::Module and script::Method are now reference types. Our APIs have been updated to use script::Module instead of std::shared_ptr<script::Module>.

    Version 1.1

    using torch::jit::script::Module;
    std::shared_ptr<Module> m = torch::jit::load("");

    Version 1.2

    using torch::jit::script::Module;
    Module m = torch::jit::load("");

    [C++ only] mean() / sum() / prod() APIs have changed slightly (21088)

    Version 1.1 API:

    Tensor sum(IntArrayRef dim, bool keepdim=false) const;
    Tensor sum(IntArrayRef dim, ScalarType dtype) const;

    Version 1.2 API:

    Tensor sum(IntArrayRef dim, bool keepdim=false,
               c10::optional<ScalarType> dtype=c10::nullopt) const;

    That is, to override dtype, keepdim must now be provided.

    Binary distribution and nightly changes

    We have streamlined our conda and wheel binary distributions, so that it is easier than ever to install the version of PyTorch appropriate for your needs. The install instructions have been updated, but if you have tooling to download and install PyTorch, here is a detailed description of the changes we made:

    Wheels now have local version identifiers. Wheels that are for non-default CUDA configurations (the default CUDA version for this release is 10.0) now have local version identifiers like +cpu and +cu92. This means that, when installing, it is no longer necessary to specify a full wheel URL; just specify an appropriate version constraint like torch==1.2.0+cu92.

    Version 1.1 (for Python 3.7 on Linux only):

    pip install numpy
    pip install

    Version 1.2 (works for all versions of Python, and both Linux and Mac):

    pip install torch==1.2.0+cpu -f
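    For tooling that has to build or inspect such version constraints, the local version identifier is simply the part after the "+" sign. A minimal sketch, assuming plain string handling is enough for your tooling; both helper names are hypothetical and nothing here calls pip:

```python
def split_local_version(version):
    """Split '1.2.0+cu92' into ('1.2.0', 'cu92'); local part is None if absent."""
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def requirement_for(public_version, build=None):
    """Build a pip requirement string.

    build is a local version label such as 'cpu' or 'cu92'; None selects
    the wheel for the default CUDA configuration.
    """
    suffix = f"+{build}" if build else ""
    return f"torch=={public_version}{suffix}"


print(split_local_version("1.2.0+cu92"))  # ('1.2.0', 'cu92')
print(split_local_version("1.2.0"))       # ('1.2.0', None)
print(requirement_for("1.2.0", "cpu"))    # torch==1.2.0+cpu
```

    The resulting string can be passed to pip in place of a full wheel URL, together with the appropriate -f index location from the install instructions.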

    CPU-only binaries on conda can be selected with the cpuonly feature. We've eliminated the pytorch-cpu conda package; instead, the CPU-only conda package can be enabled by installing the cpuonly metapackage. Similarly, there is no longer both a torchvision and a torchvision-cpu package; the feature will ensure that the CPU version of torchvision is selected.

    Version 1.1:

    conda install -c pytorch pytorch-cpu

    Version 1.2:

    conda install -c pytorch pytorch cpuonly

    Conda nightlies now live in the pytorch-nightly channel and no longer have "-nightly" in their name. We have added a new dedicated channel for nightlies called pytorch-nightly; all nightlies (pytorch, torchvision, torchaudio, etc.) will now be uploaded to this channel, but with the same name as their corresponding stable versions (unlike before, when we had separate pytorch-nightly, torchvision-nightly, etc. packages). This makes it more difficult to accidentally install a copy of the nightly and stable at the same time.

    Version 1.1:

    conda install -c pytorch pytorch-nightly

    Version 1.2:

    conda install -c pytorch-nightly pytorch

    Wheel nightlies no longer have -nightly in their name. Similar to the changes we made in Conda, we no longer suffix wheel nightlies with "-nightly", to make it harder to accidentally install a copy of nightly and stable at the same time.

    Version 1.1:

    pip install --pre torch_nightly -f

    Version 1.2:

    pip install --pre torch -f

    New Features

    Tensor Type Support

    • torch.bool: added support for many operators (masking, comparison, arithmetic operators) to achieve feature parity with torch.uint8. See the Breaking Changes section for details about how this could affect existing programs. (21032, etc.)
    • torch.sparse.HalfTensor: Added support for torch.float16 sparse Tensors on both CPU and CUDA. (19695)
    • torch.bfloat16: Added basic creation and serialization support for Brain Floating Point Tensors. (21522, 21523, 21860, 22852)

    NN Package

    • nn.Transformer: added implementation of Transformer from Attention is All You Need. (20170, 22588)
    • nn.Embedding: support float16 embeddings on CUDA. (19695)
    • nn.Flatten: added a Module that performs torch.flatten. (22245)
    • nn.functional.gelu: Added support for Gaussian Error Linear Units. (20665, 21237)
    • nn.Module hooks: add ability to replace input/output via forward_pre_hook and forward_hook. (22285)
    • nn.Module: add requires_grad_() method for turning on/off requires_grad for Module parameters. (22576)

    Operators

    • Tensor.to_sparse: now supports autograd. (20458)
    • Tensor.fill_diagonal_: operator to fill the main diagonal of a Tensor. (21892)
    • torch.qr: supports autograd. (21274)
    • torch.bitwise_not: add operator for boolean/integer types. The Python ~ operator now uses this as well. (22283, 22320)
    • torch.trapz: integrate using the trapezoid rule; equivalent to numpy.trapz. (21610)
    • torch.var_mean / torch.std_mean: compute variance and mean at the same time. (18731)
    • torch.utils.ThroughputBenchmark: benchmark utility for measuring the throughput of PyTorch operators. (20766)
    • Logging: lightweight at-most-once logging to record operators that are used (c10::Logging). (20745)
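    For readers unfamiliar with the trapezoid rule mentioned for torch.trapz, here is a small pure-Python sketch of the computation on plain lists. It mirrors the usual definition (sum of trapezoid areas between neighbouring samples) and is not the PyTorch implementation; the function name and signature are chosen for illustration.

```python
def trapz(y, x=None, dx=1.0):
    """Approximate the integral of sampled values y with the trapezoid rule.

    If x is given, it holds the sample positions; otherwise samples are
    assumed to be spaced dx apart.
    """
    if x is None:
        return sum((y[i] + y[i + 1]) * dx / 2.0 for i in range(len(y) - 1))
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(y) - 1)
    )


unit_spacing = trapz([1.0, 2.0, 3.0])                   # 1.5 + 2.5
custom_spacing = trapz([1.0, 2.0, 3.0], x=[0.0, 2.0, 4.0])  # 3.0 + 5.0

print(unit_spacing)    # 4.0
print(custom_spacing)  # 8.0
```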

    Optim Package

    Distributed Package

    • DistributedDataParallel: support CPU modules. (20236)
    • DistributedDataParallel: support sparse tensors. (19146)
    • DistributedDataParallel: support local gradient accumulation. (21736)

    DataLoader

    • IterableDataset: introduces a new type of Dataset designed for data read from a stream. (19228)

    TensorBoard Package

    • TensorBoard support in PyTorch has improved and is no longer experimental!
    • SummaryWriter.flush: now supported. (20607)
    • SummaryWriter.add_mesh: add support for 3D point clouds. (20413)
    JIT Features

    • Improved support for iterator infrastructure. TorchScript now supports looping through a List, Tuple, Dict, Tensor, or String, and you can also use zip(), enumerate(), and for...in. (21801, 22006, 21990, 21985)
    • Support for in membership checks. (21527)
    • Improved support for strings and the string libraries. (20826, 20188, 20761, 21656, 20617)
    • Improved math support. (20979, 19707, 21151, 21131, 21129, 21130, 21512, 21126, 21127, 21128)
    • Support for various other Python builtin functions. (21451)
    • Support for NamedTuple. (21428)
    • All the rest of the dict methods. (21979)
    • sorted() keyword for lists and dicts. (23274)
    • Add support for breaks and continues. (21692)
    • Improved custom operator API with several bugfixes and new features. It now allows more primitive types, supports torch::List, torch::Dict and torch::Optional, and supports dispatch (i.e. registering a different function for CPU and CUDA for the same operator).
    • Support nn.GRU in script. (23266)
    • Support pack_padded_sequence and pad_packed_sequence. (23249)
    • Support torch._C._get_tracing_state in TorchScript. (23248)
    • Support torch.as_tensor in TorchScript. (23247)
    • Add support for recursive compilation on Modules. (20708)
    • Add the all() builtin. (20521)
    • Add Final[T] annotated members to __constants__. (21603)
    • Add save() to scripted Functions. (20386)
    • Support for serializing class attributes. (22953)
    • Support for class annotations. (21379)
    • Support the Python 3.8 Constant node. (22007)
    • Support for type annotations instead of torch.jit.annotate(). (21390)
    • Support operator overloading for user-defined classes. (20033)
    • Support recursive ModuleList / Sequential. (21306)
    • Trace multiple methods in a single Module. (19905)

    Improvements

    • Tensor.pin_memory(): only ask for context on current device. (22229)
    • Tensor.view(): suggest using reshape() instead of contiguous() when the input is non-contiguous. (20968)
    • Tensor.numpy(): throw TypeError instead of ValueError if the type isn't supported. (21608)
    • torch.norm: add support for p="nuc" with dim specified. (21022)
    • torch.qr: support batching of input matrices. (20689)
    • torch.qr: support some parameter akin to NumPy's mode option. (20689)
    • torch.det / torch.logdet / torch.slogdet: added batching support. (22909)
    • torch.cdist: support batching. (20934)
    • torch.symeig: support batching. (21858)
    • torch._dirichlet_grad: support CUDA. (21191)
    • torch.randperm: support torch.float16. (22102)
    • torch.Size is now pickle-able in Python2. (20952)
    • torch.tensor / torch.as_tensor: infer device if input supports Numba's __cuda_array_interface__. (20584)
    • torch.isinf / torch.isfinite: throw TypeError instead of ValueError when a non-tensor is passed in. (20817)
    • nn.MultiheadAttention: add functional support. (20415)
    • nn.MultiheadAttention: added support for key/value to have different number of features. (21288)
    • nn.MultiheadAttention: allow static key/values. (21288)
    • nn.Conv{1,2,3}D: support torch.int64 dtype in forward. (20730, 22594)
    • nn.AvgPool{1,2,3}D: support torch.int64 dtype in forward. (22433)
    • nn.Module: make _save_to_state_dict overrideable. (21933)
    • autograd: Checkpointing of modules inside large fanout networks no longer hits a recursion error. (22397)
    • autograd: Track in-place changes of Tensors through Module._apply (internal API). (21865)
    • autograd.profiler: Add shape aggregation support. (20035)
    • autograd.profiler: Profile custom c10 ops. (20175)
    • DataLoader: support setting batch_size=0 to disable automatic batching (collation) in DataLoader for easier bulk loading. (19228)
    • DataLoader: add multiprocessing_context parameter. (22990)
    • DataLoader: added error detection for worker_init_fn. (20150)
    • DataLoader: Retry on EINTR. (21723)
    • torch.cuda.set_rng_state / torch.cuda.get_rng_state: accept string as device parameter. (23448)
    • CUDA: add warning when using Turing GPUs and CUDA <= 9000. (21468)
    • CUDA: warn on conditions that can trigger a cuBLAS 9.0 bug. (22034)
    • CPU: Improve CPUAllocator OOM message. (20618)
    • [memory_format]: added support for torch.empty, torch.empty_like, Tensor.contiguous(), Tensor.is_contiguous() to specify / check the order in which dimensions are laid out in memory. (20455, 20558)
    • distributions.MultivariateNormal: fix precision matrix instability. (21366)
    • distributions.transforms.SigmoidTransform: fix numerical instability. (19802)

    Distributed Improvements

    • DistributedDataParallel: Support DDP forward/backward calls even if no module parameter is used. (19821)
    • DistributedDataParallel: Only call into reducer if grad is enabled. (19897)
    • DistributedDataParallel: Require finalizing the DDP backward pass only when gradients were actually computed; this allows an application to completely discard DDP outputs and move on to the next iteration. (19901)
    • DistributedDataParallel: Improve DDP backward reduction error messages. (20586)
    • DistributedDataParallel: make DDP failure recoverable. (21591)
    • DistributedDataParallel: Delay reduction of unused parameters until first autograd hook is called. (22219)
    • c10d: support tensors shared across processes. (21449)
    • c10d: ProcessGroupMPI Add device guard around MPI operations. (22446)
    • Make shuffling optional. (22479)

    TensorBoard Improvements

    • Usage of kwarg-only arguments has been removed. (21786)

    NumPy Compatibility Improvements

    • Tensor.T: added NumPy-like support for reversing dimensions. (20598)
    • Tensor.ndim: NumPy equivalent property for the number of dimensions. (20565)
    • Tensor.nonzero: added as_tuple argument (default False) that, when True, returns a tuple of Tensors, matching the behavior of numpy.nonzero. (20293)
    • torch.dtype: support passing in NumPy dtypes as arguments. (21215)
    • torch.normal: add size parameter when called with two floats. (20545)
    • torch.where: add one-argument overload that is an alias for NumPy-like nonzero. (21986)
    • Support a number of argument name overrides, e.g. axis instead of dim. (20451)
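    The difference between the two nonzero return styles mentioned above is easiest to see on a small example. This pure-Python sketch works on nested lists rather than Tensors, and both helper names are hypothetical: the first returns one [row, column] pair per nonzero element, the second returns one index list per dimension in the style of numpy.nonzero.

```python
def nonzero_rows(matrix):
    """One [row, col] entry per nonzero element (the default layout)."""
    return [
        [i, j]
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    ]


def nonzero_as_tuple(matrix):
    """One index list per dimension (the as_tuple=True layout)."""
    rows = nonzero_rows(matrix)
    if not rows:
        return ([], [])
    return tuple(list(column) for column in zip(*rows))


matrix = [[0, 1], [2, 0], [0, 3]]
pairs = nonzero_rows(matrix)
per_dim = nonzero_as_tuple(matrix)

print(pairs)    # [[0, 1], [1, 0], [2, 1]]
print(per_dim)  # ([0, 1, 2], [1, 0, 1])
```

    The per-dimension form is convenient because it can be used directly for indexing, one index sequence per axis.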

    JIT Improvements

    • The original source code debug information is now saved with the model. If a model is saved and then loaded into another process, the loaded process can now print out error messages that point to the original source code. (22177, 22178, 22179, 22180)
    • Error message source range highlighting now includes filename, line number, and column number. (21157)
    • Better Constant Propagation through Tuples. (22561)
    • Add start and step parameters for range in TorchScript. (20795)
    • Support for threading options for TorchScript inference (doc)
    • Add max_pool2d to symbolic derivatives. (19661)
    • Optimize matmul memory usage for certain cases. (23433)
    • Avoid kernel launches for zero-sized tensor inputs. (22790)
    • Add support for steps (strides) in tensor slices. (20929)
    • Added error for classes that don't have an __init__ function. (21880)
    • Allow classes to be used in their own methods. (20106)
    • Better error message when a variable is conditionally defined. (20911)
    • Consider contained types in alias analysis. (21431)
    • Convenience APIs for script objects. (20226)
    • Don't print backtrace for interpreter errors. (20925)
    • Improve error msg for missing attribute. (20779)
    • Improve error msg on inferred type. (21058)
    • Improve error msg on recursive class defs. (21842)
    • Include module names in recursive error stacks. (22921)
    • Improve recursive scripting error message. (21841)
    • Index into a tuple with non constant integer. (20081)
    • Let ScriptModule buffer attributes also be cast by device/type changes. (19700)
    • Lower batchmm to non-diff optimization. (19987)
    • Make an attribute instead of a parameter. (21078)
    • Make strtod_c compatible with different gcc abi. (21293)
    • Make magic methods work with casts too. (20654)
    • Improve performance of alias analysis. (20899)
    • Print a warning if a type annotation prefix is invalid according to mypy. (20884)
    • schema_matching.cpp: improve error messages. (21141)
    • Resolve with closed over variables instead of stack frame. (22270)
    • Report errors through call stack. (22280)
    • Reduce number of stack manipulation instructions in interpreter. (21240)

    C++ API Improvements

    • nn::PoissonNLLLoss: Added support. (19316)
    • nn::Module: added replace_module API to overwrite submodules in C++ Frontend. (22546)
    • nn::Module::register_module / register_parameter / register_buffer: make public. (23196)
    • data::datasets::ChunkDataReader: fix include headers and a vector issue. (19485)
    • data::datasets::ChunkDataset: add new get_batch method. (21797)
    • data::datasets::ChunkDataset: add checkpoint support. (21889)
    • data::datasets::ChunkDataset: add support for cross-chunk shuffling. (22347)
    • data::datasets::ChunkDataset: add sorting policy. (23053)

    MKLDNN Tensor Improvements

    Add support for a number of operators on MKLDNN Tensors including:

    • Tensor.is_mkldnn: (22386)
    • Tensor.transpose(): (21943)
    • Tensor.zero_(): (20573)
    • torch.empty: (21184)
    • torch.mul: (20575)
    • nn.AdaptiveAvgPool{1,2,3}D: (19818)
    • nn.Sigmoid: (20820)
    • nn.Softmax: (21516)
    • nn.Module: support saving/loading MKLDNN modules. (20799)
    • nn.MaxPool{1,2,3}D: support ceil_mode. (21310)

    πŸ› Bug Fixes

    • Indexing: fix advanced indexing where there are more than (231)-1 bytes in the output. (20919)
    • Indexing: fix indexing when there are more than 65535 elements in a non-indexing first dimension on CUDA. (23123)
    • Indexing: fix issue with slicing empty tensors. (20914)
    • Tensor.index_copy_: fix segfault by properly checking dimension is in range. (21617)
    • Tensor.copy_: Fix a bug where non-blocking was not being respected. (20305)
    • πŸ‘― Tensor.clone: Fix an issue with MKLDNN tensors. (20943)
    • Tensor subclassing: give a proper error instead of crashing. (20283)
    • Fix segfault with tensors that can't be indexed with 32-bit ints. (21530)
    • πŸ”Š torch.range / torch.linspace / torch.logspace: properly respect the current Stream. (21619)
    • return the identity permutation instead of zeros when not using pivoting. (22242)
    • torch.einsum: Fix an issue where the backward pass would potentially be skipped. (22111)
    • torch.cosh: Fix an issue where torch.cos was instead calculated with torch.double dtype and vectorized instructions. (20797)
    • torch.triu / torch.tril: handle strides correctly for in-place versions. (22730).
    • torch.triu / torch.tril: Fix handling of batches > 65535 on CUDA. (21067)
    • torch.inverse / torch.solve / torch.cholesky_solve / torch.triangular_solve: Fix batch sizes > 65535 on CUDA. (21689)
    • torch.histc: return dtype is now the same as the input tensor on CUDA, matching CPU behavior. (20369)
    • torch.histc: properly return 1-dim tensor on CPU with 0-dim input and 1 bin. (21497)
    • torch.randperm: handle non-contiguous out parameter. (23043)
    • torch.unique: Fix empty tensor handling when dim is passed as an argument. (19000)
    • torch.min / torch.max: properly error on empty tensor inputs, as with CPU tensors. (19612).
    • CUDA: fix launch parameters for reductions. (22827).
    • torch.hub: fix an issue with find_module. (20782)
    • autograd: Fix a number of custom autograd Function corner cases by inverting the relationship between PyFunction and THPFunction. (22983)
    • autograd: give a "Trying to backward through the graph a second time" error instead of an internal assert when the buffers are a list of Tensors (with indexing). (21533)
    • ⏱ optim.lr_scheduler.CosineAnnealingLR: rename from CosineAnnealingLr. (23242)
    • distributions.Binomial: Fix overflow of log_prob when logits is large. (20679)
    • distributions.SigmoidTransform: Fix numerical issues that could result in inf / -inf return values. (20288)
    • distributions.Categorical.sample: fix a view bug. (23328)
    • CUDA: Give proper error message for bad cuda forks. (23322)
    • pickle: Fix Unpickling error when loading multiple objects from a file. (20270)
    • NCCL: Fix race condition. (23040)
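
    The distributions.Binomial entry above concerns overflow in log_prob for large logits. As a rough illustration in plain Python (no PyTorch needed), a logits-parameterized binomial log-probability contains a log(1 + exp(logits)) term, and evaluating it naively overflows. The sketch below shows one standard stable rewrite; the function names are hypothetical and this is not taken from the actual patch.

```python
import math


def naive_softplus(x: float) -> float:
    """log(1 + exp(x)) evaluated directly; exp(x) overflows for large x."""
    return math.log(1.0 + math.exp(x))


def stable_softplus(x: float) -> float:
    """Same quantity, rewritten so that exp() only sees non-positive inputs."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def naive_overflows(x: float) -> bool:
    """Report whether the naive formula raises OverflowError for this input."""
    try:
        naive_softplus(x)
    except OverflowError:
        return True
    return False
```

    For moderate inputs the two functions agree; for inputs such as 1000.0 only the stable form returns a finite value.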

    πŸ›  torch.nn Bug Fixes

    • nn.Conv{1,2,3}D: fix memory leak on MKLDNN code path. (22392)
    • nn.Conv{1,2,3}D: properly unpickle older pickled versions. (21687)
    • nn.CTCLoss: fix backward on CUDA when 2d target tensor is larger than max_target_length. (20971)
    • nn.CTCLoss: fix some numerical stability issues. (21392)
    • nn.CTCLoss: disable buggy non-deterministic CudNN algorithm. (22977)
    • πŸ›  nn.CTCLoss: fixed empty target handling. (21910, 23298)
    • πŸ”€ nn.SyncBatchNorm: fix syncing of running statistics when count size differs between GPUs. (22248)
    • πŸ”€ nn.SyncBatchNorm: retain requires_grad value when converting from nn.BatchNorm. (22569)
    • nn.SyncBatchNorm: correctly handle process_group in convert_sync_batchnorm. (19240)
    • nn.MultiheadAttention: fix for torch.float16 dtype. (21658)
    • nn.EmbeddingBag: fix NaN output when input is empty. (21400)
    • nn.Dropout: fix python crash (with SIGFPE) when called on an empty cuda tensor. (20541)
    • nn.MaxPool: fix output size calculation in some corner cases. (22304)
    • nn.MaxPool: return valid indices if all entries are -inf. (23161)
    • nn.Softmax: respect the current Stream. (22470)
    • πŸ”Š nn.LogSoftmax: fix numerical stability issues. (21672)
    • nn.Module.load_state_dict: break ref cycle. (20397)
    • nn.Module: fix loading in 32-bit environments. (20900)
    • nn.utils.rnn.pack_padded_sequence: Fix segfault on empty tensors. (21461)
    • nn.utils.spectral_norm: fix loading state_dict when strict=False. (22545)
    • 🏁 CudNN: Fix uninitialized PoolWindow on Windows. (22405)
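
    Numerical stability in a log-softmax usually comes down to how the log-sum-exp is evaluated. The sketch below is plain Python and only illustrates the general max-shift technique; it is not the change made for nn.LogSoftmax, and the function name is an assumption for illustration.

```python
import math


def log_softmax(xs):
    """Log-softmax of a list of floats using the max-shift (log-sum-exp) trick."""
    m = max(xs)
    # After subtracting the maximum, every exponent is <= 0, so exp() cannot overflow.
    log_sum = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum for x in xs]


# Inputs this large would overflow a direct exp(x) / sum(exp(x)) evaluation.
large_inputs = [1000.0, 1000.0]
result = log_softmax(large_inputs)
```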

    πŸ›  Distributed Bug fixes

    • nn.parallel.DataParallel: fix error in no_grad mode. (21262)
    • torch.distributed.all_gather: fix errors for views and aliases. (21490)
    • c10d: fix collective communication errors on empty tensors. (20658)

    πŸ›  JIT Bug Fixes

    • πŸ›  Fix specialized list from dict keys. (23267)
    • Switch keys to be sequential and stable in pickle serialization. (23280)
    • deepCopy also copies type information of lists. (23271)
    • dictKeys and dictItems ops on typed dicts return typed lists. (23270)
    • πŸ›  Fix pickler bug where it would not load if no tensors were saved. (23263)
    • Avoid multiple writes to files on export. (21186)
    • πŸ‘ Better error msg for mismatched dict key type. (22231)
    • Better error msg for using Python builtin_function_or_method. (22935)
    • Better error msg in __get_state__ to let a user know that ScriptModules can't be deep-copied at the moment. (20885)
    • Better error msg when seeing an unsupported builtin function. (21068)
    • dropout derivative should respect the train flag. (20760)
    • Fix __constants__ for some nn modules. (21071)
    • Fix ScriptModule.__dir__(). (22426)
    • πŸ›  Fix 3x DenseNet compile time regression by restoring earlier-out tests in AliasDB::writesToAlias. (21425)
    • πŸ›  Fix a bug in loop unrolling. (21239)
    • πŸ›  Fix alias annotations for dict ops. (22900)
    • πŸ›  Fix inaccurate SourceRange reporting. (21109)
    • πŸ›  Fix broken indexing when using None and ellipses indexing together. (22905)
    • πŸ›  Fix bug in CompilationUnit::define. (21886)
    • πŸ›  Fix compilation order for class methods. (20094)
    • πŸ›  Fix dead code elimination over loops. (22632)
    • πŸ›  Fix dead code elimination in onnx export. (22476)
    • πŸ›  Fix incorrect default on Graph::toString. (21370)
    • πŸ›  Fix optional type promotion for classes. (21593)
    • πŸ›  Fix optional type unification. (19813)
    • πŸ›  Fix NameError with PYTORCH_JIT=0. (20120)
    • πŸ›  Fix overspecializing constants in compilation. (22816)
    • πŸ›  Fix pow() bug on overloads. (20824)
    • Fix recursive method compilation. (21862)
    • πŸ›  Fix reflection on weak modules, copy attributes. (20190)
    • πŸ›  Fix slow unpickling. (21542)
    • πŸ›  Fix input/output type mismatch. (20829)
    • Fix insert_guard for norm decomposition. (19646)
    • πŸ›  Fix Trace inlining of graphs with optional inputs. (22686)
    • πŸ›  Fix tracing bugs where using 1 - x in C++ would cause the size of 1 to get hardcoded. (20932)
    • πŸ›  Fix tuple indexing bug. (21521)
    • πŸ›  Fix type hints for None constants. (23029)
    • Fix weak module cuda() _flat_weights bug. (21107)
    • πŸ›  Fix WeakIValueEq. (21891)
    • πŸ›  Fixed gcd to use 64 bit integers. (21041)
    • πŸ›  Fixed list() not making a copy. (22093)
    • πŸ›  Fix race condition on Module::forward method. (21398)
    • Made a += b for lists do an in place add. (21896)
    • Made floor/ceil return ints. (21124)
    • Fix out-of-memory on GPU due to the "weak_script" decorators. (20588)
    • πŸ–¨ Override print when python is present. (21625)
    • Set __file__ for torch.ops. (21888)
    • Set correct list type in pybind_utils. (23188)
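
    Several of the fixes above bring TorchScript's list and math behavior in line with plain Python. The sketch below is ordinary Python (no TorchScript involved) showing the semantics those entries refer to: a += b mutates the list in place, list() returns a copy, and floor/ceil return ints. The helper name extend_in_place is only for illustration.

```python
import math


def extend_in_place(a, b):
    """Augmented assignment on a list mutates the existing object."""
    a += b
    return a


original = [1, 2]
alias = original
result = extend_in_place(original, [3])
# `alias`, `original`, and `result` all refer to the same list object.
same_object = result is alias

copied = list(original)  # list() builds a new list
copied.append(4)  # does not affect `original`

floor_value = math.floor(2.5)  # an int in Python 3
ceil_value = math.ceil(2.5)
```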

    πŸ›  C++ Frontend bug fixes

    • nn::RNN: Fix assertions in bidirectional RNN. (22850).
    • nn::MaxPool / nn::AvgPool: expand incomplete kernel size, as in Python. (22073, 22075)
    • Optim: Fix memory leak when weight_decay is applied to Adam, Adagrad, RMSProp. (23125)
    • Optim::SGD: fix memory leak with weight_decay. (23007)
    • torch::autograd::Scatter / torch::autograd::Gather: Fix nullptr bug. (20286)
    • torch::nn::parallel::data_parallel: fix gradient computation error. (20910)
    • πŸ— [C++ Extensions] Fix an issue when building multiple extensions in the same directory. (20221)

    πŸ—„ Deprecations

    πŸ—„ Masking via torch.uint8 Tensors is now deprecated in favor of masking via torch.bool Tensors.

    πŸ’₯ See the Breaking Changes section for more details about torch.bool Tensors and comparison operators.

    torch.masked_select, torch.masked_fill, torch.masked_scatter now expect torch.bool masks rather than torch.uint8.

    >>> a = torch.tensor([1, 2, 3])
    >>> b = torch.tensor([3, 1, 2])
    >>> a.masked_select(torch.tensor([0, 1, 1], dtype=torch.uint8))
    UserWarning: masked_select received a mask with dtype torch.uint8,
    this behavior is now deprecated, please use a mask with dtype torch.bool instead.
    tensor([2, 3])
    # instead use torch.bool
    >>> a.masked_select(torch.tensor([False, True, True]))
    tensor([2, 3])

    Comparison operators with out= parameters now expect torch.bool dtype rather than torch.uint8.

    >>> a = torch.tensor([1, 2, 3])
    >>> b = torch.tensor([3, 1, 2])
    >>> res = torch.empty_like(a, dtype=torch.uint8)
    >>> torch.gt(a, b, out=res)
    UserWarning: received 'out' parameter with dtype torch.uint8, this behavior
    is now deprecated, please use 'out' parameter with dtype torch.bool instead.
    tensor([0, 1, 1], dtype=torch.uint8)
    # instead use torch.bool
    >>> res = torch.empty_like(a, dtype=torch.bool)
    >>> torch.gt(a, b, out=res)
    tensor([False, True, True])

    πŸ—„ Legacy autograd.Function (Function without static forward method) is now deprecated

    >>> from torch.autograd import Function
    >>> class MyLegacyFunction(Function):
    ...     def forward(self, x):
    ...         return x
    ...     def backward(self, grad_output):
    ...         return grad_output
    >>> MyLegacyFunction()(torch.randn((3,), requires_grad=True))
    UserWarning: Legacy autograd function with non-static forward method is deprecated
    and will be removed in 1.3. Please use new-style autograd function
    with static forward method.
    # instead use new-style Autograd Function
    >>> class MyFunction(Function):
    ...     @staticmethod
    ...     def forward(ctx, x):
    ...         return x
    ...     @staticmethod
    ...     def backward(ctx, grad_output):
    ...         return grad_output
    >>> MyFunction.apply(torch.randn((3,), requires_grad=True))

    πŸ“š See the torch.autograd.Function documentation for more details.

    πŸš€ torch.gels: has been renamed to torch.lstsq; torch.gels will work for this release but is now deprecated. (23460)

    🐎 Performance

    • 🐎 Advanced Indexing: significantly improve performance of advanced indexing backward. (20557)
    • 🐎 Tensor.copy_: increase broadcasting CUDA copy performance by 25%. (20685)
    • ⚑️ torch.matmul: Optimize the case A.ndim <= 2 && B.ndim >= 3, shows up to 15x speed up. (20448)
    • 🐎 torch.bmm: Improve performance by up to 3x for small cases on CPU by applying TensorAccessor. (20266)
    • 🐎 torch.inverse: Move workspace query and allocation outside loop to improve performance by up to 5x. (20904)
    • ⚑️ torch.topk: Optimize CPU perf using parallel and partial sort, up to 6x improvement. (22865)
    • torch.cdist: Improve CPU perf by up to 10x for some cases. (20605)
    • torch.normal: Move normal, normal_means, normal_stddevs, and normal_means_stddevs to ATen, increasing performance by up to 3x. (21287)
    • torch.bernoulli: Speedup bernoulli_scalar_cuda_kernel with grid-stride loop, increasing performance by up to 2x. (21300)
    • πŸ“œ torch.coalesce: Use _sparse_coo_tensor_unsafe in coalesce for up to 10x speedup. (21214)
    • torch.sinh / torch.cosh: Parallelize and vectorize on CPU. (21115)
    • torch.lerp: Vectorize on CPU. (22038)
    • torch.eye: Parallelize on CPU. (21077)
    • torch.randperm: Parallelize initialization in randperm on CPU. (21529)
    • Vectorization: Don't split 256-bit AVX2 load/store intrinsics. (20609).

    🐎 Torch.NN Performance Improvements

    • 🐎 nn.Softmax: Add persistent CUDA kernels that increase performance 2-10x on small inputs. (20827)
    • 🐎 nn.Embedding / nn.EmbeddingBag: Optimize CUDA kernel, increasing performance up to 2.7x. (22016)
    • ⚑️ nn.Linear: optimize BERT model perf by using mkldnn inner product. (21851)
    • nn.Conv{1,2,3}D: improve perf for depthwise convolutions in torch.float16 on Volta and Turing GPUs. (22302)
    • ⚑️ nn.RNN: optimize on CPU by fusing matmul ops. (22512)
    • nn.Upsample: a number of significant perf improvements on CUDA. (21879, 21694).
    • ⚑️ nn.functional.layer_norm: optimize a fast path for layer_norm, increasing perf by up to 4x on CPU. (20345, 20883)
    • πŸ‘‰ Use mkldnn inner product for nn.Linear() to improve BERT perf. (21851).

    πŸ“š Documentation

    • torch.bool: doc the Boolean tensor type. (21601)
    • πŸ“„ torch.as_strided: add docs. (22842)
    • πŸ“„ torch.empty_strided: add docs. (23740)
    • torch.lerp: clarify broadcasting requirements. (23268)
    • torch.enable_grad / torch.no_grad / torch.set_grad_enabled: clarify interaction between these features. (23310)
    • torch.autograd.grad_mode: Document that no_grad is thread local. (21755)
    • torch.multiprocessing: Explain refcounting of CUDA tensors. (19904)
    • ⚠ torch.Tensor: Add a warning about memory usage. (20801)
    • Document RNG state consumption. (22540)
    • ⏱ torch.optim.lr_scheduler.CyclicLR: Clarify base_momentum and max_momentum. (20880).
    • Document production environment features. (23010)
    • βž• Add note about contributing recently released research. (23513)
    • 🐎 Clarify performance implications of deterministic mode. (21337)
    • ⚑️ Update cuda pinned memory note to include (20977)

    πŸ“š Torch.NN Documentation

    • πŸ“„ nn.functional / nn.init: Break up NN in docs so they load faster. (21291)
    • 🚚 nn.functional.conv{1,2,3}d: Remove padding_mode. (20891)
    • nn.functional.upsample / nn.functional.interpolate: add note about overshooting with mode='bicubic'. (23321)
    • nn.init.zeros_ / nn.init.ones_: add documentation. (23145)
    • nn.MultiheadAttention: Add documentation for add_bias_kv, add_zero_attn, and attn_mask. (20071)
    • πŸ“š nn.MultiheadAttention: Fix documentation for attention mask shape. (20850)
    • nn.Softmax: Fixed to specify dimension to prevent warning in 1.1.0. (20310)

    πŸ“š Contributor Documentation

    • πŸ“š Updated web links on contribution_guide and governance documentation. (21243)
    • πŸ‘Œ Improve documentation for publishing hub models. (21307)
    • Suggest a faster linker in the contributing guide. (21334)
    • βž• Add CUDA C++11 and profiling notes to the contribution guide. (21386)

    πŸ“š Build Documentation

    • βž• Add magma for CUDA 10.1 to Windows docs. (19914)
    • πŸ‘Œ Improve build-from-source instructions. (20088)
    • βž• Add ninja to build instructions. (20079)
    • ⚑️ Update libtorch build docs. (21150)

    πŸ“š TensorBoard Documentation

    • πŸ“š Tensorboard Documentation has been greatly improved! Browse the latest version here.

    πŸ“š Torch HUB Documentation

    • πŸ‘Œ Improve docs for publishing hub models. (21307)
    • ⚑️ Update docs of entry point in hub. (21568)


    In PyTorch 1.2, we have added full support for ONNX Opsets 7, 8, 9 and 10 in the ONNX exporter, and enhanced the constant folding pass to support Opset 10. Export of ScriptModule is also better supported. Additionally, users can now register their own symbolic functions to export custom ops, and specify the dynamic dimensions of inputs during export.

    πŸ‘Œ Supporting More ONNX Opsets

    • Add basic support for multiple ONNX Opsets and support for Opset 10. (19294)
    • πŸ‘Œ Support ONNX Opset 7 and 8 in PyTorch ONNX Exporter. (22421, 20036)
    • Export Dropout for Opset 10. (20710)
    • Export Slice and Flip for Opset 10. (20533)
    • Export Interpolate (Resize) for Opset 10. (21434)

    πŸ‘ Enhancing the Support for ScriptModule

    • πŸ‘Œ Support multiple outputs in ScriptModule in ONNX Exporter. (20256)
    • πŸ‘Œ Support tensor factories in ScriptModule in ONNX Exporter. (20255)
    • πŸ‘Œ Support tuples as inputs and outputs in ScriptModule. (20784)

    Exporting More Torch Operators to ONNX

    • Export custom ops. (21321)
    • Export torch.arange. (22601)
    • Export torch.masked_fill. (22521)
    • Export torch.floor, torch.ceil, torch.log2 and prim::shape. (17895)
    • Export torch._dim_arange. (20078)
    • Export torch.randn_like. (20093)
    • Export torch._standard_gamma. (20126)
    • Export torch.topk. (21104)
    • Export __and__, __or__. (17894)
    • Export torch.sign. (20470)
    • Export torch.scatter. (18543)
    • Export torch.rand. (20559)
    • Export torch.gather. (21235)
    • Export torch.cosine_similarity. (21884)
    • Export torch.sum. (22240)
    • πŸ”Š Export torch.logsumexp. (22306)
    • Export torch.layer_norm. (22265)

    Extending Existing Exporting Logic

    • πŸ‘Œ Support torch.min and torch.max with dim. (19689)
    • πŸ‘Œ Support maxpool with dilations. (18721)
    • πŸ‘Œ Support RNN with batch_first=True. (19766)
    • πŸ‘Œ Support Upsample with dynamic input. (20116)
    • πŸ‘Œ Improve support for Loop export. (20445)
    • Enable torch.full with scalar parameters. (21931)
    • βž• Added support for exporting models with variable length input/output to ONNX. (20034)

    ⚑️ Optimizing Exported ONNX Graph

    • πŸ‘Œ Support constant folding in Opset 10. (22515)
    • πŸ‘Œ Support negative indexing for Slice in constant folding optimization. (21811)

    πŸ›  Bugfixes/Improvements

    • πŸ›  Fix the shape of PReLU weight. (21330)
    • πŸ›  Fix the export for torch.pixel_shuffle. (21486)
    • πŸ›  Fix the export for torch.full. (21669)
    • ⚑️ Update logic for folding onnx::Constant nodes. (20109)

Previous changes from v1.1.0

  • πŸ‘ Note: CUDA 8.0 is no longer supported


    TensorBoard (currently experimental)

    🌐 First-class and native support for visualization and model debugging with TensorBoard, a web application suite for inspecting and understanding training runs, tensors, and graphs. PyTorch now supports TensorBoard logging with a simple from torch.utils.tensorboard import SummaryWriter command. Histograms, embeddings, scalars, images, text, graphs, and more can be visualized across training runs. TensorBoard support is currently experimental. You can browse the docs here.

    [JIT] Attributes in ScriptModules

    Attributes can be assigned on a ScriptModule by wrapping them with torch.jit.Attribute and specifying the type. Attributes are similar to parameters or buffers, but can be of any type. They are serialized along with any parameters/buffers when the module is saved, so they are a great way to store arbitrary state in your model. See the docs for more info.


    import torch
    from typing import Dict, List

    class Foo(torch.jit.ScriptModule):
      def __init__(self, a_dict):
        super(Foo, self).__init__(False)
        # Attributes carry an explicit type so TorchScript can serialize them.
        self.words = torch.jit.Attribute([], List[str])
        self.some_dict = torch.jit.Attribute(a_dict, Dict[str, int])

      @torch.jit.script_method
      def forward(self, input: str) -> int:
        return self.some_dict[input]

    πŸ‘ [JIT] Dictionary and List Support in TorchScript

    πŸ‘ TorchScript now has robust support for list and dictionary types. They behave much like Python lists and dictionaries, supporting most built-in methods, as well as simple comprehensions and for…in constructs.

    [JIT] User-defined classes in TorchScript (experimental)

    πŸ‘€ For more complex stateful operations, TorchScript now supports annotating a class with @torch.jit.script. Classes used this way can be JIT-compiled and loaded in C++ like other TorchScript modules. See the docs for more info.

    @torch.jit.script
    class Pair:
        def __init__(self, first, second):
            self.first = first
            self.second = second
        def sum(self):
            return self.first + self.second

    DistributedDataParallel new functionality and tutorials

    nn.parallel.DistributedDataParallel: can now wrap multi-GPU modules, which enables use cases such as model parallel (tutorial) on one server and data parallel (tutorial) across servers.

    πŸ’₯ Breaking Changes

    • Tensor.set_: the device of a Tensor can no longer be changed via Tensor.set_. This would most commonly happen when setting up a Tensor with the default CUDA device and later swapping in a Storage on a different CUDA device. Instead, set up the Tensor on the correct device from the beginning. (18832).
    • ⏱ Pay attention to the order change of lr_scheduler.step(). (7889).
    • 0️⃣ torch.unique: changed the default value of sorted to True. (15379).
    • [JIT] Rename isTensor api -> isCompleteTensor. #18437
    • [JIT] Remove GraphExecutor's python bindings. #19141
    • [C++]: many methods on Type no longer exist; use the functional or Tensor method equivalent. (17991).
    • [C++]: the Backend constructor of TensorOptions no longer exists. (18137).
    • [C++, Distributed]: c10d ProcessGroup::getGroupRank has been removed. (19147).

    πŸ†• New Features


    • torch.tril_indices, torch.triu_indices: added operator with same behavior as NumPy. (14904, 15203).
    • torch.combinations, torch.cartesian_prod: added new itertools-like operators. (9393).
    • torch.repeat_interleave: new operator similar to numpy.repeat. (18395).
    • torch.from_file: new operator similar to Storage.from_file, but returning a tensor. (18688).
    • torch.unique_consecutive: new operator with semantics similar to std::unique in C++. (19060).
    • πŸ‘ torch.tril, torch.triu, torch.trtrs: now support batching. (15257, 18025).
    • πŸ“œ torch.gather: add support for sparse_grad option. (17182).
    • torch.std, torch.max_values, torch.min_values, torch.logsumexp can now operate over multiple dimensions at once. (14535, 15892, 16475).
    • torch.cdist: added operator equivalent to scipy.spatial.distance.cdist. (16168, 17173).
    • torch. reports detailed version of all libraries. (18579).
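
    Several of the new operators above are described by analogy to itertools, numpy.repeat, and std::unique. As a rough guide to those semantics, the plain-Python sketch below mimics them on lists. It does not call PyTorch; the function names mirror the operators only for readability and are not the library implementations.

```python
from itertools import chain, combinations, groupby, product


def unique_consecutive(values):
    """Collapse runs of equal neighbors, like std::unique (no global dedup)."""
    return [key for key, _ in groupby(values)]


def repeat_interleave(values, repeats):
    """Repeat each element `repeats` times in place, like numpy.repeat."""
    return list(chain.from_iterable([v] * repeats for v in values))


def combinations_of(values, r=2):
    """All length-r combinations in input order, like itertools.combinations."""
    return [list(c) for c in combinations(values, r)]


def cartesian_prod(*lists):
    """Cartesian product of the inputs, like itertools.product."""
    return [list(p) for p in product(*lists)]
```

    Note that unique_consecutive keeps a value that reappears after a different value, which is the difference from a sorted, global unique.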


    • nn.MultiheadAttention: new module implementing multi-head attention from "Attention Is All You Need". (18334).
    • πŸ‘ nn.functional.interpolate: added support for bicubic. (9849).
    • πŸ”€ nn.SyncBatchNorm: support synchronous Batch Normalization. (14267).
    • nn.Conv: added support for circular padding via padding_mode='circular'. (17240).
    • nn.EmbeddingBag: now supports trainable per_sample_weights. (18799).
    • πŸ‘ nn.EmbeddingBag: add support for from_pretrained method, as in nn.Embedding. (15273).
    • RNNs: automatically handle unsorted variable-length sequences via enforce_sorted. (15225).
    • nn.Identity: new module for easier model surgery. (19249).
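
    The enforce_sorted entry above means callers no longer have to sort variable-length sequences by length themselves. The plain-Python sketch below illustrates the kind of bookkeeping that implies: sort by decreasing length, remember the permutation, and undo it afterwards. The helper names are hypothetical and this is not the library's implementation.

```python
def sort_by_length(sequences):
    """Return sequences sorted by decreasing length plus the permutation used."""
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]), reverse=True)
    return [sequences[i] for i in order], order


def restore_order(items, order):
    """Undo the permutation so items[k] lines up with the original input k."""
    restored = [None] * len(items)
    for position, original_index in enumerate(order):
        restored[original_index] = items[position]
    return restored


batch = [[1], [2, 3, 4], [5, 6]]
sorted_batch, order = sort_by_length(batch)
round_trip = restore_order(sorted_batch, order)
```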

    Tensors / dtypes

    • πŸ‘ torch.bool: added support for torch.bool dtype and Tensors with that dtype (1-byte storage). NumPy conversion is supported, but operations are currently limited. (16810).


    • ⏱ optim.lr_scheduler.CyclicLR: Support for Cyclical Learning Rate and Momentum. (18001).
    • optim.lr_scheduler.CosineAnnealingWarmRestarts: new scheduler implementing Stochastic Gradient Descent with Warm Restarts. (17226).
    • πŸ‘Œ Support multiple simultaneous LR schedulers. (14010)
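
    For readers unfamiliar with warm restarts, the schedule anneals the learning rate along a cosine curve and then resets it to the maximum at the start of each period. The sketch below is plain Python with a fixed period; the parameter names (eta_max, eta_min, period) are assumptions for illustration and it does not reproduce the scheduler's actual options.

```python
import math


def sgdr_lr(step, eta_max=0.1, eta_min=0.0, period=10):
    """Cosine-annealed learning rate that restarts every `period` steps."""
    t_cur = step % period  # steps since the last restart
    cosine = math.cos(math.pi * t_cur / period)
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + cosine)


# Learning rates over two full periods.
schedule = [sgdr_lr(step) for step in range(20)]
```

    Within each period the rate decreases monotonically; at step 10 it jumps back to eta_max.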


    • πŸ‘ torch.distributions: now support multiple inheritance. (16772).


    • quasirandom.SobolEngine: new sampler. (10505).


    • πŸ‘ nn.parallel.DistributedDataParallel: now supports modules with unused parameters (e.g. control flow, like adaptive softmax, etc). (18251, 18953).

    TorchScript and Tracer

    • πŸ‘ Allow early returns from if-statements. (#154463)
    • βž• Add an @ignore annotation, which statically tells the TorchScript compiler to ignore the Python function. (#16055)
    • Simple loops on lists. (#16726)
    • Ellipses (...) in Tensor indexing. (#17763)
    • None in Tensor indexing. (#18615)
    • πŸ‘Œ Support for basic list comprehensions. (#17267)
    • βž• Add implicit unwrapping of optionals on if foo is not None. (#15587)
    • Tensors, ints, and floats will once again be implicitly cast to bool if used in a conditional. (#18755).
    • Implement to(), cpu(), and cuda() on ScriptModules. (#15340 , #15904)
    • Add support for various methods on lists: (clear(), pop(), reverse(), copy(), extend(), index(), count(), insert(), remove()).
    • βž• Add support for sort() on lists of specialized type (Tensors, int, float, bool). (#19572)
    • βž• Add support for various methods on strings: (index(), slice(), len())
    • πŸ‘Œ Support in TorchScript. ( #15976 )
    • Support for torch.tensor() in TorchScript. (#14913, #19445)
    • πŸ‘Œ Support for torch.manual_seed() in TorchScript. (#19510)
    • πŸ‘Œ Support for nn.LSTM in TorchScript. (#15744)
    • πŸ‘Œ Support for nn.init in TorchScript. (#19640)
    • βž• Add hash() builtin. (#18258)
    • βž• Add min() and max() builtins for numerical types. (#15680)
    • βž• Add isinstance() builtin, which performs a static type check. (#15076)
    • βž• Add train() / eval() / is_training() to C++ ScriptModule API. (#16044)
    • πŸ‘ Allow List arguments to Python functions called from TorchScript. (#15721)
    • πŸ‘ Allow using std::vector and std::unordered_map as arguments to custom operators. (#17587)
    • Tracer: now allows passing static dicts and lists as trace inputs. (#18092, #19580)
    • πŸ‘ Allow generic containers as ScriptModule inputs. (#16482)
    • πŸ‘ Allow nn.Sequential in ModuleList. (#16882)

    Experimental Features

    • [Quantization] (API unstable): added limited support for quantized datatypes via torch.qint8 dtype, torch.quantize_linear conversion function. (18230).
    • [MKLDNN tensor] (API unstable): Added limited (opaque) support for MKLDNN tensors via Tensor.to_mkldnn(); operators are currently limited to ResNext101 operators. (17748).

    πŸ‘Œ Improvements

    • torch.min, torch.max, torch.median, torch.mode, torch.kthvalue, torch.symeig, torch.eig, torch.pstrf, torch.qr, torch.geqrf, torch.solve, torch.slogdet, torch.sort, torch.topk, torch.gels, torch.triangular_solve, torch.svd now return namedtuples describing their outputs. (16186, 16950, 17093, 17195, 15429).
    • torch.empty (and other factory functions): now take a pin_memory kwarg; can now pin without going through torch.Storage interface. (18455).
    • πŸ‘ torch.histc: Now supported on CUDA. (15842)
    • torch.unique: Add return_counts. (18391, 18651).
    • πŸ”Š torch.logspace: add the ability to specify a base. (19542).
    • πŸ–¨ torch.set_printoptions: added scientific notation support. (16876).
    • torch.btrifact now handles tensors with greater than 3 dimensions. (14964).
    • πŸ‘ torch.kthvalue: now supported on CUDA. (17544).
    • πŸ‘ torch.abs: now supported on uint8 and int8 dtypes. (16893).
    • torch.stack: now supported for CPU half tensors. (16389).
    • πŸ‘ torch.cross: added support for negative dimensions. (17582).
    • πŸ‘ torch.lerp: add support for weight as a Tensor. (17348).
    • torch.transpose: Made consistent with NumPy: 1-d and 0-d arrays are accepted and returned as-is. (17462, 17535).
    • πŸ”Š torch.linspace, torch.logspace can now be used with steps=1 and start != end. (14748).
    • torch.cholesky: changed the derivative from a triangular matrix to symmetric matrix. (19116).
    • torch.lerp: Improved numerical stability. (18871).
    • torch.logdet, torch.slogdet: improve numerical precision. (18449).
    • Tensor.__contains__ is now supported. (17733).
    • πŸ‘ Tensor.fill_ and torch.zeros now support half on CPU. (17536).
    • Tensor.resize_as_, Tensor.view: now supported on half CPU tensors. (18821).
    • Tensor indexing: allow indexing via NumPy booleans. (14932).
    • nn.EmbeddingBag: enable half precision dense backward. (19293).
    • nn.Embedding: fix dense Embedding to work with double backwards. (9078).
    • nn.MaxPool1d: Allow list and tuples to be passed as output_size. (16489).
    • πŸ‘ nn.CTCLoss: support zeroing infinite losses via zero_infinity argument. (16199).
    • πŸ‘ nn.Dropout: add support for enabling during eval. (17549).
    • ⚠ nn.MSELoss: add warning about unexpected broadcasting. (18349).
    • nn.Module.load_state_dict: also return missing_keys and unexpected_keys. (18668).
    • nn.parallel.data_parallel: Enforce devices match device_ids. (17129).
    • torch.device: now accepted in more places that previously took only device ordinals. (14929)
    • dtype.int8 tensors can now be converted to NumPy arrays. (14710).
    • nn.functional.gumbel_softmax: allow multidimensional input with dim argument. (13339).
    • nn.functional.cosine_similarity: improved precision. (18250).
    • torch.autograd: Don't keep unnecessary saved_inputs alive, increasing memory efficiency. (16583).
    • torch.autograd.profiler: add Self (non-nested) CPU Time Total, CPU time total (19378).
    • πŸ“Œ DataLoader: support accepting a custom memory pinning function. (16743).
    • DataLoader: retry libshm on EINTR. (15964).
    • πŸ›  DataLoader: fixed an issue with pin_memory and PackedSequence. (18079)
    • data.utils.collate, data.utils.pin_memory: now preserve namedtuples. (16440)
    • πŸ‘‰ Use IndexError instead of RuntimeError on many indexing error cases. (17049, 17114).
    • πŸ‘Œ Support indexing a torch.float16 tensor on CPU. (17645).
    • βž• Add (limited) error checking in case of internal overlap on inplace operators. (19317, 17927).
    • πŸ‘ utils.checkpoint.checkpoint: support None as an argument to checkpoint function. (17969).
    • torch.autograd: added more information to the "one of the variables needed for gradient computation has been modified by an inplace operation" exception. (18523).
    • πŸ”€ cuda.synchronize: add a device argument. (19573).
    • cuda.reset_max_memory_*: now supported. (15985).
    • distributions.Independent: can now calculate KL Divergence. (17681).
    • 0️⃣ torch.distributed.new_group: now supports overriding default backend. (18595).
    • πŸ–¨ torch.distributed.init_process_group: will now propagate timeout to underlying Store. (16571).
    • [JIT] Preserve module hierarchy on traced modules. (#15101)
    • [JIT] Add metadata for TracedModules. (#17311)
    • [JIT] Improve portability of int and float checks. (#19532)
    • [JIT] Preserve method parameter names during serialization. (#16750)
    • [JIT] Add a correctness check for C++ types to custom operators. (#15247)
    • [JIT] Added a few extra python bindings to help with walking the IR graph from Python. #17822
    • [JIT Error Messages] Print out operator suggestions for "unknown builtin op" error. (#15183)
    • [JIT Error Messages] Better error message when creating a module instance in TorchScript. (#16416)
    • [JIT Error Messages] Print suggestion to add nn.Module attributes to __constants__ when they are used in TorchScript. (#18164)
    • [JIT Error Messages] Improve error message when you try to save a ScriptModule. (#15321)
    • [JIT Error Messages] Improve error message when trying to save a model with Python code. (#16850)
    • [JIT Error Messages] Better errors when trying to close over a Tensor with grad enabled while tracing. (#18298, #19645)
    • [JIT Error Messages] Better error when trying to add a Tensor to __constants__. (#16724)
    • [JIT Error Messages] Better error when a module list isn't added to __constants__. (#17167)
    • [JIT Error Messages] Add a warning when attempting to trace legacy constructors. (#16770)
    • [JIT Error Messages] Improve hint when trying to trace non-deterministic nodes. (#17957)
    • [C++] nn::Module: added Python interop. (13481).
    • [C++] autograd::profiler: is now supported. (16580)
    • [C++] allow detection of C++ ABI flag for cpp extensions from available runtime information. (18994).
    • [C++] torch.argsort is now supported in C++. (17099).
    • [C++] Tensor.isnan: now supported in C++. (15722).
    • [C++]: Added named submodule support to nn::Sequential. (17552).
    • [C++]: Kaiming Initialization. (14718).
    • [C++] torch::data::transforms::Normalize: now supported in C++. (15891).
    • [C++]: Support call operator on module holder calling forward. (15831).
    • [C++]: Random and Sequential distributed samplers. (16910).
    • [C++]: pretty printing of C++ Modules. (15326).
    • [C++] Support serializing std::vector<torch::Tensor>. (19677).
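
    The first entry in this list notes that many operators now return namedtuples. As an illustration of what that buys the caller, the plain-Python sketch below mimics a row-wise max that returns a (values, indices) namedtuple. The type name MaxResult and the helper max_along_rows are hypothetical; only the field names follow the values/indices convention of torch.max with a dim argument.

```python
from collections import namedtuple

# Hypothetical result type; fields named after the values/indices convention.
MaxResult = namedtuple("MaxResult", ["values", "indices"])


def max_along_rows(rows):
    """Return the maximum of each row and the column where it occurs."""
    values, indices = [], []
    for row in rows:
        best = max(range(len(row)), key=row.__getitem__)
        values.append(row[best])
        indices.append(best)
    return MaxResult(values, indices)


result = max_along_rows([[1, 5, 2], [7, 3, 4]])
by_name = result.values  # access by field name
unpacked_values, unpacked_indices = result  # tuple unpacking still works
```

    Because a namedtuple is still a tuple, existing code that unpacks or indexes the result positionally keeps working.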

    πŸ› Bug Fixes


    • correct erroneous calculation on large tensors. (15653).
    • torch.mean (and other reductions): fix incorrect calculation on CUDA on large inputs. (16023).
    • nn.Conv: correctly handle non-contiguous inputs on MKLDNN convolution codepath. (16300).
    • Tensor.eq_: Fix erroneous calculation. (15475).
    • torch.mean: Fix fp16 output calculation. (14878).
    • nn.PoissonNLLLoss: Properly handle reduction=None. (17358).
    • [JIT] Fix bug where custom ops could get optimized out if their outputs weren't used. (#18711).
    • [JIT] Fix bug where the model serializer would accidentally reorder statements. (#17557).


    • Tensor.round is now consistently half to even. (17443).
    • Tensor.resize_: Fix some 0-element cases. (14874).
    • Tensor.numpy: Fix conversion of torch.int8 dtype. (15194).
    • Tensor.grad: correctly handle del. (16525).
    • Tensor.clamp: correctly handle NaN on CUDA. (15479).
    • Tensor.topk: properly set launch bounds on CUDA. (17296).
    • Tensor.kthvalue: treat NaN as bigger than any number. (17824).
    • Tensor.copy_: Properly synchronize on src and dst streams. (16966).
    • Tensor indexing: Fix incorrect dimension error message. (16495).
    • πŸ“œ Tensor.coalesce, Tensor.clone, Tensor.to_dense: fixed for sparse 0-dimensional tensors. (17379).
    • torch.isinf: Don't error out on integral tensors. (15489).
    • torch.argsort, torch.sort: Match NumPy by considering NaNs to be larger than any number. (15886).
    • torch.geqrf, torch.ormqr: when an out parameter is specified, dispatch to the correct function. (16964).
    • torch.cuda.get_device_name / torch.cuda.get_device_capability: Fix handling of optional. (17222).
    • Tensor.tril_ / Tensor.triu_: properly reuse input memory. (17031).
    • torch.arange: fix shape inconsistency between CPU and CUDA. (18462).
    • torch.empty (and other size-based factory functions): properly enforce non-negative sizes. (17077).
    • torch.load: support serializing / deserializing pathlib.Path object. (18562).
    • nn.BatchNorm: correctly handle very large batches. (17047).
    • nn.Softmax / nn.LogSoftmax: fix double backward for torch.half. (17330).
    • nn.Softmax: handle empty inputs in backward. (17259).
    • nn.NLLLoss: Fix crash when ignore_index is out-of-bounds on CPU. (17328).
    • nn.Softmax, nn.LogSoftmax: handle 0-element inputs. (17651).
    • nn.CTCLoss: correct error checking. (16269).
    • nn.Conv: better report convolution size mismatch. (17436).
    • torch.nn.functional.cosine_similarity: fix output sometimes returning result > 1.0. (18168).
    • nn.parallel.data_parallel: Fix handling of buffers that require_grad. (13352).
    • nn.parallel.data_parallel: fix tensors sometimes being freed before all pending operations finish. (18465).
    • torch.distributed.broadcast: fixed repeated calls leading to OOM. (19219).
    • torch.multiprocessing: fix serialization of integer nn.Parameters. (18639).
    • torch.multiprocessing: Fix handling of distributions on CUDA. (16854).
    • torch.nonzero: Fix for 0-dimensional tensors on CUDA. (17406).
    • torch.slogdet: Fix sign requiring grad when input required grad. (16337).
    • torch.cuda.Stream: Properly restore stream on destination device when switching devices. (17439).
    • torch.cuda.Stream: Fixed synchronization issue when used with non-current device. (15689).
    • torch.cuda.Stream: properly change device in stream context manager. (16128).
    • DataLoader: fixed a hang when no data was read and the buffer size is smaller than the chunk size. (17409).
    • DataLoader: _utils.collate.default_collate now converts bool lists to byte Tensors, not integer tensors.
    • DataLoader: ensure dataset is indexed by integers. (17649).
    • Handle transposed dense tensors in backwards. (18737).
    • torch.sparse.sum: Fix parsing of dim. (16517).
    • / torch.sparse.addmm: fix broadcasting and using uninitialized data. (16572).
    • Tensor.to_sparse: Fix for 0-dimensional tensors. (17406).
    • SparseTensor: fix add with non-contiguous values tensors. (18179).
    • Fix compare_exchange_weak in weak_intrusive_ptr. (16302).
    • utils.model_zoo.load_url: Fix race condition. (16578).
    • have len properly take into account num_samples. (15991).
    • torch.distributions: Fix precision issue with expansion that prefers probs over logits. (18614).
    • distributions.dirichlet.Dirichlet: fixed an underflow issue. (17488).
    • distributions.binomial.Binomial.log_prob: fixed numerical stability issue. (15962).
    • Caching Allocator: Free all blocks with outstanding events on OOM-retry. (19222).
    • torch.dtype: fix pickling issue with Python 2. (18045).
    • Fix SIGCHLD checking. (19421).
    • optim.Optimizer: Properly copy defaults. (19308).
    • optim.lr_scheduler.CosineAnnealingLR: Fix division-by-zero error. (19180).
    • optim.lr_scheduler.ReduceLROnPlateau: fix bug when the argument to step is reused outside the function.
    • cuDNN: fix race condition with multiple threads calling into the same device. (15080).
    • cuDNN: Properly specify accumulation types. (16825).
    • cuDNN: Fix incorrectly selecting slower algorithms in certain cases. (15881).
    • cuFFT: Properly handle CUDA contexts. (19300).
    • Fix infinite loop in reduction functions when get_max_threads is nonzero but num_threads is 1. (15114).
    • Fix tensor printing bug with Python 2. (12732).
    • MKLDNN: fix thread safety. (17022).
    • [JIT] floordiv: Fix integer division and divide-by-zero semantics. (#15813).
    • [JIT] Fix bug in alias analysis that disabled optimizations even in models without mutation. (#18416).
    • [JIT] ord(): Fix handling of utf8 chars. (#19423).
    • [JIT] Fix error when too many parameters are passed to a fused CUDA kernel. (#18063).
    • [JIT] Fix bug where common subexpression elimination accidentally introduced aliasing to function outputs. (#19576).
    • [JIT] Fix infinite loop in requires_grad analysis pass. (#18361).
    • [JIT] Fix ordering of parameters for in (#18198).
    • [JIT] Fix contiguous autodiff and AutoGradZero inconsistency. (#18633).
    • [JIT] Fix error reporting in NVRTC use of the fuser. (#18327).
    • [JIT] Ensure GIL is acquired before doing module lookup on import. (#17135).
    • [JIT] Fix bug where _unique_state_dict could contain duplicate Tensors. (#18139).
    • [C++]: Fix module serialization issue where one submodule doesn't have any parameters, but its submodules do. (15033).
    • [C++]: Add Stream and Event APIs. (15937).
    • [C++]: Fix Module serialization incompatibility between Python and C++ with weight-less layers. (19740).
    • [C++]: Properly pass extra_cuda_cflags to C++ extensions on Windows. (18638).
    • [C++] Make SGD semantics match python. (15840).
    • [C++] torch::nn::init::orthogonal_: match Python API. (18915).
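
    Two of the semantic fixes in the list above are easy to misread: "half to even" rounding for Tensor.round, and NaN being ordered as larger than any number for torch.sort, torch.argsort and Tensor.kthvalue. The sketch below illustrates both rules in plain Python only; it does not call PyTorch, and the helper names (round_half_away, round_half_even, sort_nan_last) are hypothetical names chosen for this illustration.

```python
import math


def round_half_away(x):
    """Round exact ties away from zero (shown only for contrast)."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


def round_half_even(x):
    """Round exact ties to the nearest even integer.

    Python 3's built-in round() already uses this rule for floats.
    """
    return float(round(x))


def sort_nan_last(values):
    """Ascending sort that treats NaN as larger than every other number."""
    # The key puts non-NaN values first (False < True); NaNs get a dummy
    # second component so they are never compared with real numbers.
    return sorted(values, key=lambda v: (math.isnan(v), 0.0 if math.isnan(v) else v))


ties = [0.5, 1.5, 2.5, -1.5]
print([round_half_away(t) for t in ties])  # [1.0, 2.0, 3.0, -2.0]
print([round_half_even(t) for t in ties])  # [0.0, 2.0, 2.0, -2.0]

print(sort_nan_last([3.0, float("nan"), 1.0, 2.0]))  # [1.0, 2.0, 3.0, nan]
```

    The two rounding rules differ only on exact ties such as 0.5 and 2.5; every other input rounds the same way under both.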

    Deprecations

    • torch.btrifact: the deprecated info argument has been removed. (14935).
    • torch.potrs has been deprecated; use torch.cholesky_solve instead. Note that upper defaults to False for torch.cholesky_solve, and True for torch.potrs. (15334).
    • torch.pstrf is deprecated; use torch.cholesky instead. Note that upper defaults to False for torch.cholesky, and True for torch.pstrf. (17866).
    • torch.potri is deprecated; use torch.cholesky_inverse instead. Note that upper defaults to False for torch.cholesky_inverse, and True for torch.potri. (19498).
    • torch.btrifact_with_info has been deprecated; use with get_infos=True instead. (18435).
    • torch.btrifact has been deprecated; use the new name instead. (18435).
    • torch.gesv is deprecated; use the new name torch.solve instead. (18060).
    • torch.trtrs has been deprecated; use the new name torch.triangular_solve instead. (18213).
    • torch.btriunpack has been deprecated; use the new name torch.lu_unpack instead. (18529).
    • torch.btrisolve has been deprecated; use the new name torch.lu_solve instead. (18726).
    • [C++] IntList has been deprecated, use IntArrayRef instead, as it better describes the type and ownership semantics in C++. (16751).
    • [C++] Dispatch macros with Type parameters, e.g. AT_DISPATCH_ALL_TYPES(tensor.type(), ..., are now deprecated; use ScalarType instead, e.g. AT_DISPATCH_ALL_TYPES(tensor.scalar_type(), .... (17527, 17996).
    • [C++] the deprecated variable_tensor_functions have been removed. (15003).
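
    The change in the default value of upper is the part of these deprecations most likely to cause silent errors during migration. The following is a minimal sketch of calling torch.cholesky_solve with a lower and with an upper factor. The matrix values are illustrative, and the factor is written out by hand so that the example does not depend on any particular factorization function.

```python
import torch

# Hand-written lower-triangular factor L; A = L @ L^T is symmetric positive-definite.
L = torch.tensor([[2.0, 0.0],
                  [1.0, 1.0]])
A = L @ L.t()                      # [[4., 2.], [2., 2.]]
b = torch.tensor([[2.0],
                  [1.0]])

# torch.cholesky_solve defaults to upper=False, so a lower factor is passed as-is.
x_lower = torch.cholesky_solve(b, L)

# Code written for the old upper-by-default convention holds an upper factor
# (here U = L^T) and has to pass upper=True explicitly after migrating.
U = L.t()
x_upper = torch.cholesky_solve(b, U, upper=True)

print(torch.allclose(A @ x_lower, b))   # True
print(torch.allclose(x_lower, x_upper)) # True
```

    Passing an upper factor without upper=True does not raise an error; the factor is just interpreted as lower-triangular, so the result should be checked against A @ x as above.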

    🐎 Performance


    • nn.BatchNorm CPU inference speed increased up to ~19x.(19152).
    • nn.AdaptiveAvgPool: speed up common-case of size=1 output by ~30x. (17011).
    • 🐎 nn.EmbeddingBag CPU performance increased by ~4x. (19329).
    • Tensor.copy_: sped up larger tensor copy ~2-3x, small regression in small tensor copy. (18618).
    • torch.nonzero: is now ~2x faster than numpy on CPU. (15190)
    • πŸ‘Œ Improve caching allocator for Pascal and newer GPUs; 10-20% better memory utilization on Mask-RCNN. (17120).
    • reduction functions: Speed up some large Tensor cases by 50-80%. (17428).
    • [JIT] Graph fuser: better fusion for backwards graphs in the presence of broadcasting. (#14957)
    • [JIT] Graph fuser: batch_norm fusion for inference. (#15146)
    • [JIT] Graph fuser: layer_norm fusion for inference. (#18266)


    • torch.abs, torch.frac, torch.reciprocal, torch.neg have been vectorized and parallelized. (19041).
    • torch.bmm: CPU performance increased by 2x. (19338).
    • torch.sort: CUDA performance increased by ~2x. (19379).
    • on CPU is now ~4x faster in the case where inputs are contiguous and dim != 0. (17032).
    • torch.multinomial: fixed a 2x performance regression. (17121).
    • torch.empty (and other factory functions): reduce overhead by 20-40%. (17565).
    • torch.linspace has been parallelized on CPU. (15320).
    • torch.logspace has been parallelized on CPU. (15438).
    • torch.range has been parallelized on CPU. (15484).
    • torch.arange has been parallelized on CPU. (15667).
    • torch.load: avoid unnecessary CPU-to-CUDA copy. (17297).
    • reduction functions: improve efficiency on CUDA. (16224, 17040).
    • Speed up some GEMM cases on CPU by up to 7x. (17730).
    • Tensor iterator loop unrolling. (17667).
    • sparse/dense matrix multiply: improve speed by ~5x. (16905).
    • distributions.MultivariateNormal: sped up. (17294).
    • [JIT] Graph fuser: pow scalar exponent / base autodiff, fusion (#19324)
    • [JIT] Graph fuser: allow fusion of function float arguments. (#18087)
    • [JIT] Shape analysis: specialize optional Tensor inputs to graphs. (#18360)
    • [JIT] Shape analysis: various correctness improvements. (#18271)
    • [JIT] Shape analysis: aten::_convolution now participates in shape analysis. (#16837)
    • [JIT] Autodiff: coverage for ops used in maskrcnn & BERT. (#16689)
    • [JIT] Autodiff: support for scalar comparison ops and randlike. (#14740)
    • [JIT] Autodiff: support for adaptive_avg_pool2d. (#15459)
    • [JIT] Autodiff: support for erf and erfc. (#15139)
    • [JIT] Autodiff: support for layernorm. (#17702)
    • [JIT] Autodiff: support for tanh. (#17816)
    • [JIT] Autodiff: support for matmul/dropout. (#17523)
    • [JIT] Autodiff: specialized CUDA impl for dropout. (#17756)
    • [JIT] Constant folding: improved inlining of control flow. (#16244)

    Documentation

    • Tensor.scatter_: add documentation about value parameter. (17467).
    • Tensor.unfold: correctly document dimension parameter, not dim. (19020).
    • Tensor.is_floating_point() is now documented. (15704).
    • torch.cholesky: Fix broken upper example in documentation. (15215).
    • torch.gesv: document out parameter. (15649).
    • torch.mul: better explain elementwise multiplication. (15664).
    • torch.eig, torch.symeig: better explain backwards limitations. (15929).
    • torch.ormqr: fixed output specification. (15694).
    • torch.from_numpy: replaced usage with torch.as_tensor in documentation. (16587).
    • torch.mvlgamma: Fix the constant in the docs. (17045).
    • torch.mode: more precisely describe what is returned. (17069).
    • torch.upsample: documentation now matches torch.interpolate. (17134).
    • torch.arange: correct dtype documentation. (18604).
    • torch.cumprod: document out parameter. (19340).
    • torch.nonzero: document indices being returned lexicographically. (19539).
    • torch.nn.functional.interpolate: better explain align_corners parameter. (14806).
    • torch.nn.functional.pad: documentation has been made consistent with other functional ops. (15984).
    • nn.functional.grid_sample: clarify behavior of padding. (19754).
    • nn.TripletMarginLoss: correct type of swap parameter. (18115).
    • nn.CrossEntropyLoss: clarify ignore_index documentation. (18117).
    • nn.CrossEntropyLoss: the input format is more clearly explained. (15990).
    • nn.CTCLoss: Clarify a number of ambiguities. (18415).
    • nn.BCEWithLogitsLoss: add better explanation. (19212).
    • nn.BCEWithLogitsLoss: better explain positive samples. (17258).
    • nn.ModuleList / nn.ParameterList: update documentation. (17731).
    • nn.Module.load_state_dict: correct semantics of strict. (17618)
    • nn.parallel.DataParallel: more accurately specify how different argument types are handled. (15993).
    • nn.parallel.DistributedDataParallel: Clarified batch size requirements. (16010).
    • torch.distributed: Document mixed-precision training. (15440).
    • torch.multiprocessing: Include example multiprocessing code. (16345).
    • torch.autograd: Better explain computing Jacobian-vector product. (15197).
    • torch.cuda.get_rng_state, torch.cuda.set_rng_state: document taking a device object. (14324).
    • torch.device: Fix example of passing device to tensor factory. (16839).
    • DataLoader: update documentation to describe how workers are managed. (18091).
    • Unified shape formats throughout the documentation. (15741).
    • Update documentation for reduction arguments to use non-deprecated format. (17300).
    • mark_non_differentiable: document correct semantics. (17891).
    • Warn about memory overlaps on inplace operations. (17576).
    • Fix a number of small issues with conv and pooling docstrings. (17052).
    • Fix a number of small issues with padding and activation docstrings. (17197).
    • [C++]: mention packed accessors in Tensor basics. (19464).


    Exporting More Torch Operators to ONNX

    • Export torch.isnan to ONNX (17698).
    • Export torch.flatten to ONNX (16240).
    • Export torch.where, torch.ceil, torch.floor to ONNX (18571).
    • Export torch.narrow to ONNX (17550).
    • Export torch.argmax and torch.argmin to ONNX (17382, 18264, 18261).
    • Export adaptive_avg_pool1D, adaptive_avg_pool2D, adaptive_avg_pool3D, adaptive_max_pool1D, adaptive_max_pool2D, adaptive_max_pool3D to ONNX (17412).
    • Export torch.nonzero to ONNX (17036, 18047).
    • Export torch.erf to ONNX (16106).
    • Export torch.split (15092).
    • Export,, torch.le,, torch.eq, to ONNX (15677).
    • Export torch.expand and to ONNX (15050).
    • Export torch.nn.LogSigmoid to ONNX (14830).
    • Export torch.nn.RReLU to ONNX (14781).
    • Export torch.reshape and torch.reshape_as to ONNX (16632, 16971).
    • Replace use of ConstantLike with ConstantOfShape (16095, 16214).

    Extending Existing Exporting Logic

    • Enable dim support in torch.nn.Softmax's export (18482).
    • Support exporting squeeze & unsqueeze with negative dim attribute (19297).
    • Support exporting max_pool1d, max_pool2d, max_pool3d with indices (16455).
    • Add dtype support in torch.logsoftmax and torch.softmax's export (17672).
    • Support ceil_mode in max_pool1d, max_pool2d, max_pool3d, avg_pool1d, avg_pool2d, avg_pool3d's export (16769).
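
    For readers unsure what ceil_mode changes, the sketch below computes the output length of a pooling window along one dimension in both modes. It follows the output-shape rule as described in the PyTorch pooling documentation (dilation omitted), to the best of my understanding. The function name pool_output_size is hypothetical, and nothing here reflects how the ONNX exporter itself implements the option.

```python
def pool_output_size(input_size, kernel_size, stride, padding=0, ceil_mode=False):
    """Output length of a 1-D pooling window (no dilation).

    floor mode: floor((input + 2*padding - kernel) / stride) + 1
    ceil mode:  ceil of the same quotient, + 1, minus one if the last
                window would start beyond the input and its left padding.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    span = input_size + 2 * padding - kernel_size
    if span < 0:
        raise ValueError("kernel is larger than the padded input")

    if not ceil_mode:
        return span // stride + 1

    out = -(-span // stride) + 1  # integer ceiling division
    # Drop the last window if it would start outside input + left padding.
    if (out - 1) * stride >= input_size + padding:
        out -= 1
    return out


print(pool_output_size(5, kernel_size=2, stride=2))                  # 2
print(pool_output_size(5, kernel_size=2, stride=2, ceil_mode=True))  # 3
print(pool_output_size(2, kernel_size=1, stride=2, ceil_mode=True))  # 1
```

    In the second call the extra output comes from a partial window covering only the last input element; in the third call the ceiling would give 2, but the second window would start past the end of the input, so it is dropped.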

    ⚑️ Optimizing Exported ONNX Graph

    • βž• Add constant folding in ONNX exporter (18698).
    • Retain the parameter names in ONNX exporter (17551).
    • Omit slice op if it is a non-op (19155).
    • βž• Add a flag to strip doc_string from exported ONNX models (18882).
    • Omit torch.dropout if the model is in eval mode (16547).
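
    As background for the constant-folding item above: constant folding evaluates, ahead of time, any subexpression whose inputs are all known constants, so the exported graph carries the result instead of the operations. The toy pass below shows the idea on a nested-tuple expression tree. It is an illustrative sketch only; the tree encoding, the OPS table and fold_constants are all hypothetical and unrelated to the exporter's real data structures.

```python
import operator

# Hypothetical op table for the toy expression tree.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(node):
    """Fold constant subtrees of a toy expression.

    A node is a number (constant), a string (runtime input name),
    or a tuple (op_name, lhs, rhs).
    """
    if not isinstance(node, tuple):
        return node  # leaf: constant or runtime input

    op, lhs, rhs = node
    lhs, rhs = fold_constants(lhs), fold_constants(rhs)

    # Both operands known at export time: replace the op by its value.
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return OPS[op](lhs, rhs)

    # At least one operand depends on a runtime input: keep the op.
    return (op, lhs, rhs)


expr = ("add", ("mul", 2, 3), ("mul", "x", ("sub", 10, 4)))
print(fold_constants(expr))  # ('add', 6, ('mul', 'x', 6))
```

    Only the subtrees that do not depend on the runtime input x are collapsed; the multiplication by x stays in the folded tree.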

    Adding Utility Functions and Refactoring

    • Remove unused arg f from _model_to_graph(). (19647).
    • Add support for stable ONNX opsets in exporter (16068, 17419).
    • Set the default ONNX opset to the latest stable opset (i.e., 9) (17736).
    • Add a utility function to check whether an ONNX export is in progress (19050).
    • Refactoring serialization of ONNX initializers to be name-based (17830).
    • Expose dim() on type and use it in ONNX symbolics (15933).
    • Add scalar_type_to_pytorch_type dict in ONNX symbolic (15965).
    • Add an assertion to check the number of the parameters passed to ONNX exporter (18145).

    Bugfixes

    • Fix bug caused by different types in rsub (15707).
    • Fix list structure support in ONNX exporter (19102).
    • Fix case for activations attribute in nn.RNN ONNX export. (19368).
    • Minor fix for onnx ConstantOfShape export (18199).
    • Fix the torch.(reduce)min and torch.(reduce)max's export (15241).
    • Fixing ONNX export of logical ops to have correct output datatype (15185).
    • Fix typo in docstring (18216).