Pytorch/CHANGELOG and Pytorch Releases (Page 4)

All Versions

Latest Version

1.7.1

Avg Release Cycle

11 days

Latest Release

1224 days ago

Changelog History

Page 4

v1.3.0.a0
August 07, 2019
v1.2.0 Changes
August 08, 2019
🚀 We have just released PyTorch v1.2.0.

🐎 It has over 1,900 commits and contains a significant amount of effort in areas spanning JIT, ONNX, Distributed, as well as Performance and Eager Frontend Improvements.

Highlights

[JIT] New TorchScript API

🔖 Version 1.2 includes a new, easier-to-use API for converting nn.Modules into ScriptModules. A sample usage is:
```
class MyModule(torch.nn.Module):
    ...

# Construct an nn.Module instance
module = MyModule(args)

# Pass it to `torch.jit.script` to compile it into a ScriptModule.
my_torchscript_module = torch.jit.script(module)
```
👀 torch.jit.script() will attempt to recursively compile the given nn.Module, including any submodules or methods called from forward(). See the migration guide for more info on what's changed and how to migrate.

[JIT] Improved TorchScript Python language coverage

👍 In 1.2, TorchScript has significantly improved its support for Python language constructs and Python's standard library. Highlights include:
- Early returns, breaks and continues.
- Iterator-based constructs, like for..in loops, zip(), and enumerate().
- NamedTuples.
- 👍 math and string library support.
- 👌 Support for most Python builtin functions.
👀 See the detailed notes below for more information.

Expanded Onnx Export

✅ In PyTorch 1.2, working with Microsoft, we’ve added full support to export ONNX Opset versions 7(v1.2), 8(v1.3), 9(v1.4) and 10 (v1.5). We’ve have also enhanced the constant folding pass to support Opset 10, the latest available version of ONNX. Additionally, users now are able to register their own symbolic to export custom ops, and specify the dynamic dimensions of inputs during export. Here is a summary of the all of the major improvements:
- 👌 Support for multiple Opsets including the ability to export dropout, slice, flip and interpolate in Opset 10.
- 👌 Improvements to ScriptModule including support for multiple outputs, tensor factories and tuples as inputs and outputs.
- 👍 More than a dozen additional PyTorch operators supported including the ability to export a custom operator.
Updated docs can be found here and also a refreshed tutorial using ONNXRuntime can be found here.

Tensorboard is no Longer Considered Experimental

Read the documentation or simply type from torch.utils.tensorboard import SummaryWriter to get started!

NN.Transformer

We include a standard nn.Transformer module, based on the paper “Attention is All You Need”. The nn.Transformer module relies entirely on an attention mechanism to draw global dependencies between input and output. The individual components of the nn.Transformer module are designed so they can be adopted independently. For example, the nn.TransformerEncoder can be used by itself, without the larger nn.Transformer. New APIs include:
- nn.Transformer
- nn.TransformerEncoder and nn.TransformerEncoderLayer
- nn.TransformerDecoder and nn.TransformerDecoderLayer
📚 See the Transformer Layers documentation for more info.

💥 Breaking Changes

Comparison operations (lt (<), le (<=), gt (>), ge (>=), eq (==), ne, (!=) ) return dtype has changed from torch.uint8 to torch.bool (21113)

🔖 Version 1.1:
```
>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([1, 0, 0], dtype=torch.uint8)
```
🔖 Version 1.2:
```
>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([True, False, False])
```
For most programs, we don't expect that any changes will need to be made as a result of this change. There are a couple of possible exceptions listed below.

Mask Inversion

👍 In prior versions of PyTorch, the idiomatic way to invert a mask was to call 1 - mask. This behavior is no longer supported; use the ~ or bitwise_not() operator instead.

🔖 Version 1.1:
```
>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([0, 1, 1], dtype=torch.uint8)
```
🔖 Version 1.2:
```
>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported.
If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.

>>> ~(torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([False, True, True])
```
sum(Tensor) (python built-in) does not upcast dtype like torch.sum

Python's built-in sum returns results in the same dtype as the tensor itself, so it will not return the expected result if the value of the sum cannot be represented in the dtype of the tensor.

🔖 Version 1.1:
```
# value can be represented in result dtype
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(3, dtype=torch.uint8)

# value can NOT be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(44, dtype=torch.uint8)

# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)
```
🔖 Version 1.2:
```
# value cannot be represented in result dtype (now torch.bool)
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(True)

# value cannot be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(True)

# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)
```
TLDR : use torch.sum instead of the built-in sum. Note that the built-in sum() behavior will more closely resemble torch.sum in the next release.

🗄 Note also that masking via torch.uint8 Tensors is now deprecated, see the Deprecations section for more information.

__invert__ / ~: now calls torch.bitwise_not instead of 1 - tensor and is supported for all integral+Boolean dtypes instead of only torch.uint8. (22326)

🔖 Version 1.1:
```
>>> ~torch.arange(8, dtype=torch.uint8)
tensor([1, 0, 255, 254, 253, 252, 251, 250], dtype=torch.uint8)
```
🔖 Version 1.2:
```
>>> ~torch.arange(8, dtype=torch.uint8)
tensor([255, 254, 253, 252, 251, 250, 249, 248], dtype=torch.uint8)
```
torch.tensor(bool) and torch.as_tensor(bool) now infer torch.bool dtype instead of torch.uint8. (19097)

🔖 Version 1.1:
```
>>> torch.tensor([True, False])
tensor([1, 0], dtype=torch.uint8)
```
🔖 Version 1.2:
```
>>> torch.tensor([True, False])
tensor([True, False])
```
nn.BatchNorm{1,2,3}D: gamma (weight) is now initialized to all 1s rather than randomly initialized from U(0, 1). (13774)

🔖 Version 1.1:
```
>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([0.1635, 0.7512, 0.4130, 0.6875, 0.5496], 
       requires_grad=True)
```
🔖 Version 1.2:
```
>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([1., 1., 1., 1., 1.], requires_grad=True)
```
🚚 A number of deprecated Linear Algebra operators have been removed (22841)

🚚 | Removed | Use Instead | | --- | --- | | btrifact | lu | | btrifact_with_info | lu with get_infos=True | | btrisolve | lu_solve | | btriunpack | lu_unpack | | gesv | solve | | pstrf | cholesky | | potrf | cholesky | | potri | cholesky_inverse | | potrs | cholesky_solve | | trtrs | triangular_solve |

📜 Sparse Tensors: Changing the sparsity of a Tensor through .data is no longer supported. (17072)
```
>>> x = torch.randn(2,3)
>>> x.data = torch.sparse_coo_tensor((2, 3))
RuntimeError: Attempted to call `variable.set_data(tensor)`,
but `variable` and `tensor` have incompatible tensor type.
```
📜 Sparse Tensors: in-place shape modifications of Dense Tensor Constructor Arguments will no longer modify the Sparse Tensor itself (20614)

🔖 Version 1.1:
```
>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)

>>> s.coalesce().indices().shape
torch.Size([1, 1])

>>> s.coalesce().values().shape
torch.Size([1])
```
🔔 Notice indices() and values() reflect the resized tensor shapes.

🔖 Version 1.2:
```
>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)

>>> s.coalesce().indices().shape
torch.Size([1, 2])

>>> s.coalesce().values().shape
torch.Size([2])
```
🔔 Notice indices() and values() reflect the original tensor shapes.

📜 Sparse Tensors: Accumulating dense gradients into a sparse .grad will no longer retain Python object identity. (17072)

🔖 Version 1.1:
```
>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad

# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided
# m_weight_grad_saved still refers to the .grad of m's weight
# even though the sparsity has changed
>>> assert id(m_weight_grad_saved) == id (m.weight.grad)
```
🔖 Version 1.2:
```
>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad

# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided
# m_weight_grad_saved NO LONGER refers to the .grad of m's weight
>>> assert id(m_weight_grad_saved) == id (m.weight.grad)
AssertionError
```
🔀 nn.utils.convert_sync_batchnorm has been replaced with nn.SyncBatchNorm.convert_sync_batchnorm(18787)

Example of new usage:
```
>>> # Network with nn.BatchNorm layer
>>> module = torch.nn.Sequential(
>>> torch.nn.Linear(20, 100),
>>> torch.nn.BatchNorm1d(100)
>>> ).cuda()
>>> # creating process group (optional)
>>> process_group = torch.distributed.new_group(process_ids)
>>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group)
```
Error Checking: torch.addcmul and torch.lerp operators enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow output tensor to be resized if it is also used as one of the inputs.

🔖 Version 1.1:
```
>>> x=torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
tensor([[0., 0., 0.],
        [0., 0., 0.]])
```
🔖 Version 1.2:
```
>>> x=torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
RuntimeError: output with shape [1] doesn't match the broadcast shape [2, 3]
```
If you run into this error, please ensure the out parameter is of the correct output shape (post-broadcasting).

Error Checking: Improved Variable version tracking (20391, 22821, 21865)

⚡️ PyTorch’s autograd system uses a version tracking mechanism to ensure that Tensors that are saved for backwards computations retain their correct values when the backward pass is computed (i.e. that they haven’t been updated in-place since they were saved). See In Place Correctness Checks in the docs for more information.

In PyTorch 1.2 we have enhanced the version tracking in a number of cases, which may flag issues that were not caught previously. There is now additional tracking through the Variable() constructor, the nn.Parameter() constructor, after setting .data, and via nn.Module._apply (internal API).

Track changes through Variable constructor:
```
>>> x = torch.ones(1, requires_grad=True)+1
>>> y = x*x

# do an in-place update through Variable constructor
>>> torch.autograd.Variable(x).add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0 
instead.
```
Track changes on an nn.Parameter:
```
>>> x = torch.ones(1)
>>> p = torch.nn.Parameter(x)
>>> y = p * p

# do an in-place update on a saved Parameter
>>> x.add_(1)
>>> y.sum().backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0 
instead.
```
Track changes after setting .data:
```
>>> x = torch.zeros(1, requires_grad=True)+1
>>> y = x * x
>>> x.data = torch.zeros(1, requires_grad=True)+1

>>> x.add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]], which is output 0 of AddBackward0,
is at version 1; expected version 0 instead.
```
[JIT] Python called from scripted modules must be @ignored

👀 torch.jit.script now recursively compiles everything it finds in the original function, so if you had Python functions called from in your scripted function or module, you must now explicitly @ignore it. See the new API guide for more details.

🔖 Version 1.1
```
def my_unscriptable_python_fn():
    # weird stuff

@torch.jit.script
def fn():
    # This gets inserted as a Python call, and only errors on `save()`.
    my_unscriptable_python_fn()
```
🔖 Version 1.2
```
@torch.jit.ignore # this needs to be added ...
def my_unscriptable_python_fn():
    ...

@torch.jit.script
def fn():
    # ... or else recursive compilation will attempt to compile this call
    my_unscriptable_python_fn()
```
NOTE: This is also a change to behavior of the @torch.jit.ignore decorator. In version 1.1, @ignore tells the compiler to omit compiling a function entirely, to mark Python functions that you know will not be called after export. In version 1.2 @ignore, tells the compiler to insert a call back to the Python interpreter instead of trying to compile the function.

To get the old behavior, use @torch.jit.ignore(drop_on_export=True) (@torch.jit.ignore with no arguments is equivalent to @torch.jit.ignore(drop_on_export=False)).

⚡️ [JIT] optimize for ScriptModules is now a context manager

👍 Whether optimization passes are run is now a thread-local flag. This better reflects how optimization actually happens in the JIT (i.e. it is decided at runtime, not compilation time).

🔖 Version 1.1
```
@torch.jit.script(optimize=False)
def fn(inputs):
    ...

fn(inputs)
```
🔖 Version 1.2
```
@torch.jit.script
def fn(inputs):
    ...

with @torch.jit.optimized_execution(False):
    fn(inputs)
```
[jit] script::Module is now a reference type

⚡️ To better align with the PyTorch C++ API philosophy, script::Module and script::Method are now reference types. Our APIs have been updated to use script::Module instead of std::shared_ptr<script::Module>.

🔖 Version 1.1
```
using torch::jit::script::Module;

std::shared_ptr<Module> m = torch::jit::load("my_model.py");
m->forward(...);
```
🔖 Version 1.2
```
using torch::jit::script::Module;

Module m = torch::jit::load("my_model.py");
m.forward(...);
```
[C++ only] mean() / sum() / prod() APIs have changed slightly (21088)

🔖 Version 1.1 API:
```
Tensor sum(IntArrayRef dim, bool keepdim=false) const;    
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```
🔖 Version 1.2 API:
```
Tensor sum(IntArrayRef dim, bool keepdim=false,
           c10::optional<ScalarType> dtype=c10::nullopt) const;
```
that is, to override dtype, keepdim must now be provided.

Binary distribution and nightly changes

⚡️ We have streamlined our conda and wheel binary distributions, so that it is easier than ever to install the version of PyTorch appropriate for your needs. The install instructions on https://pytorch.org/ have been updated, but if you have tooling to download and install PyTorch, here is a detailed description of the changes we made:

Wheels now have local version identifiers. Wheels that are for non-default CUDA configurations (the default CUDA version for this release is 10.0) now have local version identifiers like +cpu and +cu92. This means that, when installing, it is no longer necessary to specify a full wheel URL—just specify an appropriate version constraint like torch==1.2.0+cu92.

🔖 Version 1.1 (for Python 3.7 on Linux only):
```
pip install numpy
pip install https://download.pytorch.org/whl/cpu/torch-1.1.0-cp37-cp37m-linux_x86_64.whl
```
🔖 Version 1.2 (works for all versions of Python, and both Linux and Mac):
```
pip install torch==1.2.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
```
CPU-only binaries on conda can be selected with the cpuonly feature. We’ve eliminated the pytorch-cpu conda package; instead, the cpu-only conda package can be enabled by installing the cpuonly metapackage. Similarly, there is no longer both a torchvision and torchvision-cpu package; the feature will ensure that the CPU version of torchvision is selected.

🔖 Version 1.1:
```
conda install -c pytorch pytorch-cpu
```
🔖 Version 1.2:
```
conda install -c pytorch pytorch cpuonly
```
Conda nightlies now live in the pytorch-nightly channel and no longer have “-nightly” in their name. We have added a new dedicated channel for nightlies called pytorch-nightly; all nightlies (pytorch, torchvision, torchaudio, etc.) will now be uploaded to this channel, but with the same name as their corresponding stable versions (unlike before, when we had a separate pytorch-nightly, torchvision-nightly, etc. packages.) This makes it more difficult to accidentally install a copy of the nightly and stable at the same time.

🔖 Version 1.1:
```
conda install -c pytorch pytorch-nightly
```
🔖 Version 1.2:
```
conda install -c pytorch-nightly pytorch
```
Wheel nightlies no longer have -nightly in their name. Similar to the changes we made in Conda, we no longer suffix wheel nightlies with “-nightly”, to make it harder to accidentally install a copy of nightly and stable at the same time.

🔖 Version 1.1:
```
pip install --pre torch_nightly -f https://download.pytorch.org/whl/nightly/torch_nightly.html
```
🔖 Version 1.2:
```
pip install --pre torch -f https://download.pytorch.org/whl/nightly/torch_nightly.html
```
🆕 New Features

👍 Tensor Type Support
- 💥 torch.bool: added support for many operators (masking, comparison, arithmetic operators) to achieve feature parity with torch.uint8. See the Breaking Changes section for details about how this could affect existing programs. (21032, etc.)
- 📜 torch.sparse.HalfTensor: Added support for torch.float16 sparse Tensors on both CPU and CUDA. (19695)
- torch.bfloat16: Added basic creation and serialization support for Brain Floating Point Tensors. (21522, 21523, 21860, 22852)
📦 NN Package
- nn.Transformer: added implementation of Transformer from Attention is All You Need. (20170, 22588)
- 👍 nn.Embedding: support float16 embeddings on CUDA. (19695)
- nn.Flatten: added a Module that performs torch.flatten. (22245)
- 👍 nn.functional.gelu: Added support for Gaussian Error Linear Units. (20665, 21237)
- nn.Module hooks: add ability to replace input/output via forward_pre_hook and forward_hook. (22285)
- nn.Module: add requires_grad_()method for turning on/off requires_grad for Module parameters. (22576)
Operators
- 📜 Tensor.to_sparse: now supports autograd. (20458)
- Tensor.fill_diagonal_: operator to fill the main diagonal of a Tensor. (21892)
- 👍 torch.qr: supports autograd. (21274)
- torch.bitwise_not: add operator for boolean/integer types. Also have python ~ operator use this. (22283, 22320)
- 📄 torch.trapz: integrate using the trapezoid rule; equivalent to numpy.trapz. (21610)
- torch.var_mean / torch.std_mean: compute variance and mean at the same time.(18731)
- torch.utils.ThroughputBenchmark: benchmark utility for measuring the throughput of PyTorch operators. (20766).
- 🌲 Logging: lightweight at-most-once logging to record operators that are used (c10::Logging). (20745)
📦 Optim Package
- ⚡️ optim.AdamW: introduce AdamW optimizer from Decoupled Weight Decay Regularization. (21250)
- 👍 optim.LBFGS: added support for strong Wolfe line search. (8824)
📦 Distributed Package
- 👍 DistributedDataParallel: support CPU modules. (20236)
- 📜 DistributedDataParallel: support sparse tensors. (19146)
- 👍 DistributedDataParallel: support local gradient accumulation. (21736)
IterableDataset
- IterableDataset: introduces a new type of Dataset designed for data read from a stream. (19228)
📦 Tensorboard Package
- 👍 TensorBoard support in PyTorch has improved and is no longer experimental!
- 👍 SummaryWriter.flush: now supported. (20607)
- 👍 SummaryWriter.add_mesh: add support for 3D point clouds. (20413)
JIT Features
- 👌 Improved support for iterator infrastructure. TorchScript now supports looping through a List, Tuple, Dict, Tensor, String and you can also use zip(), enumerate(), and for...in. (21801, 22006, 21990, 21985)
- 👌 Support in membership checks. (21527)
- 👌 Improved support for strings and the string libraries. (20826, 20188, 20761, 21656, 20617)
- 👌 Improved math support. (20979, 19707, 21151, 21131, 21129, 21130, 21512, 21126, 21127, 21128)
- 👌 Support for various other Python builtin functions. (21451)
- 👌 Support for NamedTuple. (21428)
- All the rest of the dict methods. (21979)
- sorted() keyword for lists and dicts. (23274)
- ➕ Add support for breaks and continues. (21692)
- 👌 Improved custom operator API with several bugfixes and new features. It now allows more primitive types, supports torch::List, torch::Dict and torch::Optional, supports dispatch (i.e. registering a different function for CPU and CUDA for the same operator).
- 👌 Support nn.GRU in script. (23266)
- Support pack_padded_sequence and pad_packed_sequence. (23249)
- Support torch._C._get_tracing_state in TorchScript. (23248)
- 👌 Support torch.as_tensor in TorchScript. (23247)
- ➕ add support for recursive compilation on Modules. (20708)
- ➕ add all builtin. (20521)
- Add Final[T] annotated members to __constants__. (21603)
- ➕ Add save() to scripted Functions. (20386)
- 👌 Support for serializing class attributes. (22953)
- 👌 Support for class annotations. (21379)
- 👌 support Python 3.8 Constant node. (22007)
- 👌 Support for type annotations instead of torch.jit.annotate(). (21390)
- 👌 Support operator overloading for user-defined classes. (20033)
- 👌 Support recursive ModuleList / Sequential. (21306)
- Trace multiple methods in a single Module. (19905)
👌 Improvements
- Tensor.pin_memory(): only ask for context on current device. (22229)
- Tensor.view(): suggest using reshape() instead of contiguous() when the input is non-contiguous. (20968)
- 👍 Tensor.numpy(): throw TypeError instead of ValueError if the type isn’t supported. (21608)
- 👍 torch.norm: add support for p="nuc" with dim specified. (21022)
- 👍 torch.qr: support batching of input matrices. (20689)
- 👍 torch.qr: support some parameter akin to NumPy's mode option. (20689)
- 👍 torch.det / torch.logdet / torch.slogdet: added batching support. (22909)
- 👍 torch.cdist: support batching. (20934)
- 👍 torch.symeig: support batching. (21858)
- torch._dirichlet_grad: support CUDA. (21191)
- 👍 torch.randperm: support torch.float16. (22102)
- torch.Size is now pickle-able in Python2. (20952)
- 👍 torch.tensor / torch.as_tensor: infer device if input supports Numba’s __cuda_array_interface__. (20584)
- torch.isinf / torch.isfinite: throw TypeError instead of ValueError when a non-tensor is passed in. (20817)
- 👍 nn.MultiheadedAttention: add functional support. (20415)
- 👍 nn.MultiheadedAttention: added support for key/value to have different number of features. (21288)
- nn.MultiheadAttention: allow static key/values. (21288)
- 👍 nn.Conv{1,2,3}D: support torch.int64 dtype in forward. (20730, 22594)
- 👍 nn.AvgPool{1,2,3}D: support torch.int64 dtype in forward. (22433)
- 💾 nn.Module: make _save_to_state_dict overrideable. (21933)
- autograd: Checkpointing of modules inside large fanout networks no longer hits a recursion error. (22397)
- autograd: Track in-pace changes of Tensors through Module._apply (internal API). (21865)
- 👍 autograd.profiler: Add shape aggregation support. 20035)
- autograd.profiler: Profile custom c10 ops. (20175)
- 👍 DataLoader: support setting batch_size=0 to disable automatic batching (collation) in DataLoader for easier bulk loading. (19228)
- DataLoader: add multiprocessing_context parameter. (22990)
- DataLoader: added error detection for worker_init_fn. (20150)
- DataLoader: Retry on EINTR. (21723)
- torch.cuda.set_rng_state / torch.cuda.get_rng_state: accept string as device parameter. (23448)
- ⚠ CUDA: add warning when using Turing GPUs and CUDA <= 9000. (21468)
- CUDA: warn on conditions that can trigger a cuBLAS 9.0 bug. (22034)
- CPU: Improve CPUAllocator OOM message. (20618)
- 👍 [memory_format]: added support for torch.empty, torch.empty_like, Tensor.contiguous(), Tensor.is_contiguous() to specify / check the order in which dimensions are laid out in memory. (20455, 20558)
- distributions.MultivariateNormal: fix precision matrix instability. (21366)
- distributions.transforms.SigmoidTransform: fix numerical instability. (19802)
Distributed Improvements
- 👍 DistributedDataParallel: Support DDP forward/backward calls even if no module parameter is used. (19821)
- DistributedDataParallel: Only call into reducer if grad is enabled. (19897)
- 🚚 DistributedDataParallel: Require finalize DDP backward only when there are indeed gradients computed, this allows application to completely discard DDP outputs and move on to the next iteration. (19901)
- DistributedDataParallel: Improve DDP backward reduction error messages. (20586)
- DistributedDataParallel: make DDP failure recoverable. (21591)
- DistributedDataParallel: Delay reduction of unused parameters until first autograd hook is called. (22219)
- 👍 c10d: support tensors shared across processes. (21449)
- c10d: ProcessGroupMPI Add device guard around MPI operations. (22446)
- utils.data.distributed.DistributedSampler: Make shuffling optional. (22479)
Tensorboard Improvements
- 🚚 Usage of kwarg-only arguments has been removed. (21786)
Numpy Compatibility Improvements
- 👍 Tensor.T: added numpy-like support for reversing dimensions. (20598)
- Tensor.ndim: NumPy equivalent property for the number of dimensions. (20565)
- 0️⃣ Tensor.nonzero: added as_tuple argument (default False) that when True, will return a tuple of Tensors, which matches the behavior of numpy.nonzero. (20293)
- 👍 torch.dtype: support passing in NumPy dtypes as arguments. (21215)
- torch.normal: add size parameter when called with two floats. (20545)
- torch.where: add one-argument overload that is an alias for Numpy-like nonzero. (21986)
- 👌 support a number of argument name overrides, e.g. axis instead of dim. (20451)
JIT Improvements
- 🖨 The original source code debug information is now saved with the model. If a model is saved and then loaded into another process, the loaded process can now print out error messages that point to the original source code. (22177, 22178, 22179, 22180)
- Error message source range highlighting now includes filename, line number, and column number. (21157)
- 👍 Better Constant Propagation through Tuples. (22561)
- ➕ Add start and step parameters for range in TorchScript. (20795)
- Support for threading options for TorchScript inference (doc)
- ➕ Add max_pool2d to symbolic derivatives. (19661)
- ⚡️ Optimize matmul memory usage for certain cases. (23433)
- Avoid kernel launches for zero-sized tensor inputs. (22790)
- ➕ Add support for steps (strides) in tensor slices. (20929)
- Added error for classes that don't have an __init__ function. (21880)
- 👍 Allow classes to be used in their own methods. (20106)
- 👍 Better error message when a variable is conditionally defined. (20911)
- Consider contained types in alias analysis. (21431)
- Convenience APIs for script objects. (20226)
- 🖨 Don't print backtrace for interpreter errors. (20925)
- 👌 Improve error msg for missing attribute. (20779)
- 👌 Improve error msg on inferred type. (21058)
- 👌 Improve error msg on recursive class defs. (21842)
- Include module names in recursive error stacks. (22921)
- 👌 Improve recursive scripting error message. (21841)
- Index into a tuple with non constant integer. (20081)
- Let ScriptModule buffer attributes can also cast device/type. (19700)
- Lower batchmm to non-diff optimization. (19987)
- 👉 Make ScriptModule.training an attribute instead of a parameter. (21078)
- 👉 Make strtod_c compatible with different gcc abi. (21293)
- 👉 make magic methods work with casts too. (20654)
- 👌 Improve performance of alias analysis. (20899)
- ⚠ Print a warning if a type annotation prefix is invalid according to mypy. (20884)
- schema_matching.cpp: improve error messages. (21141)
- Resolve with closed over variables instead of stack frame. (22270)
- Report errors through call stack. (22280)
- ⬇️ Reduce number of stack manipulation instructions in interpreter. (21240)
C++ API Improvements
- 👍 nn::PoissonNLLLoss: Added support. (19316)
- nn::Module: added replace_module API to overwrite submodules in C++ Frontend. (22546)
- nn:Module::register_module / register_parameter / register_buffer: make public (23196)
- data::datasets::ChunkDataReader: fix include headers and a vector issue. (19485)
- data::datasets::ChunkDataset: add new get_batch method. (21797)
- 👍 data::datasets::ChunkDataset: add checkpoint support. (21889)
- 👍 data::datasets::ChunkDataset: add support for cross-chunk shuffling. (22347)
- data::datasets::ChunkDataset: add sorting policy. (23053)
MKLDNN Tensor Improvements

➕ Add support for a number of operators on MKLDNN Tensors including:
- Tensor.is_mkldnn: (22386)
- Tensor.transpose(): (21943)
- Tensor.zero_(): (20573)
- torch.empty: (21184)
- torch.mul: (20575)
- nn.AdaptiveAvgPool{1,2,3}D: (19818)
- nn.Sigmoid: (20820)
- nn.Softmax: (21516)
- 👍 nn.Module: support saving/loading MKLDNN modules. (20799)
- 👍 nn.MaxPool{1,2,3}D: support ceil_mode. (21310)
🐛 Bug Fixes
- Indexing: fix advanced indexing where there are more than (2^31)-1 bytes in the output. (20919)
- Indexing: fix indexing when there are more than 65535 elements in a non-indexing first dimension on CUDA. (23123)
- Indexing: fix issue with slicing empty tensors. (20914)
- Tensor.index_copy_: fix segfault by properly checking dimension is in range. (21617)
- Tensor.copy_: Fix a bug where non-blocking was not being respected. (20305)
- 👯 Tensor.clone: Fix an issue with MKLDNN tensors. (20943)
- Tensor subclassing: give a proper error instead of crashing. (20283)
- torch.cat: Fix segfault with tensors that can't be indexed with 32-bit ints. (21530)
- 🔊 torch.range / torch.linspace / torch.logspace: properly respect the current Stream. (21619)
- torch.lu: return the identity permutation instead of zeros when not using pivoting. (22242)
- torch.einsum: Fix an issue where the backward pass would potentially be skipped. (22111)
- torch.cosh: Fix an issue where torch.cos was instead calculated with torch.double dtype and vectorized instructions. (20797)
- torch.triu / torch.tril: handle strides correctly for in-place versions. (22730).
- torch.triu / torch.tril: Fix handling of batches > 65535 on CUDA. (21067)
- torch.inverse / torch.solve / torch.cholesky_solve / torch.triangular_solve: Fix batch sizes > 65535 on CUDA. (21689)
- torch.histc: return dtype is now the same as the input tensor on CUDA, matching CPU behavior. (20369)
- torch.histc: properly return 1-dim tensor on CPU with 0-dim input and 1 bin. (21497)
- torch.randperm: handle non-contiguous out parameter. (23043)
- torch.unique: Fix empty tensor handling when dim is passed as an argument. (19000)
- torch.min / torch.max: properly error on empty tensor inputs, as with CPU tensors. (19612).
- CUDA: fix launch parameters for reductions. (22827).
- torch.hub: fix an issue with find_module. (20782)
- autograd: Fix a number of custom autograd Function corner cases by inverting the relationship between PyFunction and THPFunction. (22983)
- autograd: give “Trying to backward through the graph a second time" error instead of internal assert when the buffers are a list of Tensors (with indexing). (21533)
- ⏱ optim.lr_scheduler.CosineAnnealingLR: rename from CosineAnnealingLr. (23242)
- distributions.Binomial: Fix overflow of log_prob when logits is large. (20679)
- distributions.SigmoidTransform: Fix numerical issues that could result in inf / -inf return values. (20288)
- distributions.Categorical.sample: fix a view bug. (23328)
- CUDA: Give proper error message for bad cuda forks. (23322)
- pickle: Fix Unpickling error when loading multiple objects from a file. (20270)
- NCCL: Fix race condition. (23040)
🛠 torch.nn Bug Fixes
- nn.Conv{1,2,3}D: fix memory leak on MKLDNN code path. (22392)
- nn.Conv{1,2,3}D: properly unpickle older pickled versions. (21687)
- nn.CTCLoss: fix backward on CUDA when 2d target tensor is larger than max_target_length. (20971)
- nn.CTCLoss: fix some numerical stability issues. (21392)
- nn.CTCLoss: disable buggy non-deterministic CudNN algorithm. (22977)
- 🛠 nn.CTCLoss: fixed empty target handling. (21910, 23298)
- 🔀 nn.SyncBatchNorm: fix syncing of running statistics when count size differs between GPUs. (22248)
- 🔀 nn.SyncBatchNorm: retain requires_grad value when converting from nn.BatchNorm. (22569)
- nn.SyncBatchNorm: correctly handle process_group in convert_sync_batchnorm. (19240)
- nn.MultiheadedAttention: fix for torch.float16 dtype. (21658).
- nn.EmbeddingBag: fix NaN output when input is empty. (21400)
- nn.Dropout: fix python crash (with SIGFPE) when called on an empty cuda tensor. (20541)
- nn.MaxPool: fix output size calculation in some corner cases. (22304)
- nn.MaxPool: return valid indices if all entries are -inf. (23161)
- nn.Softmax: respect the current Stream. (22470)
- 🔊 nn.LogSoftmax: fix numerical stability issues. (21672)
- nn.Module.load_state_dict: break ref cycle. (20397)
- nn.Module: fix loading in 32-bit environments. (20900)
- nn.utils.rnn.pack_padded_sequence: Fix segfault on empty tensors. (21461)
- nn.utils.spectral_norm: fix loading state_dict when strict=False. (22545)
- 🏁 CudNN: Fix uninitialized PoolWindow on Windows. (22405)
🛠 Distributed Bug fixes
- nn.parallel.DataParallel: fix error in no_grad mode. (21262)
- torch.distributed.all_gather: fix errors for views and aliases. (21490)
- c10d: fix collective communication errors on empty tensors. (20658)
🛠 JIT Bug Fixes
- 🛠 Fix specialized list from dict keys. (23267)
- Switch keys to be sequential and stable in pickle serialization. (23280)
- deepCopy also copies type information of lists, (23271)
- dictKeys and dictItems ops on typed dicts return typed lists. (23270)
- 🛠 Fix pickler bug where it would not load if no tensors were saved. (23263)
- Avoid multiple writes to files on export. (21186)
- 👍 Better error msg for mismatched dict key type. (22231)
- Better error msg for using Python builtin_function_or_method. (22935)
- Better error msg in __get_state__ to let a user know that ScriptModules can't be deep-copied at the moment.(20885)
- 👍 Better error msg when seeing a unsupported builtin function. (21068)
- dropout derivative should respect the train flag. (20760)
- Fix __constants__ for some nn modules. (21071)
- Fix ScriptModule. __dir__ (). (22426)
- 🛠 Fix 3x DenseNet compile time regression by restoring earlier-out tests in AliasDB::writesToAlias. (21425)
- 🛠 Fix a bug in loop unrolling. (21239)
- 🛠 Fix alias annotations for dict ops. (22900)
- 🛠 Fix inaccurate SourceRange reporting. (21109)
- 🛠 Fix broken indexing when using None and ellipses indexing together. (22905)
- 🛠 Fix bug in CompilationUnit::define. (21886)
- 🛠 Fix compilation order for class methods. (20094)
- 🛠 Fix dead code elimination over loops. (22632)
- 🛠 Fix dead code elimination in onnx export. (22476)
- 🛠 Fix incorrect default on Graph::toString. (21370)
- 🛠 Fix optional type promotion for classes. (21593)
- 🛠 Fix optional type unification. (19813)
- 🛠 Fix NameError with PYTORCH_JIT=0. (20120)
- 🛠 Fix overspecializing constants in compilation. (22816)
- 🛠 Fix pow() bug on overloads. (20824)
- 🛠 Fix recusive method compilation. (21862)
- 🛠 Fix reflection on weak modules, copy attributes. (20190)
- 🛠 Fix slow unpickling. (21542)
- 🛠 Fix input/output type mismatch. (20829)
- 🛠 Fix insert_guard for norm decomposation. (19646)
- 🛠 Fix Trace inlining of graphs with optional inputs. (22686)
- 🛠 Fix tracing bugs where using 1 - x in C++ would cause the size of 1 to get hardcoded. (20932)
- 🛠 Fix tuple indexing bug. (21521)
- 🛠 Fix type hints for None constants. (23029)
- Fix weak module cuda() _flat_weights bug. (21107)
- 🛠 Fix WeakIValueEq. (21891)
- 🛠 Fixed gcd to use 64 bit integers. (21041)
- 🛠 Fixed list() not making a copy. (22093)
- 🛠 Fix race condition on Module::forward method. (21398)
- Made a += b for lists do an in place add. (21896)
- Made floor/ceil return ints. (21124)
- Out-of-memory on GPU due to the "weak_script" decorators. (20588)
- 🖨 Override print when python is present. (21625)
- Set __file__ for torch.ops. (21888)
- Set correct list type in pybind_utils. (23188)
🛠 C++ Frontend bug fixes
- nn::RNN: Fix assertions in bidirectional RNN. (22850).
- nn::MaxPool / nn::AvgPool: expand incomplete kernel size, as in Python. (22073, 22075)
- Optim: Fix memory leak when weight_decay is applied to Adam, Adagrad, RMSProp. (23125)
- Optim::SGD: fix memory leak with weight_decay. (23007)
- torch::autograd::Scatter / torch::autograd::Gather: Fix nullptr bug. (20286)
- torch::nn::parallel::data_parallel: fix gradient computation error. (20910)
- 🏗 [C++ Extensions] Fix an issue when building multiple extensions in the same directory. (20221)
🗄 Deprecations

🗄 Masking via torch.uint8 Tensors is now deprecated in favor of masking via torch.bool Tensors.

💥 See the Breaking Changes section for more details about torch.bool Tensors and comparison operators.

torch.masked_select, torch.masked_fill, torch.masked_scatter now expect torch.bool masks rather than torch.uint8.
```
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])

>>> a.masked_select(tensor([0, 1, 1], dtype=torch.uint8))
UserWarning: masked_select received a mask with dtype torch.uint8,
this behavior is now deprecated, please use a mask with dtype torch.bool instead.

tensor([2, 3])

# instead use torch.bool
>>> a.masked_select(tensor([False, True, True]))
tensor([2, 3])
```
Comparison operators with out= parameters now expect torch.bool dtype rather than torch.uint8.
```
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])
>>> res = torch.empty_like(a, dtype=torch.uint8)
>>> torch.gt(a, b, out=res)
UserWarning: torch.gt received 'out' parameter with dtype torch.uint8, this behavior
is now deprecated, please use 'out' parameter with dtype torch.bool instead.

tensor([0, 1, 1], dtype=torch.uint8)

# instead use torch.bool
>>> res = torch.empty_like(a, dtype=torch.bool)
>>> torch.gt(a, b, out=res)
tensor([False, True, True])
```
🗄 Legacy autograd.Function (Function without static forward method) is now deprecated
```
>>> class MyLegacyFunction(Function):
>>> def forward(self, x):
>>> return x
>>>
>>> def backward(self, grad_output):
>>> return grad_output
>>>
>>> MyLegacyFunction()(torch.randn((3,), requires_grad=True)
UserWarning: Legacy autograd function with non-static forward method is deprecated
and will be removed in 1.3. Please use new-style autograd function
with static forward method.

# instead use new-style Autograd Function
>>> class MyFunction(Function):
>>> @staticmethod
>>> def forward(ctx, x):
>>> return x
>>>
>>> @staticmethod
>>> def backward(ctx, grad_output):
>>> return grad_output
>>>
>>> MyFunction.apply(torch.randn((3,), requires_grad=True)
```
📚 See the torch.autograd.Function documentation for more details.

🚀 torch.gels: has been renamed to torch.lstsq; torch.gels will work for this release but is now deprecated. (23460)

🐎 Performance
- 🐎 Advanced Indexing: significantly improve performance of advanced indexing backward. (20557)
- 🐎 Tensor.copy_: increase broadcasting CUDA copy performance by 25%. (20685)
- ⚡️ torch.matmul: Optimize the case A.ndim <= 2 && B.ndim >= 3, shows up to 15x speed up. (20448)
- 🐎 torch.bmm: Improve performance by up to 3x for small cases on CPU by applying TensorAccessor. (20266)
- 🐎 torch.inverse: Move workspace query and allocation outside loop to improve performance by up to 5x. (20904)
- ⚡️ torch.topk: Optimize CPU perf using parallel and partial sort, up to 6x improvement. (22865)
- torch.cdist: Improve CPU perf by up to 10x for some cases. (20605)
- torch.normal: Move normal, normal_means, normal_stddevs, and normal_means_stddevs to ATen, increasing performance by up to 3x. (21287)
- torch.bernoulli: Speedup bernoulli_scalar_cuda_kernel with grid-stride loop, increasing performance by up to 2x. (21300)
- 📜 torch.coalesce: Use _sparse_coo_tensor_unsafe in coalesce for up to 10x speedup. (21214)
- torch.sinh / torch.cosh: Parallelize and vectorize on CPU. (21115)
- torch.lerp: Vectorize on CPU. (22038)
- torch.eye: Parallelize on CPU. (21077)
- torch.randperm: Parallelize initialization in randperm on CPU. (21529)
- Vectorization: Don't split 256-bit AVX2 load/store intrinsics. (20609).
🐎 Torch.NN Performance Improvements
- 🐎 nn.Softmax: Add persistent CUDA kernels that increase performance 2-10x on small inputs. (20827)
- 🐎 nn.Embedding / nn.EmbeddingBag: Optimize CUDA kernel, increasing performance up to 2.7x. (22016)
- ⚡️ nn.Linear: optimize BERT model perf by using mkldnn inner product. (21851)
- nn.Conv{1,2,3}D: improve perf for depthwise convolutions in torch.float16 on Volta and Turing GPUs. (22302)
- ⚡️ nn.RNN: optimize on CPU by fusing matmul ops. (22512)
- nn.Upsample: a number of significant perf improvements on CUDA. (21879, 21694).
- ⚡️ nn.functional.layer_norm: optimize a fast path for layer_norm, increasing perf by up to 4x on CPU. (20345, 20883)
- 👉 Use mkldnn inner product for nn.Linear() to improve BERT perf. (21851).
📚 Documentation
- torch.bool: doc the Boolean tensor type. (21601)
- 📄 torch.as_strided: add docs. (22842)
- 📄 torch.empty_strided: add docs. (23740)
- torch.lerp: clarify broadcasting requirements. (23268)
- torch.enable_grad / torch.no_grad / torch.set_grad_enable: clarify interaction between these features. (23310)
- torch.autograd.grad_mode: Document that no_grad is thread local. (21755)
- torch.multiprocessing: Explain refcounting of CUDA tensors. (19904)
- ⚠ torch.Tensor: Add a warning about memory usage. (20801)
- torch.utils.data.Dataloader: Document RNG state consumption. (22540)
- ⏱ torch.optim.lr_scheduler.CyclicLR: Clarify base_momentum and max_momentum. (20880).
- Document production environment features. (23010)
- ➕ Add note about contributing recently released research. (23513)
- 🐎 Clarify performance implications of deterministic mode. (21337)
- ⚡️ Update cuda pinned memory note to include tensor.to. (20977)
📚 Torch.NN Documentation
- 📄 nn.functional / nn.init: Break up NN in docs so they load faster. (21291)
- 🚚 nn.functional.conv{1,2,3}d: Remove padding_mode. (20891)
- nn.functional.upsample / nn.functional.interpolate: add note about overshooting with mode=‘bicubic’. (23321)
- nn.init.zeros_ / nn.init.ones_: add documentation. (23145)
- nn.MultiheadAttention: Add documentation for add_bias_kv, add_zero_attn, and attn_mask. (20071)
- 📚 nn.MultiheadAttention: Fix documentation for attention mask shape. (20850)
- nn.Softmax: Fixed to specify dimension to prevent warning in 1.1.0. (20310)
📚 Contributor Documentation
- 📚 Updated web links on contribution_guide and governance documentation. (21243)
- 👌 Improve documentation for publishing hub models. (21307)
- Suggest a faster linker in the contributing guide. (21334)
- ➕ Add CUDA C++11 and profiling notes to the contribution guide. (21386)
📚 Build Documentation
- ➕ Add magma for CUDA 10.1 to Windows docs. (19914)
- 👌 Improve build-from-source instructions. (20088)
- ➕ Add ninja to build instructions. (20079)
- ⚡️ Update libtorch build docs. (21150)
📚 TensorBoard Documentation
- 📚 Tensorboard Documentation has been greatly improved! Browse the latest version here.
📚 Torch HUB Documentation
- 👌 Improve docs for publishing hub models. (21307)
- ⚡️ Update docs of entry point in hub. (21568)
ONNX

👍 In PyTorch 1.2, we have added the full support for ONNX Opset 7, 8, 9 and 10 in ONNX exporter, and we have also enhanced the constant folding pass to support Opset 10. The export of ScriptModule has better support. Additionally, users now are able to register their own symbolic to export custom ops, and specify the dynamic dimensions of inputs during export.

👌 Supporting More ONNX Opsets
- ➕ Add basic supports for multiple ONNX Opsets and support for Opset 10. (19294)
- 👌 Support ONNX Opset 7 and 8 in PyTorch ONNX Exporter. (22421, 20036)
- Export Dropout for Opset 10. (20710)
- Export Slice and Flip for Opset 10. (20533)
- Export Interpolate (Resize) for Opset 10. (21434)
👍 Enhancing the Support for ScriptModule
- 👌 Support multiple outputs in ScriptModule in ONNX Exporter. (20256)
- 👌 Support tensor factories in ScriptModule in ONNX Exporter. (20255)
- 👌 Support tuples as inputs and outputs in ScriptModule. (20784)
Exporting More Torch Operators to ONNX
- Export custom ops. (21321)
- Export torch.arange. (22601)
- Export torch.masked_fill. (22521)
- Export torch.floor, torch.ceil, torch.log2 and prim::shape. (17895)
- Export torch._dim_arange. (20078)
- Export torch.randn_like. (20093)
- Export torch._standard_gamma. (20126)
- Export torch.topk. (21104)
- Export __and__, __or__. (17894)
- Export torch.sign. (20470)
- Export torch.scatter. (18543)
- Export torch.rand. (20559)
- Export torch.gather. (21235)
- Export torch.cosine_similarity. (21884)
- Export torch.sum. (22240)
- 🔊 Export torch.logsumexp. (22306)
- Export torch.layer_norm. (22265)
Extending Existing Exporting Logic
- 👌 Support torch.min and torch.max with dim. (19689)
- 👌 Support maxpool with dilations. (18721)
- 👌 Support RNN with batch_first=True. (19766)
- 👌 Support Upsample with dynamic input. (20116)
- 👌 Improve support for Loop export. (20445)
- Enable torch.full with scalar parameters. (21931)
- ➕ Added support for exporting models with variable length input/output to ONNX. (20034)
⚡️ Optimizing Exported ONNX Graph
- 👌 Support constant folding in Opset 10. (22515)
- 👌 Support negative indexing for Slice in constant folding optimization. (21811)
🛠 Bugfixes/Improvements
- 🛠 Fix the shape of PReLU weight. (21330)
- 🛠 Fix the export for torch.pixel_shuffle. (21486)
- 🛠 Fix the export for torch.full. (21669)
- ⚡️ Update logic for folding onnx::Constant nodes. (20109)
v1.2.0.a0
May 24, 2019
v1.1.0 Changes
May 01, 2019
👍 Note: CUDA 8.0 is no longer supported

Highlights

TensorBoard (currently experimental)

🌐 First-class and native support for visualization and model debugging with TensorBoard, a web application suite for inspecting and understanding training runs, tensors, and graphs. PyTorch now supports TensorBoard logging with a simple from torch.utils.tensorboard import SummaryWriter command. Histograms, embeddings, scalars, images, text, graphs, and more can be visualized across training runs. TensorBoard support is currently experimental. You can browse the docs here.

[JIT] Attributes in ScriptModules

👀 Attributes can be assigned on a ScriptModule by wrapping them with torch.jit.Attribute and specifying the type. Attributes are similar to parameters or buffers, but can be of any type. They will be serialized along with any paramters/buffers when you call torch.jit.save(), so they are a great way to store arbitrary state in your model. See the docs for more info.

Example:
```
class Foo(torch.jit.ScriptModule):
  def __init__ (self, a_dict):
    super(Foo, self). __init__ (False)
    self.words = torch.jit.Attribute([], List[str])
    self.some_dict = torch.jit.Attribute(a_dict, Dict[str, int])

  @torch.jit.script_method
  def forward(self, input: str) -> int:
    self.words.append(input)
    return self.some_dict[input]
```
👍 [JIT] Dictionary and List Support in TorchScript

👍 TorchScript now has robust support for list and dictionary types. They behave much like Python lists and dictionaries, supporting most built-in methods, as well as simple comprehensions and for…in constructs.

[JIT] User-defined classes in TorchScript (experimental)

👀 For more complex stateful operations, TorchScript now supports annotating a class with @torch.jit.script. Classes used this way can be JIT-compiled and loaded in C++ like other TorchScript modules. See the docs for more info.
```
@torch.jit.script
class Pair:
    def __init__ (self, first, second)
        self.first = first
        self.second = second

    def sum(self):
        return self.first + self.second
```
DistributedDataParallel new functionality and tutorials

nn.parallel.DistributedDataParallel: can now wrap multi-GPU modules, which enables use cases such as model parallel (tutorial) on one server and data parallel (tutorial) across servers.
(19271).

💥 Breaking Changes
- Tensor.set_: the device of a Tensor can no longer be changed via Tensor.set_. This would most commonly happen when setting up a Tensor with the default CUDA device and later swapping in a Storage on a different CUDA device. Instead, set up the Tensor on the correct device from the beginning. (18832).
- ⏱ Pay attention to the order change of lr_scheduler.step(). (7889).
- 0️⃣ torch.unique: changed the default value of sorted to True. (15379).
- [JIT] Rename isTensor api -> isCompleteTensor. #18437
- [JIT] Remove GraphExecutor's python bindings. #19141
- [C++]: many methods on Type no longer exist; use the functional or Tensor method equivalent. (17991).
- [C++]: the Backend constructor of TensorOptions no longer exists. (18137).
- [C++, Distributed]: Remove c10d ProcessGroup::getGroupRank has been removed. (19147).
🆕 New Features

Operators
- torch.tril_indices, torch.triu_indices: added operator with same behavior as NumPy. (14904, 15203).
- torch.combinations, torch.cartesian_prod: added new itertools-like operators. (9393).
- torch.repeat_interleave: new operator similar to numpy.repeat. (18395).
- torch.from_file: new operator similar to Storage.from_file, but returning a tensor. (18688).
- torch.unique_consecutive: new operator with semantics similar to std::unique in C++. (19060).
- 👍 torch.tril, torch.triu, torch.trtrs: now support batching. (15257, 18025).
- 📜 torch.gather: add support for sparse_grad option. (17182).
- torch.std, torch.max_values, torch.min_values, torch.logsumexp can now operate over multiple dimensions at once. (14535, 15892, 16475).
- torch.cdist: added operator equivalent to scipy.spatial.distance.cdist. (16168, 17173).
- torch. __config__.show(): reports detailed version of all libraries. (18579).
NN
- nn.MultiheadedAttention: new module implementing MultiheadedAttention from Attention Is All You Need. (18334).
- 👍 nn.functional.interpolate: added support for bicubic. (9849).
- 🔀 nn.SyncBatchNorm: support synchronous Batch Normalization. (14267).
- 👍 nn.Conv: added support for Circular Padding via mode='circular'. (17240).
- nn.EmbeddingBag: now supports trainable `per_sample_weights. (18799).
- 👍 nn.EmbeddingBag: add support for from_pretrained method, as in nn.Embedding. (15273).
- RNNs: automatically handle unsorted variable-length sequences via enforce_sorted. (15225).
- nn.Identity: new module for easier model surgery. (19249).
Tensors / dtypes
- 👍 torch.bool: added support for torch.bool dtype and Tensors with that dtype (1-byte storage). NumPy conversion is supported, but operations are currently limited. (16810).
Optim
- ⏱ optim.lr_scheduler.CyclicLR: Support for Cyclical Learning Rate and Momentum. (18001).
- ⏱ optim.lr_scheduler.CosineAnnealingWarmRestarts: new scheduler: Stochastic Gradient Descent with Warm Restarts). (17226).
- 👌 Support multiple simultaneous LR schedulers. (14010)
Distributions
- 👍 torch.distributions: now support multiple inheritance. (16772).
Samplers
- quasirandom.SobolEngine: new sampler. (10505).
DistributedDataParallel
- 👍 nn.parallel.DistributedDataParallel: now supports modules with unused parameters (e.g. control flow, like adaptive softmax, etc). (18251, 18953).
TorchScript and Tracer
- 👍 Allow early returns from if-statements. (#154463)
- ➕ Add an @ignore annotation, which statically tells the TorchScript compiler to ignore the Python function. (#16055)
- Simple for...in loops on lists. (#16726)
- Ellipses (...) in Tensor indexing. (#17763)
- None in Tensor indexing. (#18615)
- 👌 Support for basic list comprehensions. (#17267)
- ➕ Add implicit unwrapping of optionals on if foo is not None. (#15587)
- Tensors, ints, and floats will once again be implicitly cast to bool if used in a conditional. (#18755).
- Implement to(), cpu(), and cuda() on ScriptModules. (#15340 , #15904)
- ➕ Add support for various methods on lists: (clear(), pop(), reverse(), copy() , extend(),index(), count(), insert(), remove() ).
- ➕ Add support for sort() on lists of specialized type (Tensors, int, float, bool). (#19572)
- ➕ Add support for various methods on strings: (index(), slice(), len())
- 👌 Support Tensor.to() in TorchScript. ( #15976 )
- 👌 Support for Torch.tensor() in TorchScript. (#14913, #19445)
- 👌 Support for torch.manual_seed() in TorchScript. (#19510)
- 👌 Support for nn.LSTM in TorchScript. (#15744)
- 👌 Support for nn.init in TorchScript. (#19640)
- ➕ Add hash() builtin. (#18258)
- ➕ Add min() and max() builtins for numerical types. (#15680)
- ➕ Add isinstance() builtin, which performs a static type check. (#15076)
- ➕ Add train() / eval() / is_training() to C++ ScriptModule API. (#16044)
- 👍 Allow List arguments to Python functions called from TorchScript. (#15721)
- 👍 Allow using std::vector and std::unordered_map as arguments to custom operators. (#17587)
- Tracer: now allows passing static dicts and lists as trace inputs. (#18092, #19580)
- 👍 Allow generic containers as ScriptModule inputs. (#16482)
- 👍 Allow nn.Sequential in ModuleList. (#16882)
Experimental Features
- [Quantization] (API unstable): added limited support for quantized datatypes via torch.qint8 dtype, torch.quantize_linear conversion function. (18230).
- [MKLDNN tensor] (API unstable): Added limited (opaque) support for MKLDNN tensors via Tensor.to_mkldnn(); operators are currently limited to ResNext101 operators. (17748).
👌 Improvements
- torch.min, torch.max, torch.median, torch.mode, torch.kthvalue, torch.symeig, torch.eig, torch.pstrf, torch.qr, torch.geqrf, torch.solve, torch.slogdet, torch.sort, torch.topk, torch.gels, torch.triangular_solve, torch.svd now return namedtuples describing their outputs. (16186, 16950, 17093, 17195, 15429).
- 📌 torch.empty (and other factory functions): now take a pin_memory kwarg; can now pin without going through torch.Storage interface.. (18455).
- 👍 torch.histc: Now supported on CUDA. (15842)
- torch.unique: Add return_counts. (18391, 18651).
- 🔊 torch.logspace: add the ability to specify a base. (19542).
- 🖨 torch.set_printoptions: added scientific notation support. (16876).
- torch.btrifact now handles tensors with greater than 3 dimensions. (14964).
- 👍 torch.kthvalue: now supported on CUDA. (17544).
- 👍 torch.abs: now supported on uint8 and int8 dtypes. (16893).
- 👍 torch.stack, torch.cat: now supported for CPU half tensors. (16389).
- 👍 torch.cross: added support for negative dimensions. (17582).
- 👍 torch.lerp: add support for weight as a Tensor. (17348).
- torch.transpose: Made consistent with NumPy: 1-d and 0-d arrays are accepted and returned as-is. (17462, 17535).
- 🔊 torch.linspace, torch.logspace can now be used with steps=1 and start != end. (14748).
- torch.cholesky: changed the derivative from a triangular matrix to symmetric matrix. (19116).
- torch.lerp: Improved numerical stability. (18871).
- torch.logdet, torch.slogdet: improve numerical precision. (18449).
- Tensor. __contains__ is now supported. (17733).
- 👍 Tensor.fill_ and torch.zeros now support half on CPU. (17536).
- Tensor.resize_as_, Tensor.view: now supported on half CPU tensors. (18821).
- Tensor indexing: allow indexing via NumPy booleans. (14932).
- nn.EmbeddingBag: enable half precision dense backward. (19293).
- nn.Embedding: fix dense Embedding to work with double backwards. (9078).
- nn.MaxPool1d: Allow list and tuples to be passed as output_size. (16489).
- 👍 nn.CTCLoss: support zeroing infinite losses via zero_infinity argument. (16199).
- 👍 nn.Dropout: add support for enabling during eval. (17549).
- ⚠ nn.MSELoss: add warning about unexpected broadcasting. (18349).
- nn.Module.load_state_dict: also return missing_keys and unexpected_keys. (18668).
- nn.parallel.data_parallel: Enforce devices match device_ids. (17129).
- torch.device: handle in more places that used to accept only device ordinals. (14929)
- dtype.int8 tensors can now be converted to NumPy arrays. (14710).
- nn.functional.gumbel_softmax: allow multidimensional input with dim argument. (13339).
- nn.functional.cosine_similarity: improved precision. (18250).
- torch.autograd: Don't keep unnecessary saved_inputs alive, increasing memory efficiency. (16583).
- torch.autograd.profiler: add Self (non-nested) CPU Time Total, CPU time total (19378).
- 📌 DataLoader: support accepting a custom memory pinning function. (16743).
- DataLoader: retry libshm on EINTR. (15964).
- 🛠 DataLoader: fixed an issue with pin_memory and PackedSequence. (18079)
- data.utils.collate, data.utils.pin_memory: now preserve namedtuples. (16440)
- 👉 Use IndexError instead of RuntimeError on many indexing error cases. (17049, 17114).
- 👌 Support indexing a torch.float16 tensor on CPU. (17645).
- ➕ Add (limited) error checking in case of internal overlap on inplace operators. (19317, 17927).
- 👍 utils.checkpoint.checkpoint: support None as an argument to checkpoint function. (17969).
- 👻 torch.autograd: added more information for one of the variables needed for gradient computation has been modified by an inplace operation exception. (18523).
- 🔀 cuda.synchronize: add a device argument. (19573).
- cuda.reset_max_memory_*: now supported. (15985).
- distributions.Independent: can now calculate KL Divergence. (17681).
- 0️⃣ torch.distributed.new_group: now supports overriding default backend. (18595).
- 🖨 torch.distributed.init_process_group: will now propagate timeout to underlying Store. (16571).
- [JIT] Preserve module hierarchy on traced modules. (#15101)
- [JIT] Add metadata for TracedModules. (#17311)
- [JIT] Improve portability of int and float checks. (#19532)
- [JIT] Preserve method parameter names during serialization. (#16750)
- [JIT] Add a correctness check for C++ types to custom operators. (#15247)
- [JIT] Added a few extra python bindings to help with walking the IR graph from Python. #17822
- [JIT Error Messages] Print out operator suggestions for "unknown builtin op" error. (#15183)
- [JIT Error Messages] Better error message when creating a module instance in TorchScript. (#16416)
- [JIT Error Messages] Print suggestion to add nn.Module attributes to __constants__ when they are using in TorchScript. (#18164)
- [JIT Error Messages] torch.save(): Improve error message when you try to save a ScriptModule. (#15321)
- [JIT Error Messages] torch.jit.save(): Improve error message when trying to save a model with Python code. (#16850)
- [JIT Error Messages] Better errors when trying to close over a Tensor with grad enabled while tracing. (#18298, #19645)
- [JIT Error Messages] Better error when trying to add a Tensor to __constants__. (#16724)
- [JIT Error Messages] Better error when a module list isn't added to __constants__. (#17167)
- [JIT Error Messages] Add a warning when attempting to trace legacy constructors. (#16770)
- [JIT Error Messages] Improve hint when trying to trace non-deterministic nodes. (#17957)
- [C++] nn::Module: added Python interop. (13481).
- [C++] autograd::profiler: is now supported. (16580)
- [C++] allow detection of C++ ABI flag for cpp extensions from available runtime information. (18994).
- [C++] torch.argsort is now supported in C++. (17099).
- [C++] Tensor.isnan: now supported in C++. (15722).
- [C++]: Added named submodule support to nn::Sequential. (17552).
- [C++]: Kaiming Initialization. (14718).
- [C++] torch::data::transforms::Normalize: now supported in C++. (15891).
- [C++]: Support call operator on module holder calling forward. (15831).
  Random and Sequential distributed samplers. (16910).
- [C++]: pretty printing of C++ Modules. (15326).
- [C++] Support serializing std::vector<torch::Tensor>. (19677).
🐛 Bug Fixes

Serious
- torch.prod: correct erroneous calculation on large tensors. (15653).
- torch.mean (and other reductions): fix incorrect calculation on CUDA on large inputs. (16023).
- nn.Conv: correctly handle non-contiguous inputs on MKLDNN convolution codepath. (16300).
- Tensor.eq_: Fix erroneous calculation. (15475).
- torch.mean: Fix fp16 output calculation. (14878).
- nn.PoissonNLLLoss: Properly handle reduction=None. (17358).
- [JIT] Fix bug where custom ops could get optimized out if their outputs weren't used. (#18711).
- [JIT] Fix bug where the model serializer would accidentally reorder statements. (#17557).
Other
- Tensor.round is now consistently half to even. (17443).
- Tensor.resize_: Fix some 0-element cases. (14874).
- Tensor.numpy: Fix conversion of torch.int8 dtype. (15194).
- Tensor.grad: correctly handle del. (16525).
- Tensor.clamp: correctly handle NaN on CUDA. (15479).
- Tensor.topk: properly set launch bounds on CUDA. (17296).
- Tensor.kthvalue: treat NaN as bigger than any number. (17824).
- 🔀 Tensor.copy_: Properly synchronize on src and dst sreams. (16966).
- Tensor indexing: Fix incorrect dimension error message. (16495).
- 📜 Tensor.coalesce, Tensor.clone, Tensor.to_dense: fixed for sparse 0-dimensional tensors. (17379).
- torch.isinf: Don't error out on integral tensors. (15489).
- torch.argsort, torch.sort: Match NumPy by considering NaNs to be larger than any number. (15886).
- torch.geqrf, torch.ormqr: when an out parameter is specified, dispatch to the correct function. (16964).
- torch.cuda.get_device_name / torch.cuda.get_device_capability: Fix handling of optional. (17222).
- Tensor.tril_ / Tensor.triu_: properly reuse input memory. (17031).
- torch.arange: fix shape inconsistency between CPU and CUDA. (18462).
- torch.empty (and other size-based factory functions): properly enforce non-negative sizes. (17077).
- 👍 torch.load: support serializing / deserializing pathlib.Path object. (18562).
- nn.BatchNorm: correctly handle very large batches. (17047).
- 🔊 nn.Softmax / nn.LogSoftmax: fix double backward for torch.half. (17330).
- nn.Softmax: handle empty inputs in backward. (17259).
- nn.NLLLoss: Fix crash when ignore_index is out-of-bounds on CPU. (17328).
- 🔊 nn.Softmax, nn.LogSoftmax: handle 0-element inputs. (17651).
- nn.CTCLoss: correct error checking. (16269).
- 👍 nn.Conv: better report convolution size mismatch. (17436).
- torch.nn.functional.cosine_similarity: fix output sometimes returning result > 1.0. (18168).
- nn.parallel.data_parallel: Fix handling of buffers that require_grad. (13352).
- nn.parallel.data_parallel: would previously sometimes frees tensors before all pending operations finish. (18465).
- 🛠 torch.distributed.broadcast: fixed repeated calls leading to OOM. (19219).
- torch.multiprocessing: fix serialization of integer nn.Parameters. (18639).
- torch.multiprocessing: Fix handling of distributions on CUDA. (16854).
- torch.nonzero: Fix for 0-dimensional tensors on CUDA. (17406).
- torch.slogdet: Fix sign requiring grad when input required grad. (16337).
- ⏪ torch.cuda.Stream: Properly restore stream on destination device when switching devices. (17439).
- 🔀 torch.cuda.Stream: Fixed synchronization issue when used with non-current device. (15689).
- torch.cuda.Stream: properly change device in stream context manager. (16128).
- 🛠 DataLoader: fixed a hang when no data was read and the buffer size is smaller than the chunk size. (17409).
- 0️⃣ DataLoader: _utils.collate.default_collate now converts bool lists to byte Tensors, not integer tensors.
  (14669).
- DataLoader: ensure dataset is indexed by integers. (17649).
- 📜 torch.sparse.mm: Handle transposed dense tensors in backwards. (18737).
- 📜 torch.sparse.sum: Fix parsing of dim. (16517).
- 📜 torch.sparse.mm / torch.sparse.addmm: fix broadcasting and using uninitialized data. (16572).
- 📜 Tensor.to_sparse: Fix for 0-dimensional tensors. (17406).
- 📜 SparseTensor: fix add with non-contiguous values tensors. (18179).
- Fix compare_exchange_weak in weak_intrusive_ptr. (16302).
- utils.model_zoo.load_url: Fix race condition. (16578).
- utils.data.RandomSampler: have len properly take into account num_samples. (15991).
- torch.distributions: Fix precision issue with expansion that prefers probs over logits. (18614).
- 🛠 distributions.dirichlet.Dirichlet: fixed an underflow issue. (17488).
- 🛠 distributions.binomial.Binomial.log_prob: fixed numerical stability issue. (15962).
- 🆓 Caching Allocator: Free all blocks with outstanding events on OOM-retry. (19222).
- torch.dtype: fix pickling issue with Python 2. (18045).
- utils.data.DataLoader: Fix SIGCHLD checking. (19421).
- ⚡️ optim.Optimizer: Properly copy defaults. (19308).
- ⏱ optim.lr_scheduler.CosineAnnealingLR: Fix division-by-zero error. (19180).
- ⏱ optim.lr_scheduler.ReduceLROnPlateau: fix bug when the argument to step is reused outside the function.
  (16697).
- cudNN: fix race condition with multiple threads calling into the same device. (15080).
- cudNN: Properly specify accumulation types. (16825).
- cuDNN: Fix incorrectly selecting slower algorithms in certain cases. (15881).
- cuFFT: Properly handle CUDA contexts. (19300)
- Fix infinite loop in reduction functions when get_max_threads is nonzero but num_threads is 1. (15114).
- 🛠 Fix tensor printing bug with Python 2. (12732).
- MKLDNN: fix thread safety. (17022).
- [JIT] floordiv: Fix integer division and divide-by-zero semantics. (#15813).
- [JIT] Fix bug in alias analysis that disabled optimizations even in models without mutation. (#18416).
- [JIT] ord(): Fix handling of utf8 chars. (#19423).
- [JIT] Fix error when too many parameters are passed to a fused CUDA kernel. (#18063).
- [JIT] Fix bug where common subexpression elimination accidentally introduced aliasing to function outputs. (#19576).
- [JIT] Fix infinite loop in requires_grad analysis pass. (#18361).
- [JIT] Fix ordering of parameters for in rnn.py. (#18198).
- [JIT]] Fix contiguous autodiff and AutoGradZero inconsistency (#18633).
- [JIT] Fix error reporting in NVRTC use of the fuser. (#18327).
- [JIT] Ensure GIL is acquired before doing module lookup on import. (#17135).
- [JIT] Fix bug where _unique_state_dict could contain duplicate Tensors. (#18139).
- [C++]: Fix module serialization issue where one submodule doesn't have any parameters, but its submodules do. (15033).
- [C++]: Add Stream and Event APIs. (15937).
- [C++]: Fix Module serialization incompatibility between Python and C++ with weight-less layers. (19740).
- [C++]: Properly pass extra_cuda_cflags to C++ extensions on Windows. (18638).
- [C++] Make SGD semantics match python. (15840).
- [C++] torch::nn::init::orthogonal_: match Python API. (18915).
🗄 Deprecations
- 🚚 torch.btrifact: the deprecated info argument has been removed. (14935).
- 0️⃣ torch.potrs has been deprecated, use torch.cholesky_solve instead. Note that upper defaults to False for torch.cholesky_solve, and True for torch.potrs. (15334).
- 🗄 torch.pstrf is deprecated; use torch.cholesky instead. Note that upper defaults to False for torch.cholesky, and True for torch.pstrf. (17866).
- 0️⃣ torch.potri is deprecated; use torch.cholesky_inverse instead. Note that upper defaults to False for torch.cholesky_inverse, and True for torch.potri. (19498).
- torch.btrifact_with_info has been deprecated; use torch.lu with get_infos=True instead.(18435).
- 🗄 torch.btrifact has been deprecated; use the new name torch.lu instead. (18435).
- 🗄 torch.gesv is deprecated; use the new name `torch.solve instead. (18060).
- 🗄 torch.trtrs has been deprecated; use the new name torch.triangular_solve instead. (18213).
- 🗄 torch. btriunpack has been deprecated; use the new name torch.lu_unpack instead. (18529).
- 🗄 torch.btrisolve has been deprecated; use the new name torch.lu_solve instead. (18726).
- [C++] IntList has been deprecated, use IntArrayRef instead, as it better describes the type and ownership semantics in C++. (16751).
- [C++] Dispatch macros with Type parameters, e.g. AT_DISPATCH_ALL_TYPES(tensor.type(), ..., are now deprecated; use ScalarType instead, e.g. AT_DISPATCH_ALL_TYPES(tensor.scalar_type(), .... (17527, 17996).
- [C++] the deprecated variable_tensor_functions have been removed. (15003).
🐎 Performance

Highlights
- nn.BatchNorm CPU inference speed increased up to ~19x.(19152).
- nn.AdaptiveAvgPool: speed up common-case of size=1 output by ~30x. (17011).
- 🐎 nn.EmbeddingBag CPU performance increased by ~4x. (19329).
- Tensor.copy_: sped up larger tensor copy ~2-3x, small regression in small tensor copy. (18618).
- torch.nonzero: is now ~2x faster than numpy on CPU. (15190)
- 👌 Improve caching allocator for Pascal and newer GPUs; 10-20% better memory utilization on Mask-RCNN. (17120).
- reduction functions: Speed up some large Tensor cases by 50-80%. (17428).
- [JIT] Graph fuser: better fusion for backwards graphs in the presence of broadcasting. (#14957)
- [JIT] Graph fuser: batch_norm fusion for inference. (#15146)
- [JIT] Graph fuser: layer_norm fusion for inference. (#18266)
Other
- torch.abs, torch.frac, torch.repiprocal, torch.neg have been vectorized and parallelized (19041).
- 🐎 torch.bmm: CPU performance increased by 2x. (19338).
- 🐎 torch.sort: CUDA performance increased by ~2x. (19379).
- torch.cat on CPU is now ~4x faster in the case where inputs are contiguous and dim != 0. (17032).
- 🐎 torch.multinomial fixed a 2x performance regression. (17121).
- torch.empty (and another factory functions): reduce overhead by 20-40%. (17565).
- torch.linspace has been parallelized on CPU. (15320).
- 🔊 torch.logspace has been parallelized on CPU. (15438).
- torch.range has been parallelized on CPU. (15484).
- torch.arange has been parallelized on CPU. (15667).
- torch.load: avoid unnecessary CPU-to-CUDA copy. (17297).
- reduction functions: improve efficiency on CUDA. (16224, 17040).
- Speed up some GEMM cases on CPU by up to 7x.(17730)
- Tensor iterator loop unrolling. (17667).
- 📜 sparse/dense matrix multiply: improve speed by ~5x. (16905).
- distributions.MultivariateNormal: sped up. (17294).
- [JIT] Graph fuser: pow scalar exponent / base autodiff, fusion (#19324)
- [JIT] Graph fuser: allow fusion of function float arguments. (#18087)
- [JIT] Shape analysis: specialize optional Tensor inputs to graphs. (#18360)
- [JIT] Shape analysis: various correctness improvements. (#18271)
- [JIT] Shape analysis: aten::_convolution now participates in shape analysis. (#16837]
- [JIT] Autodiff: coverage for ops used in maskrcnn & BERT. (#16689)
- [JIT] Autodiff: support for scalar comparison ops and randlike. (#14740)
- [JIT] Autodiff: support for adaptive_avg_pool2d. (#15459)
- [JIT] Autodiff: support for erf and erfc. (#15139)
- [JIT] Autodiff: support for layernorm. (#17702)
- [JIT] Autodiff: support for tanh. (#17816)
- [JIT] Autodiff: support for matmul/dropout. (#17523)
- [JIT] Autodiff: specialized CUDA impl for dropout. (#17756)
- [JIT] Constant folding: improved inlining of control flow. (#16244)
📚 Documentation
- 📚 Tensor.scatter_: add documentation about value parameter. (17467).
- Tensor.unfold: correctly document dimension parameter, not dim. (19020).
- Tensor.is_floating_point() is now documented. (15704).
- 📚 torch.cholesky: Fix broken upper example in documentation. (15215).
- torch.gesv: document out parameter. (15649).
- 👍 torch.mul: better explain elementwise multiplication. (15664).
- 👍 torch.eig, torch.symeig: better explain backwards limitations. (15929).
- 🛠 torch.ormqr: fixed output specification. (15694).
- torch.from_numpy: replaced usage with torch.as_tensor in documentation. (16587).
- 📄 torch.mvlgamma: Fix the constant in the docs. (17045).
- torch.mode: more precisely describe what is returned. (17069).
- 📚 torch.upsample: documentation now matches torch.interpolate. (17134)
- 📚 torch.arange: correct dtype documentation. (18604)
- torch.cumprod: document out parameter. (19340).
- torch.nonzero: document indices being returned lexicographically. (19539).
- 👍 torch.nn.functional.interpolate: better explain aligned_corners parameter. (14806).
- 📚 torch.nn.functional.pad: documentation has been made consistent with other functional ops. (15984).
- nn.functional.grid_sample: clarify behavior of padding. (19754).
- nn.TripletMarginLoss: correct type of swap parameter. (18115).
- 📚 nn.CrossEntropyLoss: clarify ignore_index documentation. (18117).
- nn.CrossEntropyLoss: the input format is more clearly explained. (15990).
- nn.CTCLoss: Clarify a number of ambiguities. (18415).
- 👍 nn.BCEWithLogitsLoss: add better explanation. (19212).
- 👍 nn.BCEWithLogitsLoss: better explain positive samples. (17258).
- 📚 nn.ModuleList / nn.ParameterList: update documentation. (17731).
- nn.Module.load_state_dict: correct semantics of strict. (17618)
- nn.parallel.DataParallel: more accurately specify how different argument types are handled. (15993).
- nn.parallel.DistributedDataParallel: Clarified batch size requirements. (16010).
- torch.distributed: Document mixed-precision training. (15440).
- torch.multiprocessing: Include example multiprocessing code. (16345).
- 👍 torch.autograd: Better explain computing Jacobian-vector product. (15197).
- torch.cuda.get_rng_state, torch.cuda.set_rng_state: document taking a device object. (14324).
- torch.device: Fix example of passing device to tensor factory. (16839).
- 📚 DataLoader: update documentation to describe how workers are managed. (18091).
- 📚 Unified shape formats throughout the documentation. (15741).
- 📚 Update documentation for reduction arguments to use non-deprecated format. (17300).
- mark_non_differentiable: document correct semantics. (17891).
- Warn about memory overlaps on inplace operations. (17576).
- 🛠 Fix a number of small issues with conv and pooling docstrings. (17052).
- 🛠 Fix a number of small issues with padding and activation docstrings. (17197).
- [C++]: mention packed accessors in Tensor basics. (19464).
ONNX

Exporting More Torch Operators to ONNX
- Export torch.isnan to ONNX (17698).
- Export torch.flatten to ONNX (16240).
- Export torch.where, torch.ceil, torch.floor to ONNX (18571).
- Export torch.narrow to ONNX (17550).
- Export torch.argmax and torch torch.argmin (17382, 18264, 18261).
- Export adaptive_avg_pool1D, adaptive_avg_pool2D, adaptive_avg_pool3D, adaptive_max_pool1D, adaptive_max_pool2D, adaptive_max_pool3D to ONNX (17412).
- Export torch.nonzero to ONNX (17036, 18047).
- Export torch.erf to ONNX (16106).
- Export torch.split (15092).
- Export torch.lt, torch.gt, torch.le, torch.ge, torch.eq, torch.ne to ONNX (15677).
- Export torch.expand and torch.ne to ONNX (15050).
- 🔊 Export torch.nn.LogSigmoid to ONNX (14830).
- Export torch.nn.RReLU to ONNX (14781).
- Export torch.reshape and torch.reshape_as to ONNX (16632, 16971).
- Replace use of ConstantLike with with ConstantOfShape (16095, 16214).
Extending Existing Exporting Logic
- 👍 Enable dim support in torch.nn.Softmax's export (18482).
- 👌 Support exporting squeeze & unsqueeze with negative dim attribute (19297).
- Support exporting max_pool1d, max_pool2d, max_pool3d with indices (16455).
- ➕ Add dtype support in torch.logsoftmax and torch.softmax's export (17672).
- Support ceil_mode in max_pool_1d, max_pool2d, max_pool3d, avg_pool1d, avg_pool2d, avg_pool3d's export (16769).
⚡️ Optimizing Exported ONNX Graph
- ➕ Add constant folding in ONNX exporter (18698).
- Retain the parameter names in ONNX exporter (17551).
- Omit slice op if it is a non-op (19155).
- ➕ Add a flag to strip doc_string from exported ONNX models (18882).
- Omit torch.dropout if the model is in eval mode (16547).
➕ Adding Utility Functions and Refactoring
- Remove unused arg f from _model_to_graph(). (19647).
- ➕ Add the support for stable ONNX opsets in exporter (16068, 17419).
- ✅ Set the default ONNX opset to the latest stable opset (i.e., 9) (17736).
- ➕ Add an utility function to check whether it's in the middle of ONNX export or not (19050).
- 🔨 Refactoring serialization of ONNX initializers to be name-based (17830).
- 🔦 Expose dim() on type and use it in ONNX symbolics (15933).
- Add scalar_type_to_pytorch_type dict in ONNX symbolic (15965).
- ➕ Add an assertion to check the number of the parameters passed to ONNX exporter (18145).
🛠 Bugfixes
- 🛠 Fix different types in rsub caused bug (15707).
- 🛠 Fix list structure supports in ONNX exporter (19102).
- 🛠 Fix case for activations attribute in nn.RNN ONNX export. (19368).
- Minor fix for onnx ConstantOfShape export (18199).
- 🛠 Fix the torch.(reduce)min and torch.(reduce)max's export (15241).
- 🛠 Fixing ONNX export of logical ops to have correct output datatype (15185).
- 🛠 Fix typo in docstring (18216).
v1.1.0.a0
December 26, 2018
v1.0.1 Changes
February 07, 2019
Note: our conda install commands have slightly changed. Version specifiers such as cuda100 in conda install pytorch cuda100 -c pytorch have changed to conda install pytorch cudatoolkit=10.0 -c pytorch

💥 Breaking Changes

🚀 There are no breaking changes in this release.

🐛 Bug Fixes

Serious
- 🛠 Higher order gradients for CPU Convolutions have been fixed (regressed in 1.0.0 under MKL-DNN setting) #15686
- Correct gradients for non-contiguous weights in CPU Convolutions #16301
- 🛠 Fix ReLU on CPU Integer Tensors by fixing vec256 inversions #15634
- 🛠 Fix bincount for non-contiguous Tensors #15109
- 🛠 Fix torch.norm on CPU for large Tensors #15602
- 🛠 Fix eq_ to do equality on GPU (was doing greater-equal due to a typo) (#15475)
- ↪ Workaround a CuDNN bug that gave wrong results in certain strided convolution gradient setups
  - blacklist fft algorithms for strided dgrad (#16626)
Correctness
- 🛠 Fix cuda native loss_ctc for varying input length (#15798)
  - this avoids NaNs in variable length settings
- C++ Frontend: Fix serialization (#15033)
  - Fixes a bug where (de-)/serializing a hierarchy of submodules where one submodule doesn't have any parameters, but its submodules do
- 🛠 Fix derivative for mvlgamma (#15049)
- 🛠 Fix numerical stability in log_prob for Gumbel distribution (#15878)
- multinomial: fix detection and drawing of zero probability events (#16075)
Crashes
- ⚡️ PyTorch binaries were crashing on AWS Lambda and a few other niche systems, stemming from CPUInfo handling certain warnings as errors. Updated CPUInfo with relevant fixes.
- MKL-DNN is now statically built, to avoid conflicts with system versions
- 👍 Allow ReadyQueue to handle empty tasks (#15791)
  - Fixes a segfault with a DataParallel + Checkpoint neural network setting
- Avoid integer divide by zero error in index_put_ (#14984)
- 🛠 Fix for model inference crash on Win10 (#15919) (#16092)
- 👉 Use CUDAGuard when serializing Tensors:
  - Before this change, torch.save and torch.load would initialize the CUDA context on GPU 0 if it hadn't been initialized already, even if the serialized tensors are only on GPU 1.
- Fix error with handling scalars and rpow , for example 1 ^^ x, where x is a PyTorch scalar (#16687)
- Switch to CUDA implementation instead of CuDNN if batch size >= 65536 for affine_grid (#16403)
  - CuDNN crashes when batch size >= 65536
- [Distributed] TCP init method race condition fix (#15684)
- [Distributed] Fix a memory leak in Gloo's CPU backend
- [C++ Frontend] Fix LBFGS issue around using inplace ops (#16167)
- [Hub] Fix github branch prefix v (#15552)
- 🛠 [Hub] url download bugfix for URLs served without Content-Length header
🐎 Performance
- 🛠 LibTorch binaries now ship with CuDNN enabled. Without this change, many folks saw significant perf differences while using LibTorch vs PyTorch, this should be fixed now. #14976
- 👉 Make btriunpack work for high dimensional batches and faster than before (#15286)
- 👌 improve performance of unique with inverse indices (#16145)
- 🔨 Re-enable OpenMP in binaries (got disabled because of a CMake refactor)
Other
- create type hint stub files for module torch (#16089)
  - This will restore auto-complete functionality in PyCharm, VSCode etc.
- 🛠 Fix sum_to behavior with zero dimensions (#15796)
- Match NumPy by considering NaNs to be larger than any number when sorting (#15886)
- 🛠 Fixes various error message / settings in dynamic weight GRU / LSTMs (#15766)
- C++ Frontend: Make call operator on module holder call forward (#15831)
- C++ Frontend: Add the normalize transform to the core library (#15891)
- 🛠 Fix bug in torch::load and unpack torch::optim::detail namespace (#15926)
- Implements Batched upper triangular, lower triangular (#15257)
- ➕ Add torch.roll to documentation (#14880)
- 👍 (better errors) Add backend checks for batch norm (#15955)
JIT
- ➕ Add better support for bools in the graph fuser (#15057)
- 👍 Allow tracing with fork/wait (#15184)
- 👌 improve script/no script save error (#15321)
- ➕ Add self to Python printer reserved words (#15318)
- 👍 Better error when torch.load-ing a JIT model (#15578)
- 🛠 fix select after chunk op (#15672)
- ➕ Add script standard library documentation + cleanup (#14912)
v1.0.0 Changes
December 07, 2018
Table of Contents
- Highlights
  - JIT
  - Brand New Distributed Package
  - C++ Frontend [API Unstable]
  - Torch Hub
- 💥 Breaking Changes
- ➕ Additional New Features
  - N-dimensional empty tensors
  - New Operators
  - New Distributions
  - Sparse API Improvements
  - Additions to existing Operators and Distributions
- 🐛 Bug Fixes
  - Serious
  - Backwards Compatibility
  - Correctness
  - Error checking
  - Miscellaneous
- Other Improvements
- 🗄 Deprecations
  - CPP Extensions
- 🐎 Performance
- 📚 Documentation Improvements
Highlights

JIT

The JIT is a set of compiler tools for bridging the gap between research in PyTorch
⚡️ and production. It allows for the creation of models that can run without a dependency on the Python interpreter and which can be optimized more aggressively. Using program annotations existing models can be transformed into Torch Script, a subset of Python that PyTorch can run directly. Model code is still valid Python code and can be debugged with the standard Python toolchain. PyTorch 1.0 provides two ways in which you can make your existing code compatible with the JIT, using torch.jit.trace or torch.jit.script. Once annotated, Torch Script code can be aggressively optimized and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.
```
# Write in Python, run [email protected] RNN(x, h, W\_h, U\_h, b\_h): y = [] for t in range(x.size(0)): h = torch.tanh(x[t] @ W\_h + h @ U\_h + b\_h) y += [h] return torch.stack(y), h
```
As an example, see a tutorial on deploying a seq2seq model,
📄 loading an exported model from C++, or browse the docs.

📦 Brand New Distributed Package

📦 The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by a brand new re-designed distributed library. The main highlights of the new library are:
- 🆕 New torch.distributed is performance driven and operates entirely asynchronously for all backends: Gloo, NCCL, and MPI.
- 🐎 Significant Distributed Data Parallel performance improvements especially for hosts with slower networks such as ethernet-based hosts
- ➕ Adds async support for all distributed collective operations in the torch.distributed package.
- 📄 Adds the following CPU ops in the Gloo backend: send, recv, reduce, all_gather, gather, scatter
- ➕ Adds barrier op in the NCCL backend
- 📄 Adds new_group support for the NCCL backend
C++ Frontend [API Unstable].

🐎 The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to torch.nn, torch.optim, torch.data and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:

Python C++

import torch

model = torch.nn.Linear(5, 1) ⚡️ optimizer = torch.optim.SGD(model.parameters(), lr=0.1) prediction = model.forward(torch.randn(3, 5)) loss = torch.nn.functional.mse_loss(prediction, torch.ones(3, 1)) loss.backward() ⚡️ optimizer.step() | #include <torch/torch.h>

torch::nn::Linear model(5, 1); torch::optim::SGD optimizer(model->parameters(), /lr=/0.1); torch::Tensor prediction = model->forward(torch::randn({3, 5})); auto loss = torch::mse_loss(prediction, torch::ones({3, 1})); loss.backward(); ⚡️ optimizer.step(); |

We are releasing the C++ frontend marked as "API Unstable" as part of PyTorch 1.0. This means it is ready to be used for your research application, but still has some open construction sites that will stabilize over the next couple of releases. Some parts of the API may undergo breaking changes during this time.

📚 See https://pytorch.org/cppdocs for detailed documentation on the greater PyTorch C++ API as well as the C++ frontend.

Torch Hub

Torch Hub is a pre-trained model repository designed to facilitate research reproducibility.

👀 Torch Hub supports publishing pre-trained models (model definitions and pre-trained weights) to a github repository using a simple hubconf.py file; see hubconf for resnet models in pytorch/vision as an example. Once published, users can load the pre-trained models using the torch.hub.load API.

📚 For more details, see the torch.hub documentation. Expect a more-detailed blog post introducing Torch Hub in the near future!

💥 Breaking Changes
- 📄 Indexing a 0-dimensional tensor will now throw an error instead of warn. Use tensor.item() instead. (#11679).
- 🚚 torch.legacy is removed. (#11823).
- torch.masked_copy_ is removed, use torch.masked_scatter_ instead. (#9817).
- Operations that result in 0 element tensors may return changed shapes.
  - Before: all 0 element tensors would collapse to shape (0,). For example, torch.nonzero is documented to return a tensor of shape (n,z), where n = number of nonzero elements and z = dimensions of the input, but would always return a Tensor of shape _(0,) when no nonzero elements existed.
  - Now: Operations return their documented shape.
  Previously: all 0-element tensors are collapsed to shape (0,)
  
  torch.nonzero(torch.zeros(2, 3)) tensor([], dtype=torch.int64)
  
  Now, proper shape is returned
  
  torch.nonzero(torch.zeros(2, 3)) tensor([], size=(0, 2), dtype=torch.int64)
- Sparse tensor indices and values shape invariants are changed to be more consistent in the case of 0-element tensors. See link for more details. (#9279).
- 🚚 torch.distributed: the TCP backend is removed, we recommend to use Gloo and MPI backends for CPU collectives and NCCL backend for GPU collectives.
- Some inter-type operations (e.g. *) between torch.Tensors and NumPy arrays will now favor dispatching to the torch variant. This may result in different return types. (#9651).
- 🚚 Implicit numpy conversion no longer implicitly moves a tensor to CPU. Therefore, you may have to explicitly move a CUDA tensor to CPU (tensor.to('cpu')) before an implicit conversion. (#10553).
- torch.randint now defaults to using dtype torch.int64 rather than the default floating-point dtype. (#11040).
- 📄 torch.tensor function with a Tensor argument now returns a detached Tensor (i.e. a Tensor where grad_fn is None). This more closely aligns with the intent of the function, which is to return a Tensor with copied data and no history. (#11061,
  #11815).
- torch.nn.functional.multilabel_soft_margin_loss now returns Tensors of shape (N,) instead of (N, C) to match the behavior of torch.nn.MultiMarginLoss. In addition, it is more numerically stable.
  (#9965).
- The result type of a torch.float16 0-dimensional tensor and a integer is now torch.float16 (was torch.float32 or torch.float64 depending on the dtype of the integer). (#11941).
- 📄 Dirichlet and Categorical distributions no longer accept scalar parameters. (#11589).
- 📄 CPP Extensions: Deprecated factory functions that accept a type as the first argument and a size as a second argument argument have been removed. Instead, use the new-style factory functions that accept the size as the first argument and TensorOptions as the last argument. For example, replace your call to at::ones(torch::CPU(at::kFloat)), {2, 3}) with torch::ones({2, 3}, at::kCPU). This applies to the following functions:
  - arange, empty, eye, full, linspace, logspace, ones, rand, randint, randn, randperm, range, zeros.
- 0️⃣ torch.potrf renamed to torch.cholesky. It has a new default (upper=False) (#12699).
- 📇 Renamed elementwise_mean to mean for loss reduction functions (#13419)
➕ Additional New Features

N-dimensional empty tensors
- Tensors with 0 elements can now have an arbitrary number of dimensions and support indexing and other torch operations; previously, 0 element tensors were limited to shape (0,). (#9947). Example:
  
  torch.empty((0, 2, 4, 0), dtype=torch.float64) tensor([], size=(0, 2, 4, 0), dtype=torch.float64)
🆕 New Operators
- 📄 torch.argsort similar to numpy.argsort.
  (#9600).
- 📄 torch.pdist similar to scipy.spatial.distance.pdist. (#10782).
- 📄 torch.tensordot similar to numpy.tensordot. (#10025).
- 📄 torch.broadcast_tensors similar to numpy.broadcast_arrays.
  (#10075).
- 📜 torch.narrow support for sparse tensors.
  (#11342).
- 📄 torch.matrix_rank similar to numpy.linalg.matrix_rank.
  (#10338).
- 📄 torch.matrix_power similar to numpy.linalg.matrix_power. (#11421).
- 📄 torch.nn.CeLU activation. (#8551).
- 📄 torch.nn.CTCLoss. (#9628).
- 📄 torch.diag_embed (#12447).
- 📄 torch.roll operator to match numpy.roll (#13261, #13588, #13874).
- 📄 torch.chain_matmul for performing a chain of matrix multiplies (#12380).
- 📄 torch.finfo,
  📄 torch.info to get more detailed information on a dtype, similar to numpy.finfo and numpy.iinfo (#12472).
- Tensor. __cuda_array_interface__ to provide compatibility with numba and other CUDA projects (#11984).
🆕 New Distributions
- 📄 Weibull Distribution. (#9454).
- 📄 NegativeBinomial Distribution. (#9345).
- 🌲 torch.mvlgamma Multivariate Log-Gamma Distribution. (#9451).
📜 Sparse API Improvements
- 👀 Implemented "sparse gradient" versions of some existing functions, see sparse.mm, sparse.sum, sparse.addmm for details. (#14526, #14661, #12430, #13345).
- 📜 Tensor.to_sparse() allows conversion from a dense tensor to a sparse tensor. (#12171)
- 📜 torch.cat now supports sparse tensors. (#13761, #13577).
- 📜 torch.unsqueeze now works with sparse vectors (this also makes torch.stack work out of the box). (#13760).
- Autograd is now supported on values() and torch.sparse_coo_tensor (with indices and values tensors). E.g., torch.sparse_coo_tensor(i, v).values().sum() is differentiable w.r.t. v. See the updated torch.sparse documentation for details. (#13001).
➕ Additions to existing Operators and Distributions
- 📄 torch.unique now accepts an optional dim argument. (#10423).
- 📄 torch.norm now supports matrix norms.
  (#11261).
- 📄 torch.distributions.kl.kl_divergence now supports broadcasting. (#10533).
- 📄 torch.distributions now support an expand method similar to torch.Tensor.expand. For example: torch.distributions.bernoulli.Bernoulli.expand. (#11341).
- 📄 torch.nn.functional.grid_sample now support nearest neighbor interpolation and reflection padding. (#10051).
- 📄 torch.mean now works across multiple dimensions. (#14252).
- 📄 torch.potrs supports batching (#13453).
- 📄 torch.multiprocessing.spawn helper for spawning processes. (#13518).
- 📄 torch.pow now allows taking derivatives when invoked with a python number as a base. (#12450).
- 📄 Tensor.to now supports a copy keyword argument. (#12571).
- 🌲 torch.softmax and and torch.log_softmax now support a dtype accumulation argument. (#11719).
- 📄 torch.svd supports a compute_uv argument for optionally computing singular vectors (#12517).
- 📄 torch.inverse now supports batches of tensors (#9949).
- 📄 autograd.profiler shows demangled names on nvtx ranges. (#13154).
🐛 Bug Fixes

Serious
- 📄 torch.nn.functional.softmin was using the incorrect formula in 0.4.1 (#10066).
- 📄 torch.as_strided backwards (called via view) was incorrect with overlapping data locations. (#9538).
- 📄 Pointwise losses (e.g. torch.nn.MSELoss) were sometimes using the wrong reduction method. (#10018).
- 📄 torch.from_numpy was not handling big-endian dtypes correctly. (#9508).
- 📄 torch.multiprocessing now correctly handles CUDA tensors, requires_grad settings, and hooks. (#10220).
- __rsub__ now works properly when the CUDA device is not 0. (#12956).
- 🛠 Fix memory leak during packing in tuples (#13305).
- 🛠 DataLoader fixed a couple of issues resulting in hangs. (#11985, #12700).
- 📄 torch.multinomial replacement=False will not properly throw an error message when there are no more categories to select (#12490).
- torch.masked_fill_ now works properly on non-contiguous tensor inputs (#12594).
- Tensor. __delitem__: fixed a segmentation fault on (#12726).
Backwards Compatibility
- torch.nn.Module load_from_state_dict now correctly handles 1-dimensional vs 0-dimensional tensors saved from 0.3 versions. (#9781).
- 🛠 Fix RuntimeError: storages don't support slicing when loading models saved with PyTorch 0.3. (#11314).
- 🛠 BCEWithLogitsLoss: fixed an issue with legacy reduce parameter. (#12689).
Correctness
- 📄 torch.nn.Dropout fused kernel could change parameters in eval mode. (#10621).
- 🛠 torch.unbind backwards has been fixed. (#9995).
- 🛠 Fix a bug in sparse matrix-matrix multiplication when a sparse matrix is coalesced then transposed. (#10496).
- 📄 torch.bernoulli now handles out= parameters correctly, handles expanded tensors correctly, and has corrected argument validity checks on CPU. (#10273).
- 📄 torch.Tensor.normal_ could give incorrect results on CPU. (#10846).
- 📄 torch.tanh could return incorrect results on non-contiguous tensors. (#11226).
- 🌲 torch.log on an expanded Tensor gave incorrect results on CPU. (#10269).
- 🔊 torch.logsumexp now correctly modifies the out parameter if it is given. (#9755).
- 📄 torch.multinomial with replacement=True could select 0 probability events on CUDA. (#9960).
- 📄 torch.nn.ReLU will now properly propagate NaN.
  (#10277).
- 📄 torch.max and torch.min could return incorrect values on input containing inf / -inf. (#11091).
- 🛠 Fixed an issue with calculated output sizes of torch.nn.Conv modules with stride and dilation. (#9640).
- 📄 torch.nn.EmbeddingBag now correctly returns vectors filled with zeros for empty bags on CUDA. (#11740).
- 👉 Use integer math to compute output size of pooling operations (#14405).
- 🛠 Fix sum() on fp16 (#13926).
- Remove CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode for accuracy (#13844).
- 🛠 fix stability in bce with pos_weight formula (#13863).
- 🛠 Fix torch.dist for infinity, zero and minus infinity norms (#13713).
- Give broadcast_coalesced tensors different version counters (#13594).
- 🛠 Fix flip() shape bug in CPU (#13344).
- 🛠 Fix more spectral norm bugs (#13350).
- 🛠 Fix handling of single input in gradcheck (#13543).
- 👀 torch.cuda.manual_seed now also sets the philox seed and offset. (#12677).
- 📄 utils.bottleneck fix ZeroDivisionError(#11987).
- 📄 Disable hook serialization (#11705).
- 📄 torch.norm: fix negative infinity norm (#12722).
- 🛠 Fix torch.isfinite for integer input (#12750).
- 📄 ConvTranspose3d fix output_size calculation (#12952).
- 📄 torch.randperm: properly use RNG mutex on CPU (#13832)
Error checking
- 📄 torch.gesv now properly checks LAPACK errors. (#11634).
- 🛠 Fixed an issue where extra positional arguments were accepted (and ignored) in Python functions calling into C++. (#10499).
- legacy Tensor constructors (e.g. torch.FloatTensor(...)) now correctly check their device argument.
  (#11669).
- Properly check that out parameter is a CPU Tensor for CPU unary ops. (#10358).
- 📄 torch.nn.InstanceNorm1d now correctly accepts 2 dimensional inputs. (#9776).
- torch.nn.Module.load_state_dict had an incorrect error message. (#11200).
- 📄 torch.nn.RNN now properly checks that inputs and hidden_states are on the same devices. (#10185).
- torch.nn.utils.rnn.pack_padded_sequence now properly checks for out-of-order length. (#13933).
- 📄 torch.bmm now properly checks that its Tensor arguments are on compatible devices. (#12434).
- 🛠 Conv2d: fixed incorrect error message for too-large kernel size (#12791).
- 📄 Tensor.expand error message now includes complete sizes. (#13124).
- 👌 Improve CUDA out-of-memory error message. (#13751).
- 📄 torch.arange now properly checks for invalid ranges. (#13915)
Miscellaneous
- 📄 torch.utils.data.DataLoader could hang if it was not completely iterated. (#10366).
- 🛠 Fixed a segfault when grad to a hook function is None. (#12028).
- 🛠 Fixed a segfault in backwards with torch.nn.PReLU when the input does not require grad. (#11758).
- 🛠 dir(torch) has been fixed with Python 3.7. (#10271).
- 🛠 Fixed a device-side assert in torch.multinomial when replacement=False and the input has fewer nonzero elements than num_samples. (#11933).
- Can now properly assign a torch.float16 dtype tensor to .grad. (#11781).
- 🛠 Fixed can only join a started process error with torch.utils.data.DataLoader. (#11432).
- 📄 Prevent unexpected exit in torch.utils.data.DataLoader on KeyboardInterrupt. (#11718).
- 📄 torch.einsum now handles spaces consistently. (#9994).
- 🛠 Fixed a broadcasting bug in torch.distributions.studentT.StudentT. (#12148).
- 🛠 fix a printing error with large non-contiguous tensors. (#10405).
- allow empty index for scatter_* methods (#14077)
- 📄 torch.nn.ModuleList now handles negative indices. (#13102).
- Minor fix to reenable nvtx sequence numbers for the forward methods of custom (Python) autograd functions (#13876)
- 🛠 Fix handling all empty bags in CUDA embedding bag (#13483)
- Fix half_tensor.bernoulli_(double) (#13474)
- 🛠 Fix cuda out of memory test (#13864)
- Implement NaN-propagating max/min on Vec256. (#13399).
- 🛠 Fix refcounting in anomaly metadata (#13249)
- 🛠 Fix pointwise loss broadcast (#12996)
- 🛠 Fix copying a nn.Parameter (#12886)
Other Improvements
- 📄 torch.cuda functions and torch.nn.parallel.data_parallel now accept torch.device objects in addition to integer device ids. (#10833, #10189).
- 📄 torch.nn.parallel.data_parallel now accepts torch.device inputs. (#10189).
- 🌲 torch.nn.functional.log_softmax is now more numerically stable. (#11866).
- 👌 Improve printing of sparse tensors and grad_fns. (#10181).
- Only involve CUDA device in CUDA -> CPU copy. (#11592).
- Accept numpy floating-point scalars as doubles more consistently. (#9659).
- 📄 sparse-to-sparse copy_ is now supported. (#9005).
- 📄 torch.bincount now supports 0 element inputs. (#9757).
- 📄 torch.nn.functional.conv2d error message have been improved. (#11053).
- 👍 Allow conversion of np.int64 to PyTorch scalar. (#9225).
- 📄 torch.einsum now handles varargs.
  (#10067).
- 📄 torch.symeig now returns 0-filled eigenvectors when eigenvectors=False is passed on CUDA rather than uninitialized data. (#10645).
- 📄 torch.utils.checkpoint: added an pption to preserve bitwise accuracy of gradient checkpointed vs non-checkpointed dropout. (#14253).
🗄 Deprecations
- ✂ Removed support for C extensions. Please use cpp extensions. (#12122)
- ✂ Delete torch.utils.trainer (#12487)
CPP Extensions
- 🗄 The torch/torch.h header is deprecated in favor of torch/extension.h, which should be used in all C++ extensions going forward. Including torch/torch.h from a C++ extension will produce a warning. It is safe to batch replace torch/torch.h with torch/extension.h.
- 🗄 Usage of the following functions in C++ extensions is also deprecated:
  - torch::set_requires_grad. Replacement: at::Tensor now has a set_requires_grad method.
  - torch::requires_grad. Replacement: at::Tensor now has a requires_grad method.
  - torch::getVariableType. Replacement: None.
- 🛠 Fix version.groups() (#14505)
- 👍 Allow building libraries with setuptools that dont have abi suffix (#14130)
- Missing .decode() after check_output in cpp_extensions (#13935)
torch.distributed
- 📦 the old (THD-backed) torch.distributed package is deprecated but still available at torch.distributed.deprecated.
- 🗄 The old (THD-backed) torch.nn.parallel.DistributedDataParallel is deprecated but still available at torch.nn.parallel.deprecated.DistributedDataParallel.
🐎 Performance
- 🐎 "Advanced Indexing" performance has improved on both CPU and GPU. (#13420)
- 📄 torch.nn.functional.grid_sample on CPU now uses vectorized operation and is now 2x~7x faster with AVX2 enabled CPUs. (#9961).
- 📄 torch.norm has been vectorized and parallelized on CPU. (#11565).
- 📄 torch.max and torch.min has been parallelized on CPU. (#10343).
- 📄 torch.nn.Threshold and torch.nn.ReLU have been parallelized on CPU. (#13182)
- torch.Tensor.masked_fill_ has been parallelized on CPU. (#11359).
- 📜 Tensor.sparse_mask has been parallelized on CPU. (#13290).
- 📄 torch.nn.PReLU has been sped up on both CPU and GPU.
  (#11758).
- 📄 torch.nn.KLDivLoss has been sped up on both CPU and GPU. (#10336).
- 📄 torch.svd has been sped up on both CPU and GPU.
  (#11194).
- 📄 torch.einsum has been greatly sped up on CPU.
  (#11292).
- 📄 torch.clamp no longer does unnecessary copying. (#10352).
- 📄 torch.add, torch.sub, torch.mul, torch.div are much faster for non-contiguous tensors on GPU. (#8919).
- 📄 torch.nn.RNN and related Modules have been ported to C++ and are more performant. (#10305, #10481).
- 📄 autograd.Profiler now has lower overhead. (#10969, #11773).
- 🖨 Printing large Tensors is now faster. (#14418).
📚 Documentation Improvements
- 📄 Reproducibility note added. (#11329).
- 📚 CPP Extensions have improved online documentation. Authors of C++ extensions may want to consult this documentation when writing new extensions.
- 📄 torch.Tensor.flatten is now documented.
  (#9876).
- 📄 torch.digamma is now documented. (#10967).
- 📄 torch.allclose is now documented. (#11185).
- 📄 torch.eig return format clarified. (#10315).
- 📄 torch.as_tensor now includes a proper example. (#10309).
- torch.sparse_coo_tensor now explains uncoalesced behavior. (#10308).
- 📄 torch.fft equation has been corrected. (#10760).
- 📄 torch.nn.LSTM behavior has been clarified in the multilayer case. (#11896).
- 📚 torch.nn.functional.dropout documentation has been clarified. (#10417).
- 📚 torch.nn.functional.pad documentation has been clarified. (#11623).
- Add documentation about Tensor properties device, is_cuda, requires_grad, is_leaf and grad.
  (#14339)
- 📚 torch.sparse documentation updated (#12221).
- ➕ Added a quick rundown of codebase structure. (#12693)
- 📄 torch.cat corrected parameter name to tensors from seq. (#12741)
- Warn that tensor.resize_() resets strides (#12816)
v1.0.rc1 Changes
October 02, 2018
This is a pre-release preview, do not rely on the tag to have a fixed set of commits, or rely on the tag for anything practical / important

Table of Contents
- Highlights
  - JIT
  - torch.distributed new "C10D" library
  - C++ Frontend [API Unstable]
- 💥 Breaking Changes
  - Additional New Features
  - N-dimensional empty tensors
  - New Operators
  - New Distributions
  - Additions to existing Operators and Distributions
- 🐛 Bug Fixes
  - Serious
  - Backwards Compatibility
  - Correctness
  - Error checking
  - Miscellaneous
- Other Improvements
- 🗄 Deprecations
  - CPP Extensions
- 🐎 Performance
- 📚 Documentation Improvements
Highlights

JIT

The JIT is a set of compiler tools for bridging the gap between research in PyTorch
and production. It includes a language called Torch Script (don't worry it is a subset of Python,
so you'll still be writing Python), and two ways in which you can make your existing code compatible with the JIT.
⚡️ Torch Script code can be aggressively optimized and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.
```
# Write in Python, run [email protected] RNN(x, h, W\_h, U\_h, b\_h): y = [] for t in range(x.size(0)): h = torch.tanh(x[t] @ W\_h + h @ U\_h + b\_h) y += [h] return torch.stack(y), h
```
As an example, see a tutorial on deploying a seq2seq model,
📄 loading an exported model from C++, or browse the docs.

torch.distributed new "C10D" library

📦 The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by the new "C10D" library. The main highlights of the new library are:
- 🐎 C10D is performance driven and operates entirely asynchronously for all backends: Gloo, NCCL, and MPI.
- 🐎 Significant Distributed Data Parallel performance improvements especially for slower network like ethernet-based hosts
- ➕ Adds async support for all distributed collective operations in the torch.distributed package.
- ➕ Adds send and recv support in the Gloo backend
C++ Frontend [API Unstable].

🐎 The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to torch.nn, torch.optim, torch.data and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:

Python C++

import torch

model = torch.nn.Linear(5, 1) ⚡️ optimizer = torch.optim.SGD(model.parameters(), lr=0.1) prediction = model.forward(torch.randn(3, 5)) loss = torch.nn.functional.mse_loss(prediction, torch.ones(3, 1)) loss.backward() ⚡️ optimizer.step() | #include <torch/torch.h>

torch::nn::Linear model(5, 1); torch::optim::SGD optimizer(model->parameters(), /lr=/0.1); torch::Tensor prediction = model->forward(torch::randn({3, 5})); auto loss = torch::mse_loss(prediction, torch::ones({3, 1})); loss.backward(); ⚡️ optimizer.step(); |

We are releasing the C++ frontend marked as "API Unstable" as part of PyTorch 1.0. This means it is ready to be used for your research application, but still has some open construction sites that will stabilize over the next month or two. Some parts of the API may undergo breaking changes during this time.

📚 See https://pytorch.org/cppdocs for detailed documentation on the greater PyTorch C++ API as well as the C++ frontend.

💥 Breaking Changes
- 📄 Indexing a 0-dimensional tensor will now throw an error instead of warn. Use tensor.item() instead. (#11679).
- 🚚 torch.legacy is removed. (#11823).
- torch.masked_copy_ is removed, use torch.masked_scatter_ instead. (#9817).
- Operations that result in 0 element tensors may return changed shapes.
  - Before: all 0 element tensors would collapse to shape (0,). For example, torch.nonzero is documented to return a tensor of shape (n,z), where n = number of nonzero elements and z = dimensions of the input, but would always return a Tensor of shape _(0,) when no nonzero elements existed.
  - Now: Operations return their documented shape.
  Previously: all 0-element tensors are collapsed to shape (0,)
  
  torch.nonzero(torch.zeros(2, 3)) tensor([], dtype=torch.int64)
  
  Now, proper shape is returned
  
  torch.nonzero(torch.zeros(2, 3)) tensor([], size=(0, 2), dtype=torch.int64)
- Sparse tensor indices and values shape invariants are changed to be more consistent in the case of 0-element tensors. See link for more details. (#9279).
- 🚚 torch.distributed: the TCP backend is removed, we recommend to use Gloo and MPI backends for CPU collectives and NCCL backend for GPU collectives.
- Some inter-type operations (e.g. *) between torch.Tensors and NumPy arrays will now favor dispatching to the torch variant. This may result in different return types. (#9651).
- 🚚 Implicit numpy conversion no longer implicitly moves a tensor to CPU. Therefore, you may have to explicitly move a CUDA tensor to CPU (tensor.to('cpu')) before an implicit conversion. (#10553).
- torch.randint now defaults to using dtype torch.int64 rather than the default floating-point dtype. (#11040).
- 📄 torch.tensor function with a Tensor argument now returns a detached Tensor (i.e. a Tensor where grad_fn is None). This more closely aligns with the intent of the function, which is to return a Tensor with copied data and no history. (#11061,
  #11815).
- torch.nn.functional.multilabel_soft_margin_loss now returns Tensors of shape (N,) instead of (N, C) to match the behavior of torch.nn.MultiMarginLoss. In addition, it is more numerically stable.
  (#9965).
- The result type of a torch.float16 0-dimensional tensor and a integer is now torch.float16 (was torch.float32 or torch.float64 depending on the dtype of the integer). (#11941).
- 📄 Dirichlet and Categorical distributions no longer accept scalar parameters. (#11589).
- 📄 CPP Extensions: Deprecated factory functions that accept a type as the first argument and a size as a second argument argument have been removed. Instead, use the new-style factory functions that accept the size as the first argument and TensorOptions as the last argument. For example, replace your call to at::ones(torch::CPU(at::kFloat)), {2, 3}) with torch::ones({2, 3}, at::kCPU). This applies to the following functions:
  - arange, empty, eye, full, linspace, logspace, ones, rand, randint, randn, randperm, range, zeros.
➕ Additional New Features

N-dimensional empty tensors
- Tensors with 0 elements can now have an arbitrary number of dimensions and support indexing and other torch operations; previously, 0 element tensors were limited to shape (0,). (#9947). Example:
  
  torch.empty((0, 2, 4, 0), dtype=torch.float64) tensor([], size=(0, 2, 4, 0), dtype=torch.float64)
🆕 New Operators
- 📄 torch.argsort similar to numpy.argsort.
  (#9600).
- 📄 torch.pdist similar to scipy.spatial.distance.pdist. (#10782).
- 📄 torch.tensordot similar to numpy.tensordot. (#10025).
- 📄 torch.broadcast_tensors similar to numpy.broadcast_arrays.
  (#10075).
- 📜 torch.narrow support for sparse tensors.
  (#11342).
- 📄 torch.matrix_rank similar to numpy.linalg.matrix_rank.
  (#10338).
- 📄 torch.matrix_power similar to numpy.linalg.matrix_power. (#11421).
- 📄 torch.nn.CeLU activation. (#8551).
- 📄 torch.nn.CTCLoss. (#9628).
🆕 New Distributions
- 📄 Weibull Distribution. (#9454).
- 📄 NegativeBinomial Distribution. (#9345).
- 🌲 torch.mvlgamma Multivariate Log-Gamma Distribution. (#9451).
➕ Additions to existing Operators and Distributions
- 📄 torch.unique now accepts an optional dim argument. (#10423).
- 📄 torch.norm now supports matrix norms.
  (#11261).
- 📄 torch.distributions.kl.kl_divergence now supports broadcasting. (#10533).
- 📄 torch.distributions now support an expand method similar to torch.Tensor.expand. For example: torch.distributions.bernoulli.Bernoulli.expand. (#11341).
- 📄 torch.nn.functional.grid_sample now support nearest neighbor interpolation and reflection padding. (#10051).
🐛 Bug Fixes

Serious
- 📄 torch.nn.functional.softmin was using the incorrect formula in 0.4.1. (#10066)
- 📄 torch.as_strided backwards (called via view) was incorrect with overlapping data locations. (#9538).
- 📄 Pointwise losses (e.g. torch.nn.MSELoss were sometimes using the wrong reduction method. (#10018).
- 📄 torch.from_numpy was not handling big-endian dtypes correctly. (#9508).
- 📄 torch.multiprocessing now correctly handles CUDA tensors, requires_grad settings, and hooks. (#10220).
Backwards Compatibility
- torch.nn.Module load_from_state_dict now correctly handles 1-dimensional vs 0-dimensional tensors saved from 0.3 versions. (#9781).
- 🛠 Fix RuntimeError: storages don't support slicing when loading models saved with PyTorch 0.3. (#11314).
Correctness
- 📄 torch.nn.Dropout fused kernel could change parameters in eval mode. (#10621).
- 🛠 torch.unbind backwards has been fixed. (#9995).
- 🛠 Fix a bug in sparse matrix-matrix multiplication when a sparse matrix is coalesced then transposed. (#10496).
- 📄 torch.bernoulli now handles out= parameters correctly, handles expanded tensors correctly, and has corrected argument validity checks on CPU. (#10273).
- 📄 torch.Tensor.normal_ could give incorrect results on CPU. (#10846).
- 📄 torch.tanh could return incorrect results on non-contiguous tensors. (#11226).
- 🌲 torch.log on an expanded Tensor gave incorrect results on CPU. (#10269).
- 🔊 torch.logsumexp now correctly modifies the out parameter if it is given. (#9755).
- 📄 torch.multinomial with replacement=True could select 0 probability events on CUDA. (#9960).
- 📄 torch.nn.ReLU will now properly propagate NaN.
  (#10277).
- 📄 torch.max and torch.min could return incorrect values on input containing inf / -inf. (#11091).
- 🛠 Fixed an issue with calculated output sizes of torch.nn.Conv modules with stride and dilation. (#9640).
- 📄 torch.nn.EmbeddingBag now correctly returns vectors filled with zeros for empty bags on CUDA. (#11740).
Error checking
- 📄 torch.gesv now properly checks LAPACK errors. (#11634).
- 🛠 Fixed an issue where extra positional arguments were accepted (and ignored) in Python functions calling into C++. (#10499).
- legacy Tensor constructors (e.g. torch.FloatTensor(...)) now correctly check their device argument.
  (#11669).
- Properly check that out parameter is a CPU Tensor for CPU unary ops. (#10358).
- 📄 torch.nn.InstanceNorm1d now correctly accepts 2 dimensional inputs. (#9776).
- torch.nn.Module.load_state_dict had an incorrect error message. (#11200).
- 📄 torch.nn.RNN now properly checks that inputs and hidden_states are on the same devices. (#10185).
Miscellaneous
- 📄 torch.utils.data.DataLoader could hang if it was not completely iterated. (#10366).
- 🛠 Fixed a segfault when grad to a hook function is None. (#12028).
- 🛠 Fixed a segfault in backwards with torch.nn.PReLU when the input does not require grad. (#11758).
- 🛠 dir(torch) has been fixed with Python 3.7. (#10271).
- 🛠 Fixed a device-side assert in torch.multinomial when replacement=False and the input has fewer nonzero elements than num_samples. (#11933).
- Can now properly assign a torch.float16 dtype tensor to .grad. (#11781).
- 🛠 Fixed can only join a started process error with torch.utils.data.DataLoader. (#11432).
- 📄 Prevent unexpected exit in torch.utils.data.DataLoader on KeyboardInterrupt. (#11718).
- 📄 torch.einsum now handles spaces consistently. (#9994).
- 🛠 Fixed a broadcasting bug in torch.distributions.studentT.StudentT. (#12148).
- 🛠 fix a printing error with large non-contiguous tensors. (#10405).
Other Improvements
- 📄 torch.cuda functions and torch.nn.parallel.data_parallel now accept torch.device objects in addition to integer device ids. (#10833, #10189).
- 📄 torch.nn.parallel.data_parallel now accepts torch.device inputs. (#10189).
- 🌲 torch.nn.functional.log_softmax is now more numerically stable. (#11866).
- 👌 Improve printing of sparse tensors and grad_fns. (#10181).
- Only involve CUDA device in CUDA -> CPU copy. (#11592).
- Accept numpy floating-point scalars as doubles more consistently. (#9659).
- 📄 sparse-to-sparse copy_ is now supported. (#9005).
- 📄 torch.bincount now supports 0 element inputs. (#9757).
- 📄 torch.nn.functional.conv2d error message have been improved. (#11053).
- 👍 Allow conversion of np.int64 to PyTorch scalar. (#9225).
- 📄 torch.einsum now handles varargs.
  (#10067).
- 📄 torch.symeig now returns 0-filled eigenvectors when eigenvectors=False is passed on CUDA rather than uninitialized data. (#10645).
🗄 Deprecations

CPP Extensions
- 🗄 The torch/torch.h header is deprecated in favor of torch/extension.h, which should be used in all C++ extensions going forward. Including torch/torch.h from a C++ extension will produce a warning. It is safe to batch replace torch/torch.h with torch/extension.h.
- 🗄 Usage of the following functions in C++ extensions is also deprecated:
  - torch::set_requires_grad. Replacement: at::Tensor now has a set_requires_grad method.
  - torch::requires_grad. Replacement: at::Tensor now has a requires_grad method.
  - torch::getVariableType. Replacement: None.
torch.distributed
- 📦 the old (THD-backed) torch.distributed package is deprecated but still available at torch.distributed.deprecated.
- 🗄 The old (THD-backed) torch.nn.parallel.DistributedDataParallel is deprecated but still available at torch.nn.parallel.deprecated.DistributedDataParallel.
🐎 Performance
- 📄 torch.nn.functional.grid_sample on CPU now uses vectorized operation and is now 2x~7x faster with AVX2 enabled CPUs. (#9961).
- 📄 torch.norm has been vectorized and parallelized on CPU. (#11565).
- 📄 torch.max and torch.min has been parallelized on CPU. (#10343).
- torch.Tensor.masked_fill_ has been parallelized on CPU. (#11359).
- 📄 torch.nn.PReLU has been sped up on both CPU and GPU.
  (#11758).
- 📄 torch.nn.KLDivLoss has been sped up on both CPU and GPU. (#10336).
- 📄 torch.svd has been sped up on both CPU and GPU.
  (#11194).
- 📄 torch.einsum has been greatly sped up on CPU.
  (#11292).
- 📄 torch.clamp no longer does unnecessary copying. (#10352).
- 📄 torch.add, torch.sub, torch.mul, torch.div are much faster for non-contiguous tensors on GPU. (#8919).
- 📄 torch.nn.RNN and related Modules have been ported to C++ and are more performant. (#10305, #10481).
- 📄 Profiler now has lower overhead. (#10969, #11773).
📚 Documentation Improvements
- 📄 Reproducibility note added. (#11329).
- 📚 CPP Extensions have improved online documentation. Authors of C++ extensions may want to consult this documentation when writing new extensions.
- 📄 torch.Tensor.flatten is now documented.
  (#9876).
- 📄 torch.digamma is now documented. (#10967).
- 📄 torch.allclose is now documented. (#11185).
- 📄 torch.eig return format clarified. (#10315).
- 📄 torch.as_tensor now includes a proper example. (#10309).
- torch.sparse_coo_tensor now explains uncoalesced behavior. (#10308).
- 📄 torch.fft equation has been corrected. (#10760).
- 📄 torch.nn.LSTM behavior has been clarified in the multilayer case. (#11896).
- 📚 torch.nn.functional.dropout documentation has been clarified. (#10417).
- 📚 torch.nn.functional.pad documentation has been clarified. (#11623).
- Various mathematical formulas have been clarified. (#11106).
v1.0.rc0
October 02, 2018
v0.4.1 Changes
July 26, 2018
Table of Contents
- 💥 Breaking Changes
- 🆕 New Features
  - Neural Networks
  - Adaptive Softmax, Spectral Norm, etc.
  - Operators
  - torch.bincount, torch.as_tensor, ...
  - torch.distributions
  - Half Cauchy, Gamma Sampling, ...
  - Other
  - Automatic anomaly detection (detecting NaNs, etc.)
- 🐎 Performance
  - Faster CPU ops in a wide variety of cases
- Other improvements
- 🐛 Bug Fixes
- 📚 Documentation Improvements
💥 Breaking Changes
- 📄 torch.stft has changed its signature to be consistent with librosa #9497
  - Before: stft(signal, frame_length, hop, fft_size=None, normalized=False, onesided=True, window=None, pad_end=0)
  - After: stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)
  - torch.stft is also now using FFT internally and is much faster.
- 🚚 torch.slice is removed in favor of the tensor slicing notation #7924
- 0️⃣ torch.arange now does dtype inference: any floating-point argument is inferred to be the default dtype; all integer arguments are inferred to be int64. #7016
- 📄 torch.nn.functional.embedding_bag's old signature embedding_bag(weight, input, ...) is deprecated, embedding_bag(input, weight, ...) (consistent with torch.nn.functional.embedding) should be used instead
- 🗄 torch.nn.functional.sigmoid and torch.nn.functional.tanh are deprecated in favor of torch.sigmoid and torch.tanh#8748
- Broadcast behavior changed in an (very rare) edge case: [1] x [0] now broadcasts to [0] (used to be [1]) #9209
🆕 New Features

Neural Networks
🔊 Adaptive Softmax nn.AdaptiveLogSoftmaxWithLoss #5287
```
\>\>\> in\_features = 1000\>\>\> n\_classes = 200\>\>\> adaptive\_softmax = nn.AdaptiveLogSoftmaxWithLoss(in\_features, n\_classes, cutoffs=[20, 100, 150])\>\>\> adaptive\_softmax AdaptiveLogSoftmaxWithLoss( (head): Linear(in\_features=1000, out\_features=23, bias=False) (tail): ModuleList( (0): Sequential( (0): Linear(in\_features=1000, out\_features=250, bias=False) (1): Linear(in\_features=250, out\_features=80, bias=False) ) (1): Sequential( (0): Linear(in\_features=1000, out\_features=62, bias=False) (1): Linear(in\_features=62, out\_features=50, bias=False) ) (2): Sequential( (0): Linear(in\_features=1000, out\_features=15, bias=False) (1): Linear(in\_features=15, out\_features=50, bias=False) ) ) )\>\>\> batch = 15\>\>\> input = torch.randn(batch, in\_features)\>\>\> target = torch.randint(n\_classes, (batch,), dtype=torch.long)\>\>\> # get the log probabilities of target given input, and mean negative log probability loss\>\>\> adaptive\_softmax(input, target) ASMoutput(output=tensor([-6.8270, -7.9465, -7.3479, -6.8511, -7.5613, -7.1154, -2.9478, -6.9885, -7.7484, -7.9102, -7.1660, -8.2843, -7.7903, -8.4459, -7.2371], grad\_fn=\<ThAddBackward\>), loss=tensor(7.2112, grad\_fn=\<MeanBackward1\>))\>\>\> # get the log probabilities of all targets given input as a (batch x n\_classes) tensor\>\>\> adaptive\_softmax.log\_prob(input) tensor([[-2.6533, -3.3957, -2.7069, ..., -6.4749, -5.8867, -6.0611], [-3.4209, -3.2695, -2.9728, ..., -7.6664, -7.5946, -7.9606], [-3.6789, -3.6317, -3.2098, ..., -7.3722, -6.9006, -7.4314], ..., [-3.3150, -4.0957, -3.4335, ..., -7.9572, -8.4603, -8.2080], [-3.8726, -3.7905, -4.3262, ..., -8.0031, -7.8754, -8.7971], [-3.6082, -3.1969, -3.2719, ..., -6.9769, -6.3158, -7.0805]], grad\_fn=\<CopySlices\>)\>\>\> # predit: get the class that maximize log probaility for each input\>\>\> adaptive\_softmax.predict(input) tensor([8, 6, 6, 16, 14, 16, 16, 9, 4, 7, 5, 7, 8, 14, 3])
```
📄 Add spectral normalization nn.utils.spectral_norm #6929
```
\>\>\> # Usage is similar to weight\_norm\>\>\> convT = nn.ConvTranspose2d(3, 64, kernel\_size=3, pad=1)\>\>\> # Can specify number of power iterations applied each time, or use default (1)\>\>\> convT = nn.utils.spectral\_norm(convT, n\_power\_iterations=2)\>\>\>\>\>\> # apply to every conv and conv transpose module in a model\>\>\> def add\_sn(m): for name, c in m.named\_children(): m.add\_module(name, add\_sn(c)) if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)): return nn.utils.spectral\_norm(m) else: return m\>\>\> my\_model = add\_sn(my\_model)
```
📄 nn.ModuleDict and nn.ParameterDict containers #8463
Add nn.init.zeros_ and nn.init.ones_ #7488
➕ Add sparse gradient option to pretrained embedding #7492
➕ Add max pooling support to nn.EmbeddingBag #5725
👍 Depthwise convolution support for MKLDNN #8782
➕ Add nn.FeatureAlphaDropout (featurewise Alpha Dropout layer) #9073

Operators
📄 torch.bincount (count frequency of each value in an integral tensor) #6688
```
\>\>\> input = torch.randint(0, 8, (5,), dtype=torch.int64)\>\>\> weights = torch.linspace(0, 1, steps=5)\>\>\> input, weights (tensor([4, 3, 6, 3, 4]), tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])\>\>\> torch.bincount(input) tensor([0, 0, 0, 2, 2, 0, 1])\>\>\> input.bincount(weights) tensor([0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.5000])
```
📄 torch.as_tensor (similar to torch.tensor but never copies unless necessary) #7109
```
\>\>\> tensor = torch.randn(3, device='cpu', dtype=torch.float32)\>\>\> torch.as\_tensor(tensor) # doesn't copy\>\>\> torch.as\_tensor(tensor, dtype=torch.float64) # copies due to incompatible dtype\>\>\> torch.as\_tensor(tensor, device='cuda') # copies due to incompatible device\>\>\> array = np.array([3, 4.5])\>\>\> torch.as\_tensor(array) # doesn't copy, sharing memory with the numpy array\>\>\> torch.as\_tensor(array, device='cuda') # copies due to incompatible device
```
📄 torch.randperm for CUDA tensors #7606
📄 nn.HardShrink for CUDA tensors #8117
📄 torch.flip (flips a tensor along specified dims) #7873
📄 torch.flatten (flattens a contiguous range of dims) #8578
📄 torch.pinverse (computes svd-based pseudo-inverse) #9052
📄 torch.meshgrid#8581
📄 torch.unique for CUDA tensors #8899
📄 torch.erfc (complementary error function) https://github.com/pytorch/pytorch/pull/9366/files
📄 torch.isinf and torch.isfinite#9169 #9487
📄 torch.reshape_as#9452
📄 Support backward for target tensor in torch.nn.functional.kl_div#7839
🔊 torch.logsumexp#7254
➕ Add batched linear solver to torch.gesv #6100
📄 torch.sum now supports summing over multiple dimensions https://github.com/pytorch/pytorch/pull/6152/files
📄 torch.diagonaltorch.diagflat to take arbitrary diagonals with numpy semantics #6718
📄 tensor.any and tensor.all on ByteTensor can now accept dim and keepdim arguments #4627

Distributions
- Half Cauchy and Half Normal #8411
- Gamma sampling for CUDA tensors #6855
- 👍 Allow vectorized counts in Binomial Distribution #6720
Misc
- 📄 Autograd automatic anomaly detection for NaN and errors occuring in backward. Two functions detect_anomaly and set_detect_anomaly are provided for this. #7677
- 👌 Support reversed(torch.Tensor) #9216
- 👌 Support hash(torch.device) #9246
- 👌 Support gzip in torch.load#6490
🐎 Performance
- Accelerate bernoulli number generation on CPU #7171
- Enable cuFFT plan caching (80% speed-up in certain cases) #8344
- 🛠 Fix unnecessary copying in bernoulli_ #8682
- 🛠 Fix unnecessary copying in broadcast #8222
- Speed-up multidim sum (2x~6x speed-up in certain cases) #8992
- Vectorize CPU sigmoid (>3x speed-up in most cases) #8612
- ⚡️ Optimize CPU nn.LeakyReLU and nn.PReLU (2x speed-up) #9206
- 🔊 Vectorize softmax and logsoftmax (4.5x speed-up on single core and 1.8x on 10 threads) #7375
- 📜 Speed up nn.init.sparse (10-20x speed-up) #6899
👌 Improvements

🖨 Tensor printing
- Tensor printing now includes requires_grad and grad_fn information #8211
- 👌 Improve number formatting in tensor print #7632
- 🛠 Fix scale when printing some tensors #7189
- 🖨 Speed up printing of large tensors #6876
Neural Networks
- NaN is now propagated through many activation functions #8033
- ➕ Add non_blocking option to nn.Module.to #7312
- Loss modules now allow target to require gradient #8460
- ➕ Add pos_weight argument to nn.BCEWithLogitsLoss #6856
- 👌 Support grad_clip for parameters on different devices #9302
- ✂ Removes the requirement that input sequences to pad_sequence have to be sorted #7928
- stride argument for max_unpool1d, max_unpool2d, max_unpool3d now defaults to kernel_size #7388
- Allowing calling grad mode context managers (e.g., torch.no_grad, torch.enable_grad) as decorators #7737
- ⏱ torch.optim.lr_scheduler._LRSchedulers __getstate__ include optimizer info #7757
- Add support for accepting Tensor as input in clip_grad_* functions #7769
- Return NaN in max_pool/adaptive_max_pool for NaN inputs #7670
- nn.EmbeddingBag can now handle empty bags in all modes #7389
- ⏱ torch.optim.lr_scheduler.ReduceLROnPlateau is now serializable #7201
- 👍 Allow only tensors of floating point dtype to require gradients #7034 and #7185
- 👍 Allow resetting of BatchNorm running stats and cumulative moving average #5766
- Set the gradient of LP-Pooling to zero if the sum of all input elements to the power of p is zero #6766
Operators
- ➕ Add ellipses ('...') and diagonals (e.g. 'ii→i') to torch.einsum#7173
- ➕ Add to method for PackedSequence #7319
- Add support for __floordiv__ and __rdiv__ for integral tensors #7245
- 📄 torch.clamp now has subgradient 1 at min and max #7049
- 💅 torch.arange now uses NumPy-style type inference: #7016
- 👌 Support infinity norm properly in torch.norm and torch.renorm#6969
- 👍 Allow passing an output tensor via out= keyword arugment in torch.dot and torch.matmul#6961
Distributions
- Always enable grad when calculating lazy_property #7708
📜 Sparse Tensor
- ➕ Add log1p for sparse tensor #8969
- 👍 Better support for adding zero-filled sparse tensors #7479
Data Parallel
- 👍 Allow modules that return scalars in nn.DataParallel #7973
- 👍 Allow nn.parallel.parallel_apply to take in a list/tuple of tensors #8047
Misc
- torch.Size can now accept PyTorch scalars #5676
- Move torch.utils.data.dataset.random_split to torch.utils.data.random_split, and torch.utils.data.dataset.Subset to torch.utils.data.Subset #7816
- ➕ Add serialization for torch.device #7713
- 👍 Allow copy.deepcopy of torch.(int/float/...)* dtype objects #7699
- 📄 torch.load can now take a torch.device as map location #7339
🐛 Bug Fixes
- 🛠 Fix nn.BCELoss sometimes returning negative results #8147
- 🛠 Fix tensor._indices on scalar sparse tensor giving wrong result #8197
- 🛠 Fix backward of tensor.as_strided not working properly when input has overlapping memory #8721
- 🛠 Fix x.pow(0) gradient when x contains 0 #8945
- 🛠 Fix CUDA torch.svd and torch.eig returning wrong results in certain cases #9082
- 🛠 Fix nn.MSELoss having low precision #9287
- 🛠 Fix segmentation fault when calling torch.Tensor.grad_fn #9292
- 🛠 Fix torch.topk returning wrong results when input isn't contiguous #9441
- 🛠 Fix segfault in convolution on CPU with large inputs / dilation #9274
- Fix avg_pool2/3d count_include_pad having default value False (should be True) #8645
- 🛠 Fix nn.EmbeddingBag's max_norm option #7959
- 🛠 Fix returning scalar input in Python autograd function #7934
- 🛠 Fix THCUNN SpatialDepthwiseConvolution assuming contiguity #7952
- 🛠 Fix bug in seeding random module in DataLoader #7886
- 📄 Don't modify variables in-place for torch.einsum#7765
- 👉 Make return uniform in lbfgs step #7586
- The return value of uniform.cdf() is now clamped to [0..1] #7538
- 🛠 Fix advanced indexing with negative indices #7345
- CUDAGenerator will not initialize on the current device anymore, which will avoid unnecessary memory allocation on GPU:0 #7392
- 🛠 Fix tensor.type(dtype) not preserving device #7474
- 👷 Batch sampler should return the same results when used alone or in dataloader with num_workers > 0 #7265
- 🛠 Fix broadcasting error in LogNormal, TransformedDistribution #7269
- 🛠 Fix torch.max and torch.min on CUDA in presence of NaN #7052
- 🛠 Fix torch.tensor device-type calculation when used with CUDA #6995
- 🛠 Fixed a missing '=' in nn.LPPoolNd repr function #9629
📚 Documentation
- 🔦 Expose and document torch.autograd.gradcheck and torch.autograd.gradgradcheck #8166
- ➕ Document tensor.scatter_add_ #9630
- 📄 Document variants of torch.add and tensor.add_, e.g. tensor.add(value=1, other) -> Tensor #9027
- 🔊 Document torch.logsumexp#8428
- Document torch.sparse_coo_tensor#8152
- 📄 Document torch.utils.data.dataset.random_split #7676
- 📄 Document torch.nn.GroupNorm #7086
- 📚 A lot of other various documentation improvements including RNNs, ConvTransposeNd, Fold/Unfold, Embedding/EmbeddingBag, Loss functions, etc.

Pytorch changelog

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Changelog History Page 4

v1.3.0.a0

v1.2.0 Changes

Highlights

[JIT] New TorchScript API

[JIT] Improved TorchScript Python language coverage

Expanded Onnx Export

Tensorboard is no Longer Considered Experimental

NN.Transformer

💥 Breaking Changes

Comparison operations (lt (<), le (<=), gt (>), ge (>=), eq (==), ne, (!=) ) return dtype has changed from torch.uint8 to torch.bool (21113)

__invert__ / ~: now calls torch.bitwise_not instead of 1 - tensor and is supported for all integral+Boolean dtypes instead of only torch.uint8. (22326)

torch.tensor(bool) and torch.as_tensor(bool) now infer torch.bool dtype instead of torch.uint8. (19097)

nn.BatchNorm{1,2,3}D: gamma (weight) is now initialized to all 1s rather than randomly initialized from U(0, 1). (13774)

🚚 A number of deprecated Linear Algebra operators have been removed (22841)

📜 Sparse Tensors: Changing the sparsity of a Tensor through .data is no longer supported. (17072)

📜 Sparse Tensors: in-place shape modifications of Dense Tensor Constructor Arguments will no longer modify the Sparse Tensor itself (20614)

📜 Sparse Tensors: Accumulating dense gradients into a sparse .grad will no longer retain Python object identity. (17072)

🔀 nn.utils.convert_sync_batchnorm has been replaced with nn.SyncBatchNorm.convert_sync_batchnorm(18787)

Error Checking: torch.addcmul and torch.lerp operators enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow output tensor to be resized if it is also used as one of the inputs.

Error Checking: Improved Variable version tracking (20391, 22821, 21865)

[JIT] Python called from scripted modules must be @ignored

⚡️ [JIT] optimize for ScriptModules is now a context manager

[jit] script::Module is now a reference type

[C++ only] mean() / sum() / prod() APIs have changed slightly (21088)

Binary distribution and nightly changes

🆕 New Features

👍 Tensor Type Support

📦 NN Package

Operators

📦 Optim Package

📦 Distributed Package

IterableDataset

📦 Tensorboard Package

JIT Features

👌 Improvements

Distributed Improvements

Tensorboard Improvements

Numpy Compatibility Improvements

JIT Improvements

C++ API Improvements

MKLDNN Tensor Improvements

🐛 Bug Fixes

🛠 torch.nn Bug Fixes

🛠 Distributed Bug fixes

🛠 JIT Bug Fixes

🛠 C++ Frontend bug fixes

🗄 Deprecations

🗄 Masking via torch.uint8 Tensors is now deprecated in favor of masking via torch.bool Tensors.

🗄 Legacy autograd.Function (Function without static forward method) is now deprecated

🚀 torch.gels: has been renamed to torch.lstsq; torch.gels will work for this release but is now deprecated. (23460)

🐎 Performance

🐎 Torch.NN Performance Improvements

📚 Documentation

📚 Torch.NN Documentation

📚 Contributor Documentation

📚 Build Documentation

📚 TensorBoard Documentation

📚 Torch HUB Documentation

ONNX

👌 Supporting More ONNX Opsets

👍 Enhancing the Support for ScriptModule

Exporting More Torch Operators to ONNX

Extending Existing Exporting Logic

⚡️ Optimizing Exported ONNX Graph

🛠 Bugfixes/Improvements

v1.2.0.a0

v1.1.0 Changes

Highlights

TensorBoard (currently experimental)

[JIT] Attributes in ScriptModules

👍 [JIT] Dictionary and List Support in TorchScript

[JIT] User-defined classes in TorchScript (experimental)

DistributedDataParallel new functionality and tutorials

💥 Breaking Changes

🆕 New Features

Operators

NN

Changelog History

Page 4

Comparison operations (`lt (<), le (<=), gt (>), ge (>=), eq (==), ne, (!=)` ) return dtype has changed from `torch.uint8` to `torch.bool` (21113)

`invert` / `~`: now calls `torch.bitwise_not` instead of `1 - tensor` and is supported for all integral+Boolean dtypes instead of only `torch.uint8`. (22326)

`torch.tensor(bool)` and `torch.as_tensor(bool)` now infer `torch.bool` dtype instead of `torch.uint8`. (19097)

`nn.BatchNorm{1,2,3}D`: gamma (`weight`) is now initialized to all 1s rather than randomly initialized from U(0, 1). (13774)

📜 Sparse Tensors: Changing the sparsity of a Tensor through `.data` is no longer supported. (17072)

📜 Sparse Tensors: Accumulating dense gradients into a sparse `.grad` will no longer retain Python object identity. (17072)

🔀 `nn.utils.convert_sync_batchnorm` has been replaced with `nn.SyncBatchNorm.convert_sync_batchnorm`(18787)

Error Checking: `torch.addcmul` and `torch.lerp` operators enforce stronger shape requirements on the output tensor (`out=` keyword argument) and do not allow output tensor to be resized if it is also used as one of the inputs.

[JIT] Python called from scripted modules must be `@ignore`d

⚡️ [JIT] `optimize` for ScriptModules is now a context manager

[jit] `script::Module` is now a reference type

🗄 Masking via `torch.uint8` Tensors is now deprecated in favor of masking via `torch.bool` Tensors.

🗄 Legacy `autograd.Function` (Function without static forward method) is now deprecated

🚀 `torch.gels`: has been renamed to `torch.lstsq`; `torch.gels` will work for this release but is now deprecated. (23460)