Changelog History
v1.3.0.a0
August 07, 2019
v1.2.0 Changes
August 08, 2019
We have just released PyTorch v1.2.0. It has over 1,900 commits and contains a significant amount of effort in areas spanning JIT, ONNX, Distributed, as well as Performance and Eager Frontend Improvements.
Highlights
[JIT] New TorchScript API
Version 1.2 includes a new, easier-to-use API for converting `nn.Module`s into `ScriptModule`s. A sample usage is:

```python
class MyModule(torch.nn.Module):
    ...

# Construct an nn.Module instance
module = MyModule(args)

# Pass it to `torch.jit.script` to compile it into a ScriptModule.
my_torchscript_module = torch.jit.script(module)
```

`torch.jit.script()` will attempt to recursively compile the given `nn.Module`, including any submodules or methods called from `forward()`. See the migration guide for more info on what's changed and how to migrate.
[JIT] Improved TorchScript Python language coverage
In 1.2, TorchScript has significantly improved its support for Python language constructs and Python's standard library. Highlights include:
- Early returns, breaks and continues.
- Iterator-based constructs, like `for...in` loops, `zip()`, and `enumerate()`.
- `NamedTuple`s.
- `math` and `string` library support.
- Support for most Python builtin functions.
See the detailed notes below for more information.
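As a rough illustration of these constructs working together under `torch.jit.script` (the function and values below are invented for this sketch, not taken from the release notes):

```python
from typing import List

import torch

@torch.jit.script
def count_positive(values: List[int]) -> int:
    # Early return is now supported.
    if len(values) == 0:
        return 0
    total = 0
    # for...in loops and enumerate() are supported.
    for i, v in enumerate(values):
        if v <= 0:
            continue  # continue/break are supported
        total += 1
    return total

print(count_positive([3, -1, 5]))  # 2
```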
Expanded ONNX Export
In PyTorch 1.2, working with Microsoft, we've added full support to export ONNX Opset versions 7 (v1.2), 8 (v1.3), 9 (v1.4) and 10 (v1.5). We've also enhanced the constant folding pass to support Opset 10, the latest available version of ONNX. Additionally, users are now able to register their own symbolic functions to export custom ops, and to specify the dynamic dimensions of inputs during export. Here is a summary of all of the major improvements:
- Support for multiple Opsets, including the ability to export dropout, slice, flip and interpolate in Opset 10.
- Improvements to ScriptModule export, including support for multiple outputs, tensor factories, and tuples as inputs and outputs.
- More than a dozen additional PyTorch operators supported, including the ability to export a custom operator.
Updated docs can be found here and also a refreshed tutorial using ONNXRuntime can be found here.
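For instance, a minimal export sketch using an explicit opset and a dynamic batch dimension (the model, file name, and axis names below are illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
dummy = torch.randn(3, 4)

torch.onnx.export(
    model, dummy, "model.onnx",          # output file name is illustrative
    opset_version=10,                    # one of the newly supported opsets
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"},  # mark the batch dimension as dynamic
                  "output": {0: "batch"}},
)
```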
TensorBoard Is No Longer Considered Experimental
Read the documentation or simply type `from torch.utils.tensorboard import SummaryWriter` to get started!
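A small, illustrative logging loop (the log directory, tags, and values are placeholders; the `tensorboard` package must be installed):

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/demo")           # log directory is illustrative
for step in range(100):
    loss = torch.rand(1).item()               # stand-in for a real training loss
    writer.add_scalar("train/loss", loss, step)
writer.add_histogram("weights", torch.randn(1000), 0)
writer.close()
```

Then run `tensorboard --logdir=runs` to inspect the logs in your browser.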
NN.Transformer
We include a standard `nn.Transformer` module, based on the paper "Attention Is All You Need". The `nn.Transformer` module relies entirely on an attention mechanism to draw global dependencies between input and output. The individual components of the `nn.Transformer` module are designed so they can be adopted independently. For example, `nn.TransformerEncoder` can be used by itself, without the larger `nn.Transformer`. New APIs include:
- `nn.Transformer`
- `nn.TransformerEncoder` and `nn.TransformerEncoderLayer`
- `nn.TransformerDecoder` and `nn.TransformerDecoderLayer`
See the Transformer Layers documentation for more info.
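A minimal sketch of the new modules in action (dimensions are deliberately tiny and illustrative):

```python
import torch
import torch.nn as nn

transformer = nn.Transformer(d_model=32, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2)
src = torch.randn(10, 8, 32)   # (source sequence length, batch, d_model)
tgt = torch.randn(5, 8, 32)    # (target sequence length, batch, d_model)
out = transformer(src, tgt)
print(out.shape)               # torch.Size([5, 8, 32])

# The encoder can also be used on its own.
encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
memory = encoder(src)
```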
Breaking Changes
Comparison operations (`lt (<)`, `le (<=)`, `gt (>)`, `ge (>=)`, `eq (==)`, `ne (!=)`) now return tensors of dtype `torch.bool` instead of `torch.uint8`. (21113)
Version 1.1:

```python
>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([1, 0, 0], dtype=torch.uint8)
```

Version 1.2:

```python
>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([True, False, False])
```
For most programs, we don't expect that any changes will need to be made as a result of this change. There are a couple of possible exceptions listed below.
Mask Inversion
In prior versions of PyTorch, the idiomatic way to invert a mask was to call `1 - mask`. This behavior is no longer supported; use the `~` or `bitwise_not()` operator instead.
Version 1.1:

```python
>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([0, 1, 1], dtype=torch.uint8)
```

Version 1.2:

```python
>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported.
If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.

>>> ~(torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([False, True, True])
```
sum(Tensor) (Python built-in) does not upcast dtype like torch.sum
Python's built-in `sum` returns results in the same `dtype` as the tensor itself, so it will not return the expected result if the value of the sum cannot be represented in the `dtype` of the tensor.
Version 1.1:

```python
# value can be represented in result dtype
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(3, dtype=torch.uint8)

# value can NOT be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(44, dtype=torch.uint8)

# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)
```

Version 1.2:

```python
# value cannot be represented in result dtype (now torch.bool)
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(True)

# value cannot be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(True)

# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)
```

TL;DR: use `torch.sum` instead of the built-in `sum`. Note that the built-in `sum()` behavior will more closely resemble `torch.sum` in the next release.
Note also that masking via `torch.uint8` Tensors is now deprecated; see the Deprecations section for more information.
`__invert__` / `~`: now calls `torch.bitwise_not` instead of `1 - tensor` and is supported for all integral and Boolean dtypes instead of only `torch.uint8`. (22326)
Version 1.1:

```python
>>> ~torch.arange(8, dtype=torch.uint8)
tensor([1, 0, 255, 254, 253, 252, 251, 250], dtype=torch.uint8)
```

Version 1.2:

```python
>>> ~torch.arange(8, dtype=torch.uint8)
tensor([255, 254, 253, 252, 251, 250, 249, 248], dtype=torch.uint8)
```
`torch.tensor(bool)` and `torch.as_tensor(bool)` now infer `torch.bool` dtype instead of `torch.uint8`. (19097)
Version 1.1:

```python
>>> torch.tensor([True, False])
tensor([1, 0], dtype=torch.uint8)
```

Version 1.2:

```python
>>> torch.tensor([True, False])
tensor([True, False])
```
`nn.BatchNorm{1,2,3}D`: gamma (`weight`) is now initialized to all 1s rather than randomly initialized from U(0, 1). (13774)
Version 1.1:

```python
>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([0.1635, 0.7512, 0.4130, 0.6875, 0.5496], requires_grad=True)
```

Version 1.2:

```python
>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([1., 1., 1., 1., 1.], requires_grad=True)
```

A number of deprecated Linear Algebra operators have been removed (22841):

| Removed | Use Instead |
| --- | --- |
| `btrifact` | `lu` |
| `btrifact_with_info` | `lu` with `get_infos=True` |
| `btrisolve` | `lu_solve` |
| `btriunpack` | `lu_unpack` |
| `gesv` | `solve` |
| `pstrf` | `cholesky` |
| `potrf` | `cholesky` |
| `potri` | `cholesky_inverse` |
| `potrs` | `cholesky_solve` |
| `trtrs` | `triangular_solve` |
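As a migration aid, here is a small sketch of moving two of the removed calls to their replacements (the matrices below are random placeholders):

```python
import torch

A = torch.randn(3, 3)
b = torch.randn(3, 1)

# Before (removed in 1.2):
#   X, LU = torch.gesv(b, A)
# After:
X, LU = torch.solve(b, A)

# Before (removed in 1.2):
#   A_LU, pivots = A.btrifact()
# After:
A_LU, pivots = torch.lu(A)
```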
Sparse Tensors: Changing the sparsity of a Tensor through `.data` is no longer supported. (17072)

```python
>>> x = torch.randn(2, 3)
>>> x.data = torch.sparse_coo_tensor((2, 3))
RuntimeError: Attempted to call `variable.set_data(tensor)`, but `variable` and
`tensor` have incompatible tensor type.
```
Sparse Tensors: in-place shape modifications of dense Tensor constructor arguments will no longer modify the sparse Tensor itself. (20614)
Version 1.1:

```python
>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)
>>> s.coalesce().indices().shape
torch.Size([1, 1])
>>> s.coalesce().values().shape
torch.Size([1])
```

Notice `indices()` and `values()` reflect the resized tensor shapes.
Version 1.2:

```python
>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)
>>> s.coalesce().indices().shape
torch.Size([1, 2])
>>> s.coalesce().values().shape
torch.Size([2])
```

Notice `indices()` and `values()` reflect the original tensor shapes.
Sparse Tensors: Accumulating dense gradients into a sparse `.grad` will no longer retain Python object identity. (17072)
Version 1.1:

```python
>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad

# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided

# m_weight_grad_saved still refers to the .grad of m's weight
# even though the sparsity has changed
>>> assert id(m_weight_grad_saved) == id(m.weight.grad)
```

Version 1.2:

```python
>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad

# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided

# m_weight_grad_saved NO LONGER refers to the .grad of m's weight
>>> assert id(m_weight_grad_saved) == id(m.weight.grad)
AssertionError
```
`nn.utils.convert_sync_batchnorm` has been replaced with `nn.SyncBatchNorm.convert_sync_batchnorm`. (18787)
Example of new usage:

```python
>>> # Network with nn.BatchNorm layer
>>> module = torch.nn.Sequential(
>>>     torch.nn.Linear(20, 100),
>>>     torch.nn.BatchNorm1d(100)
>>> ).cuda()
>>> # creating process group (optional)
>>> process_group = torch.distributed.new_group(process_ids)
>>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group)
```
Error Checking: `torch.addcmul` and `torch.lerp` operators enforce stronger shape requirements on the output tensor (`out=` keyword argument) and do not allow the output tensor to be resized if it is also used as one of the inputs.
Version 1.1:

```python
>>> x = torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2, 3), out=x)
tensor([[0., 0., 0.],
        [0., 0., 0.]])
```

Version 1.2:

```python
>>> x = torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2, 3), out=x)
RuntimeError: output with shape [1] doesn't match the broadcast shape [2, 3]
```

If you run into this error, please ensure the `out` parameter is of the correct output shape (post-broadcasting).
Error Checking: Improved Variable version tracking (20391, 22821, 21865)
PyTorch's autograd system uses a version tracking mechanism to ensure that Tensors that are saved for backward computations retain their correct values when the backward pass is computed (i.e. that they haven't been updated in-place since they were saved). See In-Place Correctness Checks in the docs for more information.
In PyTorch 1.2 we have enhanced the version tracking in a number of cases, which may flag issues that were not caught previously. There is now additional tracking through the `Variable()` constructor, the `nn.Parameter()` constructor, after setting `.data`, and via `nn.Module._apply` (internal API).
Track changes through the Variable constructor:

```python
>>> x = torch.ones(1, requires_grad=True) + 1
>>> y = x * x

# do an in-place update through the Variable constructor
>>> torch.autograd.Variable(x).add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0 instead.
```

Track changes on an nn.Parameter:

```python
>>> x = torch.ones(1)
>>> p = torch.nn.Parameter(x)
>>> y = p * p

# do an in-place update on a saved Parameter
>>> x.add_(1)
>>> y.sum().backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0 instead.
```

Track changes after setting `.data`:

```python
>>> x = torch.zeros(1, requires_grad=True) + 1
>>> y = x * x
>>> x.data = torch.zeros(1, requires_grad=True) + 1
>>> x.add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]], which is output 0 of AddBackward0,
is at version 1; expected version 0 instead.
```
[JIT] Python called from scripted modules must be `@ignore`'d
`torch.jit.script` now recursively compiles everything it finds in the original function, so if you had Python functions called from your scripted function or module, you must now explicitly `@ignore` them. See the new API guide for more details.
Version 1.1:

```python
def my_unscriptable_python_fn():
    ...  # weird stuff

@torch.jit.script
def fn():
    # This gets inserted as a Python call, and only errors on `save()`.
    my_unscriptable_python_fn()
```

Version 1.2:

```python
@torch.jit.ignore  # this needs to be added ...
def my_unscriptable_python_fn():
    ...

@torch.jit.script
def fn():
    # ... or else recursive compilation will attempt to compile this call
    my_unscriptable_python_fn()
```

NOTE: This is also a change to the behavior of the `@torch.jit.ignore` decorator. In version 1.1, `@ignore` tells the compiler to omit compiling a function entirely, to mark Python functions that you know will not be called after export. In version 1.2, `@ignore` tells the compiler to insert a call back to the Python interpreter instead of trying to compile the function.
To get the old behavior, use `@torch.jit.ignore(drop_on_export=True)` (`@torch.jit.ignore` with no arguments is equivalent to `@torch.jit.ignore(drop_on_export=False)`).
[JIT] optimize for ScriptModules is now a context manager
Whether optimization passes are run is now a thread-local flag. This better reflects how optimization actually happens in the JIT (i.e. it is decided at runtime, not at compilation time).
Version 1.1:

```python
@torch.jit.script(optimize=False)
def fn(inputs):
    ...

fn(inputs)
```

Version 1.2:

```python
@torch.jit.script
def fn(inputs):
    ...

with torch.jit.optimized_execution(False):
    fn(inputs)
```
[JIT] script::Module is now a reference type
To better align with the PyTorch C++ API philosophy, `script::Module` and `script::Method` are now reference types. Our APIs have been updated to use `script::Module` instead of `std::shared_ptr<script::Module>`.
Version 1.1:

```cpp
using torch::jit::script::Module;

std::shared_ptr<Module> m = torch::jit::load("my_model.py");
m->forward(...);
```

Version 1.2:

```cpp
using torch::jit::script::Module;

Module m = torch::jit::load("my_model.py");
m.forward(...);
```
[C++ only] mean() / sum() / prod() APIs have changed slightly (21088)
Version 1.1 API:

```cpp
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```

Version 1.2 API:

```cpp
Tensor sum(IntArrayRef dim, bool keepdim=false,
           c10::optional<ScalarType> dtype=c10::nullopt) const;
```

That is, to override `dtype`, `keepdim` must now be provided.
Binary distribution and nightly changes
We have streamlined our conda and wheel binary distributions, so that it is easier than ever to install the version of PyTorch appropriate for your needs. The install instructions on https://pytorch.org/ have been updated, but if you have tooling to download and install PyTorch, here is a detailed description of the changes we made:
Wheels now have local version identifiers. Wheels that are for non-default CUDA configurations (the default CUDA version for this release is 10.0) now have local version identifiers like +cpu and +cu92. This means that, when installing, it is no longer necessary to specify a full wheel URL; just specify an appropriate version constraint like `torch==1.2.0+cu92`.
Version 1.1 (for Python 3.7 on Linux only):

```bash
pip install numpy
pip install https://download.pytorch.org/whl/cpu/torch-1.1.0-cp37-cp37m-linux_x86_64.whl
```

Version 1.2 (works for all versions of Python, and both Linux and Mac):

```bash
pip install torch==1.2.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
```

CPU-only binaries on conda can be selected with the cpuonly feature. We've eliminated the pytorch-cpu conda package; instead, the CPU-only conda package can be enabled by installing the cpuonly metapackage. Similarly, there is no longer both a torchvision and a torchvision-cpu package; the feature will ensure that the CPU version of torchvision is selected.
Version 1.1:

```bash
conda install -c pytorch pytorch-cpu
```

Version 1.2:

```bash
conda install -c pytorch pytorch cpuonly
```

Conda nightlies now live in the pytorch-nightly channel and no longer have "-nightly" in their name. We have added a new dedicated channel for nightlies called pytorch-nightly; all nightlies (pytorch, torchvision, torchaudio, etc.) will now be uploaded to this channel, but with the same name as their corresponding stable versions (unlike before, when we had separate pytorch-nightly, torchvision-nightly, etc. packages). This makes it more difficult to accidentally install a copy of the nightly and stable at the same time.
Version 1.1:

```bash
conda install -c pytorch pytorch-nightly
```

Version 1.2:

```bash
conda install -c pytorch-nightly pytorch
```

Wheel nightlies no longer have -nightly in their name. Similar to the changes we made in conda, we no longer suffix wheel nightlies with "-nightly", to make it harder to accidentally install a copy of the nightly and stable at the same time.
Version 1.1:

```bash
pip install --pre torch_nightly -f https://download.pytorch.org/whl/nightly/torch_nightly.html
```

Version 1.2:

```bash
pip install --pre torch -f https://download.pytorch.org/whl/nightly/torch_nightly.html
```
New Features
Tensor Type Support
- `torch.bool`: added support for many operators (masking, comparison, arithmetic operators) to achieve feature parity with `torch.uint8`. See the Breaking Changes section for details about how this could affect existing programs. (21032, etc.)
- `torch.sparse.HalfTensor`: added support for `torch.float16` sparse Tensors on both CPU and CUDA. (19695)
- `torch.bfloat16`: added basic creation and serialization support for Brain Floating Point Tensors. (21522, 21523, 21860, 22852)
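A quick, illustrative look at the new dtypes (values are placeholders; `torch.bfloat16` support is limited to basic creation and serialization as noted above):

```python
import torch

mask = torch.tensor([True, False, True])     # inferred as torch.bool
x = torch.arange(3.0)
print(x[mask])                               # boolean masking: tensor([0., 2.])

bf = torch.zeros(4, dtype=torch.bfloat16)    # basic bfloat16 creation
print(bf.dtype)                              # torch.bfloat16
```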
NN Package
- `nn.Transformer`: added implementation of Transformer from Attention Is All You Need. (20170, 22588)
- `nn.Embedding`: support `float16` embeddings on CUDA. (19695)
- `nn.Flatten`: added a Module that performs `torch.flatten`. (22245)
- `nn.functional.gelu`: added support for Gaussian Error Linear Units. (20665, 21237)
- `nn.Module` hooks: add ability to replace input/output via `forward_pre_hook` and `forward_hook`. (22285)
- `nn.Module`: add `requires_grad_()` method for turning on/off `requires_grad` for Module parameters. (22576)
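A short sketch of two of these additions, `nn.Flatten` and `Module.requires_grad_()` (the toy network below is illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.Flatten(),                  # flattens all dims except the batch dim
    nn.Linear(8 * 30 * 30, 10),
)
out = model(torch.randn(2, 3, 32, 32))
print(out.shape)                   # torch.Size([2, 10])

# Freeze every parameter of the model in one call.
model.requires_grad_(False)
```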
Operators
- `Tensor.to_sparse`: now supports autograd. (20458)
- `Tensor.fill_diagonal_`: operator to fill the main diagonal of a Tensor. (21892)
- `torch.qr`: supports autograd. (21274)
- `torch.bitwise_not`: add operator for boolean/integer types. Also have the Python `~` operator use this. (22283, 22320)
- `torch.trapz`: integrate using the trapezoid rule; equivalent to numpy.trapz. (21610)
- `torch.var_mean` / `torch.std_mean`: compute variance and mean at the same time. (18731)
- `torch.utils.ThroughputBenchmark`: benchmark utility for measuring the throughput of PyTorch operators. (20766)
- `Logging`: lightweight at-most-once logging to record operators that are used (`c10::Logging`). (20745)
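A few of the new operators in a small, illustrative snippet:

```python
import torch

# Trapezoid-rule integration of y over x (here: integral of x^2 on [0, 1]).
x = torch.linspace(0, 1, steps=101)
y = x ** 2
print(torch.trapz(y, x))          # ~0.3334

# Variance and mean in a single pass.
var, mean = torch.var_mean(torch.randn(1000))

# Fill the main diagonal in place.
m = torch.zeros(3, 3)
m.fill_diagonal_(5.0)
```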
Optim Package
- `optim.AdamW`: introduce AdamW optimizer from Decoupled Weight Decay Regularization. (21250)
- `optim.LBFGS`: added support for strong Wolfe line search. (8824)
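Construction is the same as for the existing optimizers; a brief illustrative sketch:

```python
import torch
from torch import optim

model = torch.nn.Linear(10, 2)

# AdamW: Adam with decoupled weight decay.
adamw = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

# LBFGS with the newly added strong Wolfe line search.
lbfgs = optim.LBFGS(model.parameters(), line_search_fn="strong_wolfe")
```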
Distributed Package
- `DistributedDataParallel`: support CPU modules. (20236)
- `DistributedDataParallel`: support sparse tensors. (19146)
- `DistributedDataParallel`: support local gradient accumulation. (21736)
IterableDataset
- `IterableDataset`: introduces a new type of Dataset designed for data read from a stream. (19228)
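A minimal illustrative `IterableDataset` (the class name and data source below are invented for this sketch):

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class CountingStream(IterableDataset):
    """Stream-style dataset that yields items one at a time."""
    def __init__(self, n):
        super().__init__()
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield torch.tensor(i)

loader = DataLoader(CountingStream(5), batch_size=2)
for batch in loader:
    print(batch)   # tensor([0, 1]), tensor([2, 3]), tensor([4])
```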
Tensorboard Package
- TensorBoard support in PyTorch has improved and is no longer experimental!
- `SummaryWriter.flush`: now supported. (20607)
- `SummaryWriter.add_mesh`: add support for 3D point clouds. (20413)
JIT Features
- Improved support for iterator infrastructure. TorchScript now supports looping through a `List`, `Tuple`, `Dict`, `Tensor`, `String`, and you can also use `zip()`, `enumerate()`, and `for...in`. (21801, 22006, 21990, 21985)
- Support `in` membership checks. (21527)
- Improved support for strings and the string libraries. (20826, 20188, 20761, 21656, 20617)
- Improved `math` support. (20979, 19707, 21151, 21131, 21129, 21130, 21512, 21126, 21127, 21128)
- Support for various other Python builtin functions. (21451)
- Support for `NamedTuple`. (21428)
- All the rest of the `dict` methods. (21979)
- `sorted()` keyword for lists and dicts. (23274)
- Add support for breaks and continues. (21692)
- Improved custom operator API with several bugfixes and new features. It now allows more primitive types, supports `torch::List`, `torch::Dict` and `torch::Optional`, and supports dispatch (i.e. registering a different function for CPU and CUDA for the same operator).
- Support `nn.GRU` in script. (23266)
- Support `pack_padded_sequence` and `pad_packed_sequence`. (23249)
- Support `torch._C._get_tracing_state` in TorchScript. (23248)
- Support `torch.as_tensor` in TorchScript. (23247)
- Add support for recursive compilation on `Modules`. (20708)
- Add `all` builtin. (20521)
- Add `Final[T]` annotated members to `__constants__`. (21603)
- Add `save()` to scripted `Function`s. (20386)
- Support for serializing class attributes. (22953)
- Support for class annotations. (21379)
- Support Python 3.8 `Constant` node. (22007)
- Support for type annotations instead of `torch.jit.annotate()`. (21390)
- Support operator overloading for user-defined classes. (20033)
- Support recursive `ModuleList` / `Sequential`. (21306)
- Trace multiple methods in a single `Module`. (19905)
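A rough sketch combining a few of these features in TorchScript (names and values are illustrative; see the items above and the detailed notes for exact coverage):

```python
from typing import Dict, List, NamedTuple

import torch

class Point(NamedTuple):
    x: float
    y: float

@torch.jit.script
def manhattan(p: Point) -> float:
    # NamedTuple fields can be accessed by name inside TorchScript.
    return p.x + p.y

@torch.jit.script
def keys_with_total(scores: Dict[str, float]) -> List[str]:
    # `in` membership checks and dict methods now work in TorchScript.
    names = list(scores.keys())
    if "total" in scores:
        names.append("total_present")
    return names

print(manhattan(Point(3.0, 4.0)))            # 7.0
print(keys_with_total({"a": 1.0, "b": 2.0}))
```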
π Improvements
Tensor.pin_memory()
: only ask for context on current device. (22229)Tensor.view()
: suggest usingreshape()
instead ofcontiguous()
when the input is non-contiguous. (20968)- π
Tensor.numpy()
: throwTypeError
instead ofValueError
if the type isnβt supported. (21608) - π
torch.norm
: add support forp="nuc"
withdim
specified. (21022) - π
torch.qr
: support batching of input matrices. (20689) - π
torch.qr
: supportsome
parameter akin to NumPy'smode
option. (20689) - π
torch.det
/torch.logdet
/torch.slogdet
: added batching support. (22909) - π
torch.cdist
: support batching. (20934) - π
torch.symeig
: support batching. (21858) torch._dirichlet_grad
: support CUDA. (21191)- π
torch.randperm
: supporttorch.float16
. (22102) torch.Size
is now pickle-able in Python2. (20952)- π
torch.tensor
/torch.as_tensor
: infer device if input supports Numbaβs__cuda_array_interface__
. (20584) torch.isinf
/torch.isfinite
: throwTypeError
instead ofValueError
when a non-tensor is passed in. (20817)- π
nn.MultiheadedAttention
: add functional support. (20415) - π
nn.MultiheadedAttention
: added support for key/value to have different number of features. (21288) nn.MultiheadAttention
: allow static key/values. (21288)- π
nn.Conv{1,2,3}D
: supporttorch.int64
dtype in forward. (20730, 22594) - π
nn.AvgPool{1,2,3}D
: supporttorch.int64
dtype in forward. (22433) - πΎ
nn.Module
: make_save_to_state_dict
overrideable. (21933) autograd
: Checkpointing of modules inside large fanout networks no longer hits a recursion error. (22397)autograd
: Track in-pace changes of Tensors throughModule._apply
(internal API). (21865)- π
autograd.profiler
: Add shape aggregation support. 20035) autograd.profiler
: Profile custom c10 ops. (20175)- π
DataLoader
: support settingbatch_size=0
to disable automatic batching (collation) inDataLoader
for easier bulk loading. (19228) DataLoader
: addmultiprocessing_context
parameter. (22990)DataLoader
: added error detection forworker_init_fn
. (20150)DataLoader
: Retry onEINTR
. (21723)torch.cuda.set_rng_state
/torch.cuda.get_rng_state
: accept string asdevice
parameter. (23448)- β
CUDA
: add warning when using Turing GPUs and CUDA <= 9000. (21468) CUDA
: warn on conditions that can trigger a cuBLAS 9.0 bug. (22034)CPU
: Improve CPUAllocator OOM message. (20618)- π
[memory_format]
: added support fortorch.empty
,torch.empty_like
,Tensor.contiguous()
,Tensor.is_contiguous()
to specify / check the order in which dimensions are laid out in memory. (20455, 20558) distributions.MultivariateNormal
: fix precision matrix instability. (21366)distributions.transforms.SigmoidTransform
: fix numerical instability. (19802)
Distributed Improvements
- π
DistributedDataParallel
: Support DDP forward/backward calls even if no module parameter is used. (19821) DistributedDataParallel
: Only call into reducer if grad is enabled. (19897)- π
DistributedDataParallel
: Require finalize DDP backward only when there are indeed gradients computed, this allows application to completely discard DDP outputs and move on to the next iteration. (19901) DistributedDataParallel
: Improve DDP backward reduction error messages. (20586)DistributedDataParallel
: make DDP failure recoverable. (21591)DistributedDataParallel
: Delay reduction of unused parameters until first autograd hook is called. (22219)- π
c10d:
support tensors shared across processes. (21449) c10d:
ProcessGroupMPI
Add device guard around MPI operations. (22446)utils.data.distributed.DistributedSampler
: Make shuffling optional. (22479)
Tensorboard Improvements
- π Usage of kwarg-only arguments has been removed. (21786)
Numpy Compatibility Improvements
- `Tensor.T`: added numpy-like support for reversing dimensions. (20598)
- `Tensor.ndim`: NumPy-equivalent property for the number of dimensions. (20565)
- `Tensor.nonzero`: added `as_tuple` argument (default `False`) that, when `True`, will return a tuple of Tensors, which matches the behavior of numpy.nonzero. (20293)
- `torch.dtype`: support passing in NumPy dtypes as arguments. (21215)
- `torch.normal`: add `size` parameter when called with two floats. (20545)
- `torch.where`: add one-argument overload that is an alias for the NumPy-like `nonzero`. (21986)
- Support a number of argument name overrides, e.g. `axis` instead of `dim`. (20451)
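A short illustrative snippet exercising several of these NumPy-compatibility additions:

```python
import numpy as np
import torch

x = torch.arange(6.0).reshape(2, 3)
print(x.T.shape)                 # torch.Size([3, 2]), like numpy's .T
print(x.ndim)                    # 2

# Tuple-of-tensors nonzero, matching numpy.nonzero.
rows, cols = (x > 2).nonzero(as_tuple=True)

# NumPy dtypes are accepted where a torch.dtype is expected.
y = torch.zeros(3, dtype=np.float32)
print(y.dtype)                   # torch.float32

# Some NumPy argument names are accepted, e.g. axis instead of dim.
print(x.sum(axis=1))
```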
JIT Improvements
- π¨ The original source code debug information is now saved with the model. If a model is saved and then loaded into another process, the loaded process can now print out error messages that point to the original source code. (22177, 22178, 22179, 22180)
- Error message source range highlighting now includes filename, line number, and column number. (21157)
- π Better Constant Propagation through Tuples. (22561)
- β Add
start
andstep
parameters forrange
in TorchScript. (20795) - Support for threading options for TorchScript inference (doc)
- β Add
max_pool2d
to symbolic derivatives. (19661) - β‘οΈ Optimize
matmul
memory usage for certain cases. (23433) - Avoid kernel launches for zero-sized tensor inputs. (22790)
- β Add support for steps (strides) in tensor slices. (20929)
- Added error for classes that don't have an
__init__
function. (21880) - π Allow classes to be used in their own methods. (20106)
- π Better error message when a variable is conditionally defined. (20911)
- Consider contained types in alias analysis. (21431)
- Convenience APIs for script objects. (20226)
- π¨ Don't print backtrace for interpreter errors. (20925)
- π Improve error msg for missing attribute. (20779)
- π Improve error msg on inferred type. (21058)
- π Improve error msg on recursive class defs. (21842)
- Include module names in recursive error stacks. (22921)
- π Improve recursive scripting error message. (21841)
- Index into a tuple with non constant integer. (20081)
- Let
ScriptModule
buffer attributes can also cast device/type. (19700) - Lower batchmm to non-diff optimization. (19987)
- π Make
ScriptModule.training
an attribute instead of a parameter. (21078) - π Make
strtod_c
compatible with different gcc abi. (21293) - π make magic methods work with casts too. (20654)
- π Improve performance of alias analysis. (20899)
- β Print a warning if a type annotation prefix is invalid according to mypy. (20884)
- schema_matching.cpp: improve error messages. (21141)
- Resolve with closed over variables instead of stack frame. (22270)
- Report errors through call stack. (22280)
- β¬οΈ Reduce number of stack manipulation instructions in interpreter. (21240)
C++ API Improvements
- π
nn::PoissonNLLLoss
: Added support. (19316) nn::Module
: addedreplace_module
API to overwrite submodules in C++ Frontend. (22546)nn:Module::register_module
/register_parameter
/register_buffer
: make public (23196)data::datasets::ChunkDataReader
: fix include headers and a vector issue. (19485)data::datasets::ChunkDataset
: add newget_batch
method. (21797)- π
data::datasets::ChunkDataset
: add checkpoint support. (21889) - π
data::datasets::ChunkDataset
: add support for cross-chunk shuffling. (22347) data::datasets::ChunkDataset
: add sorting policy. (23053)
MKLDNN Tensor Improvements
β Add support for a number of operators on MKLDNN Tensors including:
Tensor.is_mkldnn
: (22386)Tensor.transpose()
: (21943)Tensor.zero_()
: (20573)torch.empty
: (21184)torch.mul
: (20575)nn.AdaptiveAvgPool{1,2,3}D
: (19818)nn.Sigmoid
: (20820)nn.Softmax
: (21516)- π
nn.Module
: support saving/loading MKLDNN modules. (20799) - π
nn.MaxPool{1,2,3}D
: supportceil_mode
. (21310)
π Bug Fixes
- Indexing: fix advanced indexing where there are more than (231)-1 bytes in the output. (20919)
- Indexing: fix indexing when there are more than 65535 elements in a non-indexing first dimension on CUDA. (23123)
- Indexing: fix issue with slicing empty tensors. (20914)
Tensor.index_copy_:
fix segfault by properly checking dimension is in range. (21617)Tensor.copy_
: Fix a bug where non-blocking was not being respected. (20305)- π―
Tensor.clone
: Fix an issue with MKLDNN tensors. (20943) - Tensor subclassing: give a proper error instead of crashing. (20283)
torch.cat
: Fix segfault with tensors that can't be indexed with 32-bit ints. (21530)- π
torch.range
/torch.linspace
/torch.logspace
: properly respect the currentStream
. (21619) torch.lu
: return the identity permutation instead of zeros when not using pivoting. (22242)torch.einsum
: Fix an issue where the backward pass would potentially be skipped. (22111)torch.cosh
: Fix an issue wheretorch.cos
was instead calculated withtorch.double
dtype and vectorized instructions. (20797)torch.triu
/torch.tril
: handle strides correctly for in-place versions. (22730).torch.triu
/torch.tril
: Fix handling of batches > 65535 on CUDA. (21067)torch.inverse
/torch.solve
/torch.cholesky_solve
/torch.triangular_solve
: Fix batch sizes > 65535 on CUDA. (21689)torch.histc
: returndtype
is now the same as the input tensor on CUDA, matching CPU behavior. (20369)torch.histc
: properly return 1-dim tensor on CPU with 0-dim input and 1 bin. (21497)torch.randperm
: handle non-contiguousout
parameter. (23043)torch.unique
: Fix empty tensor handling whendim
is passed as an argument. (19000)torch.min
/torch.max
: properly error on empty tensor inputs, as with CPU tensors. (19612).CUDA
: fix launch parameters for reductions. (22827).torch.hub
: fix an issue withfind_module
. (20782)autograd
: Fix a number of custom autogradFunction
corner cases by inverting the relationship between PyFunction and THPFunction. (22983)autograd
: give βTrying to backward through the graph a second time" error instead of internal assert when the buffers are a list of Tensors (with indexing). (21533)- β±
optim.lr_scheduler.CosineAnnealingLR
: rename from CosineAnnealingLr. (23242) distributions.Binomial
: Fix overflow oflog_prob
whenlogits
is large. (20679)distributions.SigmoidTransform
: Fix numerical issues that could result ininf
/-inf
return values. (20288)distributions.Categorical.sample
: fix a view bug. (23328)CUDA
: Give proper error message for bad cuda forks. (23322)pickle
: Fix Unpickling error when loading multiple objects from a file. (20270)NCCL
: Fix race condition. (23040)
π torch.nn Bug Fixes
nn.Conv{1,2,3}D
: fix memory leak on MKLDNN code path. (22392)nn.Conv{1,2,3}D
: properly unpickle older pickled versions. (21687)nn.CTCLoss
: fix backward on CUDA when 2d target tensor is larger thanmax_target_length
. (20971)nn.CTCLoss
: fix some numerical stability issues. (21392)nn.CTCLoss
: disable buggy non-deterministic CudNN algorithm. (22977)- π
nn.CTCLoss
: fixed empty target handling. (21910, 23298) - π
nn.SyncBatchNorm
: fix syncing of running statistics when count size differs between GPUs. (22248) - π
nn.SyncBatchNorm
: retainrequires_grad
value when converting fromnn.BatchNorm
. (22569) nn.SyncBatchNorm
: correctly handleprocess_group
inconvert_sync_batchnorm
. (19240)nn.MultiheadedAttention
: fix fortorch.float16
dtype. (21658).nn.EmbeddingBag
: fix NaN output when input is empty. (21400)nn.Dropout
: fix python crash (with SIGFPE) when called on an empty cuda tensor. (20541)nn.MaxPool
: fix output size calculation in some corner cases. (22304)nn.MaxPool
: return valid indices if all entries are-inf
. (23161)nn.Softmax
: respect the current Stream. (22470)- π
nn.LogSoftmax
: fix numerical stability issues. (21672) nn.Module.load_state_dict
: break ref cycle. (20397)nn.Module
: fix loading in 32-bit environments. (20900)nn.utils.rnn.pack_padded_sequence
: Fix segfault on empty tensors. (21461)nn.utils.spectral_norm
: fix loadingstate_dict
whenstrict=False
. (22545)- π
CudNN
: Fix uninitialized PoolWindow on Windows. (22405)
π Distributed Bug fixes
nn.parallel.DataParallel
: fix error inno_grad
mode. (21262)torch.distributed.all_gather
: fix errors for views and aliases. (21490)c10d
: fix collective communication errors on empty tensors. (20658)
π JIT Bug Fixes
- π Fix specialized list from dict keys. (23267)
- Switch keys to be sequential and stable in pickle serialization. (23280)
deepCopy
also copies type information of lists, (23271)dictKeys
anddictItems
ops on typed dicts return typed lists. (23270)- π Fix pickler bug where it would not load if no tensors were saved. (23263)
- Avoid multiple writes to files on export. (21186)
- π Better error msg for mismatched
dict
key type. (22231) - Better error msg for using Python
builtin_function_or_method
. (22935) - Better error msg in
__get_state__
to let a user know that ScriptModules can't be deep-copied at the moment.(20885) - π Better error msg when seeing a unsupported builtin function. (21068)
dropout
derivative should respect thetrain
flag. (20760)- Fix
__constants__
for some nn modules. (21071) - Fix
ScriptModule. __dir__ ()
. (22426) - π Fix 3x DenseNet compile time regression by restoring earlier-out tests in AliasDB::writesToAlias. (21425)
- π Fix a bug in loop unrolling. (21239)
- π Fix alias annotations for dict ops. (22900)
- π Fix inaccurate SourceRange reporting. (21109)
- π Fix broken indexing when using None and ellipses indexing together. (22905)
- π Fix bug in
CompilationUnit::define
. (21886) - π Fix compilation order for class methods. (20094)
- π Fix dead code elimination over loops. (22632)
- π Fix dead code elimination in onnx export. (22476)
- π Fix incorrect default on
Graph::toString
. (21370) - π Fix optional type promotion for classes. (21593)
- π Fix optional type unification. (19813)
- π Fix
NameError
withPYTORCH_JIT=0
. (20120) - π Fix overspecializing constants in compilation. (22816)
- π Fix
pow()
bug on overloads. (20824) - π Fix recusive method compilation. (21862)
- π Fix reflection on weak modules, copy attributes. (20190)
- π Fix slow unpickling. (21542)
- π Fix input/output type mismatch. (20829)
- π Fix insert_guard for norm decomposation. (19646)
- π Fix Trace inlining of graphs with optional inputs. (22686)
- π Fix tracing bugs where using
1 - x
in C++ would cause the size of 1 to get hardcoded. (20932) - π Fix tuple indexing bug. (21521)
- π Fix type hints for
None
constants. (23029) - Fix weak module cuda()
_flat_weights bug
. (21107) - π Fix
WeakIValueEq
. (21891) - π Fixed gcd to use 64 bit integers. (21041)
- π Fixed
list()
not making a copy. (22093) - π Fix race condition on
Module::forward
method. (21398) - Made
a += b
for lists do an in place add. (21896) - Made
floor/ceil
return ints. (21124) - Out-of-memory on GPU due to the "weak_script" decorators. (20588)
- π¨ Override print when python is present. (21625)
- Set
__file__
fortorch.ops
. (21888) - Set correct list type in pybind_utils. (23188)
π C++ Frontend bug fixes
nn::RNN
: Fix assertions in bidirectional RNN. (22850).nn::MaxPool
/nn::AvgPool
: expand incomplete kernel size, as in Python. (22073, 22075)Optim
: Fix memory leak whenweight_decay
is applied toAdam
,Adagrad
,RMSProp
. (23125)Optim::SGD
: fix memory leak with weight_decay. (23007)torch::autograd::Scatter
/ torch::autograd::Gather
: Fix nullptr bug. (20286)torch::nn::parallel::data_parallel
: fix gradient computation error. (20910)- π [C++ Extensions] Fix an issue when building multiple extensions in the same directory. (20221)
Deprecations
Masking via torch.uint8 Tensors is now deprecated in favor of masking via torch.bool Tensors.
See the Breaking Changes section for more details about `torch.bool` Tensors and comparison operators.
`torch.masked_select`, `torch.masked_fill`, `torch.masked_scatter` now expect `torch.bool` masks rather than `torch.uint8`.

```python
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])

>>> a.masked_select(tensor([0, 1, 1], dtype=torch.uint8))
UserWarning: masked_select received a mask with dtype torch.uint8,
this behavior is now deprecated, please use a mask with dtype torch.bool instead.
tensor([2, 3])

# instead use torch.bool
>>> a.masked_select(tensor([False, True, True]))
tensor([2, 3])
```

Comparison operators with `out=` parameters now expect `torch.bool` dtype rather than `torch.uint8`.

```python
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])
>>> res = torch.empty_like(a, dtype=torch.uint8)
>>> torch.gt(a, b, out=res)
UserWarning: torch.gt received 'out' parameter with dtype torch.uint8,
this behavior is now deprecated, please use 'out' parameter with dtype torch.bool instead.
tensor([0, 1, 1], dtype=torch.uint8)

# instead use torch.bool
>>> res = torch.empty_like(a, dtype=torch.bool)
>>> torch.gt(a, b, out=res)
tensor([False, True, True])
```

Legacy autograd.Function (Function without static forward method) is now deprecated

```python
>>> class MyLegacyFunction(Function):
>>>     def forward(self, x):
>>>         return x
>>>
>>>     def backward(self, grad_output):
>>>         return grad_output
>>>
>>> MyLegacyFunction()(torch.randn((3,), requires_grad=True))
UserWarning: Legacy autograd function with non-static forward method is deprecated
and will be removed in 1.3. Please use new-style autograd function with static forward method.

# instead use new-style Autograd Function
>>> class MyFunction(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         return x
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         return grad_output
>>>
>>> MyFunction.apply(torch.randn((3,), requires_grad=True))
```

See the torch.autograd.Function documentation for more details.
`torch.gels`: has been renamed to `torch.lstsq`; `torch.gels` will work for this release but is now deprecated. (23460)
Performance
- π Advanced Indexing: significantly improve performance of advanced indexing backward. (20557)
- π
Tensor.copy_
: increase broadcasting CUDA copy performance by 25%. (20685) - β‘οΈ
torch.matmul
: Optimize the case A.ndim <= 2 && B.ndim >= 3, shows up to 15x speed up. (20448) - π
torch.bmm
: Improve performance by up to 3x for small cases on CPU by applying TensorAccessor. (20266) - π
torch.inverse
: Move workspace query and allocation outside loop to improve performance by up to 5x. (20904) - β‘οΈ
torch.topk
: Optimize CPU perf using parallel and partial sort, up to 6x improvement. (22865) torch.cdist
: Improve CPU perf by up to 10x for some cases. (20605)torch.normal
: Movenormal
,normal_means
,normal_stddevs
, andnormal_means_stddevs
to ATen, increasing performance by up to 3x. (21287)torch.bernoulli
: Speedup bernoulli_scalar_cuda_kernel with grid-stride loop, increasing performance by up to 2x. (21300)- π
torch.coalesce
: Use_sparse_coo_tensor_unsafe
incoalesce
for up to 10x speedup. (21214) torch.sinh
/torch.cosh
: Parallelize and vectorize on CPU. (21115)torch.lerp
: Vectorize on CPU. (22038)torch.eye
: Parallelize on CPU. (21077)torch.randperm
: Parallelize initialization in randperm on CPU. (21529)- Vectorization: Don't split 256-bit AVX2 load/store intrinsics. (20609).
π Torch.NN Performance Improvements
- π
nn.Softmax
: Add persistent CUDA kernels that increase performance 2-10x on small inputs. (20827) - π
nn.Embedding
/nn.EmbeddingBag
: Optimize CUDA kernel, increasing performance up to 2.7x. (22016) - β‘οΈ
nn.Linear
: optimize BERT model perf by using mkldnn inner product. (21851) nn.Conv{1,2,3}D
: improve perf for depthwise convolutions intorch.float16
on Volta and Turing GPUs. (22302)- β‘οΈ
nn.RNN
: optimize on CPU by fusing matmul ops. (22512) nn.Upsample
: a number of significant perf improvements on CUDA. (21879, 21694).- β‘οΈ
nn.functional.layer_norm
: optimize a fast path for layer_norm, increasing perf by up to 4x on CPU. (20345, 20883) - π Use
mkldnn
inner product fornn.Linear()
to improve BERT perf. (21851).
π Documentation
torch.bool
: doc the Boolean tensor type. (21601)- π
torch.as_strided
: add docs. (22842) - π
torch.empty_strided
: add docs. (23740) torch.lerp
: clarify broadcasting requirements. (23268)torch.enable_grad
/torch.no_grad
/torch.set_grad_enable
: clarify interaction between these features. (23310)torch.autograd.grad_mode
: Document that no_grad is thread local. (21755)torch.multiprocessing
: Explain refcounting of CUDA tensors. (19904)- β
torch.Tensor
: Add a warning about memory usage. (20801) torch.utils.data.Dataloader
: Document RNG state consumption. (22540)- β±
torch.optim.lr_scheduler.CyclicLR
: Clarifybase_momentum
andmax_momentum
. (20880). - Document production environment features. (23010)
- β Add note about contributing recently released research. (23513)
- π Clarify performance implications of deterministic mode. (21337)
- β‘οΈ Update cuda pinned memory note to include
tensor.to
. (20977)
π Torch.NN Documentation
- π
nn.functional / nn.init
: Break up NN in docs so they load faster. (21291) - π
nn.functional.conv{1,2,3}d
: Removepadding_mode
. (20891) nn.functional.upsample
/nn.functional.interpolate
: add note about overshooting withmode=βbicubicβ
. (23321)nn.init.zeros_
/nn.init.ones_
: add documentation. (23145)nn.MultiheadAttention
: Add documentation foradd_bias_kv
,add_zero_attn
, andattn_mask
. (20071)- π
nn.MultiheadAttention
: Fix documentation for attention mask shape. (20850) nn.Softmax
: Fixed to specify dimension to prevent warning in 1.1.0. (20310)
π Contributor Documentation
- π Updated web links on contribution_guide and governance documentation. (21243)
- π Improve documentation for publishing hub models. (21307)
- Suggest a faster linker in the contributing guide. (21334)
- β Add CUDA C++11 and profiling notes to the contribution guide. (21386)
π Build Documentation
- β Add magma for CUDA 10.1 to Windows docs. (19914)
- π Improve build-from-source instructions. (20088)
- β Add
ninja
to build instructions. (20079) - β‘οΈ Update libtorch build docs. (21150)
π TensorBoard Documentation
- π Tensorboard Documentation has been greatly improved! Browse the latest version here.
π Torch HUB Documentation
- π Improve docs for publishing hub models. (21307)
- β‘οΈ Update docs of entry point in hub. (21568)
ONNX
π In PyTorch 1.2, we have added the full support for ONNX Opset 7, 8, 9 and 10 in ONNX exporter, and we have also enhanced the constant folding pass to support Opset 10. The export of ScriptModule has better support. Additionally, users now are able to register their own symbolic to export custom ops, and specify the dynamic dimensions of inputs during export.
π Supporting More ONNX Opsets
- β Add basic supports for multiple ONNX Opsets and support for Opset 10. (19294)
- π Support ONNX Opset 7 and 8 in PyTorch ONNX Exporter. (22421, 20036)
- Export
Dropout
for Opset 10. (20710) - Export
Slice
andFlip
for Opset 10. (20533) - Export
Interpolate (Resize)
for Opset 10. (21434)
π Enhancing the Support for ScriptModule
- π Support multiple outputs in ScriptModule in ONNX Exporter. (20256)
- π Support tensor factories in ScriptModule in ONNX Exporter. (20255)
- π Support tuples as inputs and outputs in ScriptModule. (20784)
Exporting More Torch Operators to ONNX
- Export custom ops. (21321)
- Export
torch.arange
. (22601) - Export
torch.masked_fill
. (22521) - Export
torch.floor
,torch.ceil
,torch.log2
andprim::shape
. (17895) - Export
torch._dim_arange
. (20078) - Export
torch.randn_like
. (20093) - Export
torch._standard_gamma
. (20126) - Export
torch.topk
. (21104) - Export
__and__
,__or__
. (17894) - Export
torch.sign
. (20470) - Export
torch.scatter
. (18543) - Export
torch.rand
. (20559) - Export
torch.gather
. (21235) - Export
torch.cosine_similarity
. (21884) - Export
torch.sum
. (22240) - π Export
torch.logsumexp
. (22306) - Export
torch.layer_norm
. (22265)
Extending Existing Exporting Logic
- π Support
torch.min
andtorch.max
with dim. (19689) - π Support
maxpool
with dilations. (18721) - π Support
RNN
withbatch_first=True
. (19766) - π Support
Upsample
with dynamic input. (20116) - π Improve support for Loop export. (20445)
- Enable
torch.full
with scalar parameters. (21931) - β Added support for exporting models with variable length input/output to ONNX. (20034)
β‘οΈ Optimizing Exported ONNX Graph
- π Support constant folding in Opset 10. (22515)
- π Support negative indexing for
Slice
in constant folding optimization. (21811)
π Bugfixes/Improvements
v1.2.0.a0
May 24, 2019
v1.1.0 Changes
May 01, 2019
Note: CUDA 8.0 is no longer supported
Highlights
TensorBoard (currently experimental)
π First-class and native support for visualization and model debugging with TensorBoard, a web application suite for inspecting and understanding training runs, tensors, and graphs. PyTorch now supports TensorBoard logging with a simple
from torch.utils.tensorboard import SummaryWriter
command. Histograms, embeddings, scalars, images, text, graphs, and more can be visualized across training runs. TensorBoard support is currently experimental. You can browse the docs here.[JIT] Attributes in ScriptModules
π Attributes can be assigned on a
ScriptModule
by wrapping them withtorch.jit.Attribute
and specifying the type. Attributes are similar to parameters or buffers, but can be of any type. They will be serialized along with any paramters/buffers when you calltorch.jit.save()
, so they are a great way to store arbitrary state in your model. See the docs for more info.Example:
```python
class Foo(torch.jit.ScriptModule):
    def __init__(self, a_dict):
        super(Foo, self).__init__(False)
        self.words = torch.jit.Attribute([], List[str])
        self.some_dict = torch.jit.Attribute(a_dict, Dict[str, int])

    @torch.jit.script_method
    def forward(self, input: str) -> int:
        self.words.append(input)
        return self.some_dict[input]
```
π [JIT] Dictionary and List Support in TorchScript
π TorchScript now has robust support for list and dictionary types. They behave much like Python lists and dictionaries, supporting most built-in methods, as well as simple comprehensions and
forβ¦in
constructs.[JIT] User-defined classes in TorchScript (experimental)
For more complex stateful operations, TorchScript now supports annotating a class with `@torch.jit.script`. Classes used this way can be JIT-compiled and loaded in C++ like other TorchScript modules. See the docs for more info.

```python
@torch.jit.script
class Pair:
    def __init__(self, first, second):
        self.first = first
        self.second = second

    def sum(self):
        return self.first + self.second
```
DistributedDataParallel new functionality and tutorials
nn.parallel.DistributedDataParallel
: can now wrap multi-GPU modules, which enables use cases such as model parallel (tutorial) on one server and data parallel (tutorial) across servers.
(19271).π₯ Breaking Changes
Tensor.set_
: thedevice
of a Tensor can no longer be changed viaTensor.set_
. This would most commonly happen when setting up a Tensor with the default CUDA device and later swapping in aStorage
on a different CUDA device. Instead, set up the Tensor on the correct device from the beginning. (18832).- β± Pay attention to the order change of
lr_scheduler.step()
. (7889). - 0οΈβ£
torch.unique
: changed the default value ofsorted
toTrue
. (15379). - [JIT] Rename isTensor api -> isCompleteTensor. #18437
- [JIT] Remove GraphExecutor's python bindings. #19141
- [C++]: many methods on
Type
no longer exist; use the functional or Tensor method equivalent. (17991). - [C++]: the
Backend
constructor ofTensorOptions
no longer exists. (18137). - [C++, Distributed]: Remove c10d
ProcessGroup::getGroupRank
has been removed. (19147).
π New Features
Operators
torch.tril_indices
,torch.triu_indices
: added operator with same behavior as NumPy. (14904, 15203).torch.combinations
,torch.cartesian_prod
: added newitertools
-like operators. (9393).torch.repeat_interleave
: new operator similar tonumpy.repeat
. (18395).torch.from_file
: new operator similar toStorage.from_file
, but returning a tensor. (18688).torch.unique_consecutive
: new operator with semantics similar tostd::unique
in C++. (19060).- π
torch.tril
,torch.triu
,torch.trtrs
: now support batching. (15257, 18025). - π
torch.gather
: add support forsparse_grad
option. (17182). torch.std
,torch.max_values
,torch.min_values
,torch.logsumexp
can now operate over multiple dimensions at once. (14535, 15892, 16475).torch.cdist
: added operator equivalent toscipy.spatial.distance.cdist
. (16168, 17173).torch. __config__.show()
: reports detailed version of all libraries. (18579).
NN
nn.MultiheadedAttention
: new module implementing MultiheadedAttention fromAttention Is All You Need
. (18334).- π
nn.functional.interpolate
: added support forbicubic
. (9849). - π
nn.SyncBatchNorm
: support synchronous Batch Normalization. (14267). - π
nn.Conv
: added support for Circular Padding viamode='circular'
. (17240). nn.EmbeddingBag
: now supports trainable `per_sample_weights. (18799).- π
nn.EmbeddingBag
: add support forfrom_pretrained
method, as innn.Embedding
. (15273). RNNs
: automatically handle unsorted variable-length sequences viaenforce_sorted
. (15225).nn.Identity
: new module for easier model surgery. (19249).
Tensors / dtypes
- π
torch.bool
: added support fortorch.bool
dtype and Tensors with that dtype (1-byte storage). NumPy conversion is supported, but operations are currently limited. (16810).
Optim
- β±
optim.lr_scheduler.CyclicLR
: Support for Cyclical Learning Rate and Momentum. (18001). - β±
optim.lr_scheduler.CosineAnnealingWarmRestarts
: new scheduler: Stochastic Gradient Descent with Warm Restarts). (17226). - π Support multiple simultaneous LR schedulers. (14010)
Distributions
- π
torch.distributions
: now support multiple inheritance. (16772).
Samplers
quasirandom.SobolEngine
: new sampler. (10505).
DistributedDataParallel
- π
nn.parallel.DistributedDataParallel
: now supports modules with unused parameters (e.g. control flow, like adaptive softmax, etc). (18251, 18953).
TorchScript and Tracer
- π Allow early returns from if-statements. (#154463)
- β Add an
@ignore
annotation, which statically tells the TorchScript compiler to ignore the Python function. (#16055) - Simple
for...in
loops on lists. (#16726) - Ellipses (
...
) in Tensor indexing. (#17763) None
in Tensor indexing. (#18615)- π Support for basic list comprehensions. (#17267)
- β Add implicit unwrapping of optionals on
if foo is not None
. (#15587) - Tensors, ints, and floats will once again be implicitly cast to bool if used in a conditional. (#18755).
- Implement
to()
,cpu()
, andcuda()
on ScriptModules. (#15340 , #15904) - β Add support for various methods on lists: (
clear()
,pop()
,reverse()
,copy()
,extend()
,index()
,count()
,insert()
,remove()
). - β Add support for
sort()
on lists of specialized type (Tensors
,int
,float
,bool
). (#19572) - β Add support for various methods on strings: (
index()
,slice()
,len()
) - π Support
Tensor.to()
in TorchScript. ( #15976 ) - π Support for
Torch.tensor()
in TorchScript. (#14913, #19445) - π Support for
torch.manual_seed()
in TorchScript. (#19510) - π Support for
nn.LSTM
in TorchScript. (#15744) - π Support for
nn.init
in TorchScript. (#19640) - β Add
hash()
builtin. (#18258) - β Add
min()
andmax()
builtins for numerical types. (#15680) - β Add
isinstance()
builtin, which performs a static type check. (#15076) - β Add
train()
/eval()
/is_training()
to C++ ScriptModule API. (#16044) - π Allow List arguments to Python functions called from TorchScript. (#15721)
- π Allow using
std::vector
andstd::unordered_map
as arguments to custom operators. (#17587) - Tracer: now allows passing static dicts and lists as trace inputs. (#18092, #19580)
- π Allow generic containers as ScriptModule inputs. (#16482)
- π Allow
nn.Sequential
in ModuleList. (#16882)
Experimental Features
- [Quantization] (API unstable): added limited support for quantized datatypes via
torch.qint8
dtype,torch.quantize_linear
conversion function. (18230). - [MKLDNN tensor] (API unstable): Added limited (opaque) support for
MKLDNN
tensors viaTensor.to_mkldnn()
; operators are currently limited to ResNext101 operators. (17748).
π Improvements
torch.min
,torch.max
,torch.median
,torch.mode
,torch.kthvalue
,torch.symeig
,torch.eig
,torch.pstrf
,torch.qr
,torch.geqrf
,torch.solve
,torch.slogdet
,torch.sort
,torch.topk
,torch.gels
,torch.triangular_solve
,torch.svd
now return namedtuples describing their outputs. (16186, 16950, 17093, 17195, 15429).- π
torch.empty
(and other factory functions): now take apin_memory
kwarg; can now pin without going throughtorch.Storage
interface.. (18455). - π
torch.histc
: Now supported on CUDA. (15842) torch.unique
: Addreturn_counts
. (18391, 18651).- π
torch.logspace
: add the ability to specify abase
. (19542). - π¨
torch.set_printoptions
: added scientific notation support. (16876). torch.btrifact
now handles tensors with greater than 3 dimensions. (14964).- π
torch.kthvalue
: now supported on CUDA. (17544). - π
torch.abs
: now supported onuint8
andint8
dtypes. (16893). - π
torch.stack
,torch.cat
: now supported for CPU half tensors. (16389). - π
torch.cross
: added support for negative dimensions. (17582). - π
torch.lerp
: add support forweight
as a Tensor. (17348). torch.transpose
: Made consistent with NumPy: 1-d and 0-d arrays are accepted and returned as-is. (17462, 17535).- π
torch.linspace
,torch.logspace
can now be used withsteps=1
andstart != end
. (14748). torch.cholesky
: changed the derivative from a triangular matrix to symmetric matrix. (19116).torch.lerp
: Improved numerical stability. (18871).torch.logdet
,torch.slogdet
: improve numerical precision. (18449).Tensor. __contains__
is now supported. (17733).- π
Tensor.fill_
andtorch.zeros
now support half on CPU. (17536). Tensor.resize_as_
,Tensor.view
: now supported on half CPU tensors. (18821).Tensor indexing
: allow indexing via NumPy booleans. (14932).nn.EmbeddingBag
: enable half precision dense backward. (19293).nn.Embedding
: fix dense Embedding to work with double backwards. (9078).nn.MaxPool1d
: Allow list and tuples to be passed asoutput_size
. (16489).- π
nn.CTCLoss
: support zeroing infinite losses viazero_infinity
argument. (16199). - π
nn.Dropout
: add support for enabling during eval. (17549). - β
nn.MSELoss
: add warning about unexpected broadcasting. (18349). nn.Module.load_state_dict
: also returnmissing_keys
andunexpected_keys
. (18668).nn.parallel.data_parallel
: Enforce devices matchdevice_ids
. (17129).torch.device
: handle in more places that used to accept only device ordinals. (14929)dtype.int8
tensors can now be converted to NumPy arrays. (14710).nn.functional.gumbel_softmax
: allow multidimensional input withdim
argument. (13339).nn.functional.cosine_similarity
: improved precision. (18250).torch.autograd
: Don't keep unnecessary saved_inputs alive, increasing memory efficiency. (16583).torch.autograd.profiler
: add Self (non-nested) CPU Time Total, CPU time total (19378).- π
DataLoader
: support accepting a custom memory pinning function. (16743). DataLoader
: retry libshm on EINTR. (15964).- π
DataLoader
: fixed an issue withpin_memory
andPackedSequence
. (18079) data.utils.collate
,data.utils.pin_memory
: now preserve namedtuples. (16440)- π Use
IndexError
instead ofRuntimeError
on many indexing error cases. (17049, 17114). - π Support indexing a
torch.float16
tensor on CPU. (17645). - β Add (limited) error checking in case of internal overlap on inplace operators. (19317, 17927).
- π
utils.checkpoint.checkpoint
: supportNone
as an argument to checkpoint function. (17969). - π»
torch.autograd
: added more information forone of the variables needed for gradient computation has been modified by an inplace operation
exception. (18523). - π
cuda.synchronize
: add a device argument. (19573). cuda.reset_max_memory_*
: now supported. (15985).distributions.Independent
: can now calculate KL Divergence. (17681).- 0οΈβ£
torch.distributed.new_group
: now supports overriding default backend. (18595). - π¨
torch.distributed.init_process_group
: will now propagate timeout to underlying Store. (16571). - [JIT] Preserve module hierarchy on traced modules. (#15101)
- [JIT] Add metadata for TracedModules. (#17311)
- [JIT] Improve portability of int and float checks. (#19532)
- [JIT] Preserve method parameter names during serialization. (#16750)
- [JIT] Add a correctness check for C++ types to custom operators. (#15247)
- [JIT] Added a few extra python bindings to help with walking the IR graph from Python. #17822
- [JIT Error Messages] Print out operator suggestions for "unknown builtin op" error. (#15183)
- [JIT Error Messages] Better error message when creating a module instance in TorchScript. (#16416)
- [JIT Error Messages] Print suggestion to add
nn.Module
attributes to__constants__
when they are used in TorchScript. (#18164) - [JIT Error Messages]
torch.save()
: Improve error message when you try to save a ScriptModule. (#15321) - [JIT Error Messages]
torch.jit.save()
: Improve error message when trying to save a model with Python code. (#16850) - [JIT Error Messages] Better errors when trying to close over a Tensor with grad enabled while tracing. (#18298, #19645)
- [JIT Error Messages] Better error when trying to add a Tensor to
__constants__
. (#16724) - [JIT Error Messages] Better error when a module list isn't added to
__constants__
. (#17167) - [JIT Error Messages] Add a warning when attempting to trace legacy constructors. (#16770)
- [JIT Error Messages] Improve hint when trying to trace non-deterministic nodes. (#17957)
- [C++]
nn::Module
: added Python interop. (13481). - [C++]
autograd::profiler
: is now supported. (16580) - [C++] allow detection of C++ ABI flag for cpp extensions from available runtime information. (18994).
- [C++]
torch.argsort
is now supported in C++. (17099). - [C++]
Tensor.isnan
: now supported in C++. (15722). - [C++]: Added named submodule support to
nn::Sequential
. (17552). - [C++]: Kaiming Initialization. (14718).
- [C++]
torch::data::transforms::Normalize
: now supported in C++. (15891). - [C++]: Support call operator on module holder calling forward. (15831).
Random and Sequential distributed samplers. (16910). - [C++]: pretty printing of C++ Modules. (15326).
- [C++] Support serializing
std::vector<torch::Tensor>
. (19677).
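The `nn.Module.load_state_dict` change listed above now surfaces mismatched keys directly. A minimal sketch, assuming a small illustrative module and a deliberately incomplete state dict (both hypothetical, not from the release notes):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # illustrative module

# State dict that is missing "bias" and carries an unknown key.
state = {"weight": torch.zeros(2, 4), "extra": torch.zeros(1)}

# With strict=False the call now also reports which keys did not line up.
result = model.load_state_dict(state, strict=False)
print(result.missing_keys)     # ['bias']
print(result.unexpected_keys)  # ['extra']
```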
π Bug Fixes
Serious
torch.prod
: correct erroneous calculation on large tensors. (15653).torch.mean
(and other reductions): fix incorrect calculation on CUDA on large inputs. (16023).nn.Conv
: correctly handle non-contiguous inputs on MKLDNN convolution codepath. (16300).Tensor.eq_
: Fix erroneous calculation. (15475).torch.mean
: Fix fp16 output calculation. (14878).nn.PoissonNLLLoss
: Properly handlereduction=None
. (17358).- [JIT] Fix bug where custom ops could get optimized out if their outputs weren't used. (#18711).
- [JIT] Fix bug where the model serializer would accidentally reorder statements. (#17557).
Other
Tensor.round
is now consistently half to even. (17443).Tensor.resize_
: Fix some 0-element cases. (14874).Tensor.numpy
: Fix conversion oftorch.int8
dtype. (15194).Tensor.grad
: correctly handledel
. (16525).Tensor.clamp
: correctly handle NaN on CUDA. (15479).Tensor.topk
: properly set launch bounds on CUDA. (17296).Tensor.kthvalue
: treat NaN as bigger than any number. (17824).- π
Tensor.copy_
: Properly synchronize on src and dst streams. (16966). Tensor indexing
: Fix incorrect dimension error message. (16495).- π
Tensor.coalesce
,Tensor.clone
,Tensor.to_dense
: fixed for sparse 0-dimensional tensors. (17379). torch.isinf
: Don't error out on integral tensors. (15489).torch.argsort
,torch.sort
: Match NumPy by considering NaNs to be larger than any number. (15886).torch.geqrf
,torch.ormqr
: when anout
parameter is specified, dispatch to the correct function. (16964).torch.cuda.get_device_name
/torch.cuda.get_device_capability
: Fix handling of optional. (17222).Tensor.tril_
/Tensor.triu_
: properly reuse input memory. (17031).torch.arange
: fix shape inconsistency between CPU and CUDA. (18462).torch.empty
(and other size-based factory functions): properly enforce non-negative sizes. (17077).- π
torch.load
: support serializing / deserializingpathlib.Path
object. (18562). nn.BatchNorm
: correctly handle very large batches. (17047).- π
nn.Softmax
/nn.LogSoftmax
: fix double backward fortorch.half
. (17330). nn.Softmax
: handle empty inputs in backward. (17259).nn.NLLLoss
: Fix crash whenignore_index
is out-of-bounds on CPU. (17328).- π
nn.Softmax
,nn.LogSoftmax
: handle 0-element inputs. (17651). nn.CTCLoss
: correct error checking. (16269).- π
nn.Conv
: better report convolution size mismatch. (17436). torch.nn.functional.cosine_similarity
: fix output sometimes returning result > 1.0. (18168).nn.parallel.data_parallel
: Fix handling of buffers that require_grad. (13352).nn.parallel.data_parallel
: would previously sometimes free tensors before all pending operations finished. (18465). - π
torch.distributed.broadcast
: fixed repeated calls leading to OOM. (19219). torch.multiprocessing
: fix serialization of integernn.Parameters
. (18639).torch.multiprocessing
: Fix handling ofdistributions
on CUDA. (16854).torch.nonzero
: Fix for 0-dimensional tensors on CUDA. (17406).torch.slogdet
: Fixsign
requiring grad wheninput
required grad. (16337).- βͺ
torch.cuda.Stream
: Properly restore stream on destination device when switching devices. (17439). - π
torch.cuda.Stream
: Fixed synchronization issue when used with non-current device. (15689). torch.cuda.Stream
: properly change device in stream context manager. (16128).- π
DataLoader
: fixed a hang when no data was read and the buffer size is smaller than the chunk size. (17409). - 0οΈβ£
DataLoader
:_utils.collate.default_collate
now converts bool lists to byte Tensors, not integer tensors.
(14669). DataLoader
: ensure dataset is indexed by integers. (17649).- π
torch.sparse.mm
: Handle transposed dense tensors in backwards. (18737). - π
torch.sparse.sum
: Fix parsing ofdim
. (16517). - π
torch.sparse.mm
/torch.sparse.addmm
: fix broadcasting and using uninitialized data. (16572). - π
Tensor.to_sparse
: Fix for 0-dimensional tensors. (17406). - π
SparseTensor
: fix add with non-contiguousvalues
tensors. (18179). - Fix
compare_exchange_weak
inweak_intrusive_ptr
. (16302). utils.model_zoo.load_url
: Fix race condition. (16578).utils.data.RandomSampler
: havelen
properly take into accountnum_samples
. (15991).torch.distributions
: Fix precision issue with expansion that prefersprobs
overlogits
. (18614).- π
distributions.dirichlet.Dirichlet
: fixed an underflow issue. (17488). - π
distributions.binomial.Binomial.log_prob
: fixed numerical stability issue. (15962). - π
Caching Allocator
: Free all blocks with outstanding events on OOM-retry. (19222). torch.dtype
: fix pickling issue with Python 2. (18045).utils.data.DataLoader
: Fix SIGCHLD checking. (19421).- β‘οΈ
optim.Optimizer
: Properly copy defaults. (19308). - β±
optim.lr_scheduler.CosineAnnealingLR
: Fix division-by-zero error. (19180). - β±
optim.lr_scheduler.ReduceLROnPlateau
: fix bug when the argument tostep
is reused outside the function.
(16697). cudNN
: fix race condition with multiple threads calling into the same device. (15080).cudNN
: Properly specify accumulation types. (16825).cuDNN
: Fix incorrectly selecting slower algorithms in certain cases. (15881).cuFFT
: Properly handle CUDA contexts. (19300)- Fix infinite loop in reduction functions when get_max_threads is nonzero but num_threads is 1. (15114).
- π Fix tensor printing bug with Python 2. (12732).
MKLDNN
: fix thread safety. (17022).- [JIT]
floordiv
: Fix integer division and divide-by-zero semantics. (#15813). - [JIT] Fix bug in alias analysis that disabled optimizations even in models without mutation. (#18416).
- [JIT]
ord()
: Fix handling of utf8 chars. (#19423). - [JIT] Fix error when too many parameters are passed to a fused CUDA kernel. (#18063).
- [JIT] Fix bug where common subexpression elimination accidentally introduced aliasing to function outputs. (#19576).
- [JIT] Fix infinite loop in
requires_grad
analysis pass. (#18361). - [JIT] Fix ordering of parameters for in
rnn.py
. (#18198). - [JIT]] Fix contiguous autodiff and AutoGradZero inconsistency (#18633).
- [JIT] Fix error reporting in NVRTC use of the fuser. (#18327).
- [JIT] Ensure GIL is acquired before doing module lookup on import. (#17135).
- [JIT] Fix bug where
_unique_state_dict
could contain duplicate Tensors. (#18139). - [C++]: Fix module serialization issue where one submodule doesn't have any parameters, but its submodules do. (15033).
- [C++]: Add
Stream
andEvent
APIs. (15937). - [C++]: Fix Module serialization incompatibility between Python and C++ with weight-less layers. (19740).
- [C++]: Properly pass
extra_cuda_cflags
to C++ extensions on Windows. (18638). - [C++] Make SGD semantics match python. (15840).
- [C++]
torch::nn::init::orthogonal_
: match Python API. (18915).
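A tiny sketch of the `pathlib.Path` support noted in the fixes above; the file name and payload are arbitrary placeholders:

```python
import pathlib
import torch

path = pathlib.Path("checkpoint.pt")
torch.save({"step": 100}, path)   # Path objects are accepted directly
state = torch.load(path)
print(state["step"])
```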
π Deprecations
- π
torch.btrifact
: the deprecatedinfo
argument has been removed. (14935). - 0οΈβ£
torch.potrs
has been deprecated, usetorch.cholesky_solve
instead. Note thatupper
defaults toFalse
fortorch.cholesky_solve
, andTrue
fortorch.potrs
. (15334). - π
torch.pstrf
is deprecated; usetorch.cholesky
instead. Note thatupper
defaults toFalse
fortorch.cholesky
, andTrue
fortorch.pstrf
. (17866). - 0οΈβ£
torch.potri
is deprecated; usetorch.cholesky_inverse
instead. Note thatupper
defaults toFalse
fortorch.cholesky_inverse
, andTrue
fortorch.potri
. (19498). torch.btrifact_with_info
has been deprecated; usetorch.lu
withget_infos=True
instead.(18435).- π
torch.btrifact
has been deprecated; use the new nametorch.lu
instead. (18435). - π
torch.gesv
is deprecated; use the new name torch.solve instead. (18060). - π
torch.trtrs
has been deprecated; use the new nametorch.triangular_solve
instead. (18213). - π
torch.btriunpack
has been deprecated; use the new nametorch.lu_unpack
instead. (18529). - π
torch.btrisolve
has been deprecated; use the new nametorch.lu_solve
instead. (18726). - [C++]
IntList
has been deprecated, useIntArrayRef
instead, as it better describes the type and ownership semantics in C++. (16751). - [C++] Dispatch macros with
Type
parameters, e.g.AT_DISPATCH_ALL_TYPES(tensor.type(), ...
, are now deprecated; useScalarType
instead, e.g.AT_DISPATCH_ALL_TYPES(tensor.scalar_type(), ...
. (17527, 17996). - [C++] the deprecated
variable_tensor_functions
have been removed. (15003).
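A sketch of the migration implied by the `torch.potrs` deprecation above. Note the flipped default: `upper` defaulted to `True` for `torch.potrs` but defaults to `False` for `torch.cholesky_solve`, so pass it explicitly when porting. The matrices below are placeholders.

```python
import torch

A = torch.randn(3, 3)
A = A @ A.t() + 3 * torch.eye(3)      # symmetric positive definite
b = torch.randn(3, 2)

u = torch.cholesky(A, upper=True)     # upper-triangular Cholesky factor

# Old (deprecated): x = torch.potrs(b, u)   # upper=True was the default
# New:
x = torch.cholesky_solve(b, u, upper=True)
print(torch.allclose(A @ x, b, atol=1e-5))
```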
π Performance
Highlights
nn.BatchNorm
CPU inference speed increased up to ~19x.(19152).nn.AdaptiveAvgPool
: speed up common-case of size=1 output by ~30x. (17011).- π
nn.EmbeddingBag
CPU performance increased by ~4x. (19329). Tensor.copy_
: sped up larger tensor copy ~2-3x, small regression in small tensor copy. (18618).torch.nonzero
: is now ~2x faster than numpy on CPU. (15190)- π Improve caching allocator for Pascal and newer GPUs; 10-20% better memory utilization on Mask-RCNN. (17120).
reduction functions
: Speed up some large Tensor cases by 50-80%. (17428).- [JIT] Graph fuser: better fusion for backwards graphs in the presence of broadcasting. (#14957)
- [JIT] Graph fuser:
batch_norm
fusion for inference. (#15146) - [JIT] Graph fuser:
layer_norm
fusion for inference. (#18266)
Other
torch.abs
,torch.frac
,torch.reciprocal
,torch.neg
have been vectorized and parallelized (19041).- π
torch.bmm
: CPU performance increased by 2x. (19338). - π
torch.sort
: CUDA performance increased by ~2x. (19379). torch.cat
on CPU is now ~4x faster in the case where inputs are contiguous anddim
!= 0. (17032).- π
torch.multinomial
fixed a 2x performance regression. (17121). torch.empty
(and other factory functions): reduce overhead by 20-40%. (17565). torch.linspace
has been parallelized on CPU. (15320).- π
torch.logspace
has been parallelized on CPU. (15438). torch.range
has been parallelized on CPU. (15484).torch.arange
has been parallelized on CPU. (15667).torch.load
: avoid unnecessary CPU-to-CUDA copy. (17297).reduction functions
: improve efficiency on CUDA. (16224, 17040).- Speed up some GEMM cases on CPU by up to 7x.(17730)
- Tensor iterator loop unrolling. (17667).
- π
sparse/dense matrix multiply
: improve speed by ~5x. (16905). distributions.MultivariateNormal
: sped up. (17294).- [JIT] Graph fuser: pow scalar exponent / base autodiff, fusion (#19324)
- [JIT] Graph fuser: allow fusion of function float arguments. (#18087)
- [JIT] Shape analysis: specialize optional Tensor inputs to graphs. (#18360)
- [JIT] Shape analysis: various correctness improvements. (#18271)
- [JIT] Shape analysis:
aten::_convolution
now participates in shape analysis. (#16837] - [JIT] Autodiff: coverage for ops used in maskrcnn & BERT. (#16689)
- [JIT] Autodiff: support for scalar comparison ops and
randlike
. (#14740) - [JIT] Autodiff: support for
adaptive_avg_pool2d
. (#15459) - [JIT] Autodiff: support for
erf
anderfc
. (#15139) - [JIT] Autodiff: support for
layernorm
. (#17702) - [JIT] Autodiff: support for
tanh
. (#17816) - [JIT] Autodiff: support for
matmul
/dropout
. (#17523) - [JIT] Autodiff: specialized CUDA impl for dropout. (#17756)
- [JIT] Constant folding: improved inlining of control flow. (#16244)
π Documentation
- π
Tensor.scatter_
: add documentation aboutvalue
parameter. (17467). Tensor.unfold
: correctly documentdimension
parameter, notdim
. (19020).Tensor.is_floating_point()
is now documented. (15704).- π
torch.cholesky
: Fix brokenupper
example in documentation. (15215). torch.gesv
: documentout
parameter. (15649).- π
torch.mul
: better explain elementwise multiplication. (15664). - π
torch.eig
,torch.symeig
: better explain backwards limitations. (15929). - π
torch.ormqr
: fixed output specification. (15694). torch.from_numpy
: replaced usage withtorch.as_tensor
in documentation. (16587).- π
torch.mvlgamma
: Fix the constant in the docs. (17045). torch.mode
: more precisely describe what is returned. (17069).- π
torch.upsample
: documentation now matchestorch.interpolate
. (17134) - π
torch.arange
: correctdtype
documentation. (18604) torch.cumprod
: documentout
parameter. (19340).torch.nonzero
: document indices being returned lexicographically. (19539).- π
torch.nn.functional.interpolate
: better explainaligned_corners
parameter. (14806). - π
torch.nn.functional.pad
: documentation has been made consistent with other functional ops. (15984). nn.functional.grid_sample
: clarify behavior of padding. (19754).nn.TripletMarginLoss
: correct type ofswap
parameter. (18115).- π
nn.CrossEntropyLoss
: clarifyignore_index
documentation. (18117). nn.CrossEntropyLoss
: the input format is more clearly explained. (15990).nn.CTCLoss
: Clarify a number of ambiguities. (18415).- π
nn.BCEWithLogitsLoss
: add better explanation. (19212). - π
nn.BCEWithLogitsLoss
: better explain positive samples. (17258). - π
nn.ModuleList
/nn.ParameterList
: update documentation. (17731). nn.Module.load_state_dict
: correct semantics ofstrict
. (17618)nn.parallel.DataParallel
: more accurately specify how different argument types are handled. (15993).nn.parallel.DistributedDataParallel
: Clarified batch size requirements. (16010).torch.distributed
: Document mixed-precision training. (15440).torch.multiprocessing
: Include example multiprocessing code. (16345).- π
torch.autograd
: Better explain computing Jacobian-vector product. (15197). torch.cuda.get_rng_state
,torch.cuda.set_rng_state
: document taking adevice
object. (14324).torch.device
: Fix example of passingdevice
to tensor factory. (16839).- π
DataLoader
: update documentation to describe how workers are managed. (18091). - π Unified shape formats throughout the documentation. (15741).
- π Update documentation for
reduction
arguments to use non-deprecated format. (17300). mark_non_differentiable
: document correct semantics. (17891).- Warn about memory overlaps on inplace operations. (17576).
- π Fix a number of small issues with conv and pooling docstrings. (17052).
- π Fix a number of small issues with padding and activation docstrings. (17197).
- [C++]: mention packed accessors in Tensor basics. (19464).
ONNX
Exporting More Torch Operators to ONNX
- Export torch.isnan to ONNX (17698).
- Export torch.flatten to ONNX (16240).
- Export torch.where, torch.ceil, torch.floor to ONNX (18571).
- Export torch.narrow to ONNX (17550).
- Export torch.argmax and torch.argmin to ONNX (17382, 18264, 18261).
- Export adaptive_avg_pool1D, adaptive_avg_pool2D, adaptive_avg_pool3D, adaptive_max_pool1D, adaptive_max_pool2D, adaptive_max_pool3D to ONNX (17412).
- Export torch.nonzero to ONNX (17036, 18047).
- Export torch.erf to ONNX (16106).
- Export torch.split (15092).
- Export torch.lt, torch.gt, torch.le, torch.ge, torch.eq, torch.ne to ONNX (15677).
- Export torch.expand and torch.ne to ONNX (15050).
- π Export torch.nn.LogSigmoid to ONNX (14830).
- Export torch.nn.RReLU to ONNX (14781).
- Export torch.reshape and torch.reshape_as to ONNX (16632, 16971).
- Replace use of ConstantLike with ConstantOfShape (16095, 16214).
Extending Existing Exporting Logic
- π Enable dim support in torch.nn.Softmax's export (18482).
- π Support exporting squeeze & unsqueeze with negative dim attribute (19297).
- Support exporting max_pool1d, max_pool2d, max_pool3d with indices (16455).
- β Add dtype support in torch.logsoftmax and torch.softmax's export (17672).
- Support ceil_mode in max_pool_1d, max_pool2d, max_pool3d, avg_pool1d, avg_pool2d, avg_pool3d's export (16769).
β‘οΈ Optimizing Exported ONNX Graph
- β Add constant folding in ONNX exporter (18698).
- Retain the parameter names in ONNX exporter (17551).
- Omit slice op if it is a non-op (19155).
- β Add a flag to strip doc_string from exported ONNX models (18882).
- Omit torch.dropout if the model is in eval mode (16547).
β Adding Utility Functions and Refactoring
- Remove unused arg f from _model_to_graph(). (19647).
- β Add the support for stable ONNX opsets in exporter (16068, 17419).
- β Set the default ONNX opset to the latest stable opset (i.e., 9) (17736).
- β Add a utility function to check whether an ONNX export is currently in progress (19050).
- π¨ Refactoring serialization of ONNX initializers to be name-based (17830).
- π¦ Expose dim() on type and use it in ONNX symbolics (15933).
- Add scalar_type_to_pytorch_type dict in ONNX symbolic (15965).
- β Add an assertion to check the number of the parameters passed to ONNX exporter (18145).
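A minimal export sketch tying the stable-opset support in this release to `torch.onnx.export`; the tiny model and output file name are placeholders, not part of the release notes:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
dummy_input = torch.randn(1, 4)

# opset_version pins which ONNX opset the graph is emitted for
# (9 was the default stable opset at the time of this release).
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=9)
```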
π Bugfixes
- π Fix a bug in rsub caused by mismatched operand types (15707).
- π Fix list structure supports in ONNX exporter (19102).
- π Fix case for
activations
attribute in nn.RNN ONNX export. (19368). - Minor fix for onnx ConstantOfShape export (18199).
- π Fix the torch.(reduce)min and torch.(reduce)max's export (15241).
- π Fixing ONNX export of logical ops to have correct output datatype (15185).
- π Fix typo in docstring (18216).
-
v1.1.0.a0
December 26, 2018 -
v1.0.1 Changes
February 07, 2019Note: our conda install commands have slightly changed. Version specifiers such as
cuda100
inconda install pytorch cuda100 -c pytorch
have changed toconda install pytorch cudatoolkit=10.0 -c pytorch
π₯ Breaking Changes
π There are no breaking changes in this release.
π Bug Fixes
Serious
- π Higher order gradients for CPU Convolutions have been fixed (regressed in 1.0.0 under MKL-DNN setting) #15686
- Correct gradients for non-contiguous weights in CPU Convolutions #16301
- π Fix ReLU on CPU Integer Tensors by fixing vec256 inversions #15634
- π Fix bincount for non-contiguous Tensors #15109
- π Fix torch.norm on CPU for large Tensors #15602
- π Fix eq_ to do equality on GPU (was doing greater-equal due to a typo) (#15475)
- βͺ Workaround a CuDNN bug that gave wrong results in certain strided convolution gradient setups
- blacklist fft algorithms for strided dgrad (#16626)
Correctness
- π Fix cuda native loss_ctc for varying input length (#15798)
- this avoids NaNs in variable length settings
- C++ Frontend: Fix serialization (#15033)
- Fixes a bug where (de-)/serializing a hierarchy of submodules where one submodule doesn't have any parameters, but its submodules do
- π Fix derivative for mvlgamma (#15049)
- π Fix numerical stability in log_prob for Gumbel distribution (#15878)
- multinomial: fix detection and drawing of zero probability events (#16075)
Crashes
- β‘οΈ PyTorch binaries were crashing on AWS Lambda and a few other niche systems, stemming from CPUInfo handling certain warnings as errors. Updated CPUInfo with relevant fixes.
- MKL-DNN is now statically built, to avoid conflicts with system versions
- π Allow ReadyQueue to handle empty tasks (#15791)
- Fixes a segfault with a DataParallel + Checkpoint neural network setting
- Avoid integer divide by zero error in index_put_ (#14984)
- π Fix for model inference crash on Win10 (#15919) (#16092)
- π Use CUDAGuard when serializing Tensors:
- Before this change,
torch.save
andtorch.load
would initialize the CUDA context on GPU 0 if it hadn't been initialized already, even if the serialized tensors are only on GPU 1.
- Before this change,
- Fix error with handling scalars and rpow , for example
1 ^^ x
, where x is a PyTorch scalar (#16687) - Switch to CUDA implementation instead of CuDNN if batch size >= 65536 for affine_grid (#16403)
- CuDNN crashes when batch size >= 65536
- [Distributed] TCP init method race condition fix (#15684)
- [Distributed] Fix a memory leak in Gloo's CPU backend
- [C++ Frontend] Fix LBFGS issue around using inplace ops (#16167)
- [Hub] Fix github branch prefix v (#15552)
- π [Hub] url download bugfix for URLs served without Content-Length header
π Performance
- π LibTorch binaries now ship with CuDNN enabled. Without this change, many folks saw significant perf differences while using LibTorch vs PyTorch, this should be fixed now. #14976
- π Make btriunpack work for high dimensional batches and faster than before (#15286)
- π improve performance of unique with inverse indices (#16145)
- π¨ Re-enable OpenMP in binaries (got disabled because of a CMake refactor)
Other
- create type hint stub files for module torch (#16089)
- This will restore auto-complete functionality in PyCharm, VSCode etc.
- π Fix sum_to behavior with zero dimensions (#15796)
- Match NumPy by considering NaNs to be larger than any number when sorting (#15886)
- π Fixes various error message / settings in dynamic weight GRU / LSTMs (#15766)
- C++ Frontend: Make call operator on module holder call forward (#15831)
- C++ Frontend: Add the normalize transform to the core library (#15891)
- π Fix bug in torch::load and unpack torch::optim::detail namespace (#15926)
- Implements Batched upper triangular, lower triangular (#15257)
- β Add torch.roll to documentation (#14880)
- π (better errors) Add backend checks for batch norm (#15955)
JIT
- β Add better support for bools in the graph fuser (#15057)
- π Allow tracing with fork/wait (#15184)
- π improve script/no script save error (#15321)
- β Add self to Python printer reserved words (#15318)
- π Better error when torch.load-ing a JIT model (#15578)
- π fix select after chunk op (#15672)
- β Add script standard library documentation + cleanup (#14912)
-
v1.0.0 Changes
December 07, 2018Table of Contents
- Highlights
- JIT
- Brand New Distributed Package
- C++ Frontend [API Unstable]
- Torch Hub
- π₯ Breaking Changes
- β Additional New Features
- N-dimensional empty tensors
- New Operators
- New Distributions
- Sparse API Improvements
- Additions to existing Operators and Distributions
- π Bug Fixes
- Serious
- Backwards Compatibility
- Correctness
- Error checking
- Miscellaneous
- Other Improvements
- π Deprecations
- CPP Extensions
- π Performance
- π Documentation Improvements
Highlights
JIT
The JIT is a set of compiler tools for bridging the gap between research in PyTorch
β‘οΈ and production. It allows for the creation of models that can run without a dependency on the Python interpreter and which can be optimized more aggressively. Using program annotations existing models can be transformed into Torch Script, a subset of Python that PyTorch can run directly. Model code is still valid Python code and can be debugged with the standard Python toolchain. PyTorch 1.0 provides two ways in which you can make your existing code compatible with the JIT, usingtorch.jit.trace
ortorch.jit.script
. Once annotated, Torch Script code can be aggressively optimized and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.

    # Write in Python, run anywhere!
    @torch.jit.script
    def RNN(x, h, W_h, U_h, b_h):
        y = []
        for t in range(x.size(0)):
            h = torch.tanh(x[t] @ W_h + h @ U_h + b_h)
            y += [h]
        return torch.stack(y), h
As an example, see a tutorial on deploying a seq2seq model,
π loading an exported model from C++, or browse the docs.π¦ Brand New Distributed Package
π¦ The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by a brand new re-designed distributed library. The main highlights of the new library are:
- π New
torch.distributed
is performance driven and operates entirely asynchronously for all backends:Gloo
,NCCL
, andMPI
. - π Significant Distributed Data Parallel performance improvements especially for hosts with slower networks such as ethernet-based hosts
- β Adds async support for all distributed collective operations in the torch.distributed package.
- π Adds the following CPU ops in the Gloo backend: send, recv, reduce, all_gather, gather, scatter
- β Adds barrier op in the NCCL backend
- π Adds new_group support for the NCCL backend
C++ Frontend [API Unstable].
π The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to
torch.nn
,torch.optim
,torch.data
and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:

Python:

    import torch

    model = torch.nn.Linear(5, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    prediction = model.forward(torch.randn(3, 5))
    loss = torch.nn.functional.mse_loss(prediction, torch.ones(3, 1))
    loss.backward()
    optimizer.step()

C++:

    #include <torch/torch.h>

    torch::nn::Linear model(5, 1);
    torch::optim::SGD optimizer(model->parameters(), /*lr=*/0.1);
    torch::Tensor prediction = model->forward(torch::randn({3, 5}));
    auto loss = torch::mse_loss(prediction, torch::ones({3, 1}));
    loss.backward();
    optimizer.step();
We are releasing the C++ frontend marked as "API Unstable" as part of PyTorch 1.0. This means it is ready to be used for your research application, but still has some open construction sites that will stabilize over the next couple of releases. Some parts of the API may undergo breaking changes during this time.
π See https://pytorch.org/cppdocs for detailed documentation on the greater PyTorch C++ API as well as the C++ frontend.
Torch Hub
Torch Hub is a pre-trained model repository designed to facilitate research reproducibility.
π Torch Hub supports publishing pre-trained models (model definitions and pre-trained weights) to a github repository using a simple hubconf.py file; see hubconf for resnet models in pytorch/vision as an example. Once published, users can load the pre-trained models using the torch.hub.load API.
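For illustration, loading the resnet18 entrypoint published in pytorch/vision's hubconf.py (this downloads weights on first use; the repo and entrypoint names are simply the standard example, not something this changelog guarantees):

```python
import torch

model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
model.eval()
```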
π For more details, see the torch.hub documentation. Expect a more-detailed blog post introducing Torch Hub in the near future!
π₯ Breaking Changes
- π Indexing a 0-dimensional tensor will now throw an error instead of warn. Use tensor.item() instead. (#11679).
- π torch.legacy is removed. (#11823).
- torch.masked_copy_ is removed, use torch.masked_scatter_ instead. (#9817).
Operations that result in 0 element tensors may return changed shapes.
- Before: all 0 element tensors would collapse to shape (0,). For example, torch.nonzero is documented to return a tensor of shape (n,z), where n = number of nonzero elements and z = dimensions of the input, but would always return a Tensor of shape (0,) when no nonzero elements existed.
- Now: Operations return their documented shape.
Previously: all 0-element tensors are collapsed to shape (0,)
torch.nonzero(torch.zeros(2, 3)) tensor([], dtype=torch.int64)
Now, proper shape is returned
torch.nonzero(torch.zeros(2, 3)) tensor([], size=(0, 2), dtype=torch.int64)
Sparse tensor indices and values shape invariants are changed to be more consistent in the case of 0-element tensors. See link for more details. (#9279).
π torch.distributed: the TCP backend is removed; we recommend using the Gloo or MPI backend for CPU collectives and the NCCL backend for GPU collectives.
Some inter-type operations (e.g.
*
) betweentorch.Tensors
and NumPy arrays will now favor dispatching to thetorch
variant. This may result in different return types. (#9651).π Implicit
numpy
conversion no longer implicitly moves a tensor to CPU. Therefore, you may have to explicitly move a CUDA tensor to CPU (tensor.to('cpu')
) before an implicit conversion. (#10553).torch.randint now defaults to using dtype torch.int64 rather than the default floating-point dtype. (#11040).
π torch.tensor function with a
Tensor
argument now returns adetached
Tensor (i.e. a Tensor wheregrad_fn
isNone
). This more closely aligns with the intent of the function, which is to return a Tensor with copied data and no history. (#11061,
#11815).torch.nn.functional.multilabel_soft_margin_loss now returns Tensors of shape
(N,)
instead of(N, C)
to match the behavior of torch.nn.MultiMarginLoss. In addition, it is more numerically stable.
(#9965). The result type of a torch.float16 0-dimensional tensor and an integer is now torch.float16 (was torch.float32 or torch.float64, depending on the dtype of the integer). (#11941).
π Dirichlet and Categorical distributions no longer accept scalar parameters. (#11589).
π CPP Extensions: Deprecated factory functions that accept a type as the first argument and a size as the second argument have been removed. Instead, use the new-style factory functions that accept the size as the first argument and
TensorOptions
as the last argument. For example, replace your call to at::ones(torch::CPU(at::kFloat), {2, 3})
with torch::ones({2, 3}, at::kCPU)
. This applies to the following functions:arange
,empty
,eye
,full
,linspace
,logspace
,ones
,rand
,randint
,randn
,randperm
,range
,zeros
.
0οΈβ£ torch.potrf renamed to torch.cholesky. It has a new default (upper=False) (#12699).
π Renamed
elementwise_mean
tomean
for loss reduction functions (#13419)
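A small sketch of the reduction rename above; `elementwise_mean` still works but warns, and `mean` is the new spelling (the tensors are placeholders):

```python
import torch
import torch.nn.functional as F

pred = torch.randn(8, 3)
target = torch.randn(8, 3)

loss = F.mse_loss(pred, target, reduction='mean')   # previously reduction='elementwise_mean'
```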
β Additional New Features
N-dimensional empty tensors
Tensors with 0 elements can now have an arbitrary number of dimensions and support indexing and other torch operations; previously, 0 element tensors were limited to shape (0,). (#9947). Example:
torch.empty((0, 2, 4, 0), dtype=torch.float64) tensor([], size=(0, 2, 4, 0), dtype=torch.float64)
π New Operators
- π torch.argsort similar to numpy.argsort.
(#9600). - π torch.pdist similar to scipy.spatial.distance.pdist. (#10782).
- π torch.tensordot similar to numpy.tensordot. (#10025).
- π torch.broadcast_tensors similar to numpy.broadcast_arrays.
(#10075). - π torch.narrow support for sparse tensors.
(#11342). - π torch.matrix_rank similar to numpy.linalg.matrix_rank.
(#10338). - π torch.matrix_power similar to numpy.linalg.matrix_power. (#11421).
- π torch.nn.CeLU activation. (#8551).
- π torch.nn.CTCLoss. (#9628).
- π torch.diag_embed (#12447).
- π
torch.roll
operator to match numpy.roll (#13261, #13588, #13874). - π torch.chain_matmul for performing a chain of matrix multiplies (#12380).
- π torch.finfo,
torch.iinfo to get more detailed information on a dtype
, similar to numpy.finfo and numpy.iinfo (#12472). Tensor. __cuda_array_interface__
to provide compatibility with numba and other CUDA projects (#11984).
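A quick sketch of the dtype introspection added above, mirroring numpy.finfo / numpy.iinfo (note the spelling `torch.iinfo` for integer dtypes):

```python
import torch

f = torch.finfo(torch.float32)
print(f.eps, f.tiny, f.max)   # machine epsilon, smallest positive normal, largest value

i = torch.iinfo(torch.int32)
print(i.min, i.max)           # representable integer range
```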
π New Distributions
- π Weibull Distribution. (#9454).
- π NegativeBinomial Distribution. (#9345).
- π² torch.mvlgamma Multivariate Log-Gamma Distribution. (#9451).
π Sparse API Improvements
- π Implemented "sparse gradient" versions of some existing functions, see sparse.mm, sparse.sum, sparse.addmm for details. (#14526, #14661, #12430, #13345).
- π
Tensor.to_sparse()
allows conversion from a dense tensor to a sparse tensor. (#12171) - π torch.cat now supports sparse tensors. (#13761, #13577).
- π torch.unsqueeze now works with sparse vectors (this also makes torch.stack work out of the box). (#13760).
- Autograd is now supported on
values()
and torch.sparse_coo_tensor (with indices and values tensors). E.g.,torch.sparse_coo_tensor(i, v).values().sum()
is differentiable w.r.t.v
. See the updated torch.sparse documentation for details. (#13001).
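A sketch of the sparse autograd support described above, assuming arbitrary indices and values chosen only for illustration:

```python
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])                 # indices of the nonzeros
v = torch.tensor([3.0, 4.0, 5.0], requires_grad=True)

s = torch.sparse_coo_tensor(i, v, (2, 3))     # construction is differentiable w.r.t. v
loss = torch.sparse.sum(s)                    # sparse-aware reduction with a backward
loss.backward()
print(v.grad)                                 # tensor([1., 1., 1.])
```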
β Additions to existing Operators and Distributions
- π torch.unique now accepts an optional
dim
argument. (#10423). - π torch.norm now supports matrix norms.
(#11261). - π torch.distributions.kl.kl_divergence now supports broadcasting. (#10533).
- π torch.distributions now support an
expand
method similar to torch.Tensor.expand. For example: torch.distributions.bernoulli.Bernoulli.expand. (#11341). - π torch.nn.functional.grid_sample now support nearest neighbor interpolation and reflection padding. (#10051).
- π torch.mean now works across multiple dimensions. (#14252).
- π torch.potrs supports batching (#13453).
- π torch.multiprocessing.spawn helper for spawning processes. (#13518).
- π torch.pow now allows taking derivatives when invoked with a python number as a base. (#12450).
- π Tensor.to now supports a
copy
keyword argument. (#12571). - π² torch.softmax and and torch.log_softmax now support a
dtype
accumulation argument. (#11719). - π torch.svd supports a
compute_uv
argument for optionally computing singular vectors (#12517). - π torch.inverse now supports batches of tensors (#9949).
- π autograd.profiler shows demangled names on nvtx ranges. (#13154).
π Bug Fixes
Serious
- π torch.nn.functional.softmin was using the incorrect formula in 0.4.1 (#10066).
- π torch.as_strided backwards (called via
view
) was incorrect with overlapping data locations. (#9538). - π Pointwise losses (e.g. torch.nn.MSELoss) were sometimes using the wrong
reduction
method. (#10018). - π torch.from_numpy was not handling big-endian dtypes correctly. (#9508).
- π torch.multiprocessing now correctly handles CUDA tensors, requires_grad settings, and hooks. (#10220).
__rsub__
now works properly when the CUDA device is not 0. (#12956).- π Fix memory leak during packing in tuples (#13305).
- π DataLoader fixed a couple of issues resulting in hangs. (#11985, #12700).
- π torch.multinomial
replacement=False
will now properly throw an error when there are no more categories to select (#12490). - torch.masked_fill_ now works properly on non-contiguous tensor inputs (#12594).
Tensor. __delitem__
: fixed a segmentation fault on (#12726).
Backwards Compatibility
- torch.nn.Module
load_from_state_dict
now correctly handles 1-dimensional vs 0-dimensional tensors saved from 0.3 versions. (#9781). - π Fix
RuntimeError: storages don't support slicing
when loading models saved with PyTorch 0.3. (#11314). - π BCEWithLogitsLoss: fixed an issue with legacy
reduce
parameter. (#12689).
Correctness
- π torch.nn.Dropout fused kernel could change parameters in
eval
mode. (#10621). - π torch.unbind backwards has been fixed. (#9995).
- π Fix a bug in sparse matrix-matrix multiplication when a sparse matrix is coalesced then transposed. (#10496).
- π torch.bernoulli now handles
out=
parameters correctly, handles expanded tensors correctly, and has corrected argument validity checks on CPU. (#10273). - π torch.Tensor.normal_ could give incorrect results on CPU. (#10846).
- π torch.tanh could return incorrect results on non-contiguous tensors. (#11226).
- π² torch.log on an expanded
Tensor
gave incorrect results on CPU. (#10269). - π torch.logsumexp now correctly modifies the
out
parameter if it is given. (#9755). - π torch.multinomial with
replacement=True
could select 0 probability events on CUDA. (#9960). - π torch.nn.ReLU will now properly propagate
NaN
.
(#10277). - π torch.max and torch.min could return incorrect values on input containing
inf
/-inf
. (#11091). - π Fixed an issue with calculated output sizes of
torch.nn.Conv
modules withstride
anddilation
. (#9640). - π torch.nn.EmbeddingBag now correctly returns vectors filled with zeros for empty bags on CUDA. (#11740).
- π Use integer math to compute output size of pooling operations (#14405).
- π Fix sum() on fp16 (#13926).
- Remove CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode for accuracy (#13844).
- π fix stability in bce with pos_weight formula (#13863).
- π Fix torch.dist for infinity, zero and minus infinity norms (#13713).
- Give broadcast_coalesced tensors different version counters (#13594).
- π Fix flip() shape bug in CPU (#13344).
- π Fix more spectral norm bugs (#13350).
- π Fix handling of single input in gradcheck (#13543).
- π torch.cuda.manual_seed now also sets the philox seed and offset. (#12677).
- π utils.bottleneck fix ZeroDivisionError(#11987).
- π Disable hook serialization (#11705).
- π torch.norm: fix negative infinity norm (#12722).
- π Fix torch.isfinite for integer input (#12750).
- π ConvTranspose3d fix
output_size
calculation (#12952). - π torch.randperm: properly use RNG mutex on CPU (#13832)
Error checking
- π torch.gesv now properly checks LAPACK errors. (#11634).
- π Fixed an issue where extra positional arguments were accepted (and ignored) in Python functions calling into C++. (#10499).
- legacy
Tensor
constructors (e.g.torch.FloatTensor(...)
) now correctly check theirdevice
argument.
(#11669). - Properly check that
out
parameter is a CPUTensor
for CPU unary ops. (#10358). - π torch.nn.InstanceNorm1d now correctly accepts 2 dimensional inputs. (#9776).
- torch.nn.Module.load_state_dict had an incorrect error message. (#11200).
- π torch.nn.RNN now properly checks that inputs and hidden_states are on the same devices. (#10185).
- torch.nn.utils.rnn.pack_padded_sequence now properly checks for out-of-order length. (#13933).
- π torch.bmm now properly checks that its Tensor arguments are on compatible devices. (#12434).
- π Conv2d: fixed incorrect error message for too-large kernel size (#12791).
- π Tensor.expand error message now includes complete sizes. (#13124).
- π Improve CUDA out-of-memory error message. (#13751).
- π torch.arange now properly checks for invalid ranges. (#13915)
Miscellaneous
- π torch.utils.data.DataLoader could hang if it was not completely iterated. (#10366).
- π Fixed a segfault when grad to a hook function is
None
. (#12028). - π Fixed a segfault in backwards with torch.nn.PReLU when the input does not require grad. (#11758).
- π
dir(torch)
has been fixed with Python 3.7. (#10271). - π Fixed a device-side assert in torch.multinomial when
replacement=False
and the input has fewer nonzero elements thannum_samples
. (#11933). - Can now properly assign a
torch.float16
dtype tensor to.grad
. (#11781). - π Fixed
can only join a started process
error with torch.utils.data.DataLoader. (#11432). - π Prevent
unexpected exit
in torch.utils.data.DataLoader onKeyboardInterrupt
. (#11718). - π torch.einsum now handles spaces consistently. (#9994).
- π Fixed a broadcasting bug in torch.distributions.studentT.StudentT. (#12148).
- π fix a printing error with large non-contiguous tensors. (#10405).
- allow empty index for scatter_* methods (#14077)
- π torch.nn.ModuleList now handles negative indices. (#13102).
- Minor fix to reenable nvtx sequence numbers for the forward methods of custom (Python) autograd functions (#13876)
- π Fix handling all empty bags in CUDA embedding bag (#13483)
- Fix half_tensor.bernoulli_(double) (#13474)
- π Fix cuda out of memory test (#13864)
- Implement NaN-propagating max/min on Vec256. (#13399).
- π Fix refcounting in anomaly metadata (#13249)
- π Fix pointwise loss broadcast (#12996)
- π Fix copying a
nn.Parameter
(#12886)
Other Improvements
- π torch.cuda functions and torch.nn.parallel.data_parallel now accept torch.device objects in addition to integer device ids. (#10833, #10189).
- π torch.nn.parallel.data_parallel now accepts
torch.device
inputs. (#10189). - π² torch.nn.functional.log_softmax is now more numerically stable. (#11866).
- π Improve printing of sparse tensors and
grad_fns
. (#10181). - Only involve CUDA device in CUDA -> CPU copy. (#11592).
- Accept numpy floating-point scalars as doubles more consistently. (#9659).
- π sparse-to-sparse copy_ is now supported. (#9005).
- π torch.bincount now supports 0 element inputs. (#9757).
- π torch.nn.functional.conv2d error message have been improved. (#11053).
- π Allow conversion of
np.int64
to PyTorch scalar. (#9225). - π torch.einsum now handles varargs.
(#10067). - π torch.symeig now returns 0-filled eigenvectors when
eigenvectors=False
is passed on CUDA rather than uninitialized data. (#10645). - π torch.utils.checkpoint: added an option to preserve bitwise accuracy of gradient-checkpointed vs non-checkpointed dropout. (#14253).
π Deprecations
- β Removed support for C extensions. Please use cpp extensions. (#12122)
- β Delete
torch.utils.trainer
(#12487)
CPP Extensions
- π The
torch/torch.h
header is deprecated in favor oftorch/extension.h
, which should be used in all C++ extensions going forward. Includingtorch/torch.h
from a C++ extension will produce a warning. It is safe to batch replacetorch/torch.h
withtorch/extension.h
. - π Usage of the following functions in C++ extensions is also deprecated:
torch::set_requires_grad
. Replacement:at::Tensor
now has aset_requires_grad
method.torch::requires_grad
. Replacement:at::Tensor
now has arequires_grad
method.torch::getVariableType
. Replacement: None.
- π Fix version.groups() (#14505)
- π Allow building libraries with setuptools that don't have an ABI suffix (#14130)
- Missing .decode() after check_output in cpp_extensions (#13935)
torch.distributed
- π¦ the old (THD-backed) torch.distributed package is deprecated but still available at torch.distributed.deprecated.
- π The old (THD-backed) torch.nn.parallel.DistributedDataParallel is deprecated but still available at
torch.nn.parallel.deprecated.DistributedDataParallel
.
π Performance
- π "Advanced Indexing" performance has improved on both CPU and GPU. (#13420)
- π torch.nn.functional.grid_sample on CPU now uses vectorized operation and is now 2x~7x faster with AVX2 enabled CPUs. (#9961).
- π torch.norm has been vectorized and parallelized on CPU. (#11565).
- π torch.max and torch.min have been parallelized on CPU. (#10343).
- π torch.nn.Threshold and torch.nn.ReLU have been parallelized on CPU. (#13182)
- torch.Tensor.masked_fill_ has been parallelized on CPU. (#11359).
- π Tensor.sparse_mask has been parallelized on CPU. (#13290).
- π torch.nn.PReLU has been sped up on both CPU and GPU.
(#11758). - π torch.nn.KLDivLoss has been sped up on both CPU and GPU. (#10336).
- π torch.svd has been sped up on both CPU and GPU.
(#11194). - π torch.einsum has been greatly sped up on CPU.
(#11292). - π torch.clamp no longer does unnecessary copying. (#10352).
- π torch.add, torch.sub, torch.mul, torch.div are much faster for non-contiguous tensors on GPU. (#8919).
- π torch.nn.RNN and related Modules have been ported to C++ and are more performant. (#10305, #10481).
- π autograd.Profiler now has lower overhead. (#10969, #11773).
- π¨ Printing large Tensors is now faster. (#14418).
π Documentation Improvements
- π Reproducibility note added. (#11329).
- π CPP Extensions have improved online documentation. Authors of C++ extensions may want to consult this documentation when writing new extensions.
- π torch.Tensor.flatten is now documented.
(#9876). - π torch.digamma is now documented. (#10967).
- π torch.allclose is now documented. (#11185).
- π torch.eig return format clarified. (#10315).
- π torch.as_tensor now includes a proper example. (#10309).
- torch.sparse_coo_tensor now explains uncoalesced behavior. (#10308).
- π torch.fft equation has been corrected. (#10760).
- π torch.nn.LSTM behavior has been clarified in the multilayer case. (#11896).
- π torch.nn.functional.dropout documentation has been clarified. (#10417).
- π torch.nn.functional.pad documentation has been clarified. (#11623).
- Add documentation about Tensor properties
device
,is_cuda
,requires_grad
,is_leaf
andgrad
.
(#14339) - π torch.sparse documentation updated (#12221).
- β Added a quick rundown of codebase structure. (#12693)
- π torch.cat corrected parameter name to
tensors
fromseq
. (#12741) - Warn that tensor.resize_() resets strides (#12816)
- Highlights
-
v1.0.rc1 Changes
October 02, 2018This is a pre-release preview, do not rely on the tag to have a fixed set of commits, or rely on the tag for anything practical / important
Table of Contents
- Highlights
- JIT
- torch.distributed new "C10D" library
- C++ Frontend [API Unstable]
- π₯ Breaking Changes
- Additional New Features
- N-dimensional empty tensors
- New Operators
- New Distributions
- Additions to existing Operators and Distributions
- π Bug Fixes
- Serious
- Backwards Compatibility
- Correctness
- Error checking
- Miscellaneous
- Other Improvements
- π Deprecations
- CPP Extensions
- π Performance
- π Documentation Improvements
Highlights
JIT
The JIT is a set of compiler tools for bridging the gap between research in PyTorch
and production. It includes a language called Torch Script (don't worry it is a subset of Python,
so you'll still be writing Python), and two ways in which you can make your existing code compatible with the JIT.
Torch Script code can be aggressively optimized and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.

    # Write in Python, run anywhere!
    @torch.jit.script
    def RNN(x, h, W_h, U_h, b_h):
        y = []
        for t in range(x.size(0)):
            h = torch.tanh(x[t] @ W_h + h @ U_h + b_h)
            y += [h]
        return torch.stack(y), h
As an example, see a tutorial on deploying a seq2seq model,
π loading an exported model from C++, or browse the docs.torch.distributed new "C10D" library
π¦ The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by the new "C10D" library. The main highlights of the new library are:
- π C10D is performance driven and operates entirely asynchronously for all backends:
Gloo
,NCCL
, andMPI
. - π Significant Distributed Data Parallel performance improvements especially for slower network like ethernet-based hosts
- β Adds async support for all distributed collective operations in the torch.distributed package.
- β Adds send and recv support in the Gloo backend
C++ Frontend [API Unstable].
π The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to
torch.nn
,torch.optim
,torch.data
and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:

Python:

    import torch

    model = torch.nn.Linear(5, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    prediction = model.forward(torch.randn(3, 5))
    loss = torch.nn.functional.mse_loss(prediction, torch.ones(3, 1))
    loss.backward()
    optimizer.step()

C++:

    #include <torch/torch.h>

    torch::nn::Linear model(5, 1);
    torch::optim::SGD optimizer(model->parameters(), /*lr=*/0.1);
    torch::Tensor prediction = model->forward(torch::randn({3, 5}));
    auto loss = torch::mse_loss(prediction, torch::ones({3, 1}));
    loss.backward();
    optimizer.step();
We are releasing the C++ frontend marked as "API Unstable" as part of PyTorch 1.0. This means it is ready to be used for your research application, but still has some open construction sites that will stabilize over the next month or two. Some parts of the API may undergo breaking changes during this time.
π See https://pytorch.org/cppdocs for detailed documentation on the greater PyTorch C++ API as well as the C++ frontend.
π₯ Breaking Changes
- π Indexing a 0-dimensional tensor will now throw an error instead of warn. Use tensor.item() instead. (#11679).
- π torch.legacy is removed. (#11823).
- torch.masked_copy_ is removed, use torch.masked_scatter_ instead. (#9817).
Operations that result in 0 element tensors may return changed shapes.
- Before: all 0 element tensors would collapse to shape (0,). For example, torch.nonzero is documented to return a tensor of shape (n,z), where n = number of nonzero elements and z = dimensions of the input, but would always return a Tensor of shape _(0,) when no nonzero elements existed.
- Now: Operations return their documented shape.
Previously: all 0-element tensors are collapsed to shape (0,)
torch.nonzero(torch.zeros(2, 3)) tensor([], dtype=torch.int64)
Now, proper shape is returned
torch.nonzero(torch.zeros(2, 3)) tensor([], size=(0, 2), dtype=torch.int64)
Sparse tensor indices and values shape invariants are changed to be more consistent in the case of 0-element tensors. See link for more details. (#9279).
π torch.distributed: the TCP backend is removed; we recommend using the Gloo or MPI backend for CPU collectives and the NCCL backend for GPU collectives.
Some inter-type operations (e.g.
*
) betweentorch.Tensors
and NumPy arrays will now favor dispatching to thetorch
variant. This may result in different return types. (#9651).π Implicit
numpy
conversion no longer implicitly moves a tensor to CPU. Therefore, you may have to explicitly move a CUDA tensor to CPU (tensor.to('cpu')
) before an implicit conversion. (#10553).torch.randint now defaults to using dtype torch.int64 rather than the default floating-point dtype. (#11040).
π torch.tensor function with a
Tensor
argument now returns adetached
Tensor (i.e. a Tensor wheregrad_fn
isNone
). This more closely aligns with the intent of the function, which is to return a Tensor with copied data and no history. (#11061,
#11815).torch.nn.functional.multilabel_soft_margin_loss now returns Tensors of shape
(N,)
instead of(N, C)
to match the behavior of torch.nn.MultiMarginLoss. In addition, it is more numerically stable.
(#9965). The result type of a torch.float16 0-dimensional tensor and an integer is now torch.float16 (was torch.float32 or torch.float64, depending on the dtype of the integer). (#11941).
π Dirichlet and Categorical distributions no longer accept scalar parameters. (#11589).
π CPP Extensions: Deprecated factory functions that accept a type as the first argument and a size as the second argument have been removed. Instead, use the new-style factory functions that accept the size as the first argument and
TensorOptions
as the last argument. For example, replace your call to at::ones(torch::CPU(at::kFloat), {2, 3})
with torch::ones({2, 3}, at::kCPU)
. This applies to the following functions:arange
,empty
,eye
,full
,linspace
,logspace
,ones
,rand
,randint
,randn
,randperm
,range
,zeros
.
β Additional New Features
N-dimensional empty tensors
Tensors with 0 elements can now have an arbitrary number of dimensions and support indexing and other torch operations; previously, 0 element tensors were limited to shape (0,). (#9947). Example:
torch.empty((0, 2, 4, 0), dtype=torch.float64) tensor([], size=(0, 2, 4, 0), dtype=torch.float64)
π New Operators
- π torch.argsort similar to numpy.argsort.
(#9600). - π torch.pdist similar to scipy.spatial.distance.pdist. (#10782).
- π torch.tensordot similar to numpy.tensordot. (#10025).
- π torch.broadcast_tensors similar to numpy.broadcast_arrays.
(#10075). - π torch.narrow support for sparse tensors.
(#11342). - π torch.matrix_rank similar to numpy.linalg.matrix_rank.
(#10338). - π torch.matrix_power similar to numpy.linalg.matrix_power. (#11421).
- π torch.nn.CeLU activation. (#8551).
- π torch.nn.CTCLoss. (#9628).
π New Distributions
- π Weibull Distribution. (#9454).
- π NegativeBinomial Distribution. (#9345).
- π² torch.mvlgamma Multivariate Log-Gamma Distribution. (#9451).
β Additions to existing Operators and Distributions
- π torch.unique now accepts an optional
dim
argument. (#10423). - π torch.norm now supports matrix norms.
(#11261). - π torch.distributions.kl.kl_divergence now supports broadcasting. (#10533).
- π torch.distributions now support an
expand
method similar to torch.Tensor.expand. For example: torch.distributions.bernoulli.Bernoulli.expand. (#11341). - π torch.nn.functional.grid_sample now support nearest neighbor interpolation and reflection padding. (#10051).
π Bug Fixes
Serious
- π torch.nn.functional.softmin was using the incorrect formula in 0.4.1. (#10066)
- π torch.as_strided backwards (called via
view
) was incorrect with overlapping data locations. (#9538). - π Pointwise losses (e.g. torch.nn.MSELoss were sometimes using the wrong
reduction
method. (#10018). - π torch.from_numpy was not handling big-endian dtypes correctly. (#9508).
- π torch.multiprocessing now correctly handles CUDA tensors, requires_grad settings, and hooks. (#10220).
Backwards Compatibility
- torch.nn.Module
load_from_state_dict
now correctly handles 1-dimensional vs 0-dimensional tensors saved from 0.3 versions. (#9781). - π Fix
RuntimeError: storages don't support slicing
when loading models saved with PyTorch 0.3. (#11314).
Correctness
- π torch.nn.Dropout fused kernel could change parameters in
eval
mode. (#10621). - π torch.unbind backwards has been fixed. (#9995).
- π Fix a bug in sparse matrix-matrix multiplication when a sparse matrix is coalesced then transposed. (#10496).
- π torch.bernoulli now handles
out=
parameters correctly, handles expanded tensors correctly, and has corrected argument validity checks on CPU. (#10273). - π torch.Tensor.normal_ could give incorrect results on CPU. (#10846).
- π torch.tanh could return incorrect results on non-contiguous tensors. (#11226).
- π² torch.log on an expanded
Tensor
gave incorrect results on CPU. (#10269). - π torch.logsumexp now correctly modifies the
out
parameter if it is given. (#9755). - π torch.multinomial with
replacement=True
could select 0 probability events on CUDA. (#9960). - π torch.nn.ReLU will now properly propagate
NaN
.
(#10277). - π torch.max and torch.min could return incorrect values on input containing
inf
/-inf
. (#11091). - π Fixed an issue with calculated output sizes of
torch.nn.Conv
modules withstride
anddilation
. (#9640). - π torch.nn.EmbeddingBag now correctly returns vectors filled with zeros for empty bags on CUDA. (#11740).
Error checking
- π torch.gesv now properly checks LAPACK errors. (#11634).
- π Fixed an issue where extra positional arguments were accepted (and ignored) in Python functions calling into C++. (#10499).
- legacy
Tensor
constructors (e.g.torch.FloatTensor(...)
) now correctly check theirdevice
argument.
(#11669). - Properly check that
out
parameter is a CPUTensor
for CPU unary ops. (#10358). - π torch.nn.InstanceNorm1d now correctly accepts 2 dimensional inputs. (#9776).
- torch.nn.Module.load_state_dict had an incorrect error message. (#11200).
- π torch.nn.RNN now properly checks that inputs and hidden_states are on the same devices. (#10185).
Miscellaneous
- π torch.utils.data.DataLoader could hang if it was not completely iterated. (#10366).
- π Fixed a segfault when grad to a hook function is
None
. (#12028). - π Fixed a segfault in backwards with torch.nn.PReLU when the input does not require grad. (#11758).
- π
dir(torch)
has been fixed with Python 3.7. (#10271). - π Fixed a device-side assert in torch.multinomial when
replacement=False
and the input has fewer nonzero elements thannum_samples
. (#11933). - Can now properly assign a
torch.float16
dtype tensor to.grad
. (#11781). - π Fixed
can only join a started process
error with torch.utils.data.DataLoader. (#11432). - π Prevent
unexpected exit
in torch.utils.data.DataLoader onKeyboardInterrupt
. (#11718). - π torch.einsum now handles spaces consistently. (#9994).
- π Fixed a broadcasting bug in torch.distributions.studentT.StudentT. (#12148).
- π fix a printing error with large non-contiguous tensors. (#10405).
Other Improvements
- π torch.cuda functions and torch.nn.parallel.data_parallel now accept torch.device objects in addition to integer device ids. (#10833, #10189).
- π torch.nn.parallel.data_parallel now accepts
torch.device
inputs. (#10189). - π² torch.nn.functional.log_softmax is now more numerically stable. (#11866).
- π Improve printing of sparse tensors and
grad_fns
. (#10181). - Only involve CUDA device in CUDA -> CPU copy. (#11592).
- Accept numpy floating-point scalars as doubles more consistently. (#9659).
- π sparse-to-sparse copy_ is now supported. (#9005).
- π torch.bincount now supports 0 element inputs. (#9757).
- π torch.nn.functional.conv2d error message have been improved. (#11053).
- π Allow conversion of
np.int64
to PyTorch scalar. (#9225). - π torch.einsum now handles varargs.
(#10067). - π torch.symeig now returns 0-filled eigenvectors when
eigenvectors=False
is passed on CUDA rather than uninitialized data. (#10645).
Deprecations

CPP Extensions
- The `torch/torch.h` header is deprecated in favor of `torch/extension.h`, which should be used in all C++ extensions going forward. Including `torch/torch.h` from a C++ extension will produce a warning. It is safe to batch-replace `torch/torch.h` with `torch/extension.h`.
- Usage of the following functions in C++ extensions is also deprecated:
  - `torch::set_requires_grad`. Replacement: `at::Tensor` now has a `set_requires_grad` method.
  - `torch::requires_grad`. Replacement: `at::Tensor` now has a `requires_grad` method.
  - `torch::getVariableType`. Replacement: None.

torch.distributed
- The old (THD-backed) torch.distributed package is deprecated but still available at torch.distributed.deprecated.
- The old (THD-backed) torch.nn.parallel.DistributedDataParallel is deprecated but still available at `torch.nn.parallel.deprecated.DistributedDataParallel`.
Performance
- torch.nn.functional.grid_sample on CPU now uses vectorized operations and is 2x~7x faster on CPUs with AVX2. (#9961).
- torch.norm has been vectorized and parallelized on CPU. (#11565).
- torch.max and torch.min have been parallelized on CPU. (#10343).
- torch.Tensor.masked_fill_ has been parallelized on CPU. (#11359).
- torch.nn.PReLU has been sped up on both CPU and GPU. (#11758).
- torch.nn.KLDivLoss has been sped up on both CPU and GPU. (#10336).
- torch.svd has been sped up on both CPU and GPU. (#11194).
- torch.einsum has been greatly sped up on CPU. (#11292).
- torch.clamp no longer does unnecessary copying. (#10352).
- torch.add, torch.sub, torch.mul, and torch.div are much faster for non-contiguous tensors on GPU. (#8919).
- torch.nn.RNN and related modules have been ported to C++ and are more performant. (#10305, #10481).
- The profiler now has lower overhead. (#10969, #11773).
Documentation Improvements
- A reproducibility note has been added. (#11329).
- CPP Extensions have improved online documentation. Authors of C++ extensions may want to consult this documentation when writing new extensions.
- torch.Tensor.flatten is now documented. (#9876).
- torch.digamma is now documented. (#10967).
- torch.allclose is now documented. (#11185).
- The torch.eig return format has been clarified. (#10315).
- torch.as_tensor now includes a proper example. (#10309).
- torch.sparse_coo_tensor now explains uncoalesced behavior. (#10308).
- The torch.fft equation has been corrected. (#10760).
- torch.nn.LSTM behavior has been clarified in the multilayer case. (#11896).
- torch.nn.functional.dropout documentation has been clarified. (#10417).
- torch.nn.functional.pad documentation has been clarified. (#11623).
- Various mathematical formulas have been clarified. (#11106).
v1.0.rc0
October 02, 2018

v0.4.1 Changes
July 26, 2018

Table of Contents
- Breaking Changes
- New Features
  - Neural Networks
    - Adaptive Softmax, Spectral Norm, etc.
  - Operators
    - torch.bincount, torch.as_tensor, ...
  - torch.distributions
    - Half Cauchy, Gamma Sampling, ...
  - Other
    - Automatic anomaly detection (detecting NaNs, etc.)
- Performance
  - Faster CPU ops in a wide variety of cases
  - Other improvements
- Bug Fixes
- Documentation Improvements
Breaking Changes
- `torch.stft` has changed its signature to be consistent with librosa (a short sketch of the new call follows this list) #9497
  - Before: `stft(signal, frame_length, hop, fft_size=None, normalized=False, onesided=True, window=None, pad_end=0)`
  - After: `stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)`
  - `torch.stft` now also uses FFT internally and is much faster.
- `torch.slice` is removed in favor of the tensor slicing notation #7924
- `torch.arange` now does dtype inference: any floating-point argument is inferred to be the default `dtype`; all integer arguments are inferred to be `int64`. #7016
- `torch.nn.functional.embedding_bag`'s old signature `embedding_bag(weight, input, ...)` is deprecated; `embedding_bag(input, weight, ...)` (consistent with torch.nn.functional.embedding) should be used instead.
- `torch.nn.functional.sigmoid` and `torch.nn.functional.tanh` are deprecated in favor of `torch.sigmoid` and `torch.tanh` #8748
- Broadcast behavior changed in a (very rare) edge case: `[1] x [0]` now broadcasts to `[0]` (used to be `[1]`) #9209
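As a minimal sketch of a call with the new `torch.stft` signature listed above (the signal length, FFT size, and hop length are illustrative only, not from the release notes):

```python
import torch

signal = torch.randn(16000)          # fake one-second mono signal
window = torch.hann_window(400)
# New librosa-style argument names: n_fft, hop_length, win_length, window, ...
spec = torch.stft(signal, n_fft=400, hop_length=160, win_length=400, window=window)
print(spec.shape)                    # (n_fft // 2 + 1, num_frames, 2) in this era's real/imag layout
```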
New Features

Neural Networks

Adaptive Softmax `nn.AdaptiveLogSoftmaxWithLoss` #5287

```python
>>> in_features = 1000
>>> n_classes = 200
>>> adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs=[20, 100, 150])
>>> adaptive_softmax
AdaptiveLogSoftmaxWithLoss(
  (head): Linear(in_features=1000, out_features=23, bias=False)
  (tail): ModuleList(
    (0): Sequential(
      (0): Linear(in_features=1000, out_features=250, bias=False)
      (1): Linear(in_features=250, out_features=80, bias=False)
    )
    (1): Sequential(
      (0): Linear(in_features=1000, out_features=62, bias=False)
      (1): Linear(in_features=62, out_features=50, bias=False)
    )
    (2): Sequential(
      (0): Linear(in_features=1000, out_features=15, bias=False)
      (1): Linear(in_features=15, out_features=50, bias=False)
    )
  )
)
>>> batch = 15
>>> input = torch.randn(batch, in_features)
>>> target = torch.randint(n_classes, (batch,), dtype=torch.long)
>>> # get the log probabilities of target given input, and mean negative log probability loss
>>> adaptive_softmax(input, target)
ASMoutput(output=tensor([-6.8270, -7.9465, -7.3479, -6.8511, -7.5613, -7.1154, -2.9478, -6.9885,
        -7.7484, -7.9102, -7.1660, -8.2843, -7.7903, -8.4459, -7.2371],
        grad_fn=<ThAddBackward>), loss=tensor(7.2112, grad_fn=<MeanBackward1>))
>>> # get the log probabilities of all targets given input as a (batch x n_classes) tensor
>>> adaptive_softmax.log_prob(input)
tensor([[-2.6533, -3.3957, -2.7069,  ..., -6.4749, -5.8867, -6.0611],
        [-3.4209, -3.2695, -2.9728,  ..., -7.6664, -7.5946, -7.9606],
        [-3.6789, -3.6317, -3.2098,  ..., -7.3722, -6.9006, -7.4314],
        ...,
        [-3.3150, -4.0957, -3.4335,  ..., -7.9572, -8.4603, -8.2080],
        [-3.8726, -3.7905, -4.3262,  ..., -8.0031, -7.8754, -8.7971],
        [-3.6082, -3.1969, -3.2719,  ..., -6.9769, -6.3158, -7.0805]],
       grad_fn=<CopySlices>)
>>> # predict: get the class that maximizes the log probability for each input
>>> adaptive_softmax.predict(input)
tensor([8, 6, 6, 16, 14, 16, 16, 9, 4, 7, 5, 7, 8, 14, 3])
```
Add spectral normalization `nn.utils.spectral_norm` #6929

```python
>>> # Usage is similar to weight_norm
>>> convT = nn.ConvTranspose2d(3, 64, kernel_size=3, padding=1)
>>> # Can specify number of power iterations applied each time, or use default (1)
>>> convT = nn.utils.spectral_norm(convT, n_power_iterations=2)
>>>
>>> # apply to every conv and conv transpose module in a model
>>> def add_sn(m):
...     for name, c in m.named_children():
...         m.add_module(name, add_sn(c))
...     if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
...         return nn.utils.spectral_norm(m)
...     else:
...         return m
>>> my_model = add_sn(my_model)
```
`nn.ModuleDict` and `nn.ParameterDict` containers #8463
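A minimal sketch of the new dict-style containers (the layer and parameter names below are made up for illustration):

```python
import torch
import torch.nn as nn

layers = nn.ModuleDict({
    'conv': nn.Conv2d(10, 10, kernel_size=3, padding=1),
    'pool': nn.MaxPool2d(2),
})
params = nn.ParameterDict({
    'scale': nn.Parameter(torch.ones(10)),
})
x = torch.randn(1, 10, 8, 8)
# Submodules and parameters are looked up by key, and both are registered properly.
y = layers['pool'](layers['conv'](x)) * params['scale'].view(1, -1, 1, 1)
```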
Add `nn.init.zeros_` and `nn.init.ones_` #7488

Add sparse gradient option to pretrained embedding #7492

Add max pooling support to `nn.EmbeddingBag` #5725

Depthwise convolution support for MKLDNN #8782

Add `nn.FeatureAlphaDropout` (featurewise Alpha Dropout layer) #9073

Operators
`torch.bincount` (count frequency of each value in an integral tensor) #6688

```python
>>> input = torch.randint(0, 8, (5,), dtype=torch.int64)
>>> weights = torch.linspace(0, 1, steps=5)
>>> input, weights
(tensor([4, 3, 6, 3, 4]), tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000]))
>>> torch.bincount(input)
tensor([0, 0, 0, 2, 2, 0, 1])
>>> input.bincount(weights)
tensor([0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.5000])
```
`torch.as_tensor` (similar to `torch.tensor` but never copies unless necessary) #7109

```python
>>> tensor = torch.randn(3, device='cpu', dtype=torch.float32)
>>> torch.as_tensor(tensor)                       # doesn't copy
>>> torch.as_tensor(tensor, dtype=torch.float64)  # copies due to incompatible dtype
>>> torch.as_tensor(tensor, device='cuda')        # copies due to incompatible device
>>> array = np.array([3, 4.5])
>>> torch.as_tensor(array)                        # doesn't copy, sharing memory with the numpy array
>>> torch.as_tensor(array, device='cuda')         # copies due to incompatible device
```
`torch.randperm` for CUDA tensors #7606

`nn.HardShrink` for CUDA tensors #8117

`torch.flip` (flips a tensor along specified dims) #7873

`torch.flatten` (flattens a contiguous range of dims) #8578

`torch.pinverse` (computes the SVD-based pseudo-inverse) #9052

`torch.meshgrid` #8581

`torch.unique` for CUDA tensors #8899

`torch.erfc` (complementary error function) https://github.com/pytorch/pytorch/pull/9366/files

`torch.isinf` and `torch.isfinite` #9169 #9487

Support backward for target tensor in `torch.nn.functional.kl_div` #7839

`torch.logsumexp` #7254

Add batched linear solver to `torch.gesv` #6100

`torch.sum` now supports summing over multiple dimensions https://github.com/pytorch/pytorch/pull/6152/files

`torch.diagonal` and `torch.diagflat` now take arbitrary diagonals with numpy semantics #6718

`tensor.any` and `tensor.all` on `ByteTensor` can now accept `dim` and `keepdim` arguments #4627
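As a quick, hedged sketch (not part of the original notes), a few of the operators above in action:

```python
import torch

x = torch.arange(6).reshape(2, 3)
print(torch.flip(x, dims=[1]))                            # reverse along the last dim
print(torch.flatten(x))                                   # flatten to 1-D
print(torch.isfinite(torch.tensor([1.0, float('inf')])))  # elementwise finiteness check
print(torch.logsumexp(torch.randn(3, 4), dim=1))          # log-sum-exp over dim 1
```

Distributions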
- Half Cauchy and Half Normal #8411
- Gamma sampling for CUDA tensors #6855
- Allow vectorized counts in Binomial Distribution #6720
Misc
- Autograd automatic anomaly detection for `NaN` and errors occurring in backward. Two functions, detect_anomaly and set_detect_anomaly, are provided for this (see the sketch after this list). #7677
- Support `reversed(torch.Tensor)` #9216
- Support `hash(torch.device)` #9246
- Support `gzip` in `torch.load` #6490
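A minimal sketch of anomaly detection, assuming the context-manager form; the NaN-producing computation below is purely illustrative:

```python
import torch

# With anomaly detection on, a NaN produced during backward raises a RuntimeError
# that points at the forward operation responsible, instead of propagating silently.
with torch.autograd.detect_anomaly():
    x = torch.tensor([0.0], requires_grad=True)
    y = torch.sqrt(x)        # d/dx sqrt(x) is inf at x = 0
    z = (y * 0).sum()        # 0 * inf -> NaN in the backward pass
    z.backward()             # raises, naming the offending operation
```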
Performance
- Accelerate Bernoulli random number generation on CPU #7171
- Enable cuFFT plan caching (80% speed-up in certain cases) #8344
- Fix unnecessary copying in `bernoulli_` #8682
- Fix unnecessary copying in `broadcast` #8222
- Speed up multidim `sum` (2x~6x speed-up in certain cases) #8992
- Vectorize CPU `sigmoid` (>3x speed-up in most cases) #8612
- Optimize CPU `nn.LeakyReLU` and `nn.PReLU` (2x speed-up) #9206
- Vectorize `softmax` and `logsoftmax` (4.5x speed-up on single core and 1.8x on 10 threads) #7375
- Speed up `nn.init.sparse` (10-20x speed-up) #6899
Improvements

Tensor printing
- Tensor printing now includes `requires_grad` and `grad_fn` information (see the sketch after this list) #8211
- Improve number formatting in tensor print #7632
- Fix scale when printing some tensors #7189
- Speed up printing of large tensors #6876
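A small hedged sketch of the extra information now shown when printing (the exact `grad_fn` name may differ between versions):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
print(x)   # e.g. tensor([...], requires_grad=True)
print(y)   # e.g. tensor([...], grad_fn=<MulBackward0>)
```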
Neural Networks
- `NaN` is now propagated through many activation functions #8033
- Add a `non_blocking` option to nn.Module.to #7312
- Loss modules now allow the target to require gradient #8460
- Add a `pos_weight` argument to `nn.BCEWithLogitsLoss` #6856
- Support `grad_clip` for parameters on different devices #9302
- Remove the requirement that input sequences to `pad_sequence` have to be sorted #7928
- The `stride` argument for `max_unpool1d`, `max_unpool2d`, `max_unpool3d` now defaults to `kernel_size` #7388
- Allow calling grad-mode context managers (e.g., `torch.no_grad`, `torch.enable_grad`) as decorators (see the sketch after this list) #7737
- `torch.optim.lr_scheduler._LRScheduler`'s `__getstate__` now includes optimizer info #7757
- Add support for accepting `Tensor` as input in `clip_grad_*` functions #7769
- Return `NaN` in `max_pool`/`adaptive_max_pool` for `NaN` inputs #7670
- `nn.EmbeddingBag` can now handle empty bags in all modes #7389
- `torch.optim.lr_scheduler.ReduceLROnPlateau` is now serializable #7201
- Allow only tensors of floating point dtype to require gradients #7034 and #7185
- Allow resetting of BatchNorm running stats and cumulative moving average #5766
- Set the gradient of LP-Pooling to zero if the sum of all input elements to the power of p is zero #6766
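A minimal sketch of the decorator form mentioned above (the function name is illustrative):

```python
import torch

@torch.no_grad()
def evaluate(model, inputs):
    # No autograd graph is built inside this call, as if wrapped in `with torch.no_grad():`.
    return model(inputs)
```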
Operators
- Add ellipses ('...') and diagonals (e.g. 'ii->i') to `torch.einsum` (see the sketch after this list) #7173
- Add a `to` method for `PackedSequence` #7319
- Add support for `__floordiv__` and `__rdiv__` for integral tensors #7245
- `torch.clamp` now has subgradient 1 at min and max #7049
- `torch.arange` now uses NumPy-style type inference: #7016
- Support the infinity norm properly in `torch.norm` and `torch.renorm` #6969
- Allow passing an output tensor via the `out=` keyword argument in `torch.dot` and `torch.matmul` #6961
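A hedged sketch of the einsum additions (shapes and values are illustrative only):

```python
import torch

a = torch.randn(2, 3, 5)
b = torch.randn(2, 5, 4)
prod = torch.einsum('...ij,...jk->...ik', a, b)  # batched matmul via ellipses
m = torch.randn(4, 4)
diag = torch.einsum('ii->i', m)                  # main diagonal via a repeated index
```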
Distributions
- Always enable grad when calculating `lazy_property` #7708
Sparse Tensor
- Add log1p for sparse tensors #8969
- Better support for adding zero-filled sparse tensors #7479
Data Parallel
- Allow modules that return scalars in `nn.DataParallel` #7973
- Allow `nn.parallel.parallel_apply` to take in a list/tuple of tensors #8047
Misc
- `torch.Size` can now accept PyTorch scalars #5676
- Move `torch.utils.data.dataset.random_split` to `torch.utils.data.random_split`, and `torch.utils.data.dataset.Subset` to `torch.utils.data.Subset` (see the sketch after this list) #7816
- Add serialization for `torch.device` #7713
- Allow copy.deepcopy of `torch.(int/float/...)*` dtype objects #7699
- `torch.load` can now take a `torch.device` as the map location #7339
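A small hedged sketch of the relocated helpers (the dataset contents are illustrative):

```python
import torch
from torch.utils.data import TensorDataset, random_split

ds = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))
train_ds, val_ds = random_split(ds, [80, 20])   # now importable directly from torch.utils.data
```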
Bug Fixes
- Fix `nn.BCELoss` sometimes returning negative results #8147
- Fix `tensor._indices` on a scalar sparse tensor giving the wrong result #8197
- Fix backward of `tensor.as_strided` not working properly when the input has overlapping memory #8721
- Fix `x.pow(0)` gradient when x contains 0 #8945
- Fix CUDA `torch.svd` and `torch.eig` returning wrong results in certain cases #9082
- Fix `nn.MSELoss` having low precision #9287
- Fix segmentation fault when calling `torch.Tensor.grad_fn` #9292
- Fix `torch.topk` returning wrong results when the input isn't contiguous #9441
- Fix segfault in convolution on CPU with large `inputs`/`dilation` #9274
- Fix `avg_pool2/3d` `count_include_pad` having default value `False` (should be `True`) #8645
- Fix `nn.EmbeddingBag`'s `max_norm` option #7959
- Fix returning scalar input in Python autograd function #7934
- Fix THCUNN `SpatialDepthwiseConvolution` assuming contiguity #7952
- Fix bug in seeding the random module in `DataLoader` #7886
- Don't modify variables in-place for `torch.einsum` #7765
- Make return uniform in lbfgs step #7586
- The return value of `uniform.cdf()` is now clamped to `[0..1]` #7538
- Fix advanced indexing with negative indices #7345
- `CUDAGenerator` will not initialize on the current device anymore, which avoids unnecessary memory allocation on `GPU:0` #7392
- Fix `tensor.type(dtype)` not preserving device #7474
- Batch sampler now returns the same results when used alone or in a DataLoader with `num_workers` > 0 #7265
- Fix broadcasting error in LogNormal, TransformedDistribution #7269
- Fix `torch.max` and `torch.min` on CUDA in the presence of `NaN` #7052
- Fix `torch.tensor` device-type calculation when used with CUDA #6995
- Fixed a missing `'='` in the `nn.LPPoolNd` repr function #9629
Documentation
- Expose and document `torch.autograd.gradcheck` and `torch.autograd.gradgradcheck` #8166
- Document `tensor.scatter_add_` #9630
- Document variants of `torch.add` and `tensor.add_`, e.g. `tensor.add(value=1, other) -> Tensor` #9027
- Document `torch.logsumexp` #8428
- Document `torch.sparse_coo_tensor` #8152
- Document `torch.utils.data.dataset.random_split` #7676
- Document `torch.nn.GroupNorm` #7086
- A lot of other various documentation improvements including RNNs, `ConvTransposeNd`, `Fold`/`Unfold`, `Embedding`/`EmbeddingBag`, loss functions, etc.