# Changelog History
## v0.5.0 Changes

November 04, 2019

### ⚡️ Major Updates
- ⚡️ Updated my README emoji game to be more ambiguous while maintaining a fun and heartwarming vibe. 🐕
- Support for Python 3.5
- Extensive rewrite of README to focus on new users and building an NLP pipeline.
- Support for PyTorch 1.2
- ➕ Added `torchnlp.random` for finer-grained control of random state, building on PyTorch's `fork_rng`. This module controls the random state of `torch`, `numpy` and `random`.

```python
import random

import numpy
import torch

from torchnlp.random import fork_rng

with fork_rng(seed=123):  # Ensure determinism
    print('Random:', random.randint(1, 2**31))
    print('Numpy:', numpy.random.randint(1, 2**31))
    print('Torch:', int(torch.randint(1, 2**31, (1,))))
```
- 🔨 Refactored `torchnlp.samplers` enabling pipelining. For example:

```python
from torchnlp.samplers import DeterministicSampler
from torchnlp.samplers import BalancedSampler

data = ['a', 'b', 'c'] + ['c'] * 100
sampler = BalancedSampler(data, num_samples=3)
sampler = DeterministicSampler(sampler, random_seed=12)
print([data[i] for i in sampler])  # ['c', 'b', 'a']
```
- ➕ Added `torchnlp.samplers.balanced_sampler` for balanced sampling, extending PyTorch's `WeightedRandomSampler`.
- ➕ Added `torchnlp.samplers.deterministic_sampler` for deterministic sampling based on `torchnlp.random`.
- Added `torchnlp.samplers.distributed_batch_sampler` for distributed batch sampling.
- Added `torchnlp.samplers.oom_batch_sampler` to sample large batches first, in order to force any out-of-memory error to occur early.
- Added `torchnlp.utils.lengths_to_mask` to help create masks from a batch of sequences.
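A minimal sketch of how it might be used, assuming `lengths_to_mask` accepts a batch of sequence lengths and returns a boolean padding mask:

```python
from torchnlp.utils import lengths_to_mask

# Mask for a batch of three sequences of lengths 1, 2 and 3;
# `True` marks real tokens and `False` marks padding.
mask = lengths_to_mask([1, 2, 3])
print(mask)
# tensor([[ True, False, False],
#         [ True,  True, False],
#         [ True,  True,  True]])
```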
- Added `torchnlp.utils.get_total_parameters` to measure the number of parameters in a model.
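For instance, a short sketch of counting a module's parameters; the `nn.LSTM` here is just an arbitrary example model:

```python
import torch.nn as nn

from torchnlp.utils import get_total_parameters

# Count the parameters of an arbitrary module.
model = nn.LSTM(input_size=8, hidden_size=16)
print(get_total_parameters(model))
```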
- Added `torchnlp.utils.get_tensors` to measure the size of an object in number of tensor elements. This is useful for dynamic batch sizing and for `torchnlp.samplers.oom_batch_sampler`.

```python
import torch

from torchnlp.utils import get_tensors

random_object_ = tuple([{'t': torch.tensor([1, 2])}, torch.tensor([2, 3])])
tensors = get_tensors(random_object_)
assert len(tensors) == 2
```
- ➕ Added a corporate sponsor to the library: https://wellsaidlabs.com/
### ⚡️ Minor Updates
- 🐛 Fixed `snli` example (#84)
- ⚡️ Updated `.gitignore` to support Python's virtual environments (#84)
- ➖ Removed the `requests` and `pandas` dependencies. There are only two dependencies remaining, which is useful for production environments. (#84)
- ➕ Added `LazyLoader` to reduce dependency requirements. (4e84780)
- ➖ Removed the unused `torchnlp.datasets.Dataset` class in favor of basic Python dictionary lists and `pandas`. (#84)
- Support for downloading `tar.gz` files and unpacking them faster. (eb61fee)
- Renamed `itos` and `stoi` to `index_to_token` and `token_to_index`, respectively. (#84)
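A brief sketch of the renamed attributes, assuming a `LabelEncoder` built from a small sample (the exact reserved tokens in the output may differ):

```python
from torchnlp.encoders import LabelEncoder

encoder = LabelEncoder(['label_a', 'label_b'])
print(encoder.index_to_token)  # formerly `itos`: a list mapping indices to tokens
print(encoder.token_to_index)  # formerly `stoi`: a dict mapping tokens to indices
```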
- Fixed `batch_encode`, `batch_decode`, and `enforce_reversible` for `torchnlp.encoders.text` (#69)
- 🐛 Fixed `FastText` vector downloads (#72)
- 🐛 Fixed documentation for `LockedDropout` (#73)
- 🐛 Fixed bug in `weight_drop` (#76)
- `stack_and_pad_tensors` now returns a named tuple for readability (#84)
- Added `torchnlp.utils.split_list` in favor of `torchnlp.utils.resplit_datasets`. This is enabled by the modularity of `torchnlp.random`. (#84)
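A minimal sketch of `split_list`, assuming it takes a list and a tuple of split ratios:

```python
from torchnlp.utils import split_list

# Split a dataset into 60% train, 20% dev and 20% test.
train, dev, test = split_list(list(range(10)), splits=(0.6, 0.2, 0.2))
print(len(train), len(dev), len(test))  # 6 2 2
```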
- Deprecated `torchnlp.utils.datasets_iterator` in favor of Python's `itertools.chain`. (#84)
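The standard-library replacement is straightforward; `train_rows` and `dev_rows` below are hypothetical dataset lists:

```python
from itertools import chain

train_rows = [{'text': 'a'}, {'text': 'b'}]
dev_rows = [{'text': 'c'}]

# Iterate over both datasets in sequence, as `datasets_iterator` used to.
for row in chain(train_rows, dev_rows):
    print(row['text'])
```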
- Deprecated `torchnlp.utils.shuffle` in favor of `torchnlp.random`. (#84)
- Support for encoding larger datasets, following the fix for this issue (#85).
- ➕ Added `torchnlp.samplers.repeat_sampler`, following up on this issue: pytorch/pytorch#15849
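A minimal sketch, assuming `RepeatSampler` wraps any iterable sampler and re-iterates it indefinitely (useful to avoid `DataLoader` workers restarting between epochs):

```python
from torchnlp.samplers import RepeatSampler

sampler = RepeatSampler(range(3))
iterator = iter(sampler)
print([next(iterator) for _ in range(7)])  # [0, 1, 2, 0, 1, 2, 0]
```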
## v0.4.0 Changes

April 03, 2019

### ⚡️ Major Updates
- Rewrote encoders to better support more generic encoders like a `LabelEncoder`. Furthermore, added broad support for `batch_encode`, `batch_decode` and `enforce_reversible`.
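A short sketch of the rewritten encoder API using `LabelEncoder`, assuming `batch_encode` returns a tensor that `batch_decode` reverses:

```python
from torchnlp.encoders import LabelEncoder

encoder = LabelEncoder(['label_a', 'label_b'])
tensor = encoder.batch_encode(['label_b', 'label_a'])
assert encoder.batch_decode(tensor) == ['label_b', 'label_a']
```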
- 🔧 Rearchitected default reserved tokens to ensure configurability while still providing the convenience of good defaults.
- ➕ Added support to collate sequences with `torch.utils.data.dataloader.DataLoader`. For example:

```python
from functools import partial

import torch

from torchnlp.utils import collate_tensors
from torchnlp.encoders.text import stack_and_pad_tensors

collate_fn = partial(collate_tensors, stack_tensors=stack_and_pad_tensors)
# `*args` and `**kwargs` stand in for the usual DataLoader arguments.
torch.utils.data.dataloader.DataLoader(*args, collate_fn=collate_fn, **kwargs)
```
- ➕ Added doctest support, ensuring the documented examples are tested.
- ➖ Removed SRU support; it's too heavy of a module to support. Please use https://github.com/taolei87/sru instead. Happy to accept a PR with a better tested and documented SRU module!
- ⚡️ Updated version requirements to support Python 3.6 and 3.7, dropping support for Python 3.5.
- ⚡️ Updated version requirements to support PyTorch 1.0+.
- Merged #66, reducing the memory requirements for pre-trained word vectors by 2x.
### ⚡️ Minor Updates
- Formatted the code base with YAPF.
- 🐛 Fixed `pandas` and `collections` warnings.
warnings. โ Added invariant assertion to
Encoder
viaenforce_reversible
. For example:encoder = Encoder().enforce_reversible()
Ensuring
Encoder.decode(Encoder.encode(object)) == object
- 🐛 Fixed the accuracy metric for PyTorch 1.0.
## v0.3.7.post1 Changes

December 09, 2018

Minor release fixing some issues and bugs.
## v0.3.0 Changes

May 06, 2018

Release 0.3.0

### Major Features and Improvements
- ⬆️ Upgraded to PyTorch 0.4.0
- ➕ Added Byte-Pair Encoding (BPE) pre-trained subword embeddings in 275 languages
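A minimal sketch of loading the BPE embeddings, assuming a `BPEmb` wrapper in `torchnlp.word_to_vector`; the parameter values here are illustrative:

```python
from torchnlp.word_to_vector import BPEmb

# Load 25-dimensional English subword embeddings.
subword_vectors = BPEmb(language='en', dim=25, merge_ops=50000)
vector = subword_vectors['▁the']  # BPE subword tokens are prefixed with '▁'
```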
- 🔨 Refactored download scripts to `torchnlp.downloads`
- Enabled the Spacy encoder to run in multiple languages.
- ➕ Added a boolean `aligned` option to `FastText`, supporting MUSE (Multilingual Unsupervised and Supervised Embeddings)
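A brief sketch of the new option, assuming `aligned=True` selects the MUSE-aligned vectors for the given language:

```python
from torchnlp.word_to_vector import FastText

# Load French vectors aligned into a shared multilingual space.
vectors = FastText(language='fr', aligned=True)
```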
### 🐛 Bug Fixes and Other Changes
- Create non-existent cache dirs for `torchnlp.word_to_vector`.
- ➕ Add `set` operation to `torchnlp.datasets.Dataset` with support for slices, columns and rows
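A hypothetical sketch of the new `set` operation; the exact indexing API is assumed from the release note:

```python
from torchnlp.datasets import Dataset

dataset = Dataset([{'col': 'a'}, {'col': 'b'}])
dataset['col'] = ['c', 'd']   # set a column
dataset[0] = {'col': 'e'}     # set a row
```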
- Updated `biggest_batches_first` in `torchnlp.samplers` to be more efficient at approximating memory than Pickle
- Enabled `torchnlp.utils.pad_tensor` and `torchnlp.utils.pad_batch` to support N-dimensional tensors
- ⚡️ Updated to sacremoses to fix the NLTK Moses dependency for `torchnlp.text_encoders`
- Added `__getitem__()` for `_PretrainedWordVectors`. For example:

```python
from torchnlp.word_to_vector import FastText

vectors = FastText()

tokenized_sentence = ['this', 'is', 'a', 'sentence']
vectors[tokenized_sentence]
```
- Added `__contains__` for `_PretrainedWordVectors`. For example:

```python
from torchnlp.word_to_vector import FastText

vectors = FastText()

'the' in vectors     # True
'theqwe' in vectors  # False
```
## v0.2.0

April 08, 2018