Changelog History

  • v0.5.0 Changes

    November 04, 2019

    โšก๏ธ Major Updates

    • โšก๏ธ Updated my README emoji game to be more ambiguous while maintaining fun and heartwarming vibe. ๐Ÿ•
    • ๐Ÿ‘Œ Support for Python 3.5
    • ๐Ÿ— Extensive rewrite of README to focus on new users and building an NLP pipeline.
    • ๐Ÿ‘Œ Support for Pytorch 1.2
    • โž• Added torchnlp.random for finer grain control of random state building on PyTorch's fork_rng. This module controls the random state of torch, numpy and random.

      import random

      import numpy
      import torch

      from torchnlp.random import fork_rng

      with fork_rng(seed=123):  # Ensure determinism
          print('Random:', random.randint(1, 2**31))
          print('Numpy:', numpy.random.randint(1, 2**31))
          print('Torch:', int(torch.randint(1, 2**31, (1,))))
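The fork-and-restore idea behind fork_rng can be sketched with the standard library alone. This minimal sketch assumes only Python's built-in `random` module is involved (the real torchnlp.random.fork_rng also covers numpy and torch), and the name `fork_python_rng` is hypothetical: it snapshots the global generator state, optionally reseeds, and restores the snapshot on exit so code outside the block is unaffected.

```python
import contextlib
import random


@contextlib.contextmanager
def fork_python_rng(seed=None):
    """Hypothetical sketch: temporarily reseed `random`, then restore the prior state."""
    saved_state = random.getstate()  # snapshot the global generator state
    try:
        if seed is not None:
            random.seed(seed)  # deterministic stream inside the block
        yield
    finally:
        random.setstate(saved_state)  # undo any reseeding or draws made inside the block


random.seed(0)
before = random.random()

random.seed(0)
with fork_python_rng(seed=123):
    first = random.randint(1, 2**31)
with fork_python_rng(seed=123):
    second = random.randint(1, 2**31)
after = random.random()

print(first == second)  # True: same seed, same draw
print(before == after)  # True: the outer state was restored
```

Restoring in `finally` means the outer state is recovered even if the block raises.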

    • 🔨 Refactored torchnlp.samplers enabling pipelining. For example:

      from torchnlp.samplers import DeterministicSampler
      from torchnlp.samplers import BalancedSampler

      data = ['a', 'b', 'c'] + ['c'] * 100
      sampler = BalancedSampler(data, num_samples=3)
      sampler = DeterministicSampler(sampler, random_seed=12)
      print([data[i] for i in sampler])  # ['c', 'b', 'a']
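The balanced-then-deterministic pipeline can be approximated in plain Python. This is a sketch of the general technique under stated assumptions, not torchnlp's implementation: each index is weighted by the inverse frequency of its value, so every class carries equal total weight, and a seeded `random.Random` instance makes the draw repeatable. `balanced_indices` is a hypothetical name.

```python
import random
from collections import Counter


def balanced_indices(data, num_samples, seed=None):
    """Hypothetical sketch: sample indices with weights inversely proportional to class frequency."""
    counts = Counter(data)
    weights = [1.0 / counts[item] for item in data]  # each class sums to a total weight of 1
    rng = random.Random(seed)  # private generator: same seed, same indices
    return rng.choices(range(len(data)), weights=weights, k=num_samples)


data = ['a', 'b', 'c'] + ['c'] * 100
first = balanced_indices(data, num_samples=3, seed=12)
second = balanced_indices(data, num_samples=3, seed=12)
print([data[i] for i in first])
print(first == second)  # True
```

Even though 'c' makes up most of the list, each of 'a', 'b' and 'c' is drawn with probability 1/3.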

    • ➕ Added torchnlp.samplers.balanced_sampler for balanced sampling, extending PyTorch's WeightedRandomSampler.

    • ➕ Added torchnlp.samplers.deterministic_sampler for deterministic sampling based on torchnlp.random.

    • Added torchnlp.samplers.distributed_batch_sampler for distributed batch sampling.

    • Added torchnlp.samplers.oom_batch_sampler to sample the largest batches first, so that any out-of-memory error is raised early.

    • Added torchnlp.utils.lengths_to_mask to help create masks from a batch of sequences.

    • Added torchnlp.utils.get_total_parameters to measure the number of parameters in a model.

    • Added torchnlp.utils.get_tensors to measure the size of an object in number of tensor elements. This is useful for dynamic batch sizing and for torchnlp.samplers.oom_batch_sampler.

      import torch
      from torchnlp.utils import get_tensors

      random_object_ = tuple([{'t': torch.tensor([1, 2])}, torch.tensor([2, 3])])
      tensors = get_tensors(random_object_)
      assert len(tensors) == 2
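The "largest batches first" idea can be sketched as a reordering step. This is an assumption about the technique, not torchnlp's oom_batch_sampler code: each batch is measured by a size function (here the total number of tokens, standing in for a tensor-element count like the one get_tensors enables), and the largest few batches are moved to the front so a memory problem shows up at the start of a run. `largest_batches_first`, `num_front` and `total_tokens` are hypothetical names.

```python
def largest_batches_first(batches, get_size, num_front=1):
    """Hypothetical sketch: move the `num_front` largest batches to the front."""
    order = sorted(range(len(batches)), key=lambda i: get_size(batches[i]), reverse=True)
    front = set(order[:num_front])
    # Largest batches first (descending size), then the rest in their original order.
    return [batches[i] for i in order[:num_front]] + [
        batch for i, batch in enumerate(batches) if i not in front
    ]


def total_tokens(batch):
    # Size proxy: sum of sequence lengths in the batch.
    return sum(len(sequence) for sequence in batch)


batches = [
    [[1, 2], [3]],           # 3 tokens
    [[1, 2, 3, 4], [5, 6]],  # 6 tokens
    [[1]],                   # 1 token
]
reordered = largest_batches_first(batches, total_tokens, num_front=1)
print([total_tokens(b) for b in reordered])  # [6, 3, 1]
```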

    • ➕ Added a corporate sponsor to the library: https://wellsaidlabs.com/
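For intuition about what a lengths-to-mask helper such as torchnlp.utils.lengths_to_mask produces, here is a pure-Python sketch. The real helper works with tensors and its exact signature is not shown in these notes; `lengths_to_mask_list` is a hypothetical name. Position `j` of row `i` is `True` when `j` falls inside the `i`-th sequence's length, and `False` over the padding.

```python
def lengths_to_mask_list(lengths):
    """Hypothetical sketch: build a boolean padding mask from sequence lengths."""
    max_length = max(lengths, default=0)  # pad every row to the longest sequence
    return [[position < length for position in range(max_length)] for length in lengths]


mask = lengths_to_mask_list([1, 3, 2])
print(mask)
# [[True, False, False], [True, True, True], [True, True, False]]
```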

    โšก๏ธ Minor Updates

    • 🛠 Fixed SNLI example (#84)
    • ⚡️ Updated .gitignore to support Python's virtual environments (#84)
    • ✂ Removed the requests and pandas dependencies. Only two dependencies remain, which is useful for production environments. (#84)
    • ➕ Added LazyLoader to reduce dependency requirements. (4e84780)
    • ✂ Removed the unused torchnlp.datasets.Dataset class in favor of basic Python lists of dictionaries and pandas. (#84)
    • 👌 Support for downloading tar.gz files and unpacking them faster. (eb61fee)
    • Renamed itos and stoi to index_to_token and token_to_index, respectively. (#84)
    • Fixed batch_encode, batch_decode, and enforce_reversible for torchnlp.encoders.text (#69)
    • 🛠 Fixed FastText vector downloads (#72)
    • 🛠 Fixed documentation for LockedDropout (#73)
    • 🛠 Fixed a bug in weight_drop (#76)
    • stack_and_pad_tensors now returns a named tuple for readability (#84)
    • Added torchnlp.utils.split_list in favor of torchnlp.utils.resplit_datasets. This is enabled by the modularity of torchnlp.random. (#84)
    • 🗄 Deprecated torchnlp.utils.datasets_iterator in favor of Python's itertools.chain. (#84)
    • 🗄 Deprecated torchnlp.utils.shuffle in favor of torchnlp.random. (#84)
    • 👌 Support for encoding larger datasets, following the fix for issue #85.
    • ➕ Added torchnlp.samplers.repeat_sampler following up on this issue: pytorch/pytorch#15849
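The repeat-sampler idea can be sketched as a generator that restarts a re-iterable sampler each time it is exhausted, so a consumer sees one uninterrupted stream of indices across epochs. This is a hypothetical stand-in named `repeat_sampler`, not torchnlp's implementation; it assumes the wrapped sampler can be iterated more than once (as a list, a range, or a PyTorch sampler can).

```python
import itertools


def repeat_sampler(sampler):
    """Hypothetical sketch: yield from `sampler` forever, restarting it when exhausted."""
    while True:
        empty = True
        for index in sampler:  # re-iterating restarts a re-iterable sampler
            empty = False
            yield index
        if empty:
            return  # an empty sampler would otherwise loop forever


indices = list(itertools.islice(repeat_sampler(range(3)), 7))
print(indices)  # [0, 1, 2, 0, 1, 2, 0]
```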
  • v0.4.0 Changes

    April 03, 2019

    โšก๏ธ Major updates

    • Rewrote encoders to better support more generic encoders like a LabelEncoder. Furthermore, added broad support for batch_encode, batch_decode and enforce_reversible.
    • ๐Ÿ”ง Rearchitected default reserved tokens to ensure configurability while still providing the convenience of good defaults.
    • โž• Added support to collate sequences with torch.utils.data.dataloader.DataLoader. For example:

      from functools import partial

      import torch
      from torchnlp.utils import collate_tensors
      from torchnlp.encoders.text import stack_and_pad_tensors

      collate_fn = partial(collate_tensors, stack_tensors=stack_and_pad_tensors)
      # *args and **kwargs stand for your usual DataLoader arguments (dataset, batch_size, ...).
      torch.utils.data.dataloader.DataLoader(*args, collate_fn=collate_fn, **kwargs)
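To show what a padded collate step does, here is a pure-Python sketch that right-pads variable-length sequences to the batch maximum and returns them together with their original lengths as a named tuple. The names `PaddedBatch`, `pad_and_stack` and `padding_index` are hypothetical; this is not the stack_and_pad_tensors implementation, which operates on tensors.

```python
from typing import List, NamedTuple


class PaddedBatch(NamedTuple):
    tensor: List[List[int]]  # padded sequences, one row per example
    lengths: List[int]       # original length of each sequence


def pad_and_stack(batch, padding_index=0):
    """Hypothetical sketch: right-pad each sequence to the longest one in the batch."""
    lengths = [len(sequence) for sequence in batch]
    max_length = max(lengths, default=0)
    padded = [list(sequence) + [padding_index] * (max_length - len(sequence)) for sequence in batch]
    return PaddedBatch(padded, lengths)


result = pad_and_stack([[5, 6, 7], [8], [9, 10]])
print(result.tensor)   # [[5, 6, 7], [8, 0, 0], [9, 10, 0]]
print(result.lengths)  # [3, 1, 2]
```

Keeping the lengths alongside the padded rows lets later steps build a mask or ignore padding.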

    • ➕ Added doctest support, ensuring the documented examples are tested.

    • ✂ Removed SRU support; it is too heavy a module to support. Please use https://github.com/taolei87/sru instead. Happy to accept a PR with a better tested and documented SRU module!

    • ⚡️ Updated version requirements to support Python 3.6 and 3.7, dropping support for Python 3.5.

    • ⚡️ Updated version requirements to support PyTorch 1.0+.

    • 🔀 Merged #66, reducing the memory requirements for pre-trained word vectors by 2x.

    โšก๏ธ Minor Updates

    • Formatted the code base with YAPF.
    • ๐Ÿ›  Fixed pandas and collections warnings.
    • โž• Added invariant assertion to Encoder via enforce_reversible. For example:

      encoder = Encoder().enforce_reversible()

      This ensures that Encoder.decode(Encoder.encode(object)) == object.
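The reversibility invariant can be illustrated with a toy label encoder that, once the check is enabled, raises if `decode(encode(x)) != x`. `ToyLabelEncoder` and its behaviour (unseen labels map to an unknown index) are hypothetical; this is not the torchnlp Encoder API, only a sketch of the same invariant.

```python
class ToyLabelEncoder:
    """Hypothetical sketch of an encoder with an optional reversibility check."""

    def __init__(self, labels, unknown_token='<unk>'):
        self.index_to_token = [unknown_token] + sorted(set(labels))
        self.token_to_index = {token: i for i, token in enumerate(self.index_to_token)}
        self._check_reversible = False

    def enforce_reversible(self):
        self._check_reversible = True
        return self  # allow chaining, as in ToyLabelEncoder(...).enforce_reversible()

    def decode(self, index):
        return self.index_to_token[index]

    def encode(self, label):
        index = self.token_to_index.get(label, 0)  # unseen labels map to the unknown index
        if self._check_reversible and self.decode(index) != label:
            raise ValueError(f'encoding {label!r} is not reversible')
        return index


encoder = ToyLabelEncoder(['cat', 'dog']).enforce_reversible()
print(encoder.encode('dog'))  # 2
try:
    encoder.encode('bird')
except ValueError as error:
    print(error)  # encoding 'bird' is not reversible
```

Without the check, an unseen label silently becomes `<unk>`; with it, the information loss is reported immediately.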

    • 🛠 Fixed the accuracy metric for PyTorch 1.0.
  • v0.3.7.post1 Changes

    December 09, 2018

    🚀 Minor release fixing some issues and bugs.

  • v0.3.0 Changes

    May 06, 2018

    🚀 Release 0.3.0

    Major Features And Improvements

    • โฌ†๏ธ Upgraded to PyTorch 0.4.0
    • โž• Added Byte-Pair Encoding (BPE) pre-trained subword embeddings in 275 languages
    • ๐Ÿ”จ Refactored download scripts to torchnlp.downloads
    • Enable Spacy encoder to run in multiple languages.
    • โž• Added a boolean aligned option to FastText supporting MUSE (Multilingual Unsupervised and Supervised Embeddings)

    ๐Ÿ› Bug Fixes and Other Changes

    • Create non-existent cache dirs for torchnlp.word_to_vector.
    • โž• Add set operation to torchnlp.datasets.Dataset with support for slices, columns and rows
    • Updated biggest_batches_first in torchnlp.samplers to be more efficient at approximating memory then Pickle
    • Enabled torch.utils.pad_tensor and torch.utils. pad_batch to support N dimensional tensors
    • โšก๏ธ Updated to sacremoses to fix NLTK moses dependancy for torch.text_encoders
    • Added __getitem()__ for _PretrainedWordVectors. For example:

      from torchnlp.word_to_vector import FastText

      vectors = FastText()
      tokenized_sentence = ['this', 'is', 'a', 'sentence']
      vectors[tokenized_sentence]

    • Added __contains__ for _PretrainedWordVectors. For example:

      >>> from torchnlp.word_to_vector import FastText
      >>> vectors = FastText()
      >>> 'the' in vectors
      True
      >>> 'theqwe' in vectors
      False
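Both dunder methods can be sketched on a toy lookup table. `ToyWordVectors` is hypothetical and uses tiny hand-written vectors; returning a zero vector for unknown tokens is an assumption made for this sketch, not a statement about how torchnlp's FastText handles them.

```python
class ToyWordVectors:
    """Hypothetical sketch of token-to-vector lookup with __getitem__ and __contains__."""

    def __init__(self, table, dim):
        self.table = table  # maps token -> list of floats
        self.dim = dim

    def __contains__(self, token):
        return token in self.table

    def __getitem__(self, tokens):
        # Accept either a single token or a list of tokens (a tokenized sentence).
        if isinstance(tokens, str):
            return self.table.get(tokens, [0.0] * self.dim)  # assumed zero vector for unknowns
        return [self[token] for token in tokens]


vectors = ToyWordVectors({'the': [0.1, 0.2], 'sentence': [0.3, 0.4]}, dim=2)
print('the' in vectors)            # True
print('theqwe' in vectors)         # False
print(vectors[['the', 'theqwe']])  # [[0.1, 0.2], [0.0, 0.0]]
```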

  • v0.2.0

    April 08, 2018