spaCy v2.2.3 Release Notes

Release Date: 2019-11-21
  • ๐Ÿฑ โœจ New features and improvements

    • 🆕 NEW: Tokenizer.explain method, which shows the tokenizer rule or pattern (prefix, suffix, special case, etc.) that produced each token.

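      from spacy.lang.en import English
      nlp = English()  # blank English pipeline: only the tokenizer is needed
      tok_exp = nlp.tokenizer.explain("(don't)")  # list of (rule or pattern, token text) tuples
      assert [t[0] for t in tok_exp] == ["PREFIX", "SPECIAL-1", "SPECIAL-2", "SUFFIX"]
      assert [t[1] for t in tok_exp] == ["(", "do", "n't", ")"]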

    • 🆕 NEW: Official Python 3.8 wheels for spaCy and its dependencies.

    • Base language support for Korean.

    • Add Scorer.las_per_type: labelled attachment scores (LAS) for each dependency label.

    • Rework Chinese language initialization and tokenization.

    • 👌 Improve language data for Luxembourgish.
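Scorer.las_per_type breaks the labelled attachment score down by dependency label. The following is a minimal, spaCy-independent sketch of that metric, not spaCy's Scorer code: a predicted arc counts as correct only when the same token has both the same head and the same label in the gold analysis, and precision, recall and F1 are then computed separately for each label. The (token index, head index, label) tuple format, the helper name las_per_label and the returned dict layout are assumptions for illustration; spaCy's own Scorer may scale and structure its output differently.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical arc format: (token index, head index, dependency label)
Arc = Tuple[int, int, str]


def las_per_label(gold: List[Arc], pred: List[Arc]) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: labelled attachment precision/recall/F1 per label.

    A predicted arc is correct only if the same token has the same head
    and the same label in the gold analysis."""
    gold_arcs = set(gold)
    gold_counts = Counter(label for _, _, label in gold)
    pred_counts = Counter(label for _, _, label in pred)
    # Correct arcs, counted under their (shared) label
    correct = Counter(arc[2] for arc in pred if arc in gold_arcs)

    scores: Dict[str, Dict[str, float]] = {}
    for label in gold_counts.keys() | pred_counts.keys():
        tp = correct[label]
        p = tp / pred_counts[label] if pred_counts[label] else 0.0
        r = tp / gold_counts[label] if gold_counts[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores


gold = [(0, 1, "nsubj"), (1, 1, "ROOT"), (2, 1, "dobj")]
pred = [(0, 1, "nsubj"), (1, 1, "ROOT"), (2, 0, "dobj")]  # wrong head for token 2
print(las_per_label(gold, pred)["dobj"])  # {'p': 0.0, 'r': 0.0, 'f': 0.0}
```

Because a correct label with the wrong head still counts as an error, the "dobj" arc above scores zero even though its label matches.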

    ๐Ÿฑ ๐Ÿ”ด Bug fixes

    • 🛠 Fix issues #4573, #4645: Improve tokenizer usage docs.
    • 🛠 Fix issue #4575: Add error in debug-data if no dev docs are available.
    • 🛠 Fix issue #4582: Make as_tuples=True in Language.pipe work with multiprocessing.
    • 🛠 Fix issue #4590: Correctly call on_match in DependencyMatcher.
    • 🛠 Fix issue #4593: Build wheels for Python 3.8.
    • 🛠 Fix issue #4604: Fix realloc in Retokenizer.split.
    • 🛠 Fix issue #4656: Fix conllu2json converter when -n > 1.
    • 🛠 Fix issue #4662: Fix Language.evaluate for components without a .pipe method.
    • 🛠 Fix issue #4670: Ensure EntityRuler is deserialized correctly from disk.
    • 🛠 Fix issue #4680: Raise error if non-string labels are added to Tagger or TextCategorizer.
    • 🛠 Fix issue #4691: Make Vectors.find return keys in correct order.
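With as_tuples=True, Language.pipe accepts (text, context) pairs and yields (doc, context) pairs, so each context must come back attached to the result for its own text, however the work is split across worker processes (n_process). The sketch below is hypothetical and does not use spaCy: the helper name pipe_as_tuples is invented, a thread pool stands in for spaCy's worker processes, and str.split stands in for the pipeline. It only illustrates the (text, context) to (result, context) contract, not how the fix for #4582 was implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple, TypeVar

C = TypeVar("C")  # arbitrary per-item context (IDs, metadata, ...)
R = TypeVar("R")  # whatever the processing function returns


def pipe_as_tuples(
    process: Callable[[str], R],
    pairs: Iterable[Tuple[str, C]],
    n_workers: int = 2,
) -> List[Tuple[R, C]]:
    """Hypothetical helper mimicking the (text, context) -> (result, context)
    contract of Language.pipe(..., as_tuples=True)."""
    items = list(pairs)
    texts = [text for text, _ in items]
    contexts = [context for _, context in items]
    # Threads stand in for spaCy's worker processes here. Executor.map yields
    # results in input order, so zipping keeps every context attached to the
    # result for its own text.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(process, texts))
    return list(zip(results, contexts))


data = [("Hello world", {"id": 1}), ("spaCy is fast", {"id": 2})]
for tokens, context in pipe_as_tuples(str.split, data):
    print(context["id"], tokens)
```

Splitting texts from contexts and re-zipping them only works if the parallel step preserves input order; an order-agnostic pool (such as one built on as_completed) would need to carry an index alongside each text instead.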

    📖 Documentation and examples

    • 🛠 Fix various typos and inconsistencies.

    👥 Contributors

    Thanks to @yash1994, @walterhenry, @prilopes, @f11r, @questoph, @erip, @richardpaulhudson and @GuiGel for the pull requests and contributions.