spaCy v2.1.8 Release Notes

Release Date: 2019-08-08
    🍱 ✨ New features and improvements

    • 🆕 NEW: Alpha tokenization support for Serbian.
    • 👌 Improve language data for Urdu.
    • 👌 Support installing and loading model packages in the same session.

    🍱 🔴 Bug fixes

    • 🛠 Fix issue #4002: Make PhraseMatcher work as expected for NORM attribute.
    • 🛠 Fix issue #4063: Improve docs on Matcher attributes.
    • 🛠 Fix issue #4068: Make Korean work as expected on Python 2.7.
    • 🛠 Fix issue #4069: Add validate option to EntityRuler.
    • 🛠 Fix issue #4074: Raise error if annotation dict in simple training style has unexpected keys.
    • 🛠 Fix issue #4081: Fix typo in pyproject.toml.
    • 🛠 Fix handling of keyword arguments in Language.evaluate.
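
The check described for issue #4074 can be pictured with a small standalone sketch. It does not import spaCy and is not spaCy's implementation: the helper name `check_annotation_keys` and the set of accepted keys are assumptions made for illustration only.

```python
# Hypothetical helper, not part of spaCy: rejects annotation dicts that
# contain keys outside an assumed set of accepted names.
EXPECTED_KEYS = ("words", "tags", "heads", "deps", "entities", "cats", "links")  # assumed


def check_annotation_keys(annotations, expected=EXPECTED_KEYS):
    """Raise ValueError if the annotation dict has keys not in `expected`."""
    unexpected = sorted(key for key in annotations if key not in expected)
    if unexpected:
        raise ValueError(
            "Unexpected annotation keys {}; expected one of {}".format(
                unexpected, list(expected)
            )
        )
    return annotations


# Simple training style: (text, annotations) pairs.
train_data = [
    ("Uber blew through $1 million a week", {"entities": [(0, 4, "ORG")]}),
    ("Google rebrands its business apps", {"entitis": [(0, 6, "ORG")]}),  # misspelled key
]

results = []
for text, annotations in train_data:
    try:
        check_annotation_keys(annotations)
        results.append("ok")
    except ValueError:
        results.append("error")
```

With a check along these lines, a misspelled key such as "entitis" surfaces as an error when the data is read instead of going unnoticed.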

    📚 📖 Documentation and examples

    👥 Contributors

    Thanks to @akornilo, @mirfan899, @veer-bains, @seppeljordan, @Pavle992, @svlandeg, @jenojp and @adrianeboyd for the pull requests and contributions.


Previous changes from v2.1.7

    🍱 ✨ New features and improvements

    • ➕ Add Token.tensor and Span.tensor attributes.
    • 👌 Support simple training format of (text, annotations) instead of only (doc, gold) for nlp.evaluate.
    • ➕ Add support for "lang_factory" setting in model meta.json (see #4031).
    • 📦 Also support "requirements" in meta.json to define packages for setup's install_requires.
    • 👌 Improve Pipe base class methods and make them less presumptuous.
    • 👌 Improve Danish and Korean tokenization.
    • 👌 Improve error messages when deserializing model fails.
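
As a rough illustration of the two meta.json settings mentioned above, the sketch below builds a meta dict in Python and serializes it. The model name, version, language code, the factory name "custom_en" and the listed requirement are made-up values; only the "lang_factory" and "requirements" keys come from the notes above.

```python
import json

# Illustrative values only; the keys "lang_factory" and "requirements"
# are the settings named in the release notes.
meta = {
    "lang": "en",
    "name": "example_model",            # hypothetical model name
    "version": "0.0.1",                 # hypothetical version
    "lang_factory": "custom_en",        # hypothetical language factory name
    "requirements": ["numpy>=1.15.0"],  # hypothetical extra dependency
}

# Serialize in the form it would take inside a meta.json file.
meta_json = json.dumps(meta, indent=2, sort_keys=True)
print(meta_json)
```

Per the notes, the "requirements" entries are meant to end up in the package setup's install_requires, so they would follow the usual pip requirement syntax.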

    🍱 🔴 Bug fixes

    • 🛠 Fix issues #3669 and #3962: Fix dependency copy in Span.as_doc that could cause segfault.
    • 🛠 Fix issue #3968: Fix bug in per-entity scores.
    • 🛠 Fix issue #4000: Improve entity linking API.
    • 🛠 Fix issue #4022: Fix error when Korean text contains special characters.
    • 🛠 Fix issue #4030: Handle edge case when calling TextCategorizer.predict with empty Doc.
    • 🛠 Fix issue #4045: Correct Span.sent docs.
    • 🛠 Fix issue #4048: Fix init-model command if there's no vocab.
    • 🛠 Fix issue #4052: Improve per-type scoring of NER.
    • 🛠 Fix issue #4054: Ensure the lang of nlp and nlp.vocab stay consistent.
    • 🛠 Fix bugs in Token.similarity and Span.similarity when called via hook.
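
Per-entity-type scoring, the subject of issues #3968 and #4052, can be illustrated with a small standalone sketch. It is not spaCy's Scorer code: the function name `per_type_prf` and the exact-match criterion on (start, end, label) spans are assumptions chosen for the example.

```python
from collections import defaultdict


def per_type_prf(gold_spans, pred_spans):
    """Compute precision, recall and F-score per entity label.

    Both arguments are iterables of (start, end, label) tuples. A predicted
    span counts as correct only if an identical tuple exists in the gold set.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for span in pred:
        # Credit or penalize the label the model predicted.
        counts[span[2]]["tp" if span in gold else "fp"] += 1
    for span in gold - pred:
        # Gold spans that were not predicted exactly are misses for their label.
        counts[span[2]]["fn"] += 1

    scores = {}
    for label, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores


# Made-up character offsets and labels for illustration.
gold = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]
pred = [(0, 5, "ORG"), (27, 31, "ORG"), (44, 54, "MONEY")]
scores = per_type_prf(gold, pred)
```

In this example the span mislabeled as ORG lowers ORG precision and counts as a miss for GPE, which is the kind of detail an aggregate score would hide.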

    📚 📖 Documentation and examples

    👥 Contributors

    Thanks to @sorenlind, @pmbaumgartner, @svlandeg, @FallakAsad, @BreakBB, @adrianeboyd, @polm, @b1uec0in, @mdaudali and @ejarkm for the pull requests and contributions.