spaCy v2.1.5 Release Notes

Release Date: 2019-07-12
  • ๐Ÿฑ โœจ New features and improvements

    • NEW: Base language data for Marathi and Korean (via mecab-ko, mecab-ko-dic and natto-py).
    • Improve language data for Lithuanian, Spanish, Kannada, French, Norwegian and Hindi.
    • Add evaluation metrics per entity type.
    • Add resume logic to spacy pretrain.
    • Add optional id property to EntityRuler patterns.
    • Better introspection and IDE autocomplete for custom extension attributes.
    • Make Doc.is_sentenced always return True for single-token docs.
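
The "evaluation metrics per entity type" item can be pictured with a small plain-Python sketch. This is not spaCy's Scorer code: the function name `prf_per_type` and the `(start, end, label)` span format are assumptions made for illustration, and a prediction is counted as correct only on an exact match.

```python
from collections import defaultdict


def prf_per_type(gold_spans, pred_spans):
    """Compute precision, recall and F-score separately for each entity label.

    Both arguments are iterables of (start, end, label) tuples
    (a hypothetical format, chosen for this sketch).
    """
    gold = set(gold_spans)
    pred = set(pred_spans)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})

    # Predicted spans are either true positives or false positives.
    for span in pred:
        key = "tp" if span in gold else "fp"
        counts[span[2]][key] += 1

    # Gold spans that were never predicted are false negatives.
    for span in gold - pred:
        counts[span[2]]["fn"] += 1

    scores = {}
    for label, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores


gold = [(0, 1, "ORG"), (4, 5, "GPE"), (7, 9, "PERSON")]
pred = [(0, 1, "ORG"), (4, 5, "ORG"), (7, 9, "PERSON")]
scores = prf_per_type(gold, pred)
```

Breaking the scores down by label shows that the mislabelled span hurts ORG precision and GPE recall while PERSON is unaffected, which a single aggregate score would hide.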

    ๐Ÿฑ ๐Ÿ”ด Bug fixes

    • Fix issue #3490: Add evaluation metrics per entity type to Scorer.
    • Fix issue #3526: Serialize EntityRuler settings correctly.
    • Fix issue #3558: Improve E024 error message for incorrect GoldParse.
    • Fix issue #3611: Fix bug when setting ngram parameter in text classifier.
    • Fix issue #3625: Improve default punctuation rules for Hindi.
    • Fix issue #3707: Improve introspection of custom attributes.
    • Fix issue #3737: Check if component is callable in Language.replace_pipe.
    • Fix issue #3743: Fix documentation of lex_id.
    • Fix issue #3749: Change vector training script to work with latest Gensim.
    • Fix issues #3762, #3934: Make Doc.is_sentenced default to True for single-token Docs.
    • Fix issue #3802: Fix typo in docs example.
    • Fix issue #3811: Fix type of --seed option in spacy pretrain.
    • Fix issue #3822: Allow passing PhraseMatcher arguments to EntityRuler.
    • Fix issue #3839: Ensure the Matcher returns correct match IDs when used with operators.
    • Fix issue #3840: Improve error messages in spacy pretrain.
    • Fix issue #3853: Rename vectors if multiple models are loaded to prevent clashes.
    • Fix issue #3859: Update pretrain to prevent unintended overwriting of weight files.
    • Fix issue #3862: Fix matcher callback example.
    • Fix issue #3868: Add "v.s." to English tokenizer exceptions.
    • Fix issue #3869: Make Doc.count_by work as expected.
    • Fix issue #3880: Fix unflatten padding in Thinc when last element is empty.
    • Fix issue #3882: Exclude user_data when copying doc in displaCy.
    • Fix issue #3892: Update Tokenizer initialization docs.
    • Fix issue #3912: Make text classifier raise more friendly errors.
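
The idea behind the #3737 fix (checking that a replacement component is callable) can be sketched in plain Python. `TinyPipeline` below is a hypothetical stand-in written for this note, not spaCy's Language class, and the error messages are invented; it only shows why rejecting a non-callable at replacement time is friendlier than failing later when the pipeline runs.

```python
class TinyPipeline:
    """Hypothetical stand-in for a processing pipeline (not spaCy's Language)."""

    def __init__(self):
        # Ordered list of (name, component) pairs.
        self.pipeline = []

    def add_pipe(self, name, component):
        self.pipeline.append((name, component))

    def replace_pipe(self, name, component):
        names = [pipe_name for pipe_name, _ in self.pipeline]
        if name not in names:
            raise ValueError("No component named {!r} in the pipeline".format(name))
        # Reject non-callables up front instead of failing later at call time.
        if not callable(component):
            raise ValueError("Replacement for {!r} is not callable".format(name))
        self.pipeline[names.index(name)] = (name, component)


pipe = TinyPipeline()
pipe.add_pipe("upper", lambda text: text.upper())

# A string is not callable, so the replacement is refused.
try:
    pipe.replace_pipe("upper", "not a function")
    rejected = False
except ValueError:
    rejected = True

# A callable replacement is accepted and takes the old component's slot.
pipe.replace_pipe("upper", lambda text: text.lower())
result = pipe.pipeline[0][1]("Hello")
```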

    Documentation and examples

    Contributors

    Thanks to @BreakBB, @ujwal-narayan, @estr4ng7d, @maknotavailable, @ramananbalakrishnan, @nipunsadvilkar, @NirantK, @munozbravo, @intrafindBreno, @Azagh3l, @jarib, @tokestermw, @polm, @skrcode, @kabirkhan, @demongolem, @elbaulp, @clarus, @BramVanroy, @rokasramas, @askhogan, @khellan, @kognate, @cedar101 and @yash1994 for the pull requests and contributions.