textacy v0.6.2 Release Notes

Release Date: 2018-07-19
    Changes:

    • Add a spacier.util module, and add / reorganize relevant functionality
      • move (most) spacy_util functions here, and add a deprecation warning to
        the spacy_util module
      • rename normalized_str() => get_normalized_text(), for consistency and clarity
      • add a function to split long texts up into chunks but combine them into
        a single Doc. This is a workaround for a current limitation of spaCy's
        neural models, whose RAM usage scales with the length of input text.
    • Add experimental support for reading and writing spaCy docs in binary format,
      where multiple docs are contained in a single file. This functionality was
      supported by spaCy v1, but is not in spaCy v2; I've implemented a workaround
      that should work well in most situations, but YMMV.
    • Package documentation is now "officially" hosted on GitHub pages. The docs
      are automatically built on and deployed from Travis via doctr, so they
      stay up-to-date with the master branch on GitHub. Maybe someday I'll get
      ReadTheDocs to successfully build textacy once again...
      • Minor improvements/updates to documentation
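The text-chunking workaround mentioned above can be illustrated with a minimal sketch. It shows only the splitting step in plain Python. The function name split_text_into_chunks and its chunk_size default are hypothetical and are not textacy's API. Processing each chunk with spaCy and merging the results into a single Doc is left out, since that step depends on the spaCy version in use.

```python
def split_text_into_chunks(text, chunk_size=100000):
    """Split text into pieces of at most chunk_size characters.

    Hypothetical helper for illustration only (not part of textacy).
    Breaks after whitespace where possible so that words stay intact.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Look for the last space or newline inside the current window.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                # Keep the whitespace with the left-hand chunk.
                end = cut + 1
        chunks.append(text[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    sample = "the quick brown fox jumps"
    pieces = split_text_into_chunks(sample, chunk_size=10)
    # Joining the pieces reproduces the input exactly.
    print(pieces, "".join(pieces) == sample)
```

Because every chunk keeps its trailing whitespace, concatenating the chunks gives back the original text unchanged, which is the property a merged Doc would need in order to preserve character offsets.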

    Bugfixes:

    • Add missing return statement in deprecated text_stats.flesch_readability_ease()
      function (Issue #191)
    • Catch an empty graph error in bestcoverage-style keyterm ranking (Issue #196)
    • Fix mishandling when specifying a single named entity type to in/exclude in
      extract.named_entities (Issue #202)
    • Make networkx usage in keyterms module compatible with v1.11+ (Issue #199)
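
To illustrate the kind of problem behind the single-entity-type fix, here is a small sketch. It assumes the issue was of the common "a lone string is treated as a collection of characters" variety, which these notes do not confirm. The helpers normalize_entity_types and filter_entities are hypothetical, operate on plain (text, label) tuples, and are not textacy's implementation.

```python
def normalize_entity_types(types):
    """Return a set of upper-cased labels, or None if no filter was given.

    Hypothetical helper: accepts either a single string or a collection,
    so that "PERSON" is not iterated character by character.
    """
    if types is None:
        return None
    if isinstance(types, str):
        return {types.upper()}
    return {t.upper() for t in types}


def filter_entities(entities, include_types=None, exclude_types=None):
    """Filter (text, label) pairs by entity label."""
    include = normalize_entity_types(include_types)
    exclude = normalize_entity_types(exclude_types)
    return [
        (text, label)
        for text, label in entities
        if (include is None or label in include)
        and (exclude is None or label not in exclude)
    ]


if __name__ == "__main__":
    ents = [("Ada Lovelace", "PERSON"), ("London", "GPE"), ("1843", "DATE")]
    # A single type passed as a plain string behaves like a one-element set.
    print(filter_entities(ents, include_types="PERSON"))
```

Normalizing the argument once, up front, keeps the filtering logic identical whether the caller passes one label or several.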