stanfordnlp v0.2.0 Release Notes

Release Date: 2019-05-16
  • This release features major improvements to the memory efficiency and speed of the neural network pipeline in stanfordnlp, along with various bug fixes. These changes include:

    ๐ŸŽ The downloadable pretrained neural network models are now substantially smaller in size (due to the use of smaller pretrained vocabularies) with comparable performance. Notably, the default English model is now ~9x smaller in size, German ~11x, French ~6x and Chinese ~4x. As a result, memory efficiency of the neural pipelines for most languages are substantially improved.

    Substantial speedup of the neural lemmatizer via reduced neural sequence-to-sequence operations.

    The neural network pipeline can now take in a Python list of strings representing pre-tokenized text. (#58)
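    A minimal sketch of the pre-tokenized input described above. The list-of-strings format follows the release note; the `Pipeline` call is shown commented out as an assumption, since it requires downloaded models:

```python
# Pre-tokenized input as a Python list of strings, one sentence per string
# with tokens already separated by spaces (format per the release note).
pretokenized = [
    "This is a tokenized sentence .",
    "It skips the neural tokenizer .",
]

# Hypothetical usage (requires downloaded models):
# import stanfordnlp
# nlp = stanfordnlp.Pipeline(lang="en", tokenize_pretokenized=True)
# doc = nlp(pretokenized)
```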

    A requirements-checking framework has been added to the neural pipeline, ensuring that the proper processors are specified for a given pipeline configuration. The pipeline now raises an exception when a requirement is not satisfied. (#42)
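    The requirements check can be sketched roughly as follows. The processor names mirror the stanfordnlp pipeline, but the dependency table and the exception type here are assumptions for illustration, not the library's actual implementation:

```python
# Hypothetical processor dependency table (illustrative only).
REQUIREMENTS = {
    "tokenize": [],
    "mwt": ["tokenize"],
    "pos": ["tokenize"],
    "lemma": ["tokenize"],
    "depparse": ["tokenize", "pos", "lemma"],
}

def check_requirements(processors):
    """Raise ValueError if any processor's prerequisites are not selected."""
    selected = set(processors)
    for proc in processors:
        missing = [r for r in REQUIREMENTS.get(proc, []) if r not in selected]
        if missing:
            raise ValueError(
                f"Processor '{proc}' requires: {', '.join(missing)}"
            )

check_requirements(["tokenize", "pos", "lemma", "depparse"])  # satisfied
```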

    Fixed a bug in the alignment between tokens and words after the multi-word expansion processor. (#71)

    More options have been added for customizing the Stanford CoreNLP server at start time, including specifying properties for the default pipeline and setting server options such as username/password. For more details on the different options, please check out the client documentation page.

    0๏ธโƒฃ CoreNLPClient instance can now be created with CoreNLP default language properties as:

    client = CoreNLPClient(properties='chinese')
    • Alternatively, a properties file can now be used during the creation of a CoreNLPClient:

      client = CoreNLPClient(properties='/path/to/corenlp.props')

    • 0๏ธโƒฃ All specified CoreNLP annotators are now preloaded by default when a CoreNLPClient instance is created. (#56)

Previous changes from v0.1.2

  • This is a maintenance release of stanfordnlp. This release features:

    • ๐Ÿ‘ Allowing the tokenizer to treat the incoming document as pretokenized with space separated words in newline separated sentences. Set tokenize_pretokenized to True when building the pipeline to skip the neural tokenizer, and run all downstream components with your own tokenized text. (#24, #34)
    • Sped up the POS/Feats tagger at evaluation time, by up to two orders of magnitude. (#18)
    • Various minor fixes and documentation improvements
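    The pretokenized format described above (space-separated words, newline-separated sentences) can be illustrated as follows; the `Pipeline` call is commented out as an assumption, since it requires downloaded models:

```python
# Pretokenized document: tokens separated by spaces,
# sentences separated by newlines (format per the release note).
pretokenized_doc = "This is a test sentence .\nHere is another one ."

# The token structure implied by the format:
sentences = [line.split(" ") for line in pretokenized_doc.split("\n")]

# Hypothetical usage (requires downloaded models):
# import stanfordnlp
# nlp = stanfordnlp.Pipeline(lang="en", tokenize_pretokenized=True)
# doc = nlp(pretokenized_doc)
```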

    We would also like to thank the following community members for their contributions:
    Code improvements: @lwolfsonkin
    Documentation improvements: @0xflotus
    And thanks to everyone who raised issues and helped improve stanfordnlp!