All Versions: 3
Latest Version: 0.2.0
Avg Release Cycle: 53 days
Latest Release: May 16, 2019

Changelog History

  • v0.2.0 Changes

    May 16, 2019

    🛠 This release features major improvements in the memory efficiency and speed of the neural network pipeline in stanfordnlp, along with various bugfixes. These changes include:

    🐎 The downloadable pretrained neural network models are now substantially smaller (due to the use of smaller pretrained vocabularies) with comparable performance. Notably, the default English model is now ~9x smaller, German ~11x, French ~6x, and Chinese ~4x. As a result, the memory efficiency of the neural pipelines for most languages is substantially improved.

    Substantial speedup of the neural lemmatizer via reduced neural sequence-to-sequence operations.

    The neural network pipeline can now take in a Python list of strings representing pre-tokenized text. (#58)
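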
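
    A minimal sketch of the new pre-tokenized input mode, assuming tokenize_pretokenized=True is set and that each inner list represents one already-tokenized sentence (check the documentation for the exact interface):

    import stanfordnlp

    # Sketch: pass pre-tokenized text directly to the neural pipeline (0.2.0).
    # Assumes the English models have already been downloaded.
    nlp = stanfordnlp.Pipeline(lang='en', tokenize_pretokenized=True)
    doc = nlp([['This', 'is', 'token', 'ization', 'done', 'my', 'way', '!'],
               ['Sentence', 'split', 'is', 'done', 'by', 'the', 'parser', '.']])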

    🔧 A requirements checking framework has been added to the neural pipeline, ensuring that the proper processors are specified for a given pipeline configuration. The pipeline now raises an exception when a requirement is not satisfied. (#42)
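
    As a hypothetical illustration of this check: the dependency parser needs its upstream processors (tokenize, mwt, pos, lemma), so a pipeline configured with only depparse should now fail fast at construction time. The exact exception type is not stated in these notes, so a generic catch is shown:

    import stanfordnlp

    # Sketch: an incomplete processor list should now raise when the pipeline
    # is built, rather than failing later during annotation (0.2.0).
    try:
        nlp = stanfordnlp.Pipeline(lang='en', processors='depparse')
    except Exception as err:
        print('Pipeline requirement not satisfied:', err)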

    🛠 Fixed a bug in the alignment between tokens and words after the multi-word token expansion processor. (#71)

    📚 More options have been added for customizing the Stanford CoreNLP server at start time, including specifying properties for the default pipeline and setting server options such as username/password. For more details on the different options, please check out the client documentation page.

    0️⃣ A CoreNLPClient instance can now be created with CoreNLP default language properties, for example:

    client = CoreNLPClient(properties='chinese')
    
    • Alternatively, a properties file can now be used during the creation of a CoreNLPClient:

      client = CoreNLPClient(properties='/path/to/corenlp.props')

    • 0️⃣ All specified CoreNLP annotators are now preloaded by default when a CoreNLPClient instance is created. (#56)
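
    • As a rough sketch of the customization options above, the client can be started with a list of annotators to preload, a properties dictionary for the default pipeline, and server options. The keyword names below (notably username and password) follow the description in these notes and should be verified against the client documentation page:

      from stanfordnlp.server import CoreNLPClient

      # Sketch: customize the CoreNLP server at start time (0.2.0).
      # username/password are assumed server-option keywords per the notes above.
      with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'lemma'],
                         properties={'tokenize.language': 'en'},
                         timeout=30000, memory='4G',
                         username='myuser', password='mypass') as client:
          ann = client.annotate('Stanford University is located in California.')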

  • v0.1.2 Changes

    February 26, 2019

    🚀 This is a maintenance release of stanfordnlp. It features:

    • 👍 The tokenizer can now treat the incoming document as pretokenized, with space-separated words in newline-separated sentences. Set tokenize_pretokenized to True when building the pipeline to skip the neural tokenizer and run all downstream components on your own tokenized text (see the sketch after this list). (#24, #34)
    • Speedup of the POS/Feats tagger during evaluation (up to two orders of magnitude). (#18)
    • 📚 Various minor fixes and documentation improvements.
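
    A minimal sketch of the pretokenized mode described above, assuming the English models are already downloaded:

    import stanfordnlp

    # Sketch: skip the neural tokenizer and treat the input as pretokenized (0.1.2).
    # Space-separated tokens, one sentence per line.
    nlp = stanfordnlp.Pipeline(lang='en', tokenize_pretokenized=True)
    doc = nlp('This is a test sentence .\nHere is another one .')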

    We would also like to thank the following community members for their contributions:
    Code improvements: @lwolfsonkin
    📚 Documentation improvements: @0xflotus
    And thanks to everyone who raised issues and helped improve stanfordnlp!

  • v0.1.0 Changes

    January 30, 2019

    🚀 The initial release of StanfordNLP. StanfordNLP is the combination of the software package used by the Stanford team in the CoNLL 2018 Shared Task on Universal Dependency Parsing and the group’s official Python interface to the Stanford CoreNLP software. The package is built with highly accurate neural network components that enable efficient training and evaluation with your own annotated data. The modules are built on top of PyTorch (v1.0.0).

    StanfordNLP features:

    • Native Python implementation requiring minimal effort to set up;
    • 📈 Full neural network pipeline for robust text analytics, including tokenization, multi-word token (MWT) expansion, lemmatization, part-of-speech (POS) and morphological feature tagging, and dependency parsing;
    • 👍 Pretrained neural models supporting 53 (human) languages featured in 73 treebanks;
    • A stable, officially maintained Python interface to CoreNLP.
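
    A minimal sketch of the basic pipeline usage for this initial release (assumes an internet connection for the one-time model download):

    import stanfordnlp

    # Sketch: download the English models, build the default pipeline, and
    # print the dependency parse of the first sentence (0.1.0).
    stanfordnlp.download('en')
    nlp = stanfordnlp.Pipeline()
    doc = nlp("Barack Obama was born in Hawaii.")
    doc.sentences[0].print_dependencies()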