Changelog History
v0.2.0 Changes
May 16, 2019

This release features major improvements in the memory efficiency and speed of the neural network pipeline in stanfordnlp, along with various bugfixes. These features include:
The downloadable pretrained neural network models are now substantially smaller in size (due to the use of smaller pretrained vocabularies) with comparable performance. Notably, the default English model is now ~9x smaller in size, German ~11x, French ~6x, and Chinese ~4x. As a result, the memory efficiency of the neural pipelines for most languages is substantially improved.
Substantial speedup of the neural lemmatizer via reduced neural sequence-to-sequence operations.
The neural network pipeline can now take in a Python list of strings representing pre-tokenized text. (#58)
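As a sketch, such pre-tokenized input might look like the following (the exact accepted shapes and the tokenize_pretokenized option are assumptions based on the notes in this changelog; check the pipeline documentation for the precise API):

```python
# Pre-tokenized input as a Python list of strings: one string per sentence,
# tokens separated by spaces (format assumed from the description above).
pretokenized_doc = [
    "This is a pre-tokenized sentence .",
    "It will not be re-tokenized .",
]

# The pipeline call itself requires downloaded models, so it is shown commented out:
# import stanfordnlp
# nlp = stanfordnlp.Pipeline(lang='en', tokenize_pretokenized=True)
# doc = nlp(pretokenized_doc)

# The token boundaries are exactly the whitespace splits:
sentence_tokens = [s.split() for s in pretokenized_doc]
```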
A requirements-checking framework has been added to the neural pipeline, ensuring that the proper processors are specified for a given pipeline configuration. The pipeline now raises an exception when a requirement is not satisfied. (#42)
Bugfix related to the alignment between tokens and words after the multi-word expansion processor. (#71)
More options have been added for customizing the Stanford CoreNLP server at start time, including specifying properties for the default pipeline and setting all server options such as username/password. For more details on the different options, please check out the client documentation page.
A CoreNLPClient instance can now be created with CoreNLP default language properties, for example:

client = CoreNLPClient(properties='chinese')

Alternatively, a properties file can be used when creating a CoreNLPClient:

client = CoreNLPClient(properties='/path/to/corenlp.props')

All specified CoreNLP annotators are now preloaded by default when a CoreNLPClient instance is created. (#56)
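Putting these options together, a hedged sketch of starting a customized client (the annotators, username, and password keyword names are assumptions based on the description above; verify them against the client documentation):

```python
# Annotators to preload when the client starts (names follow CoreNLP's
# standard annotator identifiers):
annotators = ["tokenize", "ssplit", "pos"]

# Starting the client launches a CoreNLP server, so the calls are commented out:
# from stanfordnlp.server import CoreNLPClient
# with CoreNLPClient(annotators=annotators,
#                    username='myuser', password='mypass') as client:
#     ann = client.annotate("Stanford University is located in California.")
```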
v0.1.2 Changes
February 26, 2019

This is a maintenance release of stanfordnlp. This release features:
- Allowing the tokenizer to treat the incoming document as pretokenized, with space-separated words in newline-separated sentences. Set tokenize_pretokenized to True when building the pipeline to skip the neural tokenizer and run all downstream components on your own tokenized text. (#24, #34)
- Speedup in the POS/Feats tagger in evaluation (up to two orders of magnitude). (#18)
- Various minor fixes and documentation improvements
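A sketch of the pretokenized plain-text format described above (the tokenize_pretokenized behavior follows this changelog; the pipeline call is commented out since it requires downloaded models):

```python
# Space-separated words, newline-separated sentences:
raw_text = "Joe Smith lives in California .\nHe works at a tech company .\n"

# import stanfordnlp
# nlp = stanfordnlp.Pipeline(lang='en', tokenize_pretokenized=True)
# doc = nlp(raw_text)

# Sentence and token boundaries are recoverable purely from the whitespace:
sentences = [line.split() for line in raw_text.strip().split("\n")]
```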
We would also like to thank the following community members for their contribution:
Code improvements: @lwolfsonkin
Documentation improvements: @0xflotus
And thanks to everyone who raised issues and helped improve stanfordnlp!
v0.1.0 Changes
January 30, 2019

The initial release of StanfordNLP. StanfordNLP is the combination of the software package used by the Stanford team in the CoNLL 2018 Shared Task on Universal Dependency Parsing and the group's official Python interface to the Stanford CoreNLP software. This package is built with highly accurate neural network components that enable efficient training and evaluation with your own annotated data. The modules are built on top of PyTorch (v1.0.0).
StanfordNLP features:
- Native Python implementation requiring minimal effort to set up;
- Full neural network pipeline for robust text analytics, including tokenization, multi-word token (MWT) expansion, lemmatization, part-of-speech (POS) and morphological feature tagging, and dependency parsing;
- Pretrained neural models supporting 53 (human) languages featured in 73 treebanks;
- A stable, officially maintained Python interface to CoreNLP.
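As a minimal usage sketch (the download/Pipeline names follow the StanfordNLP README; the calls are commented out because they fetch models and run the neural pipeline):

```python
# The processors that make up the full pipeline described above
# (identifiers follow the stanfordnlp processor names; order mirrors the list):
default_processors = ["tokenize", "mwt", "lemma", "pos", "depparse"]

# import stanfordnlp
# stanfordnlp.download('en')                       # fetch default English models
# nlp = stanfordnlp.Pipeline(processors=",".join(default_processors))
# doc = nlp("Barack Obama was born in Hawaii.")
# doc.sentences[0].print_dependencies()
```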