stanfordnlp v0.2.0 Release Notes
Release Date: 2019-05-16
This release features major improvements to the memory efficiency and speed of the neural network pipeline in stanfordnlp, along with various bugfixes. These include:
The downloadable pretrained neural network models are now substantially smaller in size (due to the use of smaller pretrained vocabularies) with comparable performance. Notably, the default English model is now ~9x smaller, German ~11x, French ~6x, and Chinese ~4x. As a result, the memory efficiency of the neural pipelines for most languages is substantially improved.
Substantial speedup of the neural lemmatizer, achieved by reducing the number of neural sequence-to-sequence operations.
The neural network pipeline can now take in a Python list of strings representing pre-tokenized text. (#58)
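A minimal sketch of the pre-tokenized list input, assuming the English models have already been downloaded (e.g. via `stanfordnlp.download('en')`) and that each inner list holds the tokens of one sentence; the function name `run_pipeline` is illustrative, not part of the API:

```python
# Pre-tokenized input: one list of token strings per sentence, so the
# neural tokenizer is skipped entirely.
pretokenized = [
    ["This", "input", "is", "already", "tokenized", "."],
    ["Each", "inner", "list", "is", "one", "sentence", "."],
]

def run_pipeline(batch):
    # Deferred import: documents the call without requiring the models
    # to be present at import time.
    import stanfordnlp
    nlp = stanfordnlp.Pipeline(lang="en", tokenize_pretokenized=True)
    return nlp(batch)
```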
A requirements checking framework has been added to the neural pipeline, ensuring that the proper processors are specified for a given pipeline configuration. The pipeline will now raise an exception when a requirement is not satisfied. (#42)
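As a sketch of what the requirements check covers (the exact exception type is not named in these notes, and the helper name below is hypothetical): the `pos` processor depends on `tokenize`, so a processor list that omits the dependency now fails at construction time instead of at annotation time.

```python
def build_pos_pipeline():
    # A valid configuration: 'pos' depends on 'tokenize', so both are
    # listed. Omitting 'tokenize' would raise an exception when the
    # pipeline is constructed (#42).
    import stanfordnlp
    return stanfordnlp.Pipeline(lang="en", processors="tokenize,pos")
```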
Bugfix for the alignment between tokens and words following the multi-word expansion processor. (#71)
More options have been added for customizing the Stanford CoreNLP server at start time, including specifying properties for the default pipeline and setting all server options such as username/password. For more details on the different options, please check out the client documentation page.
A `CoreNLPClient` instance can now be created with CoreNLP default language properties, e.g. `client = CoreNLPClient(properties='chinese')`.
Alternatively, a properties file can be used when creating a `CoreNLPClient`: `client = CoreNLPClient(properties='/path/to/corenlp.props')`.
All specified CoreNLP annotators are now preloaded by default when a `CoreNLPClient` instance is created. (#56)
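Putting the client changes together, a minimal sketch (assuming a local CoreNLP installation with `CORENLP_HOME` set; the function name and the `memory` value are illustrative): the listed annotators are preloaded when the server starts, and default-language properties are passed by name.

```python
def annotate_chinese(text):
    from stanfordnlp.server import CoreNLPClient
    # The annotators listed here are preloaded when the client's server
    # starts (#56); 'chinese' selects CoreNLP's default Chinese properties.
    with CoreNLPClient(properties='chinese',
                       annotators=['tokenize', 'ssplit', 'pos'],
                       memory='4G') as client:
        return client.annotate(text)
```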
Previous changes from v0.1.2
This is a maintenance release of stanfordnlp. This release features:
- Allowing the tokenizer to treat the incoming document as pretokenized, with space-separated words in newline-separated sentences. Set `tokenize_pretokenized` to `True` when building the pipeline to skip the neural tokenizer and run all downstream components on your own tokenized text. (#24, #34)
- Speedup in the POS/Feats tagger in evaluation (up to 2 orders of magnitude). (#18)
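A sketch of the plain-text pretokenized format described above, assuming the English models have been downloaded (e.g. via `stanfordnlp.download('en')`); the function name `tag` is illustrative:

```python
# Pretokenized plain-text input: spaces separate words, newlines
# separate sentences.
pretokenized_text = "This text is token ized .\nHere is another sentence ."

def tag(text):
    # Deferred import; requires the downloaded English models.
    import stanfordnlp
    nlp = stanfordnlp.Pipeline(lang="en", tokenize_pretokenized=True)
    return nlp(text)
```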
- Various minor fixes and documentation improvements
We would also like to thank the following community members for their contribution:
Code improvements: @lwolfsonkin
Documentation improvements: @0xflotus
And thanks to everyone who raised issues and helped improve stanfordnlp!