stanfordnlp v0.2.0 Release Notes
Release Date: 2019-05-16
This release features major improvements to the memory efficiency and speed of the neural network pipeline in stanfordnlp, along with various bugfixes. These include:
The downloadable pretrained neural network models are now substantially smaller in size (due to the use of smaller pretrained vocabularies) with comparable performance. Notably, the default English model is now ~9x smaller, German ~11x, French ~6x, and Chinese ~4x. As a result, the memory efficiency of the neural pipelines for most languages is substantially improved.
Substantial speedup of the neural lemmatizer, achieved by reducing the number of neural sequence-to-sequence operations.
The neural network pipeline can now take in a Python list of strings representing pre-tokenized text. (#58)
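A minimal sketch of the pre-tokenized list input, assuming the English models have already been downloaded (e.g. via `stanfordnlp.download('en')`) and that each inner list holds the tokens of one sentence; the function name `run_pipeline` is illustrative, not part of the API:

```python
# Pre-tokenized input: one list of token strings per sentence, so the
# neural tokenizer is skipped entirely.
pretokenized = [
    ["This", "input", "is", "already", "tokenized", "."],
    ["Each", "inner", "list", "is", "one", "sentence", "."],
]

def run_pipeline(batch):
    # Deferred import: documents the call without requiring the models
    # to be present at import time.
    import stanfordnlp
    nlp = stanfordnlp.Pipeline(lang="en", tokenize_pretokenized=True)
    return nlp(batch)
```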
A requirements checking framework has been added to the neural pipeline, ensuring that the proper processors are specified for a given pipeline configuration. The pipeline will now raise an exception when a requirement is not satisfied. (#42)
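As a sketch of what the requirements check covers (the exact exception type is not named in these notes, and the helper name below is hypothetical): the `pos` processor depends on `tokenize`, so a processor list that omits the dependency now fails at construction time instead of at annotation time.

```python
def build_pos_pipeline():
    # A valid configuration: 'pos' depends on 'tokenize', so both are
    # listed. Omitting 'tokenize' would raise an exception when the
    # pipeline is constructed (#42).
    import stanfordnlp
    return stanfordnlp.Pipeline(lang="en", processors="tokenize,pos")
```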
Bugfix for the alignment between tokens and words following the multi-word expansion processor. (#71)
More options have been added for customizing the Stanford CoreNLP server at start time, including specifying properties for the default pipeline and setting all server options such as username/password. For more details on the different options, please check out the client documentation page.
A `CoreNLPClient` instance can now be created with CoreNLP default language properties, e.g. `client = CoreNLPClient(properties='chinese')`.
Alternatively, a properties file can be used when creating a `CoreNLPClient`: `client = CoreNLPClient(properties='/path/to/corenlp.props')`.
All specified CoreNLP annotators are now preloaded by default when a `CoreNLPClient` instance is created. (#56)
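Putting the client changes together, a minimal sketch (assuming a local CoreNLP installation with `CORENLP_HOME` set; the function name and the `memory` value are illustrative): the listed annotators are preloaded when the server starts, and default-language properties are passed by name.

```python
def annotate_chinese(text):
    from stanfordnlp.server import CoreNLPClient
    # The annotators listed here are preloaded when the client's server
    # starts (#56); 'chinese' selects CoreNLP's default Chinese properties.
    with CoreNLPClient(properties='chinese',
                       annotators=['tokenize', 'ssplit', 'pos'],
                       memory='4G') as client:
        return client.annotate(text)
```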
Previous changes from v0.1.2
This is a maintenance release of stanfordnlp. This release features:
- Allowing the tokenizer to treat the incoming document as pretokenized, with space-separated words in newline-separated sentences. Set `tokenize_pretokenized` to `True` when building the pipeline to skip the neural tokenizer and run all downstream components on your own tokenized text. (#24, #34)
- Speedup in the POS/Feats tagger in evaluation (up to 2 orders of magnitude). (#18)
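A sketch of the plain-text pretokenized format described above, assuming the English models have been downloaded (e.g. via `stanfordnlp.download('en')`); the function name `tag` is illustrative:

```python
# Pretokenized plain-text input: spaces separate words, newlines
# separate sentences.
pretokenized_text = "This text is token ized .\nHere is another sentence ."

def tag(text):
    # Deferred import; requires the downloaded English models.
    import stanfordnlp
    nlp = stanfordnlp.Pipeline(lang="en", tokenize_pretokenized=True)
    return nlp(text)
```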
- Various minor fixes and documentation improvements
We would also like to thank the following community members for their contribution:
Code improvements: @lwolfsonkin
Documentation improvements: @0xflotus
And thanks to everyone who raised issues and helped improve stanfordnlp!