textacy v0.3.3 Release Notes
Release Date: 2017-02-10
New and Changed:
- Added a consistent `normalize` param to functions and methods that require token/span text normalization. Typically, it takes one of the following values: `'lemma'` to lemmatize tokens, `'lower'` to lowercase tokens, a falsy value to leave tokens unnormalized, or a function that converts a spaCy token or span into a string in whatever way the user prefers (e.g. `spacy_utils.normalized_str()`).
  - Functions modified to use this param: `Doc.to_bag_of_terms()`, `Doc.to_bag_of_words()`, `Doc.to_terms_list()`, `Doc.to_semantic_network()`, `Corpus.word_freqs()`, `Corpus.word_doc_freqs()`, `keyterms.sgrank()`, `keyterms.textrank()`, `keyterms.singlerank()`, `keyterms.key_terms_from_semantic_network()`, `network.terms_to_semantic_network()`, `network.sents_to_semantic_network()`
- Tweaked `keyterms.sgrank()` for higher quality results and improved internal performance.
- When getting both n-grams and named entities with `Doc.to_terms_list()`, filtering out numeric spans for only one is automatically extended to the other. This prevents unexpected behavior, such as passing `filter_nums=True` but getting numeric named entities back in the terms list.
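The four kinds of `normalize` value described above can be illustrated with a small dispatch function. This is only a sketch of the idea, not textacy's actual implementation: `FakeToken` and `normalize_text` are hypothetical names introduced here, and `FakeToken` merely imitates the `text`, `lemma_`, and `lower_` attributes of a spaCy token so the example runs without spaCy installed.

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy token (not part of spaCy or textacy)."""

    text: str
    lemma_: str

    @property
    def lower_(self) -> str:
        return self.text.lower()


def normalize_text(token, normalize="lemma"):
    """Hypothetical helper: turn a token into a string according to `normalize`."""
    if callable(normalize):
        # A user-supplied function decides how the token becomes a string.
        return normalize(token)
    if normalize == "lemma":
        return token.lemma_
    if normalize == "lower":
        return token.lower_
    if not normalize:
        # Falsy values (False, None, "") mean "leave the text as-is".
        return token.text
    raise ValueError(f"unsupported normalize value: {normalize!r}")


tokens = [FakeToken("Cats", "cat"), FakeToken("Running", "run")]

by_lemma = [normalize_text(t, "lemma") for t in tokens]
by_lower = [normalize_text(t, "lower") for t in tokens]
unchanged = [normalize_text(t, False) for t in tokens]
custom = [normalize_text(t, lambda tok: tok.text.upper()) for t in tokens]
```

Checking for a callable first keeps the string comparisons simple, and raising on any other truthy value surfaces typos such as `'lemmas'` early instead of silently falling back to the raw text.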
Fixed:
- `keyterms.sgrank()` no longer crashes if a term is missing from the `idfs` mapping. (@jeremybmerrill, issue #53)
- Proper nouns are no longer excluded from consideration as keyterms in `keyterms.sgrank()` and `keyterms.textrank()`. (@jeremybmerrill, issue #53)
- Empty strings are now excluded from consideration as keyterms (a bug inherited from spaCy). (@mlehl88, issue #58)