gensim v4.2.0 Release Notes
Release Date: 2022-04-29
:+1: New features
- #3188: Add get_sentence_vector() to FastText and get_mean_vector() to KeyedVectors, by @rock420
- #3194: Added random_seed parameter to make LsiModel reproducible, by @parashardhapola
- #3247: Sparse2Corpus: update getitem to work on slices, lists and ellipsis, by @PrimozGodec
- #3264: Detect when a fasttext executable is available in PATH, by @pabs3
- #3271: Added new ValueError in place of assertion error for no model data provided in lsi model, by @mark-todd
- #3299: Enable test_word2vec_stand_alone_script by using sys.executable for python, by @pabs3
- #3317: Added encoding parameter to TextDirectoryCorpus, by @Sandman-Ren
- #2656: Streamlining most_similar_cosmul and evaluate_word_analogies, by @n3hrox
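Two of the items above (#3264 and #3299) concern locating executables. The following is a minimal sketch of that general technique using only the Python standard library; it is not taken from the gensim source, and the helper names `find_executable` and `run_python_snippet` are hypothetical. It assumes that a PATH lookup via `shutil.which` is an acceptable way to detect a tool such as fasttext, and it uses `sys.executable` so that a child process runs under the same interpreter as the parent.

```python
import shutil
import subprocess
import sys


def find_executable(name):
    """Return the full path of `name` if it is found on PATH, else None.

    Hypothetical helper: illustrates PATH detection, not gensim's own code.
    """
    return shutil.which(name)


def run_python_snippet(code):
    """Run `code` in a child process using the current interpreter.

    Using sys.executable avoids depending on whatever `python` happens
    to resolve to on PATH.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# None when no fasttext binary is installed on this machine.
fasttext_path = find_executable("fasttext")

# The child process uses the same interpreter as this script.
child_output = run_python_snippet("print('ok')")
```

A caller could skip fastText-dependent work when `fasttext_path` is `None` rather than failing later with a missing-command error.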
:books: Tutorials and docs
- #3227: Fix FastText doc-comment example for build_vocab and train to use correct argument names, by @HLasse
- #3235: Fix TFIDF docs, by @piskvorky
- #3257: Dictionary doc: ref FAQ entry about filter_extremes corpus migration, by @zacchiro
- #3279: Add the FastSS and Levenshtein modules to docs, by @piskvorky
- #3284: Documentation fixes + added CITATION.cff, by @piskvorky
- #3289: Typos, text and code fix in LDA tutorial, by @davebulaval
- #3301: Remove unused Jupyter screenshots, by @pabs3
- #3307: Documentation fixes, by @piskvorky
- #3339: Fix parsing error in FastText docs, by @MattYoon
- #3251: Apply new convention of delimiting instance params in str function, by @menshikh-iv
:red_circle: Bug fixes
- #3117: Ensure next_index available when loading old stored KeyedVectors models, by @gojomo
- #3182: Fix error message when Doc2Vec does not receive corpus_file or corpus iterable, by @blainedietrich
- #3190: Fix broken external link for LDA implementation, by @ahaya3776
- #3197: Fix computation of topic coherence, by @silviatti
- #3250: Make negative ns_exponent work correctly, by @menshikh-iv
- #3282: Fix str() method in WmdSimilarity, by @DingQK
- #3286: Fixes 'not enough arguments for format string' error, by @gilbertfrancois
- #3309: Respect encoding when reading binary keyed vectors, by @alhoo
- #3332: Missing f prefix on f-strings fix, by @code-review-doctor
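Two of the fixes above (#3286 and #3332) belong to a common class of string-formatting bugs. The sketch below reproduces both in plain Python for illustration; the variable names and messages are made up and do not come from the gensim code base.

```python
name = "corpus.txt"

# Without the f prefix, the braces are kept literally.
plain = "could not read {name}"

# With the f prefix, the expression inside the braces is interpolated.
formatted = f"could not read {name}"

# %-formatting raises TypeError when there are more placeholders
# than supplied arguments.
try:
    message = "%s has %d documents" % (name,)
except TypeError as err:
    error_text = str(err)

# Supplying one argument per placeholder resolves the error.
message = "%s has %d documents" % (name, 3)
```

Both mistakes tend to sit on rarely exercised paths such as error messages and log lines, so they can go unnoticed until that path runs.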
:warning: Removed functionality & deprecations
🔮 Testing, CI, housekeeping
- #3230: Adding lifecycle configuration, by @mpenkov
- #3252: Add Codecov to gensim repo, by @menshikh-iv
- #3255: Move windows tests from azure to github actions, by @menshikh-iv
- #3263: Remove commented out pytest-rerunfailures test dependency, by @pabs3
- #3274: Migrate setup.py from distutils to setuptools, by @geojacobm6
- #3298: test and build wheels for Py3.{7,8,9,10}, by @mpenkov
- #3300: Fix code formatting for FT_CMD definition, by @pabs3
- #3303: add GitHub URL for PyPi, by @andriyor
- #3308: get rid of tox, build things via github actions directly, by @mpenkov
- #3318: Clean up evaluate_word_pairs code, by @piskvorky
- #3329: Check gallery up to date as part of CI, by @mpenkov
- #3254: Skip blinking test test_translate_gc on OSX + py3.9, by @menshikh-iv
- #3258: Adding another check to _check_corpus_sanity for compressed files, adding test, by @dchaplinsky
- #3278: Tighten test_parallel bound, by @austereantelope
- #3280: tighten test_topic_word, by @austereantelope
- #3281: adjust test_parallel bound, by @austereantelope
- #3297: Use gensim.test.utils datapath() to construct paths to the test data, by @pabs3
Previous changes from v4.1.2
- This is a bugfix release that addresses leftover compatibility issues with older versions of numpy and macOS.