Changelog History

v0.9.1 Changes
April 12, 2014

- MmCorpus fix for Windows
- LdaMallet support for printing/showing topics
- fix LdaMallet bug when user specified a file prefix (Victor, #184)
- fix LdaMallet output when input is a single vector (Suvir)
- added LdaMallet unit tests
- more py3k fixes (Lars Buitinck)
- change order of LDA topic printing (Fayimora Femi-Balogun, #188)

v0.9.0 Changes
March 16, 2014

- save/load automatically single out large arrays + allow mmap
- allow .gz/.bz2 corpus filenames => transparently (de)compressed I/O
- CBOW model for word2vec (Sébastien Jean, #176)
- new API for storing corpus metadata (Joseph Chang, #169)
- new LdaMallet class = train LDA using wrapped Mallet
- new MalletCorpus class for corpora in Mallet format (Christopher Corley, #179)
- better Wikipedia article parsing (Joseph Chang, #170)
- word2vec load_word2vec_format uses less memory (Yves Raimond, #164)
- load/store vocabulary files for word2vec C format (Yves Raimond, #172)
- HDP estimation on new documents (Elliot Kulakow, #153)
- store labels in SvmLight corpus (Ritesh, #152)
- fix word2vec binary load on Windows (Stephanus van Schalkwyk)
- replace numpy.svd with scipy.svd for more stability (Sven Döring, #159)
- parametrize LDA constructor (Christopher Corley, #174)
- steps toward py3k compatibility (Lars Buitinck, #154)
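The transparent (de)compression of `.gz`/`.bz2` corpus filenames amounts to choosing the file opener from the filename extension. A minimal sketch using the standard-library `gzip` and `bz2` modules follows; `open_by_extension` is a hypothetical helper (not gensim's own function) and the file names are made up for illustration.

```python
import bz2
import gzip
import os
import tempfile


def open_by_extension(fname, mode="rb"):
    """Hypothetical helper: open fname, (de)compressing according to its extension."""
    if fname.endswith(".gz"):
        return gzip.open(fname, mode)
    if fname.endswith(".bz2"):
        return bz2.open(fname, mode)
    return open(fname, mode)  # no known compression suffix: plain file


# Round-trip the same bytes through a plain, a gzip and a bzip2 file.
data = b"1 1 1.0\n1 2 2.0\n"
results = {}
tmpdir = tempfile.mkdtemp()
for suffix in ("", ".gz", ".bz2"):
    path = os.path.join(tmpdir, "corpus.mm" + suffix)
    with open_by_extension(path, "wb") as fout:
        fout.write(data)
    with open_by_extension(path, "rb") as fin:
        results[suffix] = fin.read()

print(all(value == data for value in results.values()))
```

Callers keep passing plain filenames; only the suffix decides whether bytes are compressed on disk.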

v0.8.9 Changes
December 26, 2013

- use travis-ci for continuous integration
- auto-optimize LDA asymmetric prior (Ben Trahan)
- update for new word2vec binary format (Daren Race)
- doc rendering fix (Dan Foreman-Mackey)
- better LDA perplexity logging
- fix Pyro thread leak in distributed algos (Brian Feeny)
- optimizations in word2vec (Bryan Rink)
- allow compressed input in LineSentence corpus (Eric Moyer)
- upgrade ez_setup, doc improvements, minor fixes etc.

v0.8.8 Changes
November 03, 2013

- python3 port by Parikshit Samant: https://github.com/samantp/gensimPy3
- massive optimizations to word2vec (cython, BLAS, multithreading): ~20x-300x speedup
- new word2vec functionality (thx to Ghassen Hamrouni, PR #124)
- new CSV corpus class (thx to Zygmunt Zając)
- corpus serialization checks to prevent overwriting (by Ian Langmore, PR #125)
- added context manager support for gzip and bz2 files on older Python (<=2.6)
- added unit tests for word2vec
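On Python 2.6 and older, `gzip.GzipFile` and `bz2.BZ2File` objects could not be used directly in a `with` statement. A common workaround is to wrap them in `contextlib.closing`, which only requires a `close()` method. The sketch below illustrates that technique; it is not necessarily how gensim implemented it, and the file names are made up. It also runs unchanged on current Python.

```python
import bz2
import contextlib
import gzip
import os
import tempfile

tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, "sentences.txt.gz")
bz2_path = os.path.join(tmpdir, "sentences.txt.bz2")

# contextlib.closing() calls close() on exit, so it works even for file
# objects that lack __enter__/__exit__ (as on Python <= 2.6).
with contextlib.closing(gzip.GzipFile(gz_path, "wb")) as fout:
    fout.write(b"human interface computer\n")
with contextlib.closing(bz2.BZ2File(bz2_path, "wb")) as fout:
    fout.write(b"survey user computer system\n")

# Read both files back the same way.
with contextlib.closing(gzip.GzipFile(gz_path, "rb")) as fin:
    gz_lines = fin.read().splitlines()
with contextlib.closing(bz2.BZ2File(bz2_path, "rb")) as fin:
    bz2_lines = fin.read().splitlines()

print(gz_lines, bz2_lines)
```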

v0.8.7 Changes
September 18, 2013

- initial version of word2vec, a neural network deep learning algo
- make distributed gensim compatible with the new Pyro
- allow merging dictionaries (by Florent Chandelier)
- new design for the gensim website!
- speed up handling of corner cases when returning top-n most similar
- make Random Projections compatible with new scipy (andrewjOc360, PR #110)
- allow "light" (faster) word lemmatization (by Karsten Jeschkies)
- save/load directly from bzip2 files (by Luis Pedro Coelho, PR #101)
- Blei corpus now tries harder to find its vocabulary file (by Luis Pedro Coelho, PR #100)
- sparse vector elements can now be a list (was: only a 2-tuple)
- simple_preprocess now optionally deaccents letters (ř/š/ú=>r/s/u etc.)
- better serialization of numpy corpora
- print_topics() returns the topics, in addition to printing/logging
- fixes for more robust Windows multiprocessing
- lots of small fixes, data checks and documentation updates
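Optional deaccenting in `simple_preprocess` can be approximated with Unicode normalization: decompose each character (NFD), drop the combining marks (category `Mn`), then recompose (NFC). The standalone `deaccent` below is a sketch of that approach, not a copy of gensim's code.

```python
import unicodedata


def deaccent(text):
    """Strip accents: decompose (NFD), drop combining marks, recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    # e.g. "ř" decomposes into "r" + COMBINING CARON (category "Mn").
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


print(deaccent("ř/š/ú"))  # r/s/u
```

Characters without a decomposition (plain ASCII, punctuation) pass through unchanged.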

v0.8.6 Changes
September 15, 2012

- added HashDictionary (by Homer Strong)
- support for adding target classes in SVMlight format (by Corrado Monti)
- fixed problems with global lemmatizer object when running in parallel on Windows
- parallelization of Wikipedia processing + added script version that lemmatizes the input documents
- added class method to initialize Dictionary from an existing corpus (by Marko Burjek)
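A hashing dictionary applies the "hashing trick": token ids come from a hash function taken modulo a fixed id range, so no token-to-id table has to be built in a first pass over the corpus, at the cost of occasional collisions. The class below is a hypothetical, minimal sketch of the idea, not gensim's `HashDictionary`; the choice of `zlib.adler32` and an id range of 32000 are assumptions for illustration.

```python
import zlib


class HashingVocab:
    """Hypothetical hashing-trick vocabulary (a sketch, not gensim's HashDictionary)."""

    def __init__(self, id_range=32000, myhash=zlib.adler32):
        self.id_range = id_range  # ids fall in [0, id_range)
        self.myhash = myhash      # any bytes -> int hash function

    def token2id(self, token):
        # Different tokens may collide on the same id; that is the trade-off.
        return self.myhash(token.encode("utf-8")) % self.id_range

    def doc2bow(self, tokens):
        """Convert a list of tokens into sorted (token_id, count) pairs."""
        counts = {}
        for token in tokens:
            token_id = self.token2id(token)
            counts[token_id] = counts.get(token_id, 0) + 1
        return sorted(counts.items())


vocab = HashingVocab()
print(vocab.doc2bow(["human", "computer", "human"]))
```

Because the mapping is a pure function, new documents with unseen words can be converted without updating any shared state.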

v0.8.5 Changes
July 22, 2012

- improved performance of sharding (similarity queries)
- better Wikipedia parsing (thx to Alejandro Weinstein and Lars Buitinck)
- faster Porter stemmer (thx to Lars Buitinck)
- several minor fixes (in HDP model thx to Greg Ver Steeg)
- improvements to documentation
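Sharded similarity queries split the index into smaller pieces that are searched independently; each shard returns its local top-n and those partial results are merged into a global top-n. The sketch below shows only that split-and-merge step, with cosine similarity over plain Python lists; the function names are hypothetical, and gensim's `Similarity` class works with matrices and on-disk shards instead.

```python
import heapq
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def make_shards(vectors, shard_size):
    """Split indexed vectors into shards of at most shard_size, keeping global ids."""
    indexed = list(enumerate(vectors))
    return [indexed[i:i + shard_size] for i in range(0, len(indexed), shard_size)]


def query_shards(shards, query, topn):
    """Score each shard separately, keep its local top-n, then merge globally."""
    candidates = []
    for shard in shards:
        scored = ((cosine(query, vec), doc_id) for doc_id, vec in shard)
        candidates.extend(heapq.nlargest(topn, scored))
    return [(doc_id, sim) for sim, doc_id in heapq.nlargest(topn, candidates)]


index = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1], [0.2, 0.8]]
shards = make_shards(index, shard_size=2)
print(query_shards(shards, [1.0, 0.0], topn=2))
```

Keeping only a local top-n per shard is safe: any document in the global top-n must also rank within the top-n of its own shard.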

v0.8.4 Changes
March 09, 2012

- better support for Pandas series input (thx to JT Bates)
- a new corpus format: UCI bag-of-words (thx to Jonathan Esterhazy)
- a new model, non-parametric Bayes: HDP (thx to Jonathan Esterhazy; based on Chong Wang's code)
- improved support for new scipy versions (thx to Skipper Seabold)
- lemmatizer support for Wikipedia parsing (via the pattern python package)
- extended the lemmatizer for multi-core processing, to improve its performance
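The UCI bag-of-words format stores a corpus as a `docword` file: three header lines giving the number of documents D, the vocabulary size W and the number of non-zero entries NNZ, followed by one `docID wordID count` triple per line, with 1-based ids. A minimal reader sketch follows; `read_uci_docword` is a hypothetical name, and shifting ids to 0-based is a choice made here, not a statement about gensim's UCI corpus class.

```python
import io
from collections import defaultdict


def read_uci_docword(stream):
    """Parse UCI docword text into (D, W, {doc_id: [(word_id, count), ...]}).

    Ids are shifted from the file's 1-based numbering to 0-based.
    Documents with no entries simply do not appear in the result.
    """
    num_docs = int(stream.readline())
    num_terms = int(stream.readline())
    num_nnz = int(stream.readline())
    docs = defaultdict(list)
    for line in stream:
        if not line.strip():
            continue  # tolerate blank lines
        doc_id, word_id, count = (int(x) for x in line.split())
        docs[doc_id - 1].append((word_id - 1, count))
    found = sum(len(entries) for entries in docs.values())
    if found != num_nnz:
        raise ValueError("expected %d entries, found %d" % (num_nnz, found))
    return num_docs, num_terms, dict(docs)


sample = io.StringIO("2\n3\n3\n1 1 2\n1 3 1\n2 2 4\n")
num_docs, num_terms, docs = read_uci_docword(sample)
print(docs)  # {0: [(0, 2), (2, 1)], 1: [(1, 4)]}
```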

v0.8.3 Changes
December 02, 2011

- fixed Similarity sharding bug (issue #65, thx to Paul Rudin)
- improved LDA code (clarity & memory footprint)
- optimized efficiency of Similarity sharding

v0.8.2 Changes
October 31, 2011

- improved gensim landing page
- improved accuracy of SVD (Latent Semantic Analysis) (thx to Mark Tygert)
- changed interpretation of LDA topics: GitHub issue #57
- took out similarity server code introduced in 0.8.1 (will become a separate project)
- started using tox for testing
- several smaller fixes and optimizations