Changelog History

  • v0.9.1 Changes

    April 12, 2014
    • 🏁 MmCorpus fix for Windows
    • 🖨 LdaMallet support for printing/showing topics
    • 🛠 fix LdaMallet bug when user specified a file prefix (Victor, #184)
    • 🛠 fix LdaMallet output when input is single vector (Suvir)
    • ➕ added LdaMallet unit tests
    • 🛠 more py3k fixes (Lars Buitinck)
    • 🔄 change order of LDA topic printing (Fayimora Femi-Balogun, #188)
  • v0.9.0 Changes

    March 16, 2014
    • 💾 save/load automatically single out large arrays + allow mmap
    • 👍 allow .gz/.bz2 corpus filenames => transparently (de)compressed I/O
    • CBOW model for word2vec (Sébastien Jean, #176)
    • 🆕 new API for storing corpus metadata (Joseph Chang, #169)
    • 🆕 new LdaMallet class = train LDA using wrapped Mallet
    • 🆕 new MalletCorpus class for corpora in Mallet format (Christopher Corley, #179)
    • 👍 better Wikipedia article parsing (Joseph Chang, #170)
    • word2vec load_word2vec_format uses less memory (Yves Raimond, #164)
    • load/store vocabulary files for word2vec C format (Yves Raimond, #172)
    • HDP estimation on new documents (Elliot Kulakow, #153)
    • store labels in SvmLight corpus (Ritesh, #152)
    • 🛠 fix word2vec binary load on Windows (Stephanus van Schalkwyk)
    • replace numpy.svd with scipy.svd for more stability (Sven Döring, #159)
    • parametrize LDA constructor (Christopher Corley, #174)
    • steps toward py3k compatibility (Lars Buitinck, #154)
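The ".gz/.bz2 corpus filenames" entry in v0.9.0 describes picking a (de)compressor from the filename. The block below is a minimal standard-library sketch of that general idea, not gensim's own code: the helper names `open_maybe_compressed` and `roundtrip` are hypothetical and exist only for illustration.

```python
import bz2
import gzip
import os
import tempfile


def open_maybe_compressed(path, mode="rb"):
    """Open `path`, choosing a (de)compressor from its file extension.

    Hypothetical helper: .gz goes through gzip, .bz2 through bz2,
    anything else is opened as a plain file.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".gz":
        return gzip.open(path, mode)
    if ext == ".bz2":
        return bz2.open(path, mode)
    return open(path, mode)


def roundtrip(filename, payload):
    """Write `payload` (bytes) to `filename` in a temp dir and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, filename)
        with open_maybe_compressed(path, "wb") as fout:
            fout.write(payload)
        with open_maybe_compressed(path, "rb") as fin:
            return fin.read()
```

The caller only ever sees uncompressed bytes, so the same reading and writing code works for `corpus.mm`, `corpus.mm.gz` and `corpus.mm.bz2`.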
  • v0.8.9 Changes

    December 26, 2013
    • 👉 use travis-ci for continuous integration
    • ⚡️ auto-optimize LDA asymmetric prior (Ben Trahan)
    • ⚡️ update for new word2vec binary format (Daren Race)
    • doc rendering fix (Dan Foreman-Mackey)
    • 👍 better LDA perplexity logging
    • 🛠 fix Pyro thread leak in distributed algos (Brian Feeny)
    • optimizations in word2vec (Bryan Rink)
    • 👍 allow compressed input in LineSentence corpus (Eric Moyer)
    • ⬆️ upgrade ez_setup, doc improvements, minor fixes etc.
  • v0.8.8 Changes

    November 03, 2013
    • python3 port by Parikshit Samant: https://github.com/samantp/gensimPy3
    • massive optimizations to word2vec (cython, BLAS, multithreading): ~20x-300x speedup
    • 🆕 new word2vec functionality (thx to Ghassen Hamrouni, PR #124)
    • 🆕 new CSV corpus class (thx to Zygmunt Zając)
    • corpus serialization checks to prevent overwriting (by Ian Langmore, PR #125)
    • ➕ add context manager support for older Python<=2.6 for gzip and bz2
    • ➕ added unittests for word2vec
  • v0.8.7 Changes

    September 18, 2013
    • 🎉 initial version of word2vec, a neural network deep learning algo
    • 👉 make distributed gensim compatible with the new Pyro
    • 👍 allow merging dictionaries (by Florent Chandelier)
    • 🆕 new design for the gensim website!
    • speed up handling of corner cases when returning top-n most similar
    • 👉 make Random Projections compatible with new scipy (andrewjOc360, PR #110)
    • 👍 allow "light" (faster) word lemmatization (by Karsten Jeschkies)
    • 💾 save/load directly from bzip2 files (by Luis Pedro Coelho, PR #101)
    • Blei corpus now tries harder to find its vocabulary file (by Luis Pedro Coelho, PR #100)
    • 📜 sparse vector elements can now be a list (was: only a 2-tuple)
    • simple_preprocess now optionally deaccents letters (ř/š/ú => r/s/u etc.)
    • 👍 better serialization of numpy corpora
    • 🖨 print_topics() returns the topics, in addition to printing/logging
    • 🛠 fixes for more robust Windows multiprocessing
    • 📚 lots of small fixes, data checks and documentation updates
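The deaccenting entry in v0.8.7 maps accented letters to their base letters. One common way to do this with the standard library is Unicode decomposition followed by dropping combining marks. The function below is a sketch under that assumption and is not taken from gensim's source; the name `deaccent` is used here for illustration only.

```python
import unicodedata


def deaccent(text):
    """Strip accents by decomposing characters and dropping combining marks."""
    # NFD splits e.g. "ř" into "r" followed by a combining caron (category Mn).
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)
```

With this sketch, `deaccent("řšú")` gives `"rsu"`. Letters that have no canonical decomposition, such as "ł" or "ø", pass through unchanged.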
  • v0.8.6 Changes

    September 15, 2012
    • ➕ added HashDictionary (by Homer Strong)
    • 👌 support for adding target classes in SVMlight format (by Corrado Monti)
    • 🛠 fixed problems with global lemmatizer object when running in parallel on Windows
    • parallelization of Wikipedia processing + added script version that lemmatizes the input documents
    • ➕ added class method to initialize Dictionary from an existing corpus (by Marko Burjek)
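A hash-based dictionary such as the HashDictionary added in v0.8.6 relies on the "hashing trick": token ids are computed from a hash of the token rather than looked up in a stored vocabulary. The block below is a rough standard-library sketch of that general technique, not HashDictionary's implementation. The choice of `zlib.adler32`, the default `id_range` and both function names are assumptions made for this example.

```python
import zlib


def token_to_id(token, id_range=32000):
    """Map a token to an integer id in [0, id_range) with a deterministic hash."""
    # adler32 and the default id_range are illustrative assumptions.
    return zlib.adler32(token.encode("utf-8")) % id_range


def doc_to_bow(tokens, id_range=32000):
    """Turn a list of tokens into a sorted list of (token_id, count) pairs."""
    counts = {}
    for token in tokens:
        token_id = token_to_id(token, id_range)
        counts[token_id] = counts.get(token_id, 0) + 1
    return sorted(counts.items())
```

No vocabulary pass over the corpus is needed before documents can be vectorized. The trade-off is that distinct tokens can collide on the same id, and collisions become more likely as `id_range` shrinks.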
  • v0.8.5 Changes

    July 22, 2012
    • 👌 improved performance of sharding (similarity queries)
    • 👍 better Wikipedia parsing (thx to Alejandro Weinstein and Lars Buitinck)
    • faster Porter stemmer (thx to Lars Buitinck)
    • 🛠 several minor fixes (in HDP model thx to Greg Ver Steeg)
    • 👌 improvements to documentation
  • v0.8.4 Changes

    March 09, 2012
    • 👍 better support for Pandas series input (thx to JT Bates)
    • a new corpus format: UCI bag-of-words (thx to Jonathan Esterhazy)
    • a new model, non-parametric bayes: HDP (thx to Jonathan Esterhazy; based on Chong Wang's code)
    • 👌 improved support for new scipy versions (thx to Skipper Seabold)
    • 📦 lemmatizer support for wikipedia parsing (via the pattern python package)
    • 🐎 extended the lemmatizer for multi-core processing, to improve its performance
  • v0.8.3 Changes

    December 02, 2011
    • 🛠 fixed Similarity sharding bug (issue #65, thx to Paul Rudin)
    • 👌 improved LDA code (clarity & memory footprint)
    • ⚡️ optimized efficiency of Similarity sharding
  • v0.8.2 Changes

    October 31, 2011
    • 👌 improved gensim landing page
    • 👌 improved accuracy of SVD (Latent Semantic Analysis) (thx to Mark Tygert)
    • 🔄 changed interpretation of LDA topics: github issue #57
    • took out similarity server code introduced in 0.8.1 (will become a separate project)
    • ✅ started using tox for testing
    • 🛠 + several smaller fixes and optimizations