Changelog History

  • v1.0.1 Changes

    March 03, 2017
    • Rebuild cumulative table on load. Fix #1180. (@tmylk, #1181)
    • most_similar_cosmul bug fix (@dkim010, #1177)
    • Fix loading old word2vec models pre-1.0.0 (@jayantj, #1179)
    • Load utf-8 words in fasttext (@jayantj, #1176)
  • v1.0.0 Changes

    February 24, 2017

    New features:

    • Add Author-topic modeling (@olavurmortensen, #893)
    • Add FastText word embedding wrapper (@Jayantj, #847)
    • Add WordRank word embedding wrapper (@parulsethi, #1066, #1125)
    • Add VarEmbed word embedding wrapper (@anmol01gulati, #1067)
    • Add sklearn wrapper for LDAModel (@AadityaJ, #932)

    Deprecated features:

    • Move load_word2vec_format and save_word2vec_format out of Word2Vec class to KeyedVectors (@tmylk, #1107)
    • Move properties syn0norm, syn0, vocab, index2word from Word2Vec class to KeyedVectors (@tmylk, #1147)
    • Remove support for Python 2.6, 3.3 and 3.4 (@tmylk, #1145)

    Improvements:

    • Python 3.6 support (@tmylk, #1077)
    • Phrases and Phraser allow a generator corpus (ELind77, #1099)
    • Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze, #1053)
    • Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (@cvangysel, #1103)
    • Fix broken link to paper in readme (@bhargavvader, #1101)
    • Lazy formatting in evaluate_word_pairs (@akutuzov, #1084)
    • Deacc option to keywords pre-processing (@bhargavvader, #1076)
    • Generate Deprecated exception when using Word2Vec.load_word2vec_format (@tmylk, #1165)
    • Fix hdpmodel constructor docstring for print_topics (@toliwa, #1152)
    • Default to per_word_topics=False in LDA get_item for performance (@menshikh-iv, #1154)
    • Fix bound computation in Author Topic models. (@olavurmortensen, #1156)
    • Write UTF-8 byte strings in tensorboard conversion (@tmylk, #1144)
    • Make top_topics and sparse2full compatible with numpy 1.12 strictly int indexing (@tmylk, #1146)

    Tutorial and doc improvements:

    • Clarifying comment in is_corpus func in utils.py (@greninja, #1109)
    • Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda, #1120)
    • Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc, #1119)
    • Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti, #1118)
    • Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda, #1116)
    • Update Transformation and Topics link from quick start notebook (@mariana393, #1115)
    • Quick Start Text clarification and typo correction (@luizcavalcanti, #1114)
    • Fix typos in Author-topic tutorial (@Fil, #1102)
    • Address benchmark inconsistencies in Annoy tutorial (@droudy, #1113)
    • Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja, #1137)
    • Fix dependencies description on doc2vec-IMDB notebook (@luizcavalcanti, #1132)
    • Add documentation for WikiCorpus metadata. (@kirit93, #1163)
  • v1.0.0.RC2 Changes

    February 16, 2017
    • Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja, #1137)
    • Remove direct access to properties moved to KeyedVectors (@tmylk, #1147)
    • Remove support for Python 2.6, 3.3 and 3.4 (@tmylk, #1145)
    • Write UTF-8 byte strings in tensorboard conversion (@tmylk, #1144)
    • Make top_topics and sparse2full compatible with numpy 1.12 strictly int indexing (@tmylk, #1146)
  • v1.0.0.RC1 Changes

    January 31, 2017

    New features:

    • Add Author-topic modeling (@olavurmortensen, #893)
    • Add FastText word embedding wrapper (@Jayantj, #847)
    • Add WordRank word embedding wrapper (@parulsethi, #1066, #1125)
    • Add sklearn wrapper for LDAModel (@AadityaJ, #932)

    Improvements:

    • Python 3.6 support (@tmylk, #1077)
    • Phrases and Phraser allow a generator corpus (ELind77, #1099)
    • Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze, #1053)
    • Move load and save word2vec_format out of word2vec class to KeyedVectors (@tmylk, #1107)
    • Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (@cvangysel, #1103)
    • Fix broken link to paper in readme (@bhargavvader, #1101)
    • Lazy formatting in evaluate_word_pairs (@akutuzov, #1084)
    • Deacc option to keywords pre-processing (@bhargavvader, #1076)

    Tutorial and doc improvements:

    • Clarifying comment in is_corpus func in utils.py (@greninja, #1109)
    • Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda, #1120)
    • Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc, #1119)
    • Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti, #1118)
    • Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda, #1116)
    • Update Transformation and Topics link from quick start notebook (@mariana393, #1115)
    • Quick Start Text clarification and typo correction (@luizcavalcanti, #1114)
    • Fix typos in Author-topic tutorial (@Fil, #1102)
    • Address benchmark inconsistencies in Annoy tutorial (@droudy, #1113)
  • v0.13.4 Changes

    December 22, 2016
    • Added suggested lda model method and print methods to HDP class (@bhargavvader, #1055)
    • New class KeyedVectors to store embedding separate from training code (@anmol01gulati and @droudy, #980)
    • Evaluation of word2vec models against semantic similarity datasets like SimLex-999 (@akutuzov, #1047)
    • TensorBoard word embedding visualisation of Gensim Word2vec format (@loretoparisi, #1051)
    • Throw exception if load() is called on instance rather than the class in word2vec and doc2vec (@dust0x, #889)
    • Loading and Saving LDA Models across Python 2 and 3. Fix #853 (@anmolgulati, #913, #1093)
    • Fix automatic learning of eta (prior over words) in LDA (@olavurmortensen, #1024).
      • eta should have dimensionality V (size of vocab) not K (number of topics). eta with shape K x V is still allowed, as the user may want to impose specific prior information to each topic.
      • eta is no longer allowed the "asymmetric" option. Asymmetric priors over words in general are fine (learned or user defined).
      • As a result, the eta update (update_eta) was simplified somewhat. It also no longer logs eta when updated, because eta is too large for that.
      • Unit tests were updated accordingly. The unit tests expect a different shape than before; some unit tests were redundant after the change; eta='asymmetric' now should raise an error.
    • Optimise show_topics to only call get_lambda once. Fix #1006. (@bhargavvader, #1028)
    • HdpModel doc improvement. Inference and print_topics (@dsquareindia, #1029)
    • Removing Doc2Vec defaults so that it won't override Word2Vec defaults. Fix #795. (@markroxor, #929)
    • Remove warning on gensim import "pattern not installed". Fix #1009 (@shashankg7, #1018)
    • Add delete_temporary_training_data() function to word2vec and doc2vec models. (@deepmipt-VladZhukov, #987)
    • Documentation improvements (@IrinaGoloshchapova, #1010, #1011)
    • LDA tutorial by Olavur, tips and tricks (@olavurmortensen, #779)
    • Add double quote in command line to run on Windows (@akarazeev, #1005)
    • Fix directory names in notebooks to be OS-independent (@mamamot, #1004)
    • Respect clip_start, clip_end in most_similar. Fix #601. (@parulsethi, #994)
    • Replace Python sigmoid function with scipy in word2vec & doc2vec (@markroxor, #989)
    • WMD to return 0 instead of inf for sentences that contain a single word (@rbahumi, #986)
    • Pass all the params through the apply call in lda.get_document_topics(), test case to use the per_word_topics through the corpus in test_ldamodel (@parthoiiitm, #978)
    • Pyro annotations for lsi_worker (@markroxor, #968)
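
The load() guard listed for v0.13.4 (#889) stops a class-level loader from being called on an existing model object. The sketch below shows one way to get that behavior in plain Python. `TinyModel` and `_call_on_class_only` are hypothetical names used only for illustration, and the snippet is not taken from gensim's source.

```python
def _call_on_class_only(*args, **kwargs):
    """Stand-in that replaces a class-level method on instances (hypothetical)."""
    raise AttributeError("This method should be called on a class object.")


class TinyModel:
    """Hypothetical model class; not part of gensim."""

    def __init__(self, name):
        self.name = name
        # Shadow the classmethod on the instance, so instance.load(...)
        # fails loudly instead of quietly returning a different object.
        self.load = _call_on_class_only

    @classmethod
    def load(cls, name):
        # A real loader would read from disk; this one just builds an object.
        return cls(name)


model = TinyModel.load("saved-model")  # fine: called on the class

try:
    model.load("other-model")  # called on an instance, so it raises
except AttributeError as err:
    message = str(err)
```

The instance attribute wins over the classmethod during attribute lookup, which is why the guard only fires for calls made on an instance.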
  • v0.13.4.1 Changes

    January 04, 2017
    • Disable direct access warnings on save and load of Word2vec/Doc2vec (@tmylk, #1072)
    • Making Default hs error explicit (@accraze, #1054)
    • Removed unnecessary numpy imports (@bhargavvader, #1065)
    • Utils and Matutils changes (@bhargavvader, #1062)
    • Tests for the evaluate_word_pairs function (@akutuzov, #1061)
  • v0.13.3 Changes

    October 20, 2016
    • Add vocabulary expansion feature to word2vec. (@isohyt, #900)
    • Tutorial: Reproducing Doc2vec paper result on wikipedia. (@isohyt, #654)
    • Add Save/Load interface to AnnoyIndexer for index persistence (@fortiema, #845)
    • Fixed issue #938, creating a unified base class for all topic models. (@markroxor, #946)
      • breaking change in HdpTopicFormatter.show_topics
    • Add Phraser for Phrases optimization. (@gojomo & @anujkhare, #837)
    • Fix issue #743: word2vec's n_similarity method now raises ZeroDivisionError if at least one empty list is passed (@pranay360, #883)
    • Change export_phrases in Phrases model. Fix issue #794 (@AadityaJ, #879)
      • bigram construction can now support multiple bigrams within one sentence
    • Fix issue #838, RuntimeWarning: overflow encountered in exp (@markroxor, #895)
    • Change some log messages to warnings as suggested in issue #828. (@rhnvrm, #884)
    • Fix issue #851: in summarizer.py, a RuntimeError is raised if single-sentence input is provided, to avoid a ZeroDivisionError. (@metalaman, #887)
    • Fix issue #791, correct logic for iterating over SimilarityABC interface. (@MridulS, #839)
    • Fix RP model loading for large Fortran-order arrays (@piskvorky, #605)
    • Remove ShardedCorpus from init because of Theano dependency (@tmylk, #919)
    • Documentation improvements (@dsquareindia & @tmylk, #914, #906)
    • Add Annoy memory-mapping example (@harshul1610, #899)
    • Fixed issue #601, correct docID in most_similar for clip range (@parulsethi, #994)
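
The n_similarity entry for v0.13.3 (#883) concerns what happens when one of the two word lists is empty. As a rough illustration, the sketch below computes the cosine similarity between the mean vectors of two word sets and raises ZeroDivisionError up front for an empty list. The `vectors` dictionary and this standalone function are assumptions made for the example; it uses plain Python lists rather than gensim's model objects.

```python
import math


def n_similarity(vectors, words_a, words_b):
    """Cosine similarity between the mean vectors of two word lists.

    `vectors` is a hypothetical dict mapping word -> list of floats.
    """
    if not words_a or not words_b:
        # Fail early with a clear message instead of dividing by zero later.
        raise ZeroDivisionError("At least one of the passed lists is empty.")

    def mean_vector(words):
        rows = [vectors[w] for w in words]
        return [sum(column) / len(rows) for column in zip(*rows)]

    a = mean_vector(words_a)
    b = mean_vector(words_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


toy = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
same = n_similarity(toy, ["a"], ["a"])        # identical sets
orthogonal = n_similarity(toy, ["a"], ["b"])  # orthogonal vectors
```

Note that this sketch would also raise ZeroDivisionError for an all-zero mean vector, which a production implementation may want to handle separately.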
  • v0.13.2 Changes

    August 19, 2016
    • wordtopics has changed to word_topics in ldamallet, and fixed issue #764. (@bhargavvader, #771)
      • assigning wordtopics value of word_topics to keep backward compatibility, for now
    • topics, topn parameters changed to num_topics and num_words in show_topics() and print_topics() (@droudy, #755)
      • In hdpmodel and dtmmodel
      • NOT BACKWARDS COMPATIBLE!
    • Added random_state parameter to LdaState initializer and check_random_state() (@droudy, #113)
    • Topic coherence update with c_uci, c_npmi measures. LdaMallet, LdaVowpalWabbit support. Add topics parameter to coherencemodel. Can now provide tokenized topics to calculate coherence value. Faster backtracking. (@dsquareindia, #750, #793)
    • Added a check for empty (no words) documents before starting to run the DTM wrapper if model = "fixed" is used (DIM model), as this causes an error when such documents are reached in training. (@eickho, #806)
    • New parameters limit, datatype for load_word2vec_format(); lockf for intersect_word2vec_format (@gojomo, #817)
    • Changed use_lowercase option in word2vec accuracy to case_insensitive to account for case variations in training vocabulary (@jayantj, #804)
    • Link to Doc2Vec on airline tweets example in tutorials page (@544895340, #823)
    • Small error on Doc2vec notebook tutorial (@charlessutton, #816)
    • Bugfix: Full2sparse clipped to use abs value (@tmylk, #811)
    • WMD docstring: add tutorial link and query example (@tmylk, #813)
    • Annoy integration to speed word2vec and doc2vec similarity. Tutorial update (@droudy, #799, #792)
    • Add converter of LDA model between Mallet, Vowpal Wabbit and gensim (@dsquareindia, #798, #766)
    • Distributed LDA in different network segments without broadcast (@menshikh-iv, #782)
    • Update Corpora_and_Vector_Spaces.ipynb (@megansquire, #772)
    • DTM wrapper bug fixes caused by renaming num_words in #755 (@bhargavvader, #770)
    • Add LsiModel.docs_processed attribute (@hobson, #763)
    • Dynamic Topic Modelling in Python. Google Summer of Code 2016 project. (@bhargavvader, #739, #831)
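
The "Full2sparse clipped to use abs value" bugfix in v0.13.2 (#811) is terse. Assuming it means that the top entries of a dense vector should be selected by magnitude rather than by signed value, the idea can be sketched as follows. `sparse_topn_by_magnitude` is a hypothetical helper written for this illustration, not gensim's function.

```python
def sparse_topn_by_magnitude(vec, topn, eps=1e-9):
    """Return up to `topn` (index, value) pairs with the largest |value|.

    Hypothetical helper: entries whose magnitude is at most `eps` are dropped.
    """
    if topn <= 0:
        return []
    candidates = [(i, v) for i, v in enumerate(vec) if abs(v) > eps]
    # Rank by absolute value so that large negative weights are kept too.
    candidates.sort(key=lambda item: abs(item[1]), reverse=True)
    return candidates[:topn]


dense = [0.1, -0.9, 0.0, 0.5]
top_two = sparse_topn_by_magnitude(dense, 2)
```

Ranking by signed value instead would discard the strongly negative entry at index 1, which is the kind of error an abs-based selection avoids.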
  • v0.13.1 Changes

    June 22, 2016
    • Topic coherence C_v and U_mass (@dsquareindia, #710)
  • v0.13.0 Changes

    June 21, 2016
    • Added Distance Metrics to matutils.py (@bhargavvader, #656)
    • Tutorials migrated from website to ipynb (@j9chan, #721), (@jesford, #733), (@jesford, #725), (@jesford, #716)
    • New doc2vec intro tutorial (@seanlaw, #730)
    • Gensim Quick Start Tutorial (@andrewjlm, #727)
    • Add export_phrases(sentences) to model Phrases (hanabi1224, #588)
    • SparseMatrixSimilarity returns a sparse matrix if maintain_sparsity is True (@davechallis, #590)
    • Added functionality for Topics of Words in document, i.e. dynamic topics. (@bhargavvader, #704)
      • Also included a tutorial which explains the new functionality and document word-topic coloring.
    • Made normalization an explicit transformation. Added 'l1' norm support (@dsquareindia, #649)
    • Added term-topics API for most probable topic for word in vocab. (@bhargavvader, #706)
    • build_vocab takes progress_per parameter for smaller output (@zer0n, #624)
    • Control whether to use lowercase for computing word2vec accuracy. (@alantian, #607)
    • Easy import of GloVe vectors using Gensim (Manas Ranjan Kar, #625)
      • Allow easy port of GloVe vectors into Gensim
      • Standalone script with command line arguments, compatible with Python>=2.6
      • Usage: python -m gensim.scripts.glove2word2vec -i glove_vectors.txt -o output_word2vec_compatible.txt
    • Add similar_by_word() and similar_by_vector() to word2vec (@isohyt, #381)
    • Convenience method for similarity of two out of training sentences to doc2vec (@ellolo, #707)
    • Dynamic Topic Modelling Tutorial updated with Dynamic Influence Model (@bhargavvader, #689)
    • Added function to filter 'n' most frequent words from the dictionary (@abhinavchawla, #718)
    • Raise warnings if vocab is single character elements and if alpha is increased in word2vec/doc2vec (@dsquareindia, #705)
    • Tests for wikidump (@jonmcoe, #723)
    • Mallet wrapper sparse format support (@RishabGoel, #664)
    • Doc2vec pre-processing script translated from bash to Python (@andrewjlm, #720)
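
The GloVe import entry for v0.13.0 (#625) refers to a converter script. The word2vec text format starts with a header line giving the number of vectors and their dimensionality, which GloVe text files lack, so the core of such a conversion can be sketched in a few lines. `glove_to_word2vec_lines` is a hypothetical function written for this illustration; it is not the gensim script and skips file handling and validation.

```python
def glove_to_word2vec_lines(glove_lines):
    """Prepend a "<count> <dims>" header to GloVe-style text lines.

    Hypothetical helper: each input line is "<word> <v1> <v2> ... <vN>".
    """
    rows = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not rows:
        raise ValueError("no vectors found")
    # The first token is the word; the remaining tokens are the vector.
    dims = len(rows[0].split()) - 1
    header = "%d %d" % (len(rows), dims)
    return [header] + rows


glove = ["the 0.1 0.2 0.3\n", "cat 0.4 0.5 0.6\n"]
converted = glove_to_word2vec_lines(glove)
```

For real files, the command shown in the changelog entry above is the documented route; this sketch only shows what changes between the two formats.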