Changelog History
v1.0.1 Changes
March 03, 2017
v1.0.0 Changes
February 24, 2017
New features:
- Add Author-topic modeling (@olavurmortensen, #893)
- Add FastText word embedding wrapper (@Jayantj, #847)
- Add WordRank word embedding wrapper (@parulsethi, #1066, #1125)
- Add VarEmbed word embedding wrapper (@anmol01gulati, #1067)
- Add sklearn wrapper for LDAModel (@AadityaJ, #932)
Deprecated features:
- Move load_word2vec_format and save_word2vec_format out of Word2Vec class to KeyedVectors (@tmylk, #1107)
- Move properties syn0norm, syn0, vocab, index2word from Word2Vec class to KeyedVectors (@tmylk, #1147)
- Remove support for Python 2.6, 3.3 and 3.4 (@tmylk, #1145)
Improvements:
- Python 3.6 support (@tmylk, #1077)
- Phrases and Phraser allow a generator corpus (@ELind77, #1099)
- Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze, #1053)
- Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary (@cvangysel, #1103)
- Fix broken link to paper in readme (@bhargavvader, #1101)
- Lazy formatting in evaluate_word_pairs (@akutuzov, #1084)
- Deacc option to keywords pre-processing (@bhargavvader, #1076)
- Generate Deprecated exception when using Word2Vec.load_word2vec_format (@tmylk, #1165)
- Fix hdpmodel constructor docstring for print_topics (@toliwa, #1152)
- Default to per_word_topics=False in LDA get_item for performance (@menshikh-iv, #1154)
- Fix bound computation in Author Topic models (@olavurmortensen, #1156)
- Write UTF-8 byte strings in tensorboard conversion (@tmylk, #1144)
- Make top_topics and sparse2full compatible with numpy 1.12 strictly int indexing (@tmylk, #1146)
Tutorial and doc improvements:
- Clarifying comment in is_corpus func in utils.py (@greninja, #1109)
- Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda, #1120)
- Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc, #1119)
- Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti, #1118)
- Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda, #1116)
- Update Transformation and Topics link from quick start notebook (@mariana393, #1115)
- Quick Start Text clarification and typo correction (@luizcavalcanti, #1114)
- Fix typos in Author-topic tutorial (@Fil, #1102)
- Address benchmark inconsistencies in Annoy tutorial (@droudy, #1113)
- Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja, #1137)
- Fix dependencies description on doc2vec-IMDB notebook (@luizcavalcanti, #1132)
- Add documentation for WikiCorpus metadata (@kirit93, #1163)
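The v1.0.0 entries about moving load_word2vec_format to KeyedVectors and raising a Deprecated exception from the old location describe a common migration pattern. The sketch below illustrates that pattern only; OldModel, NewVectors and load_text_format are hypothetical stand-ins, not gensim classes or methods, and the parsing is deliberately simplified.

```python
class NewVectors:
    """Stand-in for the class that now owns the loader (hypothetical)."""

    def __init__(self, vectors):
        self.vectors = vectors

    @classmethod
    def load_text_format(cls, lines):
        # Parse "word v1 v2 ..." lines into a {word: [floats]} mapping.
        vectors = {}
        for line in lines:
            word, *values = line.split()
            vectors[word] = [float(v) for v in values]
        return cls(vectors)


class OldModel:
    """Stand-in for the class the loader was moved out of (hypothetical)."""

    @classmethod
    def load_text_format(cls, lines):
        # Fail loudly and name the replacement instead of silently delegating.
        raise DeprecationWarning(
            "Deprecated. Use NewVectors.load_text_format instead."
        )


def migrate_call(lines):
    """Show both sides of the migration for the same input."""
    # Old call sites break with a message that names the new location.
    try:
        OldModel.load_text_format(lines)
    except DeprecationWarning as exc:
        message = str(exc)
    # New call sites load the vectors from the new owner class.
    loaded = NewVectors.load_text_format(lines)
    return message, loaded.vectors
```

Raising instead of forwarding forces callers to update their imports right away, at the cost of breaking old code at the first call.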
v1.0.0.RC2 Changes
February 16, 2017
- Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja, #1137)
- Remove direct access to properties moved to KeyedVectors (@tmylk, #1147)
- Remove support for Python 2.6, 3.3 and 3.4 (@tmylk, #1145)
- Write UTF-8 byte strings in tensorboard conversion (@tmylk, #1144)
- Make top_topics and sparse2full compatible with numpy 1.12 strictly int indexing (@tmylk, #1146)
v1.0.0.RC1 Changes
January 31, 2017
New features:
- Add Author-topic modeling (@olavurmortensen, #893)
- Add FastText word embedding wrapper (@Jayantj, #847)
- Add WordRank word embedding wrapper (@parulsethi, #1066, #1125)
- Add sklearn wrapper for LDAModel (@AadityaJ, #932)
Improvements:
- Python 3.6 support (@tmylk, #1077)
- Phrases and Phraser allow a generator corpus (@ELind77, #1099)
- Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze, #1053)
- Move load and save word2vec_format out of word2vec class to KeyedVectors (@tmylk, #1107)
- Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary (@cvangysel, #1103)
- Fix broken link to paper in readme (@bhargavvader, #1101)
- Lazy formatting in evaluate_word_pairs (@akutuzov, #1084)
- Deacc option to keywords pre-processing (@bhargavvader, #1076)
Tutorial and doc improvements:
- Clarifying comment in is_corpus func in utils.py (@greninja, #1109)
- Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda, #1120)
- Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc, #1119)
- Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti, #1118)
- Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda, #1116)
- Update Transformation and Topics link from quick start notebook (@mariana393, #1115)
- Quick Start Text clarification and typo correction (@luizcavalcanti, #1114)
- Fix typos in Author-topic tutorial (@Fil, #1102)
- Address benchmark inconsistencies in Annoy tutorial (@droudy, #1113)
v0.13.4 Changes
December 22, 2016
- Added suggested lda model method and print methods to HDP class (@bhargavvader, #1055)
- New class KeyedVectors to store embedding separate from training code (@anmol01gulati and @droudy, #980)
- Evaluation of word2vec models against semantic similarity datasets like SimLex-999 (@akutuzov, #1047)
- TensorBoard word embedding visualisation of Gensim Word2vec format (@loretoparisi, #1051)
- Throw exception if load() is called on instance rather than the class in word2vec and doc2vec (@dust0x, #889)
- Loading and Saving LDA Models across Python 2 and 3. Fix #853 (@anmolgulati, #913, #1093)
- Fix automatic learning of eta (prior over words) in LDA (@olavurmortensen, #1024).
  - eta should have dimensionality V (size of vocab), not K (number of topics). eta with shape K x V is still allowed, as the user may want to impose specific prior information on each topic.
  - eta is no longer allowed the "asymmetric" option. Asymmetric priors over words in general are fine (learned or user defined).
  - As a result, the eta update (update_eta) was simplified somewhat. It also no longer logs eta when updated, because it is too large for that.
  - Unit tests were updated accordingly. The unit tests expect a different shape than before; some unit tests were redundant after the change; eta='asymmetric' now should raise an error.
- Optimise show_topics to only call get_lambda once. Fix #1006. (@bhargavvader, #1028)
- HdpModel doc improvement. Inference and print_topics (@dsquareindia, #1029)
- Removing Doc2Vec defaults so that it won't override Word2Vec defaults. Fix #795. (@markroxor, #929)
- Remove warning on gensim import "pattern not installed". Fix #1009 (@shashankg7, #1018)
- Add delete_temporary_training_data() function to word2vec and doc2vec models. (@deepmipt-VladZhukov, #987)
- Documentation improvements (@IrinaGoloshchapova, #1010, #1011)
- LDA tutorial by Olavur, tips and tricks (@olavurmortensen, #779)
- Add double quote in command line to run on Windows (@akarazeev, #1005)
- Fix directory names in notebooks to be OS-independent (@mamamot, #1004)
- Respect clip_start, clip_end in most_similar. Fix #601. (@parulsethi, #994)
- Replace Python sigmoid function with scipy in word2vec & doc2vec (@markroxor, #989)
- WMD to return 0 instead of inf for sentences that contain a single word (@rbahumi, #986)
- Pass all the params through the apply call in lda.get_document_topics(), test case to use the per_word_topics through the corpus in test_ldamodel (@parthoiiitm, #978)
- Pyro annotations for lsi_worker (@markroxor, #968)
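The eta rules in the v0.13.4 entry (length V, or shape K x V, and no "asymmetric" option) can be written down as a small shape check. This is a minimal sketch in plain Python lists; check_eta is a hypothetical helper that mirrors the wording of the changelog entry, not gensim's actual validation code.

```python
def check_eta(eta, num_topics, vocab_size):
    """Return the accepted shape of a word prior, or raise ValueError.

    Hypothetical helper: K is num_topics, V is vocab_size.
    """
    if eta == "asymmetric":
        raise ValueError("eta='asymmetric' is not allowed for the word prior")

    # Nested rows: must be K rows of V entries each (one prior per topic).
    if eta and isinstance(eta[0], (list, tuple)):
        if len(eta) != num_topics or any(len(row) != vocab_size for row in eta):
            raise ValueError("2-D eta must have shape K x V")
        return (num_topics, vocab_size)

    # Flat sequence: one entry per vocabulary word.
    if len(eta) != vocab_size:
        raise ValueError("1-D eta must have length V (vocabulary size)")
    return (vocab_size,)
```

With K = 2 and V = 3, a flat prior of three values and a 2 x 3 nested prior both pass, while a flat prior of two values (one per topic) is rejected.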
v0.13.4.1 Changes
January 04, 2017
- Disable direct access warnings on save and load of Word2vec/Doc2vec (@tmylk, #1072)
- Making Default hs error explicit (@accraze, #1054)
- Removed unnecessary numpy imports (@bhargavvader, #1065)
- Utils and Matutils changes (@bhargavvader, #1062)
- Tests for the evaluate_word_pairs function (@akutuzov, #1061)
v0.13.3 Changes
October 20, 2016
- Add vocabulary expansion feature to word2vec. (@isohyt, #900)
- Tutorial: Reproducing Doc2vec paper result on wikipedia. (@isohyt, #654)
- Add Save/Load interface to AnnoyIndexer for index persistence (@fortiema, #845)
- Fixed issue #938, creating a unified base class for all topic models. (@markroxor, #946)
  - Breaking change in HdpTopicFormatter.show_topics
- Add Phraser for Phrases optimization. (@gojomo & @anujkhare, #837)
- Fix issue #743: in word2vec's n_similarity method, ZeroDivisionError is raised if at least one empty list is passed (@pranay360, #883)
- Change export_phrases in Phrases model. Fix issue #794 (@AadityaJ, #879)
  - Bigram construction can now support multiple bigrams within one sentence
- Fix issue #838, RuntimeWarning: overflow encountered in exp (@markroxor, #895)
- Change some log messages to warnings as suggested in issue #828. (@rhnvrm, #884)
- Fix issue #851: in summarizer.py, RuntimeError is raised if single sentence input is provided, to avoid ZeroDivisionError. (@metalaman, #887)
- Fix issue #791, correct logic for iterating over SimilarityABC interface. (@MridulS, #839)
- Fix RP model loading for large Fortran-order arrays (@piskvorky, #605)
- Remove ShardedCorpus from init because of Theano dependency (@tmylk, #919)
- Documentation improvements (@dsquareindia & @tmylk, #914, #906)
- Add Annoy memory-mapping example (@harshul1610, #899)
- Fixed issue #601, correct docID in most_similar for clip range (@parulsethi, #994)
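The export_phrases note in v0.13.3 says bigram construction can handle several bigrams in one sentence. A rough illustration of what that means is sketched below. It assumes the accepted bigrams are already known and passed in as a set; merge_bigrams is a hypothetical function and leaves out the frequency-based scoring that Phrases uses to decide which pairs qualify.

```python
def merge_bigrams(tokens, bigrams, delimiter="_"):
    """Join every adjacent pair found in `bigrams`, scanning left to right."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in bigrams:
            merged.append(delimiter.join(pair))
            i += 2  # both tokens consumed; keep scanning for further pairs
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

Because the scan continues after each merge, a sentence containing both ("new", "york") and ("san", "francisco") has both pairs joined in a single pass.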
v0.13.2 Changes
August 19, 2016
- wordtopics has changed to word_topics in ldamallet, and fixed issue #764. (@bhargavvader, #771)
  - Assigning wordtopics value of word_topics to keep backward compatibility, for now
- topics, topn parameters changed to num_topics and num_words in show_topics() and print_topics() (@droudy, #755)
  - In hdpmodel and dtmmodel
  - NOT BACKWARDS COMPATIBLE!
- Added random_state parameter to LdaState initializer and check_random_state() (@droudy, #113)
- Topic coherence update with c_uci, c_npmi measures. LdaMallet, LdaVowpalWabbit support. Add topics parameter to coherencemodel. Can now provide tokenized topics to calculate coherence value. Faster backtracking. (@dsquareindia, #750, #793)
- Added a check for empty (no words) documents before starting to run the DTM wrapper if model = "fixed" is used (DIM model), as this causes an error when such documents are reached in training. (@eickho, #806)
- New parameters limit, datatype for load_word2vec_format(); lockf for intersect_word2vec_format (@gojomo, #817)
- Changed use_lowercase option in word2vec accuracy to case_insensitive to account for case variations in training vocabulary (@jayantj, #804)
- Link to Doc2Vec on airline tweets example in tutorials page (@544895340, #823)
- Small error on Doc2vec notebook tutorial (@charlessutton, #816)
- Bugfix: Full2sparse clipped to use abs value (@tmylk, #811)
- WMD docstring: add tutorial link and query example (@tmylk, #813)
- Annoy integration to speed word2vec and doc2vec similarity. Tutorial update (@droudy, #799, #792)
- Add converter of LDA model between Mallet, Vowpal Wabbit and gensim (@dsquareindia, #798, #766)
- Distributed LDA in different network segments without broadcast (@menshikh-iv, #782)
- Update Corpora_and_Vector_Spaces.ipynb (@megansquire, #772)
- DTM wrapper bug fixes caused by renaming num_words in #755 (@bhargavvader, #770)
- Add LsiModel.docs_processed attribute (@hobson, #763)
- Dynamic Topic Modelling in Python. Google Summer of Code 2016 project. (@bhargavvader, #739, #831)
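The "Full2sparse clipped to use abs value" bugfix in v0.13.2 is about which entries survive when a dense vector is cut down to its top entries. The sketch below shows the idea with plain Python lists; full2sparse_clipped_sketch is a hypothetical name and the code is an illustration of ranking by magnitude, not the library's implementation.

```python
def full2sparse_clipped_sketch(vec, topn, eps=1e-9):
    """Keep the `topn` entries of a dense vector with the largest magnitude.

    Returns (index, value) pairs; entries with |value| <= eps are dropped.
    """
    if topn <= 0:
        return []
    # Rank indices by absolute value so large negative entries are kept too.
    ranked = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    return [(i, vec[i]) for i in ranked[:topn] if abs(vec[i]) > eps]
```

Ranking by signed value instead would discard a strong negative weight such as -0.9 in favour of a weak positive one such as 0.1, which is the kind of behaviour the abs-value fix avoids.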
v0.13.1 Changes
June 22, 2016
- Topic coherence C_v and U_mass (@dsquareindia, #710)
v0.13.0 Changes
June 21, 2016
- Added Distance Metrics to matutils.py (@bhargavvader, #656)
- Tutorials migrated from website to ipynb (@j9chan, #721), (@jesford, #733), (@jesford, #725), (@jesford, #716)
- New doc2vec intro tutorial (@seanlaw, #730)
- Gensim Quick Start Tutorial (@andrewjlm, #727)
- Add export_phrases(sentences) to model Phrases (@hanabi1224, #588)
- SparseMatrixSimilarity returns a sparse matrix if maintain_sparsity is True (@davechallis, #590)
- Added functionality for Topics of Words in document, i.e. dynamic topics. (@bhargavvader, #704)
  - Also included a tutorial which explains the new functionality and document word-topic coloring.
- Made normalization an explicit transformation. Added 'l1' norm support (@dsquareindia, #649)
- Added term-topics API for most probable topic for word in vocab. (@bhargavvader, #706)
- build_vocab takes progress_per parameter for smaller output (@zer0n, #624)
- Control whether to use lowercase for computing word2vec accuracy. (@alantian, #607)
- Easy import of GloVe vectors using Gensim (Manas Ranjan Kar, #625)
  - Allow easy port of GloVe vectors into Gensim
  - Standalone script with command line arguments, compatible with Python>=2.6
  - Usage: python -m gensim.scripts.glove2word2vec -i glove_vectors.txt -o output_word2vec_compatible.txt
- Add similar_by_word() and similar_by_vector() to word2vec (@isohyt, #381)
- Convenience method for similarity of two out of training sentences to doc2vec (@ellolo, #707)
- Dynamic Topic Modelling Tutorial updated with Dynamic Influence Model (@bhargavvader, #689)
- Added function to filter 'n' most frequent words from the dictionary (@abhinavchawla, #718)
- Raise warnings if vocab is single character elements and if alpha is increased in word2vec/doc2vec (@dsquareindia, #705)
- Tests for wikidump (@jonmcoe, #723)
- Mallet wrapper sparse format support (@RishabGoel, #664)
- Doc2vec pre-processing script translated from bash to Python (@andrewjlm, #720)
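The GloVe import entry in v0.13.0 rests on a small format difference: GloVe text files list one word and its vector per line with no header, while the word2vec text format starts with a line giving the number of vectors and their dimensionality. The sketch below shows that conversion on in-memory lines; glove_to_word2vec_text is a hypothetical function written for illustration, not the code of the glove2word2vec script.

```python
def glove_to_word2vec_text(glove_lines):
    """Return word2vec-text lines: a 'count dims' header plus the vectors."""
    # Drop blank lines and trailing newlines; each row is "word v1 v2 ...".
    rows = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not rows:
        raise ValueError("no vectors found")
    dims = len(rows[0].split()) - 1  # first field is the word itself
    return ["{} {}".format(len(rows), dims)] + rows
```

For two 3-dimensional vectors the result begins with the header line "2 3", followed by the original rows unchanged.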