Popularity: 9.5 (Stable)

Activity: 7.5 (Declining)

Stars: 15,236

Watchers: 436

Forks: 4,347

Last Commit: 3 days ago

Description

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.

Code Quality Rank: L3

Programming language: Python

License: GNU Lesser General Public License v3.0 only

Tags: Text Processing, Machine Learning, Scientific/Engineering, Information Analysis, Linguistic, Artificial Intelligence

Latest version: v4.2.0

gensim alternatives and similar packages

Based on the "Machine Learning" category.

tensorflow
Popularity: 10.0, Activity: 10.0, Code Quality Rank: L1
An Open Source Machine Learning Framework for Everyone

Keras
Popularity: 9.9, Activity: 9.9, Code Quality Rank: L2
Deep Learning for humans

scikit-learn
Popularity: 9.9, Activity: 9.9, Code Quality Rank: L3
scikit-learn: machine learning in Python

xgboost
Popularity: 9.8, Activity: 9.6, Code Quality Rank: L1
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

gym
Popularity: 9.8, Activity: 0.0
A toolkit for developing and comparing reinforcement learning algorithms.

PaddlePaddle
Popularity: 9.6, Activity: 10.0, Code Quality Rank: L1
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 (PaddlePaddle): high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)

CNTK
Popularity: 9.6, Activity: 0.0, Code Quality Rank: L1
Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

MLflow
Popularity: 9.5, Activity: 9.9
Open source platform for the machine learning lifecycle

MindsDB
Popularity: 9.5, Activity: 10.0
The platform for customizing AI from enterprise data

Prophet
Popularity: 9.5, Activity: 6.2
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

TFLearn
Popularity: 9.1, Activity: 0.0, Code Quality Rank: L3
Deep learning library featuring a higher-level API for TensorFlow.

dspy
Popularity: 8.9, Activity: 9.9
DSPy: The framework for programming, not prompting, foundation models

NuPIC
Popularity: 8.8, Activity: 0.0, Code Quality Rank: L3
DISCONTINUED. Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

H2O
Popularity: 8.8, Activity: 9.7
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Pyro.ai
Popularity: 8.7, Activity: 8.4
Deep universal probabilistic programming with Python and PyTorch

Surprise
Popularity: 8.4, Activity: 0.0, Code Quality Rank: L4
A Python scikit for building and analyzing recommender systems

srez
Popularity: 8.3, Activity: 0.0, Code Quality Rank: L5
DISCONTINUED. Image super-resolution through deep learning

LightFM
Popularity: 7.9, Activity: 4.8, Code Quality Rank: L4
A Python implementation of LightFM, a hybrid recommendation algorithm.

Pylearn2
Popularity: 7.8, Activity: 0.0, Code Quality Rank: L2
Warning: This project does not have any current developer (see the project's README for details).

skflow
Popularity: 7.6, Activity: 1.3, Code Quality Rank: L4
DISCONTINUED. Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning

PyBrain
Popularity: 7.5, Activity: 0.0, Code Quality Rank: L4
Another Python Machine Learning Library.

Sacred
Popularity: 7.5, Activity: 3.5
Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

Clairvoyant
Popularity: 7.2, Activity: 0.0, Code Quality Rank: L3
Software designed to identify and monitor social/historical cues for short term stock movement

Metrics
Popularity: 6.2, Activity: 0.0
Machine learning evaluation metrics, implemented in Python, R, Haskell, and MATLAB / Octave

python-recsys
Popularity: 6.2, Activity: 0.0, Code Quality Rank: L4
A Python library for implementing a recommender system

karateclub
Popularity: 6.1, Activity: 7.0
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

awesome-embedding-models
Popularity: 6.0, Activity: 0.0
A curated list of awesome embedding models tutorials, projects and communities.

pydeep
Popularity: 5.9, Activity: 0.0, Code Quality Rank: L3
Deep learning in Python

Crab
Popularity: 5.7, Activity: 0.0, Code Quality Rank: L2
Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world of scientific Python packages (numpy, scipy, matplotlib).

hebel
Popularity: 5.0, Activity: 0.0, Code Quality Rank: L2
GPU-Accelerated Deep Learning Library in Python

seqeval
Popularity: 4.6, Activity: 0.0
A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)

adaptive
Popularity: 4.5, Activity: 6.2
📈 Adaptive: parallel active learning of mathematical functions

Xorbits
Popularity: 4.4, Activity: 8.8
Scalable Python DS & ML, in an API-compatible & lightning-fast way.

TrueSkill, the video game rating system
Popularity: 4.2, Activity: 1.4
An implementation of the TrueSkill rating system for Python

pdpipe
Popularity: 3.9, Activity: 0.0
Easy pipelines for pandas DataFrames.

SciKit-Learn Laboratory
Popularity: 3.9, Activity: 8.7
SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.

rwa
Popularity: 3.8, Activity: 0.0, Code Quality Rank: L5
Machine Learning on Sequential Data Using a Recurrent Weighted Average

Feature Forge
Popularity: 3.5, Activity: 0.0, Code Quality Rank: L4
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

nptyping
Popularity: 3.3, Activity: 0.0
💡 Type hints for Numpy and Pandas

Data Flow Facilitator for Machine Learning (dffml)
Popularity: 3.3, Activity: 9.1
The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.

brew
Popularity: 3.2, Activity: 0.0, Code Quality Rank: L4
DISCONTINUED. Multiple Classifier Systems and Ensemble Learning Library in Python.

bodywork
Popularity: 3.1, Activity: 0.0
DISCONTINUED. ML pipeline orchestration and model deployments on Kubernetes.

Robocorp Action Server
Popularity: 3.0, Activity: 9.8
Create 🐍 Python AI Actions and 🤖 Automations, and deploy & operate them anywhere

MLP Classifier
Popularity: 2.8, Activity: 0.0, Code Quality Rank: L4
A handwritten multilayer perceptron classifier using numpy.

OptaPy
Popularity: 2.7, Activity: 5.5
OptaPy is an AI constraint solver for Python to optimize planning and scheduling problems.

redframes
Popularity: 2.7, Activity: 1.4
General Purpose Data Manipulation Library

openskill.py
Popularity: 2.5, Activity: 7.3
Multiplayer Rating System. No Friction.

vowpal_porpoise
Popularity: 2.5, Activity: 0.0, Code Quality Rank: L3
Lightweight Python wrapper for Vowpal Wabbit

omega-ml
Popularity: 1.8, Activity: 8.1
MLOps simplified. From ML Pipeline ⇨ Data Product without the hassle

ChaiPy
Popularity: 1.5, Activity: 0.0
DISCONTINUED. A developer interface for creating advanced chatbots for the Chai app.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

gensim – Topic Modelling in Python

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.

⚠️ Please sponsor Gensim to help sustain this open source project ❤️

Features

- All algorithms are memory-independent with respect to the corpus size: they can process input larger than RAM (streamed, out-of-core).
- Intuitive interfaces:
  - easy to plug in your own input corpus/datastream (trivial streaming API)
  - easy to extend with other Vector Space algorithms (trivial transformation API)
- Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.
- Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.
- Extensive documentation and Jupyter Notebook tutorials.
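The two "trivial" APIs can be sketched in plain Python. The sketch assumes the gensim-style convention that a document is a sparse vector (a list of `(feature_id, weight)` pairs) and that a corpus is any iterable of such vectors. `ToyTfidf` is a hypothetical class written for illustration, not gensim's own `TfidfModel`. Its weighting (count × log2 IDF, followed by unit-length normalisation) is just one common TF-IDF variant.

```python
import math
from typing import Dict, Iterable, Iterator, List, Tuple

# A document is a sparse vector: a list of (feature_id, weight) pairs.
SparseVector = List[Tuple[int, float]]


class ToyTfidf:
    """Hypothetical TF-IDF transformation, written for illustration only.

    Training makes a single streamed pass over the corpus to count document
    frequencies. Applying the model returns a lazy generator, so transformed
    documents are produced one at a time instead of being held in RAM.
    """

    def __init__(self, corpus: Iterable[SparseVector]) -> None:
        self.num_docs = 0
        self.doc_freq: Dict[int, int] = {}
        for doc in corpus:
            self.num_docs += 1
            for feature_id, _ in doc:
                self.doc_freq[feature_id] = self.doc_freq.get(feature_id, 0) + 1

    def _transform(self, doc: SparseVector) -> SparseVector:
        # Weight = term count * log2(N / document frequency); unseen features are skipped.
        weights = [
            (fid, count * math.log2(self.num_docs / self.doc_freq[fid]))
            for fid, count in doc
            if fid in self.doc_freq
        ]
        norm = math.sqrt(sum(w * w for _, w in weights))
        if norm == 0.0:
            return []
        # Scale to unit length and drop features that occur in every document (weight 0).
        return [(fid, w / norm) for fid, w in weights if w != 0.0]

    def __getitem__(self, corpus: Iterable[SparseVector]) -> Iterator[SparseVector]:
        # model[corpus] wraps the input lazily; nothing is computed until iterated.
        return (self._transform(doc) for doc in corpus)


corpus = [
    [(0, 1), (1, 1)],
    [(0, 1), (2, 2)],
    [(0, 1), (1, 1), (3, 1)],
]
model = ToyTfidf(corpus)
transformed = list(model[corpus])
```

Feature 0 appears in every document, so its IDF is zero and it disappears from all transformed vectors. Because `model[corpus]` returns a generator, the transformation can be chained onto a corpus read from disk one document at a time. Note that a bare generator can be consumed only once.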

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
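Here is a minimal sketch of the Vector Space Model in plain Python. It is an illustration of the idea, not gensim's API. Each document becomes a sparse vector of token counts, and two documents are compared by the cosine of the angle between their vectors. The lowercase whitespace tokenizer and the three toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a document to a sparse vector: token -> raw count (naive whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "graph of trees and paths",
]
query = bag_of_words("computer human interface")
scores = [cosine(query, bag_of_words(d)) for d in docs]
```

The query shares three terms with the first document, one with the second and none with the third, so the scores come out in that order.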

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended that you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as MKL, ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On macOS, NumPy picks up its vecLib BLAS automatically, so you don’t need to do anything special.

Install the latest version of gensim:

    pip install --upgrade gensim

Or, if you have instead downloaded and unzipped the source tar.gz package, run this from inside the unpacked directory:

    pip install .

For alternative modes of installation, see the documentation.

Gensim is being continuously tested under all supported Python versions. Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries by means of its dependency on NumPy. So while gensim's top-level code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).
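To see the principle in isolation, here is a sketch that uses NumPy directly rather than gensim. Computing all pairwise cosine similarities for a set of document vectors is a single matrix product, which NumPy typically hands to the BLAS library it was built against. The array sizes and the random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Hypothetical data: 100 documents, each a dense 50-dimensional vector
# (e.g. topic weights produced by a model such as LSI).
docs = rng.random((100, 50))

# Normalise every row to unit length, so dot products become cosine similarities.
unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)

# A single matrix product yields all 100 x 100 pairwise similarities.
# For float64 arrays NumPy typically delegates this to its BLAS library.
sims_blas = unit @ unit.T

# The same computation as pure-Python loops, for comparison (much slower).
rows = unit.tolist()
sims_loop = [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]
```

You can time the two versions (for example with `timeit`) to see the gap on your machine. Its size depends on which BLAS library NumPy uses and how many threads that library is configured to run.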

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
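Here is a minimal sketch of that generator-based style. It is plain Python, not gensim's code: a corpus class whose `__iter__` is a generator re-reads its source on every pass instead of loading it into memory. `StreamedCorpus`, the one-document-per-line format and the in-memory `io.StringIO` source are all assumptions for the example. For a real file you could pass something like `lambda: open("corpus.txt", encoding="utf-8")`, where the file name is hypothetical.

```python
import io
from collections import Counter
from typing import Callable, Dict, Iterator, List, TextIO, Tuple


class StreamedCorpus:
    """Hypothetical re-iterable corpus that streams documents from a text source.

    Each line is one document. Every pass re-opens the source and yields one
    bag-of-words vector at a time, so only the current line (plus the growing
    token -> id mapping) is held in memory, however large the input is.
    """

    def __init__(self, open_source: Callable[[], TextIO]) -> None:
        self.open_source = open_source  # zero-argument callable returning a fresh text stream
        self.token2id: Dict[str, int] = {}

    def __iter__(self) -> Iterator[List[Tuple[int, int]]]:
        with self.open_source() as stream:
            for line in stream:  # text streams yield lines lazily
                counts = Counter(line.lower().split())
                # Assign new ids on first sight; emit (token_id, count) pairs sorted by id.
                yield sorted(
                    (self.token2id.setdefault(token, len(self.token2id)), n)
                    for token, n in counts.items()
                )


text = "the cat sat\nthe dog sat on the mat\n"
corpus = StreamedCorpus(lambda: io.StringIO(text))
first_pass = list(corpus)
second_pass = list(corpus)  # works because __iter__ re-opens the source each time
```

Wrapping the generator in a class matters. A bare generator is exhausted after one pass, whereas iterative algorithms typically need to sweep over the corpus several times.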

Support

For commercial support, please see Gensim sponsorship.

Ask open-ended questions on the public Gensim Mailing List.

Raise bugs on GitHub, but please make sure you follow the issue template. Issues that are not bugs or fail to provide the requested details will be closed without inspection.

Adopters

Company	Industry	Use of Gensim
RARE Technologies	ML & NLP consulting	Creators of Gensim – this is us!
Amazon	Retail	Document similarity.
National Institutes of Health	Health	Processing grants and publications with word2vec.
Cisco Security	Security	Large-scale fraud detection.
Mindseye	Legal	Similarities in legal documents.
Channel 4	Media	Recommendation engine.
Talentpair	HR	Candidate matching in high-touch recruiting.
Juju	HR	Provide non-obvious related job suggestions.
Tailwind	Media	Post interesting and relevant content to Pinterest.
Issuu	Media	Gensim's LDA module lies at the very core of the analysis we perform on each uploaded publication to figure out what it's all about.
Search Metrics	Content Marketing	Gensim word2vec used for entity disambiguation in Search Engine Optimisation.
12K Research	Media	Document similarity analysis on media articles.
Stillwater Supercomputing	Hardware	Document comprehension and association with word2vec.
SiteGround	Web hosting	An ensemble search engine which uses different embedding models and similarities, including word2vec, WMD, and LDA.
Capital One	Finance	Topic modeling for customer complaints exploration.

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

    @inproceedings{rehurek_lrec,
        title = {{Software Framework for Topic Modelling with Large Corpora}},
        author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
        booktitle = {{Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks}},
        pages = {45--50},
        year = 2010,
        month = May,
        day = 22,
        publisher = {ELRA},
        address = {Valletta, Malta},
        note = {\url{http://is.muni.cz/publication/884893/en}},
        language = {English}
    }
