awesome-embedding-models
A curated list of awesome embedding model tutorials, projects, and communities. Please feel free to open pull requests to add links.
Papers
Word Embeddings
Word2vec, GloVe, FastText
- Efficient Estimation of Word Representations in Vector Space (2013), T. Mikolov et al. [pdf]
- Distributed Representations of Words and Phrases and their Compositionality (2013), T. Mikolov et al. [pdf]
- word2vec Parameter Learning Explained (2014), Xin Rong [pdf]
- word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method (2014), Yoav Goldberg, Omer Levy [pdf]
- GloVe: Global Vectors for Word Representation (2014), J. Pennington et al. [pdf]
- Improving Word Representations via Global Context and Multiple Word Prototypes (2012), E. H. Huang et al. [pdf]
- Enriching Word Vectors with Subword Information (2016), P. Bojanowski et al. [pdf]
- Bag of Tricks for Efficient Text Classification (2016), A. Joulin et al. [pdf]
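The subword idea behind "Enriching Word Vectors with Subword Information" can be illustrated with a short sketch. The paper represents a word by its character n-grams (with `<` and `>` as boundary markers, for n between 3 and 6) plus the whole marked word. The function name `char_ngrams` below is hypothetical and is not part of the fastText library; this is only a rough illustration of the n-gram extraction step, not of the training procedure.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, including boundary markers.

    Hypothetical helper for illustration; not the fastText API.
    """
    marked = f"<{word}>"  # "<" and ">" mark the start and end of the word
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


# Trigrams only, using the example word from the paper.
trigrams = char_ngrams("where", min_n=3, max_n=3)
print(trigrams)
```

In the paper, a word vector is the sum of the vectors of its n-grams, which is what allows the model to produce vectors for words it has not seen during training.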
Language Model
- Semi-supervised sequence tagging with bidirectional language models (2017), Peters, Matthew E., et al. [pdf]
- Deep contextualized word representations (2018), Peters, Matthew E., et al. [pdf]
- Contextual String Embeddings for Sequence Labeling (2018), Akbik, Alan, Duncan Blythe, and Roland Vollgraf. [pdf]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018), J. Devlin et al. [pdf]
Embedding Enhancement
- Sentence Embedding: Learning Semantic Sentence Embeddings using Pair-wise Discriminator (2018), Patro et al. [Project Page] [Paper]
- Retrofitting Word Vectors to Semantic Lexicons (2014), M. Faruqui et al. [pdf]
- Better Word Representations with Recursive Neural Networks for Morphology (2013), T. Luong et al. [pdf]
- Dependency-Based Word Embeddings (2014), Omer Levy, Yoav Goldberg [pdf]
- Not All Neural Embeddings are Born Equal (2014), F. Hill et al. [pdf]
- Two/Too Simple Adaptations of Word2Vec for Syntax Problems (2015), W. Ling [pdf]
Comparing count-based vs predict-based methods
- Linguistic Regularities in Sparse and Explicit Word Representations (2014), Omer Levy, Yoav Goldberg [pdf]
- Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors (2014), M. Baroni [pdf]
- Improving Distributional Similarity with Lessons Learned from Word Embeddings (2015), Omer Levy [pdf]
Evaluation, Analysis
- Evaluation methods for unsupervised word embeddings (2015), T. Schnabel [pdf]
- Intrinsic Evaluation of Word Vectors Fails to Predict Extrinsic Performance (2016), B. Chiu [pdf]
- Problems With Evaluation of Word Embeddings Using Word Similarity Tasks (2016), M. Faruqui [pdf]
- Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure (2016), Oded Avraham, Yoav Goldberg [pdf]
- Evaluating Word Embeddings Using a Representative Suite of Practical Tasks (2016), N. Nayak [pdf]
Phrase, Sentence and Document Embeddings
Sentence
- Skip-Thought Vectors
- A Simple but Tough-to-Beat Baseline for Sentence Embeddings
- An efficient framework for learning sentence representations
- Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
- Universal Sentence Encoder
Document
Sense Embeddings
- SENSEMBED: Learning Sense Embeddings for Word and Relational Similarity
- Multi-Prototype Vector-Space Models of Word Meaning
Neural Language Models
- Recurrent neural network based language model
- A Neural Probabilistic Language Model
- Linguistic Regularities in Continuous Space Word Representations
Researchers
Courses and Lectures
Datasets
Training
Evaluation
- SemEval-2012 Task 2
- WordSimilarity-353
- Stanford's Contextual Word Similarities (SCWS)
- Stanford Rare Word (RW) Similarity Dataset
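Word similarity datasets such as WordSimilarity-353 pair two words with a human rating. A common way to use them is to compare the model's cosine similarities against the human ratings with Spearman's rank correlation. The sketch below shows that procedure in plain Python. The function names, the two-dimensional vectors, and the ratings are all made up for illustration and are not taken from any of the datasets above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(vectors, pairs):
    """Correlate model similarities with human ratings, skipping OOV pairs."""
    model_scores, human_scores = [], []
    for w1, w2, rating in pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)


# Toy data (assumed for illustration only).
toy_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
toy_pairs = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 2.0),
    ("dog", "truck", 3.0),
]
score = evaluate(toy_vectors, toy_pairs)
print(score)
```

Skipping pairs with out-of-vocabulary words is one convention among several; reported scores are only comparable when the same convention is used.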
Pre-Trained Language Models
Below are pre-trained ELMo models. Adding ELMo to existing NLP systems significantly improved the state of the art on every task considered in the ELMo paper (Deep contextualized word representations).
Below are pre-trained sent2vec models.
Pre-Trained Word Vectors
Convenient downloader for pre-trained word vectors:
Links for pre-trained word vectors:
- Word2vec pretrained vectors (English only)
- Word2vec pretrained vectors for 30+ languages
- FastText pretrained vectors for 157 languages
- FastText pretrained vectors for Japanese with NEologd
- word vectors trained by GloVe
- Dependency-Based Word Embeddings
- Meta-Embeddings
- Lex-Vec
- Huang et al. (2012)'s embeddings (HSMN+csmRNN)
- Collobert et al. (2011)'s embeddings (CW+csmRNN)
- BPEmb: subword embeddings for 275 languages
- Wikipedia2Vec: pretrained word and entity embeddings for 12 languages
- word2vec-slim
- BioWordVec: fastText pretrained vectors for biomedical text
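Many of the vectors linked above are distributed in the word2vec text format: a header line with the vocabulary size and dimension, followed by one line per word with its space-separated components. The sketch below reads that format using only the standard library. The function name `load_text_vectors` and the in-memory sample are hypothetical. It assumes a header line is present, which is not the case for the original GloVe text files, and it does not handle the binary word2vec format.

```python
import io


def load_text_vectors(handle, limit=None):
    """Read word2vec-style text vectors from an open text file.

    Returns (vectors, dim), where vectors maps each word to a list of floats.
    Hypothetical helper for illustration only.
    """
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocab_size> <dim>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break  # stop early to keep memory use small
    return vectors, dim


# Tiny in-memory sample standing in for a downloaded file.
sample = io.StringIO("3 2\nking 0.5 0.1\nqueen 0.4 0.2\napple -0.3 0.9\n")
vectors, dim = load_text_vectors(sample)
print(dim, sorted(vectors))
```

For a real file, pass a handle such as `open(path, encoding="utf-8")`, where `path` points to a downloaded text-format vectors file. The `limit` argument is useful for a quick look at large files, since such files are commonly sorted by descending word frequency.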
<!--
Articles
-->