Description
Lightweight python wrapper for vowpal_wabbit.
Why: Scalable, blazingly fast machine learning.
README
vowpal_porpoise
Install
- Install vowpal_wabbit. Clone and run
make
- Install cython.
pip install cython
- Clone vowpal_porpoise
- Run the following to install:
python setup.py install
Now you can do: import vowpal_porpoise
from Python.
Examples
Standard Interface
Linear regression with l1 penalty:
from vowpal_porpoise import VW

# Initialize the model
vw = VW(moniker='test',    # a name for the model
        passes=10,         # vw arg: passes
        loss='quadratic',  # vw arg: loss
        learning_rate=10,  # vw arg: learning_rate
        l1=0.01)           # vw arg: l1

# Inside the with training() block a vw process will be
# open to communication
with vw.training():
    for instance in ['1 |big red square',
                     '0 |small blue circle']:
        vw.push_instance(instance)
    # here stdin will close
# here the vw process will have finished

# Inside the with predicting() block we can stream instances and
# acquire their labels
with vw.predicting():
    for instance in ['1 |large burnt sienna rhombus',
                     '0 |little teal oval']:
        vw.push_instance(instance)

# Read the predictions like this:
predictions = list(vw.read_predictions_())
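The instance strings above follow vowpal_wabbit's plain-text input format: a label, a '|', then whitespace-separated feature tokens. A minimal sketch of a helper that builds such lines from Python data follows. Note that to_vw_line is a hypothetical name, not part of vowpal_porpoise, and it only covers the simple unnamespaced, unweighted form used in these examples.

```python
def to_vw_line(label, tokens):
    """Build a line like '1 |big red square' (hypothetical helper, not a vowpal_porpoise API)."""
    cleaned = []
    for tok in tokens:
        tok = str(tok)
        # '|' and ':' are special in the VW text format, and whitespace
        # separates features, so reject tokens that contain any of them.
        if any(ch in tok for ch in '|: \t\n'):
            raise ValueError('unsupported character in feature token: %r' % tok)
        cleaned.append(tok)
    return '%s |%s' % (label, ' '.join(cleaned))


instances = [to_vw_line(1, ['big', 'red', 'square']),
             to_vw_line(0, ['small', 'blue', 'circle'])]
```

The resulting strings can be passed to vw.push_instance() inside a training() or predicting() block exactly like the hand-written literals.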
L-BFGS with a rank-5 approximation:
from vowpal_porpoise import VW

# Initialize the model
vw = VW(moniker='test_lbfgs',  # a name for the model
        passes=10,             # vw arg: passes
        lbfgs=True,            # turn on lbfgs
        mem=5)                 # lbfgs rank
Latent Dirichlet Allocation with 100 topics:
from vowpal_porpoise import VW

# Initialize the model
vw = VW(moniker='test_lda',  # a name for the model
        passes=10,           # vw arg: passes
        lda=100,             # turn on lda
        minibatch=100)       # set the minibatch size
Scikit-learn Interface
vowpal_porpoise also ships with an interface into scikit-learn, which allows awesome experiment-level stuff like cross-validation:
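# Note: these module paths match the older scikit-learn releases this README
# targets; later releases moved both classes to sklearn.model_selection.
from sklearn.cross_validation import StratifiedKFold
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import f1_score
from vowpal_porpoise.sklearn import VW_Classifier

# parameters (the grid to search), X_train and y_train are assumed to be
# defined by the caller; see example_sklearn.py.
GridSearchCV(
    VW_Classifier(loss='logistic', moniker='example_sklearn',
                  passes=10, silent=True, learning_rate=10),
    param_grid=parameters,
    score_func=f1_score,
    cv=StratifiedKFold(y_train),
).fit(X_train, y_train)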
Check out example_sklearn.py for more details
Library Interface (DISABLED as of 2013-08-12)
Via the VW interface:
with vw.predicting_library():
    for instance in ['1 |large burnt sienna rhombus',
                     '1 |little teal oval']:
        prediction = vw.push_instance(instance)
Now the predictions are returned directly to the parent process, rather than having to read from disk.
See examples/example1.py for more details.
Alternatively you can use the raw library interface:
import vw_c
vw = vw_c.VW("--loss=quadratic --l1=0.01 -f model")
vw.learn("1 |this is a positive example")
vw.learn("0 |this is a negative example")
vw.finish()
The raw library interface currently does not support multiple passes, due to some limitations in the underlying vw C code.
Need more examples?
- example1.py: SimpleModel class wrapper around vowpal_porpoise (both standard and library flavors)
- example_library.py: Demonstrates the low-level vw library wrapper, classifying lines of Alice in Wonderland vs. Through the Looking-Glass.
Why
vowpal_wabbit is insanely fast and scalable. vowpal_porpoise is slower, but only during the initial training pass. Once the data has been properly cached it will idle while vowpal_wabbit does all the heavy lifting.
Furthermore, vowpal_porpoise was designed to be lightweight and not to get in the way of vowpal_wabbit's scalability, e.g. it allows distributed learning via --nodes and does not require data to be batched in memory. In our research work we use vowpal_porpoise on an 80-node cluster running over multiple terabytes of data.
The main benefit of vowpal_porpoise is allowing rapid prototyping of new models and feature extractors. We found that we had been doing this in an ad-hoc way using python scripts to shuffle around massive gzipped text files, so we just closed the loop and made vowpal_wabbit a python library.
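Because instances are pushed one at a time, a large gzipped training file can be streamed rather than loaded into memory. A minimal sketch, assuming one vw-formatted instance per line; iter_instances and the file name in the usage comment are hypothetical, not part of vowpal_porpoise.

```python
import gzip


def iter_instances(path):
    """Lazily yield non-empty lines from a gzipped text file, one at a time."""
    with gzip.open(path, 'rt') as handle:
        for line in handle:
            line = line.rstrip('\n')
            if line:
                yield line


# Usage with the training block shown earlier (requires vowpal_porpoise):
# with vw.training():
#     for instance in iter_instances('train.vw.gz'):  # hypothetical file name
#         vw.push_instance(instance)
```

Since iter_instances is a generator, only one line is held in memory at a time regardless of file size.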
How it works
It wraps the vw binary in a subprocess, using stdin to push data and temporary files to pull predictions. Why not use the prediction labels vw provides on stdout? It turns out that the Python GIL basically makes streaming in and out of a process (even asynchronously) painfully difficult. If you know of a clever way to get around this, please email me. In other languages (e.g. in a forthcoming Scala wrapper) this is not an issue.
Alternatively, you can use a pure API call (vw_c, wrapping libvw) for prediction.
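The subprocess pattern described above (push through stdin, pull from a temporary file) can be sketched roughly as follows. This is an illustration only, not vowpal_porpoise's actual implementation: the child process is a small Python stand-in for the vw binary that writes a feature count per instance, and predict_via_tempfile is a hypothetical name.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the vw binary (an assumption for illustration): reads instances
# on stdin and writes one "prediction" per line to the file named in argv[1].
# Here the prediction is just the feature count; real vw writes model outputs.
CHILD = r"""
import sys
with open(sys.argv[1], 'w') as out:
    for line in sys.stdin:
        features = line.split('|', 1)[1].split()
        out.write('%d\n' % len(features))
"""


def predict_via_tempfile(instances):
    fd, path = tempfile.mkstemp(suffix='.predictions')
    os.close(fd)
    try:
        proc = subprocess.Popen([sys.executable, '-c', CHILD, path],
                                stdin=subprocess.PIPE, text=True)
        for instance in instances:
            proc.stdin.write(instance + '\n')  # push data through stdin
        proc.stdin.close()                     # signal end of input
        proc.wait()                            # wait for the child to finish
        with open(path) as handle:             # pull predictions from the file
            return [line.strip() for line in handle]
    finally:
        os.remove(path)


predictions = predict_via_tempfile(['1 |large burnt sienna rhombus',
                                    '0 |little teal oval'])
```

Writing only to the child's stdin and reading results from a file after it exits avoids having to read and write the same process's pipes concurrently.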
Contact
Joseph Reisinger @josephreisinger
Contributors
- Austin Waters
- Joseph Reisinger
- Daniel Duckworth
License
Apache 2.0