Description
redframes (rectangular data frames) is a data manipulation library for ML and visualization. It is fully interoperable with pandas, compatible with scikit-learn, and works great with matplotlib!
redframes prioritizes syntax over flexibility and scope, and minimizes the number-of-googles-per-lines-of-code™ so that you can focus on the work that matters most.
"What is redframes?" would be the answer to the Jeopardy! clue "A pythonic dplyr".
README
About
redframes (rectangular data frames) is a general purpose data manipulation library that prioritizes syntax, simplicity, and speed (to a solution). Importantly, the library is fully interoperable with pandas, compatible with scikit-learn, and works great with matplotlib.
Install & Import
pip install redframes
import redframes as rf
Quickstart
Copy-and-paste this to get started:
import redframes as rf
df = rf.DataFrame({
    'bear': ['Brown bear', 'Polar bear', 'Asian black bear', 'American black bear', 'Sun bear', 'Sloth bear', 'Spectacled bear', 'Giant panda'],
    'genus': ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda'],
    'weight (male, lbs)': ['300-860', '880-1320', '220-440', '125-500', '60-150', '175-310', '220-340', '190-275'],
    'weight (female, lbs)': ['205-455', '330-550', '110-275', '90-300', '45-90', '120-210', '140-180', '155-220']
})
# | bear | genus | weight (male, lbs) | weight (female, lbs) |
# |:--------------------|:-----------|:---------------------|:-----------------------|
# | Brown bear | Ursus | 300-860 | 205-455 |
# | Polar bear | Ursus | 880-1320 | 330-550 |
# | Asian black bear | Ursus | 220-440 | 110-275 |
# | American black bear | Ursus | 125-500 | 90-300 |
# | Sun bear | Helarctos | 60-150 | 45-90 |
# | Sloth bear | Melursus | 175-310 | 120-210 |
# | Spectacled bear | Tremarctos | 220-340 | 140-180 |
# | Giant panda | Ailuropoda | 190-275 | 155-220 |
(
    df
    .rename({"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
    .gather(["male", "female"], into=("sex", "weight"))
    .split("weight", into=["min", "max"], sep="-")
    .gather(["min", "max"], into=("stat", "weight"))
    .mutate({"weight": lambda row: float(row["weight"])})
    .group(["genus", "sex"])
    .rollup({"weight": ("weight", rf.stat.mean)})
    .spread("sex", using="weight")
    .mutate({"dimorphism": lambda row: round(row["male"] / row["female"], 2)})
    .drop(["male", "female"])
    .sort("dimorphism", descending=True)
)
# | genus | dimorphism |
# |:-----------|-------------:|
# | Ursus | 2.01 |
# | Tremarctos | 1.75 |
# | Helarctos | 1.56 |
# | Melursus | 1.47 |
# | Ailuropoda | 1.24 |
For comparison, here's the equivalent pandas:
import pandas as pd
# df = pd.DataFrame({...})
df = df.rename(columns={"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
df = pd.melt(df, id_vars=['bear', 'genus'], value_vars=['male', 'female'], var_name='sex', value_name='weight')
df[["min", "max"]] = df["weight"].str.split("-", expand=True)
df = df.drop("weight", axis=1)
df = pd.melt(df, id_vars=['bear', 'genus', 'sex'], value_vars=['min', 'max'], var_name='stat', value_name='weight')
df['weight'] = df["weight"].astype('float')
df = df.groupby(["genus", "sex"])["weight"].mean()
df = df.reset_index()
df = pd.pivot_table(df, index=['genus'], columns=['sex'], values='weight')
df = df.reset_index()
df = df.rename_axis(None, axis=1)
df["dimorphism"] = round(df["male"] / df["female"], 2)
df = df.drop(["female", "male"], axis=1)
df = df.sort_values("dimorphism", ascending=False)
df = df.reset_index(drop=True)
# 🤮
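The dimorphism values in the Quickstart output can be cross-checked without any data frame library. The sketch below is not part of redframes; it re-derives the same numbers in plain Python, assuming (as the pipeline does) that each genus/sex weight is the mean of all the range endpoints in that group. The variable names are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# (genus, male range, female range), copied from the Quickstart table
bears = [
    ("Ursus", "300-860", "205-455"),
    ("Ursus", "880-1320", "330-550"),
    ("Ursus", "220-440", "110-275"),
    ("Ursus", "125-500", "90-300"),
    ("Helarctos", "60-150", "45-90"),
    ("Melursus", "175-310", "120-210"),
    ("Tremarctos", "220-340", "140-180"),
    ("Ailuropoda", "190-275", "155-220"),
]

# Pool every min and max weight per (genus, sex): the gather/split/gather steps.
weights = defaultdict(list)
for genus, male, female in bears:
    for sex, span in (("male", male), ("female", female)):
        weights[(genus, sex)].extend(float(w) for w in span.split("-"))

# Mean per group (group + rollup), then the male/female ratio (spread + mutate).
genera = sorted({genus for genus, _, _ in bears})
dimorphism = {
    genus: round(mean(weights[(genus, "male")]) / mean(weights[(genus, "female")]), 2)
    for genus in genera
}

# Largest ratio first, matching sort("dimorphism", descending=True).
for genus, ratio in sorted(dimorphism.items(), key=lambda item: item[1], reverse=True):
    print(genus, ratio)
```

For example, Tremarctos has a male mean of (220 + 340) / 2 = 280 and a female mean of (140 + 180) / 2 = 160, which gives 280 / 160 = 1.75.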
IO
Save, load, and convert rf.DataFrame objects:
# save .csv
rf.save(df, "bears.csv")
# load .csv
df = rf.load("bears.csv")
# convert redframes → pandas
pandas_df = rf.unwrap(df)
# convert pandas → redframes
df = rf.wrap(pandas_df)
Verbs
Verbs are pure and "chain-able" methods that manipulate rf.DataFrame objects. Here is the complete list (see docstrings for examples and more details):
Verb | Description |
---|---|
accumulate ‡ | Run a cumulative sum over a column |
append | Append rows from another DataFrame |
combine | Combine multiple columns into a single column (opposite of split) |
cross | Cross join columns from another DataFrame |
dedupe | Remove duplicate rows |
denix | Remove rows with missing values |
drop | Drop entire columns (opposite of select) |
fill | Fill missing values "down", "up", or with a constant |
filter | Keep rows matching specific conditions |
gather ‡ | Gather columns into rows (opposite of spread) |
group | Prepare groups for compatible verbs‡ |
join | Join columns from another DataFrame |
mutate | Create a new, or overwrite an existing column |
pack ‡ | Collate and concatenate row values for a target column (opposite of unpack) |
rank ‡ | Rank order values in a column |
rename | Rename column keys |
replace | Replace matching values within columns |
rollup ‡ | Apply summary functions and/or statistics to target columns |
sample | Randomly sample any number of rows |
select | Select specific columns (opposite of drop) |
shuffle | Shuffle the order of all rows |
sort | Sort rows by specific columns |
split | Split a single column into multiple columns (opposite of combine) |
spread | Spread rows into columns (opposite of gather) |
take ‡ | Take any number of rows (from the top/bottom) |
unpack | "Explode" concatenated row values into multiple rows (opposite of pack) |
Properties
In addition to all of the verbs there are several properties attached to each DataFrame object:
df["genus"]
# ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda']
df.columns
# ['bear', 'genus', 'weight (male, lbs)', 'weight (female, lbs)']
df.dimensions
# {'rows': 8, 'columns': 4}
df.empty
# False
df.memory
# '2 KB'
df.types
# {'bear': object, 'genus': object, 'weight (male, lbs)': object, 'weight (female, lbs)': object}
matplotlib
rf.DataFrame objects integrate seamlessly with matplotlib:
import redframes as rf
import matplotlib.pyplot as plt
football = rf.DataFrame({
    'position': ['TE', 'K', 'RB', 'WR', 'QB'],
    'avp': [116.98, 131.15, 180, 222.22, 272.91]
})
df = (
    football
    .mutate({"color": lambda row: row["position"] in ["WR", "RB"]})
    .replace({"color": {False: "orange", True: "red"}})
)
plt.barh(df["position"], df["avp"], color=df["color"]);
scikit-learn
rf.DataFrame objects are fully compatible with sklearn functions, estimators, and transformers:
import redframes as rf
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
df = rf.DataFrame({
    "touchdowns": [15, 19, 5, 7, 9, 10, 12, 22, 16, 10],
    "age": [21, 22, 21, 24, 26, 28, 30, 35, 28, 21],
    "mvp": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
})
target = "touchdowns"
y = df[target]
X = df.drop(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
# 0.5083194901655527
print(X_train.take(1))
# rf.DataFrame({'age': [21], 'mvp': [0]})
X_new = rf.DataFrame({'age': [22], 'mvp': [1]})
model.predict(X_new)
# array([19.])