Data Flow Facilitator for Machine Learning (dffml) v0.4.0 Release Notes
Release Date: 2021-02-18
### Added

- New model for Anomaly Detection
- Ability to specify maximum number of contexts running at a time
- CLI and Python example usage of Custom Neural Network
- PyTorch loss function entrypoint style loading
- Custom Neural Network, last layer support for pre-trained models
- Example usage of sklearn operations
- Example Flower17 species image classification
- Config loading ability from the CLI using "@" before filename
- Docstrings and doctestable example for DataFlowPreprocessSource
- XGBoost Regression Model
- Pre-Trained PyTorch torchvision Models
- Spacy model for NER
- Ability to rename outputs using GetSingle
- Tutorial for using NLP operations with models
- Operations plugin for NLP wrapping spacy and scikit functions
- Support for default value in a Definition
- Source for reading images in directories
- Operations plugin for image preprocessing
- `-pretty` flag to `list records` and `predict` commands
- daal4py based linear regression model
- DataFlowPreprocessSource can take a config file as dataflow via the CLI.
- Support for link on conditions in dataflow diagrams
- `edit all` command to edit records in bulk
- Support for Tensorflow 2.2
- Vowpal Wabbit Models
- Python 3.8 support
- binsec branch to `operations/binsec`
- Doctestable example for the `model_predict` operation
- Doctestable examples for `operation/mapping.py`
- shouldi got an operation to run Dependency-Check on Java code
- `load` and `run` functions in the high level API (see the sketch after this list)
- Doctestable examples for `db` operations
- Source for parsing `.ini` file formats
- Tests for noasync high level API
- Tests for load and save functions in high level API
- `Operation` inputs and outputs default to empty `dict` if not given
- Ability to export any object with `dffml service dev export`
- Complete example for the dataflow run CLI command
- Tests for default configs instantiation
- Example ffmpeg operation.
- Operations to deploy docker container on receiving GitHub webhook
- New use case in docs: Redeploying dataflow on webhook
- Documentation for creating a Source for new file types, taking `.ini` as an example
- New input modes and output modes for HTTP API dataflow registration
- Usage example for tfhub text classifier.
- `AssociateDefinition` output operation to map definition names to values produced as a result of passing Inputs with those definitions to operations
- DataFlows now have a syntax for providing a set of definitions that will override the operation's default definition for a given input
- Source which modifies record features as they are read from another source. Useful for modifying datasets as they are used with ML commands or editing in bulk.
- Auto create `Definition` for the `op` when they might have a spec or subspec
- `shouldi use` command which detects the language of the codebase given via path to directory or Git repo URL and runs the appropriate static analyzers
- Support for entrypoint style loading of operations and seed inputs in `dataflow create`
- Definition for the output of the function that `op` wraps
- Expose high level load, run and save functions to noasync
- Operation to verify secret for GitHub webhook.
- Option to modify flow and add config in `dataflow create`
- Ability to use a function as a data source via the `op` source
- Make every model's directory property required
- New model AutoClassifierModel based on `AutoSklearn`
- New model AutoSklearnRegressorModel based on `AutoSklearn`
- Example showing usage of locks in dataflow
- `-skip` flag to `service dev install` command to let users not install certain core plugins
- HTTP service got a `-redirect` flag which allows for URL redirection via an HTTP 307 response
- Support for immediate response in HTTP service
- Daal4py example usage.
- Gitter chatbot tutorial.
- Option to run dataflow without sources from the CLI
- Sphinx extension for automated testing of tutorials (consoletest)
- Example of software portal using DataFlows and HTTP service
- Retry parameter to `Operation`. Allows for setting the number of times an operation should be retried before its exception is raised (a sketch follows this list)
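
The new high level helpers can be exercised in only a few lines. The snippet below is a minimal sketch, assuming `load` takes a source instance and asynchronously yields its records, and that `data.csv` is an existing file (the name is made up):

```python
import asyncio

from dffml import load
from dffml.source.csv import CSVSource


async def main():
    # Iterate over every record stored in the source (file name is made up).
    async for record in load(CSVSource(filename="data.csv")):
        print(record.key, record.features())


asyncio.run(main())
```

`run` follows the same pattern but takes a `DataFlow` plus `Input` objects and yields results per context; synchronous counterparts of these helpers live under `dffml.noasync`, per the noasync entry above.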
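
The retry behaviour described in the last entry can be sketched as follows, assuming the new `Operation` field is exposed through the `op` decorator as a `retry` keyword (the keyword name is inferred from the entry, not confirmed):

```python
from dffml import op


# Hypothetical flaky operation: the orchestrator should re-run it up to
# `retry` times before the exception is finally allowed to propagate.
@op(retry=5)
async def unreliable_lookup(key: str) -> str:
    # Stand-in for a call that can fail transiently.
    raise ConnectionError(f"temporary failure looking up {key!r}")
```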

### Changed

- Renamed `-seed` to `-inputs` in `dataflow create` command
- Renamed configloader/png to configloader/image and added support for loading JPEG and TIFF file formats
- Update record `__str__` method to output in tabular format
- Update MNIST use case to normalize image arrays
- `arg_` notation replaced with `CONFIG = ExampleConfig` style syntax for parsing all command line arguments (see the sketch after this list)
- Moved usage/io.rst to docs/tutorials/dataflows/io.rst
- `edit` command substituted with `edit record`
- Edit on GitHub button now hidden for plugins
- Doctests now run via unittests
- Every class and function can now be imported from the top level module
- `op` attempts to create `Definition`s for each argument if `inputs` are not given (see the sketch after this list)
- Classes now use `CONFIG` if it has a default for every field and `config` is `None`
- Models now dynamically import third party modules.
- Memory dataflow classes now use auto args and config infrastructure
- `dffml list records` command prints Records as JSON using `.export()`
- Feature class in `dffml/feature/feature.py` initializes a feature object
- All DefFeatures() functions are substituted with Features()
- All feature.type() and feature.lenght() are substituted with feature.type and feature.length
- FileSource takes pathlib.Path as filename
- Tensorflow tests re-run themselves up to 6 times to stop them from failing the CI due to their randomly initialized weights making them fail ~2% of the time
- Any plugin can now be loaded via its entrypoint style path
- `with_features` now raises a helpful error message if no records with matching features were found
- Split out model tutorial into writing the model, and another tutorial for packaging the model
- IntegrationCLITestCase creates a new directory and chdirs into it for each test
- Automated testing of the Automating Classification tutorial
- `dffml version` command now prints the git repo hash and whether the repo is dirty
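
The `CONFIG = ExampleConfig` style mentioned above can be illustrated with a short sketch; the plugin and field names are made up, the point being that command line arguments are now generated from a `@config` dataclass referenced via `CONFIG` rather than from `arg_` attributes:

```python
from dffml import config, field


@config
class ExampleConfig:
    # Each field becomes a command line argument / config option.
    directory: str = field("Directory to store state in")
    batch_size: int = field("Batch size to use", default=32)


class Example:
    # Plugins reference their config dataclass instead of declaring
    # individual arg_* attributes.
    CONFIG = ExampleConfig
```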
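
Similarly, a minimal sketch of `op` inferring `Definition`s from type annotations when `inputs` and `outputs` are omitted; applying `@op` bare and reading the generated `Operation` from the `.op` attribute are assumptions based on common DFFML usage:

```python
from dffml import op


@op
async def multiply(first: int, second: int) -> int:
    # With no inputs/outputs given, Definitions are inferred from the
    # `first: int`, `second: int`, and `-> int` annotations.
    return first * second


# The generated Operation, with its auto-created Definitions, is attached
# to the decorated function.
print(multiply.op.inputs, multiply.op.outputs)
```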

### Fixed

- `export_value` now converts numpy arrays to a JSON serializable datatype
- CSV source overwriting configloaded data to every row
- Race condition in `MemoryRedundancyChecker` when more than 4 possible parameter sets for an operation
- Typing of config values for numpy parsed docstrings where type should be tuple or list
- Model predict methods now use `SourcesContext.with_features`

### Removed

- Monitor class and associated tests (unused)
- DefinedFeature class in `dffml/feature/feature.py`
- DefFeature function in `dffml/feature/feature.py`
- load_def function in Feature class in `dffml/feature/feature.py`

Previous changes from v0.3.7

[0.3.7] - 2020-04-14

### Added
- IO operations demo and `literal_eval` operation
- Python prompts `>>>` can now be enabled or disabled for easy copying of code into interactive sessions
- Whitespace check now checks .rst and .md files too
- `GetMulti` operation which gets all Inputs of a given definition
- Python usage example for LogisticRegression and its related tests
- Support for async generator operations
- Example CLI commands and Python code for `SLRModel`
- `save` function in high level API to quickly save all given records to a source (see the sketch after this list)
- Ability to configure sources and models for HTTP API from command line when starting server
- Documentation page for command line usage of HTTP API
- Usage of the HTTP API added to the quickstart to use the trained model
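
A minimal sketch of the `save` helper mentioned above, assuming it lives in `dffml.high_level` in this release, accepts a source followed by the records to store, and that the usual CSV file source options apply (file name and feature values are made up):

```python
import asyncio

from dffml.high_level import save
from dffml.record import Record
from dffml.source.csv import CSVSource


async def main():
    # Store two illustrative records into a CSV file.
    await save(
        CSVSource(filename="people.csv", allowempty=True, readwrite=True),
        Record("alice", data={"features": {"age": 30}}),
        Record("bob", data={"features": {"age": 40}}),
    )


asyncio.run(main())
```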

### Changed
- Renamed `"arg"` to `"plugin"`
- CSV source sorts feature names within headers when saving
- Moved HTTP service testing code to HTTP service `util.testing`

### Fixed
- Exporting plugins
- Issue parsing string values when using the `dataflow run` command and specifying extra inputs

### Removed
- Unused imports