Data Flow Facilitator for Machine Learning (dffml) v0.4.0 Release Notes

Release Date: 2021-02-18
  • Added

    • New model for Anomaly Detection
    • Ability to specify maximum number of contexts running at a time
    • CLI and Python example usage of Custom Neural Network
    • PyTorch loss function entrypoint style loading
    • Custom Neural Network, last layer support for pre-trained models
    • Example usage of sklearn operations
    • Example Flower17 species image classification
    • Config loading ability from CLI using "@" before filename
    • Docstrings and doctestable example for DataFlowSource
    • XGBoost Regression Model
    • Pre-Trained PyTorch torchvision Models
    • Spacy model for NER
    • Ability to rename outputs using GetSingle
    • Tutorial for using NLP operations with models
    • Operations plugin for NLP wrapping spacy and scikit functions
    • Support for default value in a Definition
    • Source for reading images in directories
    • Operations plugin for image preprocessing
    • -pretty flag to list records and predict commands
    • daal4py based linear regression model
    • DataFlowSource can take a config file as dataflow via the CLI.
    • Support for link on conditions in dataflow diagrams
    • edit all command to edit records in bulk
    • Support for Tensorflow 2.2
    • Vowpal Wabbit Models
    • Python 3.8 support
    • binsec branch to operations/binsec
    • Doctestable example for model_predict operation.
    • Doctestable examples to operation/mapping.py
    • shouldi got an operation to run Dependency-check on java code.
    • load and run functions in high level API
    • Doctestable examples to db operations.
    • Source for parsing .ini file formats
    • Tests for noasync high level API.
    • Tests for load and save functions in high level API.
    • Operation inputs and outputs default to empty dict if not given.
    • Ability to export any object with dffml service dev export
    • Complete example for dataflow run cli command
    • Tests for default configs instantiation.
    • Example ffmpeg operation.
    • Operations to deploy docker container on receiving GitHub webhook.
    • New use case Redeploying dataflow on webhook in docs.
    • Documentation for creating Source for new File types taking .ini as an example.
    • New input modes, output modes for HTTP API dataflow registration.
    • Usage example for tfhub text classifier.
    • AssociateDefinition output operation to map definition names to values produced as a result of passing Inputs with those definitions to operations.
    • DataFlows now have a syntax for providing a set of definitions that will override the operations default definition for a given input.
    • Source which modifies record features as they are read from another source. Useful for modifying datasets as they are used with ML commands or editing in bulk.
    • Auto create Definition for the op when they might have a spec, subspec.
    • shouldi use command which detects the language of the codebase given via path to directory or Git repo URL and runs the appropriate static analyzers.
    • Support for entrypoint style loading of operations and seed inputs in dataflow create.
    • Definition for output of the function that op wraps.
    • Expose high level load, run and save functions to noasync.
    • Operation to verify secret for GitHub webhook.
    • Option to modify flow and add config in dataflow create.
    • Ability to use a function as a data source via the op source
    • Make every model's directory property required
    • New model AutoClassifierModel based on AutoSklearn.
    • New model AutoSklearnRegressorModel based on AutoSklearn.
    • Example showing usage of locks in dataflow.
    • -skip flag to service dev install command to let users not install certain core plugins
    • HTTP service got a -redirect flag which allows for URL redirection via a HTTP 307 response
    • ๐Ÿ‘Œ Support for immediate response in HTTP service
    • Daal4py example usage.
    • Gitter chatbot tutorial.
    • Option to run dataflow without sources from cli.
    • Sphinx extension for automated testing of tutorials (consoletest)
    • Example of software portal using DataFlows and HTTP service
    • Retry parameter to Operation. Allows for setting the number of times an operation should be retried before its exception is raised.

  • Changed

    • Renamed -seed to -inputs in dataflow create command
    • Renamed configloader/png to configloader/image and added support for loading JPEG and TIFF file formats
    • Update record __str__ method to output in tabular format
    • Update MNIST use case to normalize image arrays.
    • arg_ notation replaced with CONFIG = ExampleConfig style syntax for parsing all command line arguments.
    • Moved usage/io.rst to docs/tutorials/dataflows/io.rst
    • edit command substituted with edit record
    • Edit on Github button now hidden for plugins.
    • Doctests now run via unittests
    • Every class and function can now be imported from the top level module
    • op attempts to create Definitions for each argument if inputs are not given.
    • Classes now use CONFIG if it has a default for every field and config is None
    • Models now dynamically import third party modules.
    • Memory dataflow classes now use auto args and config infrastructure
    • dffml list records command prints Records as JSON using .export()
    • Feature class in dffml/feature/feature.py initializes a feature object
    • All DefFeatures() functions are substituted with Features()
    • All feature.type() and feature.length() are substituted with feature.type and feature.length
    • FileSource takes pathlib.Path as filename
    • Tensorflow tests re-run themselves up to 6 times to stop them from failing the CI due to their randomly initialized weights making them fail ~2% of the time
    • Any plugin can now be loaded via its entrypoint style path
    • with_features now raises a helpful error message if no records with matching features were found
    • Split out model tutorial into writing the model, and another tutorial for packaging the model.
    • IntegrationCLITestCase creates a new directory and chdir into it for each test
    • Automated testing of Automating Classification tutorial
    • dffml version command now prints git repo hash and if the repo is dirty

  • Fixed

    • export_value now converts numpy array to JSON serializable datatype
    • CSV source overwriting configloaded data to every row
    • Race condition in MemoryRedundancyChecker when more than 4 possible parameter sets for an operation.
    • Typing of config values for numpy parsed docstrings where type should be tuple or list
    • Model predict methods now use SourcesContext.with_features

  • Removed

    • Monitor class and associated tests (unused)
    • DefinedFeature class in dffml/feature/feature.py
    • DefFeature function in dffml/feature/feature.py
    • load_def function in Feature class in dffml/feature/feature.py
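
The Retry parameter listed under Added can be pictured with a small sketch. This is not DFFML's implementation or API; `run_with_retry` and `flaky` are hypothetical names used only for illustration, and the sketch assumes that a retry count of N means up to N further attempts after the first failure, after which the last exception is raised.

```python
import asyncio


async def run_with_retry(func, *args, retry=0, **kwargs):
    """Await func, retrying up to `retry` more times if it raises.

    Hypothetical helper, not part of DFFML. Once the retries are
    used up, the most recent exception propagates to the caller.
    """
    attempt = 0
    while True:
        try:
            return await func(*args, **kwargs)
        except Exception:
            if attempt >= retry:
                # No retries left: let the exception propagate
                raise
            attempt += 1


calls = []


async def flaky():
    """Fail on the first two calls, succeed on the third."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"


# Two retries allow three attempts in total, so the third call succeeds
result = asyncio.run(run_with_retry(flaky, retry=2))
```

With `retry=0` the helper behaves like a plain call: the first exception is raised immediately.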

Previous changes from v0.3.7

  • [0.3.7] - 2020-04-14

    Added

    • IO operations demo and literal_eval operation.
    • Python prompts >>> can now be enabled or disabled for easy copying of code into interactive sessions.
    • Whitespace check now checks .rst and .md files too.
    • GetMulti operation which gets all Inputs of a given definition
    • Python usage example for LogisticRegression and its related tests.
    • Support for async generator operations
    • Example CLI commands and Python code for SLRModel
    • save function in high level API to quickly save all given records to a
      source
    • Ability to configure sources and models for HTTP API from command line when
      starting server
    • Documentation page for command line usage of HTTP API
    • Usage of HTTP API to the quickstart to use trained model

    Changed

    • Renamed "arg" to "plugin".
    • CSV source sorts feature names within headers when saving
    • Moved HTTP service testing code to HTTP service util.testing

    Fixed

    • Exporting plugins
    • Issue parsing string values when using the dataflow run command and
      specifying extra inputs.

    Removed

    • Unused imports