Kedro/CHANGELOG and Kedro Releases

v0.16.6 Changes
October 23, 2020
Major features and improvements
- ➕ Added documentation focused on deployment to single-machine and distributed environments; the series covers Docker, Argo, Prefect, Kubeflow, AWS Batch and AWS SageMaker, and extends our section on Databricks.
- ➕ Added kedro-starter-spaceflights alias for generating a project: kedro new --starter spaceflights.
🐛 Bug fixes and other changes
- 🛠 Fixed TypeError when converting dict inputs to a node made from a wrapped partial function.
- PartitionedDataSet improvements:
  - Added support for passing arguments to the underlying filesystem.
- 👌 Improved handling of non-ASCII word characters in dataset names.
  - For example, a dataset named jalapeño will be accessible as DataCatalog.datasets.jalapeño rather than DataCatalog.datasets.jalape__o.
- 🛠 Fixed kedro install for an Anaconda environment defined in environment.yml.
- 🛠 Fixed backwards compatibility with templates generated with Kedro versions older than 0.16.5. You no longer need to update .kedro.yml to use kedro lint and kedro jupyter notebook convert.
- 👌 Improved documentation.
- ➕ Added documentation using MinIO with Kedro.
- 👌 Improved error messages for incorrect parameters passed into a node.
- 🛠 Fixed issue with saving a TensorFlowModelDataset in the HDF5 format with versioning enabled.
- Added missing run_result argument in after_pipeline_run Hooks spec.
- 🛠 Fixed a bug in IPython script that was causing context hooks to be registered twice. To apply this fix to a project generated with an older Kedro version, apply the same changes made in this PR to your 00-kedro-init.py file.
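The dataset-name change above can be illustrated with Python's `re` module. This is a minimal sketch, not Kedro's source: `legacy_attr_name` and `attr_name` are hypothetical helpers, assuming the 0.16.2 rule replaced runs of characters outside `[0-9a-zA-Z_]` with `__`, while the 0.16.6 rule replaces only runs of non-word characters (`\W`). In Python 3, `\W` in a `str` pattern is Unicode-aware, so a letter such as `ñ` survives.

```python
import re


def legacy_attr_name(name: str) -> str:
    # Hypothetical pre-0.16.6 rule: anything outside ASCII letters, digits
    # and underscore is collapsed into "__".
    return re.sub(r"[^0-9a-zA-Z_]+", "__", name)


def attr_name(name: str) -> str:
    # Hypothetical 0.16.6 rule: only non-word characters are replaced.
    # For str patterns \W is Unicode-aware, so "ñ" counts as a word character.
    return re.sub(r"\W+", "__", name)


print(legacy_attr_name("jalapeño"))  # jalape__o
print(attr_name("jalapeño"))         # jalapeño
print(attr_name("cars@spark"))       # cars__spark
```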
👍 Thanks for supporting contributions

Deepyaman Datta, Bhavya Merchant, Lovkush Agarwal, Varun Krishna S, Sebastian Bertoli, noklam, Daniel Petti, Waylon Walker
v0.16.5 Changes
September 09, 2020
🚀 Release 0.16.5
v0.16.4 Changes
July 30, 2020
🚀 Release 0.16.4

Major features and improvements
- 🔌 Enabled auto-discovery of hooks implementations coming from installed plugins.
🐛 Bug fixes and other changes
- 🛠 Fixed a bug when using ParallelRunner on Windows.
- Modified GBQTableDataSet to load customised results using customised queries from Google BigQuery tables.
- 📚 Documentation improvements.
👍 Thanks for supporting contributions

Ajay Bisht, Vijay Sajjanar, Deepyaman Datta, Sebastian Bertoli, Shahil Mawjee, Louis Guitton, Emanuel Ferm
v0.16.3 Changes
July 13, 2020
🚀 Release 0.16.3

v0.16.2 Changes

June 15, 2020

Major features and improvements

➕ Added the following new datasets.

| Type | Description | Location |
| --- | --- | --- |
| `pandas.AppendableExcelDataSet` | Works with `Excel` file opened in append mode | `kedro.extras.datasets.pandas` |
| `tensorflow.TensorFlowModelDataset` | Works with `TensorFlow` models using TensorFlow 2.X | `kedro.extras.datasets.tensorflow` |
| `holoviews.HoloviewsWriter` | Works with `Holoviews` objects (saves as image file) | `kedro.extras.datasets.holoviews` |

🏗 kedro install will now compile project dependencies (by running kedro build-reqs behind the scenes) before the installation if the src/requirements.in file doesn't exist.
Added the only_nodes_with_namespace method to the Pipeline class to select only the nodes with a specified namespace.
➕ Added the kedro pipeline delete command to help delete unwanted or unused pipelines (it won't remove references to the pipeline in your create_pipelines() code).
➕ Added the kedro pipeline package command to help package up a modular pipeline. It will bundle up the pipeline source code, tests, and parameters configuration into a .whl file.
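Namespace filtering can be sketched with plain data classes. `ToyNode` and the function below are hypothetical stand-ins for Kedro's `Node` and `Pipeline.only_nodes_with_namespace`; the matching rule used here (an exact match, or a nested namespace separated by `.`) is our assumption and may differ from Kedro's.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a Kedro node: a name and an optional namespace."""
    name: str
    namespace: Optional[str] = None


def only_nodes_with_namespace(nodes: List[ToyNode], node_namespace: str) -> List[ToyNode]:
    """Keep nodes in `node_namespace` or in a namespace nested under it."""
    return [
        n
        for n in nodes
        if n.namespace is not None
        and (n.namespace == node_namespace or n.namespace.startswith(node_namespace + "."))
    ]


nodes = [ToyNode("split", "ds"), ToyNode("train", "ds.model"), ToyNode("clean", "de"), ToyNode("report")]
print([n.name for n in only_nodes_with_namespace(nodes, "ds")])  # ['split', 'train']
```

Nodes without a namespace are never selected, which matches the idea of filtering by a *specified* namespace.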

🐛 Bug fixes and other changes

👌 Improvement in DataCatalog:
- Introduced regex filtering to the DataCatalog.list() method.
- Non-alphanumeric characters (except underscore) in dataset names are replaced with __ in DataCatalog.datasets, for ease of access to transcoded datasets.
👌 Improvement in Datasets:
- Improved initialization speed of spark.SparkHiveDataSet.
- Improved S3 cache in spark.SparkDataSet.
- Added support for options for building a pyarrow table in pandas.ParquetDataSet.
👌 Improvement in kedro build-reqs CLI command:
- kedro build-reqs is now called with -q option and will no longer print out compiled requirements to the console for security reasons.
- All unrecognized CLI options in kedro build-reqs command are now passed to pip-compile call (e.g. kedro build-reqs --generate-hashes).
👌 Improvement in kedro jupyter CLI command:
- Improved the error message when running kedro jupyter notebook, kedro jupyter lab or kedro ipython without the Jupyter/IPython dependencies installed.
- Fixed the %run_viz line magic for showing kedro viz inside a Jupyter notebook. To apply the fix to an existing Kedro project, please see the migration guide.
- Fixed the bug in IPython startup script (issue 298).
📚 Documentation improvements:
- Updated community-generated content in FAQ.
- Added find-kedro and kedro-static-viz to the list of community plugins.
- Added the missing pillow.ImageDataSet entry to the documentation.
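The regex filtering added to `DataCatalog.list()` can be approximated as follows. `list_datasets` is a hypothetical helper operating on a plain list of names; it assumes a name is kept when `re.search` finds the pattern anywhere in it, and it does not reproduce any flags Kedro itself may pass to `re.compile`.

```python
import re
from typing import Iterable, List, Optional


def list_datasets(names: Iterable[str], regex_search: Optional[str] = None) -> List[str]:
    """Return all dataset names, or only those in which the pattern is found."""
    all_names = list(names)
    if regex_search is None:
        return all_names
    pattern = re.compile(regex_search)
    return [name for name in all_names if pattern.search(name)]


names = ["companies", "shuttles", "preprocessed_shuttles", "params:test_size"]
print(list_datasets(names, r"shuttles$"))  # ['shuttles', 'preprocessed_shuttles']
print(list_datasets(names, r"^params:"))   # ['params:test_size']
```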

💥 Breaking changes to the API

Migration guide from Kedro 0.16.1 to 0.16.2

Guide to applying the fix for the `%run_viz` line magic in an existing project

Even though this release ships a fix for projects generated with kedro==0.16.2, after upgrading you will still need to change your existing project for the fix to take effect if it was generated with kedro>=0.16.0,<=0.16.1. Specifically, please replace the content of your project's IPython init script, located at .ipython/profile_default/startup/00-kedro-init.py, with the content of this file. You will also need kedro-viz>=3.3.1.

👍 Thanks for supporting contributions

Miguel Rodriguez Gutierrez, Joel Schwarzmann, w0rdsm1th, Deepyaman Datta, Tam-Sanh Nguyen, Marcus Gawronsky

v0.16.1 Changes
May 21, 2020
🐛 Bug fixes and other changes
- 🛠 Fixed deprecation warnings from kedro.cli and kedro.context when running kedro jupyter notebook.
- 🛠 Fixed a bug where catalog and context were not available in Jupyter Lab and Notebook.
- 🛠 Fixed a bug where kedro build-reqs would fail if you didn't have your project dependencies installed.

v0.16.0 Changes

May 20, 2020

Major features and improvements

CLI

➕ Added new CLI commands (only available for the projects created using Kedro 0.16.0 or later):
- kedro catalog list to list datasets in your catalog
- kedro pipeline list to list pipelines
- kedro pipeline describe to describe a specific pipeline
- kedro pipeline create to create a modular pipeline
👌 Improved the CLI speed by up to 50%.
👌 Improved error handling when making a typo on the CLI. We now suggest some of the possible commands you meant to type, in git-style.

Framework

🚀 All modules in kedro.cli and kedro.context have been moved into kedro.framework.cli and kedro.framework.context respectively. kedro.cli and kedro.context will be removed in future releases.
➕ Added Hooks, which is a new mechanism for extending Kedro.
🛠 Fixed load_context changing the user's current working directory.
👍 Made the source directory configurable in .kedro.yml.
➕ Added the ability to specify nested parameter values inside your node inputs, e.g. node(func, "params:a.b", None)
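Nested parameter inputs such as `"params:a.b"` can be pictured as flattening the parameters dictionary into dotted `params:` keys before nodes are fed. `build_param_feed` is a hypothetical sketch of that idea, not Kedro's internal code; the extra `"parameters"` entry assumes nodes can also request the whole dictionary.

```python
from typing import Any, Dict


def build_param_feed(params: Dict[str, Any]) -> Dict[str, Any]:
    """Expose every (nested) parameter under a dotted "params:" name."""
    feed: Dict[str, Any] = {"parameters": params}

    def add(prefix: str, value: Any) -> None:
        feed[f"params:{prefix}"] = value
        # Recurse into dictionaries so "params:a.b.c" style names exist too.
        if isinstance(value, dict):
            for key, sub_value in value.items():
                add(f"{prefix}.{key}", sub_value)

    for key, value in params.items():
        add(key, value)
    return feed


feed = build_param_feed({"a": {"b": 1}, "lr": 0.1})
print(feed["params:a.b"])  # 1
```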

DataSets

➕ Added the following new datasets.

| Type | Description | Location |
| --- | --- | --- |
| `pillow.ImageDataSet` | Work with image files using `Pillow` | `kedro.extras.datasets.pillow` |
| `geopandas.GeoJSONDataSet` | Work with geospatial data using `GeoPandas` | `kedro.extras.datasets.geopandas.GeoJSONDataSet` |
| `api.APIDataSet` | Work with data from HTTP(S) API requests | `kedro.extras.datasets.api.APIDataSet` |

➕ Added joblib backend support to pickle.PickleDataSet.
➕ Added versioning support to MatplotlibWriter dataset.
➕ Added the ability to install dependencies for a given dataset with more granularity, e.g. pip install "kedro[pandas.ParquetDataSet]".
👉 Added the ability to specify extra arguments, e.g. encoding or compression, for fsspec.spec.AbstractFileSystem.open() calls when loading/saving a dataset. See Example 3 under docs.

Other

➕ Added a namespace property to Node, identifying the modular pipeline the node belongs to.
Added an option to load inputs and save outputs asynchronously in both the SequentialRunner(is_async=True) and ParallelRunner(is_async=True) classes.
➕ Added MemoryProfiler transformer.
✂ Removed the requirement to install all dependencies of a dataset module when you only use a subset of the datasets within it.
➕ Added support for pandas>=1.0.
Enabled Python 3.8 compatibility. Please note that a Spark workflow may be unreliable on this Python version, as pyspark is not yet fully compatible with 3.8.
Renamed "features" layer to "feature" layer to be consistent with (most) other layers and the relevant FAQ.
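The asynchronous option can be sketched with `concurrent.futures.ThreadPoolExecutor`: inputs are loaded concurrently, the node function runs, then outputs are saved concurrently. `run_node_async` and its `load`/`save` callables are hypothetical; whether Kedro's runners use threads in exactly this way is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_node_async(
    func: Callable[..., Dict[str, Any]],
    input_names: List[str],
    load: Callable[[str], Any],
    save: Callable[[str, Any], None],
) -> Dict[str, Any]:
    """Load inputs concurrently, call func, then save its outputs concurrently."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves the order of input_names, so positional args line up.
        inputs = list(pool.map(load, input_names))
        outputs = func(*inputs)
        # One save task per output; result() re-raises any error from a save.
        futures = [pool.submit(save, name, data) for name, data in outputs.items()]
        for future in futures:
            future.result()
    return outputs


# Hypothetical in-memory "datasets":
store = {"a": 2, "b": 3}
saved: Dict[str, Any] = {}
print(run_node_async(lambda x, y: {"total": x + y}, ["a", "b"], store.__getitem__, saved.__setitem__))
# {'total': 5}
```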

🐛 Bug fixes and other changes

🛠 Fixed a bug where a new version created mid-run by an external system caused inconsistencies in the load versions used in the current run.
📚 Documentation improvements
- Added instructions to the documentation on how to create a custom runner.
- Updated contribution process in CONTRIBUTING.md - added Developer Workflow.
- Documented installation of development version of Kedro in the FAQ section.
- Added missing _exists method to MyOwnDataSet example in 04_user_guide/08_advanced_io.
🛠 Fixed a bug where PartitionedDataSet and IncrementalDataSet were not working with s3a or s3n protocol.
➕ Added ability to read partitioned parquet file from a directory in pandas.ParquetDataSet.
Replaced functools.lru_cache with cachetools.cachedmethod in PartitionedDataSet and IncrementalDataSet for per-instance cache invalidation.
Implemented custom glob function for SparkDataSet when running on Databricks.
🛠 Fixed a bug in SparkDataSet that prevented loading data from DBFS on a Windows machine using Databricks-connect.
👌 Improved the error message for DataSetNotFoundError to suggest the dataset names the user possibly meant to type.
➕ Added the option for contributors to run Kedro tests locally without Spark installation with make test-no-spark.
➕ Added option to lint the project without applying the formatting changes (kedro lint --check-only).
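A suggestion-style "dataset not found" message can be produced with `difflib.get_close_matches` from the standard library. The exception class and `get_dataset` below are hypothetical local stand-ins, and using `difflib` is our assumption about how such suggestions might be generated.

```python
import difflib
from typing import Any, Dict


class DataSetNotFoundError(Exception):
    """Hypothetical local stand-in for Kedro's exception of the same name."""


def get_dataset(catalog: Dict[str, Any], name: str) -> Any:
    """Look up `name`, suggesting similar names if it is missing."""
    if name in catalog:
        return catalog[name]
    # Up to three candidates whose similarity ratio is at least 0.6.
    matches = difflib.get_close_matches(name, list(catalog), n=3, cutoff=0.6)
    message = f"DataSet '{name}' not found in the catalog"
    if matches:
        message += f" - did you mean one of these instead: {', '.join(matches)}"
    raise DataSetNotFoundError(message)


catalog = {"companies": 1, "shuttles": 2, "reviews": 3}
try:
    get_dataset(catalog, "shutles")
except DataSetNotFoundError as err:
    print(err)  # DataSet 'shutles' not found in the catalog - did you mean one of these instead: shuttles
```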

💥 Breaking changes to the API

Datasets

✂ Deleted obsolete datasets from kedro.io.
✂ Deleted kedro.contrib and extras folders.
✂ Deleted obsolete CSVBlobDataSet and JSONBlobDataSet dataset types.
Made invalidate_cache method on datasets private.
get_last_load_version and get_last_save_version methods are no longer available on AbstractDataSet.
get_last_load_version and get_last_save_version have been renamed to resolve_load_version and resolve_save_version on AbstractVersionedDataSet, the results of which are cached.
🚀 The release() method on datasets extending AbstractVersionedDataSet clears the cached load and save versions. All custom datasets must call super()._release() inside _release().
TextDataSet no longer has load_args and save_args. These can instead be specified under open_args_load or open_args_save in fs_args.
The invalidate_cache method of PartitionedDataSet and IncrementalDataSet was made private and renamed to _invalidate_caches.
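The requirement that custom datasets call `super()._release()` inside `_release()` is easiest to see with a toy class hierarchy. `ToyVersionedDataSet` and `MyDataSet` are hypothetical stand-ins for `AbstractVersionedDataSet` and a user dataset; in this sketch the base class caches the resolved load version per instance and clears it on release.

```python
from typing import List, Optional


class ToyVersionedDataSet:
    """Hypothetical base class that caches its resolved load version."""

    def __init__(self) -> None:
        self._cached_load_version: Optional[str] = None
        self._lookups = 0

    def _latest_version(self) -> str:
        # Placeholder for an expensive lookup, e.g. listing a versioned folder.
        self._lookups += 1
        return f"v{self._lookups}"

    def resolve_load_version(self) -> str:
        if self._cached_load_version is None:
            self._cached_load_version = self._latest_version()
        return self._cached_load_version

    def _release(self) -> None:
        self._cached_load_version = None

    def release(self) -> None:
        self._release()


class MyDataSet(ToyVersionedDataSet):
    def __init__(self) -> None:
        super().__init__()
        self._buffer: List[str] = []

    def _release(self) -> None:
        # Let the base class clear its cached version first; skipping this
        # call would keep serving a stale version after release().
        super()._release()
        self._buffer.clear()


ds = MyDataSet()
print(ds.resolve_load_version(), ds.resolve_load_version())  # v1 v1
ds.release()
print(ds.resolve_load_version())  # v2
```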

Other

Removed KEDRO_ENV_VAR from kedro.context to speed up the CLI run time.
🚚 Pipeline.name has been removed in favour of Pipeline.tag().
⬇️ Dropped Pipeline.transform() in favour of kedro.pipeline.modular_pipeline.pipeline() helper function.
🚚 Made constant PARAMETER_KEYWORDS private, and moved it from kedro.pipeline.pipeline to kedro.pipeline.modular_pipeline.
🚚 Layers are no longer part of the dataset object, as they've moved to the DataCatalog.
👍 Python 3.5 is no longer supported by the current and all future versions of Kedro.

🚀 Migration guide from Kedro 0.15.* to 0.16.0

Migration for datasets

⚡️ Since all the datasets (from kedro.io and kedro.contrib.io) were moved to kedro/extras/datasets, you must update the type of every dataset in the <project>/conf/base/catalog.yml file.
Change each type as follows: type: <SomeDataSet> -> type: <subfolder of kedro/extras/datasets>.<SomeDataSet> (e.g. type: CSVDataSet -> type: pandas.CSVDataSet).

🗄 In addition, all the specific datasets like CSVLocalDataSet, CSVS3DataSet etc. were deprecated. Instead, you must use generalized datasets like CSVDataSet.
E.g. type: CSVS3DataSet -> type: pandas.CSVDataSet.

Note: No changes are required if you are using your own custom dataset.
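For a large catalog, the `type` rewrite can be scripted. `migrate_catalog` is a hypothetical helper that works on an already-parsed `catalog.yml` (a plain dict, so no YAML library is assumed), and `TYPE_MIGRATIONS` contains only the mappings named in this guide; you would extend it for the other types your project uses. It rewrites `type` only and leaves every other argument untouched.

```python
from typing import Dict

# Hypothetical lookup built only from the examples in this guide.
TYPE_MIGRATIONS: Dict[str, str] = {
    "CSVDataSet": "pandas.CSVDataSet",
    "CSVLocalDataSet": "pandas.CSVDataSet",
    "CSVS3DataSet": "pandas.CSVDataSet",
}


def migrate_catalog(catalog: Dict[str, dict]) -> Dict[str, dict]:
    """Return a copy of a parsed catalog with old dataset types rewritten."""
    migrated = {}
    for name, entry in catalog.items():
        new_entry = dict(entry)  # copy so the input catalog is not mutated
        old_type = new_entry.get("type")
        # Unknown types (e.g. your own custom datasets) are left as they are.
        if old_type in TYPE_MIGRATIONS:
            new_entry["type"] = TYPE_MIGRATIONS[old_type]
        migrated[name] = new_entry
    return migrated


catalog = {"cars": {"type": "CSVLocalDataSet", "filepath": "data/01_raw/cars.csv"}}
print(migrate_catalog(catalog)["cars"]["type"])  # pandas.CSVDataSet
```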

Migration for Pipeline.transform()

Pipeline.transform() has been dropped in favour of the pipeline() constructor. The following changes apply:

Remember to import it with from kedro.pipeline import pipeline
The prefix argument has been renamed to namespace
The datasets argument has been broken down into more granular arguments:
- inputs: Independent inputs to the pipeline
- outputs: Any output created in the pipeline, whether an intermediary dataset or a leaf output
- parameters: params:... or parameters

As an example, code that used to look like this with the Pipeline.transform() method:

```python
result = my_pipeline.transform(
    datasets={"input": "new_input", "output": "new_output", "params:x": "params:y"},
    prefix="pre",
)
```

When used with the new pipeline() constructor, it becomes:

```python
from kedro.pipeline import pipeline

result = pipeline(
    my_pipeline,
    inputs={"input": "new_input"},
    outputs={"output": "new_output"},
    parameters={"params:x": "params:y"},
    namespace="pre",
)
```

Migration for decorators, color logger, transformers etc.

⚡️ Since some modules were moved to other locations you need to update import paths appropriately.
🚀 You can find the list of moved files in the 0.15.6 release notes under the section titled Files with a new location.

Migration for KEDRO_ENV_VAR, the environment variable

⚡️ Note: If you haven't made significant changes to your kedro_cli.py, it may be easier to simply copy the updated kedro_cli.py and .ipython/profile_default/startup/00-kedro-init.py from GitHub or a newly generated project into your old project.

We've removed KEDRO_ENV_VAR from kedro.context. To get your existing project template working, you'll need to remove all instances of KEDRO_ENV_VAR from your project template:
- From the imports in kedro_cli.py and .ipython/profile_default/startup/00-kedro-init.py: from kedro.context import KEDRO_ENV_VAR, load_context -> from kedro.framework.context import load_context
- Remove the envvar=KEDRO_ENV_VAR line from the click options in run, jupyter_notebook and jupyter_lab in kedro_cli.py
- Replace KEDRO_ENV_VAR with "KEDRO_ENV" in _build_jupyter_env
- Replace context = load_context(path, env=os.getenv(KEDRO_ENV_VAR)) with context = load_context(path) in .ipython/profile_default/startup/00-kedro-init.py

🏗 Migration for `kedro build-reqs`

📚 We have upgraded pip-tools, which kedro build-reqs uses, to 5.x. This pip-tools version requires pip>=20.0. To upgrade pip, please refer to the pip documentation.

👍 Thanks for supporting contributions

@foolsgold, Mani Sarkar, Priyanka Shanbhag, Luis Blanche, Deepyaman Datta, Antony Milne, Panos Psimatikas, Tam-Sanh Nguyen, Tomasz Kaczmarczyk, Kody Fischer, Waylon Walker

v0.15.9 Changes
April 06, 2020
🚀 Release 0.15.9
v0.15.8 Changes
March 05, 2020
Major features and improvements
- ➕ Added the additional libraries to our requirements.txt so that the pandas.CSVDataSet class works out of the box with pip install kedro.
- ➕ Added pandas to our extras_require in setup.py.
- 👌 Improved the error message when dependencies of a DataSet class are missing.
v0.15.7 Changes
February 26, 2020
Major features and improvements
- ➕ Added documentation on how to contribute a custom AbstractDataSet implementation.
🐛 Bug fixes and other changes
- 🛠 Fixed the link to the Kedro banner image in the documentation.

Kedro changelog

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
