Changelog History
v0.16.6 Changes
October 23, 2020

Major features and improvements

- Added documentation with a focus on single-machine and distributed-environment deployment; the series includes Docker, Argo, Prefect, Kubeflow, AWS Batch, AWS SageMaker and extends our section on Databricks.
- Added the `kedro-starter-spaceflights` alias for generating a project:

      kedro new --starter spaceflights
Bug fixes and other changes

- Fixed `TypeError` when converting dict inputs to a node made from a wrapped `partial` function.
- `PartitionedDataSet` improvements:
  - Supported passing arguments to the underlying filesystem.
- Improved handling of non-ASCII word characters in dataset names. For example, a dataset named `jalapeño` will be accessible as `DataCatalog.datasets.jalapeño` rather than `DataCatalog.datasets.jalape__o`.
- Fixed `kedro install` for an Anaconda environment defined in `environment.yml`.
- Fixed backwards compatibility with templates generated with older Kedro versions <0.16.5. You no longer need to update `.kedro.yml` to use `kedro lint` and `kedro jupyter notebook convert`.
- Improved documentation.
- Added documentation on using MinIO with Kedro.
- Improved error messages for incorrect parameters passed into a node.
- Fixed an issue with saving a `TensorFlowModelDataset` in the HDF5 format with versioning enabled.
- Added the missing `run_result` argument in the `after_pipeline_run` Hooks spec.
- Fixed a bug in the IPython script that was causing context hooks to be registered twice. To apply this fix to a project generated with an older Kedro version, apply the same changes made in this PR to your `00-kedro-init.py` file.
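The dataset-name handling described above (non-word characters replaced with `__`, non-ASCII word characters such as `ñ` kept intact) can be sketched with a regex. This is an illustrative approximation, not Kedro's actual implementation:

```python
import re

def sanitize_dataset_name(name):
    """Illustrative approximation of the naming rule: replace runs of
    non-word characters with '__' so the name is attribute-friendly,
    while keeping Unicode word characters such as 'ñ' intact."""
    return re.sub(r"\W+", "__", name)

print(sanitize_dataset_name("jalapeño"))    # jalapeño
print(sanitize_dataset_name("cars@spark"))  # cars__spark
```

With this rule, `DataCatalog.datasets.jalapeño` keeps its original spelling, while a transcoded name like `cars@spark` becomes `cars__spark`.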
Thanks for supporting contributions
Deepyaman Datta, Bhavya Merchant, Lovkush Agarwal, Varun Krishna S, Sebastian Bertoli, noklam, Daniel Petti, Waylon Walker
v0.16.5 Changes

September 09, 2020

Release 0.16.5
v0.16.4 Changes

July 30, 2020

Release 0.16.4
Major features and improvements

- Enabled auto-discovery of hooks implementations coming from installed plugins.

Bug fixes and other changes

- Fixed a bug for using `ParallelRunner` on Windows.
- Modified `GBQTableDataSet` to load customised results using customised queries from Google Big Query tables.
- Documentation improvements.
Thanks for supporting contributions
Ajay Bisht, Vijay Sajjanar, Deepyaman Datta, Sebastian Bertoli, Shahil Mawjee, Louis Guitton, Emanuel Ferm
v0.16.3 Changes

July 13, 2020

Release 0.16.3
v0.16.2 Changes

June 15, 2020

Major features and improvements
- Added the following new datasets.

  | Type | Description | Location |
  | --- | --- | --- |
  | `pandas.AppendableExcelDataSet` | Works with Excel files opened in append mode | `kedro.extras.datasets.pandas` |
  | `tensorflow.TensorFlowModelDataset` | Works with TensorFlow models using TensorFlow 2.X | `kedro.extras.datasets.tensorflow` |
  | `holoviews.HoloviewsWriter` | Works with Holoviews objects (saves as image file) | `kedro.extras.datasets.holoviews` |

- `kedro install` will now compile project dependencies (by running `kedro build-reqs` behind the scenes) before the installation if the `src/requirements.in` file doesn't exist.
- Added `only_nodes_with_namespace` in the `Pipeline` class to filter only nodes with a specified namespace.
- Added the `kedro pipeline delete` command to help delete unwanted or unused pipelines (it won't remove references to the pipeline in your `create_pipelines()` code).
- Added the `kedro pipeline package` command to help package up a modular pipeline. It will bundle up the pipeline source code, tests, and parameters configuration into a .whl file.
Bug fixes and other changes

- Improvements in `DataCatalog`:
  - Introduced regex filtering to the `DataCatalog.list()` method.
  - Non-alphanumeric characters (except underscore) in dataset names are replaced with `__` in `DataCatalog.datasets`, for ease of access to transcoded datasets.
- Improvements in datasets:
  - Improved initialization speed of `spark.SparkHiveDataSet`.
  - Improved S3 cache in `spark.SparkDataSet`.
  - Added support of options for building a `pyarrow` table in `pandas.ParquetDataSet`.
- Improvements in the `kedro build-reqs` CLI command:
  - `kedro build-reqs` is now called with the `-q` option and will no longer print out compiled requirements to the console for security reasons.
  - All unrecognized CLI options in the `kedro build-reqs` command are now passed to the pip-compile call (e.g. `kedro build-reqs --generate-hashes`).
- Improvements in the `kedro jupyter` CLI command:
  - Improved the error message when running `kedro jupyter notebook`, `kedro jupyter lab` or `kedro ipython` with Jupyter/IPython dependencies not installed.
  - Fixed the `%run_viz` line magic for showing Kedro-Viz inside a Jupyter notebook. For the fix to be applied to an existing Kedro project, please see the migration guide.
  - Fixed the bug in the IPython startup script (issue 298).
- Documentation improvements:
  - Updated community-generated content in the FAQ.
  - Added find-kedro and kedro-static-viz to the list of community plugins.
  - Added the missing `pillow.ImageDataSet` entry to the documentation.
Breaking changes to the API

Migration guide from Kedro 0.16.1 to 0.16.2

Guide to apply the fix for the `%run_viz` line magic in an existing project

Even though this release ships a fix for projects generated with `kedro==0.16.2`, after upgrading you will still need to make a change in your existing project if it was generated with `kedro>=0.16.0,<=0.16.1` for the fix to take effect. Specifically, please replace the content of your project's IPython init script, located at `.ipython/profile_default/startup/00-kedro-init.py`, with the content of this file. You will also need `kedro-viz>=3.3.1`.

Thanks for supporting contributions
Miguel Rodriguez Gutierrez, Joel Schwarzmann, w0rdsm1th, Deepyaman Datta, Tam-Sanh Nguyen, Marcus Gawronsky
v0.16.1 Changes

May 21, 2020

Bug fixes and other changes
- Fixed deprecation warnings from `kedro.cli` and `kedro.context` when running `kedro jupyter notebook`.
- Fixed a bug where `catalog` and `context` were not available in Jupyter Lab and Notebook.
- Fixed a bug where `kedro build-reqs` would fail if you didn't have your project dependencies installed.
v0.16.0 Changes

May 20, 2020

Major features and improvements

CLI
- Added new CLI commands (only available for projects created using Kedro 0.16.0 or later):
  - `kedro catalog list` to list datasets in your catalog
  - `kedro pipeline list` to list pipelines
  - `kedro pipeline describe` to describe a specific pipeline
  - `kedro pipeline create` to create a modular pipeline
- Improved the CLI speed by up to 50%.
- Improved error handling when making a typo on the CLI. We now suggest some of the possible commands you meant to type, in `git` style.
Framework
- All modules in `kedro.cli` and `kedro.context` have been moved into `kedro.framework.cli` and `kedro.framework.context` respectively. `kedro.cli` and `kedro.context` will be removed in future releases.
- Added `Hooks`, a new mechanism for extending Kedro.
- Fixed `load_context` changing the user's current working directory.
- Allowed the source directory to be configurable in `.kedro.yml`.
- Added the ability to specify nested parameter values inside your node inputs, e.g. `node(func, "params:a.b", None)`.
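As an illustration of the dot notation above, a nested key such as `a.b` resolves to a value inside a nested parameters dictionary. The following is a minimal, framework-free sketch (the dictionary contents and the `resolve` helper are hypothetical, not Kedro's implementation):

```python
# Hypothetical parameters, as they might appear after loading parameters.yml.
parameters = {"a": {"b": 42}, "c": 1}

def resolve(params, dotted_key):
    """Walk a nested dict following a dotted key such as 'a.b'."""
    value = params
    for part in dotted_key.split("."):
        value = value[part]
    return value

print(resolve(parameters, "a.b"))  # 42
```

So a node input of `"params:a.b"` would receive only the nested value, rather than the whole `a` dictionary.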
DataSets

- Added the following new datasets.

  | Type | Description | Location |
  | --- | --- | --- |
  | `pillow.ImageDataSet` | Work with image files using Pillow | `kedro.extras.datasets.pillow` |
  | `geopandas.GeoJSONDataSet` | Work with geospatial data using GeoPandas | `kedro.extras.datasets.geopandas.GeoJSONDataSet` |
  | `api.APIDataSet` | Work with data from HTTP(S) API requests | `kedro.extras.datasets.api.APIDataSet` |

- Added `joblib` backend support to `pickle.PickleDataSet`.
- Added versioning support to the `MatplotlibWriter` dataset.
- Added the ability to install dependencies for a given dataset with more granularity, e.g. `pip install "kedro[pandas.ParquetDataSet]"`.
- Added the ability to specify extra arguments, e.g. `encoding` or `compression`, for `fsspec.spec.AbstractFileSystem.open()` calls when loading/saving a dataset. See Example 3 under docs.
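As a sketch of how such extra arguments might be passed through the catalog, a hypothetical entry could look like this (the dataset name and file path are illustrative, not from the source):

```yaml
# Hypothetical conf/base/catalog.yml entry; keys under fs_args are
# forwarded to fsspec's open() call when the file is loaded.
cars:
  type: pandas.CSVDataSet
  filepath: data/01_raw/cars.csv
  fs_args:
    open_args_load:
      encoding: "utf-8"
```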
Other
- Added a `namespace` property on `Node`, related to the modular pipeline the node belongs to.
- Added an option to enable asynchronous loading of inputs and saving of outputs in both the `SequentialRunner(is_async=True)` and `ParallelRunner(is_async=True)` classes.
- Added a `MemoryProfiler` transformer.
- Removed the requirement to have all dependencies for a dataset module in order to use only a subset of the datasets within.
- Added support for `pandas>=1.0`.
- Enabled Python 3.8 compatibility. Please note that a Spark workflow may be unreliable for this Python version, as `pyspark` is not fully compatible with 3.8 yet.
- Renamed the "features" layer to the "feature" layer to be consistent with (most) other layers and the relevant FAQ.
Bug fixes and other changes

- Fixed a bug where a new version created mid-run by an external system caused inconsistencies in the load versions used in the current run.
- Documentation improvements:
  - Added instructions in the documentation on how to create a custom runner.
  - Updated the contribution process in `CONTRIBUTING.md` - added a Developer Workflow.
  - Documented installation of the development version of Kedro in the FAQ section.
  - Added the missing `_exists` method to the `MyOwnDataSet` example in 04_user_guide/08_advanced_io.
- Fixed a bug where `PartitionedDataSet` and `IncrementalDataSet` were not working with the `s3a` or `s3n` protocols.
- Added the ability to read a partitioned parquet file from a directory in `pandas.ParquetDataSet`.
- Replaced `functools.lru_cache` with `cachetools.cachedmethod` in `PartitionedDataSet` and `IncrementalDataSet` for per-instance cache invalidation.
- Implemented a custom glob function for `SparkDataSet` when running on Databricks.
- Fixed a bug in `SparkDataSet` not allowing for loading data from DBFS on a Windows machine using Databricks-connect.
- Improved the error message for `DataSetNotFoundError` to suggest possible dataset names the user meant to type.
- Added the option for contributors to run Kedro tests locally without a Spark installation with `make test-no-spark`.
- Added an option to lint the project without applying the formatting changes (`kedro lint --check-only`).
Breaking changes to the API

Datasets

- Deleted obsolete datasets from `kedro.io`.
- Deleted the `kedro.contrib` and `extras` folders.
- Deleted the obsolete `CSVBlobDataSet` and `JSONBlobDataSet` dataset types.
- Made the `invalidate_cache` method on datasets private.
- The `get_last_load_version` and `get_last_save_version` methods are no longer available on `AbstractDataSet`.
- `get_last_load_version` and `get_last_save_version` have been renamed to `resolve_load_version` and `resolve_save_version` on `AbstractVersionedDataSet`, the results of which are cached.
- The `release()` method on datasets extending `AbstractVersionedDataSet` clears the cached load and save versions. All custom datasets must call `super()._release()` inside `_release()`.
- `TextDataSet` no longer has `load_args` and `save_args`. These can instead be specified under `open_args_load` or `open_args_save` in `fs_args`.
- The `PartitionedDataSet` and `IncrementalDataSet` method `invalidate_cache` was made private: `_invalidate_caches`.
Other
- Removed `KEDRO_ENV_VAR` from `kedro.context` to speed up the CLI run time.
- `Pipeline.name` has been removed in favour of `Pipeline.tag()`.
- Dropped `Pipeline.transform()` in favour of the `kedro.pipeline.modular_pipeline.pipeline()` helper function.
- Made the constant `PARAMETER_KEYWORDS` private, and moved it from `kedro.pipeline.pipeline` to `kedro.pipeline.modular_pipeline`.
- Layers are no longer part of the dataset object, as they have moved to the `DataCatalog`.
- Python 3.5 is no longer supported by the current and all future versions of Kedro.
Migration guide from Kedro 0.15.* to 0.16.0

Migration for datasets

Since all the datasets (from `kedro.io` and `kedro.contrib.io`) were moved to `kedro/extras/datasets`, you must update the type of all datasets in the `<project>/conf/base/catalog.yml` file. Here is how it should be changed: `type: <SomeDataSet>` -> `type: <subfolder of kedro/extras/datasets>.<SomeDataSet>` (e.g. `type: CSVDataSet` -> `type: pandas.CSVDataSet`).

In addition, all the specific datasets like `CSVLocalDataSet`, `CSVS3DataSet`, etc. were deprecated. Instead, you must use generalized datasets like `CSVDataSet`, e.g. `type: CSVS3DataSet` -> `type: pandas.CSVDataSet`.

Note: No changes are required if you are using your own custom dataset.
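The catalog change above can be sketched as a before/after for a hypothetical entry named `cars` (the entry name and file path are illustrative):

```yaml
# conf/base/catalog.yml - before (Kedro 0.15.*); hypothetical entry
cars:
  type: CSVLocalDataSet
  filepath: data/01_raw/cars.csv
---
# conf/base/catalog.yml - after (Kedro 0.16.0)
cars:
  type: pandas.CSVDataSet
  filepath: data/01_raw/cars.csv
```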
Migration for Pipeline.transform()

`Pipeline.transform()` has been dropped in favour of the `pipeline()` constructor. The following changes apply:

- Remember to import `from kedro.pipeline import pipeline`
- The `prefix` argument has been renamed to `namespace`
- `datasets` has been broken down into more granular arguments:
  - `inputs`: independent inputs to the pipeline
  - `outputs`: any output created in the pipeline, whether an intermediary dataset or a leaf output
  - `parameters`: `params:...` or `parameters`

As an example, code that used to look like this with the `Pipeline.transform()` constructor:

    result = my_pipeline.transform(
        datasets={"input": "new_input", "output": "new_output", "params:x": "params:y"},
        prefix="pre",
    )

when used with the new `pipeline()` constructor becomes:

    from kedro.pipeline import pipeline

    result = pipeline(
        my_pipeline,
        inputs={"input": "new_input"},
        outputs={"output": "new_output"},
        parameters={"params:x": "params:y"},
        namespace="pre",
    )
Migration for decorators, color logger, transformers, etc.

Since some modules were moved to other locations, you need to update import paths appropriately. You can find the list of moved files in the 0.15.6 release notes under the section titled "Files with a new location".

Migration for KEDRO_ENV_VAR, the environment variable
> Note: If you haven't made significant changes to your `kedro_cli.py`, it may be easier to simply copy the updated `kedro_cli.py` and `.ipython/profile_default/startup/00-kedro-init.py` from GitHub or a newly generated project into your old project.

- We've removed `KEDRO_ENV_VAR` from `kedro.context`. To get your existing project template working, you'll need to remove all instances of `KEDRO_ENV_VAR` from your project template:
  - From the imports in `kedro_cli.py` and `.ipython/profile_default/startup/00-kedro-init.py`: `from kedro.context import KEDRO_ENV_VAR, load_context` -> `from kedro.framework.context import load_context`
  - Remove the `envvar=KEDRO_ENV_VAR` line from the click options in `run`, `jupyter_notebook` and `jupyter_lab` in `kedro_cli.py`
  - Replace `KEDRO_ENV_VAR` with `"KEDRO_ENV"` in `_build_jupyter_env`
  - Replace `context = load_context(path, env=os.getenv(KEDRO_ENV_VAR))` with `context = load_context(path)` in `.ipython/profile_default/startup/00-kedro-init.py`
Migration for kedro build-reqs

We have upgraded `pip-tools`, which is used by `kedro build-reqs`, to 5.x. This `pip-tools` version requires `pip>=20.0`. To upgrade `pip`, please refer to their documentation.

Thanks for supporting contributions
@foolsgold, Mani Sarkar, Priyanka Shanbhag, Luis Blanche, Deepyaman Datta, Antony Milne, Panos Psimatikas, Tam-Sanh Nguyen, Tomasz Kaczmarczyk, Kody Fischer, Waylon Walker
v0.15.9 Changes

April 06, 2020

Release 0.15.9
v0.15.8 Changes

March 05, 2020

Major features and improvements

- Added the additional libraries to our `requirements.txt` so the `pandas.CSVDataSet` class works out of the box with `pip install kedro`.
- Added `pandas` to our `extra_requires` in `setup.py`.
- Improved the error message when dependencies of a `DataSet` class are missing.
v0.15.7 Changes

February 26, 2020

Major features and improvements

- Added documentation on how to contribute a custom `AbstractDataSet` implementation.

Bug fixes and other changes

- Fixed the link to the Kedro banner image in the documentation.