Changelog History

  • v0.16.5

    September 09, 2020

    Release 0.16.5

  • v0.16.4

    July 30, 2020

    Release 0.16.4

    Major features and improvements

    • Enabled auto-discovery of hook implementations coming from installed plugins.

    Bug fixes and other changes

    • Fixed a bug that affected using ParallelRunner on Windows.
    • Modified GBQTableDataSet to load customised results using customised queries from Google BigQuery tables.
    • Documentation improvements.

    Thanks for supporting contributions

    Ajay Bisht, Vijay Sajjanar, Deepyaman Datta, Sebastian Bertoli, Shahil Mawjee, Louis Guitton, Emanuel Ferm

  • v0.16.3

    July 13, 2020

    Release 0.16.3

  • v0.16.2

    June 15, 2020

    Major features and improvements

    • Added the following new datasets.
    Type | Description | Location
    pandas.AppendableExcelDataSet | Works with Excel file opened in append mode | kedro.extras.datasets.pandas
    tensorflow.TensorFlowModelDataset | Works with TensorFlow models using TensorFlow 2.X | kedro.extras.datasets.tensorflow
    holoviews.HoloviewsWriter | Works with Holoviews objects (saves as image file) | kedro.extras.datasets.holoviews
    • kedro install will now compile project dependencies (by running kedro build-reqs behind the scenes) before the installation if the src/requirements.in file doesn't exist.
    • Added only_nodes_with_namespace in Pipeline class to filter only nodes with a specified namespace.
    • Added the kedro pipeline delete command to help delete unwanted or unused pipelines (it won't remove references to the pipeline in your create_pipelines() code).
    • Added the kedro pipeline package command to help package up a modular pipeline. It will bundle up the pipeline source code, tests, and parameters configuration into a .whl file.

    ๐Ÿ› Bug fixes and other changes

    • ๐Ÿ‘Œ Improvement in DataCatalog:
      • Introduced regex filtering to the DataCatalog.list() method.
      • Non-alphanumeric characters (except underscore) in dataset name are replaced with __ in DataCatalog.datasets, for ease of access to transcoded datasets.
    • Improvement in Datasets:
      • Improved initialization speed of spark.SparkHiveDataSet.
      • Improved S3 cache in spark.SparkDataSet.
      • Added support of options for building pyarrow table in pandas.ParquetDataSet.
    • Improvement in kedro build-reqs CLI command:
      • kedro build-reqs is now called with -q option and will no longer print out compiled requirements to the console for security reasons.
      • All unrecognized CLI options in kedro build-reqs command are now passed to pip-compile call (e.g. kedro build-reqs --generate-hashes).
    • Improvement in kedro jupyter CLI command:
      • Improved error message when running kedro jupyter notebook, kedro jupyter lab or kedro ipython without Jupyter/IPython dependencies installed.
      • Fixed %run_viz line magic for showing kedro viz inside a Jupyter notebook. For the fix to be applied to an existing Kedro project, please see the migration guide.
      • Fixed the bug in IPython startup script (issue 298).
    • Documentation improvements:
      • Updated community-generated content in FAQ.
      • Added find-kedro and kedro-static-viz to the list of community plugins.
      • Added missing pillow.ImageDataSet entry to the documentation.
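
    The two DataCatalog improvements listed above (regex filtering in list() and the __ substitution in dataset names) can be pictured with plain re. This is only a sketch, not Kedro's implementation: sanitize_name and list_matching are hypothetical helpers, and collapsing each run of disallowed characters into a single __ is an assumption.

```python
import re


def sanitize_name(dataset_name: str) -> str:
    """Replace each run of characters other than letters, digits and
    underscore with a double underscore (assumed behaviour)."""
    return re.sub(r"[^0-9a-zA-Z_]+", "__", dataset_name)


def list_matching(dataset_names, regex_search=None):
    """Return every name, or only the names matching the given regex."""
    if regex_search is None:
        return list(dataset_names)
    pattern = re.compile(regex_search)
    return [name for name in dataset_names if pattern.search(name)]


# Transcoded dataset names carry an "@" suffix, which is not a valid
# Python attribute character.
names = ["weather@spark", "weather@pandas", "cars"]

sanitized = [sanitize_name(name) for name in names]
weather_only = list_matching(names, r"^weather")

print(sanitized)
print(weather_only)
```

    With names rewritten this way, a transcoded entry such as weather@spark becomes reachable as an ordinary attribute name.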

    Breaking changes to the API

    Migration guide from Kedro 0.16.1 to 0.16.2

    Guide to apply the fix for %run_viz line magic in existing project

    Even though this release ships a fix for projects generated with kedro==0.16.2, after upgrading you will still need to make a change in your existing project if it was generated with kedro>=0.16.0,<=0.16.1 for the fix to take effect. Specifically, please replace the content of your project's IPython init script located at .ipython/profile_default/startup/00-kedro-init.py with the content of this file. You will also need kedro-viz>=3.3.1.

    Thanks for supporting contributions

    Miguel Rodriguez Gutierrez, Joel Schwarzmann, w0rdsm1th, Deepyaman Datta, Tam-Sanh Nguyen, Marcus Gawronsky

  • v0.16.1

    May 21, 2020

    ๐Ÿ› Bug fixes and other changes

    • ๐Ÿ›  Fixed deprecation warnings from kedro.cli and kedro.context when running kedro jupyter notebook.
    • ๐Ÿ›  Fixed a bug where catalog and context were not available in Jupyter Lab and Notebook.
    • ๐Ÿ›  Fixed a bug where kedro build-reqs would fail if you didn't have your project dependencies installed.
  • v0.16.0

    May 20, 2020

    Major features and improvements

    CLI

    • Added new CLI commands (only available for the projects created using Kedro 0.16.0 or later):
      • kedro catalog list to list datasets in your catalog
      • kedro pipeline list to list pipelines
      • kedro pipeline describe to describe a specific pipeline
      • kedro pipeline create to create a modular pipeline
    • Improved the CLI speed by up to 50%.
    • Improved error handling when making a typo on the CLI. We now suggest some of the possible commands you meant to type, in git-style.

    Framework

    • All modules in kedro.cli and kedro.context have been moved into kedro.framework.cli and kedro.framework.context respectively. kedro.cli and kedro.context will be removed in future releases.
    • Added Hooks, which is a new mechanism for extending Kedro.
    • Fixed load_context changing user's current working directory.
    • Allowed the source directory to be configurable in .kedro.yml.
    • Added the ability to specify nested parameter values inside your node inputs, e.g. node(func, "params:a.b", None)
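
    To illustrate what a nested parameter reference such as "params:a.b" points at, here is a minimal sketch in plain Python. It does not show how Kedro resolves parameters internally; get_nested_param and the sample parameters dictionary are hypothetical.

```python
from functools import reduce


def get_nested_param(parameters: dict, dotted_key: str):
    """Walk a nested dict following a dot-separated key such as 'a.b'."""
    return reduce(lambda level, key: level[key], dotted_key.split("."), parameters)


# Hypothetical project parameters, as they might be loaded from a YAML file.
parameters = {"a": {"b": 0.2, "c": 5}}

# A node input written as "params:a.b" refers to parameters["a"]["b"].
input_name = "params:a.b"
value = get_nested_param(parameters, input_name[len("params:"):])

print(value)
```

    The node then receives only the value it needs rather than the whole parameters dictionary.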

    DataSets

    • Added the following new datasets.
    Type | Description | Location
    pillow.ImageDataSet | Work with image files using Pillow | kedro.extras.datasets.pillow
    geopandas.GeoJSONDataSet | Work with geospatial data using GeoPandas | kedro.extras.datasets.geopandas.GeoJSONDataSet
    api.APIDataSet | Work with data from HTTP(S) API requests | kedro.extras.datasets.api.APIDataSet
    • Added joblib backend support to pickle.PickleDataSet.
    • Added versioning support to MatplotlibWriter dataset.
    • Added the ability to install dependencies for a given dataset with more granularity, e.g. pip install "kedro[pandas.ParquetDataSet]".
    • Added the ability to specify extra arguments, e.g. encoding or compression, for fsspec.spec.AbstractFileSystem.open() calls when loading/saving a dataset. See Example 3 under docs.

    Other

    • Added namespace property on Node, related to the modular pipeline where the node belongs.
    • Added an option to enable asynchronous loading of inputs and saving of outputs in both the SequentialRunner(is_async=True) and ParallelRunner(is_async=True) classes.
    • Added MemoryProfiler transformer.
    • Removed the requirement to have all dependencies for a dataset module to use only a subset of the datasets within.
    • Added support for pandas>=1.0.
    • Enabled Python 3.8 compatibility. Please note that a Spark workflow may be unreliable for this Python version as pyspark is not fully-compatible with 3.8 yet.
    • Renamed "features" layer to "feature" layer to be consistent with (most) other layers and the relevant FAQ.

    ๐Ÿ› Bug fixes and other changes

    • ๐Ÿ›  Fixed a bug where a new version created mid-run by an external system caused inconsistencies in the load versions used in the current run.
    • ๐Ÿ“š Documentation improvements
      • Added instruction in the documentation on how to create a custom runner).
      • Updated contribution process in CONTRIBUTING.md - added Developer Workflow.
      • Documented installation of development version of Kedro in the FAQ section.
      • Added missing _exists method to MyOwnDataSet example in 04_user_guide/08_advanced_io.
    • ๐Ÿ›  Fixed a bug where PartitionedDataSet and IncrementalDataSet were not working with s3a or s3n protocol.
    • โž• Added ability to read partitioned parquet file from a directory in pandas.ParquetDataSet.
    • Replaced functools.lru_cache with cachetools.cachedmethod in PartitionedDataSet and IncrementalDataSet for per-instance cache invalidation.
    • Implemented custom glob function for SparkDataSet when running on Databricks.
    • ๐Ÿ›  Fixed a bug in SparkDataSet not allowing for loading data from DBFS in a Windows machine using Databricks-connect.
    • ๐Ÿ‘Œ Improved the error message for DataSetNotFoundError to suggest possible dataset names user meant to type.
    • โž• Added the option for contributors to run Kedro tests locally without Spark installation with make test-no-spark.
    • โž• Added option to lint the project without applying the formatting changes (kedro lint --check-only).
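
    The move away from functools.lru_cache for per-instance cache invalidation, mentioned in the list above, can be motivated with a small standard-library sketch. Both classes below are hypothetical, and a plain dict stored on the instance stands in for cachetools.cachedmethod, which is a third-party call not used here.

```python
from functools import lru_cache


class SharedCacheListing:
    """lru_cache on a method keeps one cache on the function object,
    so it is shared by every instance of the class."""

    def __init__(self, path):
        self.path = path
        self.scans = 0  # counts how often the "expensive" listing runs

    @lru_cache(maxsize=None)
    def list_partitions(self):
        self.scans += 1
        return (f"{self.path}/part-0", f"{self.path}/part-1")


class PerInstanceCacheListing:
    """Each instance owns its cache, so it can be cleared independently."""

    def __init__(self, path):
        self.path = path
        self.scans = 0
        self._cache = {}

    def list_partitions(self):
        if "partitions" not in self._cache:
            self.scans += 1
            self._cache["partitions"] = (f"{self.path}/part-0", f"{self.path}/part-1")
        return self._cache["partitions"]

    def invalidate_cache(self):
        self._cache.clear()


a, b = SharedCacheListing("a"), SharedCacheListing("b")
a.list_partitions()
b.list_partitions()
# There is no way to clear only a's entry: this wipes b's entry too.
SharedCacheListing.list_partitions.cache_clear()
a.list_partitions()
b.list_partitions()
shared_scans = (a.scans, b.scans)

c, d = PerInstanceCacheListing("c"), PerInstanceCacheListing("d")
c.list_partitions()
d.list_partitions()
c.invalidate_cache()  # only c has to rescan
c.list_partitions()
d.list_partitions()
per_instance_scans = (c.scans, d.scans)

print(shared_scans, per_instance_scans)
```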

    Breaking changes to the API

    Datasets

    • Deleted obsolete datasets from kedro.io.
    • Deleted kedro.contrib and extras folders.
    • Deleted obsolete CSVBlobDataSet and JSONBlobDataSet dataset types.
    • Made invalidate_cache method on datasets private.
    • get_last_load_version and get_last_save_version methods are no longer available on AbstractDataSet.
    • get_last_load_version and get_last_save_version have been renamed to resolve_load_version and resolve_save_version on AbstractVersionedDataSet, the results of which are cached.
    • The release() method on datasets extending AbstractVersionedDataSet clears the cached load and save version. All custom datasets must call super()._release() inside _release().
    • TextDataSet no longer has load_args and save_args. These can instead be specified under open_args_load or open_args_save in fs_args.
    • PartitionedDataSet and IncrementalDataSet method invalidate_cache was made private: _invalidate_caches.
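
    The requirement to call super()._release() in custom datasets can be sketched with toy classes. VersionedBase below is a hypothetical stand-in, not Kedro's AbstractVersionedDataSet, and the version strings are invented for illustration; the sketch only shows why skipping the super() call would leave a stale cached version behind.

```python
class VersionedBase:
    """Hypothetical stand-in for a versioned dataset base class."""

    def __init__(self):
        self._version_cache = {}
        self.lookups = 0  # counts uncached version resolutions

    def resolve_load_version(self):
        # The resolved version is cached until the dataset is released.
        if "load" not in self._version_cache:
            self.lookups += 1
            self._version_cache["load"] = f"version-{self.lookups}"
        return self._version_cache["load"]

    def release(self):
        self._release()

    def _release(self):
        self._version_cache.clear()


class MyDataSet(VersionedBase):
    """Custom dataset with its own resources to free on release."""

    def __init__(self):
        super().__init__()
        self._buffer = [1, 2, 3]

    def _release(self):
        super()._release()  # keeps the base-class cache clearing in place
        self._buffer = None


dataset = MyDataSet()
first = dataset.resolve_load_version()
again = dataset.resolve_load_version()  # served from the cache
dataset.release()
after_release = dataset.resolve_load_version()  # resolved afresh

print(first, again, after_release)
```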

    Other

    • Removed KEDRO_ENV_VAR from kedro.context to speed up the CLI run time.
    • Pipeline.name has been removed in favour of Pipeline.tag().
    • Dropped Pipeline.transform() in favour of kedro.pipeline.modular_pipeline.pipeline() helper function.
    • Made constant PARAMETER_KEYWORDS private, and moved it from kedro.pipeline.pipeline to kedro.pipeline.modular_pipeline.
    • Layers are no longer part of the dataset object, as they've moved to the DataCatalog.
    • Python 3.5 is no longer supported by the current and all future versions of Kedro.

    Migration guide from Kedro 0.15.* to 0.16.0

    Migration for datasets

    โšก๏ธ Since all the datasets (from kedro.io and kedro.contrib.io) were moved to kedro/extras/datasets you must update the type of all datasets in <project>/conf/base/catalog.yml file.
    Here how it should be changed: type: <SomeDataSet> -> type: <subfolder of kedro/extras/datasets>.<SomeDataSet> (e.g. type: CSVDataSet -> type: pandas.CSVDataSet).

    ๐Ÿ—„ In addition, all the specific datasets like CSVLocalDataSet, CSVS3DataSet etc. were deprecated. Instead, you must use generalized datasets like CSVDataSet.
    E.g. type: CSVS3DataSet -> type: pandas.CSVDataSet.

    Note: No changes required if you are using your custom dataset.
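
    As a rough illustration of the type rename described above, the following sketch rewrites the type field of catalog entries held in a plain dictionary. It is not a Kedro tool: TYPE_RENAMES contains only the CSV renames named in this guide, migrate_types is a hypothetical helper, and the sample entries (including my_project.io.MyOwnDataSet) are made up.

```python
# Only the renames named in the migration notes; extend for your own catalog.
TYPE_RENAMES = {
    "CSVDataSet": "pandas.CSVDataSet",
    "CSVLocalDataSet": "pandas.CSVDataSet",
    "CSVS3DataSet": "pandas.CSVDataSet",
}


def migrate_types(catalog: dict) -> dict:
    """Return a copy of the catalog with known dataset types renamed.

    Unknown types, such as custom datasets, are left untouched.
    """
    migrated = {}
    for name, config in catalog.items():
        new_config = dict(config)
        new_config["type"] = TYPE_RENAMES.get(config["type"], config["type"])
        migrated[name] = new_config
    return migrated


# Hypothetical catalog content, as it might look after loading catalog.yml.
old_catalog = {
    "cars": {"type": "CSVLocalDataSet", "filepath": "data/01_raw/cars.csv"},
    "custom": {"type": "my_project.io.MyOwnDataSet", "filepath": "data/01_raw/x.bin"},
}
new_catalog = migrate_types(old_catalog)

print(new_catalog["cars"]["type"])
print(new_catalog["custom"]["type"])
```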

    Migration for Pipeline.transform()

    Pipeline.transform() has been dropped in favour of the pipeline() constructor. The following changes apply:

    • Remember to add the import: from kedro.pipeline import pipeline
    • The prefix argument has been renamed to namespace
    • And datasets has been broken down into more granular arguments:
      • inputs: Independent inputs to the pipeline
      • outputs: Any output created in the pipeline, whether an intermediary dataset or a leaf output
      • parameters: params:... or parameters

    As an example, code that used to look like this with the Pipeline.transform() constructor:

    result = my_pipeline.transform(
        datasets={"input": "new_input", "output": "new_output", "params:x": "params:y"},
        prefix="pre",
    )
    

    When used with the new pipeline() constructor, becomes:

    from kedro.pipeline import pipeline

    result = pipeline(
        my_pipeline,
        inputs={"input": "new_input"},
        outputs={"output": "new_output"},
        parameters={"params:x": "params:y"},
        namespace="pre",
    )

    Migration for decorators, color logger, transformers etc.

    โšก๏ธ Since some modules were moved to other locations you need to update import paths appropriately.
    ๐Ÿš€ You can find the list of moved files in the 0.15.6 release notes under the section titled Files with a new location.

    Migration for KEDRO_ENV_VAR, the environment variable

    โšก๏ธ > Note: If you haven't made significant changes to your kedro_cli.py, it may be easier to simply copy the updated kedro_cli.py .ipython/profile_default/startup/00-kedro-init.py and from GitHub or a newly generated project into your old project.

    • We've removed KEDRO_ENV_VAR from kedro.context. To get your existing project template working, you'll need to remove all instances of KEDRO_ENV_VAR from your project template:
      • From the imports in kedro_cli.py and .ipython/profile_default/startup/00-kedro-init.py: from kedro.context import KEDRO_ENV_VAR, load_context -> from kedro.framework.context import load_context
      • Remove the envvar=KEDRO_ENV_VAR line from the click options in run, jupyter_notebook and jupyter_lab in kedro_cli.py
      • Replace KEDRO_ENV_VAR with "KEDRO_ENV" in _build_jupyter_env
      • Replace context = load_context(path, env=os.getenv(KEDRO_ENV_VAR)) with context = load_context(path) in .ipython/profile_default/startup/00-kedro-init.py
    ๐Ÿ— Migration for kedro build-reqs

    ๐Ÿ“š We have upgraded pip-tools which is used by kedro build-reqs to 5.x. This pip-tools version requires pip>=20.0. To upgrade pip, please refer to their documentation.

    ๐Ÿ‘ Thanks for supporting contributions

    @foolsgold, Mani Sarkar, Priyanka Shanbhag, Luis Blanche, Deepyaman Datta, Antony Milne, Panos Psimatikas, Tam-Sanh Nguyen, Tomasz Kaczmarczyk, Kody Fischer, Waylon Walker

  • v0.15.9

    April 06, 2020

    Release 0.15.9

  • v0.15.8

    March 05, 2020

    Major features and improvements

    • Added the additional libraries to our requirements.txt so the pandas.CSVDataSet class works out of the box with pip install kedro.
    • Added pandas to our extra_requires in setup.py.
    • Improved the error message when dependencies of a DataSet class are missing.
  • v0.15.7

    February 26, 2020

    Major features and improvements

    • Added documentation on how to contribute a custom AbstractDataSet implementation.

    Bug fixes and other changes

    • Fixed the link to the Kedro banner image in the documentation.
  • v0.15.6

    February 26, 2020

    Major features and improvements

    TL;DR We're launching kedro.extras, the new home for our revamped series of datasets, decorators and dataset transformers. The datasets in kedro.extras.datasets use fsspec to access a variety of data stores including local file systems, network file systems, cloud object stores (including S3 and GCP), and Hadoop. Read more about this here. The change will allow #178 to happen in the next major release of Kedro.

    An example of this new system can be seen below, loading the CSV SparkDataSet from S3:

    weather:
      type: spark.SparkDataSet  # Observe the specified type, this affects all datasets
      filepath: s3a://your_bucket/data/01_raw/weather*  # filepath uses fsspec to indicate the file storage system
      credentials: dev_s3
      file_format: csv
    

    You can also load data incrementally whenever it is dumped into a directory, using IncrementalDataSet, an extension of PartitionedDataSet (a feature that allows you to load a directory of files). The IncrementalDataSet stores the information about the last processed partition in a checkpoint. Read more about this feature here.

    New features

    • Added layer attribute for datasets in kedro.extras.datasets to specify the name of a layer according to data engineering convention. This feature will be passed to kedro-viz in future releases.
    • Enabled loading a particular version of a dataset in Jupyter Notebooks and IPython, using catalog.load("dataset_name", version="<2019-12-13T15.08.09.255Z>").
    • Added property run_id on ProjectContext, used for versioning using the Journal. To customise your journal run_id you can override the private method _get_run_id().
    • Added the ability to install all optional kedro dependencies via pip install "kedro[all]".
    • Modified the DataCatalog's load order for datasets. The loading order is now the following:
      • kedro.io
      • kedro.extras.datasets
      • Import path, specified in type
    • Added an optional copy_mode flag to CachedDataSet and MemoryDataSet to specify (deepcopy, copy or assign) the copy mode to use when loading and saving.
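
    The difference between the three copy modes named above can be shown with the standard copy module. This is a sketch of the general idea only: apply_copy_mode is a hypothetical helper, and how Kedro applies the flag internally is not shown here.

```python
import copy


def apply_copy_mode(data, copy_mode: str):
    """Return data according to the requested copy mode."""
    if copy_mode == "deepcopy":
        return copy.deepcopy(data)  # fully independent object
    if copy_mode == "copy":
        return copy.copy(data)  # shallow copy: nested objects are shared
    if copy_mode == "assign":
        return data  # the very same object, no copying at all
    raise ValueError(f"Unknown copy mode: {copy_mode}")


original = {"rows": [1, 2, 3]}

deep = apply_copy_mode(original, "deepcopy")
shallow = apply_copy_mode(original, "copy")
assigned = apply_copy_mode(original, "assign")

# Mutating the nested list is visible through the shallow copy and the
# assigned reference, but not through the deep copy.
original["rows"].append(4)

print(deep["rows"], shallow["rows"], assigned["rows"])
```

    Deep copies are the safest choice but the most expensive; assign avoids copying entirely at the cost of sharing mutable state between nodes.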

    New Datasets

    Type | Description | Location
    ParquetDataSet | Handles parquet datasets using Dask | kedro.extras.datasets.dask
    PickleDataSet | Work with Pickle files using fsspec to communicate with the underlying filesystem |
    CSVDataSet | Work with CSV files using fsspec to communicate with the underlying filesystem |
    TextDataSet | Work with text files using fsspec to communicate with the underlying filesystem |
    ExcelDataSet | Work with Excel files using fsspec to communicate with the underlying filesystem |
    HDFDataSet | Work with HDF using fsspec to communicate with the underlying filesystem |
    YAMLDataSet | Work with YAML files using fsspec to communicate with the underlying filesystem |
    MatplotlibWriter | Save with Matplotlib images using fsspec to communicate with the underlying filesystem |
    NetworkXDataSet | Work with NetworkX files using fsspec to communicate with the underlying filesystem |
    BioSequenceDataSet | Work with bio-sequence objects using fsspec to communicate with the underlying filesystem |
    GBQTableDataSet | Work with Google BigQuery | kedro.extras.datasets.pandas
    FeatherDataSet | Work with feather files using fsspec to communicate with the underlying filesystem |
    IncrementalDataSet | Inherits from PartitionedDataSet and remembers the last processed partition | kedro.io

    Files with a new location

    Type | New Location
    JSONDataSet | kedro.extras.datasets.pandas
    CSVBlobDataSet | kedro.extras.datasets.pandas
    JSONBlobDataSet | kedro.extras.datasets.pandas
    SQLTableDataSet | kedro.extras.datasets.pandas
    SQLQueryDataSet | kedro.extras.datasets.pandas
    SparkDataSet | kedro.extras.datasets.spark
    SparkHiveDataSet | kedro.extras.datasets.spark
    SparkJDBCDataSet | kedro.extras.datasets.spark
    kedro/contrib/decorators/retry.py | kedro/extras/decorators/retry_node.py
    kedro/contrib/decorators/memory_profiler.py | kedro/extras/decorators/memory_profiler.py
    kedro/contrib/io/transformers/transformers.py | kedro/extras/transformers/time_profiler.py
    kedro/contrib/colors/logging/color_logger.py |
    extras/ipython_loader.py | tools/ipython/ipython_loader.py
    kedro/contrib/io/cached/cached_dataset.py | kedro/io/cached_dataset.py
    kedro/contrib/io/catalog_with_default/data_catalog_with_default.py | kedro/io/data_catalog_with_default.py
    kedro/contrib/config/templated_config.py | kedro/config/templated_config.py

    Upcoming deprecations

    Category | Type
    Datasets | BioSequenceLocalDataSet
     | CSVGCSDataSet
     | CSVHTTPDataSet
     | CSVLocalDataSet
     | CSVS3DataSet
     | ExcelLocalDataSet
     | FeatherLocalDataSet
     | JSONGCSDataSet
     | JSONLocalDataSet
     | HDFLocalDataSet
     | HDFS3DataSet
     | kedro.contrib.io.cached.CachedDataSet
     | kedro.contrib.io.catalog_with_default.DataCatalogWithDefault
     | MatplotlibLocalWriter
     | MatplotlibS3Writer
     | NetworkXLocalDataSet
     | ParquetGCSDataSet
     | ParquetLocalDataSet
     | ParquetS3DataSet
     | PickleLocalDataSet
     | PickleS3DataSet
     | TextLocalDataSet
     | YAMLLocalDataSet
    Decorators | kedro.contrib.decorators.memory_profiler
     | kedro.contrib.decorators.retry
     | kedro.contrib.decorators.pyspark.spark_to_pandas
     | kedro.contrib.decorators.pyspark.pandas_to_spark
    Transformers | kedro.contrib.io.transformers.transformers
    Configuration Loaders |

    ๐Ÿ› Bug fixes and other changes

    • โž• Added the option to set/overwrite params in config.yaml using YAML dict style instead of string CLI formatting only.
    • ๐Ÿš€ Kedro CLI arguments --node and --tag support comma-separated values, alternative methods will be deprecated in future releases.
    • ๐Ÿ›  Fixed a bug in the invalidate_cache method of ParquetGCSDataSet and CSVGCSDataSet.
    • --load-version now won't break if version value contains a colon.
    • Enabled running nodes with duplicate inputs.
    • ๐Ÿ‘Œ Improved error message when empty credentials are passed into SparkJDBCDataSet.
    • ๐Ÿ›  Fixed bug that caused an empty project to fail unexpectedly with ImportError in template/.../pipeline.py.
    • ๐Ÿ›  Fixed bug related to saving dataframe with categorical variables in table mode using HDFS3DataSet.
    • Fixed bug that caused unexpected behavior when using from_nodes and to_nodes in pipelines using transcoding.
    • Credentials nested in the dataset config are now also resolved correctly.
    • โฌ†๏ธ Bumped minimum required pandas version to 0.24.0 to make use of pandas.DataFrame.to_numpy (recommended alternative to pandas.DataFrame.values).
    • ๐Ÿ“„ Docs improvements.
    • Pipeline.transform skips modifying node inputs/outputs containing params: or parameters keywords.
    • ๐Ÿ‘Œ Support for dataset_credentials key in the credentials for PartitionedDataSet is now deprecated. The dataset credentials should be specified explicitly inside the dataset config.
    • Datasets can have a new confirm function which is called after a successful node function execution if the node contains confirms argument with such dataset name.
    • ๐Ÿ‘‰ Make the resume prompt on pipeline run failure use --from-nodes instead of --from-inputs to avoid unnecessarily re-running nodes that had already executed.
    • โšก๏ธ When closed, Jupyter notebook kernels are automatically terminated after 30 seconds of inactivity by default. Use --idle-timeout option to update it.
    • โž• Added kedro-viz to the Kedro project template requirements.txt file.
    • โœ‚ Removed the results and references folder from the project template.
    • โšก๏ธ Updated contribution process in CONTRIBUTING.md.

    Breaking changes to the API

    • Existing MatplotlibWriter dataset in contrib was renamed to MatplotlibLocalWriter.
    • kedro/contrib/io/matplotlib/matplotlib_writer.py was renamed to kedro/contrib/io/matplotlib/matplotlib_local_writer.py.
    • kedro.contrib.io.bioinformatics.sequence_dataset.py was renamed to kedro.contrib.io.bioinformatics.biosequence_local_dataset.py.

    ๐Ÿ‘ Thanks for supporting contributions

    Andrii Ivaniuk, Jonas Kemper, Yuhao Zhu, Balazs Konig, Pedro Abreu, Tam-Sanh Nguyen, Peter Zhao, Deepyaman Datta, Florian Roessler, Miguel Rodriguez Gutierrez