Popularity

8.7

Stable

Activity

9.7

Growing

Stars 9,353

Watchers 104

Forks 864

Last Commit 6 days ago

Description

Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned. We provide a standard approach so that you can: - spend more time building your data pipeline, - worry less about how to write production-ready code, - standardise the way that your team collaborates across your project, - work more efficiently. Is designed to assist both during development and production, allowing quick iterations Enforces separation of concerns between data processing and data storing Does the heavy lifting for dependency resolution Passes data between nodes for faster iterations during development

Programming language: Python

License: Apache License 2.0

Tags: Science And Data Analysis ETL Scalability Workflow Engine

Latest version: v0.16.6

Kedro alternatives and similar packages

Based on the "Workflow Engine" category.
Alternatively, view Kedro alternatives based on common mentions on social networks and blogs.

Airflow

9.8 10.0 Kedro VS Airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
luigi

9.4 6.4 L3 Kedro VS luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

WorkOS - The modern identity platform for B2B SaaS

The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

Promo workos.com

Pinball

5.2 0.1 Kedro VS Pinball

DISCONTINUED. Pinball is a scalable workflow manager
BPMN_RPA

1.2 7.9 Kedro VS BPMN_RPA

Robotic Process Automation in Windows and Linux by using Diagrams.net BPMN diagrams.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

Do you think we are missing an alternative of Kedro or a related project?

Add another 'Workflow Engine' Package

Popular Comparisons

README

Kedro Logo Banner CircleCI - Main Branch Develop Branch Build

What is Kedro?

Kedro is an open-source Python framework to create reproducible, maintainable, and modular data science code. It uses software engineering best practices to help you build production-ready data engineering and data science pipelines.

Kedro is hosted by the LF AI & Data Foundation.

How do I install Kedro?

To install Kedro from the Python Package Index (PyPI) run:

pip install kedro

It is also possible to install Kedro using conda:

conda install -c conda-forge kedro

Our Get Started guide contains full installation instructions, and includes how to set up Python virtual environments.

What are the main features of Kedro?

A pipeline visualisation generated using Kedro-Viz

Feature	What is this?
Project Template	A standard, modifiable and easy-to-use project template based on Cookiecutter Data Science.
Data Catalog	A series of lightweight data connectors used to save and load data across many different file formats and file systems, including local and network file systems, cloud object stores, and HDFS. The Data Catalog also includes data and model versioning for file-based systems.
Pipeline Abstraction	Automatic resolution of dependencies between pure Python functions and data pipeline visualisation using Kedro-Viz.
Coding Standards	Test-driven development using `pytest`, produce well-documented code using Sphinx, create linted code with support for `flake8`, `isort` and `black` and make use of the standard Python logging library.
Flexible Deployment	Deployment strategies that include single or distributed-machine deployment as well as additional support for deploying on Argo, Prefect, Kubeflow, AWS Batch and Databricks.

How do I use Kedro?

The Kedro documentation first explains how to install Kedro and then introduces key Kedro concepts.

The first example illustrates the basics of a Kedro project using the Iris dataset
You can then review the spaceflights tutorial to build a Kedro project for hands-on experience

For new and intermediate Kedro users, there's a comprehensive section on how to visualise Kedro projects using Kedro-Viz and how to work with Kedro and Jupyter notebooks.

Further documentation is available for more advanced Kedro usage and deployment. We also recommend the glossary and the API reference documentation for additional information.

Why does Kedro exist?

Kedro is built upon our collective best-practice (and mistakes) trying to deliver real-world ML applications that have vast amounts of raw unvetted data. We developed Kedro to achieve the following:

To address the main shortcomings of Jupyter notebooks, one-off scripts, and glue-code because there is a focus on creating maintainable data science code
To enhance team collaboration when different team members have varied exposure to software engineering concepts
To increase efficiency, because applied concepts like modularity and separation of concerns inspire the creation of reusable analytics code

The humans behind Kedro

Kedro is maintained by a product team and a number of contributors from across the world.

Can I contribute?

Yes! Want to help build Kedro? Check out our guide to contributing to Kedro.

Where can I learn more?

There is a growing community around Kedro. Have a look at the Kedro FAQs to find projects using Kedro and links to articles, podcasts and talks.

Who likes Kedro?

There are Kedro users across the world, who work at start-ups, major enterprises and academic institutions like Absa, Acensi, Advanced Programming Solutions SL, AI Singapore, Augment Partners, AXA UK, Belfius, Beamery, Caterpillar, CRIM, Dendra Systems, Element AI, GetInData, GMO, Indicium, Imperial College London, ING, Jungle Scout, Helvetas, Leapfrog, McKinsey & Company, Mercado Libre Argentina, Modec, Mosaic Data Science, NaranjaX, NASA, NHS AI Lab, Open Data Science LatAm, Prediqt, QuantumBlack, ReSpo.Vision, Retrieva, Roche, Sber, Société Générale, Telkomsel, Universidad Rey Juan Carlos, UrbanLogiq, Wildlife Studios, WovenLight and XP.

Kedro won Best Technical Tool or Framework for AI in the 2019 Awards AI competition and a merit award for the 2020 UK Technical Communication Awards. It is listed on the 2020 ThoughtWorks Technology Radar and the 2020 Data & AI Landscape. Kedro has received an honorable mention in the User Experience category in Fast Company’s 2022 Innovation by Design Awards.

How can I cite Kedro?

If you're an academic, Kedro can also help you, for example, as a tool to solve the problem of reproducible research. Use the "Cite this repository" button on our repository to generate a citation from the CITATION.cff file.

*Note that all licence references and agreements mentioned in the Kedro README section above are relevant to that project's source code only.

Kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.