Changelog History

  • v1.0.0 Changes

    Major Changes

    • πŸ“„ A docs site overhaul! Along with tons of additional content, the existing pages have been significantly edited and reorganized to improve readability.
    • All Dagster examples are revamped with a consistent project layout, descriptive names, and more helpful README files.
    • A new dagster project CLI contains commands for bootstrapping new Dagster projects and repositories:
      • dagster project scaffold creates a folder structure with a single Dagster repository and other files such as workspace.yaml. This CLI enables you to quickly start building a new Dagster project with everything set up.
      • dagster project from-example downloads one of the Dagster examples. This CLI helps you to quickly bootstrap your project with an officially maintained example. You can find the available examples via dagster project list-examples.
      • Check out Create a New Project for more details.
    • A default_executor_def argument has been added to the @repository decorator. If specified, this will be used for any jobs (asset or op) which do not explicitly set an executor_def.
    • A default_logger_defs argument has been added to the @repository decorator, which works in the same way as default_executor_def.
    • πŸ‘· A new execute_job function presents a Python API for kicking off runs of your jobs (see the sketch after this list).
    • πŸ‘· Run status sensors may now yield RunRequests, allowing you to kick off a job in response to the status of another job.
    • When loading an upstream asset or op output as an input, you can now set custom loading behavior using the input_manager_key argument to AssetIn and In.
    • 🍱 In the UI, the global lineage graph has been brought back and reworked! The graph keeps assets in the same group visually clustered together, and the query bar allows you to visualize a custom slice of your asset graph.
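
    The sketch below illustrates one way to use the new execute_job API; the op and job names are made up for illustration, and it assumes the usual pattern of passing a reconstructable job along with a DagsterInstance.

      from dagster import DagsterInstance, execute_job, job, op, reconstructable

      @op
      def do_something():
          return 1

      @job
      def my_job():
          do_something()

      if __name__ == "__main__":
          # kick off a run of my_job; an ephemeral instance keeps the sketch self-contained
          result = execute_job(reconstructable(my_job), instance=DagsterInstance.ephemeral())
          assert result.success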

    πŸ’₯ Breaking Changes and Deprecations

    Legacy API Removals

    🚚 In 1.0.0, a large number of previously-deprecated APIs have been fully removed. A full list of breaking changes and deprecations, alongside instructions on how to migrate older code, can be found in MIGRATION.md. At a high level:

    • πŸ—„ The solid and pipeline APIs have been removed, along with references to them in extension libraries, arguments, and the CLI (deprecated in 0.13.0).
    • The AssetGroup and build_asset_job APIs, and a host of deprecated arguments to asset-related functions, have been removed (deprecated in 0.15.0).
    • πŸ—„ The EventMetadata and EventMetadataEntryData APIs have been removed (deprecated in 0.15.0).

    πŸ—„ Deprecations

    • dagster_type_materializer and DagsterTypeMaterializer have been marked experimental and will likely be removed within a 1.x release. Instead, use an IOManager (see the sketch after this list).
    • πŸš€ FileManager and FileHandle have been marked experimental and will likely be removed within a 1.x release.
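
    For reference, a minimal IOManager sketch that could stand in for a type materializer is shown below; the local-file layout and names are assumptions made purely for illustration.

      from dagster import IOManager, io_manager

      class LocalFileIOManager(IOManager):
          def handle_output(self, context, obj):
              # persist each output to a local text file keyed by the producing step
              with open(f"{context.step_key}.txt", "w") as f:
                  f.write(str(obj))

          def load_input(self, context):
              # read the value back from the file written by the upstream output
              with open(f"{context.upstream_output.step_key}.txt") as f:
                  return f.read()

      @io_manager
      def local_file_io_manager():
          return LocalFileIOManager()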

    Other Changes

    • πŸ‘ As of 1.0.0, Dagster no longer guarantees support for python 3.6. This is in line with PEP 494, which outlines that 3.6 has reached end of life.
    • [planned] In an upcoming 1.x release, we plan to make a change that renders values supplied to configured in Dagit. Up through this point, values provided to configured have not been sent anywhere outside the process where they were used. This change will mean that, like other places you can supply configuration, configured is not a good place to put secrets: You should not include any values in configuration that you don't want to be stored in the Dagster database and displayed inside Dagit.
    • fs_io_manager, s3_pickle_io_manager, gcs_pickle_io_manager, and adls2_pickle_io_manager no longer write out a file or object when handling an output with the None or Nothing type.
    • The custom_path_fs_io_manager has been removed, as its functionality is entirely subsumed by the fs_io_manager, where a custom path can be specified via config.
    • 0️⃣ The default typing_type of a DagsterType is now typing.Any instead of None.
    • πŸš€ Dagster’s integration libraries haven’t yet achieved the same API maturity as Dagster core. For this reason, all integration libraries will remain on a pre-1.0 (0.16.x) versioning track for the time being. However, 0.16.x library releases remain fully compatible with Dagster 1.x. In the coming months, we will graduate integration libraries one-by-one to the 1.x versioning track as they achieve API maturity. If you have installs of the form:
    pip install dagster=={DAGSTER_VERSION} dagster-somelibrary=={DAGSTER_VERSION}
    

    this should be converted to:

    pip install dagster=={DAGSTER_VERSION} dagster-somelibrary
    

    to make sure the correct library version is installed.

    πŸ†• New since 0.15.8

    • [dagster-databricks] When using the databricks_pyspark_step_launcher the events sent back to the host process are now compressed before sending, resulting in significantly better performance for steps which produce a large number of events.
    • 🍱 [dagster-dbt] If an error occurs in load_assets_from_dbt_project while loading your repository, the error message in Dagit will now display additional context from the dbt logs, instead of just DagsterDbtCliFatalRuntimeError.

    πŸ›  Bugfixes

    • 🍱 Fixed a bug that caused Dagster to ignore the group_name argument to AssetsDefinition.from_graph when a key_prefix argument was also present.
    • πŸ›  Fixed a bug which could cause GraphQL errors in Dagit when loading repositories that contained multiple assets created from the same graph.
    • 🍱 Ops and software-defined assets with the None return type annotation are now given the Nothing type instead of the Any type.
    • Fixed a bug that caused AssetsDefinition.from_graph and from_op to fail when invoked on a configured op.
    • ⚠ The materialize function, which is not experimental, no longer emits an experimental warning.
    • πŸ›  Fixed a bug where runs from different repositories would be intermingled when viewing the runs for a specific repository-scoped job/schedule/sensor.
    • πŸ”Š [dagster-dbt] A regression was introduced in 0.15.8 that would cause dbt logs to show up in json format in the UI. This has been fixed.
    • [dagster-databricks] Previously, if you were using the databricks_pyspark_step_launcher, and the external step failed to start, a RESOURCE_DOES_NOT_EXIST error would be surfaced, without helpful context. Now, in most cases, the root error causing the step to fail will be surfaced instead.

    πŸ“š Documentation

    • πŸ†• New guide that walks through seamlessly transitioning code from development to production environments.
    • πŸ†• New guide that demonstrates using Branch Deployments to test Dagster code in your cloud environment without impacting your production data.
  • v0.15.8 Changes

    πŸ†• New

    • Software-defined asset config schemas are no longer restricted to dicts (see the sketch after this list).
    • The OpDefinition constructor now accepts ins and outs arguments, to make direct construction easier.
    • define_dagstermill_op accepts ins and outs in order to make direct construction easier.
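
    As a minimal sketch of a non-dict config schema on a software-defined asset (the asset name and config value here are hypothetical):

      from dagster import asset, materialize_to_memory

      @asset(config_schema=int)
      def doubled_value(context):
          # a scalar config schema: op_config is just the supplied int
          return context.op_config * 2

      result = materialize_to_memory(
          [doubled_value],
          run_config={"ops": {"doubled_value": {"config": 3}}},
      )
      assert result.output_for_node("doubled_value") == 6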

    πŸ›  Bugfixes

    • πŸ›  Fixed a bug where default configuration was not applied when assets were selected for materialization in Dagit.
    • Fixed a bug where RunRequests returned from run_status_sensors caused the sensor to error.
    • When supplying config to define_asset_job, an error would occur when selecting most asset subsets. This has been fixed.
    • πŸ›  Fixed an error introduced in 0.15.7 that would prevent viewing the execution plan for a job re-execution from 0.15.0 β†’ 0.15.6
    • [dagit] The Dagit server now returns 500 http status codes for GraphQL requests that encountered an unexpected server error.
    • [dagit] Fixed a bug that made it impossible to kick off materializations of a partitioned asset if the day_offset, hour_offset, or minute_offset parameters were set on the asset’s partitions definition.
    • πŸ‘· [dagster-k8s] Fixed a bug where overriding the Kubernetes command to use to run a Dagster job by setting the dagster-k8s/config didn’t actually override the command.
    • πŸ— [dagster-datahub] Pinned version of acryl-datahub to avoid build error.

    πŸ’₯ Breaking Changes

    • 🚚 The constructor of JobDefinition objects now accepts a config argument, and the preset_defs argument has been removed.

    πŸ—„ Deprecations

    • πŸ“‡ DagsterPipelineRunMetadataValue has been renamed to DagsterRunMetadataValue. DagsterPipelineRunMetadataValue will be removed in 1.0.

    Community Contributions

    • πŸ“„ Thanks to @hassen-io for fixing a broken link in the docs!

    πŸ“š Documentation

    • πŸ“‡ MetadataEntry static methods are now marked as deprecated in the docs.
    • PartitionMappings are now included in the API reference.
    • 🚚 A dbt example and memoization example using legacy APIs have been removed from the docs site.
  • v0.15.7 Changes

    πŸ†• New

    • DagsterRun now has a job_name property, which should be used instead of pipeline_name.
    • TimeWindowPartitionsDefinition now has a get_partition_keys_in_range method which returns a sequence of all the partition keys between two partition keys.
    • OpExecutionContext now has asset_partitions_def_for_output and asset_partitions_def_for_input methods.
    • 🍱 Dagster now errors immediately with an informative message when two AssetsDefinition objects with the same key are provided to the same repository.
    • build_output_context now accepts a partition_key argument that can be used when testing the handle_output method of an IO manager.
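
    A rough sketch of using the new partition_key argument to test an IO manager's handle_output method; the in-memory IO manager here is a trivial stand-in:

      from dagster import IOManager, build_output_context

      class InMemoryIOManager(IOManager):
          def __init__(self):
              self.values = {}

          def handle_output(self, context, obj):
              # store values keyed by partition so tests can inspect them
              self.values[context.partition_key] = obj

          def load_input(self, context):
              raise NotImplementedError  # not exercised in this sketch

      manager = InMemoryIOManager()
      manager.handle_output(build_output_context(partition_key="2022-07-01"), 42)
      assert manager.values["2022-07-01"] == 42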

    πŸ›  Bugfixes

    • πŸ›  Fixed a bug that made it impossible to load inputs using a DagsterTypeLoader if the InputDefinition had an asset_key set.
    • 🍱 Ops created with the @asset and @multi_asset decorators no longer have a top-level β€œassets” entry in their config schema. This entry was unused.
    • πŸ“‡ In 0.15.6, a bug was introduced that made it impossible to load repositories if assets that had non-standard metadata attached to them were present. This has been fixed.
    • 🍱 [dagster-dbt] In some cases, using load_assets_from_dbt_manifest with a select parameter that included sources would result in an error. This has been fixed.
    • ⏱ [dagit] Fixed an error where a race condition between loading a sensor/schedule page and removing that sensor/schedule caused a GraphQL exception to be raised.
    • [dagit] The β€œMaterialize” button no longer changes to β€œRematerialize” in some scenarios
    • βœ… [dagit] The live overlays on asset views, showing latest materialization and run info, now load faster
    • [dagit] Typing whitespace into the launchpad Yaml editor no longer causes execution to fail to start
    • πŸ—„ [dagit] The explorer sidebar no longer displays β€œmode” label and description for jobs, since modes are deprecated.

    Community Contributions

    • An error will now be raised if a @repository decorated function expects parameters. Thanks @roeij!

    πŸ“š Documentation

    • πŸ†• The non-asset version of the Hacker News example, which lived inside examples/hacker_news/, has been removed, because it hadn’t received updates in a long time and had drifted from best practices. The asset version is still there and has an updated README. Check it out here
  • v0.15.6 Changes

    πŸ†• New

    • πŸ‘» When an exception is wrapped by another exception and raised within an op, Dagit will now display the full chain of exceptions, instead of stopping after a single exception level.
    • A default_logger_defs argument has been added to the @repository decorator. Check out the docs on specifying default loggers to learn more.
    • 🍱 AssetsDefinition.from_graph and AssetsDefinition.from_op now both accept a partition_mappings argument.
    • 🍱 AssetsDefinition.from_graph and AssetsDefinition.from_op now both accept a metadata_by_output_name argument.
    • define_asset_job now accepts an executor_def argument.
    • βœ‚ Removed package pin for gql in dagster-graphql.
    • You can now apply a group name to assets produced with the @multi_asset decorator, either by supplying a group_name argument (which will apply to all of the output assets), or by setting the group_name argument on individual AssetOuts.
    • InputContext and OutputContext now each have an asset_partitions_def property, which returns the PartitionsDefinition of the asset that’s being loaded or stored.
    • ⏱ build_schedule_from_partitioned_job now raises a more informative error when provided a non-partitioned asset job
    • πŸ“¦ PartitionMapping, IdentityPartitionMapping, AllPartitionMapping, and LastPartitionMapping are exposed at the top-level dagster package. They're currently marked experimental.
    • When a non-partitioned asset depends on a partitioned asset, you can now control which partitions of the upstream asset are used by the downstream asset, by supplying a PartitionMapping.
    • You can now set PartitionMappings on AssetIn (see the sketch after this list).
    • 🐎 [dagit] Made performance improvements to the loading of the partitions and backfill pages.
    • [dagit] The Global Asset Graph is back by popular demand, and can be reached via a new β€œView global asset lineage” link on asset group and asset catalog pages! The global graph keeps assets in the same group visually clustered together, and the query bar allows you to visualize a custom slice of your asset graph.
    • πŸ”’ [dagit] Simplified the Content Security Policy and removed frame-ancestors restriction.
    • 🍱 [dagster-dbt] load_assets_from_dbt_project and load_assets_from_dbt_manifest now support a node_info_to_group_name_fn parameter, allowing you to customize which group Dagster will assign each dbt asset to.
    • πŸ“‡ [dagster-dbt] When you supply a runtime_metadata_fn when loading dbt assets, this metadata is added to the default metadata that dagster-dbt generates, rather than replacing it entirely.
    • πŸ— [dagster-dbt] When you load dbt assets with use_build_command=True, seeds and snapshots will now be represented as Dagster assets. Previously, only models would be loaded as assets.

    πŸ›  Bugfixes

    • πŸ›  Fixed an issue where runs that were launched using the DockerRunLauncher would sometimes use Dagit’s Python environment as the entrypoint to launch the run, even if that environment did not exist in the container.
    • ⏱ Dagster no longer raises a β€œDuplicate definition found” error when a schedule definition targets a partitioned asset job.
    • 🍱 Silenced some erroneous warnings that arose when using software-defined assets.
    • When returning multiple outputs as a tuple, empty list values no longer cause unexpected exceptions.
    • 🍱 [dagit] Fixed an issue with graph-backed assets causing a GraphQL error when graph inputs were type-annotated.
    • 🍱 [dagit] Fixed an issue where attempting to materialize graph-backed assets caused a GraphQL error.
    • 🍱 [dagit] Fixed an issue where partitions could not be selected when materializing partitioned assets with associated resources.
    • 🍱 [dagit] Attempting to materialize assets with required resources now only presents the launchpad modal if at least one resource defines a config schema.

    πŸ’₯ Breaking Changes

    • An op with a non-optional DynamicOutput will now error if no outputs are returned or yielded for that dynamic output.
    • If an Output object is used to type annotate the return of an op, an Output object must be returned or an error will result.

    Community Contributions

    • πŸ”Š Dagit now displays the path of the output handled by PickledObjectS3IOManager in run logs and Asset view. Thanks @danielgafni

    πŸ“š Documentation

    • πŸ—„ The Hacker News example now uses stable 0.15+ asset APIs, instead of the deprecated 0.14.x asset APIs.
    • πŸ›  Fixed the build command in the instructions for contributing docs changes.
    • 🍱 [dagster-dbt] The dagster-dbt integration guide now contains information on using dbt with Software-Defined Assets.
  • v0.15.5 Changes

    πŸ†• New

    • βž• Added documentation and helm chart configuration for threaded sensor evaluations.
    • βž• Added documentation and helm chart configuration for tick retention policies.
    • βž• Added descriptions for default config schema. Fields like execution, loggers, ops, and resources are now documented.
    • πŸ‘· UnresolvedAssetJob objects can now be passed to run status sensors.
    • 🍱 [dagit] A new global asset lineage view, linked from the Asset Catalog and Asset Group pages, allows you to view a graph of assets in all loaded asset groups and filter by query selector and repo.
    • [dagit] A new option on Asset Lineage pages allows you to choose how many layers of the upstream / downstream graph to display.
    • 🐎 [dagit] Dagit's DAG view now collapses large sets of edges between the same ops for improved readability and rendering performance.

    πŸ›  Bugfixes

    • πŸ›  Fixed a bug with materialize that would cause required resources to not be applied correctly.
    • ⏱ Fixed issue that caused repositories to fail to load when build_schedule_from_partitioned_job and define_asset_job were used together.
    • πŸ›  Fixed a bug that caused auto run retries to always use the FROM_FAILURE strategy
    • 🍱 Previously, it was possible to construct Software-Defined Assets from graphs whose leaf ops were not mapped to assets. This is invalid, as these ops are not required for the production of any assets, and would cause confusing behavior or errors on execution. This will now result in an error at definition time, as intended.
    • πŸ›  Fixed issue where the run monitoring daemon could mark completed runs as failed if they transitioned quickly between STARTING and SUCCESS status.
    • πŸ›  Fixed stability issues with the sensor daemon introduced in 0.15.3 that caused the daemon to fail heartbeat checks if the sensor evaluation took too long.
    • πŸ›  Fixed issues with the thread pool implementation of the sensor daemon where race conditions caused the sensor to fire more frequently than the minimum interval.
    • πŸ›  Fixed an issue with storage implementations using MySQL server version 5.6 which caused SQL syntax exceptions to surface when rendering the Instance overview pages in Dagit.
    • Fixed a bug with the default_executor_def argument on repository where asset jobs that defined executor config would result in errors.
    • πŸ›  Fixed a bug where an erroneous exception would be raised if an empty list was returned for a list output of an op.
    • πŸ”§ [dagit] Clicking the "Materialize" button for assets with configurable resources will now present the asset launchpad.
    • 0️⃣ [dagit] If you have an asset group and no jobs, Dagit will display it by default rather than directing you to the asset catalog.
    • 🍱 [dagit] DAG renderings of software-defined assets now display only the last component of the asset's key for improved readability.
    • πŸ›  [dagit] Fixed a regression where clicking on a source asset would trigger a GraphQL error.
    • ⏱ [dagit] Fixed issue where the β€œUnloadable” section on the sensors / schedules pages in Dagit was populated erroneously with loadable sensors and schedules
    • πŸ— [dagster-dbt] Fixed an issue where an exception would be raised when using the dbt build command with Software-Defined Assets if a test was defined on a source.

    πŸ—„ Deprecations

    • βœ‚ Removed the deprecated dagster-daemon health-check CLI command

    Community Contributions

    • πŸ“¦ TimeWindow is now exported from the dagster package (Thanks @nvinhphuc!)
    • βž• Added a fix to allow customization of slack messages (Thanks @solarisa21!)
    • [dagster-databricks] The databricks_pyspark_step_launcher now allows you to configure the following (Thanks @Phazure!):
      • the aws_attributes of the cluster that will be spun up for the step.
      • arbitrary environment variables to be copied over to databricks from the host machine, rather than requiring these variables to be stored as secrets.
      • job and cluster permissions, allowing users to view the completed runs through the databricks console, even if they’re kicked off by a service account.

    Experimental

    • πŸ‘· [dagster-k8s] Added k8s_job_op to launch a Kubernetes Job with an arbitrary image and CLI command. This is in contrast with the k8s_job_executor, which runs each Dagster op in a Dagster job in its own k8s job. This op may be useful when you need to orchestrate a command that isn't a Dagster op (or isn't written in Python). Usage:
       from dagster_k8s import k8s_job_op

       my_k8s_op = k8s_job_op.configured(
           {
               "image": "busybox",
               "command": ["/bin/sh", "-c"],
               "args": ["echo HELLO"],
           },
           name="my_k8s_op",
       )
    
    • [dagster-dbt] The dbt asset-loading functions now support partitions_def and partition_key_to_vars_fn parameters, adding preliminary support for partitioned dbt assets. To learn more, check out the Github issue!
  • v0.15.4 Changes

    • βͺ Reverted sensor threadpool changes from 0.15.3 to address daemon stability issues.
  • v0.15.3 Changes

    πŸ†• New

    • When loading an upstream asset or op output as an input, you can now set custom loading behavior using the input_manager_key argument to AssetIn and In
    • The list of objects returned by a repository can now contain nested lists.
    • βž• Added a data retention instance setting in dagster.yaml that enables the automatic removal of sensor/schedule ticks after a certain number of days.
    • βž• Added a sensor daemon setting in dagster.yaml that enables sensor evaluations to happen in a thread pool to increase throughput.
    • materialize_to_memory and materialize now both have the partition_key argument.
    • Output and DynamicOutput objects now work with deep equality checks:
    Output(value=5, output_name="foo") == Output(value=5, output_name="foo") # evaluates to True
    
    • βš™ RunRequests can now be returned from run status sensors
    • 🍱 Added resource_defs argument to AssetsDefinition.from_graph. Allows for specifying resources required by constituent ops directly on the asset (see the sketch after this list).
    • When adding a tag to the Run search filter in Dagit by clicking the hover menu on the tag, the tag will now be appended to the filter instead of replacing the entire filter state.
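
    A minimal sketch of supplying a resource required by a constituent op directly on a graph-backed asset; the resource and op names are made up:

      from dagster import AssetsDefinition, graph, op, resource

      @resource
      def api_client():
          return "client"

      @op(required_resource_keys={"api_client"})
      def fetch_data(context):
          return context.resources.api_client

      @graph
      def fetch_graph():
          return fetch_data()

      # the resource definition travels with the asset rather than the job
      fetched_asset = AssetsDefinition.from_graph(
          fetch_graph, resource_defs={"api_client": api_client}
      )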

    πŸ›  Bugfixes

    • πŸ‘» [dagster-dbt] An exception is now emitted if you attempt to invoke the library without having dbt-core installed. dbt-core is now also added as a dependency to the library.
    • Asset group names can now contain reserved python keywords
    • πŸ›  Fixed a run config parsing bug that was introduced in 0.15.1 that caused Dagit to interpret datetime strings as datetime objects and octal strings as integers.
    • βš™ Runs that have failed to start are now represented in the Instance Timeline view on Dagit.
    • πŸ›  Fixed an issue where the partition status was missing for partitioned jobs that had no runs.
    • πŸ›  Fixed a bug where op/resource invocation would error when resources were required, no context was used in the body of the function, and no context was provided when invoking.
    • [dagster-databricks] Fixed an issue where an exception related to the deprecated prior_attempts_count field would be raised when using the databricks_pyspark_step_launcher.
    • [dagster-databricks] Polling information logged from the databricks_pyspark_step_launcher is now emitted at the DEBUG level instead of INFO.
    • In the yaml editor in Dagit, the typeahead feature now correctly shows suggestions for nullable schema types.
    • πŸ”§ When editing asset configuration in Dagit, the β€œScaffold config” button in the Dagit launchpad sometimes showed the scaffold dialog beneath the launchpad. This has been fixed.
    • ⏱ A recent change added execution timezones to some human-readable cron strings on schedules in Dagit. This was added incorrectly in some cases, and has now been fixed.
    • πŸ›  In the Dagit launchpad, a config state containing only empty newlines could lead to an error that could break the editor. This has been fixed.
    • Fixed issue that could cause partitioned graph-backed assets to attempt to load upstream inputs from the incorrect path when using the fs_io_manager (or other similar io managers).
    • πŸ“œ [dagster-dbt] Fixed issue where errors generated from issuing dbt cli commands would only show json-formatted output, rather than a parsed, human-readable output.
    • 🌲 [dagster-dbt] By default, dagster will invoke the dbt cli with a --log-format json flag. In some cases, this may cause dbt to report incorrect or misleading error messages. As a workaround, it is now possible to disable this behavior by setting the json_log_format configuration option on the dbt_cli_resource to False.
    • materialize_to_memory erroneously allowed non-in-memory io managers to be used. Now, providing io managers to materialize_to_memory will result in an error, and mem_io_manager will be provided to all io manager keys.
  • v0.15.2 Changes

    πŸ›  Bugfixes

    • πŸ›  Fixed an issue where asset dependency resolution would break when two assets in the same group had the same name
  • v0.15.1 Changes

    πŸ†• New

    • 🌲 When Dagster loads an event from the event log with a type that it doesn’t recognize (for example, because it was created by a newer version of Dagster), it will now return a placeholder event rather than raising an exception.
    • AssetsDefinition.from_graph() now accepts a group_name parameter. All assets created by from_graph are assigned to this group.
    • 🍱 You can define an asset from an op via a new utility method AssetsDefinition.from_op. Dagster will infer asset inputs and outputs from the ins/outs defined on the @op in the same way as @graphs.
    • A default executor definition can be defined on a repository using the default_executor_def argument. The default executor definition will be used for all op/asset jobs that don’t explicitly define their own executor.
    • JobDefinition.run_request_for_partition now accepts a tags argument (Thanks @jburnich!)
    • πŸ’» In Dagit, the graph canvas now has a dotted background to help it stand out from the rest of the UI.
    • @multi_asset now accepts a resource_defs argument. The provided resources can be either used on the context, or satisfy the io manager requirements of the outs on the asset.
    • In Dagit, show execution timezone on cron strings, and use 12-hour or 24-hour time format depending on the user’s locale.
    • ⚑️ In Dagit, when viewing a run and selecting a specific step in the Gantt chart, the compute log selection state will now update to that step as well.
    • define_asset_job and to_job can now accept a partitions_def argument and a config argument at the same time, as long as the value for the config argument is a hardcoded config dictionary (not a PartitionedConfig or ConfigMapping)
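
    A sketch of the last item above, pairing a hardcoded config dict with a partitions_def (the asset, job, and config names are illustrative):

      from dagster import DailyPartitionsDefinition, asset, define_asset_job

      daily = DailyPartitionsDefinition(start_date="2022-06-01")

      @asset(partitions_def=daily, config_schema={"limit": int})
      def events(context):
          return list(range(context.op_config["limit"]))

      events_job = define_asset_job(
          name="events_job",
          selection="events",
          partitions_def=daily,
          config={"ops": {"events": {"config": {"limit": 10}}}},
      )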

    πŸ›  Bugfixes

    • πŸ›  Fixed an issue where entering a string in the launchpad that is valid YAML but invalid JSON would render incorrectly in Dagit.
    • πŸ‘· Fixed an issue where steps using the k8s_job_executor and docker_executor would sometimes return the same event lines twice in the command-line output for the step.
    • πŸ›  Fixed type annotations on the @op decorator (Thanks Milos Tomic!)
    • πŸ›  Fixed an issue where job backfills were not displayed correctly on the Partition view in Dagit.
    • UnresolvedAssetJobDefinition now supports the run_request_for_partition method.
    • πŸ›  Fixed an issue in Dagit where the Instance Overview page would briefly flash a loading state while loading fresh data.

    πŸ’₯ Breaking Changes

    • πŸ”Š Runs that were executed in newer versions of Dagster may produce errors when their event logs are loaded in older versions of Dagit, due to new event types that were recently added. Going forward, Dagit has been made more resilient to handling new events.

    πŸ—„ Deprecations

    • πŸ“‡ Updated deprecation warnings to clarify that the deprecated metadata APIs will be removed in 0.16.0, not 0.15.0.

    Experimental

    • 🍱 If two assets are in the same group and the upstream asset has a multi-segment asset key, the downstream asset doesn’t need to specify the full asset key when declaring its dependency on the upstream asset - just the last segment.

    πŸ“š Documentation

    • βž• Added dedicated sections for op, graph, and job Concept docs in the sidenav
    • πŸ“š Moved graph documentation from the jobs docs into its own page
    • βž• Added documentation for assigning asset groups and viewing them in Dagit
    • βž• Added apidoc for AssetOut and AssetIn
    • πŸ›  Fixed a typo on the Run Configuration concept page (Thanks Wenshuai Hou!)
    • ⚑️ Updated screenshots in the software-defined assets tutorial to match the new Dagit UI
    • Fixed a typo in the Defining an asset section of the software-defined assets tutorial (Thanks Daniel Kim!)
  • v0.15.0 Changes

    Major Changes

    • πŸ— Software-defined assets are now marked fully stable and are ready for prime time - we recommend using them whenever your goal using Dagster is to build and maintain data assets.
    • 🍱 You can now organize software defined assets into groups by providing a group_name on your asset definition. These assets will be grouped together in Dagit.
    • πŸ”§ Software-defined assets now accept configuration, similar to ops. E.g.
      from dagster import asset
    
      @asset(config_schema={"iterations": int})
      def my_asset(context):
          for i in range(context.op_config["iterations"]):
              ...
    
    • 🍱 Asset definitions can now be created from graphs via AssetsDefinition.from_graph:
      from dagster import AssetsDefinition, GraphOut, graph

      @graph(out={"asset_one": GraphOut(), "asset_two": GraphOut()})
      def my_graph(input_asset):
          ...
    
      graph_asset = AssetsDefinition.from_graph(my_graph)
    
    • execute_in_process and GraphDefinition.to_job now both accept an input_values argument, so you can pass arbitrary Python objects to the root inputs of your graphs and jobs (see the sketch after this list).
    • πŸ“‡ Ops that return Outputs and DynamicOutputs now work well with Python type annotations. You no longer need to sacrifice static type checking just because you want to include metadata on an output. E.g.
      from dagster import Output, op
    
      @op
      def my_op() -> Output[int]:
          return Output(5, metadata={"a": "b"})
    
    • πŸ‘· You can now automatically re-execute runs from failure. This is analogous to op-level retries, except at the job level.
    • πŸ“‡ You can now supply arbitrary structured metadata on jobs, which will be displayed in Dagit.
    • The partitions and backfills pages in Dagit have been redesigned to be faster and show the status of all partitions, instead of just the last 30 or so.
    • πŸ‘· The left navigation pane in Dagit is now grouped by repository, which makes it easier to work with when you have large numbers of jobs, especially when jobs in different repositories have the same name.
    • πŸ‘€ The Asset Details page for a software-defined asset now includes a Lineage tab, which makes it easy to see all the assets that are upstream or downstream of an asset.
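
    A brief sketch of passing arbitrary Python objects to root inputs via input_values, as mentioned above (the graph and values are illustrative):

      from dagster import graph, op

      @op
      def double(x):
          return x * 2

      @graph
      def doubling_graph(x):
          return double(x)

      # pass a plain Python object straight to the graph's root input
      result = doubling_graph.execute_in_process(input_values={"x": 5})
      assert result.output_for_node("double") == 10

      # or bake the value in when building a job
      doubling_job = doubling_graph.to_job(input_values={"x": 5})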

    πŸ’₯ Breaking Changes and Deprecations

    🍱 Software-defined assets

    πŸš€ This release marks the official transition of software-defined assets from experimental to stable. We made some final changes to incorporate feedback and make the APIs as consistent as possible:

    • πŸ‘Œ Support for adding tags to asset materializations, which was previously marked as experimental, has been removed.
    • Some of the properties of the previously-experimental AssetsDefinition class have been renamed: group_names is now group_names_by_key, asset_keys_by_input_name is now keys_by_input_name, asset_keys_by_output_name is now keys_by_output_name, asset_key is now key, and asset_keys is now keys.
    • Removed the previously-experimental IO manager fs_asset_io_manager in favor of merging its functionality with fs_io_manager. fs_io_manager is now the default IO manager for asset jobs, and will store asset outputs in a directory named with the asset key. Similarly, removed adls2_pickle_asset_io_manager, gcs_pickle_asset_io_manager, and s3_pickle_asset_io_manager. Instead, adls2_pickle_io_manager, gcs_pickle_io_manager, and s3_pickle_io_manager now support software-defined assets.
    • πŸ—„ (deprecation) The namespace argument on the @asset decorator and AssetIn has been deprecated. Users should use key_prefix instead (see the sketch after this list).
    • πŸ—„ (deprecation) AssetGroup has been deprecated. Users should instead place assets directly on repositories, optionally attaching resources using with_resources. Asset jobs should be defined using define_asset_job (replacing AssetGroup.build_job), and arbitrary sets of assets can be materialized using the standalone function materialize (replacing AssetGroup.materialize).
    • πŸ—„ (deprecation) The outs property of the previously-experimental @multi_asset decorator now prefers a dictionary whose values are AssetOut objects instead of a dictionary whose values are Out objects. The latter still works, but is deprecated.
    • The previously-experimental property on OpExecutionContext called output_asset_partition_key is now deprecated in favor of asset_partition_key_for_output
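
    A small sketch of migrating from namespace to key_prefix (the asset name and prefix are arbitrary):

      from dagster import asset

      # before (deprecated): @asset(namespace="warehouse")
      @asset(key_prefix="warehouse")
      def orders():
          return []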

    Event records

    • The get_event_records method on DagsterInstance now requires a non-None event_records_filter argument (see the sketch after this list). Passing a None value for the event_records_filter argument will now raise an exception where previously it generated a deprecation warning.
    • Removed methods events_for_asset_key and get_asset_events, which have been deprecated since 0.12.0.
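
    For reference, a hedged sketch of calling get_event_records with an explicit EventRecordsFilter instead of None; the filter below targets asset materializations and is only an example:

      from dagster import DagsterEventType, DagsterInstance, EventRecordsFilter

      # assumes a configured Dagster instance (DAGSTER_HOME) is available
      instance = DagsterInstance.get()
      records = instance.get_event_records(
          EventRecordsFilter(event_type=DagsterEventType.ASSET_MATERIALIZATION),
          limit=10,
      )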

    Extension libraries

    • 🍱 [dagster-dbt] (breaks previously-experimental API) When using the load_assets_from_dbt_project or load_assets_from_dbt_manifest, the AssetKeys generated for dbt sources are now the union of the source name and the table name, and the AssetKeys generated for models are now the union of the configured schema name for a given model (if any), and the model name. To revert to the old behavior: dbt_assets = load_assets_from_dbt_project(..., node_info_to_asset_key=lambda node_info: AssetKey(node_info["name"])).
    • πŸš€ [dagster-k8s] In the Dagster Helm chart, user code deployment configuration (like secrets, configmaps, or volumes) is now automatically included in any runs launched from that code. Previously, this behavior was opt-in. In most cases, this will not be a breaking change, but in less common cases where a user code deployment was running in a different kubernetes namespace or using a different service account, this could result in missing secrets or configmaps in a launched run that previously worked. You can return to the previous behavior where config on the user code deployment was not applied to any runs by setting the includeConfigInLaunchedRuns.enabled field to false for the user code deployment. See the Kubernetes Deployment docs for more details.
    • πŸš€ [dagster-snowflake] dagster-snowflake has dropped support for python 3.6. The library it is currently built on, snowflake-connector-python, dropped 3.6 support in its recent 2.7.5 release.

    Other

    • The prior_attempts_count parameter is now removed from step-launching APIs. This parameter was not being used, as the information it held was stored elsewhere in all cases. It can safely be removed from invocations without changing behavior.
    • 🚚 The FileCache class has been removed.
    • ⏱ Previously, when schedules/sensors targeted jobs with the same name as other jobs in the repo, the jobs on the sensor/schedule would silently overwrite the other jobs. Now, this will cause an error.

    πŸ†• New since 0.14.20

    • A new define_asset_job function allows you to define a selection of assets that should be executed together. The selection can be a simple string, or an AssetSelection object. This selection will be resolved into a set of assets once placed on the repository.
      from dagster import repository, define_asset_job, AssetSelection
    
      string_selection_job = define_asset_job(
          name="foo_job", selection="*foo"
      )
      object_selection_job = define_asset_job(
          name="bar_job", selection=AssetSelection.groups("some_group")
      )
    
      @repository
      def my_repo():
          return [
              *my_list_of_assets,
              string_selection_job,
              object_selection_job,
          ]
    
    • 🍱 [dagster-dbt] Assets loaded with load_assets_from_dbt_project and load_assets_from_dbt_manifest will now be sorted into groups based on the subdirectory of the project that each model resides in.
    • @asset and @multi_asset are no longer considered experimental.
    • 🍱 Adds new utility methods load_assets_from_modules, assets_from_current_module, assets_from_package_module, and assets_from_package_name to fetch and return a list of assets from within the specified python modules.
    • 🍱 Resources and io managers can now be provided directly on assets and source assets.
      from dagster import asset, SourceAsset, resource, io_manager
    
      @resource
      def foo_resource():
          pass
    
      @asset(resource_defs={"foo": foo_resource})
      def the_resource(context):
          foo = context.resources.foo
    
      @io_manager
      def the_manager():
          ...
    
      @asset(io_manager_def=the_manager)
      def the_asset():
          ...
    

    Note that assets provided to a job must not have conflicting resource definitions for the same key. For a given job, all resource definitions must match by reference equality for a given key.

    • A materialize_to_memory method which will load the materializations of a provided list of assets into memory:
      from dagster import asset, materialize_to_memory
    
      @asset
      def the_asset():
          return 5
    
      result = materialize_to_memory([the_asset])
      output = result.output_for_node("the_asset")
    
    • 🍱 A with_resources method, which allows resources to be added to multiple assets / source assets at once:
      from dagster import asset, with_resources, resource
    
      @asset(required_resource_keys={"foo"})
      def requires_foo(context):
          ...
    
      @asset(required_resource_keys={"foo"})
      def also_requires_foo(context):
          ...
    
      @resource
      def foo_resource():
          ...
    
      requires_foo, also_requires_foo = with_resources(
          [requires_foo, also_requires_foo],
          {"foo": foo_resource},
      )
    
    • You can now include asset definitions directly on repositories. A default_executor_def property has been added to the repository, which will be used on any materializations of assets provided directly to the repository.
      from dagster import asset, repository, multiprocess_executor
    
      @asset
      def my_asset():
        ...
    
      @repository(default_executor_def=multiprocess_executor)
      def repo():
          return [my_asset]
    
    • The run_storage, event_log_storage, and schedule_storage configuration sections of the dagster.yaml can now be replaced by a unified storage configuration section. This should avoid duplicate configuration blocks with your dagster.yaml. For example, instead of:
      # dagster.yaml
      run_storage:
        module: dagster_postgres.run_storage
        class: PostgresRunStorage
        config:
          postgres_url: { PG_DB_CONN_STRING }
      event_log_storage:
        module: dagster_postgres.event_log
        class: PostgresEventLogStorage
        config:
          postgres_url: { PG_DB_CONN_STRING }
      schedule_storage:
        module: dagster_postgres.schedule_storage
        class: PostgresScheduleStorage
        config:
          postgres_url: { PG_DB_CONN_STRING }
    

    You can now write:

      storage:
        postgres:
          postgres_url: { PG_DB_CONN_STRING }
    
    • 🍱 All assets where a group_name is not provided are now part of a group called default.
    • The group_name parameter value for @asset is now restricted to only allow letters, numbers and underscore.
    • πŸš€ You can now set policies to automatically retry Job runs. This is analogous to op-level retries, except at the job level. By default the retries pick up from failure, meaning only failed ops and their dependents are executed.
    • [dagit] The new repository-grouped left navigation is fully launched, and is no longer behind a feature flag.
    • πŸ›  [dagit] The left navigation can now be collapsed even when the viewport window is wide. Previously, the navigation was collapsible only for small viewports, but kept in a fixed, visible state for wide viewports. This visible/collapsed state for wide viewports is now tracked in localStorage, so your preference will persist across sessions.
    • [dagit] Queued runs can now be terminated from the Run page.
    • 🌲 [dagit] The log filter on a Run page now shows counts for each filter type, and the filters have higher contrast and a switch to indicate when they are on or off.
    • 0️⃣ [dagit] The partitions and backfill pages have been redesigned to focus on easily viewing the last run state by partition. These redesigned pages were previously gated behind a feature flag β€” they are now loaded by default.
    • πŸ‘· [dagster-k8s] Overriding labels in the K8sRunLauncher will now apply to both the Kubernetes job and the Kubernetes pod created for each run, instead of just the Kubernetes pod.

    πŸ›  Bugfixes

    • πŸ›  [dagster-dbt] In some cases, if Dagster attempted to rematerialize a dbt asset, but dbt failed to start execution, asset materialization events would still be emitted. This has been fixed.
    • [dagit] On the Instance Overview page, the popover showing details of overlapping batches of runs is now scrollable.
    • πŸ›  [dagit] When viewing Instance Overview, reloading a repository via controls in the left navigation could lead to an error that would crash the page due to a bug in client-side cache state. This has been fixed.
    • πŸ›  [dagit] When scrolling through a list of runs, scrolling would sometimes get stuck on certain tags, specifically those with content overflowing the width of the tag. This has been fixed.
    • πŸ‘· [dagit] While viewing a job page, the left navigation item corresponding to that job will be highlighted, and the navigation pane will scroll to bring it into view.
    • πŸ›  [dagit] Fixed a bug where the β€œScaffold config” button was always enabled.

    Community Contributions

    • πŸ”§ You can now provide dagster-mlflow configuration parameters as environment variables, thanks @chasleslr!

    πŸ“š Documentation

    • βž• Added a guide that helps users who are familiar with ops and graphs understand how and when to use software-defined assets.
    • ⚑️ Updated and reorganized docs to document software-defined assets changes since 0.14.0.
    • πŸš€ The Deploying in Docker example now includes an example of using the docker_executor to run each step of a job in a different Docker container.
    • Descriptions for the top-level fields of Dagit GraphQL queries, mutations, and subscriptions have been added.