Changelog History
v1.0.0 Changes
Major Changes
- A docs site overhaul! Along with tons of additional content, the existing pages have been significantly edited and reorganized to improve readability.
- All Dagster examples are revamped with a consistent project layout, descriptive names, and more helpful README files.
- A new dagster project CLI contains commands for bootstrapping new Dagster projects and repositories:
  - dagster project scaffold creates a folder structure with a single Dagster repository and other files such as workspace.yaml. This CLI enables you to quickly start building a new Dagster project with everything set up.
  - dagster project from-example downloads one of the Dagster examples. This CLI helps you to quickly bootstrap your project with an officially maintained example. You can find the available examples via dagster project list-examples.
  - Check out Create a New Project for more details.
- A default_executor_def argument has been added to the @repository decorator. If specified, this will be used for any jobs (asset or op) which do not explicitly set an executor_def.
- A default_logger_defs argument has been added to the @repository decorator, which works in the same way as default_executor_def.
- A new execute_job function presents a Python API for kicking off runs of your jobs.
- Run status sensors may now yield RunRequests, allowing you to kick off a job in response to the status of another job.
- When loading an upstream asset or op output as an input, you can now set custom loading behavior using the input_manager_key argument to AssetIn and In.
- In the UI, the global lineage graph has been brought back and reworked! The graph keeps assets in the same group visually clustered together, and the query bar allows you to visualize a custom slice of your asset graph.
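The default_executor_def rule above is a simple precedence. A job's own executor_def wins. Otherwise the repository default applies. If neither is set, a built-in default is used. The plain-Python sketch below models only that ordering. JobSpec, resolve_executor and the string stand-ins for executor definitions are hypothetical names, not Dagster APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSpec:
    # Hypothetical stand-in for a job; executor_def is None when the job sets none.
    name: str
    executor_def: Optional[str] = None


def resolve_executor(
    job: JobSpec,
    repo_default: Optional[str],
    builtin_default: str = "default_executor",  # placeholder string, not a real Dagster name
) -> str:
    """Pick the executor a job runs with: its own, else the repository default, else a fallback."""
    if job.executor_def is not None:
        return job.executor_def
    if repo_default is not None:
        return repo_default
    return builtin_default
```

Under this model, a job that sets its own executor ignores the repository default. Every other job in the repository picks the default up automatically.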
Breaking Changes and Deprecations
Legacy API Removals
In 1.0.0, a large number of previously-deprecated APIs have been fully removed. A full list of breaking changes and deprecations, alongside instructions on how to migrate older code, can be found in MIGRATION.md. At a high level:
- The solid and pipeline APIs have been removed, along with references to them in extension libraries, arguments, and the CLI (deprecated in 0.13.0).
- The AssetGroup and build_asset_job APIs, and a host of deprecated arguments to asset-related functions, have been removed (deprecated in 0.15.0).
- The EventMetadata and EventMetadataEntryData APIs have been removed (deprecated in 0.15.0).
Deprecations
- dagster_type_materializer and DagsterTypeMaterializer have been marked experimental and will likely be removed within a 1.x release. Instead, use an IOManager.
- FileManager and FileHandle have been marked experimental and will likely be removed within a 1.x release.
Other Changes
- As of 1.0.0, Dagster no longer guarantees support for Python 3.6. This is in line with PEP 494, which outlines that 3.6 has reached end of life.
- [planned] In an upcoming 1.x release, we plan to make a change that renders values supplied to configured in Dagit. Up through this point, values provided to configured have not been sent anywhere outside the process where they were used. This change will mean that, like other places you can supply configuration, configured is not a good place to put secrets: you should not include any values in configuration that you don't want to be stored in the Dagster database and displayed inside Dagit.
- fs_io_manager, s3_pickle_io_manager, gcs_pickle_io_manager, and adls_pickle_io_manager no longer write out a file or object when handling an output with the None or Nothing type.
- The custom_path_fs_io_manager has been removed, as its functionality is entirely subsumed by the fs_io_manager, where a custom path can be specified via config.
- The default typing_type of a DagsterType is now typing.Any instead of None.
- Dagster's integration libraries haven't yet achieved the same API maturity as Dagster core. For this reason, all integration libraries will remain on a pre-1.0 (0.16.x) versioning track for the time being. However, 0.16.x library releases remain fully compatible with Dagster 1.x. In the coming months, we will graduate integration libraries one-by-one to the 1.x versioning track as they achieve API maturity. If you have installs of the form:
pip install dagster=={DAGSTER_VERSION} dagster-somelibrary=={DAGSTER_VERSION}
this should be converted to:
pip install dagster=={DAGSTER_VERSION} dagster-somelibrary
to make sure the correct library version is installed.
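If your pins live in a requirements list instead of a single pip command, the same conversion can be scripted. The sketch below is hypothetical and takes the set of integration package names explicitly. It does not guess which packages stayed on 0.16.x. It removes only exact "==" pins from those names and leaves dagster itself pinned.

```python
import re
from typing import Iterable, List, Set

# Matches a requirement such as "dagster-somelibrary==1.0.0": a name followed by an exact pin.
_EXACT_PIN = re.compile(r"^(?P<name>[A-Za-z0-9_.-]+)==(?P<version>\S+)$")


def unpin_integrations(requirements: Iterable[str], integrations: Set[str]) -> List[str]:
    """Drop the '==version' pin from each listed integration (names given in lowercase)."""
    rewritten: List[str] = []
    for req in requirements:
        req = req.strip()
        match = _EXACT_PIN.match(req)
        if match and match.group("name").lower() in integrations:
            rewritten.append(match.group("name"))  # let pip resolve the compatible 0.16.x release
        else:
            rewritten.append(req)  # dagster core and unrelated packages keep their pins
    return rewritten
```

For example, ["dagster==1.0.0", "dagster-somelibrary==1.0.0"] with {"dagster-somelibrary"} becomes ["dagster==1.0.0", "dagster-somelibrary"].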
New since 0.15.8
- [dagster-databricks] When using the databricks_pyspark_step_launcher, the events sent back to the host process are now compressed before sending, resulting in significantly better performance for steps which produce a large number of events.
- [dagster-dbt] If an error occurs in load_assets_from_dbt_project while loading your repository, the error message in Dagit will now display additional context from the dbt logs, instead of just DagsterDbtCliFatalRuntimeError.
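The entry above does not say how events are serialized or which codec is used. The sketch below assumes JSON plus zlib purely for illustration. It shows why compressing a batch helps when a step emits many similar events. pack_events and unpack_events are hypothetical helpers, not part of dagster-databricks.

```python
import json
import zlib
from typing import Any, Dict, List


def pack_events(events: List[Dict[str, Any]]) -> bytes:
    """Serialize a batch of events to JSON and compress it before sending (assumed format)."""
    return zlib.compress(json.dumps(events).encode("utf-8"))


def unpack_events(payload: bytes) -> List[Dict[str, Any]]:
    """Reverse pack_events on the receiving (host) side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

Repetitive event payloads, such as many log lines from the same step, compress well. That is the case the changelog entry targets.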
Bugfixes
- Fixed a bug that caused Dagster to ignore the group_name argument to AssetsDefinition.from_graph when a key_prefix argument is also present.
- Fixed a bug which could cause GraphQL errors in Dagit when loading repositories that contained multiple assets created from the same graph.
- Ops and software-defined assets with the None return type annotation are now given the Nothing type instead of the Any type.
- Fixed a bug that caused AssetsDefinition.from_graph and from_op to fail when invoked on a configured op.
- The materialize function, which is not experimental, no longer emits an experimental warning.
- Fixed a bug where runs from different repositories would be intermingled when viewing the runs for a specific repository-scoped job/schedule/sensor.
- [dagster-dbt] A regression was introduced in 0.15.8 that would cause dbt logs to show up in JSON format in the UI. This has been fixed.
- [dagster-databricks] Previously, if you were using the databricks_pyspark_step_launcher and the external step failed to start, a RESOURCE_DOES_NOT_EXIST error would be surfaced without helpful context. Now, in most cases, the root error causing the step to fail will be surfaced instead.
Documentation
v0.15.8 Changes
New
- Software-defined asset config schemas are no longer restricted to dicts.
- The OpDefinition constructor now accepts ins and outs arguments, to make direct construction easier.
- define_dagstermill_op accepts ins and outs in order to make direct construction easier.
Bugfixes
- Fixed a bug where default configuration was not applied when assets were selected for materialization in Dagit.
- Fixed a bug where RunRequests returned from run_status_sensors caused the sensor to error.
- When supplying config to define_asset_job, an error would occur when selecting most asset subsets. This has been fixed.
- Fixed an error introduced in 0.15.7 that would prevent viewing the execution plan for a job re-execution from 0.15.0 to 0.15.6.
- [dagit] The Dagit server now returns 500 HTTP status codes for GraphQL requests that encountered an unexpected server error.
- [dagit] Fixed a bug that made it impossible to kick off materializations of partitioned assets if the day_offset, hour_offset, or minute_offset parameters were set on the asset's partitions definition.
- [dagster-k8s] Fixed a bug where overriding the Kubernetes command used to run a Dagster job by setting the dagster-k8s/config tag didn't actually override the command.
- [dagster-datahub] Pinned the version of acryl-datahub to avoid a build error.
Breaking Changes
- The constructor of JobDefinition objects now accepts a config argument, and the preset_defs argument has been removed.
Deprecations
- DagsterPipelineRunMetadataValue has been renamed to DagsterRunMetadataValue. DagsterPipelineRunMetadataValue will be removed in 1.0.
Community Contributions
- Thanks to @hassen-io for fixing a broken link in the docs!
Documentation
- MetadataEntry static methods are now marked as deprecated in the docs.
- PartitionMappings are now included in the API reference.
- A dbt example and memoization example using legacy APIs have been removed from the docs site.
v0.15.7 Changes
New
- DagsterRun now has a job_name property, which should be used instead of pipeline_name.
- TimeWindowPartitionsDefinition now has a get_partition_keys_in_range method which returns a sequence of all the partition keys between two partition keys.
- OpExecutionContext now has asset_partitions_def_for_output and asset_partitions_def_for_input methods.
- Dagster now errors immediately with an informative message when two AssetsDefinition objects with the same key are provided to the same repository.
- build_output_context now accepts a partition_key argument that can be used when testing the handle_output method of an IO manager.
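To make "all the partition keys between two partition keys" concrete for a daily partitions definition, here is a plain-Python sketch. It assumes ISO-date keys and includes both endpoints. daily_partition_keys_in_range is a hypothetical helper. It is not the TimeWindowPartitionsDefinition method, which also handles other cadences, formats and offsets.

```python
from datetime import date, timedelta
from typing import List


def daily_partition_keys_in_range(start_key: str, end_key: str) -> List[str]:
    """Return every daily key from start_key to end_key, both endpoints included (assumed)."""
    start = date.fromisoformat(start_key)
    end = date.fromisoformat(end_key)
    if end < start:
        raise ValueError("end_key must not precede start_key")
    return [(start + timedelta(days=i)).isoformat() for i in range((end - start).days + 1)]
```

For example, daily_partition_keys_in_range("2022-08-30", "2022-09-01") yields the three keys spanning the month boundary.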
Bugfixes
- Fixed a bug that made it impossible to load inputs using a DagsterTypeLoader if the InputDefinition had an asset_key set.
- Ops created with the @asset and @multi_asset decorators no longer have a top-level "assets" entry in their config schema. This entry was unused.
- In 0.15.6, a bug was introduced that made it impossible to load repositories if assets that had non-standard metadata attached to them were present. This has been fixed.
- [dagster-dbt] In some cases, using load_assets_from_dbt_manifest with a select parameter that included sources would result in an error. This has been fixed.
- [dagit] Fixed an error where a race condition between a sensor/schedule page load and the sensor/schedule removal caused a GraphQL exception to be raised.
- [dagit] The "Materialize" button no longer changes to "Rematerialize" in some scenarios.
- [dagit] The live overlays on asset views, showing latest materialization and run info, now load faster.
- [dagit] Typing whitespace into the launchpad YAML editor no longer causes execution to fail to start.
- [dagit] The explorer sidebar no longer displays the "mode" label and description for jobs, since modes are deprecated.
Community Contributions
- An error will now be raised if a @repository-decorated function expects parameters. Thanks @roeij!
Documentation
- The non-asset version of the Hacker News example, which lived inside examples/hacker_news/, has been removed, because it hadn't received updates in a long time and had drifted from best practices. The asset version is still there and has an updated README. Check it out here.
v0.15.6 Changes
New
- When an exception is wrapped by another exception and raised within an op, Dagit will now display the full chain of exceptions, instead of stopping after a single exception level.
- A default_logger_defs argument has been added to the @repository decorator. Check out the docs on specifying default loggers to learn more.
- AssetsDefinition.from_graph and AssetsDefinition.from_op now both accept a partition_mappings argument.
- AssetsDefinition.from_graph and AssetsDefinition.from_op now both accept a metadata_by_output_name argument.
- define_asset_job now accepts an executor_def argument.
- Removed package pin for gql in dagster-graphql.
- You can now apply a group name to assets produced with the @multi_asset decorator, either by supplying a group_name argument (which will apply to all of the output assets), or by setting the group_name argument on individual AssetOuts.
- InputContext and OutputContext now each have an asset_partitions_def property, which returns the PartitionsDefinition of the asset that's being loaded or stored.
- build_schedule_from_partitioned_job now raises a more informative error when provided a non-partitioned asset job.
- PartitionMapping, IdentityPartitionMapping, AllPartitionMapping, and LastPartitionMapping are exposed at the top-level dagster package. They're currently marked experimental.
- When a non-partitioned asset depends on a partitioned asset, you can now control which partitions of the upstream asset are used by the downstream asset, by supplying a PartitionMapping.
- You can now set PartitionMappings on AssetIn.
- [dagit] Made performance improvements to the loading of the partitions and backfill pages.
- [dagit] The Global Asset Graph is back by popular demand, and can be reached via a new "View global asset lineage" link on asset group and asset catalog pages! The global graph keeps assets in the same group visually clustered together and the query bar allows you to visualize a custom slice of your asset graph.
- [dagit] Simplified the Content Security Policy and removed the frame-ancestors restriction.
- [dagster-dbt] load_assets_from_dbt_project and load_assets_from_dbt_manifest now support a node_info_to_group_name_fn parameter, allowing you to customize which group Dagster will assign each dbt asset to.
- [dagster-dbt] When you supply a runtime_metadata_fn when loading dbt assets, this metadata is added to the default metadata that dagster-dbt generates, rather than replacing it entirely.
- [dagster-dbt] When you load dbt assets with use_build_command=True, seeds and snapshots will now be represented as Dagster assets. Previously, only models would be loaded as assets.
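The "full chain of exceptions" in the first entry above refers to Python's standard exception chaining. "raise ... from ..." sets __cause__. Raising inside an except block sets __context__. The sketch below walks that chain in plain Python, outermost first, following the same cause-then-context rules the standard traceback module uses. exception_chain and demo_chain are hypothetical helpers. This is not how Dagit renders errors.

```python
from typing import List, Optional


def exception_chain(exc: BaseException) -> List[str]:
    """List 'Type: message' for exc and every exception it wraps, outermost first."""
    chain: List[str] = []
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        chain.append(f"{type(current).__name__}: {current}")
        if current.__cause__ is not None:
            current = current.__cause__  # explicit: raise ... from ...
        elif not current.__suppress_context__:
            current = current.__context__  # implicit: raised while handling another error
        else:
            current = None
    return chain


def demo_chain() -> List[str]:
    """Hypothetical op body: a parsing error wrapped in a higher-level error."""
    try:
        try:
            int("abc")
        except ValueError as err:
            raise RuntimeError("failed to parse input") from err
    except RuntimeError as wrapped:
        return exception_chain(wrapped)
    return []
```

Here demo_chain() returns two entries: the RuntimeError followed by the ValueError it wraps.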
Bugfixes
- Fixed an issue where runs that were launched using the DockerRunLauncher would sometimes use Dagit's Python environment as the entrypoint to launch the run, even if that environment did not exist in the container.
- Dagster no longer raises a "Duplicate definition found" error when a schedule definition targets a partitioned asset job.
- Silenced some erroneous warnings that arose when using software-defined assets.
- When returning multiple outputs as a tuple, empty list values no longer cause unexpected exceptions.
- [dagit] Fixed an issue with graph-backed assets causing a GraphQL error when graph inputs were type-annotated.
- [dagit] Fixed an issue where attempting to materialize graph-backed assets caused a GraphQL error.
- [dagit] Fixed an issue where partitions could not be selected when materializing partitioned assets with associated resources.
- [dagit] Attempting to materialize assets with required resources now only presents the launchpad modal if at least one resource defines a config schema.
Breaking Changes
- An op with a non-optional DynamicOutput will now error if no outputs are returned or yielded for that dynamic output.
- If an Output object is used to type annotate the return of an op, an Output object must be returned or an error will result.
Community Contributions
- Dagit now displays the path of the output handled by PickledObjectS3IOManager in run logs and Asset view. Thanks @danielgafni!
Documentation
- The Hacker News example now uses stable 0.15+ asset APIs, instead of the deprecated 0.14.x asset APIs.
- Fixed the build command in the instructions for contributing docs changes.
- [dagster-dbt] The dagster-dbt integration guide now contains information on using dbt with Software-Defined Assets.
v0.15.5 Changes
New
- Added documentation and Helm chart configuration for threaded sensor evaluations.
- Added documentation and Helm chart configuration for tick retention policies.
- Added descriptions for default config schema. Fields like execution, loggers, ops, and resources are now documented.
- UnresolvedAssetJob objects can now be passed to run status sensors.
- [dagit] A new global asset lineage view, linked from the Asset Catalog and Asset Group pages, allows you to view a graph of assets in all loaded asset groups and filter by query selector and repo.
- [dagit] A new option on Asset Lineage pages allows you to choose how many layers of the upstream / downstream graph to display.
- [dagit] Dagit's DAG view now collapses large sets of edges between the same ops for improved readability and rendering performance.
Bugfixes
- Fixed a bug with materialize that would cause required resources to not be applied correctly.
- Fixed an issue that caused repositories to fail to load when build_schedule_from_partitioned_job and define_asset_job were used together.
- Fixed a bug that caused auto run retries to always use the FROM_FAILURE strategy.
- Previously, it was possible to construct Software-Defined Assets from graphs whose leaf ops were not mapped to assets. This is invalid, as these ops are not required for the production of any assets, and would cause confusing behavior or errors on execution. This will now result in an error at definition time, as intended.
- Fixed an issue where the run monitoring daemon could mark completed runs as failed if they transitioned quickly between STARTING and SUCCESS status.
- Fixed stability issues with the sensor daemon introduced in 0.15.3 that caused the daemon to fail heartbeat checks if the sensor evaluation took too long.
- Fixed issues with the thread pool implementation of the sensor daemon where race conditions caused the sensor to fire more frequently than the minimum interval.
- Fixed an issue with storage implementations using MySQL server version 5.6 which caused SQL syntax exceptions to surface when rendering the Instance overview pages in Dagit.
- Fixed a bug with the default_executor_def argument on repository where asset jobs that defined executor config would result in errors.
- Fixed a bug where an erroneous exception would be raised if an empty list was returned for a list output of an op.
- [dagit] Clicking the "Materialize" button for assets with configurable resources will now present the asset launchpad.
- [dagit] If you have an asset group and no jobs, Dagit will display it by default rather than directing you to the asset catalog.
- [dagit] DAG renderings of software-defined assets now display only the last component of the asset's key for improved readability.
- [dagit] Fixed a regression where clicking on a source asset would trigger a GraphQL error.
- [dagit] Fixed an issue where the "Unloadable" section on the sensors / schedules pages in Dagit was populated erroneously with loadable sensors and schedules.
- [dagster-dbt] Fixed an issue where an exception would be raised when using the dbt build command with Software-Defined Assets if a test was defined on a source.
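The auto-retry fix above concerns which steps a retried run re-executes. The plain-Python model below is a simplified sketch, not Dagster's implementation. It assumes FROM_FAILURE re-runs failed steps plus everything downstream of them. It assumes a second strategy, named ALL_STEPS here, re-runs the whole job. That name and the status strings are assumptions.

```python
from typing import Dict, List, Set


def steps_to_reexecute(deps: Dict[str, List[str]], status: Dict[str, str], strategy: str) -> Set[str]:
    """deps maps each step to its upstream steps; status maps steps to 'success'/'failure'/'skipped'."""
    if strategy == "ALL_STEPS":
        return set(deps)
    if strategy != "FROM_FAILURE":
        raise ValueError(f"unknown strategy: {strategy}")
    # Invert the edges so we can walk downstream from each failed step.
    downstream: Dict[str, List[str]] = {step: [] for step in deps}
    for step, upstreams in deps.items():
        for up in upstreams:
            downstream[up].append(step)
    selected: Set[str] = set()
    frontier = [step for step, st in status.items() if st == "failure"]
    while frontier:
        step = frontier.pop()
        if step in selected:
            continue
        selected.add(step)
        frontier.extend(downstream[step])
    return selected
```

In a chain a -> b -> c with b failed, FROM_FAILURE selects b and c, reusing a's result. ALL_STEPS selects every step.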
Deprecations
- Removed the deprecated dagster-daemon health-check CLI command.
Community Contributions
- TimeWindow is now exported from the dagster package (Thanks @nvinhphuc!)
- Added a fix to allow customization of Slack messages (Thanks @solarisa21!)
- [dagster-databricks] The databricks_pyspark_step_launcher now allows you to configure the following (Thanks @Phazure!):
  - the aws_attributes of the cluster that will be spun up for the step.
  - arbitrary environment variables to be copied over to Databricks from the host machine, rather than requiring these variables to be stored as secrets.
  - job and cluster permissions, allowing users to view the completed runs through the Databricks console, even if they're kicked off by a service account.
Experimental
- [dagster-k8s] Added k8s_job_op to launch a Kubernetes Job with an arbitrary image and CLI command. This is in contrast with the k8s_job_executor, which runs each Dagster op in a Dagster job in its own k8s job. This op may be useful when you need to orchestrate a command that isn't a Dagster op (or isn't written in Python). Usage:
from dagster_k8s import k8s_job_op

my_k8s_op = k8s_job_op.configured(
    {
        "image": "busybox",
        "command": ["/bin/sh", "-c"],
        "args": ["echo HELLO"],
    },
    name="my_k8s_op",
)
- [dagster-dbt] The dbt asset-loading functions now support partitions_def and partition_key_to_vars_fn parameters, adding preliminary support for partitioned dbt assets. To learn more, check out the GitHub issue!
v0.15.4 Changes
- Reverted sensor threadpool changes from 0.15.3 to address daemon stability issues.
v0.15.3 Changes
New
- When loading an upstream asset or op output as an input, you can now set custom loading behavior using the input_manager_key argument to AssetIn and In.
- The list of objects returned by a repository can now contain nested lists.
- Added a data retention instance setting in dagster.yaml that enables the automatic removal of sensor/schedule ticks after a certain number of days.
- Added a sensor daemon setting in dagster.yaml that enables sensor evaluations to happen in a thread pool to increase throughput.
- materialize_to_memory and materialize now both have the partition_key argument.
- Output and DynamicOutput objects now work with deep equality checks:
Output(value=5, name="foo") == Output(value=5, name="foo")  # evaluates to True
- RunRequests can now be returned from run status sensors.
- Added a resource_defs argument to AssetsDefinition.from_graph. This allows resources required by constituent ops to be specified directly on the asset.
- When adding a tag to the Run search filter in Dagit by clicking the hover menu on the tag, the tag will now be appended to the filter instead of replacing the entire filter state.
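The nested-list entry above means a repository function can return, for example, [my_job, my_assets] where my_assets is itself a list. The sketch below shows how such a return value flattens into one ordered list. Strings stand in for definitions, and flatten_definitions is a hypothetical helper rather than Dagster's loader. It flattens to any depth. How deep Dagster itself accepts nesting is not stated in this entry.

```python
from typing import Any, Iterable, List


def flatten_definitions(items: Iterable[Any]) -> List[Any]:
    """Recursively flatten nested lists of definitions into one list, preserving order."""
    flat: List[Any] = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten_definitions(item))
        else:
            flat.append(item)
    return flat
```

For example, flatten_definitions(["job_a", ["asset_1", ["asset_2"]], "sensor_x"]) yields the four names in order.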
Bugfixes
- [dagster-dbt] An exception is now emitted if you attempt to invoke the library without having dbt-core installed. dbt-core is now also added as a dependency to the library.
- Asset group names can now contain reserved Python keywords.
- Fixed a run config parsing bug that was introduced in 0.15.1 that caused Dagit to interpret datetime strings as datetime objects and octal strings as integers.
- Runs that have failed to start are now represented in the Instance Timeline view on Dagit.
- Fixed an issue where the partition status was missing for partitioned jobs that had no runs.
- Fixed a bug where op/resource invocation would error when resources were required, no context was used in the body of the function, and no context was provided when invoking.
- [dagster-databricks] Fixed an issue where an exception related to the deprecated prior_attempts_count field was raised when using the databricks_pyspark_step_launcher.
- [dagster-databricks] Polling information logged from the databricks_pyspark_step_launcher is now emitted at the DEBUG level instead of INFO.
- In the YAML editor in Dagit, the typeahead feature now correctly shows suggestions for nullable schema types.
- When editing asset configuration in Dagit, the "Scaffold config" button in the Dagit launchpad sometimes showed the scaffold dialog beneath the launchpad. This has been fixed.
- A recent change added execution timezones to some human-readable cron strings on schedules in Dagit. This was added incorrectly in some cases, and has now been fixed.
- In the Dagit launchpad, a config state containing only empty newlines could lead to an error that could break the editor. This has been fixed.
- Fixed an issue that could cause partitioned graph-backed assets to attempt to load upstream inputs from the incorrect path when using the fs_io_manager (or other similar IO managers).
- [dagster-dbt] Fixed an issue where errors generated from issuing dbt CLI commands would only show JSON-formatted output, rather than a parsed, human-readable output.
- [dagster-dbt] By default, Dagster will invoke the dbt CLI with a --log-format json flag. In some cases, this may cause dbt to report incorrect or misleading error messages. As a workaround, it is now possible to disable this behavior by setting the json_log_format configuration option on the dbt_cli_resource to False.
- materialize_to_memory erroneously allowed non-in-memory IO managers to be used. Now, providing IO managers to materialize_to_memory will result in an error, and mem_io_manager will be provided to all IO manager keys.
v0.15.2 Changes
Bugfixes
- Fixed an issue where asset dependency resolution would break when two assets in the same group had the same name.
v0.15.1 Changes
New
- When Dagster loads an event from the event log of a type that it doesn't recognize (for example, because it was created by a newer version of Dagster) it will now return a placeholder event rather than raising an exception.
- AssetsDefinition.from_graph() now accepts a group_name parameter. All assets created by from_graph are assigned to this group.
- You can define an asset from an op via a new utility method AssetsDefinition.from_op. Dagster will infer asset inputs and outputs from the ins/outs defined on the @op in the same way as @graphs.
- A default executor definition can be defined on a repository using the default_executor_def argument. The default executor definition will be used for all op/asset jobs that don't explicitly define their own executor.
- JobDefinition.run_request_for_partition now accepts a tags argument (Thanks @jburnich!)
- In Dagit, the graph canvas now has a dotted background to help it stand out from the rest of the UI.
- @multi_asset now accepts a resource_defs argument. The provided resources can be either used on the context, or satisfy the IO manager requirements of the outs on the asset.
- In Dagit, show execution timezone on cron strings, and use 12-hour or 24-hour time format depending on the user's locale.
- In Dagit, when viewing a run and selecting a specific step in the Gantt chart, the compute log selection state will now update to that step as well.
- define_asset_job and to_job can now accept a partitions_def argument and a config argument at the same time, as long as the value for the config argument is a hardcoded config dictionary (not a PartitionedConfig or ConfigMapping).
Bugfixes
- Fixed an issue where entering a string in the launchpad that is valid YAML but invalid JSON would render incorrectly in Dagit.
- Fixed an issue where steps using the k8s_job_executor and docker_executor would sometimes return the same event lines twice in the command-line output for the step.
- Fixed type annotations on the @op decorator (Thanks Milos Tomic!)
- Fixed an issue where job backfills were not displayed correctly on the Partition view in Dagit.
- UnresolvedAssetJobDefinition now supports the run_request_for_partition method.
- Fixed an issue in Dagit where the Instance Overview page would briefly flash a loading state while loading fresh data.
Breaking Changes
- Runs that were executed in newer versions of Dagster may produce errors when their event logs are loaded in older versions of Dagit, due to new event types that were recently added. Going forward, Dagit has been made more resilient to handling new events.
Deprecations
- Updated deprecation warnings to clarify that the deprecated metadata APIs will be removed in 0.16.0, not 0.15.0.
Experimental
- If two assets are in the same group and the upstream asset has a multi-segment asset key, the downstream asset doesn't need to specify the full asset key when declaring its dependency on the upstream asset: just the last segment.
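The last-segment shorthand above amounts to matching a bare name against the final segment of each asset key in the same group. The plain-Python sketch below models that lookup, with asset keys as tuples of strings. resolve_upstream_key and KeyPath are hypothetical names. Raising an error when zero or several keys match is an assumption of this sketch, not documented Dagster behavior.

```python
from typing import Dict, List, Tuple

KeyPath = Tuple[str, ...]  # hypothetical stand-in for a multi-segment asset key


def resolve_upstream_key(dep_name: str, group: str, keys_by_group: Dict[str, List[KeyPath]]) -> KeyPath:
    """Resolve a bare dependency name against the last segment of keys in the same group."""
    matches = [key for key in keys_by_group.get(group, []) if key[-1] == dep_name]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one asset named {dep_name!r} in group {group!r}, found {len(matches)}"
        )
    return matches[0]
```

With ("warehouse", "raw", "orders") in the "sales" group, a downstream asset in that group could declare its dependency simply as "orders".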
Documentation
- Added dedicated sections for op, graph, and job Concept docs in the sidenav.
- Moved graph documentation from the jobs docs into its own page.
- Added documentation for assigning asset groups and viewing them in Dagit.
- Added apidoc for AssetOut and AssetIn.
- Fixed a typo on the Run Configuration concept page (Thanks Wenshuai Hou!)
- Updated screenshots in the software-defined assets tutorial to match the new Dagit UI.
- Fixed a typo in the Defining an asset section of the software-defined assets tutorial (Thanks Daniel Kim!)
v0.15.0 Changes
Major Changes
- Software-defined assets are now marked fully stable and are ready for prime time; we recommend using them whenever your goal with Dagster is to build and maintain data assets.
- You can now organize software-defined assets into groups by providing a group_name on your asset definition. These assets will be grouped together in Dagit.
- Software-defined assets now accept configuration, similar to ops. E.g.
from dagster import asset

@asset(config_schema={"iterations": int})
def my_asset(context):
    for i in range(context.op_config["iterations"]):
        ...
- Asset definitions can now be created from graphs via AssetsDefinition.from_graph:
from dagster import AssetsDefinition, GraphOut, graph

@graph(out={"asset_one": GraphOut(), "asset_two": GraphOut()})
def my_graph(input_asset):
    ...

graph_asset = AssetsDefinition.from_graph(my_graph)
- execute_in_process and GraphDefinition.to_job now both accept an input_values argument, so you can pass arbitrary Python objects to the root inputs of your graphs and jobs.
- Ops that return Outputs and DynamicOutputs now work well with Python type annotations. You no longer need to sacrifice static type checking just because you want to include metadata on an output. E.g.
from dagster import Output, op

@op
def my_op() -> Output[int]:
    return Output(5, metadata={"a": "b"})
- You can now automatically re-execute runs from failure. This is analogous to op-level retries, except at the job level.
- You can now supply arbitrary structured metadata on jobs, which will be displayed in Dagit.
- The partitions and backfills pages in Dagit have been redesigned to be faster and show the status of all partitions, instead of just the last 30 or so.
- The left navigation pane in Dagit is now grouped by repository, which makes it easier to work with when you have large numbers of jobs, especially when jobs in different repositories have the same name.
- The Asset Details page for a software-defined asset now includes a Lineage tab, which makes it easy to see all the assets that are upstream or downstream of an asset.
Breaking Changes and Deprecations
Software-defined assets
This release marks the official transition of software-defined assets from experimental to stable. We made some final changes to incorporate feedback and make the APIs as consistent as possible:
- Support for adding tags to asset materializations, which was previously marked as experimental, has been removed.
- Some of the properties of the previously-experimental AssetsDefinition class have been renamed: group_names is now group_names_by_key, asset_keys_by_input_name is now keys_by_input_name, asset_keys_by_output_name is now keys_by_output_name, asset_key is now key, and asset_keys is now keys.
- Removed the previously experimental IO manager fs_asset_io_manager in favor of merging its functionality with fs_io_manager. fs_io_manager is now the default IO manager for asset jobs, and will store asset outputs in a directory named with the asset key. Similarly, removed adls2_pickle_asset_io_manager, gcs_pickle_asset_io_manager, and s3_pickle_asset_io_manager. Instead, adls2_pickle_io_manager, gcs_pickle_io_manager, and s3_pickle_io_manager now support software-defined assets.
- (deprecation) The namespace argument on the @asset decorator and AssetIn has been deprecated. Users should use key_prefix instead.
- (deprecation) AssetGroup has been deprecated. Users should instead place assets directly on repositories, optionally attaching resources using with_resources. Asset jobs should be defined using define_asset_job (replacing AssetGroup.build_job), and arbitrary sets of assets can be materialized using the standalone function materialize (replacing AssetGroup.materialize).
- (deprecation) The outs property of the previously-experimental @multi_asset decorator now prefers a dictionary whose values are AssetOut objects instead of a dictionary whose values are Out objects. The latter still works, but is deprecated.
- The previously-experimental property on OpExecutionContext called output_asset_partition_key is now deprecated in favor of asset_partition_key_for_output.
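"A directory named with the asset key" means each key segment becomes a path component under the IO manager's base directory. The sketch below illustrates that layout with pathlib and pickle. It also skips None outputs, matching the 1.0.0 note earlier in this changelog. asset_output_path and write_asset_output are hypothetical helpers. Treat the exact on-disk layout of the real fs_io_manager as an assumption.

```python
import pickle
from pathlib import Path
from typing import Any, Optional, Sequence


def asset_output_path(base_dir: str, asset_key_path: Sequence[str]) -> Path:
    """Map an asset key such as ["warehouse", "orders"] to base_dir/warehouse/orders."""
    return Path(base_dir).joinpath(*asset_key_path)


def write_asset_output(base_dir: str, asset_key_path: Sequence[str], obj: Any) -> Optional[Path]:
    """Pickle obj to its asset-key path; write nothing when the output is None."""
    if obj is None:
        return None
    path = asset_output_path(base_dir, asset_key_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return path
```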
Event records
- The get_event_records method on DagsterInstance now requires a non-None argument event_records_filter. Passing a None value for the event_records_filter argument will now raise an exception where previously it generated a deprecation warning.
- Removed methods events_for_asset_key and get_asset_events, which have been deprecated since 0.12.0.
Extension libraries
- [dagster-dbt] (breaks previously-experimental API) When using `load_assets_from_dbt_project` or `load_assets_from_dbt_manifest`, the AssetKeys generated for dbt sources are now composed of the source name followed by the table name, and the AssetKeys generated for models are now composed of the configured schema name for a given model (if any) followed by the model name. To revert to the old behavior: `dbt_assets = load_assets_from_dbt_project(..., node_info_to_asset_key=lambda node_info: AssetKey(node_info["name"]))`.
- [dagster-k8s] In the Dagster Helm chart, user code deployment configuration (like secrets, configmaps, or volumes) is now automatically included in any runs launched from that code. Previously, this behavior was opt-in. In most cases, this will not be a breaking change, but in less common cases where a user code deployment was running in a different Kubernetes namespace or using a different service account, this could result in missing secrets or configmaps in a launched run that previously worked. You can return to the previous behavior, where config on the user code deployment was not applied to any runs, by setting the `includeConfigInLaunchedRuns.enabled` field to `false` for the user code deployment. See the Kubernetes Deployment docs for more details.
- [dagster-snowflake] dagster-snowflake has dropped support for Python 3.6. The library it is currently built on, snowflake-connector-python, dropped 3.6 support in its recent 2.7.5 release.
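The new default dbt asset key rule described above can be sketched in plain Python. This is not dagster-dbt's implementation: asset keys are modeled as tuples of path components instead of `AssetKey`, the function names are hypothetical, and the `node_info` field names (`resource_type`, `source_name`, `name`, `config.schema`) are assumed to follow the dbt manifest layout.

```python
from typing import Any, Dict, Tuple


def default_dbt_asset_key(node_info: Dict[str, Any]) -> Tuple[str, ...]:
    """Hypothetical restatement of the 1.0.0 default key rule for dbt nodes."""
    if node_info.get("resource_type") == "source":
        # Sources: source name followed by table name.
        return (node_info["source_name"], node_info["name"])
    # Models: configured schema (if any) followed by model name.
    configured_schema = (node_info.get("config") or {}).get("schema")
    if configured_schema is not None:
        return (configured_schema, node_info["name"])
    return (node_info["name"],)


def legacy_dbt_asset_key(node_info: Dict[str, Any]) -> Tuple[str, ...]:
    # Mirrors the revert lambda above: the key is just the node name.
    return (node_info["name"],)
```

Under this sketch, a source `raw.orders` maps to `("raw", "orders")`, whereas the legacy rule would map it to `("orders",)`.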
Other
- The `prior_attempts_count` parameter is now removed from step-launching APIs. This parameter was not being used, as the information it held was stored elsewhere in all cases. It can safely be removed from invocations without changing behavior.
- The `FileCache` class has been removed.
- Previously, when schedules/sensors targeted jobs with the same name as other jobs in the repo, the jobs on the sensor/schedule would silently overwrite the other jobs. Now, this will cause an error.
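A minimal sketch of the new duplicate-name behavior, using hypothetical `JobDef` and `collect_jobs` names rather than Dagster's internals: reaching the same job object twice (directly and through a schedule or sensor) is assumed to be fine, but two distinct definitions that share a name raise an error instead of one silently replacing the other.

```python
from typing import Dict, Iterable


class JobDef:
    """Hypothetical stand-in for a job definition; only the name matters here."""

    def __init__(self, name: str) -> None:
        self.name = name


def collect_jobs(direct_jobs: Iterable[JobDef], targeted_jobs: Iterable[JobDef]) -> Dict[str, JobDef]:
    """Index jobs by name, raising on a clash instead of silently overwriting."""
    jobs: Dict[str, JobDef] = {}
    for job in list(direct_jobs) + list(targeted_jobs):
        existing = jobs.get(job.name)
        if existing is not None and existing is not job:
            # Pre-1.0 behavior: the schedule/sensor's job replaced `existing` without warning.
            raise ValueError(f"Duplicate definition found for job '{job.name}'")
        jobs[job.name] = job
    return jobs
```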
New since 0.14.20
- A new `define_asset_job` function allows you to define a selection of assets that should be executed together. The selection can be a simple string or an `AssetSelection` object. This selection will be resolved into a set of assets once placed on the repository.
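```python
from dagster import repository, define_asset_job, AssetSelection

# Select "foo" and everything upstream of it using a selection string.
string_selection_job = define_asset_job(name="foo_job", selection="*foo")

# Select every asset in a group using an AssetSelection object.
object_selection_job = define_asset_job(
    name="bar_job",
    selection=AssetSelection.groups("some_group"),
)


@repository
def my_repo():
    # my_list_of_assets: your asset definitions, defined elsewhere.
    return [
        *my_list_of_assets,
        string_selection_job,
        object_selection_job,
    ]
```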
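In Dagster's selection syntax, a leading `*` (as in `"*foo"` above) selects the named asset together with everything upstream of it. The stdlib sketch below shows that resolution over a hypothetical dependency map; `resolve_upstream_selection` is an illustrative name, and Dagster's real resolver supports more forms than the single one handled here.

```python
from typing import Dict, Set


def resolve_upstream_selection(selection: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """Resolve "*name" to the named asset plus all of its upstream ancestors.

    `deps` maps each asset name to the names of the assets it depends on directly.
    """
    if not selection.startswith("*"):
        raise ValueError("this sketch only handles selections of the form '*name'")
    selected: Set[str] = set()
    stack = [selection[1:]]
    while stack:
        name = stack.pop()
        if name in selected:
            continue
        selected.add(name)
        # Walk upstream by pushing this asset's direct dependencies.
        stack.extend(deps.get(name, set()))
    return selected
```

With `deps = {"foo": {"bar"}, "bar": {"baz"}, "baz": set(), "qux": {"foo"}}`, the selection `"*foo"` resolves to `{"foo", "bar", "baz"}`; the downstream asset `qux` is not included.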
- [dagster-dbt] Assets loaded with `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` will now be sorted into groups based on the subdirectory of the project that each model resides in.
- `@asset` and `@multi_asset` are no longer considered experimental.
- Adds new utility functions `load_assets_from_modules`, `load_assets_from_current_module`, `load_assets_from_package_module`, and `load_assets_from_package_name` to fetch and return a list of assets from within the specified Python modules.
- Resources and IO managers can now be provided directly on assets and source assets.
```python
from dagster import asset, SourceAsset, resource, io_manager


@resource
def foo_resource():
    pass


@asset(resource_defs={"foo": foo_resource})
def the_resource(context):
    foo = context.resources.foo


@io_manager
def the_manager():
    ...


@asset(io_manager_def=the_manager)
def the_asset():
    ...
```
Note that assets provided to a job must not have conflicting resources for the same key. For a given job, all resource definitions for a given key must match by reference equality.
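The reference-equality rule can be sketched as follows, with plain objects standing in for resource definitions and a hypothetical `merge_resource_defs` helper (not a Dagster API). Because the check uses `is` rather than `==`, two separately constructed definitions conflict even if they are configured identically.

```python
from typing import Any, Dict, Iterable, Mapping


def merge_resource_defs(per_asset_resources: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Merge the resource dicts of several assets into one job-level dict."""
    merged: Dict[str, Any] = {}
    for resources in per_asset_resources:
        for key, resource_def in resources.items():
            # Sharing a key is allowed only when both assets provide the same object.
            if key in merged and merged[key] is not resource_def:
                raise ValueError(f"Conflicting resource definitions for key '{key}'")
            merged[key] = resource_def
    return merged
```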
- A `materialize_to_memory` function, which loads the materializations of a provided list of assets into memory:
```python
from dagster import asset, materialize_to_memory


@asset
def the_asset():
    return 5


result = materialize_to_memory([the_asset])
output = result.output_for_node("the_asset")
```
- A `with_resources` function, which allows resources to be added to multiple assets or source assets at once:
```python
from dagster import asset, with_resources, resource


@asset(required_resource_keys={"foo"})
def requires_foo(context):
    ...


@asset(required_resource_keys={"foo"})
def also_requires_foo(context):
    ...


@resource
def foo_resource():
    ...


requires_foo, also_requires_foo = with_resources(
    [requires_foo, also_requires_foo],
    {"foo": foo_resource},
)
```
- You can now include asset definitions directly on repositories. A `default_executor_def` argument has been added to the `@repository` decorator, which will be used for any materializations of assets provided directly to the repository.
```python
from dagster import asset, repository, multiprocess_executor


@asset
def my_asset():
    ...


@repository(default_executor_def=multiprocess_executor)
def repo():
    return [my_asset]
```
- The `run_storage`, `event_log_storage`, and `schedule_storage` configuration sections of the `dagster.yaml` can now be replaced by a unified `storage` configuration section. This avoids duplicate configuration blocks in your `dagster.yaml`. For example, instead of:
```yaml
# dagster.yaml
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_url: { PG_DB_CONN_STRING }
event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_url: { PG_DB_CONN_STRING }
schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_url: { PG_DB_CONN_STRING }
```
You can now write:
```yaml
storage:
  postgres:
    postgres_url: { PG_DB_CONN_STRING }
```
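To illustrate how the unified block relates to the legacy one, here is a hypothetical helper (not part of Dagster) that expands a parsed `storage: postgres:` section into the three legacy sections, reusing the module and class names from the legacy example above. It assumes the YAML has already been loaded into plain dicts and handles only the Postgres case.

```python
from typing import Any, Dict

# (module, class) pairs taken from the legacy dagster.yaml example.
LEGACY_POSTGRES_SECTIONS = {
    "run_storage": ("dagster_postgres.run_storage", "PostgresRunStorage"),
    "event_log_storage": ("dagster_postgres.event_log", "PostgresEventLogStorage"),
    "schedule_storage": ("dagster_postgres.schedule_storage", "PostgresScheduleStorage"),
}


def expand_unified_storage(storage: Dict[str, Any]) -> Dict[str, Any]:
    """Expand a unified Postgres storage block into the three legacy sections."""
    postgres_config = storage["postgres"]
    return {
        # Each legacy section repeats the same connection config.
        section: {"module": module, "class": cls, "config": dict(postgres_config)}
        for section, (module, cls) in LEGACY_POSTGRES_SECTIONS.items()
    }
```

The repetition of `config` across all three generated sections is exactly the duplication the unified `storage` section removes.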
- All assets where a `group_name` is not provided are now part of a group called `default`.
- The `group_name` parameter value for `@asset` is now restricted to only allow letters, numbers, and underscores.
- You can now set policies to automatically retry job runs. This is analogous to op-level retries, except at the job level. By default, the retries pick up from failure, meaning only failed ops and their dependents are executed.
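The "pick up from failure" strategy re-executes only the failed ops and their dependents. The sketch below computes that set over a hypothetical op graph; `ops_to_reexecute` and the `downstream` mapping are illustrative names rather than Dagster's API, and the sketch ignores details such as dynamic outputs.

```python
from typing import Dict, Set


def ops_to_reexecute(downstream: Dict[str, Set[str]], failed: Set[str]) -> Set[str]:
    """Return the failed ops plus every op downstream of them.

    `downstream` maps each op name to the ops that consume its outputs directly.
    """
    to_run: Set[str] = set()
    stack = list(failed)
    while stack:
        op = stack.pop()
        if op in to_run:
            continue
        to_run.add(op)
        # Dependents of a failed op never ran successfully, so they are re-executed too.
        stack.extend(downstream.get(op, set()))
    return to_run
```

For a graph where `extract` feeds `transform`, which feeds `load` and `report`, a failure in `transform` re-executes `transform`, `load`, and `report`, while the already-successful `extract` is skipped.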
- [dagit] The new repository-grouped left navigation is fully launched, and is no longer behind a feature flag.
- [dagit] The left navigation can now be collapsed even when the viewport window is wide. Previously, the navigation was collapsible only for small viewports, but kept in a fixed, visible state for wide viewports. This visible/collapsed state for wide viewports is now tracked in localStorage, so your preference will persist across sessions.
- [dagit] Queued runs can now be terminated from the Run page.
- [dagit] The log filter on a Run page now shows counts for each filter type, and the filters have higher contrast and a switch to indicate when they are on or off.
- [dagit] The partitions and backfill pages have been redesigned to focus on easily viewing the last run state by partition. These redesigned pages were previously gated behind a feature flag; they are now loaded by default.
- [dagster-k8s] Overriding labels in the K8sRunLauncher will now apply to both the Kubernetes job and the Kubernetes pod created for each run, instead of just the Kubernetes pod.
Bugfixes
- [dagster-dbt] In some cases, if Dagster attempted to rematerialize a dbt asset but dbt failed to start execution, asset materialization events would still be emitted. This has been fixed.
- [dagit] On the Instance Overview page, the popover showing details of overlapping batches of runs is now scrollable.
- [dagit] When viewing Instance Overview, reloading a repository via controls in the left navigation could lead to an error that would crash the page due to a bug in client-side cache state. This has been fixed.
- [dagit] When scrolling through a list of runs, scrolling would sometimes get stuck on certain tags, specifically those with content overflowing the width of the tag. This has been fixed.
- [dagit] While viewing a job page, the left navigation item corresponding to that job will be highlighted, and the navigation pane will scroll to bring it into view.
- [dagit] Fixed a bug where the "Scaffold config" button was always enabled.
Community Contributions
- You can now provide dagster-mlflow configuration parameters as environment variables, thanks @chasleslr!
Documentation
- Added a guide that helps users who are familiar with ops and graphs understand how and when to use software-defined assets.
- Updated and reorganized docs to document software-defined asset changes since 0.14.0.
- The Deploying in Docker example now includes an example of using the `docker_executor` to run each step of a job in a different Docker container.
- Descriptions for the top-level fields of Dagit GraphQL queries, mutations, and subscriptions have been added.