The easiest way to start building a Dagster project is by using the dagster project CLI. This CLI tool helps generate files and folder structures that enable you to quickly get started with Dagster.
The command dagster project scaffold generates a folder structure with a single Dagster code location and supporting files such as pyproject.toml and setup.py. This gives you an empty project with everything already set up.
Here's a breakdown of the files and directories that are generated:
| File/Directory | Description |
| --- | --- |
| my_dagster_project/ | A Python package that contains your new Dagster code. |
| my_dagster_project_tests/ | A Python package that contains tests for my_dagster_project. |
| README.md | A description and starter guide for your new Dagster project. |
| pyproject.toml | A file that specifies package core metadata in a static, tool-agnostic way. It includes a tool.dagster section that references the Python package whose Dagster definitions are defined and discoverable at the top level. This allows you to run dagit without any parameters to load your Dagster code. Visit Code Locations to learn more. Note: pyproject.toml was introduced in PEP 518 and is meant to replace setup.py, but we may still include a setup.py for compatibility with tools that do not use this spec. |
| setup.py | A build script with Python package dependencies for your new project as a package. |
| setup.cfg | An ini file that contains option defaults for setup.py commands. |
Inside of the my_dagster_project/ directory, the following files and directories are generated:
| File/Directory | Description |
| --- | --- |
| my_dagster_project/__init__.py | The __init__.py file includes a Definitions object that contains all the definitions defined within your project. A definition can be an asset, a job, a schedule, a sensor, or a resource. This allows Dagster to load the definitions in an installed package. Refer to the Code locations documentation to learn other ways to deploy and load your Dagster code. Note: As your project grows, we recommend organizing assets in sub-packages or sub-modules. For example, you can put all analytics-related assets in a my_dagster_project/assets/analytics/ folder and use load_assets_from_package_module in the top-level definitions to load them, rather than manually adding assets to the top-level definitions every time you define one. Similarly, you can use load_assets_from_modules to load assets from single Python files. Read more about best practices in the Fully Featured Project guide. |
The command dagster project from-example downloads one of the official Dagster examples to the current directory. This command enables you to quickly bootstrap your project with an officially maintained example.
For more info about the examples, visit the Dagster GitHub repository or use dagster project list-examples.
The newly generated my-dagster-project directory is a fully functioning Python package and can be installed with pip. To install it as a package, along with its Python dependencies, run:
pip install -e ".[dev]"
By using the --editable flag, pip will install your code location as a Python package in "editable mode" so that as you develop, local code changes will automatically apply.
Environment variables, which are key-value pairs configured outside your source code, allow you to dynamically modify application behavior depending on the environment.
Using environment variables, you can define various configuration options for your Dagster application and securely set up secrets. For example, instead of hard-coding database credentials (which is bad practice and cumbersome for development), you can use environment variables to supply user details. This allows you to parameterize your pipeline without modifying code or insecurely storing sensitive data.
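As a minimal sketch of this pattern in plain Python, the helper below reads database settings from the environment rather than from hard-coded values. The variable names DB_USER, DB_PASSWORD, and DB_HOST and the function load_db_settings are hypothetical, chosen only for illustration. They are not names that Dagster defines.

```python
import os


def load_db_settings(env=os.environ):
    """Read database connection settings from environment variables.

    DB_USER, DB_PASSWORD and DB_HOST are hypothetical names used for
    illustration; use whatever names your deployment defines.
    """
    required = ("DB_USER", "DB_PASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        # Fail early with a clear message instead of connecting with bad credentials.
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        # Optional setting with a default for local development.
        "host": env.get("DB_HOST", "localhost"),
    }
```

Passing the mapping as a parameter keeps the helper easy to test: production code calls load_db_settings() and reads the real environment, while a test can pass an ordinary dictionary.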
Start a daemon process in the same folder as your pyproject.toml file, but in a different shell or terminal:
dagster-daemon run
The $DAGSTER_HOME environment variable must be set to a directory for the daemon to work. Note: using directories within /tmp/ may cause issues. See Dagster Instance default local behavior for more details.
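The requirement above can be checked before starting the daemon. The following is a rough pre-flight sketch, not part of Dagster itself. The helpers is_under_tmp and check_dagster_home are hypothetical, and they verify only what is stated here: that the variable is set, that it points to an existing directory, and whether that directory sits under /tmp/.

```python
import os
from pathlib import Path


def is_under_tmp(path):
    """Return True if the path is /tmp itself or located inside /tmp."""
    path = Path(path)
    tmp = Path("/tmp")
    return path == tmp or tmp in path.parents


def check_dagster_home(env=os.environ):
    """Validate DAGSTER_HOME and return (path, warnings).

    Hypothetical helper: raises RuntimeError if the variable is unset or
    does not point to an existing directory.
    """
    value = env.get("DAGSTER_HOME")
    if not value:
        raise RuntimeError("DAGSTER_HOME is not set")
    path = Path(value)
    if not path.is_dir():
        raise RuntimeError(f"DAGSTER_HOME is not a directory: {value}")
    warnings = []
    if is_under_tmp(path):
        warnings.append("DAGSTER_HOME is inside /tmp/, which may cause issues")
    return path, warnings
```

Calling check_dagster_home() in the shell session where you plan to run dagster-daemon surfaces a missing or misconfigured variable before the daemon starts.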
Once your daemon process is running, you can start turning on schedules and sensors for your jobs.