dstack uses Alembic to manage database migrations. If you modify any SQLAlchemy
models or related data structures,
generate a new migration with Alembic:
```shell
cd src/dstack/_internal/server/
alembic revision -m "<some message>" --autogenerate
```

Then adjust the generated migration if needed.
The dstack server supports multi-replica setups with zero-downtime deployments.
This means DB migrations must not make changes that break old replicas.
Incompatible changes should be introduced in multiple stages (releases), following
the expand and contract pattern.
Note: If it's impossible to make the migration compatible with older versions, the PR should say so explicitly, so that the change is planned and released with the migration notice.
Below are some common changes and how to make them.
The following steps apply to dropping a column:

- First release:
    - Stop reading the column. In SQLAlchemy, this can be done by setting `deferred=True` on a model field.
- Second release:
    - Drop the column.
The following steps apply to renaming a column or changing the type of a column:
- First release:
    - Introduce a new column with the new type.
    - Write to both the new and the old column.
- Second release:
    - Migrate data from the old column to the new column.
    - Start reading the new column.
    - Stop reading the old column.
    - Stop writing to the old column.
- Third release:
    - Drop the old column.
Altering a table requires Postgres to take an ACCESS EXCLUSIVE lock. (This applies not only to statements that rewrite the table but also to statements that only modify table metadata.) Altering multiple tables in one transaction can cause deadlocks with concurrent read operations, since the dstack server does not define an order for read operations. Alter multiple tables in separate transactions/migrations.
Use `CREATE INDEX CONCURRENTLY` to avoid holding a lock that blocks writes to the table for the duration of the index build.
For migrations that create multiple indexes, failures can leave the schema in a partial state
(some indexes already created, some missing). On Postgres, concurrent index creation can also fail
midway and leave an invalid index object with the same name. Retrying the migration then fails
with "already exists".
For retry-safe migrations, pre-drop indexes with `if_exists=True` before creating them again:

```python
with op.get_context().autocommit_block():
    op.drop_index(
        "ix_table_col",
        table_name="table",
        if_exists=True,
        postgresql_concurrently=True,
    )
    op.create_index(
        "ix_table_col",
        "table",
        ["col"],
        postgresql_concurrently=True,
    )
```