fix(airflow): fresh Docker setup — Postgres SCRAM, Shapely/GEOS, image bump#428

Open
lprzychodzien wants to merge 1 commit into main from setup_fix

@lprzychodzien
Collaborator

Summary

Improves the first docker compose up on a clean clone for the Airflow image: fixes Postgres SCRAM authentication by pinning a modern psycopg2-binary, adds the GEOS system library for Shapely (pulled in via the Google/BigQuery stack), and bumps the Airflow image tag so rebuilds pick up the Dockerfile changes. Also simplifies the postgres service definition in docker-compose.yml by removing an unused build block while keeping shm_size.


Motivation / problem

  • New setups were hitting Postgres auth / driver issues (SCRAM-SHA-256 vs older libpq / psycopg2 in the base Airflow 2.5 image).
  • google-cloud-bigquery brings in Shapely; its wheels can still need libgeos_c at runtime, and the import fails when the system GEOS library is not installed.
  • Image tag in docker-compose.yml should increment when airflow/requirements.txt or the Dockerfile changes so people don’t reuse a stale cached image.
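
To make the libgeos_c point concrete, here is a minimal sketch of a check that could be run inside the Airflow container. The helper name geos_c_available is hypothetical, and the check only asks the system loader for the library; importing shapely in the image remains the more direct test.

```python
import ctypes
import ctypes.util


def geos_c_available() -> bool:
    """Return True if the system loader can find and load libgeos_c.

    Hypothetical helper for illustration; it does not import Shapely.
    """
    name = ctypes.util.find_library("geos_c")
    if name is None:
        # The loader knows of no library matching "geos_c".
        return False
    try:
        ctypes.CDLL(name)
    except OSError:
        # The name was resolved but the library could not be loaded.
        return False
    return True


if __name__ == "__main__":
    print("libgeos_c loadable:", geos_c_available())
```

If this prints False inside the rebuilt image, the apt-get install step for GEOS probably did not take effect.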

Changes

  • airflow/Dockerfile: run as root to apt-get install libgeos-c1v5, then USER airflow; COPY requirements with correct ownership; pip uninstall psycopg2 / psycopg2-binary before pip install -r requirements.txt to avoid messy force-reinstall and large temp disk use.
  • airflow/requirements.txt: add psycopg2-binary>=2.9.9,<3 (comment in file already ties this to bumping the compose image version).
  • docker-compose.yml: bump sagerx_airflow image tag v0.0.2718 → v0.0.2721; postgres: drop build: context: ., keep shm_size: "4gb".
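
As a rough illustration of the psycopg2 pin, the sketch below checks a version string against the >=2.9.9,<3 bounds listed above and tests whether a libpq version number is new enough for SCRAM-SHA-256 (client support arrived in libpq 10). The function names are hypothetical, and the parsing is deliberately simple rather than a full version-specifier implementation.

```python
import re


def parse_version(raw: str) -> tuple:
    """Parse '2.9.9' or '2.9.9 (dt dec pq3 ext lo64)' into (2, 9, 9)."""
    parts = raw.split()
    head = parts[0] if parts else ""
    numbers = []
    for piece in head.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def satisfies_pin(raw: str) -> bool:
    """True if the version falls inside >=2.9.9,<3 (bounds from requirements.txt)."""
    return (2, 9, 9) <= parse_version(raw) < (3,)


def libpq_supports_scram(libpq_version: int) -> bool:
    """libpq reports 10.5 as 100005; SCRAM-SHA-256 needs libpq 10 or newer."""
    return libpq_version >= 100000


if __name__ == "__main__":
    try:
        import psycopg2
        from psycopg2 import extensions
    except ImportError:
        print("psycopg2 is not installed in this environment")
    else:
        print("pin satisfied:", satisfies_pin(psycopg2.__version__))
        print("SCRAM-capable libpq:", libpq_supports_scram(extensions.libpq_version()))
```

Running this inside the Airflow container after the rebuild should report both values as True if the pin and the bundled libpq behave as the PR expects.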

How we tested

Use language that matches what you actually did; this is the minimal honest checklist for this PR:

  1. Clean rebuild (forces new image tag + Dockerfile)
    From repo root, with Docker running:

    • docker compose build --no-cache airflow-init
    • or docker compose build --no-cache for all buildable services
      Confirm the build completes without apt/pip errors.
  2. Init + stack

    • docker compose up airflow-init
    • docker compose up (or up -d)
      Confirm airflow-init exits successfully (DB upgrade / user creation) and airflow-webserver / airflow-scheduler stay healthy.
  3. Postgres connectivity from Airflow

    • Open Airflow UI (README: http://localhost:8001, airflow / airflow).
    • Trigger or open a DAG that uses the postgres_default (or equivalent) connection, or check logs for no SCRAM, password authentication failed, or psycopg2 / libpq errors on scheduler/webserver startup.
  4. Optional: Shapely / BigQuery path
    If you have a task that imports shapely or uses BigQuery client code in the worker image, run that path once and confirm no ImportError / libgeos_c missing-library errors.

  5. Postgres service still has shared memory

    • docker compose config and verify postgres still has shm_size: "4gb" under the service definition.
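
The log check in step 3 could be partly automated with a small filter like the sketch below. The patterns are assumptions about what the failure messages look like, not an exhaustive list, and find_suspect_lines is a hypothetical helper name.

```python
import re

# Assumed patterns for the failures named in step 3; adjust to the real logs.
ERROR_PATTERNS = [
    re.compile(r"SCRAM", re.IGNORECASE),
    re.compile(r"password authentication failed", re.IGNORECASE),
    re.compile(r"psycopg2\.\w*Error"),
    re.compile(r"libpq", re.IGNORECASE),
]


def find_suspect_lines(lines) -> list:
    """Return the lines that match any of the assumed error patterns."""
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        if any(pattern.search(text) for pattern in ERROR_PATTERNS):
            hits.append(text)
    return hits


# Example use, assuming the output of
#   docker compose logs airflow-scheduler airflow-webserver
# was saved to a file named startup.log (hypothetical name):
#
#   with open("startup.log") as handle:
#       for hit in find_suspect_lines(handle):
#           print(hit)
```

An empty result is only weak evidence of a healthy connection; triggering a DAG that actually uses the Postgres connection is still the stronger check.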

If you did not run a full clean rebuild, say instead: “Verified on existing machine with docker compose build airflow-init and docker compose up” so reviewers know the scope.


Risk / rollout notes

  • Image tag bump: anyone on the old tag must pull/rebuild; that’s intentional so the fixed image is used.
  • Postgres compose: removing build is safe only if nothing relied on a custom postgres image from .; the diff assumes that build was redundant (image is still postgres:14-alpine).
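
The Postgres compose assumption above can be sketched as a check on the resolved configuration, for example the JSON printed by docker compose config --format json. The function names are hypothetical. Depending on the Compose version, shm_size may appear as written ("4gb") or as a byte count, so the helper accepts both; that tolerance is an assumption, not documented behaviour of this repository.

```python
import re

_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
}


def size_to_bytes(value) -> int:
    """Convert '4gb', '4g', or a plain byte count (int or numeric string) to bytes."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", str(value).strip().lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def check_postgres_service(config: dict) -> list:
    """Return a list of problems found in the resolved postgres service."""
    service = config.get("services", {}).get("postgres")
    if service is None:
        return ["no 'postgres' service found"]
    problems = []
    if "build" in service:
        problems.append("postgres still has a build section")
    if service.get("image") != "postgres:14-alpine":
        problems.append("postgres image is not postgres:14-alpine")
    shm = service.get("shm_size")
    if shm is None or size_to_bytes(shm) != 4 * 1024 ** 3:
        problems.append("postgres shm_size is not 4gb")
    return problems
```

To use it, load the Compose output with json.loads and pass the resulting dict to check_postgres_service; an empty list means the service matches what this PR describes.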
