Implement Gang-scheduling #393

@auhlig

Description

Cortex should support gang scheduling (co-scheduling) for Kubernetes workloads.
A job starts only when every pod in its gang can be placed, that is, when the resources required by all of its pods are available at the same time.

Objectives

  • Implement a controller that groups pods into a gang. Consider prior art for how gangs are defined.
  • Track gang readiness and resource feasibility
  • Submit a joint scheduling request to Cortex. Consider either a single request or Cortex's reservation feature to allocate resources.
  • Bind all pods only after Cortex confirms full gang placement
  • Provide minimal metrics/logs
  • Documentation
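
The first two objectives (grouping pods into a gang and tracking gang readiness) could look roughly like the sketch below. It is a minimal illustration in plain Python, not a proposed implementation: the label key `example.com/gang-name`, the annotation key `example.com/gang-min-members`, and the `Pod` and `Gang` types are all hypothetical, since the issue does not yet define how gangs are declared. The minimum-member idea is loosely borrowed from prior art such as the PodGroup concept in the Kubernetes scheduler-plugins coscheduling plugin.

```python
from dataclasses import dataclass, field

# Hypothetical keys; the issue leaves the gang definition open.
GANG_LABEL = "example.com/gang-name"
GANG_SIZE_ANNOTATION = "example.com/gang-min-members"


@dataclass
class Pod:
    name: str
    labels: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)


@dataclass
class Gang:
    name: str
    min_members: int = 1
    pods: list = field(default_factory=list)

    def is_ready(self) -> bool:
        """A gang is ready once at least min_members pods have been observed."""
        return len(self.pods) >= self.min_members


def group_into_gangs(pods):
    """Group pods by gang label; pods without the label are ignored."""
    gangs = {}
    for pod in pods:
        gang_name = pod.labels.get(GANG_LABEL)
        if gang_name is None:
            continue
        gang = gangs.setdefault(gang_name, Gang(name=gang_name))
        # Assumption: the gang size is the largest value declared by any member.
        declared = int(pod.annotations.get(GANG_SIZE_ANNOTATION, "1"))
        gang.min_members = max(gang.min_members, declared)
        gang.pods.append(pod)
    return gangs
```

In this sketch readiness only means that enough member pods exist; whether the cluster can actually host them is a separate feasibility question.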

Acceptance Criteria

  • Gang-scheduling implemented
  • Pods of a gang schedule only when the full gang can be placed
  • Basic e2e tests
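
The all-or-nothing criterion above can be made concrete with a small sketch that an e2e test might mirror. It assumes a single resource dimension (CPU in millicores) and uses a simple first-fit-decreasing heuristic; this is a stand-in for illustration only, not Cortex's placement logic, and a greedy heuristic can miss placements that an exact solver would find. The function names `place_gang` and `bind_gang` and the `bind` callback are hypothetical.

```python
def place_gang(requests, free):
    """Return {pod: node} covering every pod in the gang, or None if any pod does not fit.

    requests: pod name -> requested CPU in millicores
    free:     node name -> free CPU in millicores (not mutated)
    """
    remaining = dict(free)  # tentative bookkeeping, discarded on failure
    placement = {}
    # Place the largest pods first (first-fit decreasing).
    for pod, cpu in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        node = next((n for n, cap in remaining.items() if cap >= cpu), None)
        if node is None:
            return None  # all-or-nothing: never report a partial placement
        remaining[node] -= cpu
        placement[pod] = node
    return placement


def bind_gang(requests, free, bind):
    """Call bind(pod, node) for all pods only if the full gang fits; return True on success."""
    placement = place_gang(requests, free)
    if placement is None:
        return False
    for pod, node in placement.items():
        bind(pod, node)
    return True
```

For example, with nodes `{"n1": 2000, "n2": 1000}`, a gang requesting 1500 and 1000 millicores is bound in full, while adding a third 1000-millicore pod causes no pod at all to be bound.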

Dependencies

N/A

Additional Notes

N/A
