This repository was archived by the owner on May 12, 2021. It is now read-only.

Count number of times partitioned tasks reenter the cluster as healthy #30

@DavidMcLaughlin


Currently, when a task becomes PARTITIONED and is then marked LOST, Aurora schedules a replacement. The original task may later report that it is healthy, at which point Aurora kills it. Receiving this signal is a strong indicator that extending the timeouts would avoid unnecessary churn in the cluster.

Add a metric to monitor how often this happens.
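
To make the request concrete, here is a minimal sketch of the counting logic in Python. It is not Aurora's scheduler code (which is Java), and every name in it is hypothetical, including `PartitionReentryTracker`, `record_transition`, `on_status_update`, and the stat name `partitioned_tasks_reentered_healthy`. It assumes the scheduler can see both its own state transitions and the status updates reported for a task.

```python
from collections import Counter

# Hypothetical stat name for illustration; not an existing Aurora metric.
REENTRY_STAT = "partitioned_tasks_reentered_healthy"


class PartitionReentryTracker:
    """Counts tasks that report RUNNING after going PARTITIONED -> LOST."""

    def __init__(self):
        self.stats = Counter()
        self._last_state = {}               # task_id -> last recorded state
        self._lost_after_partition = set()  # tasks marked LOST from PARTITIONED

    def record_transition(self, task_id, new_state):
        """Record a scheduler-side state change for a task."""
        prev = self._last_state.get(task_id)
        if new_state == "LOST" and prev == "PARTITIONED":
            self._lost_after_partition.add(task_id)
        self._last_state[task_id] = new_state

    def on_status_update(self, task_id, reported_state):
        """Handle a status reported for a task; return True if it was counted."""
        if reported_state == "RUNNING" and task_id in self._lost_after_partition:
            # Count each task once, then forget it; the old task gets killed anyway.
            self._lost_after_partition.discard(task_id)
            self.stats[REENTRY_STAT] += 1
            return True
        return False
```

Only the PARTITIONED to LOST path is counted. A task that goes LOST for some other reason and later reports RUNNING says nothing about the partition timeouts, so it is left out.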
