[train] Check when PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True can be set #1405

@CharlieFRuan

It is useful for reducing memory usage, but we need to make sure it does not break anything.

From my understanding, it can be safely set on the training node in non-colocated mode. For the colocated setting, we may need to toggle it dynamically instead, since I think it does not work well with CUDA IPC weight sync.
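A minimal sketch of the two options above, under stated assumptions. `configure_training_env`, `merge_alloc_conf` and `expandable_segments_off` are hypothetical helpers, not existing code in this repo. The env-var path assumes the variable is set before the training process initializes CUDA. The dynamic path assumes PyTorch's *private* `torch.cuda.memory._set_allocator_settings` hook is available in the installed version; it is not a stable API.

```python
import os
from contextlib import contextmanager
from typing import Callable, Iterator, MutableMapping, Optional

ALLOC_CONF_VAR = "PYTORCH_CUDA_ALLOC_CONF"


def merge_alloc_conf(existing: str, key: str, value: str) -> str:
    """Set `key:value` in a comma-separated allocator config, keeping other options."""
    opts = {}
    for item in (part.strip() for part in existing.split(",")):
        if item:
            k, _, v = item.partition(":")
            opts[k] = v
    opts[key] = value  # overrides any existing value for the same key
    return ",".join(f"{k}:{v}" for k, v in opts.items())


def configure_training_env(
    colocated: bool, env: Optional[MutableMapping[str, str]] = None
) -> None:
    """Hypothetical policy: enable expandable segments only when not colocated.

    Must run before the training process initializes CUDA, because the
    caching allocator reads the variable when it starts up.
    """
    env = os.environ if env is None else env
    if colocated:
        return  # colocated mode is left to the dynamic toggle below
    env[ALLOC_CONF_VAR] = merge_alloc_conf(
        env.get(ALLOC_CONF_VAR, ""), "expandable_segments", "True"
    )


@contextmanager
def expandable_segments_off(
    set_settings: Optional[Callable[[str], None]] = None,
) -> Iterator[None]:
    """Hypothetical colocated-mode toggle around a CUDA IPC weight sync.

    Assumes expandable segments were enabled beforehand and re-enables them
    on exit (even if the body raises). By default it calls PyTorch's private
    allocator hook, which may change or be missing across versions.
    """
    if set_settings is None:
        import torch

        set_settings = torch.cuda.memory._set_allocator_settings
    set_settings("expandable_segments:False")
    try:
        yield
    finally:
        set_settings("expandable_segments:True")
```

In colocated mode, a (hypothetical) weight-sync step would run inside `with expandable_segments_off(): ...`. Whether disabling the setting only for that window is enough for CUDA IPC to work is exactly what this issue needs to verify.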