It is useful for reducing memory consumption, but we need to make sure it does not break anything.
From my understanding, it can be safely set in non-colocated mode on the training node. In the colocated setting, we would have to set it dynamically, since I think it doesn't work well with CUDA IPC weight sync.
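
To make the proposal concrete, here is a rough sketch of the decision logic as I read it. Everything in it is hypothetical: `NodeConfig`, `should_enable_option`, and the `weight_sync_in_progress` flag are illustrative names, not existing code, and "set it dynamically" is interpreted here as "turn it off while a CUDA IPC weight sync is running", which is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    """Hypothetical description of a node; not an existing class."""
    colocated: bool  # True if training and inference share the same GPUs
    role: str        # "training" or "inference"


def should_enable_option(cfg: NodeConfig, weight_sync_in_progress: bool = False) -> bool:
    """Decide whether the memory-saving option should be on right now.

    Assumption: in colocated mode the option is toggled off for the
    duration of a CUDA IPC weight sync and back on afterwards.
    """
    if not cfg.colocated:
        # Non-colocated: only the training node enables it, statically.
        return cfg.role == "training"
    # Colocated: enable it except while weights are being synced over CUDA IPC.
    return not weight_sync_in_progress
```

With this shape, the non-colocated path stays a one-time setting at startup, and only the colocated path needs a hook around the weight sync step.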