Explicit gpu global copies #2260

Open
ThrudPrimrose wants to merge 82 commits into main from explicit-gpu-global-copies

@ThrudPrimrose
Collaborator

@ThrudPrimrose ThrudPrimrose commented Jan 6, 2026

This pass inserts GPU global copies as tasklets for explicit scheduling later.

Since this pass targets GPU specialization, I decided not to use copy nodes. (Note: copy library nodes do not exist in the main branch right now, but I have a separate PR upcoming for copy and memset library nodes, with a pass that converts memcpy and memset kernels to use these library nodes.)

@phschaad
Collaborator

cscs-ci run

Contributor

@alexnick83 alexnick83 left a comment

Very good work. Some comments and questions follow.


memlet = edge.data

self.copy_shape = memlet.subset.size_exact()
Contributor

Extraneous due to lines 37, 39, and 42?

Collaborator Author

True, I will update.

"""Remove size-1 dims; keep tile strides; default to [1] if none remain."""
n = len(subset)
collapsed = [st for st, sz in zip(strides, subset.size()) if sz != 1]
collapsed.extend(strides[n:]) # include tiles
Contributor

What are these tile strides exactly, and why are they in strides[n:]? This implies that the length of strides may be greater than the length of subset (n), but then how would zip in the above line work?

Collaborator Author

It is a workaround for src_subset, dst_subset, and other_subset shenanigans.
Technically, the lengths should always match. I will update it accordingly to:

        def _collapse_strides(strides, subset):
            assert len(strides) == len(subset.size())
            collapsed = [st for st, sz in zip(strides, subset.size()) if sz != 1]
            return collapsed or [1]

Comment thread dace/transformation/passes/gpu_specialization/helpers/copy_strategies.py Outdated
- We are not currently generating kernel code
- The copy occurs between two AccessNodes
- The data descriptors of source and destination are not views.
- The storage types of either src or dst is CPU_Pinned or GPU_Device
Contributor

GPU global?

Collaborator Author

Yes, you are correct, it should be GPU global.
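
As a rough sketch of the four conditions in the docstring under discussion (with the storage name corrected to GPU global), the check could look like the following. `Storage`, `Endpoint`, and `is_out_of_kernel_copy` are hypothetical stand-ins invented for this example; they are not the names used in the PR or in DaCe.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Storage(Enum):
    """Hypothetical stand-in for a storage-type enum."""
    CPU_HEAP = auto()
    CPU_PINNED = auto()
    GPU_GLOBAL = auto()


@dataclass
class Endpoint:
    """Hypothetical description of one side of a copy."""
    is_access_node: bool
    is_view: bool
    storage: Storage


def is_out_of_kernel_copy(src, dst, generating_kernel_code):
    """Return True only when all four listed conditions hold."""
    if generating_kernel_code:
        return False
    if not (src.is_access_node and dst.is_access_node):
        return False
    if src.is_view or dst.is_view:
        return False
    gpu_related = {Storage.CPU_PINNED, Storage.GPU_GLOBAL}
    return src.storage in gpu_related or dst.storage in gpu_related


host = Endpoint(is_access_node=True, is_view=False, storage=Storage.CPU_HEAP)
device = Endpoint(is_access_node=True, is_view=False, storage=Storage.GPU_GLOBAL)
print(is_out_of_kernel_copy(host, device, generating_kernel_code=False))  # True
print(is_out_of_kernel_copy(host, host, generating_kernel_code=False))    # False
```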


else:
    # sanity check
    assert num_dims > 2, f"Expected copy shape with more than 2 dimensions, but got {num_dims}."
Contributor

A bit overzealous; is this supposed to catch num_dims == 0?
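
To make the question concrete, here is a small sketch. It assumes that the branches before the `else` handle exactly the 1D and 2D cases, which the snippet above does not show; `pick_copy_kind` and its return values are hypothetical.

```python
def pick_copy_kind(num_dims):
    """Hypothetical dispatch on the number of copy dimensions."""
    if num_dims == 1:
        return "1d"
    elif num_dims == 2:
        return "2d"
    else:
        # With the 1D and 2D branches above, this can only fail for num_dims <= 0.
        assert num_dims > 2, f"Expected copy shape with more than 2 dimensions, but got {num_dims}."
        return "nd"


print(pick_copy_kind(3))  # nd
try:
    pick_copy_kind(0)
except AssertionError as err:
    print(err)  # Expected copy shape with more than 2 dimensions, but got 0.
```

Under that assumption, the assertion is redundant for every positive dimension count and only guards against an empty or invalid copy shape.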

Comment thread dace/transformation/passes/gpu_specialization/helpers/copy_strategies.py Outdated
Comment thread dace/transformation/passes/gpu_specialization/helpers/copy_strategies.py Outdated
Comment thread dace/transformation/passes/gpu_specialization/helpers/copy_strategies.py Outdated
continue

# If the subset has more than 2 dimensions and is not contiguous (represented as a 1D memcpy) then fallback to a copy kernel
if len(edge.data.subset) > 2 and not edge.data.subset.is_contiguous_subset(
Contributor

Why isn't this check in OutOfKernelCopyStrategy.applicable?

Collaborator Author

not applicable -> skip entirely (not a GPU out-of-kernel copy)
applicable + >2D non-contiguous -> fallback mapped tasklet
applicable + expressible as memcpy -> generate memcpy code

The if branch at line 135 has a matching else at line 170. Both cases are applicable, but they require different patterns.
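
The three cases listed above can be sketched as a single decision function. The function name, its boolean inputs, and the returned labels are hypothetical and only mirror the wording of this reply; the actual pass works on SDFG edges and strategy objects.

```python
def choose_copy_pattern(applicable, num_dims, contiguous):
    """Hypothetical three-way decision mirroring the cases in the reply."""
    if not applicable:
        # Not a GPU out-of-kernel copy: leave the edge alone.
        return "skip"
    if num_dims > 2 and not contiguous:
        # Cannot be expressed as a memcpy: fall back to a mapped tasklet.
        return "mapped_tasklet"
    # Expressible as a memcpy: generate memcpy code.
    return "memcpy"


print(choose_copy_pattern(applicable=False, num_dims=3, contiguous=False))  # skip
print(choose_copy_pattern(applicable=True, num_dims=3, contiguous=False))   # mapped_tasklet
print(choose_copy_pattern(applicable=True, num_dims=3, contiguous=True))    # memcpy
```

Keeping the dimensionality check outside the applicability test means both branches still count as applicable copies and differ only in the pattern that is emitted.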

Comment thread dace/dtypes.py Outdated