cscs-ci run
alexnick83
left a comment
Very good work. Some comments and questions follow.
    memlet = edge.data

    self.copy_shape = memlet.subset.size_exact()
Extraneous due to lines 37, 39, and 42?
True, I will update.
    """Remove size-1 dims; keep tile strides; default to [1] if none remain."""
    n = len(subset)
    collapsed = [st for st, sz in zip(strides, subset.size()) if sz != 1]
    collapsed.extend(strides[n:])  # include tiles
What are these tile strides exactly, and why are they in n:? This implies that the length of strides may be greater than the length of subset (n), but then how would zip in the above line work?
It is a workaround for src_subset, dst_subset and other_subset shenanigans.
Technically, the lengths should always match, so I will update it to:
def _collapse_strides(strides, subset):
    assert len(strides) == len(subset.size())
    collapsed = [st for st, sz in zip(strides, subset.size()) if sz != 1]
    return collapsed or [1]
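For anyone following the thread, here is a minimal, self-contained sketch of how the updated helper behaves. `FakeSubset` is a hypothetical stand-in that only models `size()` and `__len__`; it is not the DaCe subset class. Because `zip` silently stops at the shorter input, the assert is what makes a length mismatch visible.

```python
class FakeSubset:
    """Hypothetical stand-in for a subset object; only size() and len() are modeled."""

    def __init__(self, sizes):
        self._sizes = list(sizes)

    def size(self):
        return list(self._sizes)

    def __len__(self):
        return len(self._sizes)


def collapse_strides(strides, subset):
    """Drop the strides of size-1 dimensions; fall back to [1] if nothing remains."""
    sizes = subset.size()
    # zip() truncates to the shorter input, so check the ranks explicitly.
    assert len(strides) == len(sizes), "strides and subset must have the same rank"
    collapsed = [st for st, sz in zip(strides, sizes) if sz != 1]
    return collapsed or [1]


# Subset of sizes (4, 1, 8) with strides (8, 8, 1): the middle dimension is dropped.
print(collapse_strides([8, 8, 1], FakeSubset([4, 1, 8])))
# Every dimension has size 1, so the result falls back to [1].
print(collapse_strides([8, 1], FakeSubset([1, 1])))
```

With the assert in place, passing a `strides` list longer than the subset raises an `AssertionError` instead of quietly ignoring the extra entries.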
    - We are not currently generating kernel code
    - The copy occurs between two AccessNodes
    - The data descriptors of source and destination are not views.
    - The storage types of either src or dst is CPU_Pinned or GPU_Device
Yes, you are correct, it should be GPU global.
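As a rough illustration of the four conditions listed in the docstring, with the storage name corrected as agreed, the check could be sketched as below. The `Storage` enum, the function name, and its boolean parameters are all hypothetical and are not the DaCe API. The sketch also assumes that the last condition means "at least one of src or dst uses one of these storage types", which is only one reading of the docstring.

```python
from enum import Enum, auto


class Storage(Enum):
    """Hypothetical storage kinds, named after the ones mentioned in this thread."""
    CPU_Heap = auto()
    CPU_Pinned = auto()
    GPU_Global = auto()


# Storage types that qualify a copy for out-of-kernel handling (assumed reading).
GPU_COPY_STORAGE = {Storage.CPU_Pinned, Storage.GPU_Global}


def is_out_of_kernel_copy(in_kernel, both_access_nodes, any_view, src_storage, dst_storage):
    """Return True only if all four docstring conditions hold."""
    if in_kernel:              # we must not be generating kernel code
        return False
    if not both_access_nodes:  # the copy must connect two AccessNodes
        return False
    if any_view:               # neither descriptor may be a view
        return False
    # At least one side must be pinned host memory or GPU global memory.
    return src_storage in GPU_COPY_STORAGE or dst_storage in GPU_COPY_STORAGE
```

Under this reading, a copy from `Storage.CPU_Heap` to `Storage.GPU_Global` outside a kernel qualifies, while a copy between two `Storage.CPU_Heap` arrays does not.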
    else:
        # sanity check
        assert num_dims > 2, f"Expected copy shape with more than 2 dimensions, but got {num_dims}."
A bit overzealous; is this supposed to catch num_dims == 0?
    continue

    # If the subset has more than 2 dimensions and is not contiguous (represented as a 1D memcpy) then fallback to a copy kernel
    if len(edge.data.subset) > 2 and not edge.data.subset.is_contiguous_subset(
Why isn't this check in OutOfKernelCopyStrategy.applicable?
not applicable -> skip entirely (not a GPU out-of-kernel copy)
applicable + >2D non-contiguous -> fallback mapped tasklet
applicable + expressible as memcpy -> generate memcpy code
The if branch at line 135 has a matching else at line 170. Both cases are applicable but require different patterns.
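The three cases above can be sketched as a small decision function. The function name, its parameters, and the string labels are hypothetical and only mirror the wording of this reply. In the actual pass, the inputs would come from `OutOfKernelCopyStrategy.applicable` and from the subset checks quoted in the diff.

```python
def choose_copy_pattern(applicable, num_dims, is_contiguous):
    """Pick how a copy edge is handled, following the three cases in this thread."""
    if not applicable:
        # Not a GPU out-of-kernel copy: leave the edge untouched.
        return "skip"
    if num_dims > 2 and not is_contiguous:
        # Applicable, but not expressible as a memcpy: fall back to a mapped tasklet.
        return "mapped_tasklet"
    # Applicable and expressible as a memcpy.
    return "memcpy"


print(choose_copy_pattern(False, 3, False))
print(choose_copy_pattern(True, 3, False))
print(choose_copy_pattern(True, 3, True))
print(choose_copy_pattern(True, 2, False))
```

Keeping the contiguity check outside of the applicability test is what lets both applicable cases share one entry point while still generating different code.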
…ategies.py Co-authored-by: alexnick83 <31545860+alexnick83@users.noreply.github.com>
…ategies.py Co-authored-by: alexnick83 <31545860+alexnick83@users.noreply.github.com>
…t_and_memcpy.py Co-authored-by: Philipp Schaad <schaad.phil@gmail.com>
Co-authored-by: Philipp Schaad <schaad.phil@gmail.com>
…o memcpy_map_to_libnode_pass
This pass inserts GPU global copies as tasklets for explicit scheduling later.
Since this is a pass aimed at GPU specialization, I decided not to use copy nodes. (Note: Right now, copy library nodes do not exist in the main branch, but I have a separate PR upcoming for copy and memset library nodes, with a pass that converts memcpy and memset kernels to use these library nodes.)