Open
* refactor metadata handling to share workspace layout across GPU backends
* update CUDA/NVIDIA to use workspace-backed shape and stride metadata
* add MUSA source build rules for Moore
Ziminli requested changes on Mar 17, 2026
- "operator `Swiglu` requires all input and output tensors to have the "
- "same dtype");
+ "Operator `Swiglu` requires all input and output tensors to have the "
+ "same dtype.");
if (required_workspace_size != 0) {
  assert(workspace != nullptr && "`Swiglu` requires a workspace buffer.");
  assert(workspace_size >= required_workspace_size &&
         "`workspace_size_in_bytes` is insufficient for `Swiglu`.");
In principle, this should mostly reuse cuda/swiglu/kernel.h; see the nvidia implementation for reference.
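In principle, none of the contents of this file should be needed; almost all of it is duplicated. See the CUDA-like implementations of add (cuda/add/, nvidia/add/, and metax/add/) and the common functions that already exist in common/cuda/kernel_commons.h.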
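To make the suggested structure concrete, here is a minimal host-only sketch of one shared implementation with a thin instantiation per backend. Every name in it (the `sketch` namespaces, `Backend`, `kBlockSize`, `Run`) is hypothetical and the block sizes are arbitrary; it is not the repository's actual layout, and a plain loop stands in for the device kernel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout for illustration only; none of these names come from
// the repository.
namespace sketch::cuda_like {

// Written once and shared by every CUDA-like backend. `Backend` supplies only
// what genuinely differs (here, just an arbitrary block size).
template <typename Backend>
struct Add {
  // Host-side stand-in for the device kernel. Assumes a.size() == b.size().
  static std::vector<float> Run(const std::vector<float>& a,
                                const std::vector<float>& b) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
      out[i] = a[i] + b[i];
    }
    return out;
  }

  static constexpr int block_size() { return Backend::kBlockSize; }
};

}  // namespace sketch::cuda_like

namespace sketch::nvidia {
struct Backend {
  static constexpr int kBlockSize = 256;  // Arbitrary illustrative value.
};
using Add = cuda_like::Add<Backend>;  // No duplicated kernel body.
}  // namespace sketch::nvidia

namespace sketch::moore {
struct Backend {
  static constexpr int kBlockSize = 512;  // Arbitrary illustrative value.
};
using Add = cuda_like::Add<Backend>;  // Same shared body, different traits.
}  // namespace sketch::moore
```

With this shape, a backend directory shrinks to a traits struct and an alias, which is the kind of reduction the comment above asks for.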
namespace infini::ops::swiglu::moore {

using cuda_bfloat16 = mt_bfloat16;
using cuda_bfloat162 = mt_bfloat162;
  } else {
    return static_cast<T>(1 / (1 + std::exp(-x)));
  }
}
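This does not seem to serve any purpose. Likewise, if Moore behaves like the other CUDA-like backends, it should reuse the shared code directly; again, see add for reference.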
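For illustration, a single generic helper of the following shape could serve every backend instead of a per-backend copy. This is a host-only sketch: the names `sketch::Sigmoid` and `sketch::Swiglu` are hypothetical, and the formula assumes the common SwiGLU definition `up * gate * sigmoid(gate)`, which should be checked against the shared cuda implementation.

```cpp
#include <cmath>
#include <type_traits>

namespace sketch {

// Generic sigmoid: double inputs stay in double; everything else is computed
// in float and cast back to T.
template <typename T>
T Sigmoid(T x) {
  if constexpr (std::is_same_v<T, double>) {
    return 1.0 / (1.0 + std::exp(-x));
  } else {
    const float xf = static_cast<float>(x);
    return static_cast<T>(1.0f / (1.0f + std::exp(-xf)));
  }
}

// Assumed definition: swiglu(gate, up) = up * gate * sigmoid(gate).
template <typename T>
T Swiglu(T gate, T up) {
  return static_cast<T>(up * gate * Sigmoid(gate));
}

}  // namespace sketch
```

Because the helper depends only on `T`, a backend whose half-precision types convert to and from `float` would not need its own copy.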
float sig0 = __low2float(sig);
float sig1 = __high2float(sig);
float up0 = __low2float(up);
float up1 = __high2float(up);
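If Moore differs from the other CUDA-like backends in a few places and cannot directly reuse some part of the logic, only that specific case should be handled separately; everything else should still reuse the unified cuda implementation.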
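One way to confine a backend difference to a single spot is a small customization point that only the divergent backend specializes. The sketch below is host-only and entirely hypothetical: the tag types, `PairOps`, and `SumHalves` are invented for illustration, and integer halves of a `uint32_t` stand in for the `__low2float`/`__high2float` unpacking shown above. It does not claim that Moore actually needs such an override.

```cpp
#include <cstdint>

namespace sketch {

struct NvidiaTag {};
struct MooreTag {};

// Default unpacking, used by every backend that does not override it.
template <typename Backend>
struct PairOps {
  static constexpr bool kIsOverride = false;
  static float Low(std::uint32_t packed) {
    return static_cast<float>(packed & 0xFFFFu);
  }
  static float High(std::uint32_t packed) {
    return static_cast<float>(packed >> 16);
  }
};

// Hypothetical override: only this one piece is backend-specific. It returns
// the same values through a different code path.
template <>
struct PairOps<MooreTag> {
  static constexpr bool kIsOverride = true;
  static float Low(std::uint32_t packed) {
    return static_cast<float>(static_cast<std::uint16_t>(packed));
  }
  static float High(std::uint32_t packed) {
    return static_cast<float>(static_cast<std::uint16_t>(packed >> 16));
  }
};

// Shared logic, written once; it reaches backend differences only through
// PairOps.
template <typename Backend>
float SumHalves(std::uint32_t packed) {
  return PairOps<Backend>::Low(packed) + PairOps<Backend>::High(packed);
}

}  // namespace sketch
```

The shared function never mentions a concrete backend, so adding or removing an override does not touch it.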
A100 build and operator tests:





MetaX build test:
Cambricon build test:
Iluvatar build test:
Moore operator test: