
Feat/moore swiglu #24

Open

gongchensu wants to merge 4 commits into InfiniTensor:feat/dev-infra from gongchensu:feat/moore-swiglu

Conversation

gongchensu commented Mar 16, 2026

A100 build and operator tests: [image]
MetaX build test: [image]
Cambricon build test: [image]
Iluvatar build test: [image]
Moore operator test: [image]

gongchensu and others added 4 commits March 13, 2026 09:13
* refactor metadata handling to share workspace layout across GPU backends
* update CUDA/NVIDIA to use workspace-backed shape and stride metadata
* add MUSA source build rules for Moore
@gongchensu gongchensu self-assigned this Mar 16, 2026
@gongchensu gongchensu requested a review from voltjia March 16, 2026 07:37
- "operator `Swiglu` requires all input and output tensors to have the "
- "same dtype");
+ "Operator `Swiglu` requires all input and output tensors to have the "
+ "same dtype.");
This change isn't needed; see point 4 of the C++ section of the contribution guide.

if (required_workspace_size != 0) {
assert(workspace != nullptr && "`Swiglu` requires a workspace buffer.");
assert(workspace_size >= required_workspace_size &&
"`workspace_size_in_bytes` is insufficient for `Swiglu`.");

Same as above: no period is needed at the end. Please check and fix the other files as well.


In principle this should mostly reuse cuda/swiglu/kernel.h; see the nvidia implementation for reference.


In principle the entire contents of this file are unnecessary, since almost all of it is duplicated. See the CUDA-like implementations of add (cuda/add/, nvidia/add/, and metax/add/) and the existing shared helpers in common/cuda/kernel_commons.h.

namespace infini::ops::swiglu::moore {

using cuda_bfloat16 = mt_bfloat16;
using cuda_bfloat162 = mt_bfloat162;

If these aliases are needed at all, they belong in common; take a look at how the other platforms handle this.

} else {
return static_cast<T>(1 / (1 + std::exp(-x)));
}
}

This doesn't seem to serve any purpose. Likewise, if Moore works the same way as the other CUDA-like platforms, the code should be reused directly; again, refer to add.

float sig0 = __low2float(sig);
float sig1 = __high2float(sig);
float up0 = __low2float(up);
float up1 = __high2float(up);

If Moore differs from the other CUDA-like platforms only in a few specific places and some part of the logic can't be reused directly, special-case just those spots and keep reusing the shared cuda implementation everywhere else.
