Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new ROCm 6.3 container recipe to the repo’s dockerfile/ collection, targeting the rocm/pytorch-training:v25.6 base image and layering SuperBench build/install steps plus common tooling needed for benchmarks.
Changes:
- Introduces
dockerfile/rocm6.3.x.dockerfilefor a ROCm 6.3.4 + PyTorch training base image. - Installs additional system tools (Docker client, OFED if missing, Intel MLC) and configures SSH/limits.
- Builds SuperBench third-party dependencies and installs the package with AMD worker extras.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ARG NUM_MAKE_JOBS=64 | ||
|
|
| ENV USE_HIPBLAS_COMPUTETYPE=1 | ||
| RUN python3 -m pip install --no-build-isolation .[amdworker] && \ | ||
| CXX=/opt/rocm/bin/hipcc make cppbuild && \ | ||
| make postinstall |
| # Get Ubuntu version and set as an environment variable | ||
| RUN echo "Ubuntu version: $(lsb_release -r -s)" | ||
| ARG UBUNTU_VERSION=22.04 | ||
|
|
||
| # Install OFED | ||
| ENV OFED_VERSION=24.10-1.1.4.0 | ||
| # Check if ofed_info is present and has a version | ||
| RUN if ! command -v ofed_info >/dev/null 2>&1; then \ | ||
| echo "OFED not found. Installing OFED..."; \ |
| RUN python3 -m pip install --upgrade pip wheel setuptools==65.7 && \ | ||
| python3 -c "import pkg_resources" || python3 -m pip install setuptools | ||
|
|
| RUN if ! command -v ofed_info >/dev/null 2>&1; then \ | ||
| echo "OFED not found. Installing OFED..."; \ | ||
| cd /tmp && \ | ||
| wget -q http://content.mellanox.com/ofed/MLNX_OFED-${OFED_VERSION}/MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu${UBUNTU_VERSION}-x86_64.tgz && \ |
| # Upgrading botocore/boto3 to versions compatible with urllib3 2.x. | ||
| RUN python3 -m pip install --upgrade botocore boto3 |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #790 +/- ##
=======================================
Coverage 85.70% 85.70%
=======================================
Files 102 102
Lines 7703 7703
=======================================
Hits 6602 6602
Misses 1101 1101
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds a new ROCm 6.3 build target to the repository’s Docker image build pipeline, enabling CI to produce a superbench/main:rocm6.3 image variant.
Changes:
- Introduces
dockerfile/rocm6.3.x.dockerfilebased onrocm/pytorch-training:v25.6, with additional system deps, Docker CLI, OFED (conditional), and SuperBench build/install steps. - Updates
.github/workflows/build-image.ymlto build and tag the new ROCm 6.3 image on the self-hosted ROCm runner.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| dockerfile/rocm6.3.x.dockerfile | New ROCm 6.3 Dockerfile building SuperBench on top of rocm/pytorch-training:v25.6. |
| .github/workflows/build-image.yml | Adds a rocm6.3 entry to the build matrix to produce/push superbench/main:rocm6.3. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| ADD third_party third_party | ||
| # perftest_rocm6.patch skipped — upstream perftest already includes the equivalent changes | ||
| RUN make RCCL_HOME=/opt/rocm ROCBLAS_BRANCH=release-staging/rocm-rel-6.3 HIPBLASLT_BRANCH=release-staging/rocm-rel-6.3 ROCM_VER=rocm-5.5.0 -C third_party rocm -o cpu_hpl -o cpu_stream -o megatron_lm -o rocm_megatron_lm |
| sed -i "s/[# ]*PermitRootLogin prohibit-password/PermitRootLogin yes/" /etc/ssh/sshd_config && \ | ||
| sed -i "s/[# ]*PermitUserEnvironment no/PermitUserEnvironment yes/" /etc/ssh/sshd_config && \ | ||
| sed -i "s/[# ]*Port.*/Port 22/" /etc/ssh/sshd_config && \ | ||
| echo "* soft nofile 1048576\n* hard nofile 1048576" >> /etc/security/limits.conf && \ | ||
| echo "root soft nofile 1048576\nroot hard nofile 1048576" >> /etc/security/limits.conf |
|
|
||
| # Install OFED | ||
| ENV OFED_VERSION=24.10-1.1.4.0 | ||
| # Check if ofed_info is present and has a version |
| ADD . . | ||
| ENV USE_HIP_DATATYPE=1 | ||
| ENV USE_HIPBLAS_COMPUTETYPE=1 | ||
| RUN python3 -m pip install --no-build-isolation .[amdworker] && \ | ||
| CXX=/opt/rocm/bin/hipcc make cppbuild && \ | ||
| make postinstall |
| RUN python3 -m pip install --upgrade pip wheel setuptools==65.7 && \ | ||
| python3 -c "import pkg_resources" || python3 -m pip install setuptools |
| ENV OFED_VERSION=24.10-1.1.4.0 | ||
| # Check if ofed_info is present and has a version | ||
| RUN if ! command -v ofed_info >/dev/null 2>&1; then \ | ||
| echo "OFED not found. Installing OFED..."; \ | ||
| cd /tmp && \ | ||
| wget -q http://content.mellanox.com/ofed/MLNX_OFED-${OFED_VERSION}/MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu${UBUNTU_VERSION}-x86_64.tgz && \ |
Description
Add ROCm6.3 dockerfile.