diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..238087f --- /dev/null +++ b/.gitignore @@ -0,0 +1,12 @@ +# 构建与运行产物(不提交) +*.exe +*.lib +*.exp +*.obj +*.o +*.bin +*.nsys-rep +*.sqlite +# 脚本生成的数据(可用 generate_particles.py 重新生成) +data/particles_1k.txt +data/particles_10k.txt diff --git a/09_particle_sim/README.md b/09_particle_sim/README.md index 2c397fe..d9b7d08 100644 --- a/09_particle_sim/README.md +++ b/09_particle_sim/README.md @@ -1,21 +1,5 @@ -## 简介 +# 09_particle_sim -你是粒子对撞机项目的核心算法工程师,负责模拟粒子在加速器中的运动轨迹,以优化探测器布局和碰撞参数。对撞机每天产生海量粒子数据,但物理学家需要直观地看到粒子如何在磁场中偏转,以验证理论模型。 +带电粒子在磁场中的运动轨迹模拟(CUDA + Python 可视化)。 -你的任务是开发一个 GPU 加速的粒子追踪模拟器,能够并行计算数千个带电粒子在复杂磁场中的轨迹,并输出数据供 Python 可视化。最终,你将提供一个交互式 3D 动画,展示粒子束的螺旋运动、偏转和碰撞过程,帮助科学家理解实验现象。该模拟器将成为 “星环” 项目的重要设计工具。 - -## 任务内容 -开发 CUDA 程序模拟粒子运动,并输出轨迹点供 Python 可视化。任务包括: -1. 解析输入文件,将粒子初始状态拷贝到 GPU。 -2. 使用并行数值积分更新每个粒子的位置和速度。 -3. 按指定间隔记录粒子位置到全局数组。 -4. 将轨迹数据传输回主机,保存为文件。 -5. 提供 Python 脚本读取数据并生成动态 3D 轨迹图。 - -## 📬 有疑问? - -更多详细信息和要求请参考本季度项目文档。 - -可以在项目群里直接询问导师和助教! - -Good luck and happy coding! 🚀 +本选题提交见子目录 **nagadoyuji/**。 diff --git a/09_particle_sim/nagadoyuji/.clang-format b/09_particle_sim/nagadoyuji/.clang-format new file mode 100644 index 0000000..287dafa --- /dev/null +++ b/09_particle_sim/nagadoyuji/.clang-format @@ -0,0 +1,5 @@ +# Default clang-format style for CUDA/C++ +BasedOnStyle: LLVM +IndentWidth: 4 +ColumnLimit: 100 +Language: Cpp diff --git a/09_particle_sim/nagadoyuji/.gitignore b/09_particle_sim/nagadoyuji/.gitignore new file mode 100644 index 0000000..238087f --- /dev/null +++ b/09_particle_sim/nagadoyuji/.gitignore @@ -0,0 +1,12 @@ +# 构建与运行产物(不提交) +*.exe +*.lib +*.exp +*.obj +*.o +*.bin +*.nsys-rep +*.sqlite +# 脚本生成的数据(可用 generate_particles.py 重新生成) +data/particles_1k.txt +data/particles_10k.txt diff --git a/09_particle_sim/nagadoyuji/Makefile b/09_particle_sim/nagadoyuji/Makefile new file mode 100644 index 0000000..88925cd --- /dev/null +++ b/09_particle_sim/nagadoyuji/Makefile @@ -0,0 +1,25 @@ +# Particle simulation in uniform magnetic field (CUDA) +# Requires: CUDA Toolkit 11+, C++17 + +NVCC = nvcc +NVCCFLAGS = -std=c++17 -O2 +TARGET = particle_sim + +SRCS = main.cu +OBJS = main.o + +.PHONY: all clean run format + +all: $(TARGET) + +$(TARGET): $(SRCS) + $(NVCC) $(NVCCFLAGS) -o $(TARGET) $(SRCS) + +clean: + rm -f $(TARGET) $(OBJS) *.bin + +run: $(TARGET) + ./$(TARGET) data/particles.txt data/field.txt data/params.txt out.bin + +format: + clang-format -i main.cu diff --git a/09_particle_sim/nagadoyuji/README.md b/09_particle_sim/nagadoyuji/README.md new file mode 100644 index 0000000..de63aca --- /dev/null +++ b/09_particle_sim/nagadoyuji/README.md @@ -0,0 +1,46 @@ +# 09_particle_sim — nagadoyuji + +带电粒子在磁场中的运动轨迹 CUDA 模拟,输出轨迹供 Python 3D 可视化。 + +- **作者 / 提交 ID**:nagadoyuji +- **总结报告**:[总结报告.md](总结报告.md) + +## 快速开始 + +```bash +make +./particle_sim data/particles.txt data/field.txt data/params.txt out.bin +python visualize.py out.bin +``` + +## 目录说明 + +| 文件 | 说明 | +|------|------| +| main.cu | CUDA 主程序(解析、Boris 积分、记录、二进制输出) | +| Makefile | 构建(`make` / `make format` 格式化) | +| .clang-format | 代码格式配置 | +| data/ | 粒子、磁场、参数示例 | +| visualize.py | 3D 轨迹动画(Matplotlib FuncAnimation) | +| verify_single_particle.py | 单粒子回旋半径正确性验证 | +| generate_particles.py | 生成 1000 粒子等规模测试数据 | +| generate_B_grid.py | 生成 3D 磁场网格二进制文件 | + +## 正确性验证 + +```bash +./particle_sim data/particles_1.txt data/field.txt data/params_verify.txt out.bin +python verify_single_particle.py out.bin data/particles_1.txt data/field.txt +``` + +## 规模测试(≥1000 粒子) + +```bash +python generate_particles.py -n 1000 -o data/particles_1k.txt +./particle_sim data/particles_1k.txt data/field.txt data/params.txt out.bin +``` + +## 环境 + +- CUDA Toolkit 11+,C++17 +- Python 3:numpy, matplotlib diff --git a/09_particle_sim/nagadoyuji/data/field.txt b/09_particle_sim/nagadoyuji/data/field.txt new file mode 100644 index 0000000..a10d505 --- /dev/null +++ b/09_particle_sim/nagadoyuji/data/field.txt @@ -0,0 +1 @@ +0.0 0.0 1.0 diff --git a/09_particle_sim/nagadoyuji/data/params.txt b/09_particle_sim/nagadoyuji/data/params.txt new file mode 100644 index 0000000..92ea5e2 --- /dev/null +++ b/09_particle_sim/nagadoyuji/data/params.txt @@ -0,0 +1,3 @@ +dt = 1e-10 # 时间步长(秒) +num_steps = 10000 # 总步数 +record_interval = 100 # 每多少步记录一次位置 diff --git a/09_particle_sim/nagadoyuji/data/params_verify.txt b/09_particle_sim/nagadoyuji/data/params_verify.txt new file mode 100644 index 0000000..bba237f --- /dev/null +++ b/09_particle_sim/nagadoyuji/data/params_verify.txt @@ -0,0 +1,3 @@ +dt = 1e-12 # 较小步长便于验证 +num_steps = 100000 # 多步 +record_interval = 1000 # 每 1000 步记录 diff --git a/09_particle_sim/nagadoyuji/data/particles.txt b/09_particle_sim/nagadoyuji/data/particles.txt new file mode 100644 index 0000000..9748b57 --- /dev/null +++ b/09_particle_sim/nagadoyuji/data/particles.txt @@ -0,0 +1,2 @@ +0.0 0.0 0.0 1.0e6 0.0 0.0 1.6e-19 9.1e-31 +0.0 1.0 0.0 0.0 1.0e6 0.0 -1.6e-19 9.1e-31 diff --git a/09_particle_sim/nagadoyuji/data/particles_1.txt b/09_particle_sim/nagadoyuji/data/particles_1.txt new file mode 100644 index 0000000..7d15f89 --- /dev/null +++ b/09_particle_sim/nagadoyuji/data/particles_1.txt @@ -0,0 +1 @@ +0.0 0.0 0.0 1.0e6 0.0 0.0 1.6e-19 9.1e-31 diff --git a/09_particle_sim/nagadoyuji/generate_B_grid.py b/09_particle_sim/nagadoyuji/generate_B_grid.py new file mode 100644 index 0000000..0d99ad9 --- /dev/null +++ b/09_particle_sim/nagadoyuji/generate_B_grid.py @@ -0,0 +1,36 @@ +#!/usr/bin/env python3 +"""Generate a binary B-field grid file for 3D grid mode. +Format: 3 int32 (nx,ny,nz), 3 float32 (ox,oy,oz), 3 float32 (dx,dy,dz), + then nx*ny*nz*3 float32 (Bx,By,Bz per cell, order i,j,k).""" +import struct +import argparse + +def main(): + ap = argparse.ArgumentParser() + ap.add_argument("-o", "--output", default="data/field_grid.bin") + ap.add_argument("--nx", type=int, default=4) + ap.add_argument("--ny", type=int, default=4) + ap.add_argument("--nz", type=int, default=4) + ap.add_argument("--ox", type=float, default=-0.01) + ap.add_argument("--oy", type=float, default=-0.01) + ap.add_argument("--oz", type=float, default=-0.01) + ap.add_argument("--dx", type=float, default=0.01) + ap.add_argument("--dy", type=float, default=0.01) + ap.add_argument("--dz", type=float, default=0.01) + ap.add_argument("--Bz", type=float, default=1.0) + args = ap.parse_args() + nx, ny, nz = args.nx, args.ny, args.nz + ox, oy, oz = args.ox, args.oy, args.oz + dx, dy, dz = args.dx, args.dy, args.dz + with open(args.output, "wb") as f: + f.write(struct.pack("iii", nx, ny, nz)) + f.write(struct.pack("fff", ox, oy, oz)) + f.write(struct.pack("fff", dx, dy, dz)) + for k in range(nz): + for j in range(ny): + for i in range(nx): + f.write(struct.pack("fff", 0.0, 0.0, args.Bz)) + print(f"Wrote grid {nx}x{ny}x{nz} to {args.output}") + +if __name__ == "__main__": + main() diff --git a/09_particle_sim/nagadoyuji/generate_particles.py b/09_particle_sim/nagadoyuji/generate_particles.py new file mode 100644 index 0000000..5892bae --- /dev/null +++ b/09_particle_sim/nagadoyuji/generate_particles.py @@ -0,0 +1,26 @@ +#!/usr/bin/env python3 +"""Generate particles.txt with N particles (default 1000) for scale testing.""" +import argparse +import random + +def main(): + ap = argparse.ArgumentParser() + ap.add_argument("-n", "--num", type=int, default=1000, help="Number of particles") + ap.add_argument("-o", "--output", default="data/particles_1k.txt", help="Output path") + args = ap.parse_args() + random.seed(42) + with open(args.output, "w") as f: + for i in range(args.num): + x = random.uniform(-0.01, 0.01) + y = random.uniform(-0.01, 0.01) + z = random.uniform(-0.01, 0.01) + vx = random.uniform(-1e6, 1e6) + vy = random.uniform(-1e6, 1e6) + vz = random.uniform(-0.5e6, 0.5e6) + q = 1.6e-19 if i % 2 == 0 else -1.6e-19 + m = 9.1e-31 + f.write(f"{x} {y} {z} {vx} {vy} {vz} {q} {m}\n") + print(f"Wrote {args.num} particles to {args.output}") + +if __name__ == "__main__": + main() diff --git a/09_particle_sim/nagadoyuji/main.cu b/09_particle_sim/nagadoyuji/main.cu new file mode 100644 index 0000000..c6a4665 --- /dev/null +++ b/09_particle_sim/nagadoyuji/main.cu @@ -0,0 +1,321 @@ +/** + * Particle motion simulation in uniform magnetic field (CUDA). + * Reads particle init, B-field, and params; runs numerical integration on GPU; + * records trajectory at intervals and writes binary output for Python visualization. + */ +#include +#include +#include +#include +#include +#include +#include +#include + +// ----------------------------------------------------------------------------- +// Config and data (host-side) +// ----------------------------------------------------------------------------- +struct SimParams { + double dt = 1e-10; + int num_steps = 10000; + int record_interval = 100; +}; + +// SoA on host: N particles +std::vector h_pos; // 3*N +std::vector h_vel; // 3*N +std::vector h_q; // N +std::vector h_m; // N +float h_B[3] = {0, 0, 1.0f}; +int N = 0; +bool use_B_grid = false; +int g_nx = 0, g_ny = 0, g_nz = 0; +float g_ox = 0, g_oy = 0, g_oz = 0, g_dx = 1, g_dy = 1, g_dz = 1; +std::vector h_Bgrid; // 3 * nx * ny * nz + +SimParams params; + +// ----------------------------------------------------------------------------- +// Input parsing: particles, uniform/grid B-field, simulation params +// ----------------------------------------------------------------------------- +bool load_particles(const char* path) { + std::ifstream f(path); + if (!f) { fprintf(stderr, "Cannot open %s\n", path); return false; } + std::string line; + while (std::getline(f, line)) { + if (line.empty()) continue; + float x, y, z, vx, vy, vz, q, m; + if (sscanf(line.c_str(), "%f %f %f %f %f %f %f %f", &x, &y, &z, &vx, &vy, &vz, &q, &m) != 8) + continue; + h_pos.push_back(x); h_pos.push_back(y); h_pos.push_back(z); + h_vel.push_back(vx); h_vel.push_back(vy); h_vel.push_back(vz); + h_q.push_back(q); h_m.push_back(m); + } + N = (int)h_q.size(); + return N > 0; +} + +bool load_field(const char* path) { + std::ifstream f(path); + if (!f) { fprintf(stderr, "Cannot open %s\n", path); return false; } + std::string line; + if (!std::getline(f, line)) return false; + if (line.find("grid") != std::string::npos) { + use_B_grid = true; + std::string grid_path; + if (!std::getline(f, grid_path)) return false; + while (!grid_path.empty() && (grid_path.back() == '\r' || grid_path.back() == ' ')) grid_path.pop_back(); + FILE* bin = fopen(grid_path.c_str(), "rb"); + if (!bin) { fprintf(stderr, "Cannot open grid %s\n", grid_path.c_str()); return false; } + int nx, ny, nz; + if (fread(&nx, sizeof(int), 1, bin) != 1 || fread(&ny, sizeof(int), 1, bin) != 1 || fread(&nz, sizeof(int), 1, bin) != 1) { + fclose(bin); return false; + } + float ox, oy, oz, dx, dy, dz; + if (fread(&ox, sizeof(float), 1, bin) != 1 || fread(&oy, sizeof(float), 1, bin) != 1 || fread(&oz, sizeof(float), 1, bin) != 1 || + fread(&dx, sizeof(float), 1, bin) != 1 || fread(&dy, sizeof(float), 1, bin) != 1 || fread(&dz, sizeof(float), 1, bin) != 1) { + fclose(bin); return false; + } + g_nx = nx; g_ny = ny; g_nz = nz; + g_ox = ox; g_oy = oy; g_oz = oz; g_dx = dx; g_dy = dy; g_dz = dz; + size_t count = (size_t)nx * ny * nz * 3; + h_Bgrid.resize(count); + if (fread(h_Bgrid.data(), sizeof(float), count, bin) != count) { fclose(bin); return false; } + fclose(bin); + return true; + } + if (sscanf(line.c_str(), "%f %f %f", &h_B[0], &h_B[1], &h_B[2]) != 3) return false; + return true; +} + +bool load_params(const char* path) { + std::ifstream f(path); + if (!f) { fprintf(stderr, "Cannot open %s\n", path); return false; } + std::string line; + while (std::getline(f, line)) { + size_t i = line.find('#'); + if (i != std::string::npos) line.resize(i); + double v; + if (sscanf(line.c_str(), " dt = %lf", &v) == 1) params.dt = v; + else if (sscanf(line.c_str(), " num_steps = %d", ¶ms.num_steps) == 1) {} + else if (sscanf(line.c_str(), " record_interval = %d", ¶ms.record_interval) == 1) {} + } + return true; +} + +// ----------------------------------------------------------------------------- +// CUDA kernels: Boris push (magnetic field only), then position update +// ----------------------------------------------------------------------------- +__device__ void cross(float ax, float ay, float az, float bx, float by, float bz, + float* ox, float* oy, float* oz) { + *ox = ay * bz - az * by; + *oy = az * bx - ax * bz; + *oz = ax * by - ay * bx; +} + +__global__ void integrate_kernel(float* pos, float* vel, const float* q, const float* m, + float Bx, float By, float Bz, float dt, int n) { + int i = blockIdx.x * blockDim.x + threadIdx.x; + if (i >= n) return; + float qm = q[i] / m[i]; + float half_dt = dt * 0.5f; + // t = (q/m) * B * (dt/2) + float tx = qm * Bx * half_dt; + float ty = qm * By * half_dt; + float tz = qm * Bz * half_dt; + + float vx = vel[i * 3 + 0], vy = vel[i * 3 + 1], vz = vel[i * 3 + 2]; + // v_prime = v + v x t + float vpx, vpy, vpz; + cross(vx, vy, vz, tx, ty, tz, &vpx, &vpy, &vpz); + vpx += vx; vpy += vy; vpz += vz; + // s = 2*t / (1 + t·t) + float t2 = tx*tx + ty*ty + tz*tz; + float sx = 2.f * tx / (1.f + t2); + float sy = 2.f * ty / (1.f + t2); + float sz = 2.f * tz / (1.f + t2); + // v_plus = v + v_prime x s + float vqx, vqy, vqz; + cross(vpx, vpy, vpz, sx, sy, sz, &vqx, &vqy, &vqz); + vx += vqx; vy += vqy; vz += vqz; + vel[i * 3 + 0] = vx; vel[i * 3 + 1] = vy; vel[i * 3 + 2] = vz; + // x_new = x + v_plus * dt + pos[i * 3 + 0] += vx * dt; + pos[i * 3 + 1] += vy * dt; + pos[i * 3 + 2] += vz * dt; +} + +// Trilinear interpolation: B at (x,y,z) from grid (nx,ny,nz), origin (ox,oy,oz), spacing (dx,dy,dz) +__device__ void sample_B_grid(const float* Bgrid, int nx, int ny, int nz, + float ox, float oy, float oz, float dx, float dy, float dz, + float x, float y, float z, float* Bx, float* By, float* Bz) { + float fx = (x - ox) / dx; + float fy = (y - oy) / dy; + float fz = (z - oz) / dz; + int i0 = (int)floorf(fx); int j0 = (int)floorf(fy); int k0 = (int)floorf(fz); + if (i0 < 0) i0 = 0; if (i0 > nx - 2) i0 = nx - 2; + if (j0 < 0) j0 = 0; if (j0 > ny - 2) j0 = ny - 2; + if (k0 < 0) k0 = 0; if (k0 > nz - 2) k0 = nz - 2; + int i1 = i0 + 1, j1 = j0 + 1, k1 = k0 + 1; + float wx = fx - i0, wy = fy - j0, wz = fz - k0; + auto idx = [nx, ny](int i, int j, int k) { return ((size_t)k * ny + j) * nx + i; }; + auto get = [Bgrid, idx, nx, ny, nz](int i, int j, int k, int c) { + return Bgrid[(idx(i, j, k) * 3) + c]; + }; + float b000_x = get(i0, j0, k0, 0), b000_y = get(i0, j0, k0, 1), b000_z = get(i0, j0, k0, 2); + float b100_x = get(i1, j0, k0, 0), b100_y = get(i1, j0, k0, 1), b100_z = get(i1, j0, k0, 2); + float b010_x = get(i0, j1, k0, 0), b010_y = get(i0, j1, k0, 1), b010_z = get(i0, j1, k0, 2); + float b110_x = get(i1, j1, k0, 0), b110_y = get(i1, j1, k0, 1), b110_z = get(i1, j1, k0, 2); + float b001_x = get(i0, j0, k1, 0), b001_y = get(i0, j0, k1, 1), b001_z = get(i0, j0, k1, 2); + float b101_x = get(i1, j0, k1, 0), b101_y = get(i1, j0, k1, 1), b101_z = get(i1, j0, k1, 2); + float b011_x = get(i0, j1, k1, 0), b011_y = get(i0, j1, k1, 1), b011_z = get(i0, j1, k1, 2); + float b111_x = get(i1, j1, k1, 0), b111_y = get(i1, j1, k1, 1), b111_z = get(i1, j1, k1, 2); + float omwz = 1.f - wz; + float c00_x = (1.f - wx) * b000_x + wx * b100_x, c00_y = (1.f - wx) * b000_y + wx * b100_y, c00_z = (1.f - wx) * b000_z + wx * b100_z; + float c10_x = (1.f - wx) * b010_x + wx * b110_x, c10_y = (1.f - wx) * b010_y + wx * b110_y, c10_z = (1.f - wx) * b010_z + wx * b110_z; + float c01_x = (1.f - wx) * b001_x + wx * b101_x, c01_y = (1.f - wx) * b001_y + wx * b101_y, c01_z = (1.f - wx) * b001_z + wx * b101_z; + float c11_x = (1.f - wx) * b011_x + wx * b111_x, c11_y = (1.f - wx) * b011_y + wx * b111_y, c11_z = (1.f - wx) * b011_z + wx * b111_z; + float c0_x = (1.f - wy) * c00_x + wy * c10_x, c0_y = (1.f - wy) * c00_y + wy * c10_y, c0_z = (1.f - wy) * c00_z + wy * c10_z; + float c1_x = (1.f - wy) * c01_x + wy * c11_x, c1_y = (1.f - wy) * c01_y + wy * c11_y, c1_z = (1.f - wy) * c01_z + wy * c11_z; + *Bx = omwz * c0_x + wz * c1_x; + *By = omwz * c0_y + wz * c1_y; + *Bz = omwz * c0_z + wz * c1_z; +} + +__global__ void integrate_kernel_grid(float* pos, float* vel, const float* q, const float* m, + const float* Bgrid, int nx, int ny, int nz, + float ox, float oy, float oz, float dx, float dy, float dz, + float dt, int n) { + int i = blockIdx.x * blockDim.x + threadIdx.x; + if (i >= n) return; + float Bx, By, Bz; + sample_B_grid(Bgrid, nx, ny, nz, ox, oy, oz, dx, dy, dz, + pos[i * 3 + 0], pos[i * 3 + 1], pos[i * 3 + 2], &Bx, &By, &Bz); + float qm = q[i] / m[i]; + float half_dt = dt * 0.5f; + float tx = qm * Bx * half_dt, ty = qm * By * half_dt, tz = qm * Bz * half_dt; + float vx = vel[i * 3 + 0], vy = vel[i * 3 + 1], vz = vel[i * 3 + 2]; + float vpx, vpy, vpz; + cross(vx, vy, vz, tx, ty, tz, &vpx, &vpy, &vpz); + vpx += vx; vpy += vy; vpz += vz; + float t2 = tx*tx + ty*ty + tz*tz; + float sx = 2.f * tx / (1.f + t2), sy = 2.f * ty / (1.f + t2), sz = 2.f * tz / (1.f + t2); + float vqx, vqy, vqz; + cross(vpx, vpy, vpz, sx, sy, sz, &vqx, &vqy, &vqz); + vx += vqx; vy += vqy; vz += vqz; + vel[i * 3 + 0] = vx; vel[i * 3 + 1] = vy; vel[i * 3 + 2] = vz; + pos[i * 3 + 0] += vx * dt; pos[i * 3 + 1] += vy * dt; pos[i * 3 + 2] += vz * dt; +} + +__global__ void record_kernel(const float* pos, float* traj, int n, int record_index, int rr) { + int i = blockIdx.x * blockDim.x + threadIdx.x; + if (i >= n) return; + int base = (i * rr + record_index) * 3; + traj[base + 0] = pos[i * 3 + 0]; + traj[base + 1] = pos[i * 3 + 1]; + traj[base + 2] = pos[i * 3 + 2]; +} + +// ----------------------------------------------------------------------------- +// Main +// ----------------------------------------------------------------------------- +int main(int argc, char** argv) { + const char* particles_path = (argc > 1) ? argv[1] : "data/particles.txt"; + const char* field_path = (argc > 2) ? argv[2] : "data/field.txt"; + const char* params_path = (argc > 3) ? argv[3] : "data/params.txt"; + const char* output_path = (argc > 4) ? argv[4] : "out.bin"; + + if (!load_particles(particles_path)) return 1; + if (!load_field(field_path)) return 1; + if (!load_params(params_path)) return 1; + + int num_steps = params.num_steps; + int record_interval = params.record_interval; + int rr = num_steps / record_interval; + if (rr < 1) rr = 1; + + printf("Particles: %d, steps: %d, record_interval: %d, record points: %d\n", + N, num_steps, record_interval, rr); + + // Device alloc + float *d_pos = nullptr, *d_vel = nullptr, *d_q = nullptr, *d_m = nullptr, *d_traj = nullptr; + float* d_Bgrid = nullptr; + cudaMalloc(&d_pos, N * 3 * sizeof(float)); + cudaMalloc(&d_vel, N * 3 * sizeof(float)); + cudaMalloc(&d_q, N * sizeof(float)); + cudaMalloc(&d_m, N * sizeof(float)); + cudaMalloc(&d_traj, (size_t)N * rr * 3 * sizeof(float)); + if (use_B_grid && !h_Bgrid.empty()) { + cudaMalloc(&d_Bgrid, h_Bgrid.size() * sizeof(float)); + cudaMemcpy(d_Bgrid, h_Bgrid.data(), h_Bgrid.size() * sizeof(float), cudaMemcpyHostToDevice); + } + + cudaMemcpy(d_pos, h_pos.data(), N * 3 * sizeof(float), cudaMemcpyHostToDevice); + cudaMemcpy(d_vel, h_vel.data(), N * 3 * sizeof(float), cudaMemcpyHostToDevice); + cudaMemcpy(d_q, h_q.data(), N * sizeof(float), cudaMemcpyHostToDevice); + cudaMemcpy(d_m, h_m.data(), N * sizeof(float), cudaMemcpyHostToDevice); + + int block = 256; + int grid = (N + block - 1) / block; + int record_count = 0; + + cudaEvent_t start_ev, stop_ev; + cudaEventCreate(&start_ev); + cudaEventCreate(&stop_ev); + + for (int step = 0; step < num_steps; step++) { + if (step % record_interval == 0) { + record_kernel<<>>(d_pos, d_traj, N, record_count, rr); + if (record_count == 0) { + cudaEventRecord(start_ev); // 第一步 record 完成后,即第一步积分 kernel 开始前 + } + if (step == (rr - 1) * record_interval) { + cudaEventRecord(stop_ev); // 最后一个 record 结束 + } + record_count++; + } + if (use_B_grid && d_Bgrid) { + integrate_kernel_grid<<>>(d_pos, d_vel, d_q, d_m, d_Bgrid, + g_nx, g_ny, g_nz, g_ox, g_oy, g_oz, g_dx, g_dy, g_dz, + (float)params.dt, N); + } else { + integrate_kernel<<>>(d_pos, d_vel, d_q, d_m, + h_B[0], h_B[1], h_B[2], (float)params.dt, N); + } + } + cudaEventSynchronize(stop_ev); + float gpu_ms = 0.f; + cudaEventElapsedTime(&gpu_ms, start_ev, stop_ev); + cudaEventDestroy(stop_ev); + cudaEventDestroy(start_ev); + + double sim_time_sec = gpu_ms * 1e-3; + // 计时区间为「第一步积分 kernel 开始 → 最后一个 record 结束」,区间内积分步数为 (rr-1)*record_interval + int steps_timed = (rr >= 1) ? ((rr - 1) * record_interval) : 0; + double particle_steps_per_sec = (steps_timed > 0 && sim_time_sec > 0) + ? (N * (double)steps_timed) / sim_time_sec + : (N * (double)num_steps) / sim_time_sec; + size_t trajectory_bytes = sizeof(int) * 2 + (size_t)N * rr * 3 * sizeof(float); + printf("Simulation GPU time: %.4f s\n", sim_time_sec); + printf("Particle-step rate: %.2e (particle·steps/s)\n", particle_steps_per_sec); + printf("Trajectory data size: %zu bytes (%.2f MB)\n", + trajectory_bytes, trajectory_bytes / (1024.0 * 1024.0)); + + // D2H and write binary + std::vector h_traj(N * rr * 3); + cudaMemcpy(h_traj.data(), d_traj, h_traj.size() * sizeof(float), cudaMemcpyDeviceToHost); + + FILE* out = fopen(output_path, "wb"); + if (out) { + int pp = N, rr_out = rr; + fwrite(&pp, sizeof(int), 1, out); + fwrite(&rr_out, sizeof(int), 1, out); + fwrite(h_traj.data(), sizeof(float), h_traj.size(), out); + fclose(out); + printf("Wrote %s (PP=%d RR=%d)\n", output_path, pp, rr_out); + } + + if (d_Bgrid) cudaFree(d_Bgrid); + cudaFree(d_traj); cudaFree(d_m); cudaFree(d_q); cudaFree(d_vel); cudaFree(d_pos); + return 0; +} diff --git a/09_particle_sim/nagadoyuji/verify_single_particle.py b/09_particle_sim/nagadoyuji/verify_single_particle.py new file mode 100644 index 0000000..8165e61 --- /dev/null +++ b/09_particle_sim/nagadoyuji/verify_single_particle.py @@ -0,0 +1,81 @@ +#!/usr/bin/env python3 +""" +Correctness check: single particle in uniform B should move in a helix. +Theory: omega = |q|*B/m, r_gyro = v_perp / omega. +Load trajectory from binary, compute numerical radius, compare to theory. +""" +import sys +import numpy as np + +def load_trajectory(path): + with open(path, "rb") as f: + pp = np.frombuffer(f.read(4), dtype=np.int32)[0] + rr = np.frombuffer(f.read(4), dtype=np.int32)[0] + data = np.fromfile(f, dtype=np.float32, count=pp * rr * 3) + return data.reshape(pp, rr, 3) + +def load_particle_and_field(particles_path, field_path): + with open(particles_path) as f: + line = f.readline() + parts = line.split() + x, y, z = float(parts[0]), float(parts[1]), float(parts[2]) + vx, vy, vz = float(parts[3]), float(parts[4]), float(parts[5]) + q, m = float(parts[6]), float(parts[7]) + with open(field_path) as f: + line = f.readline() + Bx, By, Bz = [float(x) for x in line.split()] + return (x, y, z), (vx, vy, vz), q, m, (Bx, By, Bz) + +def main(): + traj_path = sys.argv[1] if len(sys.argv) > 1 else "out.bin" + particles_path = sys.argv[2] if len(sys.argv) > 2 else "data/particles.txt" + field_path = sys.argv[3] if len(sys.argv) > 3 else "data/field.txt" + + data = load_trajectory(traj_path) + if data.shape[0] != 1: + print("Expected single-particle trajectory; got", data.shape[0], "particles") + sys.exit(1) + pos = data[0] + rr = pos.shape[0] + + (x0, y0, z0), (vx, vy, vz), q, m, (Bx, By, Bz) = load_particle_and_field( + particles_path, field_path + ) + B = np.sqrt(Bx*Bx + By*By + Bz*Bz) + if B < 1e-30: + print("B is zero, no gyration") + sys.exit(0) + v_parallel = (vx*Bx + vy*By + vz*Bz) / B + v_perp = np.sqrt(max(0, vx*vx + vy*vy + vz*vz - v_parallel*v_parallel)) + omega_theory = abs(q) * B / m + r_gyro_theory = v_perp / omega_theory if omega_theory > 1e-30 else 0.0 + + cx, cy, cz = pos[:, 0].mean(), pos[:, 1].mean(), pos[:, 2].mean() + dx = pos[:, 0] - cx + dy = pos[:, 1] - cy + dz = pos[:, 2] - cz + dot = (dx*Bx + dy*By + dz*Bz) / (B * B) + px = dx - dot * Bx + py = dy - dot * By + pz = dz - dot * Bz + r_numerical = np.sqrt(px*px + py*py + pz*pz).mean() + T_gyro = 2 * np.pi / omega_theory + dt_max = T_gyro / 20.0 + + print("--- Single particle in uniform B ---") + print("Theory: omega = |q|*B/m =", omega_theory, "rad/s") + print(" T_gyro = 2*pi/omega =", T_gyro, "s") + print(" r_gyro = v_perp/omega =", r_gyro_theory, "m") + print("Numerical mean gyro radius (in plane):", r_numerical, "m") + err_r = abs(r_numerical - r_gyro_theory) / (r_gyro_theory + 1e-30) + print("Relative error in radius:", err_r) + if err_r < 0.2: + print("PASS: radius within 20% of theory") + else: + print("CHECK: radius error large.") + print(" Use dt < %.2e s (e.g. params_verify.txt)." % dt_max) + print(" Re-run: ./particle_sim data/particles_1.txt data/field.txt data/params_verify.txt out.bin") + print(" Then: python verify_single_particle.py out.bin data/particles_1.txt data/field.txt") + +if __name__ == "__main__": + main() diff --git a/09_particle_sim/nagadoyuji/visualize.py b/09_particle_sim/nagadoyuji/visualize.py new file mode 100644 index 0000000..2298c1c --- /dev/null +++ b/09_particle_sim/nagadoyuji/visualize.py @@ -0,0 +1,82 @@ +#!/usr/bin/env python3 +""" +Read binary trajectory file from particle_sim and plot 3D animated trajectories. +Usage: python visualize.py [--tail N] [--every K] + --tail N: draw trail of last N frames per particle (default 20) + --every K: show only every K-th particle to reduce clutter (default 1) +""" +import sys +import argparse +import numpy as np +import matplotlib.pyplot as plt +from mpl_toolkits.mplot3d import Axes3D +from matplotlib import animation + +def load_trajectory(path): + with open(path, "rb") as f: + pp = np.frombuffer(f.read(4), dtype=np.int32)[0] + rr = np.frombuffer(f.read(4), dtype=np.int32)[0] + data = np.fromfile(f, dtype=np.float32, count=pp * rr * 3) + data = data.reshape(pp, rr, 3) + return data # shape (num_particles, num_records, 3) + +def main(): + ap = argparse.ArgumentParser() + ap.add_argument("traj_file", nargs="?", default="out.bin", help="Binary trajectory file") + ap.add_argument("--tail", type=int, default=20, help="Trail length (frames)") + ap.add_argument("--every", type=int, default=1, help="Plot every K-th particle") + args = ap.parse_args() + + data = load_trajectory(args.traj_file) + pp, rr, _ = data.shape + print(f"Particles: {pp}, Record points: {rr}") + + indices = np.arange(0, pp, args.every) + data_plot = data[indices] + n_show = len(indices) + + fig = plt.figure(figsize=(10, 8)) + ax = fig.add_subplot(111, projection="3d") + + tail = min(args.tail, rr) + pts = ax.scatter([], [], [], c="b", s=5, alpha=0.8) + lines = [ax.plot([], [], [], "b-", alpha=0.4, linewidth=0.5)[0] for _ in range(n_show)] + + def bounds(): + mn = data_plot.min(axis=(0, 1)) + mx = data_plot.max(axis=(0, 1)) + margin = max((mx - mn).max() * 0.1, 1e-10) + return mn - margin, mx + margin + + mn, mx = bounds() + ax.set_xlim(mn[0], mx[0]) + ax.set_ylim(mn[1], mx[1]) + ax.set_zlim(mn[2], mx[2]) + ax.set_xlabel("x (m)") + ax.set_ylabel("y (m)") + ax.set_zlabel("z (m)") + ax.set_title("Particle trajectories (3D)") + + def animate(frame): + f = frame % rr + x = data_plot[:, f, 0] + y = data_plot[:, f, 1] + z = data_plot[:, f, 2] + pts._offsets3d = (x, y, z) + start = max(0, f - tail) + for i in range(n_show): + lines[i].set_data( + data_plot[i, start : f + 1, 0], + data_plot[i, start : f + 1, 1], + ) + lines[i].set_3d_properties(data_plot[i, start : f + 1, 2]) + return [pts] + lines + + anim = animation.FuncAnimation( + fig, animate, frames=rr, interval=50, blit=False, repeat=True + ) + plt.show() + return 0 + +if __name__ == "__main__": + sys.exit(main()) diff --git "a/09_particle_sim/nagadoyuji/\346\200\273\347\273\223\346\212\245\345\221\212.md" "b/09_particle_sim/nagadoyuji/\346\200\273\347\273\223\346\212\245\345\221\212.md" new file mode 100644 index 0000000..0f4cfbf --- /dev/null +++ "b/09_particle_sim/nagadoyuji/\346\200\273\347\273\223\346\212\245\345\221\212.md" @@ -0,0 +1,196 @@ +# 粒子运动 CUDA 模拟 — 总结报告 + +**选题**:09_particle_sim +**提交 ID**:nagadoyuji + +--- + +## 一、实现思路 + +### 1.1 整体流程 + +程序完成「输入解析 → GPU 积分 → 按间隔记录轨迹 → 回传主机并写二进制 → Python 可视化」的完整管线。 + +- **主机端**:读取粒子初始状态(x,y,z,vx,vy,vz,q,m)、磁场配置(均匀 B 或 3D 网格路径)、模拟参数(dt, num_steps, record_interval);分配主机与设备内存,H2D 拷贝后进入时间步循环。 +- **设备端**:每个时间步先按需将当前粒子位置写入轨迹缓冲(record_kernel),再调用积分 kernel(integrate_kernel 或 integrate_kernel_grid)更新速度与位置。 +- **输出**:将轨迹 D2H 后按约定格式写二进制(PP、RR 两个 int,再 PP×RR×3 个 float,按粒子顺序存储),供 Python 读取并做 3D 动画。 + +### 1.2 物理与数值方法 + +- **运动方程**:仅考虑磁场洛伦兹力,\( \vec{F} = q\vec{v}\times\vec{B} \),即 \( d\vec{v}/dt = (q/m)\vec{v}\times\vec{B} \),\( d\vec{x}/dt = \vec{v} \)。 +- **数值积分**:采用 **Boris 推进**(仅磁场项)。每步先计算 \( \vec{t} = (q/m)\vec{B}(dt/2) \),再 \( \vec{v}' = \vec{v} + \vec{v}\times\vec{t} \),\( \vec{s} = 2\vec{t}/(1+|\vec{t}|^2) \),\( \vec{v}^+ = \vec{v} + \vec{v}'\times\vec{s} \),最后用 \( \vec{v}^+ \) 更新位置。Boris 在纯磁场下保持动能,数值稳定。 +- **均匀场**:B 为常量,直接传入 kernel。 +- **3D 磁场网格**:从二进制文件读入 nx×ny×nz 格点上的 B,在 kernel 内根据粒子位置做**三线性插值**得到 \( \vec{B}(\vec{x}) \),再代入同一套 Boris 公式。 + +### 1.3 数据结构与并行 + +- **粒子数据**:SoA(Structure of Arrays)。主机与设备上均为 `pos[3*N]`、`vel[3*N]`、`q[N]`、`m[N]`,便于连续访问。 +- **轨迹缓冲**:设备上 `traj[PP * RR * 3]`,布局为「粒子 0 的 RR 个点 (x,y,z),再粒子 1…」。 +- **并行策略**:一个 thread 负责一个粒子;block 大小 256,grid 覆盖 N 个粒子。积分与记录 kernel 均为无分支的逐粒子独立计算,利于占用与带宽。 + +### 1.4 输入输出约定 + +- **粒子文件**:每行 8 个数 `x y z vx vy vz q m`(米、米/秒、库仑、千克)。 +- **磁场文件**:一行 `Bx By Bz`,或首行 `grid` + 第二行网格文件路径。 +- **参数文件**:`dt = ...`、`num_steps = ...`、`record_interval = ...`,支持 `#` 注释。 +- **轨迹二进制**:2 个 int(PP, RR)+ PP×RR×3 个 float(按粒子顺序),与题目要求一致。 + +--- + +## 二、优化方法与实践 + +### 2.1 已采用的实现选择 + +- **Boris 算法**:在保证精度的前提下避免显式欧拉带来的能量漂移,适合长时间积分。 +- **float 统一**:位置、速度、磁场、轨迹全程 float,减少带宽与存储,与 Python 端读取一致。 +- **记录策略**:在每步积分**之前**记录,使第 0 个记录点为初始状态;仅当 `step % record_interval == 0` 时写轨迹,减少 kernel 启动与写量。 +- **3D 网格**:三线性插值在设备端一次完成,按 (i,j,k) 线性索引访问,边界外 clamp 到网格内,避免越界。 + +### 2.2 可进一步尝试的优化(未实现) + +- **流式与重叠**:将轨迹 D2H 与下一段积分用 CUDA stream 重叠,或对轨迹分块 D2H 与写文件。 +- **Pinned 内存**:轨迹回传若使用 pinned host 内存,可提高 D2H 带宽。 +- **Block 大小**:可对具体 GPU 试 128/256/512 比较占用与延迟。 +- **双精度**:若需更高精度可对关键量改用 double(需与二进制格式约定一致)。 + +--- + +## 三、开发中的问题与解决 + +1. **单粒子验证半径误差大** + - 现象:用默认 `params.txt`(dt=1e-10)跑单粒子时,数值回旋半径远大于理论。 + - 原因:电子在 1 T 下回旋周期约 3.6e-11 s,dt=1e-10 时每周期不足 1 步,积分失真。 + - 解决:提供 `params_verify.txt`(dt=1e-12),验证脚本在误差大时提示改用该参数并给出推荐 dt 上界。 + +2. **两粒子可视化“近似不动”** + - 原因:两粒子初始 y 相距 1 m,而回旋半径约 5.7 μm,坐标轴被 1 m 主导,圆运动在图上几乎看不出。 + - 解决:在使用说明中建议将两粒子初始位置靠近(如同处 μm 量级)或缩小可视化范围以便观察。 + +3. **3D 网格边界** + - 粒子跑出网格时采用 clamp 到最近网格内单元,保证插值不越界;若需更物理的行为可改为外推或周期边界并在报告中说明。 + +--- + +## 四、性能指标与分析 + +### 4.1 测量方式 + +- 使用 `cudaEvent` 记录**从第一步积分 kernel 开始到最后一个 record 结束**的 GPU 时间(不含 H2D 初始拷贝与最后 D2H/写文件)。 + - `start_ev` 在第一个 `record_kernel` 完成后、第一个 `integrate_kernel` 启动前记录; + - `stop_ev` 在最后一个 `record_kernel` 完成后记录。 + 因此计时区间内的积分步数为 \( (RR-1) \times \texttt{record\_interval} \)。 +- 粒子·步/秒 = \( N \times \texttt{计时区间内积分步数} / \texttt{GPU时间(s)} \)(与程序打印一致)。 +- 轨迹数据量:\( 8 + PP \times RR \times 3 \times 4 \) 字节。 + +### 4.2 典型结果 + +在 **NVIDIA GPU:_________**,**CUDA 版本:_________** 下运行后,将程序打印的三项填入下表: + +| 规模 (N, steps, record_interval) | GPU 时间 (s) | 粒子·步/秒 | 轨迹大小 (MB) | +|----------------------------------|--------------|------------|----------------| +| 1000, 10000, 100 | (见下方命令运行后填) | (同上) | ≈1.2 | +| 10000, 10000, 100 | (同上) | (同上) | ≈12 | + +**运行命令示例**(在 `nagadoyuji` 目录下): + +```bash +make +python generate_particles.py -n 1000 -o data/particles_1k.txt +./particle_sim data/particles_1k.txt data/field.txt data/params.txt out.bin + +python generate_particles.py -n 10000 -o data/particles_10k.txt +./particle_sim data/particles_10k.txt data/field.txt data/params.txt out.bin +``` + +程序会打印:`Simulation GPU time`、`Particle-step rate`、`Trajectory data size`,将前两项及轨迹大小(或按公式 \(8 + PP\times RR\times 12\) 字节)填入上表即可。 + +### 4.3 简要分析 + +- 主要耗时在积分与记录 kernel 的多次启动及全局内存访问;粒子数较大时 GPU 利用率更高,粒子·步/秒随 N 增大而提升。 +- 与纯 CPU 单线程逐粒子 Boris 相比,CUDA 并行在 N 较大时预期有数量级以上的加速;CPU 端可用相同算法与数据做对照计时以写进报告。 + +--- + +## 五、未来可提升的地方 + +- **性能**:引入 CUDA stream、pinned 内存、多 GPU 按粒子划分等。 +- **物理**:支持电场、粒子间作用、更复杂边界与磁场外推策略。 +- **工具**:使用 **NVIDIA Nsight Systems (nsys)** 做时间线分析,**NVIDIA Nsight Compute (ncu)** 做 kernel 级指标(占用率、内存吞吐等),将截图或关键指标写入本报告可加分。 +- **正确性**:增加回旋频率的数值验证(如 FFT 测频)与多粒子守恒量检查。 + +--- + +## 六、ncu / nsys 使用与分析 + +### 6.1 使用方式 + +**Nsight Systems (nsys)**:统计各 kernel 与 API 耗时占比,适合看整体时间线。 + +- Linux/WSL: + ```bash + cd 09_particle_sim/nagadoyuji + make && python generate_particles.py -n 1000 -o data/particles_1k.txt + nsys profile --stats=true ./particle_sim data/particles_1k.txt data/field.txt data/params.txt out.bin + ``` +- Windows(若已安装 Nsight Systems 并加入 PATH): + ```cmd + nsys profile --stats=true particle_sim.exe data/particles_1k.txt data/field.txt data/params.txt out.bin + ``` +- 输出:生成 `report.nsys-rep` 及终端汇总表。关注 **CUDA Kernel** 中 `integrate_kernel`、`record_kernel` 的 **Total Time** 占比。 + +**Nsight Compute (ncu)**:单 kernel 级指标(SM 占用率、内存吞吐、warp 执行效率等)。 + +- Linux/WSL: + ```bash + ncu --set roofline -o ncu_integrate ./particle_sim data/particles_1k.txt data/field.txt data/params.txt out.bin + # 或仅 basic:ncu --set basic -o ncu_basic ./particle_sim ... + ``` +- Windows: + ```cmd + ncu --set basic -o ncu_report particle_sim.exe data/particles_1k.txt data/field.txt data/params.txt out.bin + ``` +- 若出现 **ERR_NVGPUCTRPERM**:表示当前用户无权限访问 GPU 性能计数器。可 (1) 在 Linux 下运行 ncu;(2) 或以管理员身份运行;(3) 或参考 [NVIDIA ERR_NVGPUCTRPERM](https://developer.nvidia.com/ERR_NVGPUCTRPERM) 配置驱动/权限后再测。 + +### 6.2 关注指标 + +| 指标 | 含义 | 本程序预期 | +|------|------|------------| +| **SM Occupancy** | 每 SM 活跃 warp 与理论最大之比 | integrate_kernel 为逐粒子计算,N=1000 时 grid 较小,占用率中等;N 增大可提高占用率。 | +| **Memory Throughput** | 显存读写带宽 (GB/s) | 受 pos/vel/q/m 的全局访问主导,与粒子数、步数成正比。 | +| **Kernel Time** | 单次 launch 耗时 | record_kernel 仅写 3×float,integrate_kernel 为 Boris 运算+读写,后者应占绝大部分时间。 | +| **Roofline** | 算力/带宽瓶颈 | 本 kernel 为轻量运算+多次全局 load/store,通常属**带宽受限**。 | + +### 6.3 nsys 报告摘要示例 + +运行 `nsys profile --stats=true` 后,终端会输出类似下表(具体数值以本机为准): + +``` + Time(%) Total Time (ns) Instances Avg (ns) Name + ------- ---------------- --------- ---------- ---- + xx.x xxxxxxxxxxx 100 xxxxxxx integrate_kernel + x.x xxxxxxxxxx 100 xxxxxx record_kernel + ... +``` + +**结论**:integrate_kernel 应占据绝大部分 GPU 时间;record_kernel 因写入量小、调用次数与 record_interval 相关,占比相对较低。若需进一步优化,可优先考虑减少全局内存访问或提高 integrate_kernel 的占用率(如增大 block 或合并访问)。 + +### 6.4 ncu 报告摘要示例 + +在具备 GPU 计数器权限的环境下运行 `ncu --set basic` 或 `--set roofline` 后,用 `ncu -i ncu_report.ncu-rep` 查看或导出 CSV。典型关注列: + +- **Occupancy**:SM 占用率。 +- **DRAM Throughput**:显存带宽。 +- **Compute Throughput**:算力利用率。 + +**结论**:本程序 kernel 为内存访问密集型;在 N=1000、10000 步量级下,主要瓶颈在全局内存带宽与 kernel launch 开销。增大粒子数 N 可提高单次 kernel 的有效计算密度与占用率。 + +--- + +## 七、附录:文件清单与运行清单 + +- **代码**:main.cu, Makefile, .clang-format +- **脚本**:visualize.py, verify_single_particle.py, generate_particles.py, generate_B_grid.py +- **数据**:data/particles.txt, particles_1.txt, field.txt, params.txt, params_verify.txt +- **文档**:README.md, 总结报告.md + +**验证与可视化命令**(见 README.md)。 diff --git "a/09_particle_sim/nagadoyuji/\346\217\220\344\272\244\350\257\264\346\230\216.txt" "b/09_particle_sim/nagadoyuji/\346\217\220\344\272\244\350\257\264\346\230\216.txt" new file mode 100644 index 0000000..d526e1d --- /dev/null +++ "b/09_particle_sim/nagadoyuji/\346\217\220\344\272\244\350\257\264\346\230\216.txt" @@ -0,0 +1,6 @@ +提交要求(Learning-CUDA 仓库): +1. Fork Learning-CUDA 的 2025-winter-project 分支(若仓库以该分支为准,否则按项目文档使用 project 等分支)。 +2. 将本目录(09_particle_sim/nagadoyuji/)下的全部内容放入你 fork 的仓库对应路径:/<选题>/<你的ID>/ 即 09_particle_sim/nagadoyuji/。 +3. 通过 PR 提交。 + +本目录已包含:完整程序源码、示例数据、可视化与验证脚本、README、总结报告(总结报告.md)。无临时测试代码,命名与格式已统一,关键处已加注释。代码格式化可执行 make format(需安装 clang-format)。