When using the LUCI Go CAS client (cas.exe) to archive .isolate files, all resources appear to be uploaded successfully to the CAS server. However, the uploaded files on the server side are incomplete (missing data).
After investigating, the issue seems to originate from the NativeLink CAS server implementation, specifically in:
nativelink-store/src/filesystem_store.rs
Observed flow:
update_oneshot()
→ write temp file ✅
→ sync ✅
→ drop file ✅
→ call emplace_file()
→ spawn background task (rename not finished yet ❗)
→ return OK 🚨
Meanwhile, on the client side:
cas.exe
→ starts downloading immediately
→ calls get_part_unchunked()
Problem:
There is a race condition between upload completion and file availability:
update_oneshot() returns OK before the file is fully moved (renamed) to its final location.
The emplace_file() function spawns a background task to perform the rename, but does not wait for it to complete.
The client begins downloading immediately after receiving success.
At this point, the file may still be at its temporary path because the rename has not finished.
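The window described above can be sketched with the Rust standard library alone. This is not NativeLink's code: `racy_update_oneshot` and `read_final` are hypothetical names, a plain thread stands in for the spawned background task, and a channel holds the rename back so the gap is reproducible.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the racy flow: write and sync the temp file,
/// hand the rename to a background thread, and return Ok right away.
/// The returned `Sender` controls when the rename actually runs, which makes
/// the gap between "OK" and "file visible" deterministic in this sketch.
fn racy_update_oneshot(
    temp: &Path,
    final_path: &Path,
    data: &[u8],
) -> io::Result<(Sender<()>, JoinHandle<io::Result<()>>)> {
    let mut file = fs::File::create(temp)?;
    file.write_all(data)?;
    file.sync_all()?;
    drop(file);

    let (release, gate) = mpsc::channel::<()>();
    let (from, to) = (temp.to_path_buf(), final_path.to_path_buf());
    let handle = thread::spawn(move || {
        // Assumption: models a background task that has not been scheduled yet.
        let _ = gate.recv();
        fs::rename(from, to)
    });

    // Success is reported here, before the rename has happened.
    Ok((release, handle))
}

/// What a reader finds if it opens the final path right after "OK".
fn read_final(final_path: &Path) -> io::Result<Vec<u8>> {
    fs::read(final_path)
}
```

In this sketch a reader that calls `read_final` before the gate is released gets an error, because nothing exists at the final path yet. How the real server surfaces the same window (missing entry, truncated data, or something else) may differ.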
Result:
Clients may read incomplete or partially written files from the CAS server.
Expected behavior:
The server should return success only after the file has been:
fully written,
synced to disk,
and atomically moved (rename completed) to its final location.
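The ordering above, as a minimal sketch using only the Rust standard library. `commit_then_ack` is a hypothetical name, not a function from filesystem_store.rs, and the sketch ignores the store's eviction and bookkeeping logic.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: returns Ok only after all three steps have completed,
/// so the caller can acknowledge the upload knowing the final path is readable.
fn commit_then_ack(temp: &Path, final_path: &Path, data: &[u8]) -> io::Result<()> {
    // 1. Write the full payload to the temporary file.
    let mut file = fs::File::create(temp)?;
    file.write_all(data)?;

    // 2. Flush file contents and metadata to disk.
    file.sync_all()?;
    drop(file);

    // 3. Move the file into place. On the same filesystem this replaces the
    //    destination in one step, so readers see either no file or the whole file.
    fs::rename(temp, final_path)?;

    Ok(())
}
```

Because the rename runs on the caller's path, any rename failure also reaches the caller as an error instead of being lost in a detached task.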
Suggested fix:
Ensure the rename operation in emplace_file() has completed before success is returned,
or
introduce a mechanism that guarantees file visibility/consistency before reads are allowed.
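For the second option, one possible shape is a registry of final paths whose rename is still in flight, which readers consult before opening a file. The sketch below is an assumption about how such a gate could look, not an existing NativeLink API: `PendingRenames` and its methods are hypothetical, and it uses blocking standard-library primitives where an async server would use its runtime's equivalents.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::sync::{Condvar, Mutex};

/// Hypothetical registry of final paths whose rename has not completed yet.
#[derive(Default)]
struct PendingRenames {
    paths: Mutex<HashSet<PathBuf>>,
    changed: Condvar,
}

impl PendingRenames {
    /// Writer side: call before handing the rename to a background task.
    fn mark_pending(&self, path: &Path) {
        self.paths.lock().unwrap().insert(path.to_path_buf());
    }

    /// Background task: call once the rename has completed.
    fn mark_done(&self, path: &Path) {
        self.paths.lock().unwrap().remove(path);
        // Wake every reader so each can re-check its own path.
        self.changed.notify_all();
    }

    /// Reader side: blocks until no rename is pending for `path`.
    fn wait_until_visible(&self, path: &Path) {
        let mut guard = self.paths.lock().unwrap();
        while guard.contains(path) {
            guard = self.changed.wait(guard).unwrap();
        }
    }
}
```

A reader would call `wait_until_visible` on the final path before opening it. This keeps the rename off the upload's critical path at the cost of extra shared state, so completing the rename before returning success is likely the simpler fix.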
Environment:
LUCI Go CAS client (cas.exe)
NativeLink CAS server
Let me know if more logs or a minimal reproduction are needed.