Implement the semi-space copying garbage collector#13107
Implement the semi-space copying garbage collector#13107fitzgen merged 14 commits intobytecodealliance:mainfrom
Conversation
Subscribe to Label Actioncc @fitzgen DetailsThis issue or pull request has been labeled: "fuzzing", "wasmtime:api", "wasmtime:ref-types"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
alexcrichton
left a comment
There was a problem hiding this comment.
Overall I think this looks fantastic, thanks so much!
I've got some minor comments below, but I'm also happy to defer anything to issues as you see fit. My rough assumption as well is that long-term we'll want nurseries as well, right? If that's the case would it make sense to document that in an issue as an open future work item?
|
Oh, also, to kick the tire some more I might recommend locally making the copying collector the default (instead of drc) and running the full test suite. Iunno if that'll turn up much but might be good as a first pass |
Not totally clear to me right now. A nursery probably isn't worth it for our typical short-running programs, since we can mostly just avoid collecting in those cases, and would only slow down wasm execution via the write barriers it would require. (Also, our amortized GC heap growth algorithm effectively means we have N nursery collections on the way to the full heap size, if you squint hard.) But someone with long-running wasm instance use cases may indeed want a nursery. But then we also wouldn't want to force a nursery on the short-running use cases that don't need it. Seems like the kind of bridge we can cross when we get to it. I wouldn't go so far as to say that we should expect it to happen with enough certainty that a tracking issue is required at this time. |
|
Adding this to the merge queue now, but we can definitely continue with some of these other bits in follow ups. |
Passive element segments were registered as GC roots by converting a `*mut ValRaw` to a `*mut VMGcRef`. Because we always store GC refs as little endian inside `ValRaw`, regardless of the target architecture's endianness, this cast is only valid on little endian systems. This bug was not exposed until the introduction of the copying collector in bytecodealliance#13107 The fix that this commit makes is to add another kind of GC root for `ValRaw`, where we can get and set the GC root using `ValRaw` internally, to ensure that the endianness (and incidentally also the GC ref offsets within the `ValRaw`) are matched up correctly between the GC root's definition and the collector's use of it. This brings us to three kinds of GC roots: Wasm stack roots, `VMGcRef` roots, and `ValRaw` roots. FWIW, I initially tried to make `VMGcRef` also always store its data as little endian, but this was a larger, more-invasive change and with feedback like bytecodealliance#13193 (comment) suggesting the use of `[u8; 4]` instead of `u32` to make the byte ordering explicit, we break `rustc`'s niche type optimizations (since `VMGcRef` is non-zero right now). I also investigated making `PassiveElementSegment` an `enum` or either funcrefs or externrefs, similar to what we do for `wasmtime::runtime::vm::Table`. This also led to an outsized amount of code churn and didn't feel like it was paying for itself. Ultimately, I abandoned these approaches, preferring the one taken in this commit instead.
FWIW, I just spun out this issue with some forward thinking ideas/complications related to generational GC and our desire to generally only have O(1)-sized host allocations outside the GC heap: #13231 |
The `rewrite_use` method of the safepoint spiller was not checking for value aliases, and therefore some uses of needs-stack-map values would not be reloaded from their associated stack slot. Note that, because the liveness analysis *does* correctly analyze alias values and will always correctly spill them at safepoints, this could not result in any bug with non-moving collectors (where reloading after safepoints is unnecessary), like those that Wasmtime has today. However, with the introduction of a moving collector in bytecodealliance#13107, this lack-of-reload bug in the safepoint spiller does trigger bugs in Wasmtime (and, reassuringly, our testing and fuzzing infrastructure finds it ~immediately). Uses of a non-reloaded GC reference are stale because the object they previously pointed to moved but the non-reloaded GC reference was not updated to point to the object's new location, resulting in use-after-move bugs. The fix for the safepoint spiller is simple: resolve aliases before rewriting uses. This commit additionally sprinkles some debug assertions throughout all the safepoint spiller code to double check that we have already resolved aliases and are no longer dealing with alias values in, e.g., our current set of live values in the liveness analysis.
ce9c44e to
896b0b0
Compare
|
Rebased on top of Only new commit outside of that is 896b0b0, which just introduces a few additional tests. This should be good to land (after another rebase) once those bug fixes land. |
) * Fix passive element segment GC roots' endianness Passive element segments were registered as GC roots by converting a `*mut ValRaw` to a `*mut VMGcRef`. Because we always store GC refs as little endian inside `ValRaw`, regardless of the target architecture's endianness, this cast is only valid on little endian systems. This bug was not exposed until the introduction of the copying collector in bytecodealliance#13107 The fix that this commit makes is to add another kind of GC root for `ValRaw`, where we can get and set the GC root using `ValRaw` internally, to ensure that the endianness (and incidentally also the GC ref offsets within the `ValRaw`) are matched up correctly between the GC root's definition and the collector's use of it. This brings us to three kinds of GC roots: Wasm stack roots, `VMGcRef` roots, and `ValRaw` roots. FWIW, I initially tried to make `VMGcRef` also always store its data as little endian, but this was a larger, more-invasive change and with feedback like bytecodealliance#13193 (comment) suggesting the use of `[u8; 4]` instead of `u32` to make the byte ordering explicit, we break `rustc`'s niche type optimizations (since `VMGcRef` is non-zero right now). I also investigated making `PassiveElementSegment` an `enum` or either funcrefs or externrefs, similar to what we do for `wasmtime::runtime::vm::Table`. This also led to an outsized amount of code churn and didn't feel like it was paying for itself. Ultimately, I abandoned these approaches, preferring the one taken in this commit instead. * fix compilation of no-gc builds * review feedback
The `rewrite_use` method of the safepoint spiller was not checking for value aliases, and therefore some uses of needs-stack-map values would not be reloaded from their associated stack slot. Note that, because the liveness analysis *does* correctly analyze alias values and will always correctly spill them at safepoints, this could not result in any bug with non-moving collectors (where reloading after safepoints is unnecessary), like those that Wasmtime has today. However, with the introduction of a moving collector in bytecodealliance#13107, this lack-of-reload bug in the safepoint spiller does trigger bugs in Wasmtime (and, reassuringly, our testing and fuzzing infrastructure finds it ~immediately). Uses of a non-reloaded GC reference are stale because the object they previously pointed to moved but the non-reloaded GC reference was not updated to point to the object's new location, resulting in use-after-move bugs. The fix for the safepoint spiller is simple: resolve aliases before rewriting uses. This commit additionally sprinkles some debug assertions throughout all the safepoint spiller code to double check that we have already resolved aliases and are no longer dealing with alias values in, e.g., our current set of live values in the liveness analysis.
896b0b0 to
720b100
Compare
…nce#13228) The `rewrite_use` method of the safepoint spiller was not checking for value aliases, and therefore some uses of needs-stack-map values would not be reloaded from their associated stack slot. Note that, because the liveness analysis *does* correctly analyze alias values and will always correctly spill them at safepoints, this could not result in any bug with non-moving collectors (where reloading after safepoints is unnecessary), like those that Wasmtime has today. However, with the introduction of a moving collector in bytecodealliance#13107, this lack-of-reload bug in the safepoint spiller does trigger bugs in Wasmtime (and, reassuringly, our testing and fuzzing infrastructure finds it ~immediately). Uses of a non-reloaded GC reference are stale because the object they previously pointed to moved but the non-reloaded GC reference was not updated to point to the object's new location, resulting in use-after-move bugs. The fix for the safepoint spiller is simple: resolve aliases before rewriting uses. This commit additionally sprinkles some debug assertions throughout all the safepoint spiller code to double check that we have already resolved aliases and are no longer dealing with alias values in, e.g., our current set of live values in the liveness analysis.
This is a classic semi-space copying collector that uses bump allocation and Cheney-style worklists to avoid an explicit stack for grey (relocated but not yet scanned) objects outside the GC heap. Forwarding "pointers" (really `VMGcRef` indices) are stored inline in objects in the old semi-space, which again avoids explicit data structures outside of the GC heap. Furthermore, an intrusive linked-list of all `externref`s is maintained to allow for efficiently sweeping their associated host data after collection (and, once again, avoiding additional data structures outside the GC heap). Allocation in compiled Wasm code always happens by calling out to the `gc_alloc_raw` libcall currently. This is expected to become inline bump allocation, very similar to the null collector, in a follow-up commit shortly after this one. It is delayed to make review easier. Collection is not incremental (in the Wasmtime sense, not the GC literature sense) yet and the `CopyingCollection::collect_increment` implementation only has a single increment. This is delayed to make review easier and is also expected to come shortly in follow-up commits. I've also added new disas tests for the copying collector and also enabled running wast tests which need a GC heap with the copying collector. Finally, fuzz config generation can now generate configurations that enable the copying collector.
a85b205 to
2064150
Compare
This is a classic semi-space copying collector that uses bump allocation and Cheney-style worklists to avoid an explicit stack for grey (relocated but not yet scanned) objects outside the GC heap. Forwarding "pointers" (really
VMGcRefindices) are stored inline in objects in the old semi-space, which again avoids explicit data structures outside of the GC heap. Furthermore, an intrusive linked-list of allexternrefs is maintained to allow for efficiently sweeping their associated host data after collection (and, once again, avoiding additional data structures outside the GC heap).Allocation in compiled Wasm code always happens by calling out to the
gc_alloc_rawlibcall currently. This is expected to become inline bump allocation, very similar to the null collector, in a follow-up commit shortly after this one. It is delayed to make review easier.Collection is not incremental (in the Wasmtime sense, not the GC literature sense) yet and the
CopyingCollection::collect_incrementimplementation only has a single increment. This is delayed to make review easier and is also expected to come shortly in follow-up commits.I've also added new disas tests for the copying collector and also enabled running wast tests which need a GC heap with the copying collector.
Finally, fuzz config generation can now generate configurations that enable the copying collector.
Fixes #10329