2 changes: 1 addition & 1 deletion ci/sembr/src/main.rs
Expand Up @@ -15,7 +15,7 @@ struct Cli {
/// Modify files that do not comply
overwrite: bool,
/// Applies to lines that are to be split
#[arg(long, default_value_t = 100)]
#[arg(long, default_value_t = 80)]
line_length_limit: usize,
}

Expand Down
128 changes: 72 additions & 56 deletions src/borrow-check/region-inference/member-constraints.md
@@ -1,17 +1,18 @@
# Member constraints

A member constraint `'m member of ['c_1..'c_N]` expresses that the
region `'m` must be *equal* to some **choice region** `'c_i` (for
some `i`). These constraints cannot be expressed by users, but they
arise from `impl Trait` due to its lifetime capture rules. Consider a
function such as the following:
region `'m` must be *equal* to some **choice region** `'c_i` (for some `i`).
These constraints cannot be expressed by users, but they
arise from `impl Trait` due to its lifetime capture rules.
Consider a function such as the following:

```rust,ignore
fn make(a: &'a u32, b: &'b u32) -> impl Trait<'a, 'b> { .. }
```

Here, the true return type (often called the "hidden type") is only
permitted to capture the lifetimes `'a` or `'b`. You can kind of see
permitted to capture the lifetimes `'a` or `'b`.
You can kind of see
this more clearly by desugaring that `impl Trait` return type into its
more explicit form:

Expand All @@ -23,16 +24,17 @@ fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> { .. }
Here, the idea is that the hidden type must be some type that could
have been written in place of the `impl Trait<'x, 'y>` -- but clearly
such a type can only reference the regions `'x` or `'y` (or
`'static`!), as those are the only names in scope. This limitation is
`'static`!), as those are the only names in scope.
This limitation is
then translated into a restriction to only access `'a` or `'b` because
we are returning `MakeReturn<'a, 'b>`, where `'x` and `'y` have been
replaced with `'a` and `'b` respectively.

## Detailed example

To help us explain member constraints in more detail, let's spell out
the `make` example in a bit more detail. First off, let's assume that
you have some dummy trait:
the `make` example in a bit more detail.
First off, let's assume that you have some dummy trait:

```rust,ignore
trait Trait<'a, 'b> { }
Expand All @@ -49,8 +51,8 @@ fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> {
```

What happens in this case is that the return type will be `(&'0 u32, &'1 u32)`,
where `'0` and `'1` are fresh region variables. We will have the following
region constraints:
where `'0` and `'1` are fresh region variables.
We will have the following region constraints:

```txt
'0 live at {L}
Expand All @@ -67,11 +69,11 @@ return tuple is constructed to where it is returned (in fact, `'0` and
`'1` might have slightly different liveness sets, but that's not very
interesting to the point we are illustrating here).

The `'a: '0` and `'b: '1` constraints arise from subtyping. When we
construct the `(a, b)` value, it will be assigned type `(&'0 u32, &'1
The `'a: '0` and `'b: '1` constraints arise from subtyping.
When we construct the `(a, b)` value, it will be assigned type `(&'0 u32, &'1
u32)` -- the region variables reflect that the lifetimes of these
references could be made smaller. For this value to be created from
`a` and `b`, however, we do require that:
references could be made smaller.
For this value to be created from `a` and `b`, however, we do require that:

```txt
(&'a u32, &'b u32) <: (&'0 u32, &'1 u32)
Expand All @@ -82,35 +84,39 @@ which means in turn that `&'a u32 <: &'0 u32` and hence that `'a: '0`

Note that if we ignore member constraints, the value of `'0` would be
inferred to be some subset of the function body (from the liveness
constraints, which we did not write explicitly). It would never become
constraints, which we did not write explicitly).
It would never become
`'a`, because there is no need for it to -- we have a constraint that
`'a: '0`, but that just puts a "cap" on how *large* `'0` can grow to
become. Since we compute the *minimal* value that we can, we are happy
to leave `'0` as being just equal to the liveness set. This is where
member constraints come in.
`'a: '0`, but that just puts a "cap" on how *large* `'0` can grow to become.
Since we compute the *minimal* value that we can, we are happy
to leave `'0` as being just equal to the liveness set.
This is where member constraints come in.

## Choices are always lifetime parameters

At present, the "choice" regions from a member constraint are always lifetime
parameters from the current function. As of <!-- date-check --> March 2026,
this falls out from the placement of impl Trait, though in the future it may not
be the case. We take some advantage of this fact, as it simplifies the current
code. In particular, we don't have to consider a case like `'0 member of ['1,
be the case.
We take some advantage of this fact, as it simplifies the current code.
In particular, we don't have to consider a case like `'0 member of ['1,
'static]`, in which the values of both `'0` and `'1` are being inferred and hence
changing. See [rust-lang/rust#61773][#61773] for more information.
changing.
See [rust-lang/rust#61773][#61773] for more information.

[#61773]: https://github.com/rust-lang/rust/issues/61773

## Applying member constraints

Member constraints are a bit more complex than other forms of
constraints. This is because they have an "or" quality to them -- that
Member constraints are a bit more complex than other forms of constraints.
This is because they have an "or" quality to them -- that
is, they describe multiple choices that we must select from. E.g., in
our example constraint `'0 member of ['a, 'b, 'static]`, it might be
that `'0` is equal to `'a`, `'b`, *or* `'static`. How can we pick the
correct one? What we currently do is to look for a *minimal choice*
-- if we find one, then we will grow `'0` to be equal to that minimal
choice. To find that minimal choice, we take two factors into
that `'0` is equal to `'a`, `'b`, *or* `'static`.
How can we pick the correct one?
What we currently do is to look for a *minimal choice*
-- if we find one, then we will grow `'0` to be equal to that minimal choice.
To find that minimal choice, we take two factors into
consideration: lower and upper bounds.

### Lower bounds
Expand All @@ -121,30 +127,34 @@ apply member constraints, we've already *computed* the lower bounds of
`'0` because we computed its minimal value (or at least, the lower
bounds considering everything but member constraints).

Let `LB` be the current value of `'0`. We know then that `'0: LB` must
hold, whatever the final value of `'0` is. Therefore, we can rule out
Let `LB` be the current value of `'0`.
We know then that `'0: LB` must hold, whatever the final value of `'0` is.
Therefore, we can rule out
any choice `'choice` where `'choice: LB` does not hold.

Unfortunately, in our example, this is not very helpful. The lower
bound for `'0` will just be the liveness set `{L}`, and we know that
all the lifetime parameters outlive that set. So we are left with the
same set of choices here. (But in other examples, particularly those
Unfortunately, in our example, this is not very helpful.
The lower bound for `'0` will just be the liveness set `{L}`, and we know that
all the lifetime parameters outlive that set.
So we are left with the same set of choices here.
(But in other examples, particularly those
with different variance, lower bound constraints may be relevant.)

### Upper bounds

The *upper bounds* are those lifetimes that *must outlive* `'0` --
i.e., that `'0` must be *smaller* than. In our example, this would be
`'a`, because we have the constraint that `'a: '0`. In more complex
examples, the chain may be more indirect.
`'a`, because we have the constraint that `'a: '0`.
In more complex examples, the chain may be more indirect.

We can use upper bounds to rule out members in a very similar way to
lower bounds. If UB is some upper bound, then we know that `UB:
lower bounds.
If UB is some upper bound, then we know that `UB:
'0` must hold, so we can rule out any choice `'choice` where `UB:
'choice` does not hold.

In our example, we would be able to reduce our choice set from `['a,
'b, 'static]` to just `['a]`. This is because `'0` has an upper bound
'b, 'static]` to just `['a]`.
This is because `'0` has an upper bound
of `'a`, and neither `'a: 'b` nor `'a: 'static` is known to hold.

(For notes on how we collect upper bounds in the implementation, see
Expand All @@ -153,39 +163,45 @@ of `'a`, and neither `'a: 'b` nor `'a: 'static` is known to hold.
### Minimal choice

After applying lower and upper bounds, we can still sometimes have
multiple possibilities. For example, imagine a variant of our example
using types with the opposite variance. In that case, we would have
the constraint `'0: 'a` instead of `'a: '0`. Hence the current value
of `'0` would be `{L, 'a}`. Using this as a lower bound, we would be
multiple possibilities.
For example, imagine a variant of our example
using types with the opposite variance.
In that case, we would have the constraint `'0: 'a` instead of `'a: '0`.
Hence the current value of `'0` would be `{L, 'a}`.
Using this as a lower bound, we would be
able to narrow down the member choices to `['a, 'static]` because `'b:
'a` is not known to hold (but `'a: 'a` and `'static: 'a` do hold). We
would not have any upper bounds, so that would be our final set of choices.
'a` is not known to hold (but `'a: 'a` and `'static: 'a` do hold).
We would not have any upper bounds, so that would be our final set of choices.

In that case, we apply the **minimal choice** rule -- basically, if
one of our choices is smaller than the others, we can use that. In
this case, we would opt for `'a` (and not `'static`).
one of our choices is smaller than the others, we can use that.
In this case, we would opt for `'a` (and not `'static`).

This choice is consistent with the general 'flow' of region
propagation, which always aims to compute a minimal value for the
region being inferred. However, it is somewhat arbitrary.
region being inferred.
However, it is somewhat arbitrary.
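
The filtering and selection steps above can be sketched as follows.
This is a toy model, not the compiler's implementation: the `Region` enum, the `outlives` relation, and `pick_choice` are hypothetical names invented for illustration.
The lower bounds are modelled as only the lifetime parameters contained in the current value of `'0`
(the liveness points `{L}` are omitted, since every lifetime parameter outlives them).

```rust
/// Toy stand-in for a choice region (a lifetime parameter or `'static`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Region {
    A,
    B,
    Static,
}

/// Hypothetical "known outlives" relation: true if `sup: sub` is known to hold.
/// Every region outlives itself and `'static` outlives everything;
/// `'a` and `'b` are assumed to be unrelated.
fn outlives(sup: Region, sub: Region) -> bool {
    sup == sub || sup == Region::Static
}

/// Filters the choices by lower and upper bounds, then looks for a minimal choice.
fn pick_choice(
    choices: &[Region],
    lower_bounds: &[Region],
    upper_bounds: &[Region],
) -> Option<Region> {
    let candidates: Vec<Region> = choices
        .iter()
        .copied()
        // Rule out any choice where `'choice: LB` does not hold.
        .filter(|&c| lower_bounds.iter().all(|&lb| outlives(c, lb)))
        // Rule out any choice where `UB: 'choice` does not hold.
        .filter(|&c| upper_bounds.iter().all(|&ub| outlives(ub, c)))
        .collect();

    // Minimal choice: a candidate that every remaining candidate outlives.
    candidates
        .iter()
        .copied()
        .find(|&min| candidates.iter().all(|&other| outlives(other, min)))
}
```

With the choices `['a, 'b, 'static]`, an upper bound of `'a` leaves only `'a`, while a lower bound of `'a` (the opposite-variance variant) leaves `['a, 'static]`, from which `'a` is selected as the minimal choice.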

<a id="collecting"></a>

### Collecting upper bounds in the implementation

In practice, computing upper bounds is a bit inconvenient, because our
data structures are set up for the opposite. What we do is to compute
data structures are set up for the opposite.
What we do is to compute
the **reverse SCC graph** (we do this lazily and cache the result) --
that is, a graph where `'a: 'b` induces an edge `SCC('b) ->
SCC('a)`. Like the normal SCC graph, this is a DAG. We can then do a
depth-first search starting from `SCC('0)` in this graph. This will
take us to all the SCCs that must outlive `'0`.
that is, a graph where `'a: 'b` induces an edge `SCC('b) -> SCC('a)`.
Like the normal SCC graph, this is a DAG.
We can then do a depth-first search starting from `SCC('0)` in this graph.
This will take us to all the SCCs that must outlive `'0`.

One wrinkle is that, as we walk the "upper bound" SCCs, their values
will not yet have been fully computed. However, we **have** already
will not yet have been fully computed.
However, we **have** already
applied their liveness constraints, so we have some information about
their value. In particular, for any regions representing lifetime
their value.
In particular, for any regions representing lifetime
parameters, their value will contain themselves (i.e., the initial
value for `'a` includes `'a` and the value for `'b` contains `'b`). So
we can collect all of the lifetime parameters that are reachable,
value for `'a` includes `'a` and the value for `'b` contains `'b`).
So we can collect all of the lifetime parameters that are reachable,
which is precisely what we are interested in.
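
The walk described above can be sketched roughly as follows.
This is a simplified model under stated assumptions, not the code in the compiler: SCCs are plain `usize` indices, and `reverse_graph` and `upper_bound_params` are hypothetical helper names.
The set of SCCs that represent lifetime parameters is passed in explicitly, rather than read off partially computed region values.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy SCC index (an assumption; the compiler uses its own index type).
type Scc = usize;

/// Builds the reverse graph: each constraint `sup: sub`, given as `(sup, sub)`,
/// induces an edge `sub -> sup`.
fn reverse_graph(outlives: &[(Scc, Scc)]) -> HashMap<Scc, Vec<Scc>> {
    let mut graph: HashMap<Scc, Vec<Scc>> = HashMap::new();
    for &(sup, sub) in outlives {
        graph.entry(sub).or_default().push(sup);
    }
    graph
}

/// Depth-first search from `start` in the reverse graph, collecting every
/// reachable SCC that represents a lifetime parameter.
fn upper_bound_params(
    graph: &HashMap<Scc, Vec<Scc>>,
    start: Scc,
    params: &BTreeSet<Scc>,
) -> BTreeSet<Scc> {
    let mut visited = BTreeSet::new();
    let mut found = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(scc) = stack.pop() {
        // Skip SCCs we have already walked.
        if !visited.insert(scc) {
            continue;
        }
        if params.contains(&scc) {
            found.insert(scc);
        }
        // Follow edges to the SCCs that must outlive this one.
        if let Some(successors) = graph.get(&scc) {
            stack.extend(successors.iter().copied());
        }
    }
    found
}
```

Because edges point from a region to the regions that outlive it, everything reachable from `SCC('0)` is an upper bound of `'0`, including indirect ones reached through intermediate SCCs.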
66 changes: 31 additions & 35 deletions src/diagnostics/lintstore.md
Expand Up @@ -3,15 +3,14 @@
This page documents some of the machinery around lint registration and how we
run lints in the compiler.

The [`LintStore`] is the central piece of infrastructure, around which
everything rotates. The `LintStore` is held as part of the [`Session`], and it
The [`LintStore`] is the central piece of infrastructure, around which everything rotates.
The `LintStore` is held as part of the [`Session`], and it
gets populated with the list of lints shortly after the `Session` is created.

## Lints vs. lint passes

There are two parts to the linting mechanism within the compiler: lints and
lint passes. Unfortunately, a lot of the documentation we have refers to both
of these as just "lints."
There are two parts to the linting mechanism within the compiler: lints and lint passes.
Unfortunately, a lot of the documentation we have refers to both of these as just "lints."

First, we have the lint declarations themselves,
and this is where the name and default lint level and other metadata come from.
Expand All @@ -24,10 +23,11 @@ like all macros).
we lint against direct declarations without the use of the macro.

Lint declarations don't carry any "state" - they are merely global identifiers
and descriptions of lints. We assert at runtime that they are not registered
twice (by lint name).
and descriptions of lints.
We assert at runtime that they are not registered twice (by lint name).

Lint passes are the meat of any lint. Notably, there is not a one-to-one
Lint passes are the meat of any lint.
Notably, there is not a one-to-one
relationship between lints and lint passes; a lint might not have any lint pass
that emits it, it could have many, or just one -- the compiler doesn't track
whether a pass is in any way associated with a particular lint, and frequently
Expand All @@ -44,36 +44,33 @@ and all lints are registered.
There are three 'sources' of lints:

* internal lints: lints only used by the rustc codebase
* builtin lints: lints built into the compiler and not provided by some outside
source
* `rustc_interface::Config`[`register_lints`]: lints passed into the compiler
during construction
* builtin lints: lints built into the compiler and not provided by some outside source
* `rustc_interface::Config`[`register_lints`]: lints passed into the compiler during construction

Lints are registered via the [`LintStore::register_lint`] function. This should
happen just once for any lint, or an ICE will occur.
Lints are registered via the [`LintStore::register_lint`] function.
This should happen just once for any lint, or an ICE will occur.

Once the registration is complete, we "freeze" the lint store by placing it in
an `Arc`.
Once the registration is complete, we "freeze" the lint store by placing it in an `Arc`.

Lint passes are registered separately into one of the categories
(pre-expansion, early, late, late module). Passes are registered as a closure
(pre-expansion, early, late, late module).
Passes are registered as a closure
-- i.e., `impl Fn() -> Box<dyn X>`, where `dyn X` is either an early or late
lint pass trait object. When we run the lint passes, we run the closure and
then invoke the lint pass methods. The lint pass methods take `&mut self` so
they can keep track of state internally.
lint pass trait object.
When we run the lint passes, we run the closure and then invoke the lint pass methods.
The lint pass methods take `&mut self` so they can keep track of state internally.
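
As a rough illustration of this registration scheme, here is a minimal sketch.
All names in it (`ToyLintStore`, `ToyLintPass`, `CountLongNames`) are hypothetical stand-ins, not the types in `rustc_lint`,
and a plain panic takes the place of an ICE.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an early or late lint pass trait.
trait ToyLintPass {
    /// Takes `&mut self` so the pass can keep track of state internally.
    fn check_item(&mut self, item: &str);
    fn report(&self) -> String;
}

/// Example pass that counts item names longer than eight bytes.
struct CountLongNames {
    seen: usize,
}

impl ToyLintPass for CountLongNames {
    fn check_item(&mut self, item: &str) {
        if item.len() > 8 {
            self.seen += 1;
        }
    }

    fn report(&self) -> String {
        format!("{} long names", self.seen)
    }
}

#[derive(Default)]
struct ToyLintStore {
    lint_names: HashSet<&'static str>,
    /// Passes are stored as constructors, i.e. closures returning a boxed pass.
    pass_ctors: Vec<Box<dyn Fn() -> Box<dyn ToyLintPass>>>,
}

impl ToyLintStore {
    /// Registers a lint by name; registering the same name twice panics here.
    fn register_lint(&mut self, name: &'static str) {
        assert!(self.lint_names.insert(name), "lint `{}` registered twice", name);
    }

    fn register_pass(&mut self, ctor: impl Fn() -> Box<dyn ToyLintPass> + 'static) {
        self.pass_ctors.push(Box::new(ctor));
    }

    /// Runs each constructor to get a fresh pass, then invokes its methods.
    fn run(&self, items: &[&str]) -> Vec<String> {
        self.pass_ctors
            .iter()
            .map(|ctor| {
                let mut pass = ctor();
                for item in items {
                    pass.check_item(item);
                }
                pass.report()
            })
            .collect()
    }
}
```

Storing constructors rather than pass instances means `run` only needs `&self`, so the store can be shared behind an `Arc` once registration is done, while each run still gets a pass with fresh mutable state.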

#### Internal lints

These are lints used just by the compiler or drivers like `clippy`. They can be
found in [`rustc_lint::internal`].
These are lints used just by the compiler or drivers like `clippy`.
They can be found in [`rustc_lint::internal`].

An example of such a lint is the check that lint passes are implemented using
the `declare_lint_pass!` macro and not by hand. This is accomplished with the
`LINT_PASS_IMPL_WITHOUT_MACRO` lint.
the `declare_lint_pass!` macro and not by hand.
This is accomplished with the `LINT_PASS_IMPL_WITHOUT_MACRO` lint.

Registration of these lints happens in the [`rustc_lint::register_internals`]
function which is called when constructing a new lint store inside
[`rustc_lint::new_lint_store`].
function which is called when constructing a new lint store inside [`rustc_lint::new_lint_store`].

#### Builtin Lints

Expand All @@ -83,19 +80,18 @@ Often the first provides the definitions for the lints themselves,
and the latter provides the lint pass definitions (and implementations),
but this is not always true.

The builtin lint registration happens in
the [`rustc_lint::register_builtins`] function.
The builtin lint registration happens in the [`rustc_lint::register_builtins`] function.
Just like with internal lints,
this happens inside of [`rustc_lint::new_lint_store`].

#### Driver lints

These are the lints provided by drivers via the `rustc_interface::Config`
[`register_lints`] field, which is a callback. Drivers should, if finding it
already set, call the function currently set within the callback they add. The
best way for drivers to get access to this is by overriding the
`Callbacks::config` function which gives them direct access to the `Config`
structure.
[`register_lints`] field, which is a callback.
If the field is already set, drivers should call the previously set function
from within the callback they add.
The best way for drivers to get access to this is by overriding the
`Callbacks::config` function which gives them direct access to the `Config` structure.
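
The chaining pattern might look like the sketch below.
`ToyConfig`, `ToyLintStore`, and `add_driver_lints` are hypothetical names used only for illustration;
the callback's real signature in `rustc_interface::Config` is different, so treat this as a model of the idea rather than of the API.

```rust
/// Hypothetical minimal lint store that just records lint names.
#[derive(Default)]
struct ToyLintStore {
    lints: Vec<&'static str>,
}

/// Simplified callback type (an assumption, not the real signature).
type RegisterLints = Box<dyn Fn(&mut ToyLintStore)>;

/// Hypothetical stand-in for the driver-facing configuration.
#[derive(Default)]
struct ToyConfig {
    register_lints: Option<RegisterLints>,
}

/// Installs a driver callback that first calls any callback that was already set.
fn add_driver_lints(config: &mut ToyConfig) {
    // Take the existing callback, if any, so the new one can call it.
    let previous = config.register_lints.take();
    config.register_lints = Some(Box::new(move |store: &mut ToyLintStore| {
        if let Some(previous) = &previous {
            previous(store);
        }
        store.lints.push("my_driver_lint");
    }));
}
```

Calling the earlier callback first keeps lints registered by an outer driver or wrapper intact instead of silently replacing them.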

## Compiler lint passes are combined into one pass

Expand All @@ -105,8 +101,8 @@ of lint passes. Instead, we have a single lint pass of each variety (e.g.,
individual lint passes; this is because then we get the benefits of static over
dynamic dispatch for each of the (often empty) trait methods.

Ideally, we'd not have to do this, since it adds to the complexity of
understanding the code. However, with the current type-erased lint store
Ideally, we'd not have to do this, since it adds to the complexity of understanding the code.
However, with the current type-erased lint store
approach, it is beneficial to do so for performance reasons.

[`LintStore`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lint/struct.LintStore.html
Expand Down