
fix: erase persistent_term leak in GRPC.Client.Connection on disconnect #509

Open
ryochin wants to merge 1 commit into elixir-grpc:master from ryochin:fix/persistent-term-leak-on-disconnect

ryochin commented Mar 11, 2026

Problem

GRPC.Client.Connection.init/1 stores a persistent_term entry keyed by the channel's ref:

# init/1
:persistent_term.put(
  {__MODULE__, :lb_state, state.virtual_channel.ref},
  state.virtual_channel
)

On disconnect, handle_call({:disconnect, _}) drops the virtual_channel from state via Map.drop/2, then stops the GenServer via {:continue, :stop}:

# handle_call({:disconnect, ...})
keys_to_delete = [:real_channels, :virtual_channel]
new_state = Map.drop(state, keys_to_delete)
{:reply, resp, new_state, {:continue, :stop}}

However, the persistent_term entry is never erased -- neither in the disconnect handler nor in terminate/2 (which is a no-op: def terminate(_reason, _state), do: :ok).

Because each connect/2 call generates a fresh ref via make_ref() by default, every connect/disconnect cycle permanently leaks one persistent_term entry.

Impact

Applications that create short-lived connections (e.g. connect -> RPC -> disconnect per request) accumulate persistent_term entries indefinitely. Unlike regular process memory, persistent_term entries are never garbage-collected: they remain until explicitly erased or until the BEAM node shuts down, so memory grows steadily with no upper bound.
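The mechanism can be observed in isolation, without a gRPC server. The following is a minimal sketch, not code from the library: the `:leak_demo` key and the `key_for` helper are hypothetical and only mirror the key shape used in `init/1`.

```elixir
# Hypothetical key shape, chosen only to mirror the one used in this PR.
key_for = fn ref -> {:leak_demo, :lb_state, ref} end

count_before = :persistent_term.info().count

# Each iteration mimics one connect: a fresh ref yields a brand-new key.
refs =
  for _ <- 1..100 do
    ref = make_ref()
    :persistent_term.put(key_for.(ref), %{ref: ref})
    ref
  end

# Garbage-collecting the calling process does not remove the entries.
:erlang.garbage_collect()
added = :persistent_term.info().count - count_before
IO.puts("entries added: #{added}")

# Only an explicit erase/1 removes them; it returns true when the key existed.
erased = Enum.count(refs, fn ref -> :persistent_term.erase(key_for.(ref)) end)
IO.puts("entries erased: #{erased}")
```

`:persistent_term.info/0` also reports a `memory` field, which may be a convenient metric to watch on a long-running node suspected of leaking.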

Proposed fix

Erase the persistent_term entry in handle_call({:disconnect, ...}), before Map.drop removes the ref from state:

def handle_call({:disconnect, %Channel{adapter: adapter} = channel}, _from, state) do
  resp = {:ok, %Channel{channel | adapter_payload: %{conn_pid: nil}}}
  # Erase the entry before Map.drop/2 removes virtual_channel from state.
  :persistent_term.erase({__MODULE__, :lb_state, channel.ref})
  ...

Additionally, terminate/2 should erase the entry as a safety net for abnormal termination paths where disconnect is never called:

def terminate(_reason, %{virtual_channel: %{ref: ref}}) do
  :persistent_term.erase({__MODULE__, :lb_state, ref})
rescue
  _ -> :ok
end

def terminate(_reason, _state), do: :ok

Note: terminate/2 alone is insufficient because Map.drop(state, [:real_channels, :virtual_channel]) removes the ref from state before terminate is called in the normal disconnect path. Both locations are needed.
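The interaction between the two cleanup points can be illustrated with a toy GenServer. This is a sketch under stated assumptions, not the library's code: `LeakDemo.Conn` is a hypothetical module that mirrors only the state handling described above, and it assumes that `handle_continue(:stop, ...)` stops the process with reason `:normal`.

```elixir
defmodule LeakDemo.Conn do
  # Hypothetical stand-in for GRPC.Client.Connection (state handling only).
  use GenServer

  def start(ref), do: GenServer.start(__MODULE__, ref)

  @impl true
  def init(ref) do
    :persistent_term.put({__MODULE__, :lb_state, ref}, %{ref: ref})
    {:ok, %{virtual_channel: %{ref: ref}, real_channels: %{}}}
  end

  @impl true
  def handle_call(:disconnect, _from, state) do
    # Erase while the ref is still in state; Map.drop/2 removes it next.
    :persistent_term.erase({__MODULE__, :lb_state, state.virtual_channel.ref})
    new_state = Map.drop(state, [:real_channels, :virtual_channel])
    {:reply, :ok, new_state, {:continue, :stop}}
  end

  @impl true
  def handle_continue(:stop, state), do: {:stop, :normal, state}

  # Safety net: matches only when disconnect was never handled.
  @impl true
  def terminate(_reason, %{virtual_channel: %{ref: ref}}) do
    :persistent_term.erase({__MODULE__, :lb_state, ref})
    :ok
  end

  # After a disconnect the state no longer carries the ref.
  def terminate(_reason, _state), do: :ok
end

key = fn ref -> {LeakDemo.Conn, :lb_state, ref} end

# Normal path: the disconnect handler erases the entry before replying.
ref1 = make_ref()
{:ok, pid1} = LeakDemo.Conn.start(ref1)
:ok = GenServer.call(pid1, :disconnect)
normal_path = :persistent_term.get(key.(ref1), :erased)

# Abnormal path: no disconnect, so terminate/2 has to clean up.
ref2 = make_ref()
{:ok, pid2} = LeakDemo.Conn.start(ref2)
:ok = GenServer.stop(pid2, :shutdown)
abnormal_path = :persistent_term.get(key.(ref2), :erased)

IO.inspect({normal_path, abnormal_path})
```

Removing the `erase/1` call from the `:disconnect` clause of this toy module leaves the first entry in place, since only the catch-all `terminate/2` clause matches once the state has been trimmed.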

Reproduction

Requires a gRPC server listening on localhost:50051 (any service will do).

# If GRPC.Client.Supervisor is not already running:
{:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: GRPC.Client.Supervisor)

count_lb_entries = fn ->
  :persistent_term.get()
  |> Enum.count(fn
    {{GRPC.Client.Connection, :lb_state, _}, _} -> true
    _ -> false
  end)
end

before = count_lb_entries.()

for _ <- 1..100 do
  {:ok, ch} = GRPC.Stub.connect("localhost:50051")
  GRPC.Stub.disconnect(ch)
end

after_ = count_lb_entries.()
IO.puts("leaked entries: #{after_ - before}")
# => leaked entries: 100 (expected: 0)

Each connect/disconnect cycle leaks one persistent_term entry because
the entry is never erased. Since persistent_term is not garbage-collected,
this causes unbounded memory growth on long-running nodes.