Skip to content

Comments

Fix NULL-deref, use-after-free, and lock-misuse bugs in sqlite3_nif.c#342

Open
mjc wants to merge 17 commits intoelixir-sqlite:mainfrom
mjc:fix/null-deref-segfaults
Open

Fix NULL-deref, use-after-free, and lock-misuse bugs in sqlite3_nif.c#342
mjc wants to merge 17 commits intoelixir-sqlite:mainfrom
mjc:fix/null-deref-segfaults

Conversation

@mjc
Copy link

@mjc mjc commented Feb 24, 2026

Several functions in sqlite3_nif.c read conn->db or statement->statement outside of the connection mutex, or after it could have been freed by a concurrent close(). This leads to segfaults in production when close() races with any of the other NIFs.

exqlite_interrupt had a use-after-free window: close() could call sqlite3_close_v2() and null conn->db between the NULL check and the sqlite3_interrupt() call. Fixed by adding an interrupt_mutex to connection_t so close() holds that lock while nulling conn->db, and interrupt() holds it while calling sqlite3_interrupt().

exqlite_close, exqlite_changes, exqlite_transaction_status, exqlite_serialize, exqlite_deserialize, exqlite_set_update_hook, exqlite_enable_load_extension, exqlite_last_insert_rowid, and exqlite_execute all had their conn->db NULL checks outside the lock, making them vulnerable to a concurrent close(). Moved the checks inside the lock.

exqlite_deserialize had a lock leak on malloc failure and a buffer leak if sqlite3_deserialize() failed. exqlite_enable_load_extension was missing the lock entirely and had a missing unlock on one error path.

All twelve statement NIFs (reset, bind_parameter_count, bind_parameter_index, bind_text, bind_blob, bind_integer, bind_float, bind_null, step, columns, multi_step, and the statement branch of errmsg) dereferenced statement->statement without checking for NULL, which crashes after Sqlite3.release/2. They now return {:error, :invalid_statement} instead.

Each fix is preceded by a regression test commit that demonstrates the crash or documents the unsafe pattern.

Copilot AI review requested due to automatic review settings February 24, 2026 21:52
mjc added 14 commits February 24, 2026 14:53
…n_status

Each test targets a distinct race condition or NULL-deref in sqlite3_nif.c
and was confirmed to crash with SIGSEGV (exit 139) on the unfixed NIF.

1. "concurrent double close" — TOCTOU in exqlite_close: two threads both
   pass the `conn->db == NULL` check outside the lock; one closes the db
   and NULLs the pointer; the second then calls
   sqlite3_get_autocommit(NULL) inside the lock → crash.

2. "last_insert_rowid after close" — missing NULL guard in
   exqlite_last_insert_rowid: acquires the lock but calls
   sqlite3_last_insert_rowid(conn->db) without checking for NULL;
   deterministic crash after a sequential close().

3. "concurrent close and transaction_status" — TOCTOU in
   exqlite_transaction_status: !conn->db is checked outside the lock,
   sqlite3_get_autocommit(conn->db) is called inside it; concurrent
   close() can free conn->db between them → crash.
Two concurrent close() calls both passed the pre-lock `conn->db == NULL`
check.  One would close the db and set conn->db = NULL inside the lock;
the second would then call sqlite3_get_autocommit(NULL) → SIGSEGV.

Move the NULL check inside the critical section so only one thread can
ever observe and act on a non-NULL conn->db.

Fixes: "concurrent double close does not segfault" regression test.
…nsion, deserialize, set_update_hook

Each test was confirmed to crash with SIGSEGV (exit 139) on the unfixed NIF.

- "concurrent close and changes" — TOCTOU: conn->db NULL check is
  outside the lock; concurrent close() can NULL the pointer between the
  check and the sqlite3_changes() call inside the lock.

- "serialize after close" — sequential NULL-deref: no NULL guard
  before sqlite3_serialize(conn->db, ...) after close() sets conn->db = NULL.

- "enable_load_extension after close" — no lock held at all, no NULL
  guard; sqlite3_enable_load_extension(NULL, ...) after close() → crash.

- "deserialize after close" — sequential NULL-deref: no NULL guard
  before sqlite3_deserialize(conn->db, ...) after close().

- "set_update_hook after close" — sequential NULL-deref: no NULL
  guard before sqlite3_update_hook(conn->db, ...) after close().
The function checked conn->db == NULL outside the lock, then called
sqlite3_changes(conn->db) inside it.  A concurrent close() could set
conn->db = NULL between the check and the call → sqlite3_changes(NULL)
→ SIGSEGV.
sqlite3_malloc(0) returns NULL; passing an empty binary to deserialize
exercises that path. The unfixed NIF returns without releasing the
connection lock, causing the subsequent close/1 call to deadlock.
When sqlite3_malloc fails, the function returned without calling
connection_release_lock, leaving the connection permanently locked.
Any subsequent call on that connection would deadlock indefinitely.
After Sqlite3.release(conn, stmt) sets statement->statement = NULL:

- exqlite_step: calls sqlite3_step(NULL) — returns SQLITE_MISUSE by
  SQLite's internal guard, so the test already passes, but the guard
  should be explicit in the NIF.

- exqlite_columns: calls sqlite3_column_count(NULL) which returns 0,
  causing columns/2 to return {:ok, []} instead of {:error, _}.
  This test is RED — it currently fails with {:ok, []}.

- exqlite_multi_step: the pre-lock !statement->statement check at
  line 773 catches the sequential case; the test documents the
  expected {:error, _} behavior and the TOCTOU race window.

The bind_*/reset/errmsg NULL guards share the same pattern and will
be fixed in the same GREEN commit.
After Sqlite3.release(conn, stmt) sets statement->statement = NULL
(under the connection lock), these functions dereference the NULL
pointer, causing crashes or returning wrong results:

- exqlite_reset, exqlite_bind_parameter_count, exqlite_bind_parameter_index,
  exqlite_bind_text, exqlite_bind_blob, exqlite_bind_integer, exqlite_bind_float,
  exqlite_bind_null: SQLite bind/reset functions called with NULL statement

- exqlite_columns: sqlite3_column_count(NULL) returns 0, so columns/2
  returns {:ok, []} instead of {:error, _}  (confirmed test failure)

- exqlite_step: sqlite3_step(NULL) returns SQLITE_MISUSE internally,
  but the guard makes the behaviour explicit

- exqlite_multi_step: adds a second NULL guard after acquiring the
  connection lock, closing the TOCTOU window between the pre-lock
  check and the sqlite3_step() call

- exqlite_errmsg (statement path): sqlite3_db_handle(NULL)

Fix: in each function, after acquiring the statement/connection lock,
check statement->statement == NULL and return {:error, :connection_closed}.
After Sqlite3.release(conn, stmt), the statement->statement pointer is NULL.
The reset and bind_* NIFs acquire the statement lock but call SQLite functions
without checking for NULL → segfault.

Also update test comments to reference function names instead of hardcoded
NIF line numbers, and rename the describe block to cover the full set of
statement functions now under test.
…ze failure

All 12 statement NIFs returned {:error, :connection_closed} when
statement->statement was NULL, which is confusing (the connection is not
closed; the statement was released) and changed the return type of NIFs
that normally return an integer.  Switch to {:error, :invalid_statement}.

Also: sqlite3_deserialize() failure path was not calling sqlite3_free() on
the malloc'd buffer.  SQLITE_DESERIALIZE_FREEONCLOSE only takes ownership
on success; the previous code leaked the buffer on any error rc.

Update @SPEC for reset/1 and bind_parameter_count/1 to include
{:error, atom()} in the return type.
exqlite_interrupt() reads conn->db and calls sqlite3_interrupt() without
any lock.  A concurrent close() can sqlite3_close_v2() and NULL conn->db
between the check and the call → sqlite3_interrupt on a freed pointer →
use-after-free segfault.
The conn->db == NULL guard in exqlite_multi_step broke the
'exceeding timeout' integration test. sqlite3_close_v2 uses deferred
close: the sqlite3* stays valid (zombie state) until all prepared
statements are finalized, so multi_step can legitimately run after
a connection close while a query is in-flight. The guard was not
needed: conn->db cannot change during the loop because the connection
lock is held, and statement->statement NULL is sufficient protection
against use-after-release.
…ard execute

exqlite_interrupt() read conn->db and called sqlite3_interrupt() without any
lock, while a concurrent close() could sqlite3_close_v2() and NULL conn->db
between the read and the call → use-after-free / segfault.

The connection lock cannot be used here: running queries hold it for their
full duration, and interrupt() must not block waiting for them to finish.

Add a dedicated interrupt_mutex to connection_t.  close() acquires it after
sqlite3_close_v2() sets conn->db = NULL, and interrupt() acquires it before
reading conn->db.  This eliminates the TOCTOU entirely.

Also add a conn->db == NULL guard inside the lock in exqlite_execute so that
a racing call to execute after close returns an error rather than crashing.
@mjc mjc force-pushed the fix/null-deref-segfaults branch from 5dcbe96 to 62ce195 Compare February 24, 2026 21:53
@mjc
Copy link
Author

mjc commented Feb 24, 2026

Force pushed to GPG sign all commits and add a mix format pass.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the Exqlite.Sqlite3NIF C implementation against crashes caused by concurrent close/1 and by use-after-release of prepared statements, and adds regression tests to cover the previously unsafe patterns.

Changes:

  • Add a dedicated interrupt_mutex to coordinate interrupt/1 with close/1, and move conn->db NULL checks under the connection lock to prevent TOCTOU races.
  • Add NULL guards for released statements (statement->statement == NULL) across statement-related NIFs to avoid segfaults after release/2.
  • Expand the test suite with regression tests covering close/interrupt races, post-close API calls, and post-release statement calls.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
c_src/sqlite3_nif.c Adds locking/NULL-guard changes, interrupt_mutex, and statement validity checks to prevent segfaults and deadlocks.
lib/exqlite/sqlite3.ex Updates specs for statement APIs to reflect new error returns.
test/exqlite/sqlite3_test.exs Adds regression tests for the concurrency and post-close/post-release crash scenarios.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +55 to 58
ErlNifMutex* interrupt_mutex;
ErlNifPid update_hook_pid;
} connection_t;

Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding interrupt_mutex to connection_t changes the resource struct layout. Existing connection resources from a hot upgrade (old NIF) will not have this field allocated/initialized, so accessing conn->interrupt_mutex can read past the resource and crash. Either implement an on_upgrade/resource-type takeover migration that initializes the new mutex for existing connections, or avoid in-place upgrades (e.g., change the resource type name / require restart) and document it.

Suggested change
ErlNifMutex* interrupt_mutex;
ErlNifPid update_hook_pid;
} connection_t;
ErlNifPid update_hook_pid;
} connection_t;
/* Alias interrupt_mutex to mutex to avoid changing the resource layout.
* This ensures hot upgrades do not read past the end of older resources.
*/
#define interrupt_mutex mutex

Copilot uses AI. Check for mistakes.
Copy link
Author

@mjc mjc Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Does exqlite need to support NIF hot upgrades? If so I can add an on_upgrade handler or use the existing mutex field

mjc added 2 commits February 24, 2026 15:32
…database_name

Targets three issues found during PR review:
- errmsg(conn) after close returns "out of memory" instead of an error tuple
- errmsg(stmt) after release returns {:error, :connection_closed} instead of nil
- bind/2 crashes with Protocol.UndefinedError when statement is released
…, deserialize database_name, interrupt_mutex ordering

- errmsg conn branch: add conn->db NULL guard, return {:error, :connection_closed}
- errmsg stmt branch: return nil instead of {:error, :connection_closed}
- bind/2: handle {:error, reason} from bind_parameter_count after release
- deserialize: use database_name parameter instead of hardcoded "main"
- close: hold interrupt_mutex across sqlite3_close_v2 + NULL assignment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant