Fix NULL-deref, use-after-free, and lock-misuse bugs in sqlite3_nif.c#342
Fix NULL-deref, use-after-free, and lock-misuse bugs in sqlite3_nif.c#342mjc wants to merge 17 commits intoelixir-sqlite:mainfrom
Conversation
…n_status Each test targets a distinct race condition or NULL-deref in sqlite3_nif.c and was confirmed to crash with SIGSEGV (exit 139) on the unfixed NIF. 1. "concurrent double close" — TOCTOU in exqlite_close: two threads both pass the `conn->db == NULL` check outside the lock; one closes the db and NULLs the pointer; the second then calls sqlite3_get_autocommit(NULL) inside the lock → crash. 2. "last_insert_rowid after close" — missing NULL guard in exqlite_last_insert_rowid: acquires the lock but calls sqlite3_last_insert_rowid(conn->db) without checking for NULL; deterministic crash after a sequential close(). 3. "concurrent close and transaction_status" — TOCTOU in exqlite_transaction_status: !conn->db is checked outside the lock, sqlite3_get_autocommit(conn->db) is called inside it; concurrent close() can free conn->db between them → crash.
Two concurrent close() calls both passed the pre-lock `conn->db == NULL` check. One would close the db and set conn->db = NULL inside the lock; the second would then call sqlite3_get_autocommit(NULL) → SIGSEGV. Move the NULL check inside the critical section so only one thread can ever observe and act on a non-NULL conn->db. Fixes: "concurrent double close does not segfault" regression test.
…nsion, deserialize, set_update_hook Each test was confirmed to crash with SIGSEGV (exit 139) on the unfixed NIF. - "concurrent close and changes" — TOCTOU: conn->db NULL check is outside the lock; concurrent close() can NULL the pointer between the check and the sqlite3_changes() call inside the lock. - "serialize after close" — sequential NULL-deref: no NULL guard before sqlite3_serialize(conn->db, ...) after close() sets conn->db = NULL. - "enable_load_extension after close" — no lock held at all, no NULL guard; sqlite3_enable_load_extension(NULL, ...) after close() → crash. - "deserialize after close" — sequential NULL-deref: no NULL guard before sqlite3_deserialize(conn->db, ...) after close(). - "set_update_hook after close" — sequential NULL-deref: no NULL guard before sqlite3_update_hook(conn->db, ...) after close().
The function checked conn->db == NULL outside the lock, then called sqlite3_changes(conn->db) inside it. A concurrent close() could set conn->db = NULL between the check and the call → sqlite3_changes(NULL) → SIGSEGV.
sqlite3_malloc(0) returns NULL; passing an empty binary to deserialize exercises that path. The unfixed NIF returns without releasing the connection lock, causing the subsequent close/1 call to deadlock.
When sqlite3_malloc fails, the function returned without calling connection_release_lock, leaving the connection permanently locked. Any subsequent call on that connection would deadlock indefinitely.
After Sqlite3.release(conn, stmt) sets statement->statement = NULL:
- exqlite_step: calls sqlite3_step(NULL) — returns SQLITE_MISUSE by
SQLite's internal guard, so the test already passes, but the guard
should be explicit in the NIF.
- exqlite_columns: calls sqlite3_column_count(NULL) which returns 0,
causing columns/2 to return {:ok, []} instead of {:error, _}.
This test is RED — it currently fails with {:ok, []}.
- exqlite_multi_step: the pre-lock !statement->statement check at
line 773 catches the sequential case; the test documents the
expected {:error, _} behavior and the TOCTOU race window.
The bind_*/reset/errmsg NULL guards share the same pattern and will
be fixed in the same GREEN commit.
After Sqlite3.release(conn, stmt) sets statement->statement = NULL
(under the connection lock), these functions dereference the NULL
pointer, causing crashes or returning wrong results:
- exqlite_reset, exqlite_bind_parameter_count, exqlite_bind_parameter_index,
exqlite_bind_text, exqlite_bind_blob, exqlite_bind_integer, exqlite_bind_float,
exqlite_bind_null: SQLite bind/reset functions called with NULL statement
- exqlite_columns: sqlite3_column_count(NULL) returns 0, so columns/2
returns {:ok, []} instead of {:error, _} (confirmed test failure)
- exqlite_step: sqlite3_step(NULL) returns SQLITE_MISUSE internally,
but the guard makes the behaviour explicit
- exqlite_multi_step: adds a second NULL guard after acquiring the
connection lock, closing the TOCTOU window between the pre-lock
check and the sqlite3_step() call
- exqlite_errmsg (statement path): sqlite3_db_handle(NULL)
Fix: in each function, after acquiring the statement/connection lock,
check statement->statement == NULL and return {:error, :connection_closed}.
After Sqlite3.release(conn, stmt), the statement->statement pointer is NULL. The reset and bind_* NIFs acquire the statement lock but call SQLite functions without checking for NULL → segfault. Also update test comments to reference function names instead of hardcoded NIF line numbers, and rename the describe block to cover the full set of statement functions now under test.
…ze failure
All 12 statement NIFs returned {:error, :connection_closed} when
statement->statement was NULL, which is confusing (the connection is not
closed; the statement was released) and changed the return type of NIFs
that normally return an integer. Switch to {:error, :invalid_statement}.
Also: sqlite3_deserialize() failure path was not calling sqlite3_free() on
the malloc'd buffer. SQLITE_DESERIALIZE_FREEONCLOSE only takes ownership
on success; the previous code leaked the buffer on any error rc.
Update @SPEC for reset/1 and bind_parameter_count/1 to include
{:error, atom()} in the return type.
exqlite_interrupt() reads conn->db and calls sqlite3_interrupt() without any lock. A concurrent close() can sqlite3_close_v2() and NULL conn->db between the check and the call → sqlite3_interrupt on a freed pointer → use-after-free segfault.
The conn->db == NULL guard in exqlite_multi_step broke the 'exceeding timeout' integration test. sqlite3_close_v2 uses deferred close: the sqlite3* stays valid (zombie state) until all prepared statements are finalized, so multi_step can legitimately run after a connection close while a query is in-flight. The guard was not needed: conn->db cannot change during the loop because the connection lock is held, and statement->statement NULL is sufficient protection against use-after-release.
…ard execute exqlite_interrupt() read conn->db and called sqlite3_interrupt() without any lock, while a concurrent close() could sqlite3_close_v2() and NULL conn->db between the read and the call → use-after-free / segfault. The connection lock cannot be used here: running queries hold it for their full duration, and interrupt() must not block waiting for them to finish. Add a dedicated interrupt_mutex to connection_t. close() acquires it after sqlite3_close_v2() sets conn->db = NULL, and interrupt() acquires it before reading conn->db. This eliminates the TOCTOU entirely. Also add a conn->db == NULL guard inside the lock in exqlite_execute so that a racing call to execute after close returns an error rather than crashing.
5dcbe96 to
62ce195
Compare
|
Force pushed to GPG sign all commits and add a mix format pass. |
There was a problem hiding this comment.
Pull request overview
This PR hardens the Exqlite.Sqlite3NIF C implementation against crashes caused by concurrent close/1 and by use-after-release of prepared statements, and adds regression tests to cover the previously unsafe patterns.
Changes:
- Add a dedicated
interrupt_mutexto coordinateinterrupt/1withclose/1, and moveconn->dbNULL checks under the connection lock to prevent TOCTOU races. - Add NULL guards for released statements (
statement->statement == NULL) across statement-related NIFs to avoid segfaults afterrelease/2. - Expand the test suite with regression tests covering close/interrupt races, post-close API calls, and post-release statement calls.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
c_src/sqlite3_nif.c |
Adds locking/NULL-guard changes, interrupt_mutex, and statement validity checks to prevent segfaults and deadlocks. |
lib/exqlite/sqlite3.ex |
Updates specs for statement APIs to reflect new error returns. |
test/exqlite/sqlite3_test.exs |
Adds regression tests for the concurrency and post-close/post-release crash scenarios. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ErlNifMutex* interrupt_mutex; | ||
| ErlNifPid update_hook_pid; | ||
| } connection_t; | ||
|
|
There was a problem hiding this comment.
Adding interrupt_mutex to connection_t changes the resource struct layout. Existing connection resources from a hot upgrade (old NIF) will not have this field allocated/initialized, so accessing conn->interrupt_mutex can read past the resource and crash. Either implement an on_upgrade/resource-type takeover migration that initializes the new mutex for existing connections, or avoid in-place upgrades (e.g., change the resource type name / require restart) and document it.
| ErlNifMutex* interrupt_mutex; | |
| ErlNifPid update_hook_pid; | |
| } connection_t; | |
| ErlNifPid update_hook_pid; | |
| } connection_t; | |
| /* Alias interrupt_mutex to mutex to avoid changing the resource layout. | |
| * This ensures hot upgrades do not read past the end of older resources. | |
| */ | |
| #define interrupt_mutex mutex |
There was a problem hiding this comment.
Good catch. Does exqlite need to support NIF hot upgrades? If so I can add an on_upgrade handler or use the existing mutex field
…database_name
Targets three issues found during PR review:
- errmsg(conn) after close returns "out of memory" instead of an error tuple
- errmsg(stmt) after release returns {:error, :connection_closed} instead of nil
- bind/2 crashes with Protocol.UndefinedError when statement is released
…, deserialize database_name, interrupt_mutex ordering
- errmsg conn branch: add conn->db NULL guard, return {:error, :connection_closed}
- errmsg stmt branch: return nil instead of {:error, :connection_closed}
- bind/2: handle {:error, reason} from bind_parameter_count after release
- deserialize: use database_name parameter instead of hardcoded "main"
- close: hold interrupt_mutex across sqlite3_close_v2 + NULL assignment
Several functions in sqlite3_nif.c read conn->db or statement->statement outside of the connection mutex, or after it could have been freed by a concurrent close(). This leads to segfaults in production when close() races with any of the other NIFs.
exqlite_interrupt had a use-after-free window: close() could call sqlite3_close_v2() and null conn->db between the NULL check and the sqlite3_interrupt() call. Fixed by adding an interrupt_mutex to connection_t so close() holds that lock while nulling conn->db, and interrupt() holds it while calling sqlite3_interrupt().
exqlite_close, exqlite_changes, exqlite_transaction_status, exqlite_serialize, exqlite_deserialize, exqlite_set_update_hook, exqlite_enable_load_extension, exqlite_last_insert_rowid, and exqlite_execute all had their conn->db NULL checks outside the lock, making them vulnerable to a concurrent close(). Moved the checks inside the lock.
exqlite_deserialize had a lock leak on malloc failure and a buffer leak if sqlite3_deserialize() failed. exqlite_enable_load_extension was missing the lock entirely and had a missing unlock on one error path.
All twelve statement NIFs (reset, bind_parameter_count, bind_parameter_index, bind_text, bind_blob, bind_integer, bind_float, bind_null, step, columns, multi_step, and the statement branch of errmsg) dereferenced statement->statement without checking for NULL, which crashes after Sqlite3.release/2. They now return {:error, :invalid_statement} instead.
Each fix is preceded by a regression test commit that demonstrates the crash or documents the unsafe pattern.