Note: Dux is under active development and not yet production ready. APIs may change between releases.
DuckDB-native dataframes for Elixir.
Dux gives you a dplyr-style verb API backed by DuckDB's analytical engine, with built-in distributed execution across the BEAM. Pipelines are lazy, operations compile to SQL, and DuckDB handles columnar execution, vectorised aggregation, and predicate pushdown.
```elixir
require Dux

Dux.from_parquet("s3://data/sales/**/*.parquet")
|> Dux.filter(amount > 100 and region == "US")
|> Dux.mutate(revenue: price * quantity)
|> Dux.group_by(:product)
|> Dux.summarise(total: sum(revenue), orders: count(product))
|> Dux.sort_by(desc: :total)
|> Dux.to_rows()
```

Dux is the successor to Explorer. It borrows its verb design from dplyr and the tidyverse — constrained, composable operations that each do one thing well. If you've used `dplyr::filter()`, `mutate()`, or `group_by() |> summarise()`, the Dux API will feel familiar.
Where Dux diverges from Explorer:

- **The module is the dataframe.** `Dux.filter(df, ...)`, not `Dux.DataFrame.filter(df, ...)`. There is no Series API; all operations are dataframe-level.
- **DuckDB is the only engine.** No pluggable backends, no abstraction tax. Full access to DuckDB's SQL functions, window functions, recursive CTEs, and 50+ extensions.
- **Lazy by default.** Operations accumulate as an AST in `%Dux{}`. When you materialise (`compute/1`, `to_rows/1`), the whole pipeline compiles to a chain of SQL CTEs and DuckDB optimises it end-to-end.
- **Distributed on the BEAM.** `%Dux{}` is plain data: ship it to any BEAM node, compile it to SQL there, and execute against that node's local DuckDB. No function serialisation, no cluster manager, no heavyweight RPC.
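To make the lazy model concrete, here is a toy, self-contained sketch of how a list of accumulated operations can fold into one chained-CTE query. This is not Dux's actual compiler; the `CteSketch` module and the `$prev` placeholder are invented purely for illustration:

```elixir
defmodule CteSketch do
  @moduledoc """
  Toy illustration of lazy pipeline compilation: each step is a SQL
  fragment where `$prev` stands for the previous relation, and the
  whole list folds into a single WITH ... chain.
  """

  def compile(source, steps) do
    {ctes, last} =
      steps
      |> Enum.with_index(1)
      |> Enum.map_reduce("t0", fn {frag, i}, prev ->
        name = "t#{i}"
        {"#{name} AS (#{String.replace(frag, "$prev", prev)})", name}
      end)

    base = "t0 AS (SELECT * FROM #{source})"
    "WITH " <> Enum.join([base | ctes], ",\n     ") <> "\nSELECT * FROM #{last}"
  end
end

sql =
  CteSketch.compile("read_parquet('sales.parquet')", [
    "SELECT * FROM $prev WHERE amount > 100",
    "SELECT region, sum(amount) AS total FROM $prev GROUP BY region"
  ])

IO.puts(sql)
```

Because the entire chain is handed to DuckDB as one statement, its optimiser can push predicates down and fuse steps rather than materialising each one.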
```elixir
def deps do
  [{:dux, "~> 0.2.0"}]
end
```

Dux is a pure Elixir project. The DuckDB engine is provided via ADBC — a precompiled driver downloaded automatically at compile time. No Rust or C++ compilation needed.
```elixir
require Dux

# Built-in datasets — no files needed
Dux.Datasets.flights()
|> Dux.filter(distance > 1000)
|> Dux.group_by(:origin)
|> Dux.summarise(avg_delay: avg(arr_delay), n: count(flight))
|> Dux.sort_by(desc: :avg_delay)
|> Dux.head(5)
|> Dux.to_rows()
```

Every verb (`filter`, `mutate`, `group_by`, `summarise`, etc.) takes Elixir expressions via macros. Bare identifiers become column names. `^` interpolates Elixir values safely as parameter bindings:
```elixir
min_amount = 500
Dux.filter(df, amount > ^min_amount and status == "active")
```

All DuckDB functions work inside expressions — `year()`, `lower()`, `coalesce()`, `regexp_matches()`, and hundreds more. `cond` maps to `CASE WHEN`, `in` maps to `IN`:
```elixir
Dux.mutate(df,
  tier: cond do
    amount > 1000 -> "gold"
    amount > 100 -> "silver"
    true -> "bronze"
  end
)

Dux.filter(df, status in ["active", "pending"])
```

The `_with` variants accept raw DuckDB SQL for window functions and other constructs the macro doesn't cover:
```elixir
Dux.mutate_with(df, rank: "ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)")
```

Read and write CSV, Parquet, NDJSON, Excel, and database tables:
```elixir
df = Dux.from_parquet("s3://bucket/data/**/*.parquet")
df = Dux.from_csv("data.csv", delimiter: "\t")
df = Dux.from_excel("sales.xlsx", sheet: "Q1")

Dux.to_parquet(df, "output/", partition_by: [:year, :month])
Dux.to_excel(df, "report.xlsx")
Dux.insert_into(df, "pg.public.events", create: true)
```

Cross-source queries via DuckDB's `ATTACH` — Postgres, MySQL, SQLite, Iceberg, Delta, DuckLake:
```elixir
Dux.attach(:warehouse, "host=db.internal dbname=analytics", type: :postgres)

customers = Dux.from_attached(:warehouse, "public.customers")

Dux.from_parquet("s3://lake/orders/*.parquet")
|> Dux.join(customers, on: :customer_id)
|> Dux.group_by(:region)
|> Dux.summarise(revenue: sum(amount))
|> Dux.to_rows()
```

Mark a pipeline for distributed execution with `distribute/2`. The same verbs work — Dux handles partitioning, fan-out, and merge automatically:
```elixir
workers = Dux.Remote.Worker.list()

Dux.from_parquet("s3://lake/events/**/*.parquet")
|> Dux.distribute(workers)
|> Dux.filter(year == 2024)
|> Dux.group_by(:region)
|> Dux.summarise(total: sum(revenue))
|> Dux.to_rows()
```

Under the hood: the Coordinator partitions files across workers (size-balanced, with Hive partition pruning), each worker compiles and executes SQL against its local DuckDB, and the Merger re-aggregates results. Workers read from and write to storage directly — no data funnels through the coordinator.
Distributed writes work the same way:
```elixir
Dux.from_parquet("s3://input/**/*.parquet")
|> Dux.distribute(workers)
|> Dux.filter(status == "active")
|> Dux.to_parquet("s3://output/", partition_by: :year)
```

Attach Postgres and distribute reads with `partition_by:`:
```elixir
Dux.from_attached(:pg, "public.orders", partition_by: :id)
|> Dux.distribute(workers)
|> Dux.insert_into("pg.public.summary", create: true)
```

See the Distributed Execution guide for the full architecture — aggregate rewrites, broadcast vs shuffle joins, streaming merge, and fault tolerance.
A graph is two dataframes: vertices and edges. All algorithms return `%Dux{}`, so you can pipe the result into any verb:
```elixir
graph = Dux.Graph.new(vertices: users, edges: follows)

graph |> Dux.Graph.pagerank() |> Dux.sort_by(desc: :rank) |> Dux.head(10)
graph |> Dux.Graph.shortest_paths(start_node)
graph |> Dux.Graph.connected_components()
```

Add `kino_dux` for rich rendering and smart cells in Livebook:
```elixir
Mix.install([
  {:dux, "~> 0.2.0"},
  {:kino_dux, "~> 0.1"}
])
```

Lazy pipelines render with source provenance, operations, and generated SQL. Computed results become interactive data tables.
- Getting Started — core concepts, expressions, pipelines
- Data IO — CSV, Parquet, Excel, NDJSON, database writes
- Transformations — filter, mutate, window functions
- Joins & Reshape — join types, ASOF joins, pivots
- Distributed Execution — architecture, partitioning, distributed IO
- Graph Analytics — PageRank, shortest paths, components
- Cheatsheet — quick reference for all verbs
Dual-licensed under Apache 2.0 and MIT. See LICENSE-APACHE and LICENSE-MIT.