Dux

Note: Dux is under active development and not yet production ready. APIs may change between releases.

DuckDB-native dataframes for Elixir.

Dux gives you a dplyr-style verb API backed by DuckDB's analytical engine, with built-in distributed execution across the BEAM. Pipelines are lazy, operations compile to SQL, and DuckDB handles columnar execution, vectorised aggregation, and predicate pushdown.

require Dux

Dux.from_parquet("s3://data/sales/**/*.parquet")
|> Dux.filter(amount > 100 and region == "US")
|> Dux.mutate(revenue: price * quantity)
|> Dux.group_by(:product)
|> Dux.summarise(total: sum(revenue), orders: count(product))
|> Dux.sort_by(desc: :total)
|> Dux.to_rows()

Design

Dux is the successor to Explorer. That means it borrows its verb design from dplyr and the tidyverse — constrained, composable operations that each do one thing well. If you've used dplyr::filter(), mutate(), group_by() |> summarise(), the Dux API will feel familiar.

Where Dux diverges from Explorer:

The module IS the dataframe. Dux.filter(df, ...) not Dux.DataFrame.filter(df, ...). No Series API — all operations are dataframe-level.
DuckDB is the only engine. No pluggable backends, no abstraction tax. Full access to DuckDB's SQL functions, window functions, recursive CTEs, and 50+ extensions.
Lazy by default. Operations accumulate as an AST in %Dux{}. When you materialise (compute/1, to_rows/1), the whole pipeline compiles to a chain of SQL CTEs and DuckDB optimises end-to-end.
Distributed on the BEAM. %Dux{} is plain data — ship it to any BEAM node, compile to SQL there, execute against that node's local DuckDB. No function serialisation, no cluster manager, no heavyweight RPC.

Installation

def deps do
  [{:dux, "~> 0.2.0"}]
end

Dux is a pure Elixir project. The DuckDB engine is provided via ADBC — a precompiled driver downloaded automatically at compile time. No Rust or C++ compilation needed.

Getting Started

require Dux

# Built-in datasets — no files needed
Dux.Datasets.flights()
|> Dux.filter(distance > 1000)
|> Dux.group_by(:origin)
|> Dux.summarise(avg_delay: avg(arr_delay), n: count(flight))
|> Dux.sort_by(desc: :avg_delay)
|> Dux.head(5)
|> Dux.to_rows()

Every verb (filter, mutate, group_by, summarise, etc.) takes Elixir expressions via macros. Bare identifiers become column names. ^ interpolates Elixir values safely as parameter bindings:

min_amount = 500
Dux.filter(df, amount > ^min_amount and status == "active")

All DuckDB functions work inside expressions — year(), lower(), coalesce(), regexp_matches(), and hundreds more. cond maps to CASE WHEN, in maps to IN:

Dux.mutate(df,
  tier: cond do
    amount > 1000 -> "gold"
    amount > 100 -> "silver"
    true -> "bronze"
  end
)

Dux.filter(df, status in ["active", "pending"])

The _with variants accept raw DuckDB SQL for window functions and other constructs the macro doesn't cover:

Dux.mutate_with(df, rank: "ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)")

IO

Read and write CSV, Parquet, NDJSON, Excel, and database tables:

df = Dux.from_parquet("s3://bucket/data/**/*.parquet")
df = Dux.from_csv("data.csv", delimiter: "\t")
df = Dux.from_excel("sales.xlsx", sheet: "Q1")

Dux.to_parquet(df, "output/", partition_by: [:year, :month])
Dux.to_excel(df, "report.xlsx")
Dux.insert_into(df, "pg.public.events", create: true)

Cross-source queries via DuckDB's ATTACH — Postgres, MySQL, SQLite, Iceberg, Delta, DuckLake:

Dux.attach(:warehouse, "host=db.internal dbname=analytics", type: :postgres)
customers = Dux.from_attached(:warehouse, "public.customers")

Dux.from_parquet("s3://lake/orders/*.parquet")
|> Dux.join(customers, on: :customer_id)
|> Dux.group_by(:region)
|> Dux.summarise(revenue: sum(amount))
|> Dux.to_rows()

Distributed Execution

Mark a pipeline for distributed execution with distribute/2. The same verbs work — Dux handles partitioning, fan-out, and merge automatically:

workers = Dux.Remote.Worker.list()

Dux.from_parquet("s3://lake/events/**/*.parquet")
|> Dux.distribute(workers)
|> Dux.filter(year == 2024)
|> Dux.group_by(:region)
|> Dux.summarise(total: sum(revenue))
|> Dux.to_rows()

Under the hood: the Coordinator partitions files across workers (size-balanced, with Hive partition pruning), each worker compiles and executes SQL against its local DuckDB, and the Merger re-aggregates results. Workers read from and write to storage directly — no data funnels through the coordinator.

Distributed writes work the same way:

Dux.from_parquet("s3://input/**/*.parquet")
|> Dux.distribute(workers)
|> Dux.filter(status == "active")
|> Dux.to_parquet("s3://output/", partition_by: :year)

Attach Postgres and distribute reads with partition_by::

Dux.from_attached(:pg, "public.orders", partition_by: :id)
|> Dux.distribute(workers)
|> Dux.insert_into("pg.public.summary", create: true)

See the Distributed Execution guide for the full architecture — aggregate rewrites, broadcast vs shuffle joins, streaming merge, and fault tolerance.

Graph Analytics

A graph is two dataframes. All algorithms return %Dux{} — pipe into any verb:

graph = Dux.Graph.new(vertices: users, edges: follows)

graph |> Dux.Graph.pagerank() |> Dux.sort_by(desc: :rank) |> Dux.head(10)
graph |> Dux.Graph.shortest_paths(start_node)
graph |> Dux.Graph.connected_components()

Livebook

Add kino_dux for rich rendering and smart cells in Livebook:

Mix.install([
  {:dux, "~> 0.2.0"},
  {:kino_dux, "~> 0.1"}
])

Lazy pipelines render with source provenance, operations, and generated SQL. Computed results become interactive data tables.

Guides

Getting Started — core concepts, expressions, pipelines
Data IO — CSV, Parquet, Excel, NDJSON, database writes
Transformations — filter, mutate, window functions
Joins & Reshape — join types, ASOF joins, pivots
Distributed Execution — architecture, partitioning, distributed IO
Graph Analytics — PageRank, shortest paths, components
Cheatsheet — quick reference for all verbs

License

Dual-licensed under Apache 2.0 and MIT. See LICENSE-APACHE and LICENSE-MIT.

Name		Name	Last commit message	Last commit date
Latest commit History 122 Commits
.github/workflows		.github/workflows
bench		bench
guides		guides
lib		lib
priv/datasets		priv/datasets
test		test
.credo.exs		.credo.exs
.formatter.exs		.formatter.exs
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.md		README.md
cliff.toml		cliff.toml
mix.exs		mix.exs
mix.lock		mix.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Dux

Design

Installation

Getting Started

IO

Distributed Execution

Graph Analytics

Livebook

Guides

License

Links

About

Licenses found

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Dux

Design

Installation

Getting Started

IO

Distributed Execution

Graph Analytics

Livebook

Guides

License

Links

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages