Add num_workers to load() by hagenw · Pull Request #35 · audeering/audmodel

hagenw · 2025-10-24T13:50:49Z

Adds num_workers to audmodel.load() to speed up model loading from backends that support downloads using multiple workers.

This requires a development version of audbackend and audeer at the moment.

Benchmarks

All benchmarks use the model 7289b57d-1.0.0 (4.2 GB).

num_workers	Before	After	After + audeering/audeer#186
1			0:02:13.382400
2			0:02:06.741217
3			0:02:00.317035
4			0:02:00.812837
5			0:02:00.533920
10			0:02:00.036849
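
To put the last column into perspective, the relative saving over a single worker can be computed directly from the averages. The following is only a sketch on the numbers listed above; `parse_elapsed` is a hypothetical helper and not part of the benchmark script.

```python
import datetime


def parse_elapsed(text: str) -> float:
    """Convert an 'H:MM:SS.ffffff' string, as written by the benchmark, to seconds."""
    hours, minutes, seconds = text.split(":")
    delta = datetime.timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=float(seconds),
    )
    return delta.total_seconds()


# Average load times copied from the last column of the table above.
elapsed = {
    1: "0:02:13.382400",
    2: "0:02:06.741217",
    3: "0:02:00.317035",
    4: "0:02:00.812837",
    5: "0:02:00.533920",
    10: "0:02:00.036849",
}

baseline = parse_elapsed(elapsed[1])
savings = {}
for num_workers, text in elapsed.items():
    seconds = parse_elapsed(text)
    # Relative saving in percent compared to a single worker.
    savings[num_workers] = 100 * (baseline - seconds) / baseline
    print(f"{num_workers:>2} workers: {seconds:7.2f} s, saving {savings[num_workers]:4.1f} %")
```

With these averages the saving levels off at roughly nine to ten percent from three workers onwards.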

I ran the benchmark with

$ uv run --python 3.12 benchmark-audmodel.py

benchmark-audmodel.py

# /// script
# dependencies = [
#   "audmodel",
#   "numpy",
#   "pandas",
#   "tabulate",
# ]
# [tool.uv.sources]
# audmodel = { path = ".", editable = true }
# ///
import datetime
import tempfile
import time

import numpy as np
import pandas as pd

import audeer
import audmodel


def main():

    model_id = "7289b57d-1.0.0"
    num_iter = 10

    ds = []
    for num_workers in audeer.progress_bar([1, 2, 3, 4, 5, 10]):
        elapsed = []
        for _ in range(num_iter):
            with tempfile.TemporaryDirectory() as cache_root:
                t = time.time()
                audmodel.load(
                    model_id,
                    cache_root=cache_root,
                    num_workers=num_workers,
                    verbose=False,
                )
                elapsed.append(time.time() - t)

        ds.append(
            {
                "num_workers": num_workers,
                "num_iter": num_iter,
                "elapsed(avg)": str(datetime.timedelta(seconds=np.mean(elapsed))),
                "elapsed(std)": str(datetime.timedelta(seconds=np.std(elapsed))),
            }
        )

    df = pd.DataFrame(ds)
    df.to_csv("results.csv", index=False)
    df.to_markdown("results.md", index=False)


if __name__ == "__main__":
    main()

Performance when using the single-threaded implementations of audeer and audbackend:

num_workers	num_iter	elapsed(avg)	elapsed(std)
1	10	0:02:17.690002	0:00:05.033113

When using multiple workers only with audbackend, performance improves only slightly, so the archive extraction is most likely the costly part.

num_workers	num_iter	elapsed(avg)	elapsed(std)
1	10	0:02:23.903513	0:00:05.744711
2	10	0:02:14.147027	0:00:07.193578
3	10	0:02:09.759947	0:00:00.829281
4	10	0:02:10.224645	0:00:00.940978
5	10	0:02:12.284520	0:00:03.940332
10	10	0:02:11.610993	0:00:01.676725

When using multiple workers with both audbackend and audeer, performance again improves only slightly. I think the biggest problem is that we store the model in a single big file. Using multiple workers during extraction of the archive is only efficient if the model would be stored in several smaller chunks.

num_workers	num_iter	elapsed(avg)	elapsed(std)
1	10	0:02:13.382400	0:00:03.332782
2	10	0:02:06.741217	0:00:00.941959
3	10	0:02:00.317035	0:00:00.878243
4	10	0:02:00.812837	0:00:01.087174
5	10	0:02:00.533920	0:00:01.224509
10	10	0:02:00.036849	0:00:01.653377
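
Why a single big file limits the gain can be sketched with the standard library alone. The example below is not how audeer or audbackend implement extraction; `extract_member` and `extract_parallel` are hypothetical helpers that parallelise per archive member, which is an assumption made for illustration. With one task per member, an archive that holds a single large file keeps only one worker busy, no matter how large `num_workers` is.

```python
import concurrent.futures
import os
import tempfile
import zipfile


def extract_member(archive: str, member: str, destination: str) -> str:
    """Extract one member; each call opens its own handle on the archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.extract(member, destination)


def extract_parallel(archive: str, destination: str, num_workers: int = 1) -> list[str]:
    """Extract all files of a ZIP archive, one task per archive member."""
    with zipfile.ZipFile(archive) as zf:
        members = [name for name in zf.namelist() if not name.endswith("/")]
    # Create the target folders up front so that workers never race on mkdir.
    for name in members:
        os.makedirs(os.path.dirname(os.path.join(destination, name)), exist_ok=True)
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(extract_member, archive, name, destination)
            for name in members
        ]
        return [future.result() for future in futures]


def demo() -> list[str]:
    """Build a small archive with four members and extract it with four workers."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "model.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            for index in range(4):
                zf.writestr(f"chunk-{index}.bin", bytes(1024))
        extracted = extract_parallel(archive, os.path.join(tmp, "model"), num_workers=4)
        return sorted(os.path.basename(path) for path in extracted)


print(demo())
```

In this sketch the four members give four independent tasks. The same archive with a single member would give exactly one task, which matches the flat timings above.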

Summary by Sourcery

Introduce a num_workers parameter to the model loading API and backend to enable parallel downloads, and update the audbackend dependency to a development branch that supports multi-worker downloads.

New Features:

Add num_workers argument to audmodel.load() to configure parallel download jobs
Propagate num_workers through get_archive and backend download functions for concurrency support

Build:

Update audbackend dependency to the workers-download development branch for multi-worker download support

sourcery-ai · 2025-10-24T13:50:55Z

Reviewer's Guide

Introduces a num_workers parameter to the high-level audmodel.load API and propagates it through the backend download logic, enabling parallel model downloads by relying on a development version of audbackend.

Sequence diagram for model loading with num_workers parameter

sequenceDiagram
    participant User
    participant API as audmodel.load()
    participant Backend as get_archive()
    User->>API: load(uid, cache_root, num_workers, verbose)
    API->>Backend: get_archive(short_id, version, cache_root, num_workers, verbose)
    Backend-->>API: returns archive path
    API-->>User: returns archive path

Class diagram for updated load and get_archive functions

classDiagram
    class api {
        +load(uid: str, cache_root: str | None = None, num_workers: int = 1, verbose: bool = False) str
    }
    class backend {
        +get_archive(short_id: str, version: str, cache_root: str, num_workers: int, verbose: bool) str
    }
    api --> backend: calls get_archive

File-Level Changes

Change	Details	Files
Expose num_workers in load API	Add `num_workers` argument with default; extend docstring to describe the new parameter; pass `num_workers` to backend call	`audmodel/core/api.py`
Support parallel downloads in backend	Add `num_workers` parameter to `get_archive` signature; propagate `num_workers` into internal download invocation	`audmodel/core/backend.py`
Pin audbackend to workers-enabled dev branch	Update `audbackend` dependency to GitHub URL with `workers-download` branch	`pyproject.toml`

sourcery-ai

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `audmodel/core/api.py:277` </location>
<code_context>
     uid: str,
     *,
     cache_root: str | None = None,
+    num_workers: int = 1,
     verbose: bool = False,
 ) -> str:
</code_context>

<issue_to_address>
**suggestion:** Consider validating num_workers for positive integer values.

A check for num_workers > 0 will help prevent errors from invalid input.
</issue_to_address>

sourcery-ai · 2025-10-24T13:51:24Z

    uid: str,
    *,
    cache_root: str | None = None,
+    num_workers: int = 1,


suggestion: Consider validating num_workers for positive integer values.

A check for num_workers > 0 will help prevent errors from invalid input.
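
A minimal sketch of such a check could look as follows. `check_num_workers` is a hypothetical helper, not something in audmodel, and it assumes that only positive integers are valid, as the `num_workers: int = 1` signature suggests.

```python
def check_num_workers(num_workers: int) -> int:
    """Return num_workers unchanged if it is a positive integer, else raise."""
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(num_workers, bool) or not isinstance(num_workers, int):
        raise TypeError(
            f"num_workers must be an integer, got {type(num_workers).__name__}"
        )
    if num_workers < 1:
        raise ValueError(f"num_workers must be greater than 0, got {num_workers}")
    return num_workers


print(check_num_workers(4))
```

Calling it at the top of `load()` would surface invalid input before any download starts, under the assumption that no other value (such as `None`) is meant to be accepted.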

codecov · 2025-10-24T13:51:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.0%. Comparing base (ded8bec) to head (b45a5dc).

Additional details and impacted files

Files with missing lines	Coverage Δ
audmodel/core/api.py	`100.0% <100.0%> (ø)`
audmodel/core/backend.py	`100.0% <ø> (ø)`

hagenw · 2026-01-07T16:25:02Z

> Using multiple workers during extraction of the archive is only efficient if the model would be stored in several smaller chunks.

Maybe we should extend audmodel.publish() to split up large files?
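
What splitting could mean in practice is sketched below. This is not existing audmodel functionality; `split_file` and `join_files` are hypothetical helpers, and the `.partNNNN` naming is an assumption chosen only for the example.

```python
import os
import shutil
import tempfile


def split_file(path: str, chunk_size: int) -> list[str]:
    """Split a file into numbered parts of at most chunk_size bytes each."""
    chunks = []
    with open(path, "rb") as src:
        index = 0
        while True:
            # One part is held in memory at a time, so chunk_size bounds memory use.
            data = src.read(chunk_size)
            if not data:
                break
            chunk_path = f"{path}.part{index:04d}"
            with open(chunk_path, "wb") as dst:
                dst.write(data)
            chunks.append(chunk_path)
            index += 1
    return chunks


def join_files(chunks: list[str], path: str) -> None:
    """Concatenate the parts, in the given order, into a single file."""
    with open(path, "wb") as dst:
        for chunk in chunks:
            with open(chunk, "rb") as src:
                shutil.copyfileobj(src, dst)


def demo() -> tuple[int, bool]:
    """Split a 10-byte file into 4-byte parts and check the round trip."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "model.bin")
        with open(original, "wb") as fp:
            fp.write(b"0123456789")
        chunks = split_file(original, chunk_size=4)
        restored = os.path.join(tmp, "restored.bin")
        join_files(chunks, restored)
        with open(original, "rb") as a, open(restored, "rb") as b:
            return len(chunks), a.read() == b.read()


print(demo())
```

Storing the parts as separate archive members would give the workers independent units to download and extract, at the cost of a join step when the model is loaded.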

sourcery-ai bot reviewed Oct 24, 2025

hagenw mentioned this pull request Oct 24, 2025

Use workers for file download audeering/audbackend#271

Merged

hagenw added 2 commits October 27, 2025 18:25

Add num_workers argument to load()

a461a8d

Depend on audbackend dev branch

5ea114e

hagenw force-pushed the get-num-workers branch from 6b71aca to 5ea114e Compare October 27, 2025 17:25

hagenw added 3 commits November 11, 2025 12:47

Add dependency on audeer PR

24c34c1

Fix URL

b45a5dc

Depend on audbackend[all]>=2.3.0

3fef2a7

hagenw marked this pull request as draft January 14, 2026 16:27

hagenw mentioned this pull request Jan 16, 2026

Extract ZIP archives while downloading audeering/audbackend#279

Merged

hagenw wants to merge 5 commits into main from get-num-workers
