Add num_workers to load()#35

Draft
hagenw wants to merge 5 commits into main from get-num-workers

Conversation

@hagenw
Member

@hagenw hagenw commented Oct 24, 2025

Adds num_workers to audmodel.load() to speed up model loading from backends that support downloads using multiple workers.

This requires a development version of audbackend and audeer at the moment.

Benchmarks

All benchmarks use the model 7289b57d-1.0.0 (4.2 GB).

| num_workers | Before | After | After + audeering/audeer#186 |
|---|---|---|---|
| 1 | | | 0:02:13.382400 |
| 2 | | | 0:02:06.741217 |
| 3 | | | 0:02:00.317035 |
| 4 | | | 0:02:00.812837 |
| 5 | | | 0:02:00.533920 |
| 10 | | | 0:02:00.036849 |
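
To put the timings listed above into perspective, the following sketch converts them to seconds and reports the time saved relative to a single worker. `parse_elapsed` is a helper introduced here for illustration; the values are copied from the table. Going from 1 to 10 workers saves roughly 13 s, or about 10% of the load time.

```python
from datetime import timedelta


def parse_elapsed(text: str) -> timedelta:
    """Parse an 'H:MM:SS.ffffff' string as produced by str(timedelta)."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


# Average load times copied from the benchmark table
timings = {
    1: "0:02:13.382400",
    2: "0:02:06.741217",
    3: "0:02:00.317035",
    4: "0:02:00.812837",
    5: "0:02:00.533920",
    10: "0:02:00.036849",
}

baseline = parse_elapsed(timings[1]).total_seconds()
for num_workers, text in timings.items():
    seconds = parse_elapsed(text).total_seconds()
    saved = baseline - seconds
    # Report absolute and relative savings compared to one worker
    print(f"{num_workers:>2} workers: {seconds:7.2f} s, saved {saved:5.2f} s ({saved / baseline:.1%})")
```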

I ran the benchmark with

$ uv run --python 3.12 benchmark-audmodel.py
benchmark-audmodel.py
# /// script
# dependencies = [
#   "audmodel",
#   "numpy",
#   "pandas",
#   "tabulate",
# ]
# [tool.uv.sources]
# audmodel = { path = ".", editable = true }
# ///
import datetime
import tempfile
import time

import numpy as np
import pandas as pd

import audeer
import audmodel


def main():
    model_id = "7289b57d-1.0.0"
    num_iter = 10

    ds = []
    for num_workers in audeer.progress_bar([1, 2, 3, 4, 5, 10]):
        elapsed = []
        for _ in range(num_iter):
            # Use a fresh cache folder per run, so the model is always downloaded
            with tempfile.TemporaryDirectory() as cache_root:
                # perf_counter() is monotonic and intended for measuring durations
                t = time.perf_counter()
                audmodel.load(
                    model_id,
                    cache_root=cache_root,
                    num_workers=num_workers,
                    verbose=False,
                )
                elapsed.append(time.perf_counter() - t)

        # Store mean and standard deviation over all iterations
        ds.append(
            {
                "num_workers": num_workers,
                "num_iter": num_iter,
                "elapsed(avg)": str(datetime.timedelta(seconds=np.mean(elapsed))),
                "elapsed(std)": str(datetime.timedelta(seconds=np.std(elapsed))),
            }
        )

    df = pd.DataFrame(ds)
    df.to_csv("results.csv", index=False)
    df.to_markdown("results.md", index=False)


if __name__ == "__main__":
    main()

Performance when using the single-threaded implementations of audeer and audbackend:

| num_workers | num_iter | elapsed(avg) | elapsed(std) |
|---|---|---|---|
| 1 | 10 | 0:02:17.690002 | 0:00:05.033113 |

When using multiple workers only in audbackend, performance improves only slightly. So the archive extraction is most likely the costly step.

| num_workers | num_iter | elapsed(avg) | elapsed(std) |
|---|---|---|---|
| 1 | 10 | 0:02:23.903513 | 0:00:05.744711 |
| 2 | 10 | 0:02:14.147027 | 0:00:07.193578 |
| 3 | 10 | 0:02:09.759947 | 0:00:00.829281 |
| 4 | 10 | 0:02:10.224645 | 0:00:00.940978 |
| 5 | 10 | 0:02:12.284520 | 0:00:03.940332 |
| 10 | 10 | 0:02:11.610993 | 0:00:01.676725 |
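
To check whether extraction really dominates, the two phases could be timed separately. The following is a minimal sketch, not the audmodel implementation: `make_archive` is a stand-in for the download step, the archive is assumed to be a ZIP file, and all function names are hypothetical.

```python
import os
import tempfile
import time
import zipfile


def timed(func, *args):
    """Return (result, seconds) for a single call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def make_archive(path: str, size_mb: int = 4) -> str:
    """Create a ZIP archive with one large file (stand-in for a download)."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("model.bin", os.urandom(size_mb * 1024 * 1024))
    return path


def extract_archive(path: str, dst: str) -> list[str]:
    """Extract all members of the archive and return their names."""
    with zipfile.ZipFile(path) as zf:
        zf.extractall(dst)
        return zf.namelist()


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "model.zip")
    # Time both phases independently to see which one dominates
    _, t_fetch = timed(make_archive, archive)
    members, t_extract = timed(extract_archive, archive, tmp)
    print(f"fetch:   {t_fetch:.3f} s")
    print(f"extract: {t_extract:.3f} s for {members}")
```

Wrapping the real download and extraction calls in `timed()` would show directly how the two minutes split between the phases.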

When using multiple workers in both audbackend and audeer, performance again improves only slightly. I think the biggest problem is that we store the model in a single big file. Using multiple workers during extraction of the archive is only efficient if the model is stored in several smaller chunks.

| num_workers | num_iter | elapsed(avg) | elapsed(std) |
|---|---|---|---|
| 1 | 10 | 0:02:13.382400 | 0:00:03.332782 |
| 2 | 10 | 0:02:06.741217 | 0:00:00.941959 |
| 3 | 10 | 0:02:00.317035 | 0:00:00.878243 |
| 4 | 10 | 0:02:00.812837 | 0:00:01.087174 |
| 5 | 10 | 0:02:00.533920 | 0:00:01.224509 |
| 10 | 10 | 0:02:00.036849 | 0:00:01.653377 |
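
The single-file limitation can be illustrated with a small sketch that extracts archive members in parallel. It assumes a ZIP archive and uses hypothetical helper names; it is not how audeer implements extraction. The point is that the number of parallel tasks is bounded by the number of members, so an archive with one big file yields exactly one task regardless of `num_workers`.

```python
import os
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor


def extract_member(archive: str, member: str, dst: str) -> str:
    # Each task opens its own handle to avoid sharing one ZipFile between threads
    with zipfile.ZipFile(archive) as zf:
        return zf.extract(member, dst)


def extract_parallel(archive: str, dst: str, num_workers: int) -> list[str]:
    """Extract every file member in its own task and return the extracted paths."""
    os.makedirs(dst, exist_ok=True)  # create the target once, before threads start
    with zipfile.ZipFile(archive) as zf:
        members = [m for m in zf.namelist() if not m.endswith("/")]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda m: extract_member(archive, m, dst), members))


with tempfile.TemporaryDirectory() as tmp:
    # Archive with a single member, like the current model archive
    single = os.path.join(tmp, "single.zip")
    with zipfile.ZipFile(single, "w") as zf:
        zf.writestr("model.bin", b"\0" * 4096)
    # Archive with the same amount of data split into four members
    chunked = os.path.join(tmp, "chunked.zip")
    with zipfile.ZipFile(chunked, "w") as zf:
        for n in range(4):
            zf.writestr(f"model.part{n}", b"\0" * 1024)

    single_files = extract_parallel(single, os.path.join(tmp, "a"), num_workers=4)
    chunked_files = extract_parallel(chunked, os.path.join(tmp, "b"), num_workers=4)
    print(len(single_files), "task(s) for the single-file archive")
    print(len(chunked_files), "task(s) for the chunked archive")
```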

Summary by Sourcery

Introduce a num_workers parameter to the model loading API and backend to enable parallel downloads, and update the audbackend dependency to a development branch that supports multi-worker downloads.

New Features:

  • Add num_workers argument to audmodel.load() to configure parallel download jobs
  • Propagate num_workers through get_archive and backend download functions for concurrency support

Build:

  • Update audbackend dependency to the workers-download development branch for multi-worker download support

@sourcery-ai
Contributor

sourcery-ai bot commented Oct 24, 2025

Reviewer's Guide

Introduces a num_workers parameter to the high-level audmodel.load API and propagates it through the backend download logic, enabling parallel model downloads via a development version of audbackend.

Sequence diagram for model loading with num_workers parameter

sequenceDiagram
    participant User
    participant API as audmodel.load()
    participant Backend as get_archive()
    User->>API: load(uid, cache_root, num_workers, verbose)
    API->>Backend: get_archive(short_id, version, cache_root, num_workers, verbose)
    Backend-->>API: returns archive path
    API-->>User: returns archive path

Class diagram for updated load and get_archive functions

classDiagram
    class api {
        +load(uid: str, cache_root: str | None = None, num_workers: int = 1, verbose: bool = False) str
    }
    class backend {
        +get_archive(short_id: str, version: str, cache_root: str, num_workers: int, verbose: bool) str
    }
    api --> backend: calls get_archive
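
The call chain from the diagrams can be sketched with stand-in functions. The signatures follow the class diagram above; the function bodies are assumptions for illustration only. In particular, splitting the uid at the first hyphen and the cache folder layout are guesses, and the real `get_archive` would download and extract the model instead of only creating a folder.

```python
from __future__ import annotations

import os
import tempfile

calls = []  # records what the stand-in backend receives


def get_archive(short_id: str, version: str, cache_root: str, num_workers: int, verbose: bool) -> str:
    """Stand-in for the backend layer: a real version would download and extract here."""
    calls.append({"short_id": short_id, "version": version, "num_workers": num_workers})
    root = os.path.join(cache_root, short_id, version)
    os.makedirs(root, exist_ok=True)
    return root


def load(uid: str, *, cache_root: str | None = None, num_workers: int = 1, verbose: bool = False) -> str:
    """Split the uid into short ID and version, then forward num_workers unchanged."""
    short_id, version = uid.split("-", 1)  # assumed uid format: <short_id>-<version>
    cache_root = cache_root or tempfile.gettempdir()
    return get_archive(short_id, version, cache_root, num_workers, verbose)


with tempfile.TemporaryDirectory() as tmp:
    root = load("7289b57d-1.0.0", cache_root=tmp, num_workers=4)
    print(os.path.relpath(root, tmp), calls[-1])
```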

File-Level Changes

| Change | Details | Files |
|---|---|---|
| Expose num_workers in load API | • Add num_workers argument with default<br>• Extend docstring to describe the new parameter<br>• Pass num_workers to backend call | audmodel/core/api.py |
| Support parallel downloads in backend | • Add num_workers parameter to get_archive signature<br>• Propagate num_workers into internal download invocation | audmodel/core/backend.py |
| Pin audbackend to workers-enabled dev branch | • Update audbackend dependency to GitHub URL with workers-download branch | pyproject.toml |

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `audmodel/core/api.py:277` </location>
<code_context>
     uid: str,
     *,
     cache_root: str | None = None,
+    num_workers: int = 1,
     verbose: bool = False,
 ) -> str:
</code_context>

<issue_to_address>
**suggestion:** Consider validating num_workers for positive integer values.

A check for num_workers > 0 will help prevent errors from invalid input.
</issue_to_address>

Comment thread audmodel/core/api.py
uid: str,
*,
cache_root: str | None = None,
num_workers: int = 1,
suggestion: Consider validating num_workers for positive integer values.

A check for num_workers > 0 will help prevent errors from invalid input.
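
A minimal sketch of such a check, assuming it would run at the start of `load()` before anything is downloaded. `check_num_workers` is a hypothetical helper, not part of audmodel, and the error messages are illustrative.

```python
def check_num_workers(num_workers: int) -> int:
    """Return num_workers unchanged if it is a positive integer, otherwise raise."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(num_workers, bool) or not isinstance(num_workers, int):
        raise TypeError(f"num_workers must be an int, got {type(num_workers).__name__}")
    if num_workers < 1:
        raise ValueError(f"num_workers must be >= 1, got {num_workers}")
    return num_workers


for value in (1, 4, 0, -2, 2.5):
    try:
        print(value, "->", check_num_workers(value))
    except (TypeError, ValueError) as error:
        print(value, "-> rejected:", error)
```

Failing early keeps the error close to the caller instead of surfacing somewhere inside the download code.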

@codecov

codecov bot commented Oct 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.0%. Comparing base (ded8bec) to head (b45a5dc).

Additional details and impacted files
| Files with missing lines | Coverage Δ |
|---|---|
| audmodel/core/api.py | 100.0% <100.0%> (ø) |
| audmodel/core/backend.py | 100.0% <ø> (ø) |

@hagenw
Member Author

hagenw commented Jan 7, 2026

> Using multiple workers during extraction of the archive is only efficient if the model is stored in several smaller chunks.

Maybe we should extend audmodel.publish() to split up large files?
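
As a rough idea of what splitting could look like, here is a sketch that cuts a file into fixed-size parts and joins them again. `split_file` and `join_files` are hypothetical helpers, the part naming scheme is an assumption, and nothing here reflects an existing `audmodel.publish()` option.

```python
import os
import tempfile


def split_file(path: str, chunk_size: int) -> list[str]:
    """Write consecutive chunks of path to numbered part files and return their paths."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            part = f"{path}.part{index:04d}"
            with open(part, "wb") as dst:
                dst.write(data)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: list[str], path: str) -> str:
    """Concatenate the part files in the given order into path."""
    with open(path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                dst.write(src.read())
    return path


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "model.bin")
    payload = os.urandom(10_000)
    with open(original, "wb") as fp:
        fp.write(payload)

    parts = split_file(original, chunk_size=4096)
    restored = join_files(parts, os.path.join(tmp, "restored.bin"))
    with open(restored, "rb") as fp:
        round_trip_ok = fp.read() == payload
    print(len(parts), "parts, round trip ok:", round_trip_ok)
```

Each part could then be archived, downloaded, and extracted by a separate worker, with the join step running once all parts are available.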
