
Very high RAM usage #127

@niko256


Hey! I've been using tdf recently and noticed some issues with memory usage that are actually quite critical.

I started investigating this because my system completely froze and became unresponsive a few times, forcing a hard reboot. This happened specifically after using 'zoom in' mode (I assume this triggered a heavy re-render that filled up my RAM). I couldn't find any existing issues about this, so I initially thought it was a problem with my setup. I checked on another machine as well and got similar results (Arch + Ghostty in both cases).

I then ran some tests with the same ~1000-page textbook and found that memory usage is extremely high.

Here are the measurements (RSS sampled every second):

    watch -n 1 'ps -C tdf -o rss= | awk "{s+=\$1} END{print s/1024 \" MB\"}"'

1) Default run (immediately after opening):

    Every 1.0s: ps -C tdf -o r… archlinux: 05:55:28 PM
    in 0.007s (0)
    8897.39 MB

Tbh, 8.9 GB right after opening a document looks very unsafe and unpredictable for the user.

2) Run with the -p 2 flag: memory starts low, but after scrolling through about 200 pages:

    Every 1.0s: ps -C tdf -o r… archlinux: 05:56:54 PM
    in 0.007s (0)
    1153.61 MB

1.15 GB still feels like way too much for just viewing a portion of a document.
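Dividing these readings by the approximate page counts gives a rough per-page cost: about 8.9 MB/page for the full prerender and about 5.8 MB/page after scrolling with -p 2. The two figures don't match exactly (the real cost presumably depends on render resolution), but both are several MB per rendered page, which fits memory scaling with the number of pages kept around. The snippet below just does that arithmetic and, for scale, computes the size of one uncompressed RGBA bitmap. The 1240 x 1754 px page size (A4 at 150 DPI) and the 4-bytes-per-pixel format are my assumptions, not something I confirmed tdf uses.

```rust
/// Average memory per page in MB, given a total RSS reading and a page count.
fn mb_per_page(total_mb: f64, pages: u32) -> f64 {
    total_mb / f64::from(pages)
}

/// Size in bytes of an uncompressed RGBA8 bitmap (4 bytes per pixel, assumed format).
fn rgba_bytes(width: u64, height: u64) -> u64 {
    width * height * 4
}

fn main() {
    // Readings from the two runs above; page counts are approximate.
    println!("full prerender: {:.1} MB/page", mb_per_page(8897.39, 1000)); // 8.9
    println!("-p 2 after scrolling: {:.1} MB/page", mb_per_page(1153.61, 200)); // 5.8
    // Hypothetical page size: A4 at 150 DPI is roughly 1240 x 1754 px.
    let mib = rgba_bytes(1240, 1754) as f64 / (1024.0 * 1024.0);
    println!("one 1240x1754 RGBA page: {:.1} MiB", mib); // 8.3
}
```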

I took a look at the source to see what is going on.

I understand that the issue in the first case is that tdf renders the entire document at once. But my first question is: why is the --prerender behavior set to All by default? I've been thinking about it, but I couldn't come up with a scenario where this is necessary as a default. Maybe I'm missing something. Why not set the default rendering window to 20-30 pages?
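To make the suggestion concrete, here is a rough sketch of how a default window of ~25 pages could be computed around the current page. prerender_window and its radius parameter are hypothetical names for illustration, not tdf's actual API.

```rust
use std::ops::Range;

/// Hypothetical helper: half-open range of page indices to keep rendered,
/// centred on `current` and clamped to `0..total`.
fn prerender_window(current: usize, total: usize, radius: usize) -> Range<usize> {
    if total == 0 {
        return 0..0;
    }
    // Clamp the current page in case it is past the end of the document.
    let current = current.min(total - 1);
    let start = current.saturating_sub(radius);
    // Saturating adds avoid overflow for very large radius values.
    let end = current.saturating_add(radius).saturating_add(1).min(total);
    start..end
}

fn main() {
    // ~1000-page document, 12 pages on either side => at most 25 pages.
    println!("{:?}", prerender_window(0, 1000, 12)); // 0..13
    println!("{:?}", prerender_window(500, 1000, 12)); // 488..513
    println!("{:?}", prerender_window(999, 1000, 12)); // 987..1000
}
```

With a radius of 12 this keeps at most 25 pages rendered regardless of document length, and the window simply slides as the user scrolls.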

More generally, I noticed that tdf keeps rendered pages in a Vec for the whole runtime. It seems like when people read, they operate within a certain "context window", so there is no need to keep all rendered pages in memory for the entire runtime of the app.

Maybe we could solve this linear memory growth by introducing a page eviction policy at the display level, using something like a sliding-window or LRU cache approach. This way, we could drop old pages from memory and re-render them on demand.
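A minimal sketch of the LRU variant, using only std. PageCache and its methods are hypothetical, and T stands in for whatever tdf actually stores per rendered page. On a miss, the caller would re-render the page and insert it, which evicts the least recently used page once the cache is full.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical LRU cache of rendered pages, keyed by page index.
struct PageCache<T> {
    capacity: usize,
    pages: HashMap<usize, T>,
    // Page indices ordered from least to most recently used.
    order: VecDeque<usize>,
}

impl<T> PageCache<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { capacity, pages: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `page` to the most-recently-used end (O(capacity) scan).
    fn touch(&mut self, page: usize) {
        if let Some(pos) = self.order.iter().position(|&p| p == page) {
            self.order.remove(pos);
        }
        self.order.push_back(page);
    }

    /// Return a cached page and mark it as recently used; None means re-render.
    fn get(&mut self, page: usize) -> Option<&T> {
        if self.pages.contains_key(&page) {
            self.touch(page);
            self.pages.get(&page)
        } else {
            None
        }
    }

    /// Insert a rendered page, evicting the least recently used one if full.
    /// Returns the index of the evicted page, if any.
    fn insert(&mut self, page: usize, rendered: T) -> Option<usize> {
        let mut evicted = None;
        if !self.pages.contains_key(&page) && self.pages.len() == self.capacity {
            if let Some(old) = self.order.pop_front() {
                self.pages.remove(&old);
                evicted = Some(old);
            }
        }
        self.pages.insert(page, rendered);
        self.touch(page);
        evicted
    }
}

fn main() {
    let mut cache: PageCache<Vec<u8>> = PageCache::new(3);
    for page in 0..3 {
        cache.insert(page, vec![0u8; 4]); // placeholder "rendered" data
    }
    let _ = cache.get(0); // page 0 becomes most recently used
    let evicted = cache.insert(3, vec![0u8; 4]);
    println!("evicted: {:?}", evicted); // Some(1)
    println!("page 1 cached: {}", cache.get(1).is_some()); // false
}
```

The linear scan over the VecDeque should be negligible for a window of a few dozen pages; a dedicated LRU crate would avoid it, but I kept the sketch dependency-free.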

Please let me know if this sounds reasonable or if I'm missing some context. If you are open to this, I can work on a PR.

Anyway, thanks a lot for tdf! I've been looking for something like this for a long time.
