Increase lookup speed and decrease binary size by PeterReid · Pull Request #13 · unicode-rs/unicode-xid

PeterReid · 2019-09-05T23:52:15Z

This patch changes the is_xid_start and is_xid_continue checks to use a list of bits instead of a binary search for some lower part of the unicode range. The code generating script chooses the cutoff for switching strategies to minimize binary size. For my benchmark on my machine, it reduces the runtime from 2.2 seconds to 1.3 seconds and decreases the binary size (release mode) by 4096 bytes. (I would like to get a more precise look at the effect on size, but cargo-bloat is not helping me for some reason.)

I used the following benchmark for this:

fn main() {
    let t0 = Instant::now();
    let mut starts = 0;
    for _ in 0..10000 {
        for i in 1..0x3000 {
            if let Some(c) = char::from_u32(i) {
                if UnicodeXID::is_xid_start(c) {
                    starts += 1;
                }
            }
        }
    }
    let tf = Instant::now();
    println!("{} {:?}", starts, tf-t0);
}

matklad · 2019-09-06T08:13:59Z

@PeterReid this sounds cool!

Have you seen https://github.com/BurntSushi/ucd-generate? I think it makes sense to replace our python generation script with that. So perhaps it’s best to start with a PR against that crate? Note that ucd-generate has three different strategies for generating the tables, so it makes sense to benchmark against those as well.

matklad · 2019-09-06T08:16:21Z

For performance, it also might make sense to fast-path ASCII at the call site: https://github.com/rust-lang/rust/blob/6b5f9b2e973e438fc1726a2d164d046acd80b170/src/librustc_lexer/src/lib.rs#L153

PeterReid · 2019-09-28T02:26:38Z

I think you are right that the fast-path for ASCII would be another optimization. The rust compiler already has a fast path for that, so it would slow things down in that case unless the compiler was clever enough to see the redundancy across crates: https://github.com/rust-lang/rust/blob/618768492f0c731fcb770dc2d178abe840846419/src/librustc_lexer/src/lib.rs#L146

Redoing this pull request by rewriting ucd-generate is an interesting idea, but that is up to someone else.

See rust-lang/rust's `src/librustc_lexer/src/lib.rs` Idea came from unicode-rs#13

This is based on a one-off benchmark from unicode-rs#13

Increase lookup speed and decrease binary size

143702d

adligit2 approved these changes Nov 28, 2019

View reviewed changes

epage pushed a commit to epage/unicode-xid that referenced this pull request Apr 28, 2021

Add ASCII fast path from rust-lang

122b387

See rust-lang/rust's `src/librustc_lexer/src/lib.rs` Idea came from unicode-rs#13

epage mentioned this pull request Apr 28, 2021

Add ASCII fast path from rust-lang #20

Merged

epage pushed a commit to epage/unicode-xid that referenced this pull request Apr 28, 2021

Add basic benchmark

5db1f9e

This is based on a one-off benchmark from unicode-rs#13

epage mentioned this pull request Apr 28, 2021

Add basic benchmark #21

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Increase lookup speed and decrease binary size#13

Increase lookup speed and decrease binary size#13
PeterReid wants to merge 1 commit intounicode-rs:masterfrom
PeterReid:master

PeterReid commented Sep 5, 2019

Uh oh!

matklad commented Sep 6, 2019

Uh oh!

matklad commented Sep 6, 2019

Uh oh!

PeterReid commented Sep 28, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

PeterReid commented Sep 5, 2019

Uh oh!

matklad commented Sep 6, 2019

Uh oh!

matklad commented Sep 6, 2019

Uh oh!

PeterReid commented Sep 28, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants