Increase lookup speed and decrease binary size #13
PeterReid wants to merge 1 commit into unicode-rs:master
@PeterReid this sounds cool! Have you seen https://github.com/BurntSushi/ucd-generate? I think it makes sense to replace our Python generation script with that. So perhaps it’s best to start with a PR against that crate? Note that ucd-generate has three different strategies for generating the tables, so it makes sense to benchmark against those as well.
For performance, it also might make sense to fast-path ASCII at the call site: https://github.com/rust-lang/rust/blob/6b5f9b2e973e438fc1726a2d164d046acd80b170/src/librustc_lexer/src/lib.rs#L153
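A call-site ASCII fast path of the kind suggested here could look roughly like the sketch below. It is not copied from the linked rustc source: `table_is_xid_start` is a hypothetical stand-in for the crate's table-backed lookup and covers only a couple of Latin-1 letter ranges, and `is_id_start` is an illustrative name.

```rust
/// Hypothetical stand-in for the table-backed lookup (assumption: the real
/// check consults generated Unicode tables; this toy version only knows
/// U+00C0..=U+00D6 and U+00D8..=U+00F6).
fn table_is_xid_start(c: char) -> bool {
    matches!(c, '\u{C0}'..='\u{D6}' | '\u{D8}'..='\u{F6}')
}

/// Decide whether `c` can start an identifier, answering ASCII input with
/// cheap comparisons and only consulting the table for non-ASCII input.
pub fn is_id_start(c: char) -> bool {
    c.is_ascii_alphabetic()
        || c == '_'
        || (!c.is_ascii() && table_is_xid_start(c))
}
```

With this shape, ASCII characters never reach the table lookup, so any ASCII handling inside the lookup itself would go unused by such a caller.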
I think you are right that a fast path for ASCII would be another optimization. However, the Rust compiler already has its own ASCII fast path, so adding one here would repeat the check and slow that caller down unless the compiler is clever enough to see the redundancy across crates: https://github.com/rust-lang/rust/blob/618768492f0c731fcb770dc2d178abe840846419/src/librustc_lexer/src/lib.rs#L146 Redoing this pull request by rewriting ucd-generate is an interesting idea, but that is up to someone else.
See rust-lang/rust's `src/librustc_lexer/src/lib.rs`. Idea came from unicode-rs#13
This is based on a one-off benchmark from unicode-rs#13
This patch changes the is_xid_start and is_xid_continue checks to use a bitmap (one bit per code point) instead of a binary search for the lower part of the Unicode range. The code-generating script chooses the cutoff between the two strategies to minimize binary size. In my benchmark on my machine, it reduces the runtime from 2.2 seconds to 1.3 seconds and decreases the release-mode binary size by 4096 bytes. (I would like to get a more precise look at the effect on size, but cargo-bloat is not helping me for some reason.)
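To illustrate the hybrid lookup described above, here is a minimal sketch, not the patch's actual generated code. The names `CUTOFF`, `BITMAP`, `RANGES`, and `is_member` are hypothetical, the cutoff of 128 is arbitrary (the patch's script picks it to minimize size), and the tables hold toy data: ASCII letters below the cutoff and a handful of Latin-1 ranges above it.

```rust
use std::cmp::Ordering;

/// Hypothetical cutoff: code points below it are answered from the bitmap.
const CUTOFF: u32 = 128;

/// One bit per code point below CUTOFF; a set bit means "member".
/// Toy data: only 'A'..='Z' and 'a'..='z' are set.
const BITMAP: [u64; 2] = [
    0,                     // U+0000..=U+003F: no members
    0x07FF_FFFE_07FF_FFFE, // U+0040..=U+007F: ASCII letters
];

/// Sorted, non-overlapping inclusive ranges at or above CUTOFF.
/// Toy data: a small illustrative subset, not the full Unicode table.
const RANGES: [(u32, u32); 5] = [
    (0xAA, 0xAA),
    (0xB5, 0xB5),
    (0xBA, 0xBA),
    (0xC0, 0xD6),
    (0xD8, 0xF6),
];

/// Constant-time test: index the word, then the bit inside it.
fn in_bitmap(cp: u32) -> bool {
    let word = BITMAP[(cp / 64) as usize];
    (word >> (cp % 64)) & 1 == 1
}

/// Logarithmic-time test: binary search for a range containing `cp`.
fn in_ranges(cp: u32) -> bool {
    RANGES
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

/// Dispatch on the cutoff: bitmap for low code points, ranges for the rest.
pub fn is_member(c: char) -> bool {
    let cp = c as u32;
    if cp < CUTOFF {
        in_bitmap(cp)
    } else {
        in_ranges(cp)
    }
}
```

The trade-off behind the cutoff is that a bitmap costs one bit per code point regardless of content, while a range table costs a fixed amount per range, so the bitmap tends to pay off where ranges are dense and short.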
I used the following benchmark for this: