
update: support new Llama API + assess OpenGVLab/OmniQuant#113#114

Open
Tfloow wants to merge 3 commits into OpenGVLab:main from Tfloow:llama-and-act-update

Conversation


@Tfloow Tfloow commented Apr 11, 2026

See #113 for the issue.

This PR also adds support for the new Llama API. Work in progress; I need some code review to make sure I don't break anything else in OmniQuant.


Tfloow commented Apr 11, 2026

Also, I now pass `weights_only=False` to `torch.load` so that the cached testloader/dataloader can be loaded properly.
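A minimal sketch of why the flag is needed (the cache filename and dict layout below are illustrative, not OmniQuant's actual ones): the cached loaders are pickled Python objects rather than plain tensors, and newer PyTorch defaults `torch.load(weights_only=True)`, which rejects arbitrary pickled objects.

```python
import torch

# The calibration cache holds pickled Python objects (lists, dicts, the
# dataloader itself), not just tensors. With PyTorch's newer default of
# weights_only=True, torch.load refuses such objects; weights_only=False
# restores the old behavior, which is fine here because the file was
# written locally by the same script.
cache = {"samples": [torch.zeros(2, 4)], "meta": {"nsamples": 1}}
torch.save(cache, "dataloader_cache.pt")

loaded = torch.load("dataloader_cache.pt", weights_only=False)
print(loaded["meta"]["nsamples"])  # 1
```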


Tfloow commented Apr 11, 2026

Benchmark results:

python main.py --model meta-llama/Llama-3.2-1B --epochs 0 --output_dir ./log --eval_ppl --wbits 4 --abits 16 --group_size 128 --lwc 

On Llama-3.2-1B with WikiText-2 (epochs 0, lwc):

  • Previous version: PPL = 11.66
  • My proposition: PPL = 11.56

Roughly a 1% PPL reduction from removing the unnecessary quantization of activations when --abits 16 is set.
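The idea can be sketched as follows (the function names are illustrative, not OmniQuant's actual API): when `--abits 16` means full-precision activations, the activation fake-quant step should be skipped entirely rather than applied with a 16-bit grid, since snapping to any grid still perturbs values slightly and wastes compute.

```python
def fake_quant(x, bits, max_val=1.0):
    """Toy symmetric fake-quantization of a single float onto a uniform grid."""
    levels = 2 ** (bits - 1) - 1      # e.g. 7 positive levels for 4 bits
    scale = max_val / levels
    return round(x / scale) * scale

def maybe_quant_activation(x, abits):
    # abits >= 16 is treated as "keep activations in full precision":
    # pass the value through untouched instead of fake-quantizing it.
    if abits >= 16:
        return x
    return fake_quant(x, abits)

print(maybe_quant_activation(0.3337, 16))  # 0.3337 (unchanged)
print(maybe_quant_activation(0.3337, 4))   # snapped onto the 4-bit grid
```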
