Quantization plays a crucial role in deploying Large Language Models (LLMs) in resource-constrained environments. However, the presence of outlier features significantly hinders low-bit quantization.
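To make the effect of outliers concrete, here is a minimal sketch, assuming plain symmetric per-tensor absmax quantization to 4 bits. This is a generic baseline chosen for illustration, not a method taken from the text; the function names, the Gaussian test data, and the outlier value of 60.0 are all hypothetical.

```python
import random

def quantize_absmax(values, bits=4):
    """Symmetric per-tensor quantization: the scale is set by the largest magnitude."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit signed integers
    scale = max(abs(v) for v in values) / qmax
    # Round onto the integer grid, then map back to floats (dequantize).
    return [round(v / scale) * scale for v in values]

def mean_abs_error(values, bits=4):
    """Average absolute difference between the values and their dequantized form."""
    deq = quantize_absmax(values, bits)
    return sum(abs(a - b) for a, b in zip(values, deq)) / len(values)

random.seed(0)
normal = [random.gauss(0.0, 1.0) for _ in range(1024)]
with_outlier = normal[:-1] + [60.0]         # one hypothetical outlier feature

err_normal = mean_abs_error(normal)
err_outlier = mean_abs_error(with_outlier)
print(f"mean abs error, no outlier:   {err_normal:.3f}")
print(f"mean abs error, with outlier: {err_outlier:.3f}")
```

In this sketch a single large value stretches the scale so far that most ordinary values round to zero, which is why the error rises once the outlier is present.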
usage: run.py [-h] [--dataset DATASET] [--root ROOT] [--code-length CODE_LENGTH]
              [--max-iter MAX_ITER] [--topk TOPK] [--gpu GPU]

ITQ_PyTorch

optional arguments:
  -h ...
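A usage line of this shape is what Python's `argparse` prints. The following is a sketch of a parser that would accept the same flags; only the flag names, the program name, and the description come from the usage text. The argument types, the absence of defaults, and the `build_parser` helper are assumptions made for illustration, and the real `run.py` may differ.

```python
import argparse

def build_parser():
    """Hypothetical reconstruction of a parser matching the usage text."""
    parser = argparse.ArgumentParser(prog="run.py", description="ITQ_PyTorch")
    # Flag names are taken from the usage line; the types are assumed.
    parser.add_argument("--dataset", type=str)
    parser.add_argument("--root", type=str)
    parser.add_argument("--code-length", type=int)
    parser.add_argument("--max-iter", type=int)
    parser.add_argument("--topk", type=int)
    parser.add_argument("--gpu", type=int)
    return parser

# Parse a sample command line; the values here are placeholders.
args = build_parser().parse_args(["--code-length", "32", "--topk", "100"])
print(args.code_length, args.topk)
```

Note that `argparse` converts dashes in flag names to underscores, so `--code-length` is read back as `args.code_length`.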