ClassicConnect "640k ought to be enough for everybody."
LyraNovaHeart Gorts

Joined: 15 Apr 2025 Age: 27 Posts: 48 Location: Los Angeles, California
Posted: Mon May 26, 2025 3:35 am Post subject: Beginner's Guide 2: Quanting Models (LLMs)
Welcome back! Wanna contribute to the community by quanting for others? Don't worry, it's easy!
Onto the guide:
Part 1: Quant Types
Now, you may be saying: Lyra, didn't you already explain quant types in the last post? And if you did read the last post, yes, I did! I know it's repetitive, but trust me, it'll make sense.
Types:
GGUF: GPT-Generated Unified Format. This is THE most common type you'll encounter, since it's the easiest for people to run: it works on CPU, on GPU, or split across both.
EXL2: ExLlamaV2, a quant format based on safetensors, built for very fast inference. This is less common, since it needs a GPU both to make the quant and to run it.
FP8: 8-bit floating point. Format-wise it's closer to FP16, but hardware support starts at NVIDIA's RTX 40 series (Ada Lovelace) and newer.
There are other formats too, and as I learn how to quant those, I'll post them here. Yes, I am learning along with everyone as I go.
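If you're wondering what "quanting" actually does under the hood: it maps full-precision weights down to a small integer range plus a scale factor. Here's a minimal sketch in plain Python, purely illustrative; real quantizers like llama-quantize work block-wise and are much smarter about minimizing error:

```python
def quantize_q8(weights):
    """Symmetric 8-bit quantization: map floats onto ints in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_q8(q, scale):
    """Recover approximate floats from the stored ints and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.04]
q, scale = quantize_q8(weights)
restored = dequantize_q8(q, scale)
# restored is close to weights, but each value is stored in 8 bits
# instead of 16 or 32 -- that's where the size savings come from
```

Lower-bit quants (Q4, Q2, IQ2...) shrink the integer range further, trading a bit of fidelity for a much smaller file.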
Part 2: Actually quanting these formats
Okay, here's the fun part: actually quanting! You'll need a few tools, of course.
Required dependencies:
Python 3.10 minimum, 3.11 recommended. DO NOT USE PYTHON 3.13 OR ANY PYTHON FROM THE MICROSOFT STORE!!! THIS WILL CAUSE CONFLICTS!!!
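A quick way to check what you're actually running before you start. The "WindowsApps" check is my own heuristic for spotting the Microsoft Store install (that's where Windows puts it), not an official flag:

```python
import sys

# Print the interpreter version and its location so you can spot a bad install.
print(sys.version_info[:2], sys.executable)

# The guide wants 3.10 or 3.11 (3.12 may work; 3.13 is known to break things).
ok_version = (3, 10) <= sys.version_info[:2] <= (3, 12)

# Store-installed Python lives under a "WindowsApps" directory.
is_store_python = "WindowsApps" in sys.executable

print("version OK:", ok_version, "| Store Python:", is_store_python)
```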
We'll start with GGUF first, since it's a bit jank.
You will need:
llama.cpp git repo: https://github.com/ggml-org/llama.cpp.git
llama.cpp releases: https://github.com/ggml-org/llama.cpp/releases/tag/b5489
A model to quant (in HF format): any on Huggingface will do, example: https://huggingface.co/Qwen/Qwen3-0.6B
Steps:
Clone the repo and download a release. Make a folder for each.
Download the model you want to quantize, example: | Code: | | huggingface-cli download Qwen/Qwen3-0.6B --local-dir Qwen3-0.6B |
Open a terminal in the folder with the cloned part of the llama.cpp repo.
Run this command. It's mostly the same across different versions apart from the path separators. If needed, just drag the file into the terminal: | Code: | | python convert_hf_to_gguf.py "D:\Models\Qwen3-0.6B" --outfile "D:\Models\Qwen3-0.6B-FP16.gguf" |
Now that you have an FP16 GGUF of the model, switch to the folder containing the release build of llama.cpp.
Run the llama-quantize binary for your OS, passing a quant code like Q6_K (about 6.56 bpw) or Q8_0 (8-bit). A full list is provided below, along with an example:
Example: | Code: | | .\llama-quantize.exe "D:\Models\Qwen3-0.6B-FP16.gguf" "D:\Models\Qwen3-0.6B-Q8_0.gguf" Q8_0 |
List:
| Code: |
2 or Q4_0
3 or Q4_1
8 or Q5_0
9 or Q5_1
19 or IQ2_XXS
20 or IQ2_XS
28 or IQ2_S
29 or IQ2_M
24 or IQ1_S
31 or IQ1_M
36 or TQ1_0
37 or TQ2_0
10 or Q2_K
21 or Q2_K_S
23 or IQ3_XXS
26 or IQ3_S
27 or IQ3_M
12 or Q3_K
22 or IQ3_XS
11 or Q3_K_S
12 or Q3_K_M
13 or Q3_K_L
25 or IQ4_NL
30 or IQ4_XS
15 or Q4_K
14 or Q4_K_S
15 or Q4_K_M
17 or Q5_K
16 or Q5_K_S
17 or Q5_K_M
18 or Q6_K
7 or Q8_0
1 or F16
32 or BF16
0 or F32
|
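Both steps (convert to FP16, then llama-quantize) are easy to script if you want several quant sizes from one FP16 file. A rough sketch, assuming llama-quantize is on your PATH; the paths and quant list are just examples to swap for your own:

```python
import shutil
import subprocess

def build_quantize_cmd(exe, fp16_gguf, out_gguf, qtype):
    """Assemble the llama-quantize command line for one output quant."""
    return [exe, fp16_gguf, out_gguf, qtype]

def quantize_all(fp16_gguf, qtypes, exe="llama-quantize"):
    """Run llama-quantize once per requested quant type."""
    for qtype in qtypes:
        out = fp16_gguf.replace("-FP16.gguf", f"-{qtype}.gguf")
        cmd = build_quantize_cmd(exe, fp16_gguf, out, qtype)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises if llama-quantize fails

# Only attempt the run if the binary is actually installed.
if shutil.which("llama-quantize"):
    quantize_all(r"D:\Models\Qwen3-0.6B-FP16.gguf", ["Q4_K_M", "Q6_K", "Q8_0"])
```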
EXL2:
This one is a lot simpler, though you need a GPU. Of course, you'll need dependencies; just run this setup from the GitHub repo:
| Code: | git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install -r requirements.txt
pip install . |
Once you have that, the rest is very simple; we'll stick with the same example model. Here's a quick command to get started:
| Code: | | python convert.py -i C:\Users\lg911\Downloads\exllamav2\Qwen_Qwen3_0.6B -o C:\Users\lg911\Downloads\exllamav2\Qwen_Qwen3_0.6B-8.0bpwEXL2 -b 8.00 -hb 8 |
Notes:
b: bits per weight, determines quality; lowest is 2.4, highest is 8.0.
hb: head bits, the precision of the LM head; lowest is 6, highest is 8.
EXL2 quanting can take a long time (an hour or more), so please be patient.
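Since bpw is literally bits per weight, you can estimate the output size before committing to an hour-long quant. A back-of-the-envelope helper (parameter counts are approximate, and real files add a little overhead for the tokenizer and metadata):

```python
def est_size_gb(n_params, bpw):
    """Rough quantized file size: parameters * bits-per-weight, in gigabytes."""
    return n_params * bpw / 8 / 1e9

# A 0.6B-parameter model at 8.0 bpw is roughly 0.6 GB...
print(round(est_size_gb(0.6e9, 8.0), 2))
# ...and at 2.4 bpw roughly 0.18 GB.
print(round(est_size_gb(0.6e9, 2.4), 2))
```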
FP8: Coming soon
_________________
I'm one day closer to being who I wanna be~
Last edited by LyraNovaHeart on Fri Jun 06, 2025 12:33 pm; edited 3 times in total
nick99nack Admin

Joined: 30 Aug 2023 Age: 30 Posts: 171 Location: NJ, USA
Posted: Mon May 26, 2025 4:06 am Post subject:
Good guide, just want to add a couple of things I had to do when trying this the other day on Windows 11.
1. (May or may not be applicable) When installing everything in requirements.txt, I had to install Visual Studio because pip was looking for build tools. However, I later discovered that I was using the wrong version of Python (3.13, which Lyra specifically said NOT to use), so I'm not sure if that's needed on 3.11.
2. Windows 11 hijacks the "python3" command for its own MS Store version, which caused issues with convert_hf_to_gguf.py. To fix this, I edited the first line of the script from | Code: | | #!/usr/bin/env python3 | to | Code: | | #!/usr/bin/env python |.
The script was then able to run normally, and everything else went as planned. I spent way too much time debugging that, messing with venvs and such. Don't waste time like I did.
_________________
If you like browsing without an ad blocker, you might also like getting rid of your virus scanner, and running around with your pants down. --SomeGuy, 2016
smartDark Style by Smartor
Powered by phpBB 2.0.25 CC Mod © 2001, 2002 phpBB Group