4-bit quantization: Compresses 16-bit weights to 4 bits, reducing the VRAM needed for the weights by ~75% (e.g., a 65B-parameter model can be loaded in ~35 GB instead of ~130 GB).
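The arithmetic behind these figures can be sketched as follows. The helper name `weight_memory_gb` and its `overhead_bits` parameter are hypothetical, introduced only for illustration. The 0.5 bits per weight of overhead assumes one 32-bit scaling constant stored per block of 64 weights; other block sizes or compressed constants would give a different figure.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float,
                     overhead_bits: float = 0.0) -> float:
    """Approximate storage for model weights, in GB (1 GB = 1e9 bytes).

    overhead_bits is the assumed per-weight cost of quantization
    constants (scaling factors); it is 0 for unquantized weights.
    """
    return n_params * (bits_per_weight + overhead_bits) / 8 / 1e9


n_params = 65e9  # 65B parameters, as in the example above

fp16_gb = weight_memory_gb(n_params, 16)
raw_4bit_gb = weight_memory_gb(n_params, 4)
# Assumption: one 32-bit constant per 64 weights = 0.5 extra bits per weight.
scaled_4bit_gb = weight_memory_gb(n_params, 4, overhead_bits=32 / 64)

print(f"16-bit weights:           {fp16_gb:.1f} GB")
print(f"4-bit weights (raw):      {raw_4bit_gb:.1f} GB")
print(f"4-bit + block constants:  {scaled_4bit_gb:.1f} GB")
print(f"raw saving:               {1 - raw_4bit_gb / fp16_gb:.0%}")
```

Under these assumptions the raw 4-bit weights take 32.5 GB and the version with per-block constants about 36.6 GB, which brackets the ~35 GB quoted above. Activations, adapters, and optimizer state are not counted here.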
If you are looking for the software/machine learning paper, search for "QLoRA" or "4-bit NormalFloat" on arXiv.
4-bit NormalFloat (NF4): An information-theoretically optimal data type for normally distributed weights. It uses 16 quantization levels based on the quantiles of a standard normal distribution.
Neural network weights typically follow an approximately normal distribution centered on zero. NF4 concentrates its 16 "bins" where most weights lie (near zero), which reduces rounding error compared with evenly spaced levels.
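The idea of quantile-based levels can be illustrated with a minimal sketch. This is a simplified equal-probability codebook, not the exact NF4 table shipped in any library: the real data type is constructed so that zero is represented exactly, which this symmetric 16-level version does not do. The function names `quantile_codebook`, `quantize_block`, and `dequantize_block` are hypothetical.

```python
from statistics import NormalDist


def quantile_codebook(k: int = 16) -> list[float]:
    """k levels at the midpoints of k equal-probability bins of N(0, 1),
    rescaled so the outermost levels are -1 and +1."""
    nd = NormalDist()
    levels = [nd.inv_cdf((i + 0.5) / k) for i in range(k)]
    largest = max(abs(v) for v in levels)
    return [v / largest for v in levels]


def quantize_block(weights: list[float], codebook: list[float]):
    """Scale a block by its absolute maximum, then store the index
    (0..15, i.e. 4 bits) of the nearest codebook level per weight."""
    absmax = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    indices = [
        min(range(len(codebook)), key=lambda j: abs(codebook[j] - w / absmax))
        for w in weights
    ]
    return indices, absmax


def dequantize_block(indices: list[int], absmax: float,
                     codebook: list[float]) -> list[float]:
    """Look up each 4-bit index and undo the absmax scaling."""
    return [codebook[i] * absmax for i in indices]


codebook = quantile_codebook()
block = [0.5, -0.25, 0.1, -1.0]
indices, absmax = quantize_block(block, codebook)
restored = dequantize_block(indices, absmax, codebook)

# Levels are packed more tightly near zero than at the edges.
inner_gap = codebook[8] - codebook[7]
outer_gap = codebook[15] - codebook[14]
print(f"gap near zero: {inner_gap:.3f}, gap at the edge: {outer_gap:.3f}")
print("indices:", indices)
print("restored:", [round(v, 3) for v in restored])
```

The spacing printed at the end shows the point made above: the gap between neighboring levels near zero is several times smaller than the gap at the extremes, so the many small weights are rounded more finely than the few large ones.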
Paged optimizers: A feature that handles memory spikes during training by offloading optimizer states to CPU RAM.
Key Technical Details
The term "NF4" is central to the QLoRA paper, which revolutionized how large language models (LLMs) are fine-tuned on consumer hardware.
RNF4 mediates the degradation of the PML-RARα fusion protein.
Recent research (April 2026) has further optimized this by creating Fast NF4 Dequantization Kernels that achieve 2.0–2.2× speedups on NVIDIA GPUs.
Alternative Interpretation