SqueezeBits
Bringing NPUs into Production: Our Journey with Intel Gaudi
SqueezeBits has partnered with Intel to make Gaudi NPUs more usable in practice. We optimized LLMs and diffusion models for Gaudi-2 and created yetter, a generative AI API service.
How to Quantize Transformer-based model for TensorRT Deployment
This article presents experimental results from quantizing the Vision Transformer model and its variants with OwLite.
How to Quantize YOLO models with OwLite
This article presents experimental results from quantizing YOLO models with OwLite.
OwLite: No More Compromising on AI Performance After Quantization
Discover how OwLite simplifies AI model optimization with seamless integration and secure architecture.