Submissions from neuralmagic.com

		Enhancing DeepSeek Models with MLA and FP8 Optimizations in VLLM (neuralmagic.com)
		2 points by hochmartinez on Feb 24, 2025
		Multimodal Model Quantization Support Through LLM Compressor by Neural Magic (neuralmagic.com)
		1 point by BUFU on Feb 17, 2025
		What happens if we remove 50 percent of Llama? (neuralmagic.com)
		231 points by BUFU on Nov 26, 2024 | 132 comments
		We Ran Over Half a Million Evaluations on Quantized LLMs (neuralmagic.com)
		12 points by eldar_ciki on Oct 18, 2024 | 2 comments
		Pushing the Boundaries of Mixed-Precision LLM Inference with Marlin (neuralmagic.com)
		2 points by mwitiderrick on June 11, 2024
		Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse (neuralmagic.com)
		238 points by mwitiderrick on Nov 23, 2023 | 26 comments
		Build Scalable NLP and Computer Vision Pipelines with DeepSparse (neuralmagic.com)
		1 point by mwitiderrick on June 8, 2023
		Achieving 1,000X CPU Performance Boost with Sparse Models in MLPerf (neuralmagic.com)
		1 point by NM_Ricky on April 5, 2023 | 1 comment
		SparseGPT: Remove 100B Parameters for Free (neuralmagic.com)
		3 points by homarp on March 24, 2023 | 1 comment
		SparseGPT: Remove 100B Parameters for Free (neuralmagic.com)
		2 points by todsacerdoti on March 24, 2023
		Sparsify Image Classification Models Faster with SparseML and Deep Lake (neuralmagic.com)
		1 point by mwitiderrick on March 16, 2023
		YOLOv8 Detection 10x Faster with DeepSparse (neuralmagic.com)
		1 point by mwitiderrick on Jan 19, 2023
		Image Segmentation: Your Ultimate Guide to Easy Deployment and Fast Inferencing (neuralmagic.com)
		2 points by mwitiderrick on Jan 5, 2023 | 2 comments
		Search Documents Quickly with Extractive Question Answering (neuralmagic.com)
		1 point by mwitiderrick on Dec 15, 2022 | 1 comment
		Accelerate Customer Review Classification with Sparse Transformers (neuralmagic.com)
		1 point by mwitiderrick on Nov 22, 2022 | 1 comment
		Neural Network inference on commodity CPUs using sparsity (neuralmagic.com)
		2 points by atylerrice on Sept 21, 2022 | 3 comments
		Using compound sparsification for faster BERT on CPUs with better accuracy (neuralmagic.com)
		4 points by szpcela on Sept 24, 2021
		YOLOv5 on CPUs: Sparsifying to Achieve GPU-Level Performance (neuralmagic.com)
		121 points by T-A on Sept 10, 2021 | 53 comments
		Show HN: YOLOv3 – Pruning and Quantizing to Improve Object Detection Performance (neuralmagic.com)
		4 points by markurtz on June 23, 2021
		A Software Architecture for the Future of ML (neuralmagic.com)
		2 points by beefman on May 29, 2021
