Hacker News
Delayed Tensor Parallelism for Faster Transformer Inference (kog.ai)
2 points by matt_d 12 days ago
3000 tokens/sec LLM playground (kog.ai)
6 points by rashkov 12 days ago | 3 comments
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request (kog.ai)
219 points by NicoConstant 12 days ago | 97 comments
Real-time LLM Inference on Standard GPUs (3k tokens/s per request) (kog.ai)
7 points by morgangiraud 13 days ago
