Hacker News
Delayed Tensor Parallelism for Faster Transformer Inference (kog.ai)
2 points by matt_d 12 days ago | past | discuss
3000 tokens/sec LLM playground (kog.ai)
6 points by rashkov 12 days ago | past | 3 comments
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request (kog.ai)
219 points by NicoConstant 12 days ago | past | 97 comments
Real-time LLM Inference on Standard GPUs (3k tokens/s per request) (kog.ai)
7 points by morgangiraud 13 days ago | past | discuss