The Chinese text happened last night in your main chat agent widget: the cartoon woman professing to be in a town in Brazil with a lemon tree on her cupboard. She claimed it was a test of subtitling, then admitted it wasn't.
BTW, she gives helpful instructions like "/imagine ..." but they only seem to work about 50% of the time; that is, try the same command or variants a few times, and about half of them work. She never did shift out of the Aussie accent, though.
She came up with a remarkably fanciful explanation of why, as a Brazilian, she sounded Aussie, and why /imagine-ing a native accent, which she said would work, didn't...
I was shocked when "/imagine face left turn to the side" did actually work: the agent went into side profile, and it looked precisely as natural as the original front-facing avatar.
All in all, by far the best agent experience I've played with!
So glad you enjoyed it! We've been able to significantly reduce those text hallucinations with a few tricks, but it seems they haven't been fully squashed. The /imagine command only affects the image at the moment, but we'll think about ways to tie it into the personality and voice. Thanks for the feedback!
Thank you! We are considering releasing an open-source version of the model. Somebody will do it soon; it might as well be us. We are mostly concerned about the additional overhead of releasing and then supporting it. So, TBD.
Good question! Software gets democratized so fast that I am sure others will implement similar approaches soon. And, to be clear, some of our "speed upgrades" are pieced together from recent DiT papers. I do think getting everything running on a single GPU at this resolution and speed is totally new (as far as I have seen).
I think people will just copy it, and we just need to keep moving as fast as we can. I do think a bit of a revolution is happening right now in real-time video diffusion models. There have been so many great papers published in that area in the last 6 months. My guess is that many DiT models will be real time within a year.
> I do think getting everything running on a single GPU at this resolution and speed is totally new
Thanks. It seemed to be the case that this was really something new, but HN tends to be circumspect, so I wanted to check. It's an interesting space and I try to stay current, but everything is moving so fast. Still, I was pretty sure I hadn't seen anyone do that. It's a huge achievement to do it first and make it work for real like this! Well done!
One thing that is interesting: LLM pipelines have been highly optimized for speed (since speed is directly related to cost for companies). That is just not true for real-time DiTs. So there is still lots of low-hanging fruit for how we (and others) can make things faster and better.
Curious about the memory bandwidth constraints here. 20B parameters at 20 fps seems like it would saturate the bandwidth of a single GPU unless you're running int4. I assume this requires an H100?
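For what it's worth, here is the back-of-envelope version of that concern. It assumes the weights are streamed from HBM once per denoising step and ignores activation/attention traffic; the step counts and precisions below are illustrative guesses, not anything the authors have confirmed:

```python
# Rough weight-bandwidth estimate for a 20B-parameter DiT at 20 fps.
# Assumption: every parameter is read from memory once per denoising step.

PARAMS = 20e9  # 20B parameters
FPS = 20       # frames generated per second

def weight_bandwidth_gbps(bytes_per_param: float, steps_per_frame: int) -> float:
    """GB/s of weight traffic: params * bytes/param * steps/frame * fps."""
    return PARAMS * bytes_per_param * steps_per_frame * FPS / 1e9

for label, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    for steps in (1, 4):
        gbps = weight_bandwidth_gbps(bpp, steps)
        print(f"{label}, {steps} step(s)/frame: {gbps:,.0f} GB/s")
```

At fp16 with one step per frame that is 800 GB/s, which fits within an H100's roughly 3.3 TB/s of HBM bandwidth but nearly saturates a ~1 TB/s consumer card; fp16 at four steps per frame (~3.2 TB/s) is H100-or-nothing territory, which is roughly where int4 comes in.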
Thank you! Yes, right now we are using Qwen for the LLM. They also released a TTS model, which is supposed to be very fast, but we have not tried it yet.
I wonder how it would come across with the right voice. We're focused on building out the video layer tech, but at the end of the day, the voice is also pretty important for a positive experience.
I wonder if we should make the UI a more common interface (e.g. "the call is ringing") to avoid this confusion?
It's a normal mp4 video that's looping initially (the "welcome message") and then as soon as you send the bot a message, we connect you to a GPU and the call becomes interactive. Connecting to the GPU takes about 10s.
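In case it helps anyone reading along, that lifecycle can be sketched as a tiny state machine (the state and method names here are hypothetical, just to make the flow concrete; the real client is presumably a browser video app):

```python
# Sketch of the call lifecycle: loop a pre-rendered mp4 until the first user
# message, then spend ~10s attaching a GPU, then go live. Names are made up.
from enum import Enum, auto

class CallState(Enum):
    LOOPING_WELCOME_MP4 = auto()  # looping pre-rendered mp4, no GPU attached
    CONNECTING_GPU = auto()       # first user message sent; ~10s cold start
    LIVE = auto()                 # real-time generated video

class AvatarCall:
    def __init__(self) -> None:
        self.state = CallState.LOOPING_WELCOME_MP4

    def on_user_message(self) -> None:
        # The first message is what triggers the GPU connection.
        if self.state is CallState.LOOPING_WELCOME_MP4:
            self.state = CallState.CONNECTING_GPU

    def on_gpu_ready(self) -> None:
        if self.state is CallState.CONNECTING_GPU:
            self.state = CallState.LIVE
```

The key point for users is that the looping video and the interactive call look identical on screen, which is why the handoff is easy to miss.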
Makes sense. The init should take about 10s, but after that it should be real time. TBH, this is probably a common confusion, so thanks for calling it out.
Glad we found somebody who likes it as much as we do! BTW, the biggest thing we are working to improve is response speed. I think we can make that much faster.