Hacker News | rossirpaulo's comments

This is great! We had a similar thought and couldn't agree more with "LLMs prefer producing something rather than nothing." We have been consistently requesting responses in JSON format, which, despite its numerous advantages, sometimes forces an output even when there shouldn't be one. This frequently results in hallucinations. Encouraging NULL returns, for example, is a great way to deal with that.


I've found that this is best dealt with along two axes with constrained options, i.e., request both a string and a boolean, and if you get boolean false you can simply ignore the string. So when the LLM ignores you and prints a string like "This article does not contain mention of sharks", you can discard that easily.

If you tell it "Return what this says about sharks or nothing if it does not mention them", it will mess up.
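A minimal sketch of the string-plus-boolean idea, assuming the model was asked for a JSON object with a boolean `mentions_sharks` and a string `text` (both field names are made up for illustration):

```python
import json
from typing import Optional

def extract_shark_text(raw_response: str) -> Optional[str]:
    """Return the model's text only if it also flagged the topic as present."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return None  # unparseable output is treated as "nothing"
    if not isinstance(data, dict):
        return None
    # Trust the boolean; ignore whatever string came along with a false flag.
    if data.get("mentions_sharks") is not True:
        return None
    text = data.get("text")
    return text if isinstance(text, str) and text.strip() else None
```

With this, a filler string that arrives alongside `false` is dropped without any string matching.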


Have you tried this sort of prompt?

User text: "Blah blah ... Sharks ... Surfing ..." Instruction: Return a JSON object containing an array of all sentences in the user text which mention sharks directly or by implication. Response: {"list_of_shark_related_sentences": [

Stop token: ']}'

It'll try to complete the JSON response and it'll try to end it by closing the array and object as shown in the stop token. This severely limits rambling, and if it does add a spurious field it'll (usually) still be valid JSON and you can usually just ignore the unwanted field.
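On the parsing side, this could look roughly like the sketch below. It assumes the API returns the completion without the stop sequence, so the prefix and the stop token have to be glued back on before parsing. `parse_completion` is a hypothetical helper, not part of any library:

```python
import json

# The prompt ends with this prefix, so the model continues inside the array.
PREFIX = '{"list_of_shark_related_sentences": ['
STOP = "]}"

def parse_completion(completion: str) -> list:
    """Rebuild the full JSON from prefix + completion + stop token and parse it."""
    document = PREFIX + completion + STOP
    data = json.loads(document)  # raises json.JSONDecodeError if malformed
    # Keep only the field we asked for; spurious extra fields are ignored.
    return data["list_of_shark_related_sentences"]
```

An empty completion parses to an empty list, which is the "nothing to report" case.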

wrt OpenAI, text-davinci-003 handles this well, the other models not so much.


Making it rank multiple attributes on a scale of 1-10 also works decently in my experience. Then one can simply k-means cluster (or similar) and evaluate the grouping to see how accurate its estimations are.


Yes, agreed. I'm doing this as well. Works excellently for NLP classifier tasks.

Funnily enough, there is a certain propensity for it to output round numbers (50, 100, etc.) so I have to ask it not to do this and provide examples ("like 27, 63, or 4"). Now that I think about it I should probably randomize those.
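Randomizing those examples could be sketched like this, assuming a 1-100 scale and treating multiples of 5 as "round" (both are assumptions, and the prompt wording is only illustrative):

```python
import random

def example_scores(count=3, low=1, high=100, rng=None):
    """Pick distinct example scores that are not multiples of 5."""
    rng = rng or random.Random()
    candidates = [n for n in range(low, high + 1) if n % 5 != 0]
    return rng.sample(candidates, count)

def build_prompt(text, rng=None):
    """Build a rating prompt with freshly drawn non-round example scores."""
    examples = ", ".join(str(n) for n in example_scores(rng=rng))
    return (
        "Rate the following text from 1 to 100. "
        f"Avoid defaulting to round numbers; scores like {examples} are fine.\n\n"
        f"Text: {text}"
    )
```

Passing a seeded `random.Random` makes the prompt reproducible for evaluation runs.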


Interesting, I've just been doing 1-10 (maybe I should include 0) -- Do you get the same result if you floatify the larger integers, e.g. 0.000 - 10.000?


Have you tried using GPT-4's new function calling feature? The "killer" portion of this is guaranteed JSON based on a schema you pass to the model.


That's a good point! We're actually working on integrating this as well, but in practice, what we've found is that LLMs in general don't like to respond with empty strings, for example.

My hypothesis here is that due to RLHF, there's likely some implicit learning that tangentially related content is better than no content.

Given that, you'd likely still get better results with your schema being:

"string | null" so the LLM can output a null instead of "" since there is probably not as much training data that gives "" high log prob values.

But we're looking forward to evaluating function calling, and seeing what the metrics show!
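As a rough illustration of the "string | null" suggestion, here is a JSON Schema fragment written as a Python dict, plus a small normalizer for the parsed result. The `summary` field name is hypothetical, and whether a given function calling API honors a nullable type is something to verify rather than assume:

```python
from typing import Optional

# Hypothetical schema fragment: "summary" may be a string or an explicit null.
SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": ["string", "null"]},
    },
    "required": ["summary"],
}

def normalize_summary(payload: dict) -> Optional[str]:
    """Treat null, a missing key, and blank strings all as 'no content'."""
    value = payload.get("summary")
    if not isinstance(value, str) or not value.strip():
        return None
    return value
```

Normalizing on the way in means downstream code only has to handle one "empty" value, whichever form the model happened to pick.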


I integrated the function calling feature into my personal project and wrote a blog post about it here:

https://letscooktime.com/Blog/ai,/machine/learning,/chatgpt,...

Hopefully this saves you some time!


Thanks for the post! I really liked how short and to the point it was.

I'm also looking to integrate the new function calling feature, and I've already learned a few things from the post without even starting to code.


Nope, it's not guaranteed. They warn you in the OpenAI docs that it might hallucinate nonexistent parameters.


Constrained generation should not require calling supplemental functions. It's as simple as banning or reducing the weight of the naughty tokens. There are several libraries which enable this without function calling (Microsoft's guidance, jsonformer, LMQL).
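The core idea can be sketched without any of those libraries. The toy example below works on a plain dict of token scores rather than a real model's logits, and is not how guidance, jsonformer, or LMQL are implemented internally; it only shows the masking step:

```python
import math

def mask_logits(logits: dict, allowed: set) -> dict:
    """Set every disallowed token's score to -inf so it can never be picked."""
    return {
        tok: (score if tok in allowed else -math.inf)
        for tok, score in logits.items()
    }

def pick_token(logits: dict, allowed: set) -> str:
    """Greedy choice among the allowed tokens only."""
    masked = mask_logits(logits, allowed)
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise ValueError("no allowed token available")
    return best
```

For instance, if the grammar says the next token must open a JSON value, a chatty token with the highest raw score is simply never eligible.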


The output is not 100% guaranteed. Be careful about that and have another layer to check the output.

I had a schema with a string enum property to categorise some inputs. One of the category names was "media/other" or something to that effect. Sometimes the output would stop at just media even though it wasn't a valid option in the schema.
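That extra checking layer can be very small. A sketch, where every category name except "media/other" is invented for illustration:

```python
# Hypothetical enum values; only "media/other" comes from the case above.
ALLOWED_CATEGORIES = {"news", "sports", "media/other"}

def check_category(value):
    """Return the value if it is an exact enum member, else None."""
    if isinstance(value, str) and value in ALLOWED_CATEGORIES:
        return value
    return None  # caller can retry the request or fall back to a default
```

A truncated "media" fails the exact-match check, so it gets caught instead of silently flowing downstream.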


I've run into the same issue, but you can turn it into an advantage if you are careful enough.

Basically, give the LLM a schema that is loose enough for the LLM to expand where it feels expansion is needed. Always saying "return a number" is super limiting if the LLM has figured out you need a range instead. Saying "always populate this field" is silly because sometimes the field doesn't need to be populated.


That's an interesting point on Authorization. Though, if Google does it, there must be a way for you to serialize the data too. Curious to know more as well.

Anyways, excited to see where this goes! I've been reaching out to people for a consumer-oriented idea, and forums have been a great source of info; making that search more specific and succinct is much needed. This would really help me out!


That's completely right. Fun fact: We're over 70,000 new indie titles per year already!


That's a great question! Initially, MagnaPlay's entire idea was building a smaller portfolio (~50-100 titles) compared to our competitors; we'd be focused on quality over quantity -- essentially becoming the place to play the best indies. We would do that by collecting player engagement data, serializing reviews, and flat-out asking which games our audience wanted to see on our platform. While we still haven't changed that strategy in the short term, we see ourselves adding more and more titles with time and eventually allowing MagnaPlay's community to do all the curation instead of relying on more abstract agents like the team's preferences :)

So, picture this: if you like puzzle games, for instance, you would essentially be put in the "Librarian" role/house (much like how Hogwarts would put you in Gryffindor if you're brave). All so you can see fewer and fewer titles that aren't worth your time!


Interesting idea! Adding MagnaPlay to the Steam Deck might pose an exciting challenge... We've developed our platform on a cross-platform stack, so it might be possible. I'll take a look at the feasibility here.

And hey, indies are super casual, and a 300 MB launcher runs on any system. Give our platform a chance; maybe it is just casual enough for an evening gaming session after you close down all your Chrome tabs and spot our logo on your Desktop...


When I close all my tabs I'm looking at a Linux desktop, so, there's that. :-)

I do wish you well in this, I think an indie game pass is a great idea. My fault that I'm in a 1.4%-sized piece of the market pie...


That's right; I agree with you on the content creation point. We do plan to make originals, much like Netflix creates their own TV shows and films. Yet that's quite a burn for an early-stage startup like ours. The long-term vision of MagnaPlay must encompass funding developers and adding their games to our portfolio. However, in the meantime, all we can do is upstage the indie segment by giving them visibility, staying true to our cause, and supporting developers with community-friendly features. For instance, players can distribute 10% of their subscription to any developer, essentially crowdfunding exciting teams and projects.


Makes total sense. We talked to other people who had a similar headspace: exploration is essential, though purchasing each game is not necessarily the problem. One of the things you gain from a subscription service is curation; the idea there is to minimize those encounters where you absolutely dislike the game, either because there's an audience voting for the titles you see or because there's a recommendation system that filters out stuff you might not like.

On the point of trying out more stuff before you buy them: we considered adding a section on the platform for alpha/beta releases. It would allow developers to test and get feedback throughout the early stages of development while allowing players to engage with titles they would otherwise avoid. It would be separate from the revenue share model as these games wouldn't be fully baked and, therefore, at a lower standard than those on the "official" portfolio. We might be able to roll out this feature in the coming weeks! Would that be something you see yourself using?

And absolutely, in the context you gave, you would help developers a lot with our model. Think about it this way: the 200 hours you played on that game got compensated once: when you bought it. On the other hand, if you are subscribed to MagnaPlay and mostly play that game, you'd be essentially giving 80% of your subscription to its developer every month.


I've played some beta games on steam. Fermi Paradox was beta when I tried it, IIRC, and it was fun then. If people are playing beta games, I think the devs should be getting paid, tbh.

>On the other hand, if you are subscribed to MagnaPlay and mostly play that game, you'd be essentially giving 80% of your subscription to its developer every month.

Do I want to pay $75 a year for that game though? And if I play that game for 20 hours in one month, and also play a new game and complete it in 20 hours, how much does that new game get? Many games are very much play-once and done. Like I don't think I'll replay Hob again, but I still think it was one of my favorite gaming experiences of 2021 (when I discovered it on PS4). Likewise Carrion (actually I did replay that one coz people are chewy).
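To put rough numbers on the 20-hours-each question: the thread doesn't say how the split is actually computed, so the sketch below assumes the 80% developer share is divided purely in proportion to playtime, using the $8/month figure mentioned in this thread. All names are hypothetical:

```python
def split_by_playtime(subscription, developer_share, hours_by_game):
    """Divide the developer pool across games in proportion to hours played.

    Assumption: payouts are strictly playtime-proportional. The real
    MagnaPlay formula is not stated in this thread.
    """
    total_hours = sum(hours_by_game.values())
    if total_hours == 0:
        return {game: 0.0 for game in hours_by_game}
    pool = subscription * developer_share
    return {
        game: round(pool * hours / total_hours, 2)
        for game, hours in hours_by_game.items()
    }

payouts = split_by_playtime(8.00, 0.80, {"long_running_game": 20, "new_game": 20})
```

Under that assumption each game gets half of the $6.40 pool for the month, and a short play-once game stops earning once it is finished.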

This may be a thing just for the whales, but maybe have a thing like reddit where I could rate a game positively and back that up with cash (like over and above the $8/mo). In a way, the pay-what-you-want bundles offer a similar mechanism, so the idea may be validated already.

I'm also realizing that most of the games I think of as Indie, I played on PS4/5 or Switch. Dead Cells, Hades, Manifold Garden, Celeste, Carrion. With Manifold Garden and Celeste I even bought the soundtracks. (Spend a lot more on consoles than steam, and finish more games there).


With shorter games, the model is still a challenge. Pay-what-you-want is an interesting idea.


That's a good idea. We like to think that subscription services foment this idea of exploration; after all, they make the traditional process of finding, purchasing, installing, and finally playing shorter and smoother.


Excited to be building this with you, Pedro!

