
justindz · 2026-04-29T15:14:18

What a great lunch read! I've been weekend-warrioring a terminal-based CRPG for a bit myself. I was recently exploring ways to use agents to help with balance testing, which is a real scale problem for a solo indie dev. So far, all I've created is a fight simulator: essentially, take the current player state (stats, effects, gear, companions, etc.), simulate a given fight X times using one of the currently implemented GOAP personalities, and report how often the player wins or loses, the average end turn, stuff like that.
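For anyone who wants to picture the fight simulator described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Fighter` fields, the hit-chance combat rule, and the names `simulate_fight` and `run_batch` are hypothetical, and a simple random attack stands in for the GOAP personalities, which aren't shown in this thread.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass
class Fighter:
    # Hypothetical stand-in for the real player/enemy state.
    name: str
    hp: int
    attack: int        # maximum damage of a single hit
    hit_chance: float  # probability that an attack lands


def simulate_fight(player, enemy, rng, max_turns=100):
    """Run one fight and return (player_won, turns_taken)."""
    p_hp, e_hp = player.hp, enemy.hp
    for turn in range(1, max_turns + 1):
        # Player acts first each turn (an assumption, not a rule from the game).
        if rng.random() < player.hit_chance:
            e_hp -= rng.randint(1, player.attack)
        if e_hp <= 0:
            return True, turn
        if rng.random() < enemy.hit_chance:
            p_hp -= rng.randint(1, enemy.attack)
        if p_hp <= 0:
            return False, turn
    return False, max_turns  # a timeout is counted as a loss


def run_batch(player, enemy, runs, seed=0):
    """Repeat the fight `runs` times and summarize the outcomes."""
    rng = random.Random(seed)  # seeded so a balance report is reproducible
    results = [simulate_fight(player, enemy, rng) for _ in range(runs)]
    wins = sum(1 for won, _ in results if won)
    return {
        "runs": runs,
        "win_rate": wins / runs,
        "avg_end_turn": mean(turns for _, turns in results),
    }


if __name__ == "__main__":
    hero = Fighter("hero", hp=30, attack=8, hit_chance=0.8)
    goblin = Fighter("goblin", hp=20, attack=5, hit_chance=0.6)
    print(run_batch(hero, goblin, runs=1000))
```

Seeding the random generator means a stat change can be compared against the previous report without noise from a different roll sequence.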

I hadn't really thought about trying to create a harness for agents to play the full game interactively. I'd love to explore this. If you don't mind, here are a few questions:

1) Am I correct to assume that I probably need a text-only harness, even though my game is already text-based, because I use menu selections made via arrow keys and Enter?

2) Do you have prompt recommendations for the type of feedback you have found to be useful? I would guess in your case, the objectives of the game are more clear than an open-world RPG. What dead ends have you run into? Maybe a variety of approaches would be good? One agent tries to fight everything. Another focuses on gaining and completing as many quests as possible?

3) How bad is the token burn doing this? Any optimization strategies you've employed?

lubujackson · 2026-04-29T17:03:26

I did something similar, but instead of having the LLM play the game I had it build an entire bot system to play the game. Bots require much more determinism, but I'd rather burn tokens encoding problem solving approaches and bot decision profiles than using LLMs for every turn of the game. This can be developed rapidly if you create an agent in a loop and say "figure out how to have the bot reach room 3 in under 10 actions" or something like that. It is easy for this to get bloated, but I found it makes a nice feedback loop that allows me to quickly test things like pacing changes and think of the game as a series of user actions that can be sculpted purposefully.
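To make the "reach room 3 in under 10 actions" style of goal concrete, here is a rough sketch of a deterministic bot check in Python. The room graph, the room names, and the functions `shortest_plan` and `check_pacing` are all hypothetical; it uses a plain breadth-first search, which is only one of many ways a bot could plan, and it is not the bot system described above.

```python
from collections import deque

# Hypothetical map: room -> {action: destination room}.
ROOMS = {
    "room1": {"north": "hall", "east": "closet"},
    "closet": {"west": "room1"},
    "hall": {"south": "room1", "east": "room2"},
    "room2": {"west": "hall", "north": "room3"},
    "room3": {},
}


def shortest_plan(rooms, start, goal):
    """Breadth-first search; returns the shortest list of actions, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        room, plan = queue.popleft()
        if room == goal:
            return plan
        for action, nxt in rooms[room].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    return None  # goal is unreachable from start


def check_pacing(rooms, start, goal, max_actions):
    """True if the goal can be reached in strictly fewer than max_actions."""
    plan = shortest_plan(rooms, start, goal)
    ok = plan is not None and len(plan) < max_actions
    return ok, plan


if __name__ == "__main__":
    print(check_pacing(ROOMS, "room1", "room3", max_actions=10))
```

Because the check is deterministic, it can be rerun after every pacing change without spending any tokens.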

justindz · 2026-04-29T17:19:32

Thanks, this is another great idea and I'll consider it as an addition or alternative. Do you think this works in an open-world, non-linear type game?

jschomay · 2026-04-30T00:21:13

OP here: Thank you and I appreciate the thoughtful questions. To answer: 1) I used a text representation because it made sense for my game and let me "render" certain details in a more AI-friendly way, like the compact map. You could use something like agent-browser and it would probably work just fine, but I figured it added an extra layer of indirection that I didn't need, plus it would be a lot of screenshots! Being able to have a turn based loop really helped make this work.

2) I had a skill on just how to use the playtest server. I also gave it context on what the game is and how to play it. From there, it probably depends on your use case. I wasn't that impressed with its natural ability to playtest for bug discovery, so I would consider making a skill describing what a playtester would normally do. Focused playtester instances are a good idea. Ultimately what I found to be most helpful was to point it at a feature or bug that I was aware of and have it validate it. Not only was it fairly successful, that was the part that saved the most time for me.

3) I think I only burned about 300K tokens on my longest play-test session, and that includes a bunch of code tweaks too. Running it after every feature as a validation step is pretty cheap. Running it overnight in "open" playtesting could add up.
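As an illustration of the text representation and turn-based loop from answer 1, here is a minimal sketch in Python. It is not the actual playtest server: the `Game` class, the grid, the map symbols, and the one-letter commands are all made up for the example. The idea is that each turn takes one plain-text command and returns one plain-text observation, so an arrow-key menu could be exposed the same way as a list of named choices.

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical toy state: a player on a small grid with one exit.
    width: int = 5
    height: int = 3
    player: tuple = (0, 0)
    exit: tuple = (4, 2)
    turn: int = 0

    def render(self) -> str:
        """Compact text view: '@' player, '>' exit, '.' floor."""
        rows = []
        for y in range(self.height):
            row = ""
            for x in range(self.width):
                if (x, y) == self.player:
                    row += "@"
                elif (x, y) == self.exit:
                    row += ">"
                else:
                    row += "."
            rows.append(row)
        return f"turn {self.turn}\n" + "\n".join(rows)

    def step(self, command: str) -> str:
        """Apply one text command and return the new observation."""
        moves = {"n": (0, -1), "s": (0, 1), "w": (-1, 0), "e": (1, 0)}
        if command not in moves:
            # Invalid input costs no turn and tells the agent what is allowed.
            return f"unknown command {command!r}; try one of n, s, e, w\n" + self.render()
        dx, dy = moves[command]
        x = min(max(self.player[0] + dx, 0), self.width - 1)
        y = min(max(self.player[1] + dy, 0), self.height - 1)
        self.player = (x, y)
        self.turn += 1
        if self.player == self.exit:
            return self.render() + "\nyou reached the exit"
        return self.render()


if __name__ == "__main__":
    game = Game()
    print(game.render())
    for command in "eeeess":
        print(game.step(command))
```

Returning the list of valid commands on bad input is a design choice meant to keep an agent from getting stuck guessing at the interface.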

Good luck, please let me know how it goes if you get somewhere helpful!

justindz · 2026-04-03T13:24:36

I think it would be fantastic to have a reference site for significant, complex projects either developed or substantially extended primarily via agent(s). Every time I look at someone's incredible example of a workflow for handling big context projects, it ends up being a greenfield static microblog example with vague, arm-wavey assertions that it will definitely scale.

justindz · on Aug 12, 2015

I ride the metro for about 1:30 a day (part of which is walking to and fro), which I prefer to my prior driving commute of 25-30 minutes. I have had to increase my fantasy and sci-fi novel budget, however.

justindz · on July 23, 2015

One can perhaps take comfort in the fact that a civilization advanced enough to be aware of us would likely be aware of the fact that we were more advanced than what they could see at any given moment (or they had found a way to narrow the perception lag).

TeMPOraL · on July 23, 2015

They would likely have extrapolated the savagery they saw and realized that we may pose some problems for the galactic community. RKKVs are probably on their way towards Earth as we speak.

justindz · on May 4, 2015

I was in the Azure TAP program many years ago and they talked about this extensively to the participants. I would echo the consistency of direction on this. I had a vague sense that it was an attempt to quell enterprise fears about the cloud and lock-in, but perhaps not.

justindz · on April 13, 2015

I'm not sure I quite expected to see this on HN, though I'm glad it was posted since I happen to be trying to learn novel writing. I thought the before and after example of the basketball article was a concise and clear way to explain over-writing. The second version was clearer, more accessible, less obnoxious and no less informative and narrative. A series of poetry and short fiction teachers in college helped me become self-aware about this bad habit. I still make the mistake consistently, but I have learned to either catch it or agree with the suggestions of my peers who catch it when I don't.

I personally believe that programmers can learn from poets, for example. Write constantly. Read the work of others, both critically and for enjoyment. Writing is the sexy part, but revising is at least as important a task.

scott_s · on April 13, 2015

I find your last sentence amusing because I actually prefer editing to writing. It's when I'm editing that I feel like I'm actually applying a craft. I get the same kind of satisfaction from editing my words that I do from refactoring my code. In both cases, it is often only after doing the initial work that I realize how it should be structured, and reworking it into that elegant structure is satisfying.

rudolf0 · on April 13, 2015

Can't agree with you more, both for refactoring code and editing prose.

I find it also sometimes helps to perform the revisions while in a somewhat altered state of mind (be it mood, setting, music, minor inebriation, or otherwise). It helps you look at the first draft with fresh eyes and see better ways of structuring things and removing excess. Make a version with your prospective changes, then compare the before and after copies a day or two later (this time sober if you weren't before...).

magic_beans · on April 13, 2015

I never thought to compare writing words and writing code, but it's true. The writing phase is simply getting all the ideas out, and the editing phase is to make... beauty out of chaos!

justindz · on April 13, 2015

I am the same way. I found it surprising. I get so much positive feedback from the editing process that I have actually developed a routine of editing as I draft. Although it improves the quality of my draft, it takes ages. For poetry, it worked well. For novels, I think I'll need to break the habit.

ricree · on April 13, 2015

Though for what it's worth, I've seen a lot of professional authors who warn about that as another kind of trap that beginners fall into. That is, paying too much attention to the revision process, at the expense of producing actual finished works.

GoodIntentions · on April 13, 2015

>I'm not sure I quite expected to see this on HN

I think it belongs - What he is suggesting carries over to technical endeavours as well.

1. Learning to cut mercilessly improved clarity of my emails and documentation.

2. Learning to put my head down and carry on till a project is first-draft complete, warts and all, improved my ability to actually _complete_ things.

3. Peer review as a bullshit detector is good.

I'll take Poe or Lovecraft over King any day for entertainment, but I might just crack 'On Writing' after reading this article. The man sounds pretty dialed in.

justindz · on March 20, 2015

I was at a product management event once and met a guy who managed a product in this space. A group of us went out for drinks after the event and he ended up explaining what he did. At some point he mentioned "Chinese hackers." Another guy in the group called him on it, wondering why he just assumed it was Chinese. He laughed and said that the near constant level of activity they see goes basically flat on Chinese New Year.

I suppose if you're not a Chinese hacker, it might pay to pretend you are by tailoring your working hours and days.

dreamins · on March 20, 2015

Yep it is THAT obvious...

justindz · on May 29, 2014

I read it the way you apparently intended to write it. In college, getting to where I could write and get published poems that weren't a personal embarrassment took more or less the same amount of study and practice as getting my CS degree. I think mastering either craft would be a comparable endeavor and accomplishment (though one would pay much better).

The nice thing about compilers is they tell you when your code has a problem. There's no equivalent barrier to posting awful poetry-drivel on Facebook ;-)

justindz · on May 19, 2014

My experience has also been that managers at the "average company" these days are pretty much expected to both manage and be a fully productive individual contributor. Realistically, this means that they spend about 80% of their time working to hit their own goals, which directly point at them, and 20% of their time (at best) really working to enable their team to succeed, which can always be deflected from direct responsibility to some degree. Since most managers' managers are also in the same boat, they can't tell when someone is shooting their own team members for self-preservation because they are too busy to have any sense for the morale and culture they've created.

In my personal opinion, you get a healthier culture if you either 1) have managers and let them manage or 2) admit that you require them to be individual contributors and restrict their "managing" purely to part time HR initiatives and not to actual additive management (something more like extra-curricular mentoring and not talent management, career and skills development).

I should caveat that I've had good managers, bad managers and completely mediocre managers. So I do believe that, although rare, it can be done well and it can provide value to individual contributors' careers and to the company's value. I just don't assume it's automatically the right approach at every company.

justindz · on May 1, 2014

Lawrence and crew just responded to my tweet on this. They use Stripe, which is encrypted. The SSL certs for the page that is unencrypted will be up later today.

https://twitter.com/Boyko4TX/status/461902353105317891

EDIT: forgot to link the tweet.

DEinspanjer · on May 1, 2014

I'm very unhappy with their replies on Twitter. They can't just say that the information is going to Stripe and Stripe is safe. The facts are, they have a form which asks people to put their credit card number in it. That form is on an unprotected page, which means it is vulnerable to some advanced attacks even before posting. Further, the form posts back to the same unprotected page. I don't see any evidence of fancy JavaScript behaviors to prevent the posting, but even if there were, they are still putting their users in significant danger of having that information plucked out of the air by anyone who might be able to sniff the traffic on any leg of the trip from the user's Wi-Fi all the way to the company's firewall.

DEinspanjer · on May 1, 2014

Okay, my facts weren't entirely correct.

The HTML of the form shows as POSTing to the same page, but the Stripe JS captures the submit event and cancels it, then makes an API call to Stripe's server via a secure connection. It works, but it is still somewhat vulnerable to MitM attacks.

I like @lessig's latest response. Much more firm and reassuring:

https://twitter.com/lessig/status/461914159417147392

nollidge · on May 1, 2014

I just hit "donate" and it took me to:

https://mayone.us/fec_compliance/

Sincere thanks to everybody who complained to them about this - I wouldn't have donated without HTTPS.

dllthomas · on May 1, 2014

Did you miss the "SSL certs should go through later today"? I agree "it's going through Stripe" isn't enough, of course.

justindz · on May 1, 2014

Agreed. In its current state, I will not participate.

DEinspanjer · on May 1, 2014

I'm pushing and anxious because this is exactly the sort of problem that could negatively impact the entire campaign, and I think there is worth to the goal. I hope they take a strong stance and fix it quickly rather than trying to placate and coast to a fix.

justindz · on May 1, 2014

Looks like they got the cert deployed, although the process isn't directed to https yet. I hand-added https to the URL for the payment collection page and had no issues.
