That's unfortunate to hear. This has allowed me to connect with talent that would otherwise not have been given the opportunity to showcase their skills. At the scale I'm doing it, it's pretty harmless, but I can see how it could be easily exploited.
This doesn't seem too hard to solve, except for the ever-recurring LLM output validation problem. If true positives are rare, you don't know whether the earthquake alert system works until there's an earthquake.
... just force the data into a structured format, then run "hard code" against the structure.
"Generate the following JSON formatted object array representing the interruptions in my daily traffic. If no results, emit []. Send this at 8am every morning. {some schema}. Then run jsonreporter.py"
Then just let jsonreporter.py discriminate however it likes. Keep the LLMs doing what they are good at, and keep hard code doing what it's good at.
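A minimal sketch of what that discriminating layer in jsonreporter.py could look like. The field names (`road`, `start`, `severity`) are made up for illustration; the real schema would be whatever the prompt specifies:

```python
import json

# Hypothetical schema for a traffic-interruption object; adjust to taste.
REQUIRED = {"road": str, "start": str, "severity": str}

def validate(raw):
    """Parse LLM output and reject anything that isn't a list of well-formed objects."""
    items = json.loads(raw)  # raises JSONDecodeError on malformed output
    if not isinstance(items, list):
        raise ValueError("top level must be a JSON array")
    for item in items:
        for field, typ in REQUIRED.items():
            if not isinstance(item.get(field), typ):
                raise ValueError(f"bad or missing field: {field}")
    return items

# An empty array from the LLM is a valid "nothing to report" result:
print(validate("[]"))  # -> []
print(validate('[{"road": "A1", "start": "08:00", "severity": "minor"}]'))
```

The point is exactly the division of labour above: the LLM only has to emit the array, and deterministic code decides what counts as reportable.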
Slightly related: I was using an LLM to help me understand whether I should add milk to my coffee before walking to my table or once I get there (objective: maximise coffee temperature at the point of drinking). Turns out it's best to add the milk immediately when the coffee is made, because the rate of cooling is proportional to the temperature difference with the room, so hot black coffee sheds heat faster than the cooler coffee-plus-milk blend.
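You can check that claim with a quick Newton's-law-of-cooling sketch. The temperatures, mixing ratio, cooling constant, and walk time below are made-up illustrative values, and the mix is modelled as a simple weighted average (equal heat capacities, milk assumed to stay fridge-cold either way):

```python
import math

def cooled(temp, ambient, k, t):
    """Newton's law of cooling: exponential decay toward ambient temperature."""
    return ambient + (temp - ambient) * math.exp(-k * t)

def mix(t_coffee, t_milk, frac_coffee=0.8):
    """Temperature of the blend, as a weighted average of the two parts."""
    return frac_coffee * t_coffee + (1 - frac_coffee) * t_milk

AMBIENT, K, WALK = 20.0, 0.05, 5.0   # room temp (C), cooling constant (1/min), walk (min)
COFFEE, MILK = 90.0, 5.0             # starting temperatures (C)

# Strategy A: add milk immediately, then walk.
early = cooled(mix(COFFEE, MILK), AMBIENT, K, WALK)
# Strategy B: walk first, add still-cold milk at the table.
late = mix(cooled(COFFEE, AMBIENT, K, WALK), MILK)

print(f"milk first: {early:.1f} C, milk later: {late:.1f} C")
```

With these numbers, mixing early lands a bit warmer than mixing late, because the undiluted coffee spends the walk far above room temperature and loses heat fastest there.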
But fortunately, you're not one of those idiots who immediately jump from that to the conclusion that "This article must be written by an AI!" — right...?