Hacker News | William_BB's comments

Imagine if you randomly sampled letters from the alphabet and those letters made up actual words, then actual sentences. Did you think about it? Probably not

> you randomly sample letters from the alphabet and those letters make up actual words, then actual sentences

That sounds like a decently apt description of how I (a human) communicate. The only thing is that I suppose you implied a uniform distribution, while my sampling approach is significantly more complicated and path-dependent.

But yes, to the extent that I have some introspective visibility into my cognitive processes, it does seem like I'm asking myself "which of the possible next letters/words I could choose would be appropriate grammatically, fit with my previous words, and help advance my goals" and then I sample from these with some non-zero temperature, to avoid being too boring/predictable.


It's not sampling randomly though.

"it" is also not "thinking". It is still randomly (though not all words are equal probabilities) sampling from a distribution of words that have been stolen and it been trained on

If "randomly sampling from a trained distribution" can't produce useful, meaningful output, then deterministic computation is even more suspect. After all, it's a strict subset. You're sampling with temperature zero from a handcrafted distribution.

(the direction of this post is okay, but there's many a devil in the details)


How do we know we're not doing that based on our memories and reaction to external stimuli though?

You should read the article you posted before you write a comment. Hint: check P_F=0 in tables 2, 3 and 4.

"Factored" is doing a lot of lifting here and is borderline deceptive. Plenty of researchers have long ago pointed out that this won't scale, see M Mosca for reference.


I'm aware; I don't think gate-model machines have demonstrated much potential for scaling in practice any time soon, so this is more of a lark to show how unimpressive the current attempts at Shor's algorithm have been

To me, GitHub has always seemed well positioned to be a one-stop solution for software development: code, CI/CD, documentation, ticket tracking, project management, etc. Could anyone explain where they failed? I keep hearing that GitHub is terrible


It always starts out good enough, but the reason they pursue horizontal integration is that it ensures that you won't be able to get out even if (when) you eventually want to. You'll be as glued as a fly to flypaper.

That's the reason you hear the complaints: they're from people who no longer want to be using this product but have no choice.

Because Microsoft doesn't need to innovate or even provide good service to keep the flies glued, they do what they've been doing: focus all their resources on making the glue stickier rather than focusing on making people want to stay even if they had an option to leave.


We use GH and are investing more in the platform features.

Codespaces specifically is quite good for agent-heavy teams: you can launch a full-stack runtime for agent-owned PRs.

> keep hearing that Github is terrible

I do not doubt people are having issues and I'm sure there have been outages and problems, but none that have affected my work for weeks.

GH is many things to many teams and my sense is that some parts of it are currently less stable than others. But the overall package is still quite good and delivers a lot of value, IMO.

There is a bit of an echo-chamber effect around GH.


We use GitHub actions and we have more build failures from actions than we do any other source.


They got acquired by Microsoft.


If this happens to software development, this will happen to most mental jobs.


CRUD


It's always something that already exists but requires 100x the code.


(Geopolitical) prediction markets almost always tend to overreact. This is expected from retail, so I'm not sure about the signal.


So why not just always bet against their reaction? Over time you should make money then, right?
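On paper it can work; in practice it depends on fees and on the overreaction being real. A toy expected-value check, with all numbers invented for illustration:

```python
# Hypothetical: the market prices an event at 0.30, but you believe
# the true probability is 0.20 (i.e., the market overreacted upward).
market_p = 0.30   # market-implied probability of the event
true_p = 0.20     # your estimate of the real probability
fee = 0.02        # assumed fees/slippage per $1 of payout

# Buying "NO" costs (1 - market_p) and pays $1 if the event doesn't happen.
cost = 1 - market_p
expected_payout = 1 - true_p
ev = expected_payout - cost - fee
print(f"EV per $1 of payout: {ev:.2f}")
```

If your probability estimate is right, that's about +$0.08 per contract; if the market is actually efficient and true_p ≈ market_p, the fee alone makes "always fade the reaction" a guaranteed loser.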


Oh, like the LLM OS?


Sure. Reading a book is a much more difficult, and ultimately more productive, task than writing one.


This is just the creator of Claude Code overselling Claude Code


Ok I will bite.

Every single example you gave is in hobby-project territory: relatively self-contained, maintainable by 3-4 devs max, within 1k-10k lines of code. I've been successfully using coding agents to create such projects for the past year, and it's great. I love it.

However, lots of us here work on codebases that are 100x, 1000x the size of these projects you and Karpathy are talking about. Years of domain specific code. From personal experience, coding agents simply don't work at that scale the same way they do for hobby projects. Over the past year or two, I did not see any significant improvement from any of the newest models.

Building a slightly bigger hobby project is not even close to making these agents work at industrial scale.


I think that in general there is a big difference between JavaScript/TypeScript projects, big or small, and projects that address a specific problem domain. These two are not the same. The same Claude Code agent can create many of the parts of a functional web project, but if you were to add support for a new SoC in some drone firmware, it will struggle to provide anything functional beyond a base frame for you to build on.

The problem is that everyone working on those more serious projects knows that and treats LLMs accordingly, but the people that come from the web space come in with the expectation that they can replicate the success they have in their domain just as easily, when oftentimes you need to have some domain knowledge.

I think the difference simply comes down to the sheer volume of training material, i.e. web projects on GitHub. Most "engineers" are actually just framework consumers, and within those frameworks LLMs work great.


Most of the stuff I'm talking about here came out in November. There hasn't been much time for professional teams to build new things with it yet, especially given the holidays!


For what it's worth, I'm working with it on a huge professional monorepo, and the difference was also stark.


For what it’s worth, I have Claude coding away at the Unreal Engine codebase. That’s a pretty large C++ codebase, and it’s having no trouble at all: a cool several million lines of lovely C++.


Everything is made of smaller parts. I'd like to think we can at least subdivide a codebase into isolated modules.


Depends on what kinds of problems you're solving...

I'd put it in line with monolith vs. microservices: you're shifting complexity somewhere, whether into orchestration or into the codebase. In the end, the piper gets paid.

Also, not all problems can be broken down cleanly into smaller parts.


In the real world, not all problems decompose nicely. In fact, I think it may be the case that the problems we actually get paid to solve with code are often of this type.


Problems like?


That’s right, but it also hints at a solution: split big codebases into parts that are roughly the size of a big hobby project. You’ll need to write some docs to be effective at it, which also helps agents. CI/CD means continuous integration / continuous documentation now.


Splitting one big codebase into 100 microservices always seems tempting, except that big codebases already exist in modules and that doesn't stop one module's concerns from polluting the other modules' code. What you've got now is 100 different repositories that all depend on each other, get deployed separately, and can only be tested with some awful docker-compose setup. Frankly, given the impedance of hopping back and forth between repos separated by APIs, I'd expect an LLM to do far worse in a microservice ecosystem than in an equivalent monolith.


I wonder if anyone has tried this thing before, like... micro-projects or such... ;)


It's not the size that's the issue, it's the domain that is. It's tempting to say that adding drivers to Linux is hard because Linux is big, but that's not the issue.


I worked at Slack earlier this year. Slack adopted Cursor as an option in December of 2024 if memory serves correctly. I had just had a project cut due to a lot of unfortunate reasons so I was working on it with one other engineer. It was a rewrite of a massive and old Python code base that ran Slack's internal service catalog. The only reason I was able to finish rewrites of the backend, frontend, and build an SLO sub-system is because of coding agents. Up until December I'd been doing that entire rewrite through sixteen hour days and just pure sweat equity.

Again, that codebase is millions of lines of Python code and frankly the agents weren't as good then as they are now. I carefully used globbing rules in Cursor to navigate coding and testing standards. I had a rule that functioned as how people use agents.md now, which was put on every prompt. That honestly got me a lot more mileage than you'd think. A lot of the outcomes of these tools are how you use them and how good your developer experience is. If professional software engineers have to think about how to navigate and iterate on different parts of your code, then an LLM will find it doubly difficult.
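For flavor, a glob-scoped rule of the sort described might look roughly like this. This is a sketch only: it assumes Cursor's `.mdc` rule-file format, and the description, globs, and rule text are all invented:

```
---
description: Backend coding and testing standards
globs: ["backend/**/*.py"]
alwaysApply: false
---
- Use pytest, never unittest.
- Every new endpoint needs an integration test.
- Follow the service-catalog naming conventions in the repo docs.
```

A rule with `alwaysApply: true` behaves like the always-on, agents.md-style rule mentioned above: it rides along on every prompt rather than only when matching files are in context.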

