More

iiJDSii · on Feb 6, 2025

What does your perception look like, are you using raw screenshots? GUI snapshots? Vision is very difficult for these, and snapshots are incomplete, is what I've found in some earlier experiments.

mountainriver · on Feb 6, 2025

Perception is just 1-2 screenshots. A number of recent VLM models have a lot more pretraining data on GUI interactions, which helps.

iiJDSii · on Feb 6, 2025

Such as? Are they able to recognize arbitrary GUI elements from various desktop programs, web browsers, etc?

mountainriver · on Feb 6, 2025

Qwen2.5-vl seems to be the best right now by our tests.

UI-TARS by bytedance also has a good amount of pretraining.

Molmo is also very good at coordinates.

iiJDSii · on Feb 3, 2025

What are some truly valuable companies that would be accurately described as just "UIs around Postgres"?

I think it's a non-sequitur - either companies are much more than just UIs around a DB, or those companies just aren't very valuable (in which case, who cares).

maeil · on Feb 3, 2025

The exact same holds for the pure AI wrappers - they'll end up similarly valued to UIs around Postgres.

iiJDSii · on Feb 3, 2025

I've heard this idea echoed quite a few times.

But can anybody name a wrapper company as/more valuable than foundation model companies like OpenAI, DeepSeek, Anthropic, Mistral, etc?

Only one that comes to mind is Perplexity, but they're a bit more than a wrapper startup - I think there's some hardcore engineering to get their web search product working so well.

maeil · on Feb 3, 2025

Of course there aren't any that are more valuable.

But take Notion, Asana, Trello. All of them started out.. like a UI around a database. And they're plenty valuable now. More valuable than Oracle, the highest valued DB company? Maybe not, but still very much successful companies. They didn't start out with 500 integrations.

I think Character.ai is already very profitable, and that's a pure wrapper.

figure8 · on Feb 3, 2025

> But can anybody name a wrapper company as/more valuable than foundation model companies

Ask the same question about databases. Then consider these wrapper companies: Salesforce, LinkedIn, AirBnB, Workday, ...

The only database company worth more then Salesforce is Oracle. Other database companies are worth much less.

iiJDSii · on Feb 2, 2025

Very cool! Any video demos for sample tasks? I didn't come across any on the website (browsing on mobile).

artabra · on Feb 2, 2025

Those are still in the cooker, we'll throw them up asap once they're ready.

Some demos we will have are:

- Logging into twitter and tweeting

- Finding information from google maps of any nearby business whether that's for leads or finding local restaurant options.

- Scraping anything from wikipedia like current events etc.

- And more!

iiJDSii · on Feb 2, 2025

Those are good ones. I've fiddled with similar systems before, do you have a rough success rate? I know they can be finicky, especially as you execute through a chain-of-thought action plan, or however you're doing it.

jawerty · on Feb 2, 2025

Anything improving reasoning chains of though improves planning. Right now the long term ones Art mentioned like logging in have been around 80% while simpler ones have been higher. Right now our main issue is figuring out how to keep the server up :/ we're getting a little more traffic than expected. However, to bump those success rates up (which we need to) we really really need to fine tune additional models which we're planning out right now.

I have a few ideas around that mostly going down the RL route (with a twist) mixed with some knowledge graph work. We'll give an update when we push that!

iiJDSii · on Feb 2, 2025

> keep the server up

Oh maybe I didn't understand from the site - it's not a standalone desktop app? What processing do you do on your server side?

jawerty · on Feb 2, 2025

We have an API server where we execute all the agent reasoning/planning jobs then we stream the browser commands to the client. We mention this in the how it works section on the website. This is the main reason why we have the 5 bot a day limit is because of this. It's cheap for us to run as of now but if anyone would like us to ship a version where you'd use your own api keys (plug n play) locally let us know!

iiJDSii · on Jan 19, 2025

Is this like window/program repositioning for multiple monitors or something? Just making sure I understand.

Would you describe how every window should be positioned? And if so would you want to just do so once, and then have the configuration (or configuration pattern, as you might have a similar but different set of windows) set up when you ask for e.g. "apply my window snapping" ?

andersco · on Jan 20, 2025

This could be for one or multiple monitors. I’d want to ideally only have to describe my snapping rules once and then just have windows snap automatically, but also be able to tweak my instructions as I maybe discover edge cases.

iiJDSii · on Jan 5, 2025

Great question. It's a combination of economics - lower-wage humans do many tasks much better - and technology - precise manipulation of tools and navigation of novel environments is incredibly difficult. Any small seemingly trivial task actually has an insane amount of complexity to it.

Let's say I wanted a robot to take out my trash. It sounds simple but there are so many incredibly difficult tasks when you break it down, each with a near-infinite number of variations in different homes:

- First, learn where in the house it is, and how to get to it. * Is it in a drawer? What kind of drawer, how to open it? * Is it a plastic garbage bag in a bin, with a foot lever? In a drawer? - How does the robot lift out a plastic bag, replace the plastic bag? We don't have the dexterity to do this yet * What happens when the plastic bag gets caught slightly on a corner, or begins to rip? - Let's say we pick up the plastic bag, now we need to move to a door that will take us out of the house * Are there stairs, pets, children, other obstacles that could get in the way? Just this bullet point here could be harder than self-driving cars, which is far from solved * How do we interact with the door to open it? Is it a round knob, is there a deadbolt to unlock, does the door swing outward or inwards?

...etc

This probably barely scratches the surface on all of the variability inside of a home, and yet a small kid can do all this without even thinking, while even a small subset of these problems probably requires billions of dollars to solve in a controlled/closed environment.

Maybe humanoid robot teleoperation + artificial intelligence will get us there, that's the pipe dream of a lot of these humanoid robot companies. But then they need to make money and out-compete some young/lower-skilled workers happy to do things for $10/hour. At which point one wonders how these companies will make money to justify the insane R&D needed for even the simplest of tasks. But hey the same story has played out in other industries where robots have outplaced low-skilled labor. The difference though is that these environments have been heavily controlled, i.e. the same few steps to assemble a widget, the same motions to clean the same type of object, etc.

iiJDSii · on Jan 3, 2025

Interesting examples. Personally, I don't really consider Ebay, AirBnB, Reddit, or Substack to be "technology companies". They are businesses that happen to be online, imo. I'm sure that e.g. AirBnB has lots of machine learning technology now, but I reckon it's doing niche stuff like optimizing conversion rates and profits by customer tracking, recommending, etc. So basically, after-the-fact value optimization. Nothing really invented or new created, aside from their novel ideas and unique executions.

Other actual tech examples seem to fall into 1 of 2 camps: obvious but hard to do (better search engine, better rockets, electric cars, etc), or cool but non-obvious customer end uses (maybe LLMs, VR/AR, curved or flexible high def screens, etc). The latter category has more risk but probably lower hanging fruit to get started, because the market needs are less obvious.

In your example, do you think the customer focus lead to pre-mature optimization and kind of tunnel-visioned the team away from further LLM development? That's another type of trade-off that's probably impossible to predict at the time. I mean who doesn't want customers.

I'm not entirely surprised that OpenAI was able to achieve so much given their structure - they had the mandate of a trendy new research lab, top talent, with 100M+ funding and no need to cater to any early customers. Seems like a great (though typically impractical) way to build big new things. Then they had the right top-level guidance when the tech was getting ready, to pivot and raise more money (unlike XEROX PARC for example).

PaulHoule · on Jan 3, 2025

I'm going to argue VR has camp 1 problems. My persona for this is the owner of a few Thai restaurants who is brilliant at social media and SEO marketing. I could sell him a VR project easily if he believed in the ROI. Part of that is the user base but part of it is authoring and VR authoring is expensive.

If VR is going to be like the web we need some way he can get his business in the metaverse for $5000 not $500,000. Horizon Worlds falls down flat not because Meta is stupid but because the problem is difficult -- I'd like to make WebXR content based on my photography (and stereography) but once you have big textures you start to feel the 8GB limit of the device. The art gallery I want to make would require low resolution images or would require some of the programming techniques used in open world games.

In my mind VR seems to be the future of gaming, when I see many action games like Monster Hunter World or Rise of the Tomb Raider I think I'd like to experience them in VR but practically I still keep playing a lot of flat games like Dome Keeper and Dynasty Warriors 9 because there are a lot of them and they don't take the dedication that it takes to play through a game like Asgard's Wrath 2.

====

At the time I believed that better training data (business process, UX, and lots of things go into that) rather than better models was the key to products (I saw projects, including mine, that went nowhere because people did not muster the will to collect this data) so I felt we were getting a lot out of being engaged with customers.

I advocated a lot for drawing a clean line so you could reuse the same training data from different models in which case we could have had a team working on advanced models while the customer facing team gathered the data we needed to eval and refined those models over time. It would have been good if we could have gotten more VC money to hire up.

iiJDSii · on Jan 3, 2025

Indeed that's a good distinction that I don't think is made. Some of the other advice I've gotten from YC/PG essays is of the form 'build what seems cool' (if you're a technical and curious person). This seems to be the 'high reward, high risk' route. I suppose VR would fall into this bucket, it's definitely cool to a lot of people, but from its inception, has taken decades to really get anywhere in a commercial sense - and it still seems quite niche.

Other examples: even though PG mentions them as solving problems, would Microsoft (starting with OS software for a niche machine) and Facebook (college kids online directory) really have been starting as 'problem-first' products? It seems like the founders were more building what seemed interesting to them, and got the high reward by stumbling on the right things (and building them better than competitors).

Still even with the distinction ('person not in search of a problem'), it seems there could be a hammer/nail phenomenon going on if someone, say, builds cool tech of a certain type, and then tries to commercialize from there, in that order. What seems to make sense is to 'build what seems cool', keep an up-to-date mental model on new and emerging technologies, and stay open-minded to problems arising that can be solved. But even then, without a 'problem-first' mindset, could one get too pigeon-holed I wonder.

Not sure where I'm going with this, just thinking out loud. The topic fascinates me.

iiJDSii · on July 25, 2024

That would be my ideal device haha... have any particular devices of that nature you'd recommend? Lower end on cost would be preferred.

seba_dos1 · on July 25, 2024

I've been using a Librem 5 as my daily driver for years now. I also have a PinePhone, but it's quite slow compared to L5.

iiJDSii · on July 25, 2024

Appreciate the ego boost, but it was literally some modifications to python opencv demos available on their webpage. But that's what's amazing, it was up and running so fast and worked cross-platform after 5 minutes of debugging something with ChatGPT.

Yes it's become quite clear to me that using an Android phone is not a good development platform to work with external hardware devices. I understand this has never been it's intended purpose, but come on, we have a linux kernel sitting under the hood. Just feels like a waste of potential.

giantg2 · on July 25, 2024

Why use external hardware when they have cameras built in? I'm sure you could find an example or open source app that uses the camera and just modify that, like you did with the openCV.

HN For You