francedot's comments

A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools.

> An AI Agent for Computer Use is an autonomous program that can reason about tasks, plan sequences of actions, and act within the domain of a computer or mobile device in the form of clicks, keystrokes, other computer events, command-line operations and internal/external API calls. These agents combine perception, decision-making, and control capabilities to interact with digital interfaces and accomplish user-specified goals independently.


AI assistants have changed the way we use computers to work and search for information. As LLMs become more powerful, what’s next? Agents.

Excited to introduce Windows Agent Arena, a benchmark for evaluating AI models that can reason, plan and act to solve tasks on your PC.

Blog: https://microsoft.com/applied-sciences/projects/windows-agen...
Webpage: https://microsoft.github.io/WindowsAgentArena/
Paper: https://arxiv.org/abs/2409.08264
Code: https://github.com/microsoft/WindowsAgentArena


NavAIGuide (/næv eɪ aɪ ɡaɪd/) is an extensible TypeScript component toolkit for integrating LLMs into Navigation Agents and Browser Companions. Key features include:

Natural Language Task Detection: Supports both visual (using GPT-4V) and textual modes to identify tasks from web pages.

Automation Code Generation: Generates code for predicted tasks, targeting either Playwright (requires Node) or native JavaScript Browser APIs.

Visual Grounding: Improves the accuracy of locating visual elements on web pages for better interaction.

Efficient DOM Processing and Token Reduction: Filters and compacts DOM elements to reduce the number of tokens required for accurate grounding and action detection.

Reliability: Includes a retry mechanism with exponential backoff to handle transient failures in LLM calls.

JSON Mode & Action-based Framework: Uses JSON mode and reproducible outputs for predictable outcomes, and an action-oriented approach for task execution.
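
To make the Reliability point concrete, here is a minimal sketch of retrying an LLM call with exponential backoff. This is not NavAIGuide's actual code: `backoffDelayMs`, `withRetry`, and the default values (4 attempts, 500 ms base delay, 8000 ms cap) are hypothetical names and numbers chosen for illustration.

```typescript
// Hypothetical helpers for illustration; not NavAIGuide's actual API.

// Delay before the next retry: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Promise-based sleep built on setTimeout.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Runs `call` up to maxAttempts times, waiting longer after each failure.
// Rethrows the last error if every attempt fails.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts: number = 4,
  baseMs: number = 500,
  maxMs: number = 8000
): Promise<T> {
  let lastError: unknown = new Error("withRetry: maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError;
}
```

A caller would wrap whatever function issues the LLM request, e.g. `await withRetry(() => callModel(prompt))`, where `callModel` stands in for your own request function. A production version would probably add random jitter to the delay and retry only on errors known to be transient (rate limits, timeouts) rather than on every exception.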

