
a new lightweight network architecture -- Generalized Efficient Layer Aggregation Network (GELAN), based on gradient path planning is designed. GELAN's architecture confirms that PGI has gained superior results on lightweight models. We verified the proposed GELAN and PGI on MS COCO dataset based object detection. The results show that GELAN only uses conventional convolution operators to achieve better parameter utilization than the state-of-the-art methods developed based on depth-wise convolution.



"Interestingly, even at this scale, we observe no sign of saturation in performance, suggesting that AIM potentially represents a new frontier for training large-scale vision models."


The problem being solved is getting AI to distinguish individual objects within visual data. Before SAM, you had to label data for specific objects and train a model to recognize those objects specifically. That becomes a problem given the variety of objects in the world, the settings they can appear in, and their orientation in an image. SAM can identify objects it has never seen before, as in objects that might not be part of the training data.

Once you can determine which pixels belong to which object automatically, you can start to utilize that knowledge for other applications.

If you have SAM showing you all the objects, you can use other models to identify what each object is, understand its shape/size, understand depth/distance, etc. It's a foundational model to build on for any application that wants to use visual data as an input.
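
To make the "which pixels belong to which object" idea concrete, here is a rough sketch of one downstream step: turning a per-pixel object assignment into each object's size and bounding box. This is not SAM's API. The `label_map` grid and the `summarize_objects` helper are hypothetical, and the tiny hand-written grid stands in for whatever a segmentation model would output (0 is assumed to mean background).

```python
def summarize_objects(label_map):
    """Return {object_id: {"area": pixel count, "bbox": (top, left, bottom, right)}}.

    label_map is a 2D list of integer object ids, one per pixel.
    Id 0 is treated as background and skipped (an assumption of this sketch).
    """
    stats = {}
    for r, row in enumerate(label_map):
        for c, obj in enumerate(row):
            if obj == 0:
                continue
            if obj not in stats:
                # First pixel seen for this object: the box starts as that pixel.
                stats[obj] = {"area": 0, "bbox": (r, c, r, c)}
            entry = stats[obj]
            top, left, bottom, right = entry["bbox"]
            entry["area"] += 1
            # Grow the box so it covers the new pixel.
            entry["bbox"] = (min(top, r), min(left, c), max(bottom, r), max(right, c))
    return stats


# Hypothetical 3x4 "image" containing two objects (ids 1 and 2).
label_map = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
]
summary = summarize_objects(label_map)
for obj_id, info in sorted(summary.items()):
    print(obj_id, info["area"], info["bbox"])
```

The bounding box is the kind of thing you would crop out and hand to a separate classifier or depth model, which is the "other models" step described above.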


> SAM can identify objects it has never seen before

I'd love to see what SAM does when you send it a photo of rolling fog though, e.g. https://www.google.com/search?q=rolling+fog+scotland&tbm=isc... - what happens then? (And how can it meaningfully segment out fog?)


Not sure if this is what you mean, but I grabbed some of those images & dropped them in to see what it predicted: https://imgur.com/a/CXLmYXo


It groups the fog as a single object (except where it's separated by things like hills).

You can see what it does - it's available to test at https://segment-anything.com/.


Yes, what I am interested in are the other applications.


