Not at the moment -- we're currently searching the abstracts of most major journals (abstracts are public even for paywalled papers), as compiled in the Semantic Scholar database (https://www.semanticscholar.org/about/publishers).
In short, yes, though it's geared toward topic search.
From a strategy perspective, we designed it for topic search because it makes more sense to find everything on a topic first, then filter for the most recent, if recency is what you want. There is a lot of useful information in older articles (citation connections, what people discuss, and how), and gathering all of it helps uncover the most relevant results. Conversely, if you only ever filtered to articles from the last year, you might discover a few things, but you wouldn't have as much information for the search to adapt and improve.
So, you can ask for articles on coffee (though ideally something a bit more specific, or there will be thousands of results). Our system will carefully find all the articles, and then you can filter for 2024 articles or look at the timeline.
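To make the find-everything-then-filter workflow concrete, here's a minimal client-side sketch. The records and field names are invented placeholders, not real search output:

```python
from collections import Counter

# Hypothetical records standing in for a completed broad topic search on "coffee".
results = [
    {"title": "Coffee and cardiovascular health", "year": 2021},
    {"title": "Caffeine metabolism pathways", "year": 2024},
    {"title": "Arabica genome assembly", "year": 2024},
    {"title": "Espresso extraction kinetics", "year": 2019},
]

# Step 1 happened above: gather everything on the topic first.
# Step 2: filter the complete set down to a single year...
papers_2024 = [r for r in results if r["year"] == 2024]

# ...or build the timeline view: paper counts per year.
timeline = Counter(r["year"] for r in results)

print([r["title"] for r in papers_2024])
print(sorted(timeline.items()))
```

The point of ordering it this way is that the filter runs over the complete topic set, so nothing relevant is missed before you narrow down.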
The few-minute delay is primarily due to the sequential processing steps by high-quality LLMs, not database access times. The system reads and generates paragraphs about papers, then compares them, and we have to use the highest-quality LLMs, so token generation times are perceptible. We repeat this many times for accuracy. We find it's impossible to be accurate without GPT-4-level models and the delay that comes with them.
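As a rough back-of-envelope for why sequential generation steps dominate the wall-clock time -- every number below is invented for illustration, not a measured figure for any real system:

```python
# Invented illustrative numbers -- NOT measured figures for any real system.
sequential_steps = 6      # assumed: reads, summaries, and comparisons run in sequence
tokens_per_step = 800     # assumed tokens generated per step
tokens_per_second = 40    # assumed throughput of a large, high-quality model

# Sequential steps can't be parallelized, so their latencies add up.
latency_seconds = sequential_steps * tokens_per_step / tokens_per_second
print(f"{latency_seconds / 60:.1f} minutes")  # prints "2.0 minutes"
```

Even with generous throughput assumptions, a handful of dependent generation steps lands in the minutes range, which is why the delay tracks model quality rather than database speed.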
Ours is slow but accurate, even for complex topics. The rest are fast but generally can't handle complex topics. (There are more nuanced explanations in other comments.)
Semantic Scholar seems more focused on
1. being the data provider/aggregator for the research community, and
2. long term, I think they plan to develop software at the reading interface that learns as a researcher uses it to browse papers (a rich PDF reader with hyperlinks, TLDRs, citation contexts, and a way to track your interactions over time and remind you of what you have and haven't seen).
Their core feature right now is a fast keyword search engine. They also have a few advanced search features through their API (https://api.semanticscholar.org/api-docs/), like recommendations from positive/negative examples, but neither keyword search nor these other systems is currently high enough quality to be very useful for us.
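For anyone curious, the recommendations-from-examples feature mentioned above can be called roughly like this. This is a sketch based on the endpoint shape in their API docs; the paper IDs are placeholders, and field names may differ from what you need:

```python
import json
import urllib.request

# Recommendations endpoint, per https://api.semanticscholar.org/api-docs/
RECS_URL = "https://api.semanticscholar.org/recommendations/v1/papers/"

def build_request(positive_ids, negative_ids, fields="title,year", limit=20):
    """Build the URL and JSON body for a positive/negative-example query."""
    url = f"{RECS_URL}?fields={fields}&limit={limit}"
    payload = {
        "positivePaperIds": positive_ids,
        "negativePaperIds": negative_ids,
    }
    return url, json.dumps(payload).encode("utf-8")

if __name__ == "__main__":
    # Placeholder IDs -- the API accepts prefixed forms like DOI:... and ArXiv:...
    url, body = build_request(["ArXiv:1706.03762"], [])
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        for paper in json.load(resp).get("recommendedPapers", []):
            print(paper.get("year"), paper.get("title"))
```

The positive/negative lists steer the recommender toward and away from regions of the literature, which is a different interaction model from a single keyword query.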
FYI our core dataset for now is provided by Semantic Scholar, so hugely thankful for their data aggregation pipeline and open access/API.
Do you plan on adding an API? I already have an in-house knowledge discovery, annotation, and search system that could be augmented by your service. Not super critical at this point, but it would be nice.
And yes, Semantic Scholar is a wonderful part of the academic commons. Fingers crossed they don't go down the JSTOR/OCLC path.
I've used Undermind for literature search and it was very precise! Thanks for the product! I wonder how you plan to extend the search to full paper content (will the Semantic Scholar API allow this), and do you plan to connect more datasets (which ones)? Many of them are paid...
We'll certainly be able to include open-access full texts, which are already a substantial fraction of published papers, and a growing one, as the publishing industry rapidly moves toward open access. Paywalled full-text search would require working with the publishers, which is more involved.
Great! I can definitely ask Undermind for an overview paper of the scientific information landscape, unless you have a favourite on hand to share?
I think the biggest difference is our focus on search quality, and being willing to spend a lot on compute to do it, while they focus on systematic extraction of data from existing sources and on being fast. It's a bit of an oversimplification (they of course have search, and we also have extraction).
Feature-wise, we definitely have a lot of work to do :) What crucial pieces do you think we're missing?
From what I understand, that's not the case. They are working on both. I'd be concerned about how you can differentiate yourselves and compete with them. They have a big head start.