englund's comments

englund · on Nov 4, 2021

I've built https://bundlescanner.com, which is similar to what you're describing. It has indexed 35,000 of the most popular npm packages. However, it is not accurate enough to reliably identify which specific version of a package is present in a JS bundle.

I'd be curious to hear if anyone can think of possible applications of it in security auditing.

englund · on Sept 25, 2021

(Project creator here)

I'm happy to answer any questions about how the project works. Feedback is very much appreciated.

The big challenge this project has faced from the beginning is how to make the matching algorithm as accurate as possible while keeping it fast enough to scan an entire website's worth of JavaScript within a couple of seconds. As someone with zero prior experience in search/information retrieval, I found this a hard task.

Accuracy-wise it's not quite there yet. In my benchmark, around 5% of identified libraries are false positives and something like 15% of bundled libraries are missed. The false positives mostly stem from cases where two libraries have almost identical content, or cases where one library has bundled a dependency into its own code.
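
Read as standard retrieval metrics, those two rates correspond to a precision of about 95% and a recall of about 85%. The sketch below only does that arithmetic; the variable names are made up for illustration and it is not the project's benchmark code.

```python
# Illustrative only: converts the approximate rates quoted above into
# precision, recall, and F1. All names here are hypothetical.
false_positive_share = 0.05  # ~5% of identified libraries are false positives
missed_share = 0.15          # ~15% of bundled libraries are missed

precision = 1 - false_positive_share  # share of identified libraries that are correct
recall = 1 - missed_share             # share of bundled libraries that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.3f}")
```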

Performance has gotten quite good. In a benchmark of popular websites, it scans ~1.4 websites per second, or ~3 MB of minified JavaScript per second (running on a 65€/month VPS).
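
Taken together, the two throughput figures imply roughly 2 MB of minified JavaScript per website on average. A quick back-of-the-envelope check follows, assuming both numbers come from the same benchmark run; the per-site time is an average derived from throughput and may not equal the latency of a single scan if scans run concurrently.

```python
# Illustrative arithmetic on the approximate figures quoted above.
sites_per_second = 1.4      # ~1.4 websites scanned per second
megabytes_per_second = 3.0  # ~3 MB of minified JavaScript per second

mb_per_site = megabytes_per_second / sites_per_second  # average JS payload per site
seconds_per_site = 1 / sites_per_second                # average time per site

print(f"{mb_per_site:.2f} MB per site, {seconds_per_site:.2f} s per site")
```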

englund · on April 8, 2021

Invaded open source? I can't think of any open source projects not run by programmers.
