Hacker News | heywhy's comments

I just published an S3 storage provider for Elasticlunr. You can now store your indexes in an S3 bucket, in addition to the disk storage provider included in the base project.

The storage API is flexible, so writing to any other storage provider (Google Cloud Storage, a database, and so on) shouldn't be a problem. It's just a matter of grabbing the right provider or implementing one yourself.

https://github.com/heywhy/ex_elasticlunr_s3
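The comment doesn't show the provider API itself, so here is a language-neutral sketch (in Python, although the library is Elixir) of the general pattern behind a pluggable storage layer: callers depend on one small read/write interface, and each backend implements it. StorageProvider, DiskProvider, InMemoryProvider and save_index are hypothetical names, not ex_elasticlunr's actual API; InMemoryProvider merely stands in for a remote backend such as S3.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class StorageProvider(ABC):
    """Hypothetical interface: persist and load a serialized index by name."""

    @abstractmethod
    def write(self, name: str, data: dict) -> None: ...

    @abstractmethod
    def read(self, name: str) -> dict: ...


class DiskProvider(StorageProvider):
    """Stores each index as a JSON file in a local directory."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, name: str) -> str:
        return os.path.join(self.directory, f"{name}.json")

    def write(self, name: str, data: dict) -> None:
        with open(self._path(name), "w", encoding="utf-8") as f:
            json.dump(data, f)

    def read(self, name: str) -> dict:
        with open(self._path(name), encoding="utf-8") as f:
            return json.load(f)


class InMemoryProvider(StorageProvider):
    """Stand-in for a remote store (S3, GCS, a database): keeps serialized blobs."""

    def __init__(self):
        self._blobs = {}

    def write(self, name: str, data: dict) -> None:
        self._blobs[name] = json.dumps(data)

    def read(self, name: str) -> dict:
        return json.loads(self._blobs[name])


def save_index(provider: StorageProvider, name: str, index: dict) -> None:
    # Calling code only knows the interface, so backends are interchangeable.
    provider.write(name, index)


if __name__ == "__main__":
    index = {"ref": "id", "fields": ["title", "body"]}
    for provider in (DiskProvider(tempfile.mkdtemp()), InMemoryProvider()):
        save_index(provider, "articles", index)
        print(type(provider).__name__, provider.read("articles") == index)
```

Adding a new backend then means writing one more class with `write` and `read`; nothing that builds or queries the index has to change.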


Yes. This library already includes stemming and TF-IDF. Everything provided by the JS version is included in this library, with improvements where applicable.
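For readers unfamiliar with TF-IDF, here is a generic sketch in Python of how it ranks documents: a term counts more the more often it appears in a document (tf) and the rarer it is across the collection (idf). This is not Elasticlunr's exact weighting formula; it uses one common smoothed idf variant, and stemming is only noted in a comment. tokenize and tf_idf_scores are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and split on non-alphanumerics. A full pipeline would also
    # drop stop words and stem each token (e.g. "searching" -> "search").
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_scores(docs: Dict[str, str], query: str) -> Dict[str, float]:
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores: Dict[str, float] = {}
    for term in set(tokenize(query)):
        df = sum(1 for c in counts.values() if term in c)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + n_docs / df)  # smoothed: rarer terms weigh more
        for doc_id, c in counts.items():
            if c[term]:
                scores[doc_id] = scores.get(doc_id, 0.0) + c[term] * idf
    return scores
```

Only documents containing at least one query term get a score, so the result doubles as the match set.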


Yes, it is a port of that library with some improvements.


You might want to consider mapping an index to an ETS table-based data structure instead of an immutable object managed by a GenServer. That would let you share it between processes without awkwardly copying a potentially huge data structure all over the place.


I do have thoughts about performance too, but I was following the "get it working, then make improvements" route :). Thank you for the suggestions.


This makes sense, and I think you've taken the correct route. I look forward to trying this in one of my projects and comparing it to my current Postgres-only search strategy. For my use case, losing the index between restarts isn't a deal breaker, so hopefully I'll have some useful feedback.


That's great. I will be looking forward to this.


Love it. You're doing exactly the right thing.


I don't understand how this works. Is data read from ETS somehow shared more efficiently than data sent via a regular message (which, IIRC, is always copied)?


It's still copied, but if you are using an ETS table, you're likely only copying a small subset of the data per query instead of schlepping the whole index every time.


It’s still copied, but a process can quickly become a bottleneck in parallel code (every request to a process is handled sequentially).

An ETS table can be concurrently read (and tweaked even further for that use case if desired).
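To make the "how much gets copied" argument concrete, here is a rough Python analogy (not Elixir, and not how BEAM or ETS are implemented). query_copy_whole stands in for a process that replies with the entire index; query_lookup stands in for per-key reads from a shared table, where only the posting lists for the query terms are copied out. The inverted-index layout and both function names are assumptions for illustration, and the concurrency point (one process serializing every request) is not modeled.

```python
import copy
from typing import Dict, List, Set, Tuple

Index = Dict[str, Set[str]]  # term -> ids of documents containing it


def _intersect(postings: List[Set[str]]) -> Set[str]:
    # Documents that contain every query term.
    return set.intersection(*postings) if postings else set()


def query_copy_whole(index: Index, terms: List[str]) -> Tuple[Set[str], int]:
    # Analogy for a process that hands back the whole index:
    # the caller gets a full copy before picking out what it needs.
    snapshot = copy.deepcopy(index)
    copied = sum(len(ids) for ids in snapshot.values())
    return _intersect([snapshot.get(t, set()) for t in terms]), copied


def query_lookup(index: Index, terms: List[str]) -> Tuple[Set[str], int]:
    # Analogy for per-key table reads: only the posting lists
    # for the query terms are copied.
    postings = [set(index.get(t, set())) for t in terms]
    copied = sum(len(ids) for ids in postings)
    return _intersect(postings), copied
```

Both return the same matches; the second reports copying only the entries the query touches, and that gap grows with the size of the index.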


As eproxus mentioned, it's still shared through normal process messaging, but improvements will be made here.


I'd even suggest Mnesia as an option for disc copies.


Mnesia used to have consistency issues that could crop up and were very difficult to debug. Have these been fixed?


Hello. I'm the author of the library. You should use the IndexManager (https://github.com/heywhy/ex_elasticlunr/blob/master/lib/ela...) to store your index after making changes to it, but note that the indexes will be lost on application shutdown.

That said, I'm currently working on a configurable storage mechanism so that you can use any storage provider you choose. See https://github.com/heywhy/ex_elasticlunr/pull/9


Nice work! For certain storage media, e.g. S3, it might be useful to have some sort of delta-based updates, where you enqueue deltas that accrue over time. It might also be interesting to solicit volunteers to help implement distributed in-memory or disk persistence.
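One way to read the delta-based suggestion, sketched in Python: instead of rewriting the whole index object on every change, changes are queued and handed to a writer in batches, and only the latest change per document is kept. DeltaBuffer, flush_fn and max_pending are hypothetical names, not part of ex_elasticlunr or ex_elasticlunr_s3; a real version would presumably also flush on a timer and at shutdown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# (operation, document id, document or None for removals)
Delta = Tuple[str, str, Optional[dict]]


@dataclass
class DeltaBuffer:
    """Accumulates index changes and passes them to a writer in batches."""

    flush_fn: Callable[[List[Delta]], None]  # e.g. one object upload per batch
    max_pending: int = 100
    pending: List[Delta] = field(default_factory=list)

    def add(self, doc_id: str, doc: dict) -> None:
        self._enqueue(("add", doc_id, doc))

    def remove(self, doc_id: str) -> None:
        self._enqueue(("remove", doc_id, None))

    def _enqueue(self, delta: Delta) -> None:
        self.pending.append(delta)
        if len(self.pending) >= self.max_pending:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # Compact: only the most recent delta per document needs writing.
        latest: Dict[str, Delta] = {}
        for delta in self.pending:
            latest[delta[1]] = delta
        self.pending = []
        self.flush_fn(list(latest.values()))
```

Batching trades a little durability (unflushed deltas live only in memory) for far fewer round trips to a store like S3, where each write is a separate request.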


Thank you for the suggestions. I have the same direction in mind for the library. Feel free to recommend volunteers.

And don't forget to share the project with friends and colleagues who might be interested in contributing.


Post it on the Elixir Forum (elixirforum.com). I did a quick search and couldn't find it there.

