For the best experience on desktop, install the Chrome extension to track your reading on news.ycombinator.com
Hacker Newsnew | past | comments | ask | show | jobs | submit | history | thadguidry's commentsregister

One of the most powerful features that I pushed the MyST team to incorporate is xref (linking External MyST projects via borrowed syntax from ReStructuredText) https://mystmd.org/guide/external-references#myst-xref Which allows supplying a list of external references, and complete metadata: https://mystmd.org/guide/website-metadata#myst-xref-json


Thanks @rasmusei! If you are a data scientist you might also be interested in how to work along with Jupyter. Our community has some documentation on our Wiki about that here: https://github.com/OpenRefine/OpenRefine/wiki/Jupyter


Thanks Larry for the ping. Happy to answer questions from the community here!


SIMILE library is used in OpenRefine for certain Faceting like timeline, clustering, etc. Parallax was originated by David to show how time series data visualizations could be enhanced. David was one of our original designers of OpenRefine and I worked closely with him and Stefano in testing it.


Our current architecture is here: https://github.com/OpenRefine/OpenRefine/wiki/Architecture

YES! We want users to process much larger data sets also! We have started experiments with using Apache Spark on the backend where the hope is that we can help users with much larger datasets. This work is being funded by CZI and you can read the grant proposal here: http://openrefine.org/blog/2019/11/14/czi-eoss.html


Long time back, I built an extension, by which you can take the openrefine mappings, and run on a hadoop cluster. The idea is you would run all transformations on a small dataset on your local machine. Once you are satisfied with the mappings, you would deploy the same on hadoop cluster. I have tested this on large datasets, and it works. Let me know how I can help. You can check my github: https://github.com/rmalla1/OpenRefine-HD


before you go down the spark route, consider perl/unix-tools may do this kind of thing quite well: https://livefreeordichotomize.com/2019/06/04/using_awk_and_r...


That author did not have Spark tuned well for the use case. This is a common issue with Spark. Since OpenRefine commonly is used with Strings, we plan to optimize in many areas for that such a few mentioned here: https://databricks.com/glossary/spark-tuning But in general, there are always tradeoffs when trying to provide immediate feedback for interactions. Since OpenRefine has many interactive features, some will need to support batching and advise the user in the interface that things will take longer...do you want to send to batch? Some of the tradeoffs and ways we plan to address these are mentioned in our general OpenRefine on Spark issue here: https://github.com/OpenRefine/OpenRefine/issues/1433


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search:

HN For You