Very cool how you pieced together these OS libraries and models! TIL about PythonAnywhere. Would it be possible to have your task/script update the index.html file? Since that’s the only thing that updates the feed, it might be a nice little simplification to render it without JS.
Since I already have the flask app set up I could inject the items into the page with a render template before serving it. Or write directly I guess since I already have the html snippet I want to insert! Good suggestion. The current architecture is partly just an artefact of the order I developed things :)
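For what it's worth, the "write directly" option could look roughly like the sketch below. It assumes index.html contains a pair of marker comments around the feed; the marker strings and the function names are made up for illustration and are not from the actual project.

```python
from pathlib import Path

# Hypothetical marker pair; the real index.html would need these comments added.
START = "<!-- feed:start -->"
END = "<!-- feed:end -->"


def inject_feed(page: str, snippet: str) -> str:
    """Return the page with everything between the markers replaced by snippet."""
    head, sep, rest = page.partition(START)
    if not sep:
        raise ValueError("start marker not found")
    _, sep, tail = rest.partition(END)
    if not sep:
        raise ValueError("end marker not found")
    return f"{head}{START}\n{snippet}\n{END}{tail}"


def update_index(path: Path, snippet: str) -> None:
    """Rewrite the file in place; a scheduled task could call this after building the snippet."""
    page = path.read_text(encoding="utf-8")
    path.write_text(inject_feed(page, snippet), encoding="utf-8")
```

Because the markers are kept in the output, running the task again simply replaces the previous feed, so the page can be served as a static file with no JS.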
Qdrant looks great, thanks for sharing! I'll definitely play around with it and do some benchmarks :) Have you used it / compared it with faiss yourself?
Thanks for sharing! Hmm, it isn’t clear to me from your post why a library would be harder to maintain. Could you share more? The database seems to add quite a bit of complexity at a system-level and I’m still not sure of the benefit.
Sure thing! If you decide to use a library that other processes or machines will communicate with, you need to build a whole service around it. The library also doesn't scale well if you need to go beyond a single-machine setup. With a proper database, the distributed deployment is already built-in, so you can scale it up on demand.
I'm always surprised when people advocate for .py files over notebooks because of poor software practice. (Genuine question) have you found that it improves the situation at all?
I’ve found varied success. In general, I’ve encouraged the move alongside teaching source control. That has been in contexts where notebooks are being used for critical outputs rather than exploration.
When you get into MLOps as well, having .py templates actually makes the Data Scientist’s job easier, as they can plug and play their models into a system that tracks inputs, outputs and changes for them.
It's the wrong way around though: Jupyter notebooks break a git workflow. I think the fault here is completely with the design of the Jupyter notebook file format (and the way editors save to it).
I think it's quite unfortunate that they did not consider whether the format would integrate well with version control systems when first designing ipython notebooks.
Nah man, you got it backwards. Git still works just fine while my notebooks are definitely broken. Not here to play the blame game, just trying to relate the practical results.
I'm suggesting that git only breaks Jupyter notebooks (or anything else) if you do not know what to expect from git.
But if you don't know that git modifies files when there are conflicts, then you're an interesting and rather unexpected audience, I assume.
Meaning that for the typical git user, i.e. someone who knows about git diffs, the behavior is expected and hence not broken. The files end up in an expected broken state, but git does not break them per se.
If you still disagree, let's just settle that we disagree and be done with it.
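To make the "expected broken state" above concrete, here is a small sketch. The notebook content is a toy stand-in (a real .ipynb has more fields) and the branch name is invented. The point is only that the marker lines git writes into a conflicted file are not valid JSON, so a notebook editor can no longer open it.

```python
import json

# A tiny stand-in for a notebook file (real .ipynb files have more fields).
clean = '{"cells": [{"source": "x = 1"}]}'

# What the same file could look like after git records a conflict:
# git writes both sides into the file, separated by marker lines.
conflicted = """{"cells": [{"source":
<<<<<<< HEAD
"x = 1"
=======
"x = 2"
>>>>>>> other-branch
}]}"""


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Here `is_valid_json(clean)` holds while `is_valid_json(conflicted)` does not, which matches both sides of the argument: git did what it always does, and the notebook is unusable until the markers are resolved.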
I agree re format vs tool complexity. I don't think Jupyter is a particularly difficult format though, it's mostly light JSON -- all human-readable.
We realised after working with Jupyter+Git for a while that the pain-points were actually with Jupyter editors (and/or their conventions) rather than the format, because they do things like store user-metadata in the file which pollutes diffs and leads to merge conflicts.
In fact, if Jupyter editors could handle merge conflicted files, we wouldn't need a custom merge driver either.
Jupytext does a lot more than just fix Jupyter/git integration, which is great if you want to adopt its approach, but a bit too heavy IMO if you don't. The approach mentioned here is extremely lightweight and doesn't use too much more than built-in Jupyter/git functionality (and it all happens automatically behind the scenes).
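As a rough illustration of the metadata cleanup described above, the sketch below clears the fields that tend to change on every run or per user. This is not the actual save hook; the function name is made up, and the choice of what to drop (all cell metadata, execution counts) is an assumption. A real tool would more likely keep an allowlist of metadata keys.

```python
import json


def strip_volatile(nb: dict) -> dict:
    """Return a copy of a notebook dict with per-run, per-user noise removed."""
    cleaned = json.loads(json.dumps(nb))  # deep copy via a JSON round-trip
    for cell in cleaned.get("cells", []):
        # Assumption: all cell-level metadata is editor noise. This is blunt.
        cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            # Execution counters change on every run and pollute diffs.
            cell["execution_count"] = None
            for output in cell.get("outputs", []):
                if "execution_count" in output:
                    output["execution_count"] = None
    return cleaned
```

Running something like this on save means two people who executed the same cells in a different order still end up with identical files.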
Hi, I'm the author of the git merge driver and Jupyter save hook in nbdev2 :) I'd be happy to answer any questions you have about how we're handling using notebooks with git.
Can this do three-way merge? If I have to resolve two conflicting code blocks, it is often useful to know how each of them changes the code from the shared parent.
It does an ordinary three-way git merge (treating notebooks as plaintext) then a two-way merge on conflicted bits. We opted for that approach because it's incredibly simple and has worked perfectly for us (I think since we tend to work with small code cells). I think nbdime has a full-on three-way notebook merge if that's what you need, which can be used together with nbdev's Jupyter save hook to clean up unneeded metadata.
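For anyone curious what the first step of a "two-way merge on conflicted bits" might involve, here is a minimal sketch. It is not nbdev's code, and the function names are invented. It only rebuilds the two sides from a file containing standard git conflict markers so that each can be parsed as JSON again, and it assumes the default marker style (no diff3 `|||||||` section).

```python
import json


def split_conflict(text: str) -> tuple[str, str]:
    """Rebuild the 'ours' and 'theirs' versions from text with git conflict markers."""
    ours, theirs = [], []
    side = None  # None: shared text, "ours": after <<<<<<<, "theirs": after =======
    for line in text.splitlines(keepends=True):
        if line.startswith("<<<<<<< "):
            side = "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>> ") and side == "theirs":
            side = None
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
        else:
            # Text outside any conflict block belongs to both versions.
            ours.append(line)
            theirs.append(line)
    return "".join(ours), "".join(theirs)


def both_sides_as_json(text: str) -> tuple[dict, dict]:
    """Parse each side as JSON; raises json.JSONDecodeError if a side is still invalid."""
    ours, theirs = split_conflict(text)
    return json.loads(ours), json.loads(theirs)
```

Once both sides are valid notebooks again, a cell-by-cell comparison can decide what to keep, which is presumably easier when code cells are small.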