wasimlorgat's comments

wasimlorgat · on Dec 30, 2022

Very cool how you pieced together these OS libraries and models! TIL about PythonAnywhere. Would it be possible to have your task/script update the index.html file? Since that’s the only thing that updates the feed, it might be a nice little simplification to render it without JS.

Yenrabbit · on Dec 30, 2022

Since I already have the flask app set up I could inject the items into the page with a render template before serving it. Or write directly I guess since I already have the html snippet I want to insert! Good suggestion. The current architecture is partly just an artefact of the order I developed things :)
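
The "write directly" option could look roughly like the sketch below, which uses only the Python standard library. The page template, the `(text, url)` item shape, and the function names are assumptions for illustration; the real index.html and item format are not shown in this thread.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical page layout; the actual index.html is not shown in the thread.
PAGE = Template("""<!doctype html>
<html>
<head><title>$title</title></head>
<body>
<ul>
$items
</ul>
</body>
</html>
""")


def render_feed(items, title="Feed"):
    """Return a full HTML page with one <li> per (text, url) item."""
    rows = "\n".join(
        f'<li><a href="{html.escape(url, quote=True)}">{html.escape(text)}</a></li>'
        for text, url in items
    )
    return PAGE.substitute(title=html.escape(title), items=rows)


def write_feed(items, path="index.html"):
    """Overwrite the static page so it can be served without client-side JS."""
    Path(path).write_text(render_feed(items), encoding="utf-8")
```

The scheduled task would call `write_feed(...)` after computing the items, so the web server only has to serve a static file.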

wasimlorgat · on Nov 25, 2022

Qdrant looks great, thanks for sharing! I'll definitely play around with it and do some benchmarks :) Have you used it / compared it with faiss yourself?

andre-z · on Nov 25, 2022

There are benchmarks. However, it does not really make sense to compare with a library: https://qdrant.tech/benchmarks/#why-you-are-not-comparing-wi...

kacperlukawski · on Nov 25, 2022

Hey, I wrote an article describing the difference between FAISS, which is a library, and Qdrant - a proper database: https://lukawskikacper.medium.com/i-used-faiss-so-you-dont-h...

TLDR: using a vector database for production systems really pays off, as a library is hard to maintain in the long run

wasimlorgat · on Nov 25, 2022

Thanks for sharing! Hmm, it isn’t clear to me from your post why a library would be harder to maintain. Could you share more? The database seems to add quite a bit of complexity at a system-level and I’m still not sure of the benefit.

kacperlukawski · on Dec 5, 2022

Sure thing! If you decide to use a library that other processes or machines will communicate with, you need to build a whole service around it. The library also doesn't scale well if you need to go beyond a single-machine setup. With a proper database, the distributed deployment is already built-in, so you can scale it up on demand.

wasimlorgat · on Aug 26, 2022

I'm always surprised when people advocate for .py files over notebooks because of poor software practice. (Genuine question) have you found that it improves the situation at all?

HuwFulcher · on Aug 26, 2022

I’ve found varied success. In general, I’ve encouraged the move as a way into teaching source control. That has been in contexts where notebooks are being used for critical outputs rather than exploration.

When you get into MLOps as well, having .py templates actually makes the Data Scientist’s job easier as they can plug and play their models into a system that tracks inputs, outputs and changes for them

wasimlorgat · on Aug 26, 2022

Oh, on the topic of file formats: Quarto also lets you do plaintext notebooks in quite an interesting way, definitely worth checking out: https://quarto.org/docs/computations/python.html

euler_angles · on Aug 26, 2022

The latest release of nbdev has fully embraced Quarto! It's very awesome, check it out.

wasimlorgat · on Aug 26, 2022

Have you worked with Jupyter notebooks and git? It's a literally true statement :D and quite a struggle for many of us

cycomanic · on Aug 26, 2022

It's the wrong way around though, Jupyter notebooks break a git workflow. I think the fault here is completely with the design of the Jupyter notebook file format (and the way editors save to it).

I think it's quite unfortunate that they did not consider whether the format would integrate well with version control systems when first designing ipython notebooks.

fumeux_fume · on Aug 26, 2022

Nah man, you got it backwards. Git still works just fine while my notebooks are definitely broken. Not here to play the blame game, just trying to relate the practical results.

ocimbote · on Aug 26, 2022

If you leave git diffs in your files, whether Jupyter notebooks or otherwise, and run/compile them... They will break.

If you give me a counterexample, good for you, but my statement holds true 99% of the time.

jsweojtj · on Aug 26, 2022

You state in the top level comment that this claim stains the article: "Stating that git breaks Jupyter notebooks is quite a flex."

But you are saying here: "If you leave git diffs in your files, whether Jupyter notebooks or otherwise, and run/compile them... They will break."

Have you changed your mind in this thread? Or what's your objection?

ocimbote · on Aug 26, 2022

I'm suggesting that git only breaks Jupyter notebooks (or anything else) if you do not know what to expect from git.

But if you don't know that git modifies files when there are conflicts, then you're an interesting and rather unexpected audience, I assume.

Meaning that for the typical git user, that is, one who knows about git diffs, the behavior is expected and hence not broken. The files end up in an expected broken state, but git does not break them per se.

If you still disagree, let's just settle that we disagree and be done with it.
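
For readers unsure what the "expected broken state" looks like: a conflicted merge leaves marker lines in the file, and for a JSON-based format that means it no longer parses. A minimal sketch follows; the file content and the branch name `other-branch` are made up for illustration.

```python
import json

# A tiny JSON document as git might leave it after a conflicted merge.
# The content and the branch name are invented for this example.
CONFLICTED = """{
<<<<<<< HEAD
 "execution_count": 1
=======
 "execution_count": 2
>>>>>>> other-branch
}
"""


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Here `is_valid_json(CONFLICTED)` is false, while either side of the conflict on its own is valid JSON, which is why an editor that only understands JSON cannot open the conflicted file.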

wasimlorgat · on Aug 26, 2022

I agree re format vs tool complexity. I don't think Jupyter is a particularly difficult format though, it's mostly light JSON -- all human-readable.

We realised after working with Jupyter+Git for a while that the pain-points were actually with Jupyter editors (and/or their conventions) rather than the format, because they do things like store user-metadata in the file which pollutes diffs and leads to merge conflicts.

In fact, if Jupyter editors could handle merge conflicted files, we wouldn't need a custom merge driver either.
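
To make the metadata point concrete, here is a rough sketch of stripping volatile fields from a notebook before it is saved or committed. It is not nbdev's actual save hook; the choice of which fields to clear (cell metadata and execution counts) is an assumption for illustration, and the right set depends on your editor and workflow.

```python
import json


def clean_notebook(nb):
    """Return a copy of a notebook dict with volatile fields cleared.

    Which fields count as volatile is an assumption made for this sketch.
    """
    nb = json.loads(json.dumps(nb))  # cheap deep copy of JSON-compatible data
    for cell in nb.get("cells", []):
        cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
            for out in cell.get("outputs", []):
                if "execution_count" in out:
                    out["execution_count"] = None
    return nb


def dump_notebook(nb):
    """Serialize with stable key order and indentation to keep diffs small."""
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"
```

With these fields cleared, re-running a cell without changing its source no longer changes the file on disk, so it cannot show up in a diff or cause a merge conflict.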

wasimlorgat · on Aug 26, 2022

Jupytext does a lot more than just fix Jupyter/git integration, which is great if you want to adopt its approach, but a bit too heavy IMO if you don't. The approach mentioned here is extremely lightweight and doesn't use too much more than built-in Jupyter/git functionality (and it all happens automatically behind the scenes)

wasimlorgat · on Aug 26, 2022

Hi, I'm the author of the git merge driver and Jupyter save hook in nbdev2 :) I'd be happy to answer any questions you have about how we're handling using notebooks with git

jks · on Aug 26, 2022

Can this do three-way merge? If I have to resolve two conflicting code blocks, it is often useful to know how each of them changes the code from the shared parent.

wasimlorgat · on Aug 26, 2022

It does an ordinary three-way git merge (treating notebooks as plaintext) then a two-way merge on conflicted bits. We opted for that approach because it's incredibly simple and has worked perfectly for us (I think since we tend to work with small code cells). I think nbdime has a full-on three-way notebook merge if that's what you need, which can be used together with nbdev's Jupyter save hook to clean up unneeded metadata.
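
As a rough illustration of handling only the conflicted regions after git's plaintext merge, the sketch below keeps one side of each region and leaves everything else untouched. It is a simplified stand-in and not nbdev's actual merge driver; the function name and the `prefer` parameter are hypothetical, and it assumes git's default conflict style (no `|||||||` base section).

```python
def resolve_conflicts(text, prefer="ours"):
    """Replace each conflict region in text with the lines from one side.

    Assumes default-style markers: '<<<<<<< ', '=======', '>>>>>>> '.
    """
    if prefer not in ("ours", "theirs"):
        raise ValueError("prefer must be 'ours' or 'theirs'")
    out = []
    state = None  # None = outside a conflict, otherwise "ours" or "theirs"
    for line in text.splitlines(keepends=True):
        if line.startswith("<<<<<<< "):
            state = "ours"
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>> ") and state == "theirs":
            state = None
        elif state is None or state == prefer:
            out.append(line)
    return "".join(out)
```

A notebook-aware tool would decide per cell or per output rather than picking one side wholesale, but the shape is the same: let git merge what it can, then touch only what it could not.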

howon92 · on Aug 26, 2022

I enjoyed reading the writeup and think the solution is clean! Thanks for sharing
