Yep, I started off trying to get it to work with PyTorch (https://github.com/bkkaggle/lm-training-research-project/blo...), then with PyTorch Lightning, but the whole one-user-VM-per-TPU-board limitation in pytorch-xla 7-8 months ago made me switch over to TF
heh. I've been using JAX for a couple of months and it's been a pretty nice replacement for both PT and TF. It feels like what an ML framework would look like if it were built around easy scaling and dev friendliness.
Yeah, I totally get your point about the title. The TPU quota I got was worth close to the equivalent of $20k, but in my defense I don't have access to any compute beyond what I get through the TFRC or through Google Colab.
Speaking as a hobbyist: earlier, if you had enough determination, you could create just about any software if you kept hacking at it long enough. CPU power or cost was generally not the issue; your time and tenacity were.
This has unfortunately changed, and innovation in software (especially ML) is now largely about how deep your pockets are.
I think this is quite a rose-colored view of the past. Rendering with many graphics techniques was out of reach for hobbyists for a long time, for example.
This is pretty interesting, especially since I just finished an assignment a couple of days ago for my proofs course that was all about proving the divisibility of large integers from the sums of their digits.
Modular arithmetic is pretty fun stuff, even if I'm incredibly bad at it.
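For anyone curious, the divisibility-by-digit-sums connection falls out of modular arithmetic directly: since 10 ≡ 1 (mod 9), every power of 10 is ≡ 1 (mod 9), so an integer is congruent to the sum of its decimal digits mod 9 (and mod 3). A minimal sketch of the classic rule (function names are my own, not from any assignment):

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))

def divisible_by_9(n: int) -> bool:
    """Check divisibility by 9 via the digit-sum rule:
    n ≡ digit_sum(n) (mod 9), so reduce until one digit remains."""
    while n >= 10:
        n = digit_sum(n)
    return n == 0 or n == 9

# Sanity check against plain modular arithmetic
n = 987654321  # digit sum 45 -> 9, so divisible by 9
assert divisible_by_9(n) == (n % 9 == 0)
```

The same trick works mod 3 (since 10 ≡ 1 mod 3 too), and mod 11 with alternating signs (since 10 ≡ -1 mod 11).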
I used PyTorch Lightning back in May when I was working on pretraining GPT-2 on TPUs (https://bkkaggle.github.io/blog/nlp-research-part-2/). It was really impressive how stable it was, especially given that a lot of features were still being added at a very fast pace.
Also, this was probably the first (and maybe still is?) high-level PyTorch library that let you train on TPUs without a lot of refactoring and bugs, which was a really nice thing given how unstable the pytorch-xla API still was at that point. <3