I began using Python as a way to "mock" out the overall design, intending to re-implement it in Rust. The main reason for using Python was the ability to focus on "high-level" concepts and the speed of tinkering.
This implements a single-process, single-threaded, single-connection database, so performance and low-level concurrency control were not explicit goals and weren't really optimized for. For those (real-life) concerns, Rust or C++ are much better, but they also come with their own set of complexities.
It doesn't have a notion of atomically batching multiple statements, i.e. a transaction. But beyond that, it's a single-file database, which can only have a single process (learndb instance) operating on the database (file). So you get consistency and isolation by virtue of it being a single-connection database. Durability you get to the extent that the file system is durable. So it's somewhere on the ACIDity spectrum.
Re: Query planning/optimization
I haven't implemented this, but I've considered where the optimization module could sit:
The parser spits out an AST. This, or a derived intermediate representation, could be optimized, i.e. the AST could be rewritten or nodes deleted, before the VM executes it.
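To make the "rewrite the AST before the VM sees it" idea concrete, here is a minimal sketch of one such pass, constant folding. The node classes (`Literal`, `Column`, `BinaryOp`) and the function `fold_constants` are hypothetical names made up for illustration; they are not learndb's actual AST or API.

```python
import operator
from dataclasses import dataclass
from typing import Union


# Hypothetical AST node types, for illustration only (not learndb's classes).
@dataclass(frozen=True)
class Literal:
    value: Union[int, float]


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Column, BinaryOp]

# Operators this sketch knows how to evaluate at "plan" time.
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node: Expr) -> Expr:
    """Return an equivalent tree with literal-only subexpressions pre-computed."""
    if isinstance(node, BinaryOp):
        # Optimize the children first, so folding propagates bottom-up.
        left = fold_constants(node.left)
        right = fold_constants(node.right)
        if (
            isinstance(left, Literal)
            and isinstance(right, Literal)
            and node.op in _OPS
        ):
            # Both sides are known constants: replace the node with its value.
            return Literal(_OPS[node.op](left.value, right.value))
        return BinaryOp(node.op, left, right)
    # Literals and column references are already as simple as they get.
    return node


# Expression `a + 2 * 3`: the `2 * 3` subtree collapses into a single literal.
expr = BinaryOp("+", Column("a"), BinaryOp("*", Literal(2), Literal(3)))
optimized = fold_constants(expr)
```

The pass returns a new tree rather than mutating in place, so the VM could run on either the original or the optimized AST, which makes it easy to check that an optimization doesn't change results.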
Edit: I see what you mean. I surveyed a bunch of parser generator libraries, and they also seemed to use a text-based DSL rather than a DSL based on Python structures. What you're describing would have made the grammar development simpler and more ergonomic.
Perhaps a more accurate claim would be "SQLite inspired". Calling it a clone is misleading.
Mad props to the author. Many Python programmers never had proper training in computer science, so it is encouraging to see people filling in the gaps of their knowledge.
I think it doesn't even get close to being a criticism, and it's certainly unclear if the goal is to literally clone SQLite or to implement SQLite-ish. This is a fair question.
Just trying to encourage clear and accurate communication. I agree with your first sentence. The only thing unfair is the author claiming it is a SQLite clone. It isn't, as we both seem to agree. It is a form of cheating.
Fair. It’s “inspired by”, not a “clone”. Frankly, I don’t think these terms are that specific; one could level the same point against “inspired by”: in what sense is it inspired?
Cool stuff. I had similar intuitions: Python allowed me to focus on the high-level concepts. Albeit, there were times when I wished I had gone with a statically-typed, compiled language.
Most definitely. The b-tree implementation was the first motivation for starting the project, especially all the details around node rebalancing and splitting. And the fact that it was an on-disk structure added another wrinkle to thinking about the implementation.
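For anyone who hasn't run into the splitting part, here is a rough in-memory sketch of what happens when a leaf overflows. It is not learndb's code: `insert_into_leaf` and `MAX_KEYS` are hypothetical, the leaf is just a sorted list of integer keys, and the separator convention (first key of the right half goes to the parent, B+-tree style) is an assumption. The on-disk version also has to deal with page layout and pointers, which this ignores.

```python
import bisect
from typing import List, Optional, Tuple

MAX_KEYS = 4  # assumed leaf capacity, picked arbitrarily for the example


def insert_into_leaf(
    keys: List[int], key: int, max_keys: int = MAX_KEYS
) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert `key` into a sorted leaf; split the leaf if it overflows.

    Returns (left, right, separator). When no split is needed, `right`
    and `separator` are None and `left` is the updated leaf.
    """
    new_keys = list(keys)            # copy, so the caller's list is untouched
    bisect.insort(new_keys, key)     # keep the keys in sorted order
    if len(new_keys) <= max_keys:
        return new_keys, None, None

    # Overflow: divide the keys roughly in half between two leaves.
    mid = len(new_keys) // 2
    left, right = new_keys[:mid], new_keys[mid:]
    # Assumed convention: the parent gets the smallest key of the right leaf.
    return left, right, right[0]


no_split = insert_into_leaf([1, 3], 2)
split = insert_into_leaf([1, 2, 3, 4], 5)
```

The caller would then insert the separator into the parent node, which can overflow in turn; that cascading case (splitting internal nodes) is where most of the fiddly details live.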
This incomplete tutorial is about how to write a sqlite clone, with particular emphasis on how to implement the underlying b-tree: https://cstack.github.io/db_tutorial/
For YEARS that tutorial has been stuck on "Alright. One more step toward a fully-operational btree implementation. The next step should be splitting internal nodes. Until then!". It must not be worth the time to finish writing the tutorial.
This is really cool. I recently attempted to build a toy database, and subsequently implemented my own b-tree (https://github.com/spandanb/learndb-py). I ended up running into a lot of these issues.