Three volumes. Free PDFs. No sign-up. Creative Commons (CC BY-NC-ND 4.0).
I kept noticing smart people missing the same things. Not from lack of intelligence — from filters they didn't
know they were running. So I wrote a nonfiction trilogy called The Calibrated View.
Volume I covers nine perceptual filters — the lenses you inherit from family, culture, trauma, information diet —
that shape what you see before you're conscious enough to question them. Volume II is about what it costs to stay
in a room when the room gets difficult. Volume III is about the person who sees a gap and fills it without being
asked.
Each chapter opens with a real person's story, then names the pattern. Bowlby's attachment theory, Fannie Lou
Hamer's voter registration, a man in India who planted a forest alone for 40 years.
I published under a pen name and put everything under Creative Commons. Built the PDFs with Playwright/Chromium
from Markdown source. The site is a static flipbook on GitHub Pages.
Happy to answer questions about the writing process, the framework, or the build pipeline.
Hi HN, solo developer here. OctoFlow is a GPU-native programming language —
the GPU is the primary execution target, not an accelerator you opt into.
4.5 MB binary, zero dependencies, any Vulkan GPU.
This is vibe-coded. LLMs generated the bulk of the code. That's not a caveat —
it's the point. But "vibe-coded" doesn't mean "unreviewed." Every architectural
decision has a human at the gate: pure Vulkan (no CUDA lock-in), zero external
dependencies (hand-rolled everything), 23-concept language spec that fits in an
LLM prompt, Loom Engine dispatch chains that let the GPU run autonomously.
The AI writes code. The human decides what to build, why, and whether it ships.
Two principles guide every decision:
- Sustainability: Is complexity growing faster than it can be maintained? If the
test count drops or the gotcha list grows, stop and fix before shipping more.
- Empowerment: Can a non-GPU-programmer go from intent to working GPU code? If a
feature makes the language harder for AI to generate, it doesn't ship.
What's real today: 966 tests passing, 445 stdlib modules, 150 pre-compiled
SPIR-V kernels, a GPU-backed interactive REPL, and AI code generation via
`octoflow chat`. What's in progress: running 24GB LLM models on 6GB consumer
GPUs via layer streaming.
The stdlib and everything on GitHub are MIT-licensed. The compiler (Rust) is in
a private repo for now, but I'm willing to open-source all of it once there's
a team to sustain it. Looking for co-maintainers — especially people with Vulkan
experience or who care about keeping languages small and learnable.
Happy to answer questions about the architecture, the vibe-coding workflow, or
what it's like building a programming language with AI doing 90% of the typing
and a human doing 100% of the deciding.
Each gpu_* call emits SPIR-V and dispatches via Vulkan
compute. Data stays resident in VRAM between calls — no
round-trips to CPU unless you need the result.
No thread_id exposed. The runtime handles thread indexing
internally — gpu_add(a, b) means "one thread per element,
each does a[i] + b[i]." Workgroup sizing and dispatch
dimensions are automatic.
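A rough CPU-side sketch of those semantics (illustrative only; `WORKGROUP_SIZE`, `dispatch_dims`, and `gpu_add_semantics` are made-up names for this comment, not OctoFlow's actual runtime):

```python
import math

WORKGROUP_SIZE = 64  # assumption: a typical fixed 1-D workgroup size

def dispatch_dims(n, workgroup_size=WORKGROUP_SIZE):
    # One thread per element; round up so the last partial
    # workgroup still covers the tail of the array.
    return math.ceil(n / workgroup_size)

def gpu_add_semantics(a, b):
    # CPU model of the implied kernel: thread i computes a[i] + b[i].
    # Threads past len(a) (from the rounded-up dispatch) are no-ops.
    return [a[i] + b[i] for i in range(len(a))]
```

So for a 100-element add with 64-wide workgroups, the runtime would dispatch 2 workgroups and mask off the 28 excess threads.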
The tradeoff: you can't write custom kernels with shared
memory or warp-level ops. OctoFlow targets the 80% of
GPU work that's embarrassingly parallel. For the other
20% you still want CUDA/Vulkan directly.
The vendor-agnostic GPU approach via KernelAbstractions
is great to see. The Vulkan compute path is underrated
for this — it runs on AMD, NVIDIA, and Intel without
needing ROCm or CUDA, just whatever driver ships with
the GPU.
Re: the compilation latency discussion — it's a real
tension. JIT gives you expressiveness but kills startup.
AOT gives you instant start but limits flexibility.
Interesting that most GPU languages went JIT even though
the driver ultimately consumes pre-compiled SPIR-V/PTX anyway.
Great writeup. The three-pass softmax is where everyone
gets stuck — subtracting the max for numerical stability
is one of those things you can't learn from a textbook.
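For reference, a minimal NumPy sketch of the stable three-pass version, where each pass corresponds to one GPU dispatch in the shader setting (reduce-max, exp-and-sum, normalize):

```python
import numpy as np

def softmax_three_pass(x):
    # Pass 1: reduction to find the max.
    m = x.max()
    # Pass 2: subtract the max before exponentiating, then sum.
    # exp(x - m) is at most exp(0) = 1, so nothing overflows.
    e = np.exp(x - m)
    s = e.sum()
    # Pass 3: normalize each element.
    return e / s
```

Without the max subtraction, something like `np.exp(1000.0)` overflows to inf and the division turns the whole output into NaN; with it, the result is a valid probability distribution for the same input.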
The pain points you hit (byte alignment, dispatch dims,
strict typing) make me wonder if there's a sweet spot
between raw WGSL and "import pytorch" that keeps you
close to the metal without all the papercuts.