Hacker News | mr_octopus's comments

Three volumes. Free PDFs. No sign-up. Creative Commons (CC BY-NC-ND 4.0).

  I kept noticing smart people missing the same things. Not from lack of intelligence — from filters they didn't
  know they were running. So I wrote a nonfiction trilogy called The Calibrated View.

  Volume I covers nine perceptual filters — the lenses you inherit from family, culture, trauma, information diet —
  that shape what you see before you're conscious enough to question them. Volume II is about what it costs to stay
  in a room when the room gets difficult. Volume III is about the person who sees a gap and fills it without being
  asked.

  Each chapter opens with a real person's story, then names the pattern. Bowlby's attachment theory, Fannie Lou
  Hamer's voter registration, a man in India who planted a forest alone for 40 years.

  I published under a pen name and put everything under Creative Commons. Built the PDFs with Playwright/Chromium
  from Markdown source. The site is a static flipbook on GitHub Pages.

  Happy to answer questions about the writing process, the framework, or the build pipeline.


Hi HN, solo developer here. OctoFlow is a GPU-native programming language — the GPU is the primary execution target, not an accelerator you opt into. 4.5 MB binary, zero dependencies, any Vulkan GPU.

  This is vibe-coded. LLMs generated the bulk of the code. That's not a caveat —
  it's the point. But "vibe-coded" doesn't mean "unreviewed." Every architectural
  decision has a human at the gate: pure Vulkan (no CUDA lock-in), zero external
  dependencies (hand-rolled everything), 23-concept language spec that fits in an
  LLM prompt, Loom Engine dispatch chains that let the GPU run autonomously.

  The AI writes code. The human decides what to build, why, and whether it ships.

  Two principles guide every decision:

  - Sustainability: Is complexity growing faster than it can be maintained? If the
    test count drops or the gotcha list grows, stop and fix before shipping more.

  - Empowerment: Can a non-GPU-programmer go from intent to working GPU code? If a
    feature makes the language harder for AI to generate, it doesn't ship.

  What's real today: 966 tests passing, 445 stdlib modules, 150 pre-compiled
  SPIR-V kernels, an interactive GPU-backed REPL, and AI code generation via
  `octoflow chat`. What's in progress: running 24 GB LLM models on 6 GB consumer
  GPUs via layer streaming.
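  To make the layer-streaming idea concrete, here's a rough sketch in plain
  Python with NumPy standing in for GPU buffers (this is my illustration, not
  OctoFlow's actual API; `load_layer` and the on-disk handles are hypothetical).
  Only one layer's weights are resident at a time, so peak memory stays near
  one layer rather than the whole model:

  ```python
  import numpy as np

  def run_layers_streamed(x, layer_handles, load_layer):
      """Apply a stack of layers while keeping only one layer resident.

      layer_handles: one handle (e.g. a file path) per layer's weights.
      load_layer: loads one layer's weight matrix into memory, standing in
      for a host-to-VRAM upload.
      """
      for handle in layer_handles:
          w = load_layer(handle)   # upload this layer's weights only
          x = x @ w                # run the layer
          del w                    # evict before the next upload
      return x
  ```

  The tradeoff is obvious: you pay upload bandwidth per layer per token in
  exchange for a VRAM footprint of one layer instead of the full model.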

  The stdlib and everything on GitHub is MIT-licensed. The compiler (Rust) is in
  a private repo for now, but I'm willing to open-source all of it once there's
  a team to sustain it. Looking for co-maintainers, especially people with
  Vulkan experience or who care about keeping languages small and learnable.

  Happy to answer questions about the architecture, the vibe-coding workflow, or
  what it's like building a programming language with AI doing 90% of the typing
  and a human doing 100% of the deciding.


But "vibe-coded" doesn't mean "unreviewed."

(Just FYI, that’s literally what it means. The term was coined here: https://x.com/karpathy/status/1886192184808149383 . Excerpt:

“I ‘Accept All’ always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it.”)


Thanks for that, will change "vibe-coded" to "AI-assisted" then...


Most GPU work boils down to a few patterns — map, reduce, scan. Each one has a known way to assign threads.

So instead of writing a kernel with explicit thread_id bookkeeping, you write:

  let c = gpu_add(a, b)
  let total = gpu_sum(c)

The thread indexing is still there, just handled by the runtime, the way Python hides pointer math.
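As a rough CPU-side analogy (plain Python, not OctoFlow's actual runtime), the map and reduce patterns assign work like this; the loop index plays the role of the thread id the user never writes:

  ```python
  def gpu_add(a, b):
      # Map pattern: one logical thread per element. The index i is the
      # thread id the runtime assigns; each "thread" computes a[i] + b[i].
      return [a[i] + b[i] for i in range(len(a))]

  def gpu_sum(c):
      # Reduce pattern: on a GPU this would be a tree reduction across
      # workgroups; a sequential fold gives the same answer.
      total = 0
      for v in c:
          total += v
      return total

  c = gpu_add([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
  print(gpu_sum(c))  # 21.0
  ```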


Thanks for trying it! :)

Each gpu_* call emits SPIR-V and dispatches via Vulkan compute. Data stays resident in VRAM between calls — no round-trips to CPU unless you need the result.

No thread_id exposed. The runtime handles thread indexing internally — gpu_add(a, b) means "one thread per element, each does a[i] + b[i]." Workgroup sizing and dispatch dimensions are automatic.

The tradeoff: you can't write custom kernels with shared memory or warp-level ops. OctoFlow targets the 80% of GPU work that's embarrassingly parallel. For the other 20% you still want CUDA/Vulkan directly.

Cheers


The vendor-agnostic GPU approach via KernelAbstractions is great to see. The Vulkan compute path is underrated for this — it runs on AMD, NVIDIA, and Intel without needing ROCm or CUDA, just whatever driver ships with the GPU.

Re: the compilation latency discussion — it's a real tension. JIT gives you expressiveness but kills startup. AOT gives you instant start but limits flexibility. Interesting that most GPU languages went JIT when the GPU itself runs pre-compiled SPIR-V/PTX anyway.


Great writeup. The three-pass softmax is where everyone gets stuck — subtracting the max for numerical stability is one of those things you can't learn from a textbook.

The pain points you hit (byte alignment, dispatch dims, strict typing) make me wonder if there's a sweet spot between raw WGSL and "import pytorch" that keeps you close to the metal without all the papercuts.
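For reference, the three passes under discussion look like this in plain NumPy (my sketch, not the WGSL from the writeup); the max subtraction is what keeps exp from overflowing on large logits:

  ```python
  import numpy as np

  def softmax_three_pass(x):
      m = np.max(x)        # pass 1: global max
      e = np.exp(x - m)    # pass 2: shifted exponentials (largest is exp(0)=1,
      s = np.sum(e)        #         so no overflow) and their sum
      return e / s         # pass 3: normalize

  # Naive exp(1000.0) overflows float64; the shifted version stays finite
  # and still sums to 1.
  probs = softmax_three_pass(np.array([1000.0, 1001.0, 1002.0]))
  ```

The subtraction is exact in the math (it cancels in the ratio) but essential in floating point, which is exactly the kind of thing that bites you mid-kernel rather than in a textbook.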

