llama.cpp compiled as a native Android library via the NDK, linked into React Native through a custom JSI bridge. GGUF models loaded straight into memory.
On Snapdragon devices we use QNN (Qualcomm's AI Engine Direct SDK) for hardware acceleration. OpenCL GPU fallback on everything else, with CPU-only as a last resort.
Image gen is Stable Diffusion running on the NPU where available. Vision uses SmolVLM and Qwen3-VL. Voice is on-device Whisper.
The model browser filters by your device's RAM, so you never download something your phone can't run. The whole thing is MIT-licensed - happy to answer anything about the architecture.
Dude, that's awesome to hear! I literally added web search, tool calling, KV cache optimization, and offloading of all 99 GPU layers a few hours ago!
Those changes aren't live on the Play Store / App Store yet, but they're available on GH. I'll make a release later today.
It doesn't need internet to generate an image; it only needs it to download the model. And to be fair, if you've already got the zip for the model, you can just import it.
I think that's a bug. I'm guessing you're trying one of the NPU models? If you drop to CPU for now, it should hold.
Only phones with Qualcomm chips can use the NPU. I'm working on changing that.
But yeah, just to be clear: no internet is needed to run any of this. In fact, I'm so averse to it that I haven't even added analytics for this one, so I'm flying pretty blind here.