Qwen 3.6 35B A3B: the first time for me that a local model has felt both smart AND fast enough to actually be usable.
Win sats!
- 10,000 sats to the highest score on Normal Mode
- 10,000 sats to the highest score on Enhanced Mode
Play it here: https://rolznz.github.io/rewind-snake/
To be eligible, enter your lightning address in the high score entry. Winner announced Wednesday morning US.
I one-shotted a snake game to see if it could... then kept pushing to see how far it'd go.
- 🔥 Wall Breaker mode
- ⏪ a "rewind time" mechanic — pay to undo your death
- 🏆 online high scores + replays
- 📱 mobile support + PWA
Making a game was just a test. What I really have been after is self-sovereign AI - building on a mid-tier laptop is now possible.
My setup: Qwen3.6-35B-A3B-UD-IQ3_XXS on a single NVIDIA RTX 4060 mobile — 8GB VRAM (A mid-tier laptop graphics card)
Using llama.cpp, built from source:
./build/bin/llama-server -m Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf -ngl 99 -np 1 -fa on -ctk q8_0 -ctv q8_0 -c 131072 --host 0.0.0.0 --port 8088 -ncmoe 38 --no-mmap
(Any ideas how I can optimize it more? MTP was not successful for me - 15% faster but much higher memory usage)
I built the app with PI agent. It's great for local-LLM dev because it doesn't waste context.
I also connected PI agent to Alby's builder and payments skills. Now I can build payment apps, and my agent can have budgeted, private access to my wallet.
I also built a simple "second brain" — a place to brainstorm and dump ideas without being spied on.
Looking forward to more self-sovereign AI experiments!
How many tokens/second did you achieve on this setup? How does such local model compares to codex/claude?
I get 25TPS. Codex/Claude is significantly better, but I was happily surprised how well this works without relying on one of these centralised companies. Now I’m very much looking forward to the next generation of local models