My first game built entirely with a local LLM: 🐍⏪ Rewind Snake \ stacker news

Qwen 3.6 35B A3B: the first time for me that a local model has felt both smart AND fast enough to actually be usable.

Win sats!

10,000 sats to the highest score on Normal Mode
10,000 sats to the highest score on Enhanced Mode

Play it here: https://rolznz.github.io/rewind-snake/

To be eligible, enter your lightning address in the high score entry. Winner announced Wednesday morning US.

I one-shotted a snake game to see if it could... then kept pushing to see how far it'd go.

🔥 Wall Breaker mode
⏪ a "rewind time" mechanic — pay to undo your death
🏆 online high scores + replays
📱 mobile support + PWA

Making a game was just a test. What I really have been after is self-sovereign AI - building on a mid-tier laptop is now possible.

My setup: Qwen3.6-35B-A3B-UD-IQ3_XXS on a single NVIDIA RTX 4060 mobile — 8GB VRAM (A mid-tier laptop graphics card)

Using llama.cpp, built from source:

./build/bin/llama-server -m Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf -ngl 99 -np 1 -fa on -ctk q8_0 -ctv q8_0 -c 131072 --host 0.0.0.0 --port 8088 -ncmoe 38 --no-mmap

(Any ideas how I can optimize it more? MTP was not successful for me - 15% faster but much higher memory usage)

I built the app with PI agent. It's great for local-LLM dev because it doesn't waste context.

I also connected PI agent to Alby's builder and payments skills. Now I can build payment apps, and my agent can have budgeted, private access to my wallet.

I also built a simple "second brain" — a place to brainstorm and dump ideas without being spied on.

Looking forward to more self-sovereign AI experiments!

15 sats \ 1 reply \ @CruncherDefi 26 May

How many tokens/second did you achieve on this setup? How does such local model compares to codex/claude?

15 sats \ 0 replies \ @rolznz OP 26 May

I get 25TPS. Codex/Claude is significantly better, but I was happily surprised how well this works without relying on one of these centralised companies. Now I’m very much looking forward to the next generation of local models