pull down to refresh

Qwen 3.6 35B A3B: the first time for me that a local model has felt both smart AND fast enough to actually be usable.

Win sats!

  • 10,000 sats to the highest score on Normal Mode
  • 10,000 sats to the highest score on Enhanced Mode

Play it here: https://rolznz.github.io/rewind-snake/

To be eligible, enter your lightning address in the high score entry. Winner announced Wednesday morning US.

I one-shotted a snake game to see if it could... then kept pushing to see how far it'd go.

  • 🔥 Wall Breaker mode
  • ⏪ a "rewind time" mechanic — pay to undo your death
  • 🏆 online high scores + replays
  • 📱 mobile support + PWA

Making a game was just a test. What I really have been after is self-sovereign AI - building on a mid-tier laptop is now possible.

My setup: Qwen3.6-35B-A3B-UD-IQ3_XXS on a single NVIDIA RTX 4060 mobile — 8GB VRAM (A mid-tier laptop graphics card)

Using llama.cpp, built from source:

./build/bin/llama-server -m Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf -ngl 99 -np 1 -fa on -ctk q8_0 -ctv q8_0 -c 131072 --host 0.0.0.0 --port 8088 -ncmoe 38 --no-mmap

(Any ideas how I can optimize it more? MTP was not successful for me - 15% faster but much higher memory usage)

I built the app with PI agent. It's great for local-LLM dev because it doesn't waste context.

I also connected PI agent to Alby's builder and payments skills. Now I can build payment apps, and my agent can have budgeted, private access to my wallet.

I also built a simple "second brain" — a place to brainstorm and dump ideas without being spied on.

Looking forward to more self-sovereign AI experiments!

How many tokens/second did you achieve on this setup? How does such local model compares to codex/claude?

reply

I get 25TPS. Codex/Claude is significantly better, but I was happily surprised how well this works without relying on one of these centralised companies. Now I’m very much looking forward to the next generation of local models

reply