v0.30.5
Ollama release - check for model support, multi-GPU, and compatibility.
Independent publication // local-first AI // field reports from owned systems
A running publication about local models, private inference, self-hosted agents, weird hardware, research sweeps, and the builders wiring their own AI stack together.
AI is getting physical again. It shows up in terminals, racks, side projects, and ugly little workflows people actually control.
Ollama release - check for model support, multi-GPU, and compatibility.
Ollama release - check for model support, multi-GPU, and compatibility.
vLLM release - check for tensor parallelism, memory, and throughput changes.
vLLM release - check for tensor parallelism, memory, and throughput changes.
Ollama release - check for model support, multi-GPU, and compatibility.
Ollama release - check for model support, multi-GPU, and compatibility.
Ollama release - check for model support, multi-GPU, and compatibility.
vLLM development signal - relevant to the production serving layer.
A current field note on why the 3x3090 serving path moved through pipeline parallelism, not tensor parallelism, for the tested 32B AWQ model.
A public-safe read on Gemma 4 12B as a local sensory preprocessor: useful for seeing, hearing, and structuring observations without turning into an action system.
A measured note on 300W bursty inference, lower caps for sustained runs, and why power-cap sweet spots are workload-specific.
A field-report read on 14x RTX 3090 agent serving, EXL3, FP8 KV cache, Aphrodite, and why concurrency is becoming the local hardware metric.