Hey there
Building AI infrastructure with Claude Code. Documenting the journey from context window overflow to self-documenting systems.
Season 1: 25 days, 8 episodes — ConvoCanvas vision to automated infrastructure (Oct 2025).
Season 2: 5 episodes — validation, parallel testing, the patterns that survive production (Oct 2025).
Season 3: building in public — production AI infrastructure deep-dives, one architectural decision at a time (Feb 2026 →).
Featured
-
From Side Project to Platform: How Running Out of Disk Space Led to Rebranding an Entire Platform
3.5 months of silence hid 450 commits, a platform rebrand, a 2TB migration, and the transformation from side project to platform engineering discipline.
-
Streaming Journals: Kafka Meets LLMs
The automated journal entries were fiction. Every single one. Here's how a broken pipeline got replaced in 10 days — Kafka, vLLM workers, nine code iterations, a context preservation fix, and an LLM-as-Judge quality gate — all tracked through git commits and vault evidence.
-
The Day Everything Got Sealed
I exposed my MCP bridge to the internet so Claude.ai could search my vault remotely. Within 26 hours, Cloudflare logs showed 39 searches from 15+ Anthropic IPs — and I had no way to tell what they'd asked for. Here's the incident response that sealed every secret, obfuscated every endpoint, and bootstrapped a proper engineering workflow in the process.
-
The Refusal Gate: Teaching a Bot to Say I Don't Know
How a single score threshold became the difference between a bot that hallucinates and a bot you can trust — and why the architectural decision to refuse synthesis is the same shape as a circuit breaker, a feature flag, and a dead-letter queue.
-
The Second Graph Comes Online: Activation, Communities, and an AI Reversal
How the second knowledge graph went from intermittent to dependable — Python AST plus FalkorDB plus Leiden community detection — and the moment the Kafka consumer pattern from the journal pipeline generalised into a fleet of code-intelligence workers. With one deliberate reversal from LLM-based to deterministic NLP that cut hallucinated tags from 75% to 0.3%.
-
Twenty-six hours, twelve tickets — and the audit that started everything else
A false-alarm audit on Wednesday patched 37 CVEs, reverted a cluster upgrade in fourteen minutes, then cascaded into ten more tickets on Thursday. Twelve tickets, twenty-six hours, one lesson.
-
The defenses you haven't built yet — async, a path traversal, and the bug erasing the evidence
A Monday that converted the whole RAG pipeline from sync to async (fast search 9 seconds to under 150 milliseconds), then watched the platform's own automation find a path traversal in its own code — bracketed by the discovery that an upstream auto-update bug had quietly deleted two months of the evidence this series is written from.
Recent Posts
-
The Speedometer I Made Up: Building a Token Budget Governor for AI Coding Agents
A weighted token-budget governor for Claude Code + Codex — and how I found it was metering a number Anthropic never confirmed. Plus: the loop is a token trap.
-
Scout Fleet and the Async Ceiling — what a 200× speedup doesn't fix
April was the month the platform stopped being one fast thing and started being four uneven ones. The async refactor made the fast paths embarrassingly fast, but deep search hit a ceiling — and chasing it meant discovering that the knowledge graph underneath two of my features had been empty for weeks, the autonomous fleet had been starving on it, and the connector pipeline I'd been proud of had a producer shouting into a consumer that was never built.
-
Co-authored-by is a Lie: Cryptographic Provenance for AI Coding Agents
Every AI coding agent signs its commits with a forgeable plain-text line. I gave each of mine a non-exportable key in the Mac's Secure Enclave, hook-enforced, with a verifier that flags forgery — here's the build.
-
Ten bugs, two-tenths of a point — the weekend search got measurably better
A weekend search-quality sprint that took benchmark scores from 0.67 to 0.74, an MCP transport migration that cut cold starts by 96%, ten bugs found along the way, and a CVSS 9.4 Harbor CVE caught by an automation pipeline deployed the same day — with the benchmark research that made it all measurable.
-
From Interview to Implementation: Perfect 100/100 Anthropic Alignment
A BT SRE interview on the morning of Oct 17th (strong on networking and Linux, exposed K8s gaps), followed by 5.5 hours implementing Anthropic patterns after work. Achieved a perfect 100/100 alignment score (66→94→100) via parallel execution, enhanced MCP docs, LLM-as-judge, bash tools, the subagent pattern, and resumable execution. Skills migration Day 5-7 complete (103 tests, 99.4% token reduction). The honest gap: AI-assisted execution masked missing foundational knowledge. Claude Code rebuilt the K3s cluster, but I couldn't explain the control plane in the interview. Next up: a manual rebuild plan to close the gap.
-
Three Systems, One Weekend: The Parallel Testing Chronicle
Working with Claude, we shipped three infrastructure systems in 48 hours. A Neural Vault crisis forced a reindex of 41K docs. Built Journal v3 (8-task pipeline), Hybrid Search (A/B tested), and 7 slash commands, all behind feature flags. Parallel testing revealed hybrid search was only 1.7% better despite a 27x slowdown. When you can't test everything manually, you architect for failure.
-
54 Minutes to Production: Six Systems in One Day
54 minutes of active work across 14 hours (3 sessions with a gap-aware timeline). Six journal automation enhancements deployed: RAG context (30-day search), brag docs (61 accomplishments), emotional tracking (burnout detection), interstitial notes (real-time capture), semantic chunking (102 chunks), and an automated pipeline. ULTRATHINK diagnosed a silent Prefect failure in 45 minutes. Deployed a hybrid Cron+Prefect fix in 12 minutes. Created an idea-tree system for fast visual brainstorming. Built a /blog command with a 5-gate quality system. Automation begets automation at maximum meta-level.
-
48 Hours Later: From Validation to Velocity
18.5 hours across 48 hours. Research-backed codebase review (85/100), 5/5 systematic tests, NER enhanced to 163x speedup, 2.5 hours metadata debugging. Not just deployed - validated and improved. The research foundation behind every enhancement. From 85→90 quality score through methodical engineering.