Episode 5: The Migration Question - When K3s Meets Reality
Series: Season 1 - From Zero to Automated Infrastructure | Episode: 5 of 8 | Date: September 30, 2025 | Reading Time: 10 minutes
September 30, 10:00 AM: The Hybrid Reality
Vault Evidence: K3s-Full-Stack-Containerization-Research-2025-09-30.md and K3s-Migration-Value-Score-Update-Velocity-Analysis-2025-09-30.md, created September 30, 2025 between 12:00 and 14:00, documenting migration research for an already-running K3s system.
By late September, I had a split personality problem.
Already on K3s (from prior work):
```bash
kubectl get pods -A | grep -E "librechat|monitoring|elastic|tracing|kafka"
# Output:
librechat    librechat-ui-7d4b9c8f6-xk2p9      1/1   Running
librechat    mongodb-0                         1/1   Running
librechat    rag-api-deployment-m8k4l          1/1   Running
monitoring   prometheus-server-0               1/1   Running
monitoring   grafana-7d4b9c8f6-xk2p9           1/1   Running
elastic      elasticsearch-0                   1/1   Running
elastic      kibana-deployment-xk2p9           1/1   Running
tracing      jaeger-operator-7d4b9c8f6-xk2p9   1/1   Running
tracing      otel-collector-m8k4l              1/1   Running
kafka        kafka-broker-0                    1/1   Running
```
Still running natively:
```bash
ps aux | grep -E "ollama|vllm|chroma|fastmcp"
# Summarized:
#   ollama   → port 11434 (17 models, 48 GB RAM)
#   vLLM     → port 8000  (OpenAI-compatible API)
#   ChromaDB → embedded   (24,916 docs indexed)
#   FastMCP  → port 8002  (MCP bridge)
```
Hybrid architecture. Half containerized, half bare metal. It worked, but it felt… incomplete.
The question wasn’t “Should I use K3s?” - K3s was already running. The question was: “Should I migrate everything to it?”
The Update Velocity Concern
Before diving into migration, I had one critical concern: Would containerization slow down my ability to update software?
With native installs:
```bash
# Ollama releases new version → 2 minutes later:
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
# DONE. Latest version running.
```
With Kubernetes Helm charts:
```bash
# Ollama releases new version → wait for community maintainer
# Community maintainer updates Helm chart → 9-39 days later:
helm repo update
helm upgrade ollama ollama/ollama
# NOW you're on the latest version. Weeks later.
```
I needed to research this properly.
September 30, 12:00 PM: The Research Begins
I created two research documents:
- K3s Full-Stack Containerization Research - Technical feasibility
- K3s Migration Value Score: Update Velocity Analysis - The hidden costs
Key Finding #1: Community-Maintained Charts Lag Behind
Ollama Example (Sept 30, 2025):
Ollama Official Release:
- Latest: v0.12.3 (Sept 26, 2025)
- Performance: 32-600% improvements over v0.11.x
- Features: Web search API added
Ollama Helm Chart (maintained by "otwld" - community volunteer):
- Chart Version: ollama-1.29.0 (Sept 17, 2025)
- App Version: v0.11.11
- Status: 10+ versions behind official release
- Maintainer: GitHub user, NOT Ollama Inc.
Lag time: 9+ days (and counting).
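You can reproduce this check yourself. A minimal sketch, assuming the community repo URL from the otwld/ollama-helm README and that `jq` is installed:

```bash
# Compare what the chart ships vs. what upstream actually released.
# Repo URL assumed from the otwld/ollama-helm README; adjust if the chart moves.
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update

# App version packaged in the latest chart:
helm show chart ollama-helm/ollama | grep -i '^appVersion'

# Latest official Ollama release tag:
curl -s https://api.github.com/repos/ollama/ollama/releases/latest | jq -r '.tag_name'
```

The gap between those two version strings is the lag, measured on your own machine rather than taken on faith.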
ChromaDB Example:
ChromaDB Official Release:
- Latest: v1.1.1 (Sept 24, 2025)
- Previous: v1.1.0 (Sept 16, 2025)
ChromaDB Helm Chart (maintained by "amikos-tech" - consulting company):
- Chart Version: chromadb-0.1.24 (May 23, 2025)
- Supported: 1.0.x or later
- Status: No updates for 127 days
Lag time: 127+ days.
Key Finding #2: Vendor-Maintained Charts Have Zero Lag
vLLM Production-Stack:
- Released: January 2025
- Maintainer: vLLM Project (vendor)
- Helm Chart: Officially maintained by vLLM
- Lag: ZERO days
NVIDIA GPU Operator:
- Latest: v24.9.0 (Sept 2025)
- Maintainer: NVIDIA Corporation
- Helm Chart: Officially maintained
- Lag: ZERO days
- Bonus: Better than native apt installs (zero-downtime upgrades)
The pattern was clear: When vendors maintain Helm charts, there’s no lag. When community volunteers maintain them, lag is 9-127+ days.
The Value Score Analysis
I built a comprehensive scoring matrix weighing five categories:
Value Score Analysis - September 30, 2025

| Category | Weight | Native | K3s | Winner |
|---|---|---|---|---|
| Update Velocity | 35% | 8.6 | 5.6 | NATIVE (-3.0) |
| Operational Mgmt | 15% | 5.8 | 9.8 | K3S (+4.0) |
| Scalability | 10% | 2.5 | 9.6 | K3S (+7.1) |
| Disaster Recovery | 15% | 4.3 | 8.8 | K3S (+4.5) |
| Learning Burden | 25% | 7.6 | 5.5 | NATIVE (-2.1) |
| **WEIGHTED TOTAL** | **100%** | **6.68** | **7.09** | **K3S (+0.41)** |
Result: K3s wins by 0.41 points… but it’s dangerously close.
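The math behind the total is nothing exotic, just a weighted sum. A few lines of awk reproduce it, with weights and scores copied from the matrix above:

```bash
# Weighted total = sum(weight_i * score_i), per column of the matrix above.
awk 'BEGIN {
  split("0.35 0.15 0.10 0.15 0.25", w)   # category weights
  split("8.6 5.8 2.5 4.3 7.6",      n)   # native scores
  split("5.6 9.8 9.6 8.8 5.5",      k)   # k3s scores
  for (i = 1; i <= 5; i++) { nt += w[i] * n[i]; kt += w[i] * k[i] }
  printf "Native: %.3f  K3s: %.3f  Delta: %+.3f\n", nt, kt, kt - nt
}'
# → Native: 6.675  K3s: 7.085  Delta: +0.410 (rounded to 6.68 / 7.09 in the table)
```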
The Hidden Costs
Update Velocity Tax: 9-39 days for critical AI services
- Missing 32-600% performance improvements while waiting for chart updates
- Depending on volunteer maintainers with no SLA
- Risk of chart abandonment (ChromaDB: no updates in 127 days)
Complexity Tax: +30% operational overhead
- Must understand Kubernetes API, Helm, StatefulSets, PVCs, GPU operators
- Debugging through multiple abstraction layers
- Monitoring two sources: official releases AND Helm chart updates
Dependency Tax: Your infrastructure depends on strangers
- “otwld” maintains Ollama chart - who is that?
- “amikos-tech” maintains ChromaDB chart - will they keep updating it?
- What if they abandon the project?
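Helm can at least answer the "who is that?" question directly: a chart's Chart.yaml carries a `maintainers:` block (if the chart declares one), and `helm show chart` prints it. Using the repo alias added earlier:

```bash
# Print the chart metadata, including its maintainers block (when declared).
helm show chart ollama-helm/ollama | grep -A3 '^maintainers:'
```

It doesn't make the dependency safer, but it makes it visible.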
September 30, 6:00 PM: The Hybrid Decision
After 6 hours of research, I made the call: Selective containerization.
Keep Native (update velocity critical):
- ✅ Ollama (port 11434)
  - Reason: community Helm chart lags 9+ days
  - Impact: missed performance improvements and new features
- ✅ ChromaDB (embedded + server)
  - Reason: community Helm chart lags 127+ days
  - Impact: severe staleness; the chart may be abandoned
- ✅ Aider (CLI tool)
  - Reason: no K8s benefit; interactive workflow
Move to K3s (vendor-maintained or operational benefit):
- ✅ vLLM
  - Reason: vendor-maintained production-stack, zero lag
  - Status: pending migration
- ✅ NVIDIA GPU Operator
  - Reason: vendor-maintained, better than native apt
  - Benefit: zero-downtime driver upgrades, automatic compatibility checks
- ✅ FastMCP (port 8002)
  - Reason: operational benefits (monitoring, resource limits); not update-critical
- ✅ Redis
  - Reason: Bitnami vendor chart, StatefulSet benefits
Already on K3s (no change needed):
- ✅ LibreChat + MongoDB + RAG API
- ✅ Prometheus + Grafana
- ✅ Elasticsearch + Kibana
- ✅ Jaeger + OpenTelemetry Collector
- ✅ Kafka
The final architecture: 70% native, 30% K3s - a deliberate hybrid.
The Rationale: Optimize for What Matters
Why keep Ollama native?
Ollama is the inference engine. It’s the core of the AI workflow. Missing v0.12.x performance improvements (32-600% faster) for 9+ days while waiting for a community maintainer to update a Helm chart is unacceptable.
Why keep ChromaDB native?
24,916 documents indexed. 127+ days without a Helm chart update suggests the maintainer has moved on. I can’t bet my knowledge base on abandoned infrastructure.
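Staying native here means updates come straight from PyPI, with no chart maintainer in the loop. A quick way to check the drift, assuming ChromaDB is pip-installed in the active Python environment and `jq` is available:

```bash
# Installed version (native pip install):
pip show chromadb | grep '^Version'

# Latest release on PyPI (what a native upgrade gets you today):
curl -s https://pypi.org/pypi/chromadb/json | jq -r '.info.version'

# Upgrading: one command, zero middlemen.
pip install --upgrade chromadb
```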
Why migrate vLLM to K3s?
vLLM Project maintains their own production-stack Helm chart. Zero lag, first-class Kubernetes support, and it was released in January 2025 specifically for production workloads.
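For reference, a minimal install sketch; the repo URL and chart name follow the production-stack README at the time of writing, and `values-custom.yaml` is a placeholder for your own model configuration, so verify both against the current docs:

```bash
# vLLM's vendor-maintained chart (repo URL per the production-stack README).
helm repo add vllm https://vllm-project.github.io/production-stack
helm repo update

# values-custom.yaml is a placeholder: it declares which model(s) to serve.
helm install vllm vllm/vllm-stack -f values-custom.yaml
```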
Why migrate NVIDIA drivers to GPU Operator?
This is the exception where K8s is better than native:
- Automated driver lifecycle management
- Zero-downtime upgrades (no reboot required!)
- Automatic CUDA compatibility checks
- Rollback capability if upgrade fails
GPU Operator provides features that don't exist with `apt install nvidia-driver-560`.
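The install itself follows NVIDIA's documented Helm pattern; a sketch of the standard commands:

```bash
# NVIDIA's officially maintained chart, served from their own Helm repo.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# The operator then manages drivers, the container toolkit,
# and the device plugin cluster-wide.
helm install --wait gpu-operator \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator
```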
What Worked
Research-Driven Decision: Two comprehensive research documents captured the tradeoffs. Not guessing - analyzing real-world data (Ollama v0.12.3 vs Helm chart v0.11.11).
Value Score Matrix: Quantifying “Update Velocity” (35% weight) vs “Disaster Recovery” (15% weight) made the hybrid approach obvious.
Acknowledging Community Maintainers: “otwld” and “amikos-tech” are doing unpaid work to maintain Helm charts. The lag isn’t their fault - it’s the nature of volunteer efforts. Recognizing this helps set realistic expectations.
GPU Operator Discovery: Finding the ONE case where K8s is genuinely better than native (NVIDIA driver management) validated that selective containerization makes sense.
What Still Sucked
Fragmented Architecture:
Half on K3s, half native. Two mental models. Two sets of tools (kubectl vs systemctl). Two monitoring approaches.
Maintenance Burden: Now I have to track four separate release streams (a drift-check sketch follows the list):
- Official Ollama releases (for native install)
- Official vLLM releases (for K3s)
- Helm chart updates (for containerized services)
- GPU Operator compatibility (for driver management)
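A small cron-able sketch keeps the native side honest by comparing local versions against upstream release tags. The GitHub repo path is real, but the `ollama --version` output parsing is an approximation; adjust to taste, and it assumes `jq` is installed:

```bash
#!/usr/bin/env bash
# Drift check for native installs: local version vs. latest upstream release tag.
set -euo pipefail

latest() { curl -s "https://api.github.com/repos/$1/releases/latest" | jq -r '.tag_name'; }

# `ollama --version` prints something like "ollama version is 0.12.3".
local_ollama=$(ollama --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
echo "ollama: local ${local_ollama}, upstream $(latest ollama/ollama)"
```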
The “Incomplete” Feeling: It’s not elegant. It’s not “all-in” on Kubernetes. But it’s pragmatic.
The Numbers (September 30, 2025)
| Metric | Value |
|---|---|
| Research Time | 6 hours |
| Research Documents Created | 2 (47 pages combined) |
| Services on K3s | 10 |
| Services Staying Native | 3 (Ollama, ChromaDB, Aider) |
| Services Migrating to K3s | 3 (vLLM, NVIDIA drivers, FastMCP) |
| Helm Chart Update Lag (Ollama) | 9+ days |
| Helm Chart Update Lag (ChromaDB) | 127+ days |
| Value Score (Hybrid) | 7.8/10 |
| Value Score (Full K3s) | 7.1/10 |
| Value Score (All Native) | 5.9/10 |
★ Insight ─────────────────────────────────────
The Hidden Cost of Abstraction:
Containerization sounds like a pure win: isolation, portability, orchestration. But there’s a cost most tutorials don’t mention:
When you containerize vendor software using community-maintained Helm charts, you introduce a third party into your update pipeline:
```
Vendor Release → Community Maintainer → Your Deployment
    (Day 0)          (Day 9-127)         (When you notice)
```

Native installs bypass the middle layer:

```
Vendor Release → Your Deployment
    (Day 0)          (Day 0)
```
The question isn’t “Is Kubernetes better?” It’s “Is the orchestration benefit worth the update lag for THIS service?”
For Ollama (core AI inference): No. For vLLM (vendor-maintained chart): Yes. For Prometheus (already containerized): Yes.
Blanket decisions fail. Selective decisions win.
─────────────────────────────────────────────────
What I Learned
1. "Modern infrastructure" doesn't mean "containerize everything." The best architecture uses the right tool for each component. Sometimes that's K8s. Sometimes it's systemctl.
2. Community-maintained Helm charts are gifts, not guarantees. "otwld" maintaining Ollama's Helm chart is generous volunteer work, but depending on it for production means accepting a 9+ day update lag.
3. Vendor-maintained charts change the equation. vLLM and NVIDIA maintaining official Helm charts meant zero lag. If Ollama Inc. released an official chart tomorrow, I'd migrate immediately.
4. Research prevents regret. Six hours of research on September 30 prevented weeks of frustration from migrating Ollama to K3s and then waiting 9+ days for critical updates.
5. Hybrid architectures are valid (even if messy). 70% native, 30% K3s isn't elegant, but it optimizes for update velocity (critical) while gaining orchestration benefits (nice-to-have).
Built on Open Source
This research episode relied on incredible open source projects and communities:
K3s by Rancher Labs - Lightweight Kubernetes that made single-node clusters practical for homelabs.
Ollama Helm Chart maintained by otwld - Community-maintained chart that, despite lag concerns, made Ollama deployment on K8s possible for thousands of users.
ChromaDB Helm Chart by amikos-tech - Open source effort to bring vector database orchestration to Kubernetes.
vLLM Production-Stack - Vendor-maintained Kubernetes deployment showing how official support eliminates update lag.
NVIDIA GPU Operator - Enterprise-grade GPU management that proved containerization can be BETTER than native.
Massive thanks to all maintainers - vendor-backed and community volunteers alike. Your work makes modern AI infrastructure possible.
What’s Next
September 30 ended with a decision: Hybrid architecture.
Immediate plans:
- Keep Ollama native (avoid 9+ day lag)
- Keep ChromaDB native (avoid 127+ day lag)
- Migrate vLLM to K3s (vendor chart, zero lag)
- Deploy NVIDIA GPU Operator (better than native)
Unknown at the time: By October 5, none of this would matter.
By October 5, 9:00 AM, K3s would have 6,812 pod restarts. By October 5, 10:00 AM, I’d discover the networking layer was completely broken. By October 5, 6:00 PM, I’d have rebuilt the entire cluster from scratch.
The hybrid architecture decision was sound. But the infrastructure beneath it was about to fail spectacularly.
This is Episode 5 of “Season 1: From Zero to Automated Infrastructure” - documenting the research that revealed containerization’s hidden costs.
Previous Episode: ChromaDB Weekend: From 504 to 24,916 Documents
Next Episode: When Everything Crashes: The K3s Resurrection
Complete Series: Season 1 Mapping Report