KubeGraf
Autonomous AI SRE platform for Kubernetes that detects incidents, performs evidence-backed root cause analysis, and delivers dry-run validated SafeFix™ remediations — all running locally without cloud dependency.
At a Glance
Pricing
Forever free, no account needed.
Listed Mar 2026
About KubeGraf
KubeGraf is an autonomous, always-on AI SRE platform built exclusively for Kubernetes. It detects incidents such as CrashLoopBackOff, OOMKilled, and probe failures, correlates signals from multiple sources (logs, metrics, traces, events), and proposes evidence-backed SafeFix™ remediations that are dry-run validated and require human-in-the-loop approval. Powered by OrkasAI, KubeGraf runs local-first: your cluster data never leaves your environment.
- SafeFix™ Remediation — For every recommended fix, generates a YAML diff preview, blast radius analysis, a confidence score, and a one-command rollback before you apply anything.
- Evidence-Based Root Cause Analysis — Correlates logs, Kubernetes events, metrics, traces, and recent deployments into a reproducible evidence chain with confidence scores — not a black box.
- Dry-Run Validation — Simulates every fix through kubectl diff integration before execution, showing the exact changes and potential side effects without modifying the cluster.
- Anomaly Fingerprinting — Detects recurring failure patterns and builds fingerprints to auto-recognize similar incidents, cutting diagnosis time on repeat failures.
- Multi-Cluster Management — Investigate and remediate incidents across multiple clusters from a single interface without losing investigation context.
- BYOK AI Engine — Bring your own API key for OpenAI, Anthropic, or Gemini, or use Ollama; AI calls go directly from your machine to your provider, so KubeGraf never sees your key or queries.
- Terminal UI + Web Dashboard — Use the keyboard-driven TUI during live incidents and the browser-based web dashboard for post-mortems and trend analysis.
- Knowledge Bank — Local SQLite database stores all incident history; search by pod, namespace, error type, or fix; export reports for post-mortems.
- RBAC-Aware Operations — Respects your cluster's RBAC policies; suggested fixes adapt to what your user can actually apply.
- Full Audit Trail — Every analysis, recommendation, and applied fix is logged with timestamps and user context for compliance and post-mortems.
- GitOps Integration — Sync fixes to Git via ArgoCD or Flux; supports Helm, Istio, Cilium, Nginx, and major Kubernetes platforms (EKS, GKE, AKS, OpenShift, K3s).
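To make the dry-run validation idea above concrete, here is a minimal Python sketch of a preview step built on `kubectl diff`. It is not KubeGraf's implementation: the names `preview_fix`, `Preview`, and `run_command` are hypothetical, and the only behavior it relies on is kubectl's documented exit codes (0 for no differences, 1 for differences found, anything higher for an error).

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A runner takes a command and returns (exit code, stdout, stderr).
Runner = Callable[[List[str]], Tuple[int, str, str]]


def run_command(cmd: List[str]) -> Tuple[int, str, str]:
    """Run a command locally and capture its output as text."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


@dataclass
class Preview:
    changed: bool  # True when the manifest differs from the live object
    diff: str      # diff text to show for approval


def preview_fix(manifest_path: str, runner: Runner = run_command) -> Preview:
    """Preview a proposed fix without applying it (hypothetical helper)."""
    code, out, err = runner(["kubectl", "diff", "-f", manifest_path])
    if code == 0:
        # Nothing would change, so there is nothing to approve.
        return Preview(changed=False, diff="")
    if code == 1:
        # kubectl diff exits with 1 when differences were found.
        return Preview(changed=True, diff=out)
    # Exit codes above 1 mean kubectl or the diff program failed.
    raise RuntimeError(f"kubectl diff failed (exit {code}): {err.strip()}")
```

Calling `preview_fix("fix.yaml")` against a real cluster would return the diff for a human to review before any `kubectl apply`. The injectable `runner` argument is a design choice that lets the logic be exercised without a cluster.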
Pricing
Free Plan Available
Forever free, no account needed.
- Full kubectl terminal + all workload views
- SafeFix Engine + graph-based incident detection
- Knowledge Bank + custom app deployment
- GitOps — sync fixes to Git (ArgoCD / Flux)
- 25 BYOK AI investigations / mo
Pro
1 seat, billed $229/year ($19/mo equivalent). Save 34%.
- Everything in Free · 1 seat
- Unlimited BYOK AI investigations
- ML Insights — anomaly predictions & timeline
- DB export & import (portable encrypted backup)
Team
3-seat minimum at $189/seat/yr ($567/yr base). Save 34%.
- Everything in Pro · 3–50 seats
- Priority email support
Enterprise
Custom pricing — tailored to your needs.
- Unlimited seats
- Self-hosted or cloud deploy
- Custom BYOK AI volume + key management
- Custom SLA + priority escalation
- Dedicated success manager
Capabilities
Key Features
- Autonomous incident detection (CrashLoopBackOff, OOMKilled, ImagePullBackOff, probe failures)
- SafeFix™ dry-run validated remediations with YAML diff preview
- Evidence-based root cause analysis with confidence scores
- Multi-source signal correlation (logs, metrics, traces, events)
- Anomaly fingerprinting for recurring failure patterns
- BYOK AI engine (OpenAI, Anthropic, Gemini, Ollama)
- Terminal UI and web dashboard
- Local-first architecture — zero data exfiltration
- Knowledge Bank with SQLite incident history
- Multi-cluster management
- RBAC-aware operations
- Full audit trail
- GitOps integration (ArgoCD, Flux)
- One-command rollback
- Human-in-the-loop approval for all changes
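As an illustration of how anomaly fingerprinting and a SQLite incident history from the list above could work together, the Python sketch below normalizes an event message, hashes it, and looks up fixes recorded for earlier incidents with the same fingerprint. Everything here is an assumption made for the example, not KubeGraf's actual schema or algorithm: the `incidents` table, the normalization rules, and the function names are all hypothetical.

```python
import hashlib
import re
import sqlite3

# Hypothetical normalization rules: strip the parts of a message that vary
# between otherwise identical failures.
_POD_SUFFIX = re.compile(r"-[a-z0-9]{8,10}-[a-z0-9]{5}\b")  # ReplicaSet/pod hashes
_HEX = re.compile(r"\b0x[0-9a-f]+\b")
_NUM = re.compile(r"\d+")


def fingerprint(reason: str, message: str) -> str:
    """Return a short, stable ID for a failure pattern."""
    text = message.lower()
    text = _POD_SUFFIX.sub("-<pod>", text)
    text = _HEX.sub("<hex>", text)
    text = _NUM.sub("<n>", text)
    text = " ".join(text.split())  # collapse whitespace
    return hashlib.sha256(f"{reason}|{text}".encode()).hexdigest()[:16]


def open_bank(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local incident store with an assumed schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS incidents (
               id INTEGER PRIMARY KEY,
               fingerprint TEXT NOT NULL,
               namespace TEXT NOT NULL,
               pod TEXT NOT NULL,
               reason TEXT NOT NULL,
               fix TEXT
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_fp ON incidents(fingerprint)")
    return conn


def record(conn, namespace, pod, reason, message, fix=None) -> str:
    """Store an incident and return its fingerprint."""
    fp = fingerprint(reason, message)
    conn.execute(
        "INSERT INTO incidents (fingerprint, namespace, pod, reason, fix) "
        "VALUES (?, ?, ?, ?, ?)",
        (fp, namespace, pod, reason, fix),
    )
    conn.commit()
    return fp


def past_fixes(conn, fp: str) -> list:
    """List fixes recorded for earlier incidents with the same fingerprint."""
    rows = conn.execute(
        "SELECT fix FROM incidents "
        "WHERE fingerprint = ? AND fix IS NOT NULL ORDER BY id",
        (fp,),
    ).fetchall()
    return [row[0] for row in rows]
```

With this shape, two CrashLoopBackOff events from different pod replicas and with different restart counts reduce to the same fingerprint, so a repeat failure can surface the fix that was recorded the first time.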