# KubeGraf

> Autonomous AI SRE platform for Kubernetes that detects incidents, performs evidence-backed root cause analysis, and delivers dry-run validated SafeFix™ remediations, all running locally without cloud dependency.

KubeGraf is an autonomous, always-on AI SRE platform built exclusively for Kubernetes. It detects incidents like CrashLoopBackOff, OOMKilled, and probe failures, correlates multi-source signals (logs, metrics, traces, events), and delivers evidence-backed SafeFix™ remediations with dry-run validation and human-in-the-loop approval. Powered by OrkasAI, KubeGraf runs entirely local-first: your cluster data never leaves your environment.

- **SafeFix™ Remediation**: *Generates YAML diff previews, blast radius analysis, confidence scores, and one-command rollback for every recommended fix before you apply anything.*
- **Evidence-Based Root Cause Analysis**: *Correlates logs, Kubernetes events, metrics, traces, and recent deployments into a reproducible evidence chain with confidence scores, not a black box.*
- **Dry-Run Validation**: *Simulates every fix using kubectl diff integration before execution; shows exact changes and potential side effects with zero risk.*
- **Anomaly Fingerprinting**: *Detects recurring failure patterns and builds fingerprints to auto-recognize similar incidents, cutting diagnosis time on repeat failures.*
- **Multi-Cluster Management**: *Investigate and remediate incidents across multiple clusters from a single interface without losing investigation context.*
- **BYOK AI Engine**: *Bring your own API key from OpenAI, Anthropic, Gemini, or Ollama; AI calls go directly from your machine to your provider. KubeGraf never sees your key or queries.*
- **Terminal UI + Web Dashboard**: *Use the keyboard-driven TUI during live incidents and the browser-based web dashboard for post-mortems and trend analysis.*
- **Knowledge Bank**: *Local SQLite database stores all incident history; search by pod, namespace, error type, or fix; export reports for post-mortems.*
- **RBAC-Aware Operations**: *Respects your cluster's RBAC policies; suggested fixes adapt to what your user can actually apply.*
- **Full Audit Trail**: *Every analysis, recommendation, and applied fix is logged with timestamps and user context for compliance and post-mortems.*
- **GitOps Integration**: *Sync fixes to Git via ArgoCD or Flux; supports Helm, Istio, Cilium, Nginx, and all major cloud Kubernetes providers (EKS, GKE, AKS, OpenShift, K3s).*

## Features

- Autonomous incident detection (CrashLoopBackOff, OOMKilled, ImagePullBackOff, probe failures)
- SafeFix™ dry-run validated remediations with YAML diff preview
- Evidence-based root cause analysis with confidence scores
- Multi-source signal correlation (logs, metrics, traces, events)
- Anomaly fingerprinting for recurring failure patterns
- BYOK AI engine (OpenAI, Anthropic, Gemini, Ollama)
- Terminal UI and web dashboard
- Local-first architecture with zero data exfiltration
- Knowledge Bank with SQLite incident history
- Multi-cluster management
- RBAC-aware operations
- Full audit trail
- GitOps integration (ArgoCD, Flux)
- One-command rollback
- Human-in-the-loop approval for all changes

## Integrations

AWS EKS, Google GKE, Azure AKS, Rancher, OpenShift, K3s, Helm, ArgoCD, Flux, Istio, Cilium, Nginx, Prometheus, OpenTelemetry, Grafana, OpenAI, Anthropic, Gemini, Ollama

## Platforms

Windows, macOS, Linux, Web, API, CLI

## Pricing

Freemium: Free tier available with paid upgrades

## Version

v1.0.0

## Links

- Website: https://kubegraf.io
- Documentation: https://kubegraf.io/docs/
- Repository: https://github.com/kubegraf/kubegraf
- EveryDev.ai: https://www.everydev.ai/tools/kubegraf
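
The dry-run validation described above is said to build on `kubectl diff`. As a rough illustration of that idea only, and not KubeGraf's actual implementation, the Python sketch below wraps `kubectl diff` and maps its documented exit codes (0 for no differences, 1 for differences found, anything higher for an error) to a simple status. The function names `build_diff_command`, `interpret_exit_code`, and `dry_run`, and the status strings, are hypothetical and chosen for this example.

```python
import subprocess
from typing import List, Tuple


def build_diff_command(manifest_path: str, namespace: str = "default") -> List[str]:
    """Build the kubectl diff invocation for a candidate fix manifest."""
    return ["kubectl", "diff", "--namespace", namespace, "-f", manifest_path]


def interpret_exit_code(code: int) -> str:
    """Map a kubectl diff exit code to a status label (labels are illustrative)."""
    if code == 0:
        return "no-changes"       # live state already matches the manifest
    if code == 1:
        return "changes-pending"  # applying the manifest would change the cluster
    return "error"                # kubectl or the diff program failed


def dry_run(manifest_path: str, namespace: str = "default") -> Tuple[str, str]:
    """Run kubectl diff and return (status, diff text) without applying anything."""
    proc = subprocess.run(
        build_diff_command(manifest_path, namespace),
        capture_output=True,
        text=True,
        check=False,  # exit code 1 means "differences found", not a failure
    )
    return interpret_exit_code(proc.returncode), proc.stdout
```

Calling `dry_run("fix.yaml")` assumes `kubectl` is on the `PATH` and a cluster is reachable; the returned diff text can then be reviewed before any approval step.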