ninoxAI
Open-source, local-first, read-only AI SRE that clusters alert storms into incidents, investigates root cause over live systems, and proposes human-gated fixes.
At a Glance
Fully open-source under Apache License 2.0 — free to self-host, fork, and build on.
Listed Jun 2026
About ninoxAI
ninoxAI (project name: nightwatch) is a fully open-source AI Site Reliability Engineering layer released under the Apache License 2.0 by the ninoxAI organization. It sits above existing monitoring and infrastructure tooling (Checkmk, Prometheus, Icinga2, Zabbix, Docker, Kubernetes, AWS, Grafana, GitHub, and plain VMs) and aims to answer the core on-call questions: what broke, why, and what should be done next. The project is self-hosted, local-first, and enforces a strict read-only boundary: it never executes commands, acknowledges alerts, or writes back to production.
What It Is
ninoxAI is an AIOps incident-investigation tool that adds an agentic reasoning layer on top of existing monitoring infrastructure. Rather than replacing observability tooling, it ingests alerts from multiple sources, normalizes them onto a common schema, clusters related signals into a single incident, scores noisy checks, and then dispatches a tool-calling LLM agent to gather live evidence and form a root-cause hypothesis. Every proposed fix is a copy-pasteable artifact that a human must approve; unconditional auto-execution is out of scope by design.
How the Pipeline Works
The processing pipeline moves through six stages:
- Ingest — read-only adapters pull non-OK alerts from each connected source; JSON/CSV import is also supported.
- Normalize — every source is mapped onto a unified schema with message fingerprinting.
- Cluster — alerts are grouped by host, service, severity, and time window; optional semantic embeddings improve grouping quality.
- Noise scoring — frequency, ack-rate, ticket-rate, short-recovery, and flapping signals combine into a 0–1 noise score per check.
- Recommend — rule-based tuning recommendations with rationale and evidence are surfaced on the dashboard.
- Investigate — a tool-calling LLM runs a ReAct loop (reason → act → observe) over a typed allowlist of read-only capabilities to build a root-cause hypothesis and propose classified fixes.
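The Normalize, Cluster, and Noise scoring stages above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the project's actual code: the Alert fields, the digit-masking fingerprint, the fixed 300-second bucket, and the equal-weight average are hypothetical stand-ins for whatever ninoxAI really does.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical unified schema; the real field names may differ."""
    source: str
    host: str
    service: str
    severity: str
    message: str
    ts: int  # epoch seconds


def fingerprint(message: str) -> str:
    """Hash the message with digits masked, so numeric variants collide."""
    masked = re.sub(r"\d+", "#", message.lower()).strip()
    return hashlib.sha256(masked.encode()).hexdigest()[:12]


def cluster(alerts, window_s=300):
    """Group alerts sharing host, service, severity, and a fixed time bucket."""
    groups = defaultdict(list)
    for a in alerts:
        groups[(a.host, a.service, a.severity, a.ts // window_s)].append(a)
    return dict(groups)


def noise_score(signals, weights=None):
    """Weighted mean of signals assumed to be pre-scaled to 0..1 (1 = noisy)."""
    if not signals:
        return 0.0
    weights = weights or {name: 1.0 for name in signals}
    total = sum(weights[name] for name in signals)
    score = sum(signals[name] * weights[name] for name in signals) / total
    return min(1.0, max(0.0, score))


alerts = [
    Alert("prometheus", "web-1", "nginx", "critical", "5xx rate 12% over 60s", 1000),
    Alert("checkmk", "web-1", "nginx", "critical", "5xx rate 48% over 60s", 1100),
    Alert("zabbix", "db-1", "postgres", "warning", "replication lag 30s", 1150),
]
clusters = cluster(alerts)  # two groups: web-1/nginx (2 alerts), db-1/postgres (1)
```

Fixed buckets are the simplest possible window: two alerts a few seconds apart can still land in different buckets at a boundary, which is one reason a real implementation might prefer a sliding window or the optional embeddings mentioned above.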
Cross-tool correlation groups clusters that share the same host, severity, and time window into one incident labeled "confirmed by N tools."
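As a rough illustration of that correlation step, the sketch below merges per-tool clusters that share host, severity, and time bucket. The Cluster type, the bucket index, and the choice to leave single-source incidents unlabeled are assumptions for this example, not documented ninoxAI behavior.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical per-tool cluster summary."""
    tool: str      # monitoring source that produced the cluster
    host: str
    severity: str
    bucket: int    # time-window index, e.g. epoch_seconds // 300


def correlate(clusters):
    """Merge clusters sharing host, severity, and time bucket into incidents."""
    tools_by_key = defaultdict(set)
    for c in clusters:
        tools_by_key[(c.host, c.severity, c.bucket)].add(c.tool)

    incidents = []
    for (host, severity, bucket), tools in sorted(tools_by_key.items()):
        n = len(tools)
        incidents.append({
            "host": host,
            "severity": severity,
            "bucket": bucket,
            "tools": sorted(tools),
            # Assumption: only label incidents seen by more than one tool.
            "label": f"confirmed by {n} tools" if n > 1 else None,
        })
    return incidents


incidents = correlate([
    Cluster("prometheus", "web-1", "critical", 3),
    Cluster("checkmk", "web-1", "critical", 3),
    Cluster("zabbix", "db-1", "warning", 3),
])
```

Here the two web-1 clusters collapse into one incident carrying both tool names, while the db-1 cluster stays a separate, unlabeled incident.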
The Read-Only Safety Model
Every action the AI SRE agent can take is classified as read_only, reversible, or irreversible, with a scope field representing blast radius. An unknown classification is coerced to irreversible, so an unrecognized action is never silently auto-executed. Before any remote LLM call, a redaction layer replaces hostnames, IPs, UUIDs, emails, and paths with deterministic placeholders; credentials are scrubbed one-way and never returned. A grounding gate caps confidence when claims are not backed by gathered evidence.
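Two of these safeguards are easy to picture in code. The following Python sketch shows fail-closed classification and deterministic placeholder redaction. The class names, regular expressions, and placeholder format (such as <IP_1>) are assumptions for illustration; hostnames, paths, and credentials are deliberately left out to keep the example short.

```python
import re
from enum import Enum


class ActionClass(str, Enum):
    READ_ONLY = "read_only"
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"


def classify(raw: object) -> ActionClass:
    """Fail closed: anything unrecognized becomes the most restrictive class."""
    try:
        return ActionClass(raw)
    except ValueError:
        return ActionClass.IRREVERSIBLE


class Redactor:
    """Replace sensitive tokens with stable placeholders (hypothetical format)."""

    PATTERNS = [
        ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
        ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
        ("UUID", re.compile(
            r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b")),
    ]

    def __init__(self):
        self._seen = {}  # original value -> placeholder

    def _placeholder(self, kind, value):
        # The same value always maps to the same placeholder within a session.
        if value not in self._seen:
            prefix = f"<{kind}_"
            count = sum(1 for p in self._seen.values() if p.startswith(prefix))
            self._seen[value] = f"{prefix}{count + 1}>"
        return self._seen[value]

    def redact(self, text):
        for kind, pattern in self.PATTERNS:
            text = pattern.sub(
                lambda m, k=kind: self._placeholder(k, m.group(0)), text)
        return text


redactor = Redactor()
safe = redactor.redact(
    "10.0.0.5 refused ops@example.com; retry 10.0.0.5 then 10.0.0.9")
```

Because the mapping is stable, the repeated address 10.0.0.5 receives the same placeholder both times, so a remote model can still reason about "the same host" without seeing the real value.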
Distributed ninoxes — Reaching Air-Gapped Environments
The agent can investigate systems it cannot reach directly through lightweight outbound-only runner processes called "ninoxes." Each ninox lives inside one environment (a Kubernetes cluster, VPC, on-prem segment, or VM), holds that environment's credentials locally, and dials home to the ninoxAI brain — requiring no inbound firewall hole. Connected ninoxes appear in the Parliament of Owls dashboard view (/parliament).
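The outbound-only pattern can be modeled in a few lines of Python. This is an in-memory sketch under stated assumptions, not the actual runner protocol: the Brain and Ninox classes, the poll and submit methods, and the job format are all hypothetical, and real network calls are replaced by direct method calls that the runner initiates.

```python
from collections import deque


class Brain:
    """Stand-in for the central service; it never opens a connection itself."""

    def __init__(self):
        self._jobs = {}    # runner id -> queued (capability, argument) jobs
        self.results = []

    def enqueue(self, runner_id, capability, arg):
        self._jobs.setdefault(runner_id, deque()).append((capability, arg))

    # The two methods below model requests that the runner initiates.
    def poll(self, runner_id):
        queue = self._jobs.get(runner_id)
        return queue.popleft() if queue else None

    def submit(self, runner_id, result):
        self.results.append((runner_id, result))


class Ninox:
    """Runner living inside one environment, holding its capabilities locally."""

    def __init__(self, runner_id, brain, capabilities):
        self.runner_id = runner_id
        self.brain = brain
        self.capabilities = capabilities  # name -> local read-only callable

    def run_once(self):
        job = self.brain.poll(self.runner_id)       # outbound call
        if job is None:
            return False
        name, arg = job
        handler = self.capabilities.get(name)
        result = handler(arg) if handler else f"unsupported capability: {name}"
        self.brain.submit(self.runner_id, result)   # outbound call
        return True


brain = Brain()
owl = Ninox("k8s-prod", brain,
            {"pod_status": lambda ns: f"{ns}: 3 pods running"})
brain.enqueue("k8s-prod", "pod_status", "payments")
owl.run_once()
```

The point of the shape is that credentials and capability handlers stay on the runner side; the brain only ever sees queued requests going out and results coming back.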
Connectors and LLM Providers
Supported monitoring connectors (all read-only) include Checkmk, Prometheus Alertmanager, Icinga2, Zabbix, and a generic webhook receiver; a PRTG stub is noted as incomplete. The investigator's read-only capability surface covers Docker, Kubernetes (in-cluster RBAC), AWS (CloudTrail, EC2, security groups, quotas via IAM read-role), Grafana (PromQL + LogQL), GitHub (CI runs, releases, PRs), Git (commits, diffs, code search), and host-level metrics (CPU, memory, disk, processes, sockets, log tail).
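To make the idea of a typed allowlist concrete, here is a small Python sketch of a reason, act, observe loop in which only registered read-only capabilities can run. The capability functions, their canned outputs, and the scripted policy standing in for the LLM are all hypothetical; a real investigator would call a model and query live systems.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical read-only capabilities returning canned evidence.
def disk_usage(host: str) -> str:
    return f"{host}: /var at 97%"


def log_tail(host: str) -> str:
    return f"{host}: write failed: no space left on device"


ALLOWLIST: Dict[str, Callable[[str], str]] = {
    "disk_usage": disk_usage,
    "log_tail": log_tail,
}

Step = Tuple[str, str]  # (capability name, argument)


def scripted_policy(observations: List[str]) -> Step:
    """Stand-in for the LLM: chooses the next step from what it has seen."""
    plan = [
        ("disk_usage", "web-1"),
        ("restart_service", "web-1"),  # not allowlisted, so it will be denied
        ("log_tail", "web-1"),
    ]
    if len(observations) < len(plan):
        return plan[len(observations)]
    return ("finish", "")


def investigate(policy, max_steps: int = 5) -> List[str]:
    observations: List[str] = []
    for _ in range(max_steps):
        name, arg = policy(observations)       # reason
        if name == "finish":
            break
        tool = ALLOWLIST.get(name)
        if tool is None:
            observations.append(
                f"denied: {name} is not an allowlisted capability")
        else:
            observations.append(tool(arg))     # act, then observe
    return observations


evidence = investigate(scripted_policy)
```

The denied step is recorded as an observation rather than raised as an error, so the loop can continue gathering evidence; whether the real agent handles refusals this way is not stated in the listing.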
LLM support is modular: the default template provider is fully offline with no API keys required, suitable for summaries and recommendations but not agent-driven investigation. Remote providers include Mistral, Anthropic (noted as the default for the investigator), and OpenAI-compatible endpoints covering Azure, vLLM, Ollama, and LM Studio. The project also supports extension via MCP servers, Python capability plugins, and the runner protocol.
Current Status
The repository was created in June 2026 and last pushed on June 7, 2026, indicating very early-stage active development. The project is self-described as fully open source under Apache 2.0, free to use, self-host, fork, and build on. Gated, governed remediation is listed on the roadmap; the PRTG connector is a stub. Community discussion takes place on Discord.
Pricing
Open Source
- Alert clustering and incident grouping
- Noise scoring and tuning recommendations
- Read-only AI SRE investigator
- Distributed ninox runners
- All monitoring connectors (Checkmk, Prometheus, Icinga2, Zabbix, Webhook)
Capabilities
Key Features
- Alert storm clustering into single incidents
- Read-only AI SRE investigator with tool-calling LLM
- Root-cause hypothesis generation
- Human-gated fix proposals with risk classification
- Noise scoring for flapping and over-sensitive checks
- Distributed ninox runners for air-gapped environments
- Cross-tool incident correlation
- Offline mode with no LLM or API keys required
- Secret scrubbing and redaction before remote LLM calls
- MCP server and capability plugin extensibility
- Docker Compose quickstart with synthetic mock alerts
- Parliament of Owls dashboard for connected runners
