# ninoxAI

> Open-source, local-first, read-only AI SRE that clusters alert storms into incidents, investigates root cause over live systems, and proposes human-gated fixes.

ninoxAI (project name: nightwatch) is a fully open-source AI Site Reliability Engineering layer released under the Apache License 2.0 by the ninoxAI organization. It sits above existing monitoring stacks (Checkmk, Prometheus, Icinga2, Zabbix, Docker, Kubernetes, AWS, Grafana, GitHub, and plain VMs) and answers the hardest on-call question: what broke, why, and what should be done next. The project is self-hosted and local-first, and it enforces a strict read-only boundary: it never executes commands, acknowledges alerts, or writes back to production.

## What It Is

ninoxAI is an AIOps incident-investigation tool that adds an agentic reasoning layer on top of existing monitoring infrastructure. Rather than replacing observability tooling, it ingests alerts from multiple sources, normalizes them onto a common schema, clusters related signals into a single incident, scores noisy checks, and then dispatches a tool-calling LLM agent to gather live evidence and form a root-cause hypothesis. Every proposed fix is a copy-pasteable artifact that a human must approve; unconditional auto-execution is explicitly out of scope by design.

## How the Pipeline Works

The processing pipeline moves through six stages:

- **Ingest**: read-only adapters pull non-OK alerts from each connected source; JSON/CSV import is also supported.
- **Normalize**: every source is mapped onto a unified schema with message fingerprinting.
- **Cluster**: alerts are grouped by host, service, severity, and time window; optional semantic embeddings improve grouping quality.
- **Noise scoring**: frequency, ack-rate, ticket-rate, short-recovery, and flapping signals combine into a 0–1 noise score per check.
- **Recommend**: rule-based tuning recommendations with rationale and evidence are surfaced on the dashboard.
- **Investigate**: a tool-calling LLM runs a ReAct loop (reason → act → observe) over a typed allowlist of read-only capabilities to build a root-cause hypothesis and propose classified fixes.
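The Normalize, Cluster, and Noise scoring stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not nightwatch's code: the `Alert` fields, the number-masking fingerprint, the 5-minute gap rule, and the noise weights are all hypothetical choices made for the example.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    # Hypothetical unified schema; nightwatch's real field names may differ.
    source: str
    host: str
    service: str
    severity: str
    message: str
    ts: datetime


_NUMBERS = re.compile(r"\d+(?:\.\d+)?")


def fingerprint(alert: Alert) -> str:
    """Hash host, service and the message with numbers masked, so that
    'disk 95% used' and 'disk 97% used' on the same check share a fingerprint."""
    template = _NUMBERS.sub("<n>", alert.message.lower())
    key = f"{alert.host}|{alert.service}|{template}"
    return hashlib.sha256(key.encode()).hexdigest()[:12]


def cluster(alerts, window=timedelta(minutes=5)):
    """Group alerts by (host, service, severity). A new cluster starts when the
    gap since the previous alert with the same key exceeds `window`."""
    clusters = []
    latest = {}  # key -> the cluster currently accepting alerts for that key
    for a in sorted(alerts, key=lambda a: a.ts):
        key = (a.host, a.service, a.severity)
        current = latest.get(key)
        if current is not None and a.ts - current[-1].ts <= window:
            current.append(a)
        else:
            current = [a]
            latest[key] = current
            clusters.append(current)
    return clusters


def noise_score(fires_per_day, ack_rate, ticket_rate, short_recovery_rate,
                flap_rate, weights=(0.25, 0.2, 0.2, 0.2, 0.15), saturation=50.0):
    """Weighted 0-1 score. Assumed directions: frequent, rarely acked, rarely
    ticketed, self-recovering and flapping checks are noisier. The weights and
    the saturation point are made-up defaults, not nightwatch's values."""
    signals = (
        min(fires_per_day / saturation, 1.0),
        1.0 - ack_rate,
        1.0 - ticket_rate,
        short_recovery_rate,
        flap_rate,
    )
    score = sum(w * s for w, s in zip(weights, signals))
    return max(0.0, min(1.0, score))
```

Masking numbers before hashing is what lets repeated firings of the same check, each carrying a slightly different value, collapse onto one fingerprint.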

Cross-tool correlation then merges clusters from different sources that share the same host, severity, and time window into one incident labeled "confirmed by N tools."
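Cross-tool correlation is essentially a second grouping pass over the clusters themselves. The sketch below is hypothetical: the `Cluster` fields, the 10-minute window, and anchoring the window on an incident's first cluster are assumptions for illustration. In this version the label only appears when at least two distinct tools agree.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    # Hypothetical per-tool cluster summary; field names are illustrative.
    source: str
    host: str
    severity: str
    first_seen: datetime


def correlate(clusters, window=timedelta(minutes=10)):
    """Merge clusters from different tools that share host and severity and
    start within `window` of the incident's first cluster."""
    incidents = []
    for c in sorted(clusters, key=lambda c: c.first_seen):
        for members in incidents:
            head = members[0]
            same_target = (head.host, head.severity) == (c.host, c.severity)
            if same_target and c.first_seen - head.first_seen <= window:
                members.append(c)
                break
        else:  # no existing incident matched: open a new one
            incidents.append([c])

    result = []
    for members in incidents:
        tools = sorted({m.source for m in members})
        label: Optional[str] = f"confirmed by {len(tools)} tools" if len(tools) > 1 else None
        result.append({"host": members[0].host, "severity": members[0].severity,
                       "tools": tools, "label": label})
    return result
```

Service names are deliberately left out of the match key, since different tools rarely name the same check identically.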

## The Read-Only Safety Model

Every action the AI SRE agent can take is classified as `read_only`, `reversible`, or `irreversible`, with a `scope` field representing its blast radius. Unknown classifications are coerced to `irreversible`, so an unrecognized action is never silently auto-executed. Before any remote LLM call, a redaction layer replaces hostnames, IPs, UUIDs, emails, and paths with deterministic placeholders; credentials are scrubbed one-way and are never returned. A grounding gate caps the confidence of any claim that is not backed by gathered evidence.
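The coerce-to-`irreversible` rule and deterministic redaction are easy to illustrate. The following is a minimal sketch, assuming a Python `Enum` for the three classes and regex-based scrubbing; the `ProposedFix` and `Redactor` names, the placeholder format, and the patterns are hypothetical, and hostname and path rules are omitted for brevity.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Risk(str, Enum):
    READ_ONLY = "read_only"
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"


def classify(raw) -> Risk:
    """Unknown or missing labels fall back to the most dangerous class."""
    try:
        return Risk(raw)
    except ValueError:
        return Risk.IRREVERSIBLE


@dataclass(frozen=True)
class ProposedFix:
    command: str  # copy-pasteable text for a human; never executed by this code
    risk: Risk
    scope: str    # blast radius, free text in this sketch (e.g. "one pod")


_SECRET = re.compile(r"(?i)\b(password|token|secret|api_key)=\S+")
_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("UUID", re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


class Redactor:
    """Deterministic placeholders: the same value always maps to the same token."""

    def __init__(self):
        self._forward = {}  # original value -> token
        self._reverse = {}  # token -> original value (never holds secrets)
        self._counts = {}   # kind -> number of distinct values seen

    def _token(self, kind, value):
        if value not in self._forward:
            self._counts[kind] = self._counts.get(kind, 0) + 1
            token = f"<{kind}_{self._counts[kind]}>"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def redact(self, text):
        # Secrets are replaced one-way: no mapping is kept, so they cannot be restored.
        text = _SECRET.sub(lambda m: f"{m.group(1)}=<SECRET>", text)
        for kind, pattern in _PATTERNS:
            text = pattern.sub(lambda m, k=kind: self._token(k, m.group(0)), text)
        return text

    def restore(self, text):
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text
```

Because placeholders are deterministic, the model can still see that two log lines mention the same IP, and its answer can be mapped back locally with `restore`; secrets, by contrast, cannot be recovered from the redacted text.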

## Distributed ninoxes: Reaching Air-Gapped Environments

Through lightweight, outbound-only runner processes called "ninoxes," the agent can investigate systems it cannot reach directly. Each ninox lives inside one environment (a Kubernetes cluster, VPC, on-prem segment, or VM), holds that environment's credentials locally, and dials home to the ninoxAI brain, so no inbound firewall hole is required. Connected ninoxes appear in the Parliament of Owls dashboard view (`/parliament`).
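The important property is the direction of the connections: the ninox asks the brain for work and posts results back, so the brain never needs a route into the environment. The following sketch models that pull pattern with in-process queues standing in for the network; the `Brain` and `Ninox` classes, the job dictionary shape, and the method names are hypothetical and do not describe nightwatch's actual runner protocol.

```python
import queue


class Brain:
    """Stand-in for the central ninoxAI server. It never connects to a ninox;
    it only answers calls that ninoxes make to it (modelled here as methods)."""

    def __init__(self):
        self._queues = {}       # ninox_id -> pending jobs for that environment
        self.results = {}       # job_id -> result posted back by a ninox
        self.connected = set()  # roughly what a /parliament view could list

    def _queue(self, ninox_id):
        return self._queues.setdefault(ninox_id, queue.Queue())

    def enqueue(self, ninox_id, job):
        # Called inside the brain by the investigator.
        self._queue(ninox_id).put(job)

    def poll(self, ninox_id, timeout=0.05):
        # Called by the ninox: an outbound request asking for work.
        self.connected.add(ninox_id)
        try:
            return self._queue(ninox_id).get(timeout=timeout)
        except queue.Empty:
            return None

    def submit(self, ninox_id, job_id, payload):
        # Called by the ninox: an outbound request delivering a result.
        self.results[job_id] = {"ninox": ninox_id, "payload": payload}


class Ninox:
    def __init__(self, ninox_id, brain, capabilities):
        self.ninox_id = ninox_id
        self.brain = brain
        # Read-only handlers; any credentials they need stay inside this process.
        self.capabilities = capabilities

    def run_once(self):
        job = self.brain.poll(self.ninox_id)
        if job is None:
            return False
        handler = self.capabilities.get(job["capability"])
        if handler is None:
            payload = {"error": "capability not allowed"}
        else:
            payload = handler(**job["args"])
        self.brain.submit(self.ninox_id, job["id"], payload)
        return True
```

A real runner would call `run_once` in a loop over an outbound HTTPS or WebSocket connection; the in-memory version only shows who initiates each exchange.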

## Connectors and LLM Providers

Supported monitoring connectors (all read-only) include Checkmk, Prometheus Alertmanager, Icinga2, Zabbix, and a generic webhook receiver; a PRTG connector exists only as an incomplete stub. The investigator's read-only capability surface covers Docker, Kubernetes (in-cluster RBAC), AWS (CloudTrail, EC2, security groups, quotas via IAM read-role), Grafana (PromQL + LogQL), GitHub (CI runs, releases, PRs), Git (commits, diffs, code search), and host-level metrics (CPU, memory, disk, processes, sockets, log tail).
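The investigator's ReAct loop over a typed allowlist can be sketched as follows. This is an assumption-laden illustration, not nightwatch's implementation: the `Capability` and `Allowlist` types, the step dictionary the model returns, and the `scripted_llm` stand-in (used instead of a real provider's tool-calling API) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Capability:
    name: str
    read_only: bool
    fn: Callable[..., str]


class Allowlist:
    """Only registered, read-only capabilities can ever be invoked."""

    def __init__(self, capabilities):
        for cap in capabilities:
            if not cap.read_only:
                raise ValueError(f"{cap.name} is not read-only")
        self._caps = {cap.name: cap for cap in capabilities}

    def call(self, name, args):
        cap = self._caps.get(name)
        if cap is None:
            return f"error: {name!r} is not on the allowlist"
        return cap.fn(**args)


def investigate(llm, tools, incident, max_steps=5):
    """ReAct loop: the model reasons and picks an action, the allowlist runs it,
    and the observation is appended to the transcript for the next step."""
    transcript = [f"Incident: {incident}"]
    for _ in range(max_steps):
        step = llm(transcript)  # expected keys: thought, action, args
        transcript.append(f"Thought: {step['thought']}")
        if step["action"] == "final_answer":
            return step["args"]["hypothesis"], transcript
        observation = tools.call(step["action"], step["args"])
        transcript.append(f"Action: {step['action']}({step['args']}) -> {observation}")
    return None, transcript  # step budget exhausted without a hypothesis


def scripted_llm(transcript):
    # Stand-in for a real tool-calling model, so the loop can run offline.
    if len(transcript) == 1:
        return {"thought": "check disk on db1", "action": "host_disk", "args": {"host": "db1"}}
    return {"thought": "observation shows /var is full", "action": "final_answer",
            "args": {"hypothesis": "db1 /var is 99% full"}}
```

Rejecting non-read-only capabilities when the allowlist is built, rather than when they are called, means a misconfigured plugin fails at startup instead of in the middle of an investigation.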

LLM support is modular. The default `template` provider runs fully offline and needs no API keys; it is suitable for summaries and recommendations but not for agent-driven investigation. Model-backed providers include Mistral, Anthropic (noted as the default for the investigator), and OpenAI-compatible endpoints such as Azure OpenAI, vLLM, Ollama, and LM Studio. The project can also be extended through MCP servers, Python capability plugins, and the runner protocol.
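The split between the offline `template` provider and model-backed providers can be pictured as a small resolution step. The sketch assumes provider names, a `ProviderConfig` type, and a `supports_tool_calls` flag that are hypothetical rather than nightwatch's configuration schema. The local base URLs are the usual defaults of those servers' OpenAI-compatible APIs, and whether a given local model actually handles tool calls depends on the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    base_url: Optional[str] = None
    supports_tool_calls: bool = True


# Default local base URLs for these servers' OpenAI-compatible APIs;
# adjust to match your own deployment.
OPENAI_COMPATIBLE = {
    "ollama": "http://localhost:11434/v1",
    "lmstudio": "http://localhost:1234/v1",
    "vllm": "http://localhost:8000/v1",
}


def resolve(name: str) -> ProviderConfig:
    if name == "template":
        # Offline, no API key, no model: fine for summaries, no tool calls.
        return ProviderConfig("template", supports_tool_calls=False)
    if name in OPENAI_COMPATIBLE:
        return ProviderConfig(name, OPENAI_COMPATIBLE[name])
    if name in {"anthropic", "mistral", "openai"}:
        return ProviderConfig(name)  # hosted API; key supplied elsewhere
    raise ValueError(f"unknown provider: {name}")


def require_investigator(cfg: ProviderConfig) -> ProviderConfig:
    # The investigator needs a model that can emit tool calls.
    if not cfg.supports_tool_calls:
        raise RuntimeError(f"provider {cfg.name!r} cannot drive the investigator")
    return cfg
```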

## Current Status

The repository was created in June 2026 and last pushed on June 7, 2026, which places the project at a very early stage of active development. The project is self-described as fully open source under Apache 2.0, free to use, self-host, fork, and build on. Gated, governed remediation is listed on the roadmap; the PRTG connector is a stub. Community discussion takes place on Discord.

## Features
- Alert storm clustering into single incidents
- Read-only AI SRE investigator with tool-calling LLM
- Root-cause hypothesis generation
- Human-gated fix proposals with risk classification
- Noise scoring for flapping and over-sensitive checks
- Distributed ninox runners for air-gapped environments
- Cross-tool incident correlation
- Offline mode with no LLM or API keys required
- Secret scrubbing and redaction before remote LLM calls
- MCP server and capability plugin extensibility
- Docker Compose quickstart with synthetic mock alerts
- Parliament of Owls dashboard for connected runners

## Integrations
Checkmk, Prometheus Alertmanager, Icinga2, Zabbix, Generic Webhook, Docker, Kubernetes, AWS (CloudTrail, EC2, IAM), Grafana (PromQL, LogQL), GitHub, Git, Anthropic, OpenAI, Mistral, Ollama, vLLM, LM Studio, Azure OpenAI, MCP servers

## Platforms
CLI, Web, API, Linux, macOS, Windows

## Pricing
Open Source

## Version
main

## Links
- Website: https://github.com/ninoxAI/nightwatch
- Documentation: https://github.com/ninoxAI/nightwatch/blob/main/docs/README.md
- Repository: https://github.com/ninoxAI/nightwatch
- EveryDev.ai: https://www.everydev.ai/tools/ninoxai
