AI-powered SRE assistant for Kubernetes and Prometheus in Discord

Automate DevOps. Troubleshoot incidents.

AES-256 secure credential encryption
< 10s mean time to diagnose issues
100% isolated multi-tenant SaaS
Features

Intelligent
SRE operations.

Invite the SRE Copilot to your Discord server to store cluster credentials securely, diagnose container failures, and investigate metric anomalies.

01

Multi-Tenant SaaS

Allows different Discord servers (guilds) to securely register their own independent clusters. Each server's settings, endpoints, and credentials are completely isolated.

100% tenant isolation
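
As a rough illustration of per-guild isolation, the sketch below scopes every read and write to a Discord guild ID. It is not the product's actual schema: the table layout, the function names (`open_registry`, `register_cluster`, `list_clusters`), and the use of SQLite in place of PostgreSQL are all assumptions made to keep the example self-contained.

```python
import sqlite3

def open_registry(path=":memory:"):
    """Open a (hypothetical) cluster registry keyed by guild and cluster name."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clusters ("
        " guild_id TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " prometheus_url TEXT NOT NULL,"
        " PRIMARY KEY (guild_id, name))"
    )
    return conn

def register_cluster(conn, guild_id, name, prometheus_url):
    """Insert or update one cluster record for a single guild."""
    conn.execute(
        "INSERT OR REPLACE INTO clusters VALUES (?, ?, ?)",
        (guild_id, name, prometheus_url),
    )
    conn.commit()

def list_clusters(conn, guild_id):
    """Return only the clusters owned by the given guild."""
    rows = conn.execute(
        "SELECT name, prometheus_url FROM clusters"
        " WHERE guild_id = ? ORDER BY name",
        (guild_id,),
    )
    return rows.fetchall()
```

Because the guild ID is part of the primary key and of every query, two servers can both register a cluster called `prod` without seeing each other's endpoints.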
02

AES-256 Cryptography

Kubernetes configs uploaded via Discord are encrypted with Fernet authenticated symmetric encryption before they are written to the database, so credentials stay protected at rest.

256-bit symmetric encryption
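
A minimal sketch of encrypting a kubeconfig at rest with Fernet, using the `cryptography` package. The helper names (`encrypt_kubeconfig`, `decrypt_kubeconfig`) are hypothetical, and key storage is out of scope here. For reference, the Fernet specification pairs AES-128 in CBC mode with HMAC-SHA256 under a 32-byte (256-bit) combined key.

```python
from cryptography.fernet import Fernet, InvalidToken

def encrypt_kubeconfig(key: bytes, kubeconfig: str) -> bytes:
    """Encrypt and authenticate the kubeconfig text; the token is safe to store."""
    return Fernet(key).encrypt(kubeconfig.encode("utf-8"))

def decrypt_kubeconfig(key: bytes, token: bytes) -> str:
    """Verify and decrypt a stored token; raises InvalidToken on a wrong key or tampering."""
    return Fernet(key).decrypt(token).decode("utf-8")

if __name__ == "__main__":
    key = Fernet.generate_key()  # in practice, load this from a secret store
    token = encrypt_kubeconfig(key, "apiVersion: v1\nkind: Config\n")
    print(decrypt_kubeconfig(key, token))
```

Since Fernet tokens are authenticated, a modified or mismatched token fails with `InvalidToken` rather than decrypting to garbage.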
03

Agentic Diagnostics

A stateful LangGraph multi-agent workflow powered by Gemini 2.5 analyzes container status, stdout logs, and metrics anomalies to produce action-oriented recommendations.

Gemini 2.5 agentic reasoning model
04

Proactive Monitoring

A background loop watches pod health. When a pod transitions to an unhealthy state (e.g., OOMKilled, CrashLoopBackOff), an alert is posted to the designated channel and cleared once the pod recovers.

24/7/365 continuous cluster monitoring
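
The alert-then-clear behaviour can be sketched as a comparison between two consecutive pod snapshots. This is an illustration only: the `UNHEALTHY` set, the `diff_alerts` name, and the `{pod: status}` snapshot shape are assumptions, and fetching the snapshots from the Kubernetes API is left out.

```python
# Statuses treated as unhealthy in this sketch (taken from the examples above).
UNHEALTHY = {"OOMKilled", "CrashLoopBackOff"}

def diff_alerts(previous, current):
    """Compare two {pod: status} snapshots and return (fired, cleared).

    fired:   pods that are unhealthy now but were not in the previous snapshot.
    cleared: pods that were unhealthy before and are now healthy or gone.
    """
    fired = sorted(
        pod for pod, status in current.items()
        if status in UNHEALTHY and previous.get(pod) not in UNHEALTHY
    )
    cleared = sorted(
        pod for pod, status in previous.items()
        if status in UNHEALTHY and current.get(pod) not in UNHEALTHY
    )
    return fired, cleared
```

Alerting on transitions rather than on state means a pod that stays in CrashLoopBackOff across many polls produces one alert, not one per poll.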
RESULTS

Minimize
your MTTR.

Mean Time To Resolution (MTTR)
reduced by autonomous diagnostics in seconds
Multi-Tenant Isolation
100% isolated, with AES-256-secured credentials
Proactive Loop Monitoring
24/7 loop with instant alerts and resolution clears
Kubernetes Clusters
Prometheus Server
FastAPI Backend API
LangGraph & Gemini AI
PostgreSQL DB + ChromaDB Vector Store
Workflow

Register. Monitor. Diagnose.

AI Agents

Stateful Agents.
Expert Analysis.

A stateful multi-agent architecture built on LangGraph that leverages specialized Gemini 2.5 Flash agents to troubleshoot cluster errors.

Log Analysis Agent

Tails and parses stdout from running pod containers for exceptions, warnings, and crash traces.
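
A simplified sketch of the tail-and-parse step, assuming the log text has already been fetched. The regular expression and the `scan_logs` helper are illustrative choices, not the agent's actual rules.

```python
import re

# Matches common severity markers and any identifier ending in "Exception".
PATTERN = re.compile(r"\b(FATAL|ERROR|WARN(?:ING)?|Traceback|\w*Exception)\b")

def scan_logs(text, tail=200):
    """Return (marker, line) pairs for suspicious lines among the last `tail` lines."""
    findings = []
    for line in text.splitlines()[-tail:]:
        match = PATTERN.search(line)
        if match:
            findings.append((match.group(1), line.strip()))
    return findings
```

In a pipeline like the one described here, the resulting pairs would be handed to the model as compact evidence instead of the full log.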

Metrics Anomaly Agent

Queries Prometheus dynamically to detect CPU and memory spikes, elevated latencies, and rising error rates.
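
One simple way to flag a spike in a metric series is a z-score test, sketched below. The technique, the `find_spikes` name, and the default threshold are assumptions for illustration; the samples stand in for values that would come back from a Prometheus range query.

```python
from statistics import mean, pstdev

def find_spikes(samples, threshold=3.0):
    """Return indexes of samples more than `threshold` standard deviations above the mean."""
    if len(samples) < 2:
        return []
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        return []  # a flat series has no spikes
    return [i for i, value in enumerate(samples) if (value - mu) / sigma > threshold]
```

A z-score is easy to reason about but is itself skewed by large outliers, so a production detector might prefer a rolling window or a median-based measure.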

RAG Runbook Agent

Runs similarity searches against runbook embeddings stored in ChromaDB to find runbooks from similar past incidents.
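
The retrieval idea can be shown without a vector store: rank stored runbook embeddings by cosine similarity to the query embedding. This plain-Python stand-in does not use the ChromaDB API, and the tiny three-dimensional vectors and the `top_runbooks` helper are made up for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_runbooks(query_vec, runbooks, k=2):
    """Return the titles of the k runbooks whose embeddings are closest to the query.

    `runbooks` is a list of (title, embedding) pairs.
    """
    ranked = sorted(runbooks, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [title for title, _ in ranked[:k]]
```

With real embeddings the vectors would have hundreds of dimensions, and the vector store would handle indexing and ranking.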

Root Cause Coordinator

Synthesizes the other agents' findings into a root-cause assessment with a confidence score, and generates copy-paste CLI remediation commands.
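
As a toy illustration of the synthesis step, the sketch below turns per-agent signals into a confidence score and a list of follow-up `kubectl` commands. The scoring rule (share of agents that found supporting evidence), the `coordinate` function, and the choice of commands are assumptions; the real coordinator is model-driven.

```python
def coordinate(pod, namespace, signals):
    """Combine agent signals into a confidence score and suggested commands.

    `signals` maps an agent name to True if that agent found evidence
    supporting the current root-cause hypothesis.
    """
    if not signals:
        return {"confidence": 0.0, "commands": []}
    confidence = sum(signals.values()) / len(signals)
    commands = [
        f"kubectl describe pod {pod} -n {namespace}",
        f"kubectl logs {pod} -n {namespace} --previous",
    ]
    return {"confidence": round(confidence, 2), "commands": commands}
```

For example, two supporting agents out of three give a confidence of 0.67 alongside commands that can be pasted directly into a terminal.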

Use Cases

Scenarios we diagnose.

The payment-service entered a CrashLoopBackOff caused by database connection timeouts. SRE Copilot analyzed the stdout logs, diagnosed the network issue, and provided exact remediation instructions in under 10 seconds.
CrashLoopBackOff

Resolved by AI, payment-service

< 10s autonomous diagnostics
Target Services

Troubleshoot cluster
incidents.

Invite the SRE Copilot Bot to your server, securely upload configurations, and let multi-agent AI diagnose your deployments in real time.

Self-hostable or cloud SaaS. Open-source under MIT.