Use Cases & Scenarios
See how Flezi AgentBox works in practice. Detailed flow diagrams show real-world agent scenarios, from personal dev mentors to enterprise team deployments.
DevMentor Kai — Agent with Soul
Personal coding mentor that remembers you, adapts to your mood, and grows with your skills
Interaction Flow
Agent Package
devmentor-kai/
Memory Evolution
Full conversation session
Kai: Hey Minh! I see you just opened the project. How's it going?
Minh: Terrible. Build keeps failing and I can't figure out why.
Kai: Rough day, huh? I noticed you had 3 failed builds in the last hour. Let me take a look...
$ npm run build
Error: Type 'string | undefined' is not assignable to type 'string'
at src/components/UserCard.tsx:42
Kai: Found it! Line 42 in UserCard.tsx — you're passing `user.name` which can be undefined, but the prop expects a string. Quick fix:
name={user.name ?? "Anonymous"}
Kai: I remember you had a similar issue last week with optional chaining. Want me to add a section about TypeScript strict null checks to your learning notes?
Minh: Yes please. And thanks for not judging the 3 failed builds...
Kai: Ha! Remember the useEffect incident? We all have those days. You're improving fast though — last month you wouldn't have caught this was a type error at all.
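Kai's fix uses TypeScript's nullish coalescing operator (`??`), which substitutes the fallback only when the left side is `null` or `undefined`; unlike `||`, it keeps an empty string. The error itself only appears with `strictNullChecks` enabled (it is part of `strict`). A minimal sketch, assuming hypothetical `User` and `UserCardProps` types, since the real UserCard code is not shown:

```typescript
// Hypothetical shape: `name` is optional, so its type is `string | undefined`.
interface User {
  id: number;
  name?: string;
}

// Hypothetical prop type that, like UserCard's, requires a plain string.
interface UserCardProps {
  name: string;
}

// `??` substitutes the fallback only for null/undefined,
// so an empty string "" is passed through unchanged (unlike `||`).
function toUserCardProps(user: User): UserCardProps {
  return { name: user.name ?? "Anonymous" };
}

console.log(toUserCardProps({ id: 1 }).name);              // Anonymous
console.log(toUserCardProps({ id: 2, name: "Minh" }).name); // Minh
console.log(toUserCardProps({ id: 3, name: "" }).name);     // (empty line: "" is kept)
```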
Intelligent MCP Server — IDE Integration
Run agents as MCP servers inside VS Code, Cursor, or Claude Code. Unlike raw tools, these servers have a reasoning "brain" and persistent memory.
MCP Server Flow
IDE Configuration
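// .vscode/mcp.json (VS Code reads a top-level "servers" key;
// Cursor's .cursor/mcp.json and Claude Code's .mcp.json use "mcpServers")
{
  "servers": {
    "security-guard": {
      "type": "stdio",
      "command": "agentbox",
      "args": ["mcp", "serve", "--agent", "code-guardian"]
    }
  }
}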
Raw MCP vs Intelligent MCP
Raw MCP Tool
- Stateless function call
- No context or memory
- grep "pattern" → raw output
- Developer interprets results
AgentBox Intelligent MCP
- Brain-powered tool calls
- Remembers past scans + your codebase
- THINKS before acting, explains findings
- Suggests fixes based on your patterns
Supported IDEs (transport)
VS Code: stdio
Cursor: stdio
Claude Code: stdio
MCP tool interaction example
MCP Protocol Exchange:
IDE → Agent: tools/list
Response: ["security-guard:scan", "security-guard:explain", "security-guard:fix"]
IDE → Agent: tools/call "security-guard:scan"
Args: { "target": "src/routes/auth.ts", "category": "owasp" }
Agent's internal reasoning (invisible to IDE):
"This file handles authentication. I should check for: SQL injection in login query, timing attacks on password comparison, JWT token expiration, CSRF protection..."
"I remember from last scan: this team uses bcrypt correctly but has weak session timeout (8h). Let me check if that was fixed..."
Agent → IDE: tool result
{ findings: 3, severity: "medium", details: [...] }
Key insight: The agent THINKS, REMEMBERS, and REASONS — unlike a raw grep tool.
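The exchange above is simplified. In MCP, every message is a JSON-RPC 2.0 object: a real `tools/list` result returns tool objects (name, description, input schema) rather than bare names, and `tools/call` carries the tool `name` and its `arguments` as params. Over the stdio transport, each message is one line of JSON. The sketch below builds the two requests from the example as an IDE-side client might; it illustrates the wire format only and is not AgentBox's client code.

```typescript
// JSON-RPC 2.0 request envelope, as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Each request gets a fresh id so responses can be matched to requests.
function request(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const msg: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) msg.params = params;
  return msg;
}

// stdio framing: one message per line. JSON.stringify without indentation
// escapes any newlines inside strings, so the only raw newline is the delimiter.
function frame(msg: JsonRpcRequest): string {
  return JSON.stringify(msg) + "\n";
}

const listTools = request("tools/list");
const scan = request("tools/call", {
  name: "security-guard:scan",
  arguments: { target: "src/routes/auth.ts", category: "owasp" },
});

// A real client would write each framed string to the server process's stdin
// (e.g. child.stdin.write(frame(msg)) with node:child_process).
for (const msg of [listTools, scan]) {
  console.log(frame(msg).trimEnd());
}
```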
Team Agent Deployment — Enterprise Workflow
Create an agent once locally, pack it, push it to the registry, and deploy it to your team via the NemoClawLite platform.
Deployment Pipeline
1. Create: agentbox init code-reviewer
2. Test: agentbox test (local)
3. Pack: agentbox pack → .tar.gz
4. Push: agentbox push (registry)
5. Deploy: team pulls & runs
Self-Host (Free)
A team member runs agentbox pull code-reviewer && agentbox run code-reviewer, using a local NemoClawLite sandbox and an Ollama LLM.
Platform (Pro)
Deploy to the AgentBox Platform: cloud-hosted NemoClawLite, a Claude LLM, and a team dashboard.
Enterprise
Private registry, custom policies, audit logs, SSO integration, and SIEM export.
Full deployment workflow example
Step-by-step: Publishing "code-reviewer" to your team
$ agentbox init code-reviewer --template reviewer
Creates scaffold: agent.yaml, persona.md, knowledge/, policies/, tests/
$ agentbox test
Runs 12 test scenarios in NemoClawLite sandbox. All pass.
$ agentbox pack
Creates: code-reviewer-1.0.0.agentbox.tar.gz (4.2MB)
Includes: SHA256 checksum, signed manifest, dependency lockfile
$ agentbox push @acme/code-reviewer
Uploaded to registry. Runs automated trust checks:
✓ No dangerous syscalls in tools/
✓ Policies restrict network access
✓ Tests cover all declared capabilities
✓ Signed by verified publisher
Team member on the other side:
$ agentbox pull @acme/code-reviewer
$ agentbox run @acme/code-reviewer --context ./src
Agent runs locally with NemoClawLite isolation — can only access ./src
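The SHA256 checksum produced by the pack step lets a consumer detect a corrupted or swapped tarball before running it. A minimal sketch using Node's built-in `node:crypto`; the `ManifestEntry` shape is hypothetical, since the manifest format is not documented here, and the string "abc" stands in for the tarball bytes:

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest entry; the real manifest format is not documented here.
interface ManifestEntry {
  file: string;
  sha256: string; // hex digest of the file's bytes
}

function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Node's hex digest is lowercase, so normalize the expected value before comparing.
function verifyChecksum(data: string | Uint8Array, entry: ManifestEntry): boolean {
  return sha256Hex(data) === entry.sha256.toLowerCase();
}

// "abc" stands in for the tarball's bytes (e.g. the result of fs.readFileSync(path)).
const entry: ManifestEntry = {
  file: "code-reviewer-1.0.0.agentbox.tar.gz",
  sha256: "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
};

console.log(sha256Hex("abc") === entry.sha256); // true
console.log(verifyChecksum("abc", entry));      // true
console.log(verifyChecksum("abd", entry));      // false
```

A checksum only helps if the manifest carrying it can be trusted, which is why the package also ships a signed manifest; signature verification is not shown here.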
Agent Registry — The npm for AI Agents
Publish, discover, and install agent packages. Creators earn revenue. Enterprise gets private registries.
Creator Flow
Consumer Flow
Trust & verification system
Agent Trust Levels:
L0 Untrusted — Unknown publisher. Runs with maximum sandbox restrictions: no network, readonly filesystem, 256MB memory limit.
L1 Community — Verified email. Passed automated security scan. Can access network (allowlisted endpoints only).
L2 Verified — Publisher identity verified. Code audited. Can access broader filesystem with declared permissions.
L3 Enterprise Certified — Full security audit passed. SOC 2 compliant. Custom policy integration. SIEM logging.
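Read as configuration, the trust levels form cumulative sandbox policies, each level relaxing the one below. A hypothetical sketch: the `SandboxPolicy` fields and `policyFor` are illustrative names, not AgentBox's schema, and inheriting L0's 256MB memory limit at higher levels is an assumption because the table states a limit only for L0.

```typescript
type TrustLevel = 0 | 1 | 2 | 3;

// Hypothetical policy shape covering only what the trust table mentions.
interface SandboxPolicy {
  network: "none" | "allowlist";
  filesystem: "readonly" | "declared";
  memoryLimitMb: number;
  siemLogging: boolean;
}

// Baseline = L0 as described: no network, read-only filesystem, 256MB memory.
const L0_POLICY: SandboxPolicy = {
  network: "none",
  filesystem: "readonly",
  memoryLimitMb: 256,
  siemLogging: false,
};

// Each level relaxes only what its row states; everything else is inherited.
// Keeping 256MB above L0 is an assumption (the table gives a limit only for L0).
function policyFor(level: TrustLevel): SandboxPolicy {
  const policy = { ...L0_POLICY };
  if (level >= 1) policy.network = "allowlist";   // L1: allowlisted endpoints only
  if (level >= 2) policy.filesystem = "declared"; // L2: broader access per declared permissions
  if (level >= 3) policy.siemLogging = true;      // L3: SIEM logging
  return policy;
}

for (const level of [0, 1, 2, 3] as const) {
  console.log(`L${level}`, policyFor(level));
}
```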
Revenue Model for Creators:
• Free agents: open source, community contributions
• Premium agents: creator sets price, platform takes 15% commission
• Enterprise agents: custom licensing, volume discounts, SLA support
• Creators receive monthly payouts via Stripe Connect
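As a worked example of the 15% commission: a premium agent sold for $20.00 yields $3.00 for the platform and $17.00 for the creator. The sketch keeps money in integer cents; the rounding rule (platform share rounded to the nearest cent, creator receives the remainder) is an assumption, and Stripe Connect processing fees are not modeled.

```typescript
// Commission rate from the revenue model above.
const PLATFORM_COMMISSION = 0.15;

interface Split {
  platformCents: number;
  creatorCents: number;
}

// Hypothetical rounding rule: round the platform's share to the nearest cent
// and give the creator the remainder, so both parts always sum to the price.
function splitSale(priceCents: number): Split {
  if (!Number.isInteger(priceCents) || priceCents < 0) {
    throw new RangeError("priceCents must be a non-negative integer");
  }
  const platformCents = Math.round(priceCents * PLATFORM_COMMISSION);
  return { platformCents, creatorCents: priceCents - platformCents };
}

console.log(splitSale(2000)); // { platformCents: 300, creatorCents: 1700 }
console.log(splitSale(999));  // { platformCents: 150, creatorCents: 849 }
```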
SDLC Use Cases
How AI agents integrate into every phase of the Software Development Lifecycle
Sprint Planning Agent — From PRD to Stories
An agent that reads your PRD, architecture docs, and codebase to break features into implementable stories with accurate estimates
Planning Flow
Output: Sprint Plan
sprint-plan.yaml
sprint: 14
goal: "User authentication + OAuth"
stories:
  - id: AUTH-001
    title: "Implement JWT token service"
    points: 5
    files: [auth/jwt.ts, middleware/auth.ts]
    acceptance_criteria: 4
  - id: AUTH-002
    title: "OAuth2 provider integration"
    points: 8
    depends_on: [AUTH-001]
Why This Matters
• Agent reads actual code, not just descriptions — estimates are grounded in reality
• Remembers past sprints — learns team velocity and adjusts over time
• Identifies hidden dependencies by scanning import graphs
• All analysis runs in NemoClawLite sandbox — agent reads your code safely
Full sprint planning session
PM: We need to add OAuth2 login. Can you plan the sprint?
Agent: Let me scan your codebase first...
✓ Found existing auth: src/lib/supabase/auth.ts (email/password)
✓ Found middleware: src/middleware.ts (JWT verification)
✓ Found 4 protected routes, 2 server actions using auth
Agent: Based on codebase analysis, I recommend 4 stories:
Story 1: JWT Token Service (5 pts)
Files: auth/jwt.ts, middleware/auth.ts | AC: 4 criteria
Story 2: OAuth2 Provider Config (3 pts)
Files: auth/oauth.ts, .env | AC: 3 criteria | Depends: Story 1
Story 3: OAuth Callback Handler (5 pts)
Files: routes/auth/callback.ts, auth/oauth.ts | AC: 5 criteria
Story 4: UI Login Buttons (3 pts)
Files: components/LoginForm.tsx | AC: 3 criteria | Depends: Stories 2, 3
Agent: Total: 16 points. Your team velocity is ~18 pts/sprint, so this fits with buffer. I also identified a risk: the existing session middleware needs refactoring to support OAuth tokens alongside email auth.
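The ordering and capacity check behind this plan can be sketched as a topological sort (Kahn's algorithm) over each story's dependencies, plus a points total compared against velocity. The data shape and function names are hypothetical; the stories, points, and ~18-point velocity come from the session, and Story 3 is treated as independent because the session lists no dependency for it.

```typescript
interface Story {
  id: number;
  title: string;
  points: number;
  dependsOn: number[];
}

// Stories as listed in the planning session above.
const stories: Story[] = [
  { id: 1, title: "JWT Token Service", points: 5, dependsOn: [] },
  { id: 2, title: "OAuth2 Provider Config", points: 3, dependsOn: [1] },
  { id: 3, title: "OAuth Callback Handler", points: 5, dependsOn: [] },
  { id: 4, title: "UI Login Buttons", points: 3, dependsOn: [2, 3] },
];

// Kahn's algorithm: repeatedly schedule stories whose dependencies are all done.
// Throws if some story can never become ready (a cycle or an unknown dependency id).
function buildOrder(items: Story[]): number[] {
  const unmet = new Map<number, number>();       // id -> unmet dependency count
  const dependents = new Map<number, number[]>(); // id -> stories waiting on it
  for (const s of items) {
    unmet.set(s.id, s.dependsOn.length);
    for (const dep of s.dependsOn) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), s.id]);
    }
  }
  const ready = items.filter((s) => s.dependsOn.length === 0).map((s) => s.id);
  const order: number[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const left = unmet.get(next)! - 1;
      unmet.set(next, left);
      if (left === 0) ready.push(next);
    }
  }
  if (order.length !== items.length) throw new Error("unsatisfiable dependencies");
  return order;
}

const velocity = 18; // ~18 pts/sprint, from the session
const total = stories.reduce((sum, s) => sum + s.points, 0);
console.log(buildOrder(stories));                        // [ 1, 3, 2, 4 ]
console.log(total, total <= velocity, velocity - total); // 16 true 2
```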
Code Review Agent — CI/CD Integrated
Automated, intelligent code review that runs on every PR. Understands your codebase patterns, catches bugs, and suggests improvements.
CI/CD Integration Flow
Review Categories
Security: SQL injection, XSS, auth bypass
Performance: N+1 queries, missing indexes, memory leaks
Quality: dead code, naming, DRY violations
Convention: style guide, error handling, testing gaps
vs Traditional Linters
• Linters check syntax — this agent understands intent and architecture
• Learns your codebase patterns over time via persistent memory
• Can reason about cross-file impacts (e.g., changing an API contract)
• Explains why something is problematic, not just flagging it
PR review example
PR #42: Add user search endpoint
🔴 Critical: SQL Injection Risk
File: src/routes/search.ts:28
const result = await db.query(`SELECT * FROM users WHERE name LIKE '%${q}%'`)
Fix: Use a parameterized query so q is sent as a bound value instead of being spliced into the SQL:
const result = await db.query('SELECT * FROM users WHERE name LIKE $1', [`%${q}%`])
🟠 Performance: Missing Database Index
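The name column is used in a LIKE query but has no index, and a leading-wildcard pattern ('%q%') cannot use a plain B-tree index anyway.
For 50k+ users, this causes a full table scan (~200ms, vs ~2ms with a trigram index).
Suggest: CREATE EXTENSION IF NOT EXISTS pg_trgm; CREATE INDEX idx_users_name ON users USING gin (name gin_trgm_ops);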
🔵 Convention: Missing Error Handling
Your team convention (from knowledge/conventions.md): all route handlers should have try/catch with structured error responses. This handler returns raw database errors to the client.
Summary: 1 critical, 1 performance, 1 convention issue. Blocking merge until critical is fixed.
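The summary line amounts to a merge gate: count findings by category and fail the CI check when any critical finding is present. A hypothetical sketch of that rule using the three findings from PR #42; the `Finding` shape is illustrative, and how AgentBox actually reports status to CI is not described here.

```typescript
type Severity = "critical" | "performance" | "convention";

// Hypothetical finding shape; only the severity drives the gate.
interface Finding {
  severity: Severity;
  location: string;
  message: string;
}

// Mirrors the rule above: any critical finding blocks the merge,
// other categories are reported without blocking.
function summarize(findings: Finding[]) {
  const counts: Record<Severity, number> = { critical: 0, performance: 0, convention: 0 };
  for (const f of findings) counts[f.severity]++;
  return { counts, blockMerge: counts.critical > 0 };
}

const pr42: Finding[] = [
  { severity: "critical", location: "src/routes/search.ts:28", message: "SQL injection risk" },
  { severity: "performance", location: "users.name", message: "Missing trigram index" },
  { severity: "convention", location: "src/routes/search.ts", message: "Missing error handling" },
];

const { counts, blockMerge } = summarize(pr42);
// A CI wrapper would exit with this code; a non-zero exit fails the check.
const exitCode = blockMerge ? 1 : 0;
console.log(counts, blockMerge, exitCode);
```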
Testing Agent — Automated QA Pipeline
Generate, execute, and maintain tests. The agent reads your code, writes test cases, runs them in a sandbox, and reports coverage gaps.
Testing Pipeline
1. Analyze: read source code + existing tests
2. Plan: identify untested paths
3. Generate: write test cases
4. Execute: run in NemoClawLite
5. Report: coverage + failures
6. Fix: auto-fix flaky tests
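A common way to handle the Fix step's flaky tests is to rerun each test several times and separate consistent failures from mixed results. The source doesn't describe AgentBox's method; this sketch assumes rerun-based detection, and the test names and outcomes are made up for illustration.

```typescript
type Outcome = "pass" | "fail";
type Verdict = "passing" | "failing" | "flaky";

// Consistent failures are real failures; mixed results mark a test as flaky.
function classify(runs: Outcome[]): Verdict {
  if (runs.length === 0) throw new Error("no runs recorded");
  const passes = runs.filter((r) => r === "pass").length;
  if (passes === runs.length) return "passing";
  if (passes === 0) return "failing";
  return "flaky";
}

// Hypothetical rerun results from the sandbox (5 runs each).
const reruns: Record<string, Outcome[]> = {
  "signIn returns JWT": ["pass", "pass", "pass", "pass", "pass"],
  "signIn rejects wrong password": ["fail", "fail", "fail", "fail", "fail"],
  "signIn constant time": ["pass", "fail", "pass", "pass", "fail"],
};

for (const [name, runs] of Object.entries(reruns)) {
  console.log(`${name}: ${classify(runs)}`);
}
```

A test that passes every rerun is only probably stable; more reruns lower the chance of missing an intermittent failure, at the cost of sandbox time.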
Agent Flow
Test Types Generated
Unit Tests: individual functions with mocked dependencies (auth.service.test.ts)
Integration Tests: module interactions with a real DB in a test container (auth.integration.test.ts)
Edge Cases: boundary values, null inputs, race conditions (auth.edge-cases.test.ts)
Regression Tests: tests for previously reported bugs (auth.regression.test.ts)
All test execution happens inside NemoClawLite sandbox — test code cannot access files outside the project or make network calls unless explicitly allowed.
Generated test example
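// auth.service.test.ts (generated by Testing Agent)
// Assumes Vitest (Jest globals behave the same) and an authService
// instance exported from ./auth.service; both are illustrative.
import { describe, it, expect } from 'vitest';
import { authService } from './auth.service';
// Wall-clock duration of an async call in ms. Failed sign-ins reject,
// so the rejection is swallowed: only the elapsed time matters here.
async function measure(fn: () => Promise<unknown>): Promise<number> {
  const start = performance.now();
  await fn().catch(() => undefined);
  return performance.now() - start;
}
describe('AuthService', () => {
  describe('signIn', () => {
    it('should return JWT for valid credentials', async () => {
      const result = await authService.signIn('test@mail.com', 'pass123');
      expect(result.token).toMatch(/^eyJ/);
      expect(result.user.email).toBe('test@mail.com');
    });
    it('should throw on invalid password', async () => {
      await expect(authService.signIn('test@mail.com', 'wrong'))
        .rejects.toThrow('Invalid credentials');
    });
    // Edge case: timing attack prevention
    it('should take constant time regardless of user existence', async () => {
      const t1 = await measure(() => authService.signIn('exists@mail.com', 'x'));
      const t2 = await measure(() => authService.signIn('noone@mail.com', 'x'));
      expect(Math.abs(t1 - t2)).toBeLessThan(50); // <50ms variance
    });
  });
});
// Agent identified 14 test cases across 3 files:
// 8 unit + 4 integration + 2 edge cases, 94% coverage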
Documentation Agent — Living Docs from Code
Automatically generate and maintain documentation that stays in sync with your codebase: API docs, architecture diagrams, and changelogs.
Documentation Flow
Output Types
API Reference
Auto-generated from TypeScript types + JSDoc. Includes request/response examples.
Architecture Diagrams
Mermaid C4/sequence diagrams generated from import graphs and service dependencies.
Developer Guides
How-to guides inferred from common patterns in the codebase.
Changelog
Semantic changelog from git commits, grouped by feature/fix/breaking.
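Grouping commits by feature, fix, and breaking change maps naturally onto the Conventional Commits format (`type(scope)!: description`, where `!` or a `BREAKING CHANGE:` footer marks a breaking change). Assuming the repository follows that convention (the source does not say), a minimal grouping sketch looks like this; the sample commit messages are hypothetical:

```typescript
type Group = "breaking" | "feature" | "fix" | "other";

// Subject line per Conventional Commits: type(scope)!: description
const SUBJECT = /^(\w+)(?:\(([^)]*)\))?(!)?: (.+)$/;

function groupOf(message: string): { group: Group; summary: string } {
  const [subject] = message.split("\n");
  const m = SUBJECT.exec(subject);
  if (!m) return { group: "other", summary: subject }; // not a conventional commit
  const [, type, , bang, description] = m;
  // Breaking if the subject has "!" or any line starts with the footer token.
  const breaking = bang === "!" || /^BREAKING[ -]CHANGE: /m.test(message);
  if (breaking) return { group: "breaking", summary: description };
  if (type === "feat") return { group: "feature", summary: description };
  if (type === "fix") return { group: "fix", summary: description };
  return { group: "other", summary: description };
}

function changelog(messages: string[]): Record<Group, string[]> {
  const out: Record<Group, string[]> = { breaking: [], feature: [], fix: [], other: [] };
  for (const msg of messages) {
    const { group, summary } = groupOf(msg);
    out[group].push(summary);
  }
  return out;
}

console.log(changelog([
  "feat(search): add full-text agent search",
  "fix: return 409 when agent name exists",
  "feat!: remove /api/agents/find",
  "chore: bump deps",
]));
```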
Watch Mode
With --watch, the agent monitors file changes and incrementally updates only the affected documentation pages. It also runs as an MCP server, so your IDE can show real-time doc previews.
Generated documentation example
Auto-generated API Reference (from Fastify routes + Zod schemas):
POST /api/agents
Create a new agent in the marketplace.
Request Body:
{ name: string, description: string, category: "dev" | "ops" | "qa", visibility: "public" | "private" }
Response (201):
{ id: uuid, name: string, created_at: timestamp, trust_level: 0 }
Errors:
400 — Validation error (name too short, invalid category)
401 — Not authenticated
409 — Agent name already exists for this creator
Auto-generated Mermaid diagram:
sequenceDiagram
Client->>API: POST /api/agents
API->>Auth: Verify JWT
API->>DB: INSERT INTO agents
DB-->>API: agent record
API-->>Client: 201 Created
Incident Response Agent — On-Call Assistant
Monitors alerts from PagerDuty/Grafana, reads logs, correlates with past incidents, identifies root cause, and suggests a fix — all within seconds.
Response Flow
Agent Capabilities
Log Analysis
Reads structured logs (JSON), unstructured logs, and stack traces. Identifies error patterns across distributed services.
Metric Correlation
Queries Prometheus/Grafana. Correlates latency spikes with CPU/memory/network metrics to narrow root cause.
Incident Memory
Remembers past incidents. When a similar pattern appears, suggests the same fix that worked before — with caveats about what changed since.
Runbook Execution
Has access to team runbooks in knowledge/. Can execute safe remediation steps (restart, scale, toggle feature flags) in sandbox.
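The source doesn't say how the agent decides two incidents are similar. As an assumption, this sketch tags each incident with signal labels and ranks past incidents by Jaccard similarity (shared labels divided by all distinct labels); a production system might use embeddings instead. The incident records, labels, and the 0.5 cutoff are hypothetical.

```typescript
interface Incident {
  id: string;
  signals: string[]; // e.g. alert names, error types, affected services
  fix: string;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, from 0 (disjoint) to 1 (identical).
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const x of a) if (b.has(x)) shared++;
  return shared / (a.size + b.size - shared);
}

// Returns the best past match at or above minScore, or undefined.
function mostSimilar(current: string[], past: Incident[], minScore = 0.5) {
  const cur = new Set(current);
  let best: { incident: Incident; score: number } | undefined;
  for (const incident of past) {
    const score = jaccard(cur, new Set(incident.signals));
    if (score >= minScore && (!best || score > best.score)) best = { incident, score };
  }
  return best;
}

const history: Incident[] = [
  { id: "INC-a", signals: ["p99-latency", "OOMKilled", "api-server", "cache"], fix: "Add cache TTL, restart pods" },
  { id: "INC-b", signals: ["5xx-rate", "db-connections", "worker"], fix: "Raise pool size" },
];

const match = mostSimilar(["p99-latency", "OOMKilled", "api-server"], history);
console.log(match?.incident.id, match?.score); // INC-a 0.75
```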
Full incident response example
Timeline (mm:ss from the first alert):
00:00 — PagerDuty alert: API p99 latency > 2000ms
00:03 — Agent queries Grafana, identifies api-server pods
00:08 — Agent reads last 5min of logs, finds 12 OOMKilled events
00:15 — Memory search: similar incident 14 days ago (cache leak)
00:20 — Agent diffs recent deployments: PR#128 merged 45min ago
00:25 — Agent reads PR#128 diff: new endpoint without cache TTL
00:30 — Diagnosis: unbounded cache growth from new /search endpoint
00:35 — Agent executes: kubectl rollout restart deploy/api-server
00:40 — Metrics confirm: latency dropping, pods healthy
00:45 — Agent posts incident report to #incidents Slack channel
00:50 — Agent creates Jira ticket: "Add cache TTL to /search endpoint"
Total resolution time: 50 seconds (vs 15-30 min manual)
Database Migration Agent — Safe Schema Evolution
Reads your current schema, generates migration SQL, validates with dry-run, creates rollback scripts, and checks for breaking changes.
Migration Flow
Generated migration SQL
-- migrate_up.sql
ALTER TABLE users
ADD COLUMN deleted_at TIMESTAMPTZ DEFAULT NULL;
-- Add partial index for active users (performance)
CREATE INDEX idx_users_active
ON users (id) WHERE deleted_at IS NULL;
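-- Hide soft-deleted rows from SELECT. AS RESTRICTIVE ANDs this policy with the
-- existing permissive SELECT policies; permissive policies are ORed together,
-- so a permissive one would not hide rows another policy already allows.
-- Assumes RLS is already enabled on users.
CREATE POLICY users_visible ON users
AS RESTRICTIVE FOR SELECT USING (deleted_at IS NULL);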
-- Agent also generates: data backfill, API filter updates,
-- cascade rules, and test queries for each scenario.
-- migrate_down.sql (rollback)
DROP POLICY IF EXISTS users_visible ON users;
DROP INDEX IF EXISTS idx_users_active;
ALTER TABLE users DROP COLUMN deleted_at;
Safety Checks
Schema Diff Analysis
Compares before/after schema. Identifies columns added, removed, renamed. Flags data loss risks (DROP COLUMN) with warnings.
Backwards Compatibility
Checks if existing API queries still work. Detects: column renames that break SELECT *, type changes that lose precision, NOT NULL on populated columns.
Dry-Run Validation
Runs the migration on a test database inside NemoClawLite. Verifies: syntax correctness, index creation time, lock duration, data integrity.
Rollback Script
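Auto-generates the matching DOWN migration and validates the rollback independently. Warns when a revert discards data: here, DROP COLUMN deleted_at would erase any soft-delete timestamps recorded after the migration.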
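The schema diff check can be sketched as a comparison of one table's column definitions before and after a migration. This is a simplified, hypothetical model (no defaults, constraints, or indexes), not the agent's implementation; running it in the rollback direction shows why DROP COLUMN deserves a warning:

```typescript
interface Column {
  type: string;
  nullable: boolean;
}
// Column name -> definition for a single table (hypothetical, simplified model).
type Schema = Record<string, Column>;

interface Warning {
  column: string;
  risk: string;
}

// Flags the risks listed under Schema Diff Analysis and Backwards Compatibility.
// A rename appears as one drop plus one add: names alone cannot tell them apart.
function diffSchema(before: Schema, after: Schema): Warning[] {
  const warnings: Warning[] = [];
  for (const [name, old] of Object.entries(before)) {
    const next = after[name];
    if (!next) {
      warnings.push({ column: name, risk: "DROP COLUMN: existing values will be lost" });
      continue;
    }
    if (next.type !== old.type) {
      warnings.push({ column: name, risk: `type ${old.type} -> ${next.type}: check for precision loss` });
    }
    if (old.nullable && !next.nullable) {
      warnings.push({ column: name, risk: "NOT NULL added: fails if existing rows hold NULL" });
    }
  }
  for (const [name, col] of Object.entries(after)) {
    // Defaults are not modeled, so every new NOT NULL column is flagged.
    if (!(name in before) && !col.nullable) {
      warnings.push({ column: name, risk: "new NOT NULL column: existing rows need a default" });
    }
  }
  return warnings;
}

const base: Schema = {
  id: { type: "uuid", nullable: false },
  name: { type: "text", nullable: true },
};
const withSoftDelete: Schema = { ...base, deleted_at: { type: "timestamptz", nullable: true } };

console.log(diffSchema(base, withSoftDelete)); // [] (adding a nullable column is safe)
console.log(diffSchema(withSoftDelete, base)); // rollback direction: DROP COLUMN warning for deleted_at
```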
Compatibility report
Compatibility Report:
✓ API GET /users — compatible (deleted_at has DEFAULT NULL)
✓ API POST /users — compatible (column is nullable)
⚠ API GET /users?active=true — needs filter update
✓ Supabase RLS — new policy handles soft-delete
✓ Foreign keys — CASCADE rules verified
✓ Rollback — tested successfully, 0 data loss
Estimated migration time: ~200ms (table has 12,340 rows)
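Lock duration: ~5ms (adding a column with DEFAULT NULL only updates the catalog, with no table rewrite; PG 11+ extends this to constant defaults. The ALTER TABLE still takes a brief ACCESS EXCLUSIVE lock.)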
Security Audit Agent — Continuous Vulnerability Scanning
Runs OWASP Top 10 checks, scans dependencies for CVEs, detects hardcoded secrets, and generates compliance reports with remediation guidance.
Audit Flow
Scan Categories
OWASP Top 10
Injection (SQL/XSS/Command), Broken Auth, Sensitive Data Exposure, XXE, Broken Access Control, Security Misconfiguration, Insecure Deserialization.
Dependency Vulnerabilities
Scans package-lock.json/yarn.lock against the NVD and the GitHub Advisory Database. Reports CVE ID, severity (CVSS), affected version range, and fixed version.
Secret Detection
Regex + entropy analysis for API keys, passwords, tokens in code and config files. Checks .env files, CI configs, and committed history.
Compliance Mapping
Maps findings to SOC 2 / GDPR / HIPAA requirements. Generates audit-ready reports with evidence links.
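Entropy analysis scores how random a string looks: Shannon entropy H = -Σ p(c) log2 p(c) over character frequencies, in bits per character. Random API keys score high, while repeated text scores low. A minimal sketch; the 20-character token pattern, the 4.0 bits/char cutoff, and the sample key are assumptions for illustration:

```typescript
// Shannon entropy in bits per character: H = -Σ p(c) * log2 p(c).
function shannonEntropy(s: string): number {
  const chars = [...s]; // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const c of counts.values()) {
    const p = c / chars.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Hypothetical candidate pattern: 20+ characters from a base64/hex-like alphabet.
const TOKEN = /[A-Za-z0-9+\/_=-]{20,}/g;

function findSecretCandidates(text: string, minEntropy = 4.0): string[] {
  return (text.match(TOKEN) ?? []).filter((t) => shannonEntropy(t) >= minEntropy);
}

console.log(shannonEntropy("aaaa")); // 0
console.log(shannonEntropy("ab"));   // 1
console.log(findSecretCandidates("const apiKey = 'q8Zr1XvB7mKp2LwN9sT4yHc6';"));
// [ 'q8Zr1XvB7mKp2LwN9sT4yHc6' ]  (24 distinct characters, about 4.58 bits/char)
```

Entropy alone produces false positives (long camelCase identifiers can exceed 4 bits/char), which is why scanners pair it with regexes for known key formats, as the description above does.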
Sample audit report
Security Audit Report — 2026-03-26
CRITICAL (1)
SQL Injection in src/routes/search.ts:42
Pattern: db.query(`SELECT * FROM users WHERE name = '${input}'`)
Fix: Use parameterized query: db.query('SELECT * FROM users WHERE name = $1', [input])
HIGH (2)
1. XSS via dangerouslySetInnerHTML in Comment.tsx:18
2. CVE-2025-1234 in lodash@4.17.20 (prototype pollution)
MEDIUM (4)
Missing rate limiting on /api/login, CORS wildcard, 2 missing CSP headers
Total: 1 Critical, 2 High, 4 Medium, 8 Low | Trend: -3 vs last audit
DevOps Agent — Infrastructure as Code
Generates Terraform/Kubernetes manifests, validates them with a dry run, monitors deployments, and automatically rolls back on failure.
Deployment Pipeline
1. Describe: natural language → IaC
2. Generate: Terraform + K8s YAML
3. Validate: terraform plan (dry run)
4. Deploy: apply + monitor rollout
5. Observe: health checks + alerts
Agent Flow
Capabilities
Multi-Cloud Support
Generates Terraform for AWS, GCP, Azure. Reads existing provider configs and follows team conventions.
Cost Estimation
Estimates monthly cost of proposed infrastructure changes before deployment. Compares with current spend.
Auto-Rollback
Monitors deployment health metrics. If the error rate exceeds a threshold, it automatically rolls back to the last known good state.
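The auto-rollback rule can be sketched as a windowed error-rate check. The 5% threshold, 5-sample window, and 100-request minimum are assumptions, since the source gives no values; the minimum keeps a handful of failed requests from triggering a rollback.

```typescript
interface Sample {
  requests: number;
  errors: number;
}

// Roll back when the error rate over the last `windowSize` samples exceeds
// `threshold`, but only once enough traffic has been observed to judge.
function shouldRollback(
  samples: Sample[],
  threshold = 0.05,
  windowSize = 5,
  minRequests = 100,
): boolean {
  const recent = samples.slice(-windowSize);
  const requests = recent.reduce((n, s) => n + s.requests, 0);
  const errors = recent.reduce((n, s) => n + s.errors, 0);
  if (requests < minRequests) return false; // too little traffic to judge
  return errors / requests > threshold;
}

const healthy: Sample[] = Array.from({ length: 5 }, () => ({ requests: 200, errors: 2 }));
const degraded: Sample[] = [...healthy, { requests: 200, errors: 90 }, { requests: 200, errors: 120 }];

console.log(shouldRollback(healthy));  // false (1% errors)
console.log(shouldRollback(degraded)); // true (216 of the last 1000 requests failed)
```

When the check returns true, a controller could revert the Kubernetes side with `kubectl rollout undo deployment/<name>`; that wiring is not shown.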
Generated Terraform output
terraform plan output:
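+ aws_elasticache_replication_group.redis
# a 3-node Redis cluster needs a replication group; aws_elasticache_cluster allows num_cache_nodes = 1 for Redis
engine: "redis"
node_type: "cache.t3.medium"
num_cache_clusters: 3
subnet_group_name: "app-private"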
+ aws_security_group.redis_sg
ingress: port 6379 from app-sg only
+ kubernetes_stateful_set.redis
replicas: 3, persistent_volume: 10Gi
Plan: 3 to add, 0 to change, 0 to destroy
Estimated cost: $38.40/month
API Design Agent — Contract-First Development
Generates OpenAPI specs from your code, validates backwards compatibility, creates SDK clients, and runs contract tests.
Design Flow
Generated OpenAPI spec
openapi: 3.1.0
# info and components sections omitted from this excerpt
paths:
  /api/agents:
    get:
      summary: List marketplace agents
      parameters:
        - name: category
          in: query
          schema: { type: string, enum: [dev, ops, qa] }
      responses:
        '200':
          description: Agent list
          content:
            application/json:
              schema: { $ref: '#/components/schemas/AgentList' }
# Auto-generated from Fastify route + Zod schemas
Capabilities
Breaking Change Detection
Compares current spec with previous version. Flags: removed endpoints, changed response shapes, narrowed parameter types, required field additions.
SDK Generation
Generates typed clients (TypeScript, Python, Go) from OpenAPI spec. Includes request/response types, error handling, and auth configuration.
Contract Testing
Creates test suites that validate API responses match the spec. Catches schema drift before it reaches production.
Documentation
Generates interactive API docs (Swagger UI / Redoc). Includes live examples, auth flows, and error catalogs.
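Breaking change detection comes down to diffing the previous and current spec and flagging anything an existing client could depend on. A hypothetical sketch over a flattened index of operations; real OpenAPI diffing walks full schemas, and narrowed parameter types are not modeled here. Additions are ignored because they are backwards compatible:

```typescript
// Hypothetical flattened view of a spec: "METHOD /path" -> the parameters
// clients must send and the fields responses contain.
interface Operation {
  requiredParams: string[];
  responseFields: string[];
}
type SpecIndex = Record<string, Operation>;

// Flags changes that can break existing clients: removed operations,
// removed response fields, and newly required parameters.
function breakingChanges(prev: SpecIndex, next: SpecIndex): string[] {
  const issues: string[] = [];
  for (const [op, old] of Object.entries(prev)) {
    const cur = next[op];
    if (!cur) {
      issues.push(`${op}: endpoint removed`);
      continue;
    }
    for (const field of old.responseFields) {
      if (!cur.responseFields.includes(field)) issues.push(`${op}: response field "${field}" removed`);
    }
    for (const param of cur.requiredParams) {
      if (!old.requiredParams.includes(param)) issues.push(`${op}: new required parameter "${param}"`);
    }
  }
  return issues;
}

// Illustrative versions, not the marketplace's real spec.
const v1: SpecIndex = {
  "GET /api/agents": { requiredParams: [], responseFields: ["id", "name", "owner"] },
  "GET /api/agents/:id": { requiredParams: [], responseFields: ["id", "name"] },
};
const v2: SpecIndex = {
  "GET /api/agents": { requiredParams: ["category"], responseFields: ["id", "name", "creator_id"] },
  "POST /api/agents/search": { requiredParams: ["query"], responseFields: ["id", "name"] },
};

const issues = breakingChanges(v1, v2);
issues.forEach((issue) => console.log(issue));
console.log(issues.length > 0 ? "bump major version" : "no major bump needed");
```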
Breaking change report
API Compatibility Report (v2.2 → v2.3)
BREAKING (1):
DELETE /api/agents/find — endpoint removed (use /api/agents/search)
DEPRECATED (2):
GET /api/agents?sort=rating → use ?order_by=rating
Response field "agent.owner" → renamed to "agent.creator_id"
ADDITIONS (3):
POST /api/agents/search — new full-text search endpoint
GET /api/agents/:id/versions — version history
Response field "agent.trust_level" added
Recommendation: Bump major version (v3.0) due to breaking change
Onboarding Agent — New Developer Guide
Walks new team members through the codebase, explains architecture, helps with environment setup, and guides their first contribution.
Onboarding Flow
Onboarding Phases
Phase 1: Architecture Overview
Explains: project structure, key directories, tech stack, database schema, deployment model. Uses diagrams from knowledge/ and real code examples.
Phase 2: Environment Setup
Runs setup scripts in NemoClawLite sandbox. Installs dependencies, starts services, seeds test data. Troubleshoots common issues (port conflicts, missing env vars).
Phase 3: Codebase Walkthrough
Interactive tour of key code paths. Explains: how a request flows from frontend → API → database. Shows: authentication, error handling, state management.
Phase 4: First Contribution
Picks a good-first-issue. Guides: create branch, make changes, write tests, open PR. Reviews code before submission. Explains team review process.
Full onboarding session transcript
Agent: Welcome to the team! I'm your onboarding buddy. Let me walk you through our project.
Agent: We're a monorepo with 3 main packages:
• apps/frontend/ — Next.js 16, Clean Architecture
• apps/orchestrator/ — Fastify 5, Plugin Architecture
• packages/shared/ — Zod schemas shared between both
New Dev: How does auth work?
Agent: Great question! Auth flows through Supabase. Let me show you:
1. User submits login form → app/login/page.tsx
2. Calls supabase.auth.signInWithPassword()
3. Middleware in middleware.ts checks JWT on protected routes
4. Server components use createServerClient() for user context
Agent: I've set up your local environment. All services are running. Here's a good first task: adding a tooltip to the sidebar icons. Want me to guide you through it?