A Cisco-branded AI Defense reference architecture
Built for Cisco SE customer demos. Mirrors a real customer's stack: NetBox + Nemotron-driven chatbot, with Cisco AI Defense applied at three points.
This demo is a deployable reference architecture for customers building AI-assisted operations tooling over Cisco / network infrastructure. Every piece is something the customer can stand up themselves the same day, using accounts and entitlements they already have. The implementation total is roughly 600 lines of Python plus a small SVG-driven UI.
| Live URL | https://aidefense-demo.uppernyack.com |
|---|---|
| LLM | nvidia/llama-3.3-nemotron-super-49b-v1 |
| Defense | Cisco AI Defense Inspection API (US region) |
| Data source | NetBox 4.4 with 58 Cisco devices across 6 sites |
| Compute | OCI Always-Free Ampere A1.Flex ARM (2 OCPU / 12 GB) |
| Monthly cost | $0 — every component uses free-tier accounts |
Every turn fires AI Defense three times
Input · Tool-args · Output — three different inspection contexts, one API, one policy.
A traditional content-filter sits in front of the LLM and stops bad prompts. That covers about a third of the OWASP LLM Top-10. The other two-thirds — destructive tool calls, sensitive-data egress, excessive agency — are caught after the model has already decided to act. Three-point gating moves inspection to the boundary where each risk class actually lives.
| Gate | What it sees | Catches | Latency |
|---|---|---|---|
| Input | Raw user prompt before LLM receives it | Prompt injection, PII attempts, jailbreak | 140-450 ms |
| Tool args | JSON arguments of every LLM-proposed tool call | Destructive verbs, PII in args, injection in args | 130-300 ms |
| Output | Final assistant message before display | Sensitive-info leakage (PII / credentials / secrets) from data sources | 250-450 ms |
Outbound text is also submitted as role: "user" so the
full 13-rule input policy fires on outbound data. Defense-in-depth.
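The three gates can be pictured as one loop around the model. The sketch below is a minimal illustration, not the demo's orchestrator: `run_turn`, `inspect`, `call_llm`, and `run_tool` are hypothetical names, and feeding a "blocked" note back to the model when a tool call fails inspection is an assumption about how a tool-arg block might be handled. The 5-hop cap comes from the orchestrator table.

```python
MAX_TOOL_HOPS = 5  # the demo caps its tool loop at 5 hops


def run_turn(prompt, inspect, call_llm, run_tool):
    """Run one chat turn with inspection at input, tool-args, and output.

    All three callables are hypothetical stand-ins:
      inspect(where, text) -> bool   (True means the text passed policy)
      call_llm(messages)   -> dict with optional "tool_calls" / "content"
      run_tool(name, arguments_json) -> str
    """
    # Gate 1: the raw user prompt, before the LLM sees it.
    if not inspect("input", prompt):
        return {"reason": "input_blocked", "content": None}

    messages = [{"role": "user", "content": prompt}]
    for _ in range(MAX_TOOL_HOPS):
        reply = call_llm(messages)
        tool_calls = reply.get("tool_calls") or []

        if not tool_calls:
            # Gate 3: the final assistant message, before display.
            content = reply.get("content") or ""
            if not inspect("output", content):
                return {"reason": "output_blocked", "content": None}
            return {"reason": "ok", "content": content}

        messages.append(
            {"role": "assistant", "content": None, "tool_calls": tool_calls}
        )
        for call in tool_calls:
            args = call["function"]["arguments"]
            # Gate 2: the JSON arguments of every proposed tool call.
            if inspect("tool_args", args):
                result = run_tool(call["function"]["name"], args)
            else:
                # Assumption: a blocked call is reported back to the model.
                result = "Blocked by policy: tool call was not executed."
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )

    # Hop budget exhausted without a final answer.
    return {"reason": "error", "content": None}
```

Passing the inspector in as a callable keeps the loop testable without network access; in a real deployment it would wrap a call to the Inspection API.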
The Inspection API
Cisco AI Defense Inspection — runtime policy enforcement for AI applications.
| Endpoint | https://us.api.inspect.aidefense.security.cisco.com/api/v1/inspect/chat |
|---|---|
| Auth header | X-Cisco-AI-Defense-API-Key: <inspection-key> |
| Connection name | epoch-test (in Cisco Security Cloud Control → AI Defense → Connections) |
| Policy | Runtime policy attached to the epoch-test connection — 13 input rules enabled |
| Dashboard | https://us.aidefense.security.cisco.com/ — gated behind Cisco SSO |
| Regions | US (this demo) · EU · AP · UAE — same payload schema across regions |
The 13 input rules
Every prompt or tool-call gets scanned by 13 classifier rules in parallel. Each rule returns
NONE_VIOLATION or a specific classification. Multi-violation responses are common —
e.g. a prompt-injection that also contains harmful content fires both Prompt Injection (SECURITY_VIOLATION)
and General Harms (SAFETY_VIOLATION).
| Rule | Classification | Entity types example |
|---|---|---|
| Prompt Injection | SECURITY_VIOLATION | — |
| Malicious URL Detection | SECURITY_VIOLATION | — |
| PII | PRIVACY_VIOLATION | Email Address, Phone Number, SSN (US), Passport (FR/DE/US/JP), IP Address, … |
| PHI | PRIVACY_VIOLATION | NHS Number (UK), Medical License Number (US) |
| PCI | PRIVACY_VIOLATION | Credit Card Number, IBAN, ABA Routing, Bank Account, ITIN |
| Toxicity | SAFETY_VIOLATION | — |
| Hate Speech | SAFETY_VIOLATION | — |
| Profanity | SAFETY_VIOLATION | — |
| Sexual Content & Exploitation | SAFETY_VIOLATION | — |
| Harassment | SAFETY_VIOLATION | — |
| Social Division & Polarization | SAFETY_VIOLATION | — |
| Violence & Public Safety Threats | SAFETY_VIOLATION | — |
| General Harms | SAFETY_VIOLATION | — |
Request shape
# Inspection request: same payload at every gate
{
  "messages": [{ "role": "user", "content": "<text being inspected>" }],
  "model": "aidefense-demo",            # label only, not a real model
  "config": { "enabled_rules": [] },    # empty = use policy default
  "metadata": {}
}
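As a rough illustration of the request shape above, the following sketch builds (but does not send) the HTTP request using only the standard library. The endpoint, header name, and payload fields are taken from this page; the function name `build_inspection_request` and the `Content-Type` header are assumptions.

```python
import json
import urllib.request

INSPECT_URL = (
    "https://us.api.inspect.aidefense.security.cisco.com/api/v1/inspect/chat"
)


def build_inspection_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for one inspection call (hypothetical helper)."""
    payload = {
        "messages": [{"role": "user", "content": text}],
        "model": "aidefense-demo",         # label only, not a real model
        "config": {"enabled_rules": []},   # empty = use policy default
        "metadata": {},
    }
    return urllib.request.Request(
        INSPECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",  # assumed; JSON body
            "X-Cisco-AI-Defense-API-Key": api_key,
        },
        method="POST",
    )
```

Sending it would be a matter of `urllib.request.urlopen(request, timeout=...)` and decoding the JSON body; that step is left out here so the sketch runs without credentials.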
Response shape (a block)
{
"is_safe": false,
"action": "Block",
"severity": "NONE_SEVERITY",
"classifications": ["SECURITY_VIOLATION", "SAFETY_VIOLATION"],
"rules": [
{ "rule_name": "Prompt Injection", "classification": "SECURITY_VIOLATION", "entity_types": [] },
{ "rule_name": "General Harms", "classification": "SAFETY_VIOLATION", "entity_types": [] }
],
"attack_technique": "NONE_ATTACK_TECHNIQUE",
"event_id": "<uuid>",
"processed_rules": [ /* all 13 with status — mostly NONE_VIOLATION */ ]
}
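One way to reduce a response like the one above to a gate decision is sketched below. The field names come from the sample response; the `Verdict` type, the `parse_verdict` name, and the fail-closed behavior (treating a response without `is_safe: true` as blocked) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    blocked: bool
    classifications: list = field(default_factory=list)
    rule_names: list = field(default_factory=list)
    entity_types: list = field(default_factory=list)
    event_id: str = ""


def parse_verdict(response: dict) -> Verdict:
    """Summarize an inspection response (hypothetical helper)."""
    rules = response.get("rules") or []
    entities = []
    for rule in rules:
        entities.extend(rule.get("entity_types") or [])
    # Fail closed: anything other than an explicit is_safe=True is blocked.
    blocked = (
        response.get("is_safe") is not True
        or response.get("action") == "Block"
    )
    return Verdict(
        blocked=blocked,
        classifications=list(response.get("classifications") or []),
        rule_names=[rule.get("rule_name", "") for rule in rules],
        entity_types=entities,
        event_id=response.get("event_id", ""),
    )
```

Collecting every fired rule, rather than only the first, matches the multi-violation case described earlier, where one prompt trips both a security and a safety rule.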
The reasoning engine
NVIDIA NIM cloud — OpenAI-compatible inference for the Nemotron family.
| Endpoint | https://integrate.api.nvidia.com/v1/chat/completions |
|---|---|
| Auth | Authorization: Bearer nvapi-<...> |
| Schema | OpenAI Chat Completions (drop-in compatible with OpenAI SDK) |
| Model | nvidia/llama-3.3-nemotron-super-49b-v1 |
| Catalog size | 118 models accessible — Nemotron, Llama, Gemma, Mistral, Phi, Granite, Qwen, DeepSeek, OpenAI gpt-oss, etc. |
| Rate limit | 40 RPM (free tier) — no daily cap |
| Cost | $0 — free tier; credit caps removed in 2026 |
| Tool calling | Standard OpenAI tools + tool_calls schema |
Why this specific Nemotron variant
Llama-3.3-Nemotron-Super-49B-v1 is a Llama-3.3 base fine-tuned by NVIDIA for instruction following and tool use. Three reasons we picked it over the 70B-Instruct variant:
- Free-tier accessible: 70B-Instruct returns HTTP 404 from this account; Super-49B-v1 returns clean 200s.
- Faster: 1-5 s response time vs. v1.5's 20-25 s on cold paths.
- Tool calling clean: emits proper tool_calls with no chain-of-thought leakage when prepended with detailed thinking off as the first system message.
Tool-call request shape
{
"model": "nvidia/llama-3.3-nemotron-super-49b-v1",
"messages": [
{ "role": "system", "content": "detailed thinking off\n\nYou are NetOps Assistant…" },
{ "role": "user", "content": "List all firewalls in our fleet" }
],
"tools": [ /* netbox_search, netbox_list_devices(role|model_contains), ...7 tools total */ ],
"tool_choice": "auto",
"temperature": 0.2,
"max_tokens": 1024
}
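The request above can be assembled from plain dictionaries. In this sketch the model name, sampling settings, and the "detailed thinking off" prefix come from this page, and the role values come from the NetBox seed; the JSON Schema for `netbox_list_devices` (a `site`, `role`, and `model_contains` string each) and the helper name `build_chat_request` are assumptions. The system prompt is passed in by the caller because its full text is not shown here.

```python
MODEL = "nvidia/llama-3.3-nemotron-super-49b-v1"

# Device roles listed in the NetBox seed.
ROLES = ["core", "spine", "leaf", "distribution", "access",
         "edge", "wireless", "firewall", "server"]

# Assumed parameter schema for one of the seven tools.
NETBOX_LIST_DEVICES_TOOL = {
    "type": "function",
    "function": {
        "name": "netbox_list_devices",
        "description": "List devices filtered by site, role, or model.",
        "parameters": {
            "type": "object",
            "properties": {
                "site": {"type": "string"},
                "role": {"type": "string", "enum": ROLES},
                "model_contains": {"type": "string"},
            },
            "additionalProperties": False,
        },
    },
}


def build_chat_request(user_text: str, system_prompt: str, tools: list) -> dict:
    """Build an OpenAI-style chat completion payload (hypothetical helper)."""
    return {
        "model": MODEL,
        "messages": [
            # The thinking toggle goes first in the system message.
            {"role": "system",
             "content": f"detailed thinking off\n\n{system_prompt}"},
            {"role": "user", "content": user_text},
        ],
        "tools": tools,
        "tool_choice": "auto",
        "temperature": 0.2,
        "max_tokens": 1024,
    }
```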
Nemotron sometimes declines silently (no refusal field, just
finish_reason: stop, empty content, no tool calls, ~11 completion tokens). The orchestrator
detects this signature and surfaces a "Model declined" verdict. AI Defense and LLM safety training cover
different risk classes; together they catch what neither could alone.
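A check for that signature, combined with the single transparent retry the orchestrator performs, might look like the sketch below. It assumes the completion has been converted to a plain dict in the OpenAI `choices[0]` shape; `is_silent_refusal`, `complete_with_retry`, and the `call_llm` callable are hypothetical names.

```python
def is_silent_refusal(choice: dict) -> bool:
    """True when a completion stops normally but carries nothing usable."""
    message = choice.get("message") or {}
    return (
        choice.get("finish_reason") == "stop"
        and not (message.get("content") or "").strip()
        and not message.get("tool_calls")
    )


def complete_with_retry(call_llm, messages, retries=1):
    """Call the model, retrying when the reply looks like a silent refusal.

    call_llm(messages) is a hypothetical callable returning one choice dict.
    Returns (choice, declined), where declined is True if the final attempt
    still matched the silent-refusal signature.
    """
    choice = call_llm(messages)
    for _ in range(retries):
        if not is_silent_refusal(choice):
            break
        choice = call_llm(messages)
    return choice, is_silent_refusal(choice)
```

When `declined` comes back true, the caller can surface the "Model declined" verdict instead of showing an empty answer.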
The compute layer
Oracle Cloud Infrastructure — Always-Free Ampere ARM running the whole stack.
| Tenancy | salient-concepts (us-ashburn-1) |
|---|---|
| VM | cisco-web-1-arm |
| Shape | VM.Standard.A1.Flex — Ampere Altra ARM64 |
| Resources | 2 OCPU · 12 GB RAM · 48 GB boot disk |
| OS | Ubuntu 24.04.4 LTS · kernel 6.17.0-1014-oracle |
| Public IP | 129.80.113.130 (reserved — persists across stop/start) |
| VCN | salient-vcn 10.0.0.0/16 · subnet salient-public-subnet 10.0.1.0/24 |
| Security list ingress | 22 / 80 / 443 from 0.0.0.0/0 |
| Host firewall | iptables — 22 / 80 / 443 ACCEPT, persisted via netfilter-persistent |
| Monthly cost | $0 — Always-Free, no credit card billed |
Docker Compose stack — 9 containers
| Container | Image | Purpose |
|---|---|---|
| caddy | caddy:2.10-alpine | Reverse proxy + auto-TLS |
| orchestrator | local build | FastAPI + SSE + 3-point gate loop |
| ai-defense-mcp | local build | Wrapper over Cisco AI Defense Inspection API |
| netbox-mcp | local build | 7-tool wrapper over NetBox REST |
| netbox | netboxcommunity/netbox:v4.4-3.4.0 | IPAM/DCIM web + API |
| netbox-worker | netboxcommunity/netbox:v4.4-3.4.0 | RQ background worker |
| postgres | postgres:16-alpine | NetBox primary DB |
| redis-queue | redis:7-alpine | NetBox job queue |
| redis-cache | redis:7-alpine | NetBox app cache |
Free-tier budget compliance
This tenancy currently runs three VMs on Always-Free: salient-web-1 (AMD Micro, serves
salient-concepts.com), cisco-web-1 (AMD Micro, serves the CiscoPulse splash),
and cisco-web-1-arm (this demo). That puts it at the cap for AMD Micro instances (2 of 2), at 2 OCPU / 12 GB of the
4 OCPU / 24 GB Ampere A1 budget, and at 2 of 2 reserved IPs.
The name resolution layer
Cloudflare Free plan — DNS-only mode (gray cloud).
| Zone | uppernyack.com |
|---|---|
| Account | Personal — separate from the Salient-Concepts.com Cloudflare account |
| Nameservers | Cloudflare authoritative |
| Record | A aidefense-demo → 129.80.113.130 |
| Proxy status | DNS-only (gray cloud) — Cloudflare resolves the name but does NOT proxy traffic |
| TTL | Auto (Cloudflare default) |
| Cost | $0 — Free plan, unlimited DNS queries |
DNS-only is deliberate: Caddy obtains its certificate through the tls-alpn-01 ACME
challenge, which requires the public-facing IP to terminate TLS directly (port 443). If Cloudflare proxied,
traffic would terminate at Cloudflare's edge first, breaking the challenge. To go proxied (orange cloud)
we'd need to switch Caddy to dns-01 with a Cloudflare API token, which is possible but not necessary
for a demo.
The customer's source of truth
NetBox 4.4 — IPAM/DCIM seeded with a realistic Cisco enterprise fleet.
NetBox is the de facto standard IPAM/DCIM among enterprise NetOps teams. Customers building AI assistants for network operations almost always have it. This seed mirrors a mid-size enterprise: two production data centers with full Nexus 9000 fabric, four branch sites, a corporate HQ, and a security/compute layer.
| Total devices | 58 |
|---|---|
| Sites | 6 (DC-1 ATL, DC-2 RTP, Branch-NYC, Branch-SJC, Branch-SFO, Branch-AMS) |
| Device types | 17 (full Cisco breadth — see below) |
| Device roles | 9 (core, spine, leaf, distribution, access, edge, wireless, firewall, server) |
| IP prefixes | 16 (production, OOB, VXLAN underlay, fabric loopbacks, branch VLANs) |
| WAN circuits | 11 (Lumen, AT&T, Verizon, Equinix Fabric — internet + MPLS + DCI) |
| Tenants | 3 (Salesforce-Eng, Workday-Prod, Internal-Corp) |
| Contacts | 13 (NOC desks, site leads, on-call rotations — with real-looking emails + phones) |
Cisco device types in the seed
| Category | Models |
|---|---|
| DC fabric (Nexus 9000) | 9332D-GX2B (400G spine), 9336C-FX2 (100G leaf), 93180YC-FX3 (10/25G leaf), 9504 (modular), 9508 (modular) |
| Campus core/dist | Catalyst 9500-32C, Catalyst 9410R (modular dist) |
| Access switches | Catalyst 9300-48UXM, Catalyst 9300X-48HX, Catalyst 9200L-48P-4G (smart switch SMB line) |
| Wireless | Catalyst 9800-CL (controller), Meraki MR46, Meraki MR56 (Wi-Fi 6) |
| WAN / SD-WAN | ASR 1001-X, Catalyst 8500L |
| Security | Cisco Secure Firewall 3110 (NGFW — 7 deployed across sites) |
| Compute | UCS-X210C-M7 (X-Series blade — 3 deployed) |
Intentional PII / credential fodder
For the output-gate demo to fire, NetBox needs data Cisco AI Defense will flag. The seed embeds two classes of sensitive content:
- Contacts with real-looking PII: names, emails (name@example-corp.com), phones in E.164 format. 13 contacts attached to sites as NOC / site-lead / on-call assignments.
- SNMP community strings in device comments: every device has a unique pseudo-credential like atl-c0re-r0! or nyc-fw-corp!. When the LLM tries to dump them, the output gate catches the dense credential pattern.
netbox-mcp — 7 tools
| Tool | Purpose |
|---|---|
| netbox_search | Global text search with 3-phase fallback (text → role keyword → model keyword) |
| netbox_list_devices | Filtered device list by site, role enum, or model_contains (two-step device-type lookup) |
| netbox_get_device | Full detail for one device, including the comments field (PII / credential fodder) |
| netbox_list_prefixes | IP prefixes filtered by site / tenant |
| netbox_list_circuits | WAN circuits filtered by provider / status |
| netbox_get_site_contacts | N+1 lookup returning full contact records with email + phone |
| netbox_delete_prefix | Destructive (exists only to demonstrate tool-arg gate firing on a delete verb) |
The only custom code in the stack
FastAPI + SSE chat loop · two MCP wrappers · ~600 lines of Python total.
| Orchestrator | FastAPI 0.115 · uvicorn · sse-starlette · openai-python SDK |
|---|---|
| Stream protocol | SSE (Server-Sent Events): one connection, 13+ events per turn |
| UI | Single HTML page · Tailwind via CDN · vanilla JS · SVG flow diagram · ~400 lines |
| State | Stateless per turn — every submission resets the diagram |
| Retry guard | One transparent retry on empty Nemotron completions (silent-refusal pattern detection) |
| Tool loop depth | Capped at 5 hops |
SSE event types emitted per turn
turn_start          // session id + model
gate_start          // {where, content[:200]}
gate_result         // {action, severity, attack_technique, violations[], latency_ms}
llm_call_start      // {hop, model}
tool_call_proposed  // {hop, idx, name, arguments}
tool_executing      // {hop, idx, name}
tool_result         // {hop, idx, name, result}
assistant_message   // {content}
blocked             // {where, severity, attack_technique, violations[]}
model_declined      // {finish_reason, explanation}; defense-in-depth signal
turn_end            // {reason: ok | input_blocked | output_blocked | model_declined | error}
error
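For readers unfamiliar with SSE framing, the sketch below shows how events like these look on the wire. The demo uses sse-starlette, which handles framing itself; the `sse_event` and `gate_start_event` helpers here are hypothetical and only illustrate the format. The 200-character truncation mirrors the `content[:200]` note in the event list.

```python
import json


def sse_event(event: str, data: dict) -> str:
    """Frame one Server-Sent Event: an event name, one data line, blank line."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    body = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {body}\n\n"


def gate_start_event(where: str, content: str) -> str:
    """Announce a gate, previewing at most 200 characters of the content."""
    return sse_event("gate_start", {"where": where, "content": content[:200]})
```

On the browser side, `EventSource` dispatches each frame to the listener registered for its event name, which is what lets one connection drive the whole flow diagram.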
Production hygiene from day one
OpenBao for secrets · Let's Encrypt for TLS · no credentials in git or container images.
| Secret store | OpenBao at vault.uppernyack.com (on-prem) |
|---|---|
| TLS issuer | Let's Encrypt E8 (ECDSA chain) |
| ACME challenge | tls-alpn-01 — Caddy negotiates over port 443 directly |
| Cert renewal | Auto-renewed by Caddy 30 days before expiry |
| HSTS | Enabled (max-age=31536000; includeSubDomains) |
| Session cookie | (none yet — to be added with auth gate) |
Secrets in OpenBao
| Path | Contents | Pulled by |
|---|---|---|
| infra/api/nvidia-build-netbox-demo | NIM API key | deploy.sh at deploy time |
| infra/api/cisco-ai-defense | AI Defense Inspection API key + base URL | deploy.sh |
| infra/ssh/cisco-web-1-arm | SSH key for the OCI VM | deploy.sh for rsync |
| infra/api/netbox-demo | Generated NetBox secrets (SECRET_KEY, superuser password, API token) | deploy.sh (auto-generated on first run) |
| infra/db/netbox-demo-pg | Postgres password | deploy.sh |
| infra/db/netbox-demo-redis-{queue,cache} | Redis passwords | deploy.sh |
Secrets live ONLY in OpenBao + the OCI VM's compose/.env file (chmod 600). They never
appear in git, in container images, in CLAUDE memory files, or in this About page's source.
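The demo's deploy.sh is a shell script and is not reproduced here. As a rough Python equivalent of its last step, the sketch below writes already-fetched secrets to an env file readable only by its owner. The function name `write_env_file`, the key names in the usage note, and the choice to reject multi-line values are assumptions; fetching the values from OpenBao is out of scope.

```python
import os


def write_env_file(path: str, secrets: dict) -> None:
    """Write KEY=value lines to path with owner-only (0600) permissions."""
    for key, value in secrets.items():
        if "\n" in key or "\n" in str(value):
            raise ValueError(f"newline not allowed in env entry {key!r}")
    lines = "".join(f"{key}={value}\n" for key, value in sorted(secrets.items()))

    # Create the file with mode 0600 so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(lines)
    # Tighten permissions if the file already existed with a looser mode.
    os.chmod(path, 0o600)
```

A call such as `write_env_file("compose/.env", {"NVIDIA_API_KEY": "...", "AI_DEFENSE_API_KEY": "..."})` would produce the chmod 600 file described above; those variable names are placeholders, not the demo's actual ones.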
Cost breakdown
Every component is free-tier or self-hosted. Total monthly cost: $0.
| Component | Provider | Tier | Cost |
|---|---|---|---|
| OCI Ampere A1.Flex VM (2 OCPU / 12 GB) | Oracle Cloud | Always-Free | $0 |
| Reserved public IP | Oracle Cloud | Always-Free (2/2) | $0 |
| 200 GB block + 10 TB egress/mo | Oracle Cloud | Always-Free | $0 |
| Cloudflare DNS for uppernyack.com | Cloudflare | Free plan | $0 |
| Let's Encrypt TLS certs | Let's Encrypt / ISRG | Public CA | $0 |
| NVIDIA NIM (Nemotron Super 49B) | NVIDIA Build | Free tier · 40 RPM · no daily cap | $0 |
| Cisco AI Defense Inspection API | Cisco | SE entitlement | $0 |
| NetBox | NetBox Community | FOSS Apache 2.0 | $0 |
| Postgres / Redis / Caddy / Docker | OSS | FOSS | $0 |
| GitPi private repo hosting | self-hosted on Pi 4 | — | $0 |
| OpenBao secret storage | self-hosted on atheneum | — | $0 |
| Monthly run-rate | | | $0 |