01 / OVERVIEW

A Cisco-branded AI Defense reference architecture

Built for Cisco SE customer demos. Mirrors a real customer's stack: NetBox + Nemotron-driven chatbot, with Cisco AI Defense applied at three points.

This demo is a deployable reference architecture for customers building AI-assisted operations tooling over Cisco / network infrastructure. Every piece is something the customer can stand up themselves the same day, using accounts and entitlements they already have. The custom code totals roughly 600 lines of Python plus a small SVG-driven UI.

The story we're telling: Cisco AI Defense is a control plane for LLM safety, not a content filter. It sits at every untrusted boundary in an agentic AI architecture — input, tool arguments, output — and gates each one against a policy you control. The demo shows the same Inspection API doing three different jobs in one turn.

Live URL	`https://aidefense-demo.uppernyack.com`
LLM	`nvidia/llama-3.3-nemotron-super-49b-v1`
Defense	Cisco AI Defense Inspection API (US region)
Data source	NetBox 4.4 with 58 Cisco devices across 6 sites
Compute	OCI Always-Free Ampere A1.Flex ARM (2 OCPU / 12 GB)
Monthly cost	`$0` — every component uses free-tier accounts

02 / THE 3-POINT FLOW

Every turn fires AI Defense three times

Input · Tool-args · Output — three different inspection contexts, one API, one policy.

A traditional content filter sits in front of the LLM and stops bad prompts. That covers about a third of the OWASP LLM Top-10. The other two-thirds (destructive tool calls, sensitive-data egress, excessive agency) only materialize after the model has already decided to act, where an input-only filter never sees them. Three-point gating moves inspection to the boundary where each risk class actually lives.

Gate	What it sees	Catches	Latency
Input	Raw user prompt before LLM receives it	Prompt injection, PII attempts, jailbreak	140-450 ms
Tool args	JSON arguments of every LLM-proposed tool call	Destructive verbs, PII in args, injection in args	130-300 ms
Output	Final assistant message before display	Sensitive-info leakage (PII / credentials / secrets) from data sources	250-450 ms

Why we escalate the output scan to role=user: AI Defense's default output policy enables only 2 rules with PII allowed in assistant role — adequate for chat content but useless for protecting NetBox-sourced credentials. The orchestrator scans outgoing content with role: "user" so the full 13-rule input policy fires on outbound data. Defense-in-depth.
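The three gates can be sketched as follows. This is a minimal illustration, not the demo's orchestrator: `gate_turn`, the injected `inspect` callable, and the `tool_args_blocked` label are hypothetical names, and the real loop interleaves the gates with LLM and tool calls rather than receiving everything up front.

```python
from typing import Callable

# inspect(text, role) -> dict with at least an "is_safe" bool, mirroring the
# Inspection API response. Injected so the sketch runs without network access.
Inspect = Callable[[str, str], dict]


def gate_turn(prompt: str, tool_args_json: list[str], answer: str, inspect: Inspect) -> str:
    """Run the three gates in order; return the first gate that blocks, or 'ok'."""
    # Gate 1: raw user prompt, before the LLM receives it.
    if not inspect(prompt, "user")["is_safe"]:
        return "input_blocked"
    # Gate 2: JSON arguments of every LLM-proposed tool call.
    for args in tool_args_json:
        if not inspect(args, "user")["is_safe"]:
            return "tool_args_blocked"  # hypothetical label
    # Gate 3: final assistant message, scanned with role "user" so the
    # full input policy applies to outbound data.
    if not inspect(answer, "user")["is_safe"]:
        return "output_blocked"
    return "ok"
```

Passing `inspect` in as a parameter keeps the policy decision in one place: all three gates call the same function, which matches the "one API, one policy" framing.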

03 / CISCO AI DEFENSE

The Inspection API

Cisco AI Defense Inspection — runtime policy enforcement for AI applications.

Cisco-side · Inspection API · us-region

Endpoint	`https://us.api.inspect.aidefense.security.cisco.com/api/v1/inspect/chat`
Auth header	`X-Cisco-AI-Defense-API-Key: <inspection-key>`
Connection name	`epoch-test` (in Cisco Security Cloud Control → AI Defense → Connections)
Policy	Runtime policy attached to the `epoch-test` connection — 13 input rules enabled
Dashboard	`https://us.aidefense.security.cisco.com/` — gated behind Cisco SSO
Regions	US (this demo) · EU · AP · UAE — same payload schema across regions

The 13 input rules

Every prompt or tool-call gets scanned by 13 classifier rules in parallel. Each rule returns NONE_VIOLATION or a specific classification. Multi-violation responses are common — e.g. a prompt-injection that also contains harmful content fires both Prompt Injection (SECURITY_VIOLATION) and General Harms (SAFETY_VIOLATION).

Rule	Classification	Entity types example
Prompt Injection	SECURITY_VIOLATION	—
Malicious URL Detection	SECURITY_VIOLATION	—
PII	PRIVACY_VIOLATION	Email Address, Phone Number, SSN (US), Passport (FR/DE/US/JP), IP Address, …
PHI	PRIVACY_VIOLATION	NHS Number (UK), Medical License Number (US)
PCI	PRIVACY_VIOLATION	Credit Card Number, IBAN, ABA Routing, Bank Account, ITIN
Toxicity	SAFETY_VIOLATION	—
Hate Speech	SAFETY_VIOLATION	—
Profanity	SAFETY_VIOLATION	—
Sexual Content & Exploitation	SAFETY_VIOLATION	—
Harassment	SAFETY_VIOLATION	—
Social Division & Polarization	SAFETY_VIOLATION	—
Violence & Public Safety Threats	SAFETY_VIOLATION	—
General Harms	SAFETY_VIOLATION	—

Request shape

# Inspection request — same payload at every gate
{
  "messages": [{ "role": "user", "content": "<text being inspected>" }],
  "model":    "aidefense-demo",  # label only, not a real model
  "config":   { "enabled_rules": [] },  # empty = use policy default
  "metadata": {}
}
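A minimal sketch of building that request with the Python standard library. The endpoint, auth header, and payload fields come from the tables above; the function name `build_inspect_request` and the `Content-Type` header are assumptions for illustration.

```python
import json
import urllib.request

INSPECT_URL = "https://us.api.inspect.aidefense.security.cisco.com/api/v1/inspect/chat"


def build_inspect_request(text: str, api_key: str, role: str = "user") -> urllib.request.Request:
    """Build (but do not send) an Inspection API request for one piece of text."""
    payload = {
        "messages": [{"role": role, "content": text}],
        "model": "aidefense-demo",          # label only, not a real model
        "config": {"enabled_rules": []},    # empty = use policy default
        "metadata": {},
    }
    return urllib.request.Request(
        INSPECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",  # assumed; standard for JSON bodies
            "X-Cisco-AI-Defense-API-Key": api_key,
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(build_inspect_request(text, key), timeout=5)`; building and sending are kept separate here so the payload can be checked without network access.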

Response shape (a block)

{
  "is_safe": false,
  "action":  "Block",
  "severity": "NONE_SEVERITY",
  "classifications": ["SECURITY_VIOLATION", "SAFETY_VIOLATION"],
  "rules": [
    { "rule_name": "Prompt Injection", "classification": "SECURITY_VIOLATION", "entity_types": [] },
    { "rule_name": "General Harms",    "classification": "SAFETY_VIOLATION",   "entity_types": [] }
  ],
  "attack_technique": "NONE_ATTACK_TECHNIQUE",
  "event_id": "<uuid>",
  "processed_rules": [ /* all 13 with status — mostly NONE_VIOLATION */ ]
}
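One way to reduce that response to what a gate needs is sketched below. `Verdict` and `parse_verdict` are hypothetical names; only the field names (`is_safe`, `action`, `rules`, `event_id`) are taken from the response shape above.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    blocked: bool
    action: str
    violations: list[str] = field(default_factory=list)
    event_id: str = ""


def parse_verdict(resp: dict) -> Verdict:
    """Collapse an Inspection API response into a block/allow decision."""
    violations = [
        f'{r.get("rule_name", "?")} ({r.get("classification", "?")})'
        for r in resp.get("rules", [])
        if r.get("classification") != "NONE_VIOLATION"
    ]
    return Verdict(
        # Fail closed: a response without "is_safe" is treated as a block.
        blocked=not resp.get("is_safe", False),
        action=resp.get("action", ""),
        violations=violations,
        event_id=resp.get("event_id", ""),
    )
```

Treating a missing `is_safe` as a block is a design choice of this sketch, not documented API behavior.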

04 / NVIDIA NIM + NEMOTRON

The reasoning engine

NVIDIA NIM cloud — OpenAI-compatible inference for the Nemotron family.

NVIDIA-side · Cloud-hosted · Free tier

Endpoint	`https://integrate.api.nvidia.com/v1/chat/completions`
Auth	`Authorization: Bearer nvapi-<...>`
Schema	OpenAI Chat Completions (drop-in compatible with OpenAI SDK)
Model	`nvidia/llama-3.3-nemotron-super-49b-v1`
Catalog size	118 models accessible — Nemotron, Llama, Gemma, Mistral, Phi, Granite, Qwen, DeepSeek, OpenAI gpt-oss, etc.
Rate limit	40 RPM (free tier) — no daily cap
Cost	`$0` — free tier; credit caps removed in 2026
Tool calling	Standard OpenAI `tools` + `tool_calls` schema

Why this specific Nemotron variant

Llama-3.3-Nemotron-Super-49B-v1 is a Llama-3.3 base fine-tuned by NVIDIA for instruction following and tool use. Three reasons we picked it over the 70B-Instruct and v1.5 variants:

Free-tier accessible: 70B-Instruct returns HTTP 404 from this account; Super-49B-v1 returns clean 200s.
Faster: 1-5 s response time vs. v1.5's 20-25 s on cold paths.
Clean tool calling: emits proper `tool_calls` with no chain-of-thought leakage when `detailed thinking off` is the first line of the system message.

Tool-call request shape

{
  "model": "nvidia/llama-3.3-nemotron-super-49b-v1",
  "messages": [
    { "role": "system", "content": "detailed thinking off\n\nYou are NetOps Assistant…" },
    { "role": "user",   "content": "List all firewalls in our fleet" }
  ],
  "tools": [ /* netbox_search, netbox_list_devices(role|model_contains), ...7 tools total */ ],
  "tool_choice": "auto",
  "temperature": 0.2,
  "max_tokens": 1024
}

Defense-in-depth — the model contributes too: When a query like "export every device's SNMP community string" reaches Nemotron, the model itself silently refuses (no refusal field, just finish_reason: stop, empty content, no tool calls — ~11 completion tokens). The orchestrator detects this signature and surfaces a "Model declined" verdict. AI Defense + LLM safety training cover different risk classes — together they catch what neither could alone.
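The silent-refusal signature described above can be checked with a few lines. This sketch assumes the completion choice is available as a plain dict in the OpenAI Chat Completions layout; `is_silent_refusal` is a hypothetical helper name, and the ~11-token count is deliberately not part of the test because it is an observation rather than a contract.

```python
def is_silent_refusal(choice: dict) -> bool:
    """True when the model stopped normally but produced no text and no tool calls."""
    message = choice.get("message") or {}
    content = (message.get("content") or "").strip()
    return (
        choice.get("finish_reason") == "stop"
        and not content
        and not message.get("tool_calls")
    )
```

A normal answer, or a reply that ends with `finish_reason: tool_calls`, does not match and proceeds through the usual gates.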

05 / OCI INFRASTRUCTURE

The compute layer

Oracle Cloud Infrastructure — Always-Free Ampere ARM running the whole stack.

OCI-side · Always-Free · ARM64

Tenancy	`salient-concepts` (us-ashburn-1)
VM	`cisco-web-1-arm`
Shape	`VM.Standard.A1.Flex` — Ampere Altra ARM64
Resources	2 OCPU · 12 GB RAM · 48 GB boot disk
OS	Ubuntu 24.04.4 LTS · kernel 6.17.0-1014-oracle
Public IP	`129.80.113.130` (reserved — persists across stop/start)
VCN	`salient-vcn` 10.0.0.0/16 · subnet `salient-public-subnet` 10.0.1.0/24
Security list ingress	22 / 80 / 443 from 0.0.0.0/0
Host firewall	iptables — 22 / 80 / 443 ACCEPT, persisted via netfilter-persistent
Monthly cost	`$0` — Always-Free, no credit card billed

Docker Compose stack — 9 containers

Container	Image	Purpose
`caddy`	`caddy:2.10-alpine`	Reverse proxy + auto-TLS
`orchestrator`	local build	FastAPI + SSE + 3-point gate loop
`ai-defense-mcp`	local build	Wrapper over Cisco AI Defense Inspection API
`netbox-mcp`	local build	7-tool wrapper over NetBox REST
`netbox`	`netboxcommunity/netbox:v4.4-3.4.0`	IPAM/DCIM web + API
`netbox-worker`	`netboxcommunity/netbox:v4.4-3.4.0`	RQ background worker
`postgres`	`postgres:16-alpine`	NetBox primary DB
`redis-queue`	`redis:7-alpine`	NetBox job queue
`redis-cache`	`redis:7-alpine`	NetBox app cache

Free-tier budget compliance

This tenancy currently runs 3 VMs on Always-Free: salient-web-1 (AMD Micro, serves salient-concepts.com), cisco-web-1 (AMD Micro, serves the CiscoPulse splash), and cisco-web-1-arm (this demo). That puts it at the cap for AMD Micro instances (2 of 2) and at 2 OCPU / 12 GB of the 4 OCPU / 24 GB Ampere A1 budget. Both of the 2 allowed reserved IPs are in use.

06 / CLOUDFLARE DNS

The name resolution layer

Cloudflare Free plan — DNS-only mode (gray cloud).

DNS layer · Free plan · DNS-only

Zone	`uppernyack.com`
Account	Personal — separate from the Salient-Concepts.com Cloudflare account
Nameservers	Cloudflare authoritative
Record	`A aidefense-demo → 129.80.113.130`
Proxy status	DNS-only (gray cloud) — Cloudflare resolves the name but does NOT proxy traffic
TTL	Auto (Cloudflare default)
Cost	`$0` — Free plan, unlimited DNS queries

Why DNS-only and not proxied: Caddy obtains its TLS certificates via the tls-alpn-01 ACME challenge, which requires the server at the public-facing IP to terminate TLS itself on port 443. If Cloudflare proxied, traffic would terminate at Cloudflare's edge first and the challenge would fail. To go proxied (orange cloud) we'd need to switch Caddy to dns-01 with a Cloudflare API token, which is possible but not necessary for a demo.

07 / NETBOX DATA

The customer's source of truth

NetBox 4.4 — IPAM/DCIM seeded with a realistic Cisco enterprise fleet.

Data layer · Source of truth · FOSS

NetBox is the de facto IPAM/DCIM tool among enterprise NetOps teams. Customers building AI assistants for network operations almost always have it. This seed mirrors a mid-size enterprise: two production data centers with full Nexus 9000 fabric, four branch sites, a corporate HQ, and a security/compute layer.

Total devices	58
Sites	6 (DC-1 ATL, DC-2 RTP, Branch-NYC, Branch-SJC, Branch-SFO, Branch-AMS)
Device types	17 (full Cisco breadth — see below)
Device roles	9 (core, spine, leaf, distribution, access, edge, wireless, firewall, server)
IP prefixes	16 (production, OOB, VXLAN underlay, fabric loopbacks, branch VLANs)
WAN circuits	11 (Lumen, AT&T, Verizon, Equinix Fabric — internet + MPLS + DCI)
Tenants	3 (Salesforce-Eng, Workday-Prod, Internal-Corp)
Contacts	13 (NOC desks, site leads, on-call rotations — with real-looking emails + phones)

Cisco device types in the seed

Category	Models
DC fabric (Nexus 9000)	9332D-GX2B (400G spine), 9336C-FX2 (100G leaf), 93180YC-FX3 (10/25G leaf), 9504 (modular), 9508 (modular)
Campus core/dist	Catalyst 9500-32C, Catalyst 9410R (modular dist)
Access switches	Catalyst 9300-48UXM, Catalyst 9300X-48HX, Catalyst 9200L-48P-4G (smart switch SMB line)
Wireless	Catalyst 9800-CL (controller), Meraki MR46, Meraki MR56 (Wi-Fi 6)
WAN / SD-WAN	ASR 1001-X, Catalyst 8500L
Security	Cisco Secure Firewall 3110 (NGFW — 7 deployed across sites)
Compute	UCS-X210C-M7 (X-Series blade — 3 deployed)

Intentional PII / credential fodder

For the output-gate demo to fire, NetBox needs data Cisco AI Defense will flag. The seed embeds two classes of sensitive content:

Contacts with real-looking PII — names, emails (name@example-corp.com), phones in E.164 format. 13 contacts attached to sites as NOC / site-lead / on-call assignments.
SNMP community strings in device comments — every device has a unique pseudo-credential like atl-c0re-r0!, nyc-fw-corp!. When the LLM tries to dump them, the output gate catches the dense credential pattern.

netbox-mcp — 7 tools

Tool	Purpose
`netbox_search`	Global text search with 3-phase fallback (text → role keyword → model keyword)
`netbox_list_devices`	Filtered device list — by site, role enum, or `model_contains` (two-step device-type lookup)
`netbox_get_device`	Full detail for one device — includes `comments` field (PII / credential fodder)
`netbox_list_prefixes`	IP prefixes filtered by site / tenant
`netbox_list_circuits`	WAN circuits filtered by provider / status
`netbox_get_site_contacts`	N+1 lookup returning full contact records with email + phone
`netbox_delete_prefix`	Destructive (exists only to demonstrate tool-arg gate firing on a delete verb)
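The 3-phase fallback in `netbox_search` amounts to trying progressively looser lookups until one returns results. A minimal sketch, with the phases injected as callables so it runs without a NetBox instance; `search_with_fallback` and the phase names are hypothetical, and the actual NetBox queries behind each phase are not shown.

```python
from typing import Callable, Sequence

# A phase takes the query string and returns a (possibly empty) list of hits.
Phase = Callable[[str], list[dict]]


def search_with_fallback(
    query: str, phases: Sequence[tuple[str, Phase]]
) -> tuple[str, list[dict]]:
    """Run phases in order; return the name and results of the first non-empty one."""
    for name, phase in phases:
        results = phase(query)
        if results:
            return name, results
    return "none", []
```

Returning the phase name alongside the results lets the caller report which fallback matched, which helps when explaining a result to the user.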

08 / ORCHESTRATOR + MCPs

The only custom code in the stack

FastAPI + SSE chat loop · two MCP wrappers · ~600 lines of Python total.

Custom · FastAPI · SSE

Orchestrator	FastAPI 0.115 · uvicorn · sse-starlette · openai-python SDK
Stream protocol	SSE (Server-Sent Events), one connection, 12 event types (13+ events in a typical turn)
UI	Single HTML page · Tailwind via CDN · vanilla JS · SVG flow diagram · ~400 lines
State	Stateless per turn — every submission resets the diagram
Retry guard	One transparent retry on empty Nemotron completions (silent-refusal pattern detection)
Tool loop depth	Capped at 5 hops
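The retry guard and hop cap from the table can be sketched together. This is an illustration under assumptions, not the orchestrator's code: `run_tool_loop`, `call_llm`, and `run_tool` are hypothetical names, the retry is assumed not to consume a hop, and hitting the cap is assumed to end the turn with `error`.

```python
from typing import Callable

MAX_HOPS = 5  # tool loop depth cap


def run_tool_loop(
    call_llm: Callable[[], dict],
    run_tool: Callable[[dict], None],
    max_hops: int = MAX_HOPS,
) -> str:
    """Drive the LLM/tool loop; return 'ok', 'model_declined', or 'error'."""
    hops = 0
    retried = False
    while hops < max_hops:
        reply = call_llm()
        calls = reply.get("tool_calls") or []
        content = reply.get("content") or ""
        if not calls and not content:
            if retried:
                return "model_declined"  # empty twice: treat as a refusal
            retried = True
            continue  # one transparent retry on an empty completion
        if not calls:
            return "ok"  # final assistant message
        for call in calls:
            run_tool(call)
        hops += 1
    return "error"  # assumed outcome when the hop cap is reached
```

Gate checks are omitted here to keep the control flow visible; in the demo each proposed tool call and the final message would pass through AI Defense before execution or display.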

SSE event types emitted per turn

turn_start            // session id + model
gate_start            // {where, content[:200]}
gate_result           // {action, severity, attack_technique, violations[], latency_ms}
llm_call_start        // {hop, model}
tool_call_proposed    // {hop, idx, name, arguments}
tool_executing        // {hop, idx, name}
tool_result           // {hop, idx, name, result}
assistant_message     // {content}
blocked               // {where, severity, attack_technique, violations[]}
model_declined        // {finish_reason, explanation} — defense-in-depth signal
turn_end              // {reason: ok | input_blocked | output_blocked | model_declined | error}
error
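On the wire, each of these events is a named SSE frame: an `event:` line, a `data:` line, and a blank line. The helper below shows that framing with JSON payloads; `sse_frame` is a hypothetical name, and in the demo the sse-starlette library listed above presumably handles this step.

```python
import json


def sse_frame(event: str, data: dict) -> str:
    """Serialize one Server-Sent Event with a single-line JSON data field."""
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"
```

For example, `sse_frame("turn_end", {"reason": "ok"})` produces the frame a browser `EventSource` would dispatch to a `turn_end` listener.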

09 / SECRETS & TLS

Production hygiene from day one

OpenBao for secrets · Let's Encrypt for TLS · no credentials in git or container images.

Secret store	OpenBao at `vault.uppernyack.com` (on-prem)
TLS issuer	Let's Encrypt E8 (ECDSA chain)
ACME challenge	`tls-alpn-01` — Caddy negotiates over port 443 directly
Cert renewal	Auto-renewed by Caddy 30 days before expiry
HSTS	Enabled (`max-age=31536000; includeSubDomains`)
Session cookie	(none yet — to be added with auth gate)

Secrets in OpenBao

Path	Contents	Pulled by
`infra/api/nvidia-build-netbox-demo`	NIM API key	`deploy.sh` at deploy time
`infra/api/cisco-ai-defense`	AI Defense Inspection API key + base URL	`deploy.sh`
`infra/ssh/cisco-web-1-arm`	SSH key for the OCI VM	`deploy.sh` for rsync
`infra/api/netbox-demo`	Generated NetBox secrets (SECRET_KEY, superuser password, API token)	`deploy.sh` — auto-generated on first run
`infra/db/netbox-demo-pg`	Postgres password	`deploy.sh`
`infra/db/netbox-demo-redis-{queue,cache}`	Redis passwords	`deploy.sh`

Secrets live ONLY in OpenBao + the OCI VM's compose/.env file (chmod 600). They never appear in git, in container images, in CLAUDE memory files, or in this About page's source.
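The demo's `deploy.sh` is a shell script and is not reproduced here. As an illustration of the chmod 600 step only, the Python sketch below writes a `.env` file that is owner-readable from the moment it is created; `write_env_file` is a hypothetical helper, and fetching the values from OpenBao is out of scope.

```python
import os


def write_env_file(path: str, secrets: dict[str, str]) -> None:
    """Write KEY=value lines to path with mode 0600 (Unix only)."""
    body = "".join(f"{key}={value}\n" for key, value in sorted(secrets.items()))
    # Create with 0600 so the file is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Enforce the mode even if the file already existed with looser bits.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(body)
```

Creating the file with the restrictive mode, rather than writing it and then running chmod, avoids a window in which other local users could read it.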

10 / BILL OF MATERIALS

Cost breakdown

Every component is free-tier or self-hosted. Total monthly cost: $0.

Component	Provider	Tier	Cost
OCI Ampere A1.Flex VM (2 OCPU / 12 GB)	Oracle Cloud	Always-Free	`$0`
Reserved public IP	Oracle Cloud	Always-Free (2/2)	`$0`
200 GB block + 10 TB egress/mo	Oracle Cloud	Always-Free	`$0`
Cloudflare DNS for `uppernyack.com`	Cloudflare	Free plan	`$0`
Let's Encrypt TLS certs	Let's Encrypt / ISRG	Public CA	`$0`
NVIDIA NIM (Nemotron Super 49B)	NVIDIA Build	Free tier · 40 RPM · no daily cap	`$0`
Cisco AI Defense Inspection API	Cisco	SE entitlement	`$0`
NetBox	NetBox Community	FOSS Apache 2.0	`$0`
Postgres / Redis / Caddy / Docker	OSS	FOSS	`$0`
GitPi private repo hosting	self-hosted on Pi 4	—	`$0`
OpenBao secret storage	self-hosted on atheneum	—	`$0`
Monthly run-rate			`$0`

The customer story: every line item above maps to something the customer either already owns (their NetBox, their Cisco AI Defense entitlement) or can sign up for in minutes (NIM key, OCI free tenancy, Cloudflare DNS). The only thing the customer needs from us is the ~600 lines of orchestrator + MCP code — and that's open-source under their git account by end of demo if they want it.