Cisco AI Defense
Architecture · About
01 / OVERVIEW

A Cisco-branded AI Defense reference architecture

Built for Cisco SE customer demos. Mirrors a real customer's stack: NetBox + Nemotron-driven chatbot, with Cisco AI Defense applied at three points.

This demo is a deployable reference architecture for customers building AI-assisted operations tooling over Cisco / network infrastructure. Every piece is something the customer can stand up themselves the same day, using accounts and entitlements they already have. The implementation totals roughly 600 lines of Python plus a small SVG-driven UI.

The story we're telling: Cisco AI Defense is a control plane for LLM safety, not a content filter. It sits at every untrusted boundary in an agentic AI architecture — input, tool arguments, output — and gates each one against a policy you control. The demo shows the same Inspection API doing three different jobs in one turn.
Live URL: https://aidefense-demo.uppernyack.com
LLM: nvidia/llama-3.3-nemotron-super-49b-v1
Defense: Cisco AI Defense Inspection API (US region)
Data source: NetBox 4.4 with 58 Cisco devices across 6 sites
Compute: OCI Always-Free Ampere A1.Flex ARM (2 OCPU / 12 GB)
Monthly cost: $0 (every component uses free-tier accounts)
02 / THE 3-POINT FLOW

Every turn fires AI Defense three times

Input · Tool-args · Output — three different inspection contexts, one API, one policy.

A traditional content filter sits in front of the LLM and stops bad prompts. That covers about a third of the OWASP LLM Top 10. The other two-thirds (destructive tool calls, sensitive-data egress, excessive agency) only arise after the model has already decided to act, where an input filter never sees them. Three-point gating moves inspection to the boundary where each risk class actually lives.

Gate | What it sees | Catches | Latency
Input | Raw user prompt before LLM receives it | Prompt injection, PII attempts, jailbreak | 140-450 ms
Tool args | JSON arguments of every LLM-proposed tool call | Destructive verbs, PII in args, injection in args | 130-300 ms
Output | Final assistant message before display | Sensitive-info leakage (PII / credentials / secrets) from data sources | 250-450 ms
Why we escalate the output scan to role=user: AI Defense's default output policy enables only 2 rules with PII allowed in assistant role — adequate for chat content but useless for protecting NetBox-sourced credentials. The orchestrator scans outgoing content with role: "user" so the full 13-rule input policy fires on outbound data. Defense-in-depth.
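The three gates can be sketched as one function. This is a minimal illustration, not the demo's orchestrator: `inspect`, `propose_tool_calls`, `execute_tool`, and `answer` are hypothetical callables, `inspect(text)` is assumed to wrap the Inspection API (sending role "user") and return True when the content is allowed, and the `tool_blocked` reason is an assumption, since the page does not say how a blocked tool call ends the turn.

```python
import json


def run_turn(prompt, inspect, propose_tool_calls, execute_tool, answer):
    """Gate one chat turn at input, tool args, and output.

    Every callable here is a hypothetical stand-in for an orchestrator piece.
    """
    # Gate 1: the raw user prompt, before the LLM sees it.
    if not inspect(prompt):
        return {"reason": "input_blocked"}

    results = []
    for call in propose_tool_calls(prompt):
        # Gate 2: the JSON arguments of each LLM-proposed tool call.
        if not inspect(json.dumps(call["arguments"], sort_keys=True)):
            return {"reason": "tool_blocked", "tool": call["name"]}
        results.append(execute_tool(call))

    # Gate 3: the final assistant text. The same inspect() is reused, so the
    # outbound content is scanned under the input policy.
    reply = answer(prompt, results)
    if not inspect(reply):
        return {"reason": "output_blocked"}
    return {"reason": "ok", "content": reply}
```

Because all three gates go through the same callable, a policy change on the connection applies to every boundary at once.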
03 / CISCO AI DEFENSE

The Inspection API

Cisco AI Defense Inspection — runtime policy enforcement for AI applications.

Cisco-side · Inspection API · us-region
Endpoint: https://us.api.inspect.aidefense.security.cisco.com/api/v1/inspect/chat
Auth header: X-Cisco-AI-Defense-API-Key: <inspection-key>
Connection name: epoch-test (in Cisco Security Cloud Control → AI Defense → Connections)
Policy: Runtime policy attached to the epoch-test connection, with 13 input rules enabled
Dashboard: https://us.aidefense.security.cisco.com/ (gated behind Cisco SSO)
Regions: US (this demo) · EU · AP · UAE, with the same payload schema across regions

The 13 input rules

Every prompt or tool-call gets scanned by 13 classifier rules in parallel. Each rule returns NONE_VIOLATION or a specific classification. Multi-violation responses are common — e.g. a prompt-injection that also contains harmful content fires both Prompt Injection (SECURITY_VIOLATION) and General Harms (SAFETY_VIOLATION).

Rule | Classification | Entity types (examples)
Prompt Injection | SECURITY_VIOLATION |
Malicious URL Detection | SECURITY_VIOLATION |
PII | PRIVACY_VIOLATION | Email Address, Phone Number, SSN (US), Passport (FR/DE/US/JP), IP Address, …
PHI | PRIVACY_VIOLATION | NHS Number (UK), Medical License Number (US)
PCI | PRIVACY_VIOLATION | Credit Card Number, IBAN, ABA Routing, Bank Account, ITIN
Toxicity | SAFETY_VIOLATION |
Hate Speech | SAFETY_VIOLATION |
Profanity | SAFETY_VIOLATION |
Sexual Content & Exploitation | SAFETY_VIOLATION |
Harassment | SAFETY_VIOLATION |
Social Division & Polarization | SAFETY_VIOLATION |
Violence & Public Safety Threats | SAFETY_VIOLATION |
General Harms | SAFETY_VIOLATION |

Request shape

# Inspection request — same payload at every gate
{
  "messages": [{ "role": "user", "content": "<text being inspected>" }],
  "model":    "aidefense-demo",  # label only, not a real model
  "config":   { "enabled_rules": [] },  # empty = use policy default
  "metadata": {}
}
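
As a rough sketch, the request above could be assembled with the Python standard library as follows. The endpoint, header name, and payload fields come from this page; the function name `build_inspect_request` and the `Content-Type` header are assumptions. The function only builds the request and does not send it.

```python
import json
import urllib.request

INSPECT_URL = "https://us.api.inspect.aidefense.security.cisco.com/api/v1/inspect/chat"


def build_inspect_request(text: str, api_key: str, role: str = "user") -> urllib.request.Request:
    """Build (but do not send) an Inspection API request for one piece of text."""
    payload = {
        "messages": [{"role": role, "content": text}],
        "model": "aidefense-demo",        # label only, not a real model
        "config": {"enabled_rules": []},  # empty = use policy default
        "metadata": {},
    }
    return urllib.request.Request(
        INSPECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",  # assumed; not stated on this page
            "X-Cisco-AI-Defense-API-Key": api_key,
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=5)` would send it; the same builder serves all three gates because only `text` changes.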

Response shape (a block)

{
  "is_safe": false,
  "action":  "Block",
  "severity": "NONE_SEVERITY",
  "classifications": ["SECURITY_VIOLATION", "SAFETY_VIOLATION"],
  "rules": [
    { "rule_name": "Prompt Injection", "classification": "SECURITY_VIOLATION", "entity_types": [] },
    { "rule_name": "General Harms",    "classification": "SAFETY_VIOLATION",   "entity_types": [] }
  ],
  "attack_technique": "NONE_ATTACK_TECHNIQUE",
  "event_id": "<uuid>",
  "processed_rules": [ /* all 13 with status — mostly NONE_VIOLATION */ ]
}
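
A caller might reduce that response to a verdict along these lines. This is a sketch under assumptions: `summarize_verdict` is a hypothetical helper, and treating a missing `is_safe` field as a block (fail closed) is a design choice made here, not documented behavior.

```python
def summarize_verdict(resp: dict) -> dict:
    """Reduce an Inspection API response to a block decision plus violation labels."""
    # Fail closed: a response without is_safe is treated as unsafe.
    blocked = (not resp.get("is_safe", False)) or resp.get("action") == "Block"
    violations = [
        f'{rule["rule_name"]} ({rule["classification"]})'
        for rule in resp.get("rules", [])
        if rule.get("classification") != "NONE_VIOLATION"
    ]
    return {
        "blocked": blocked,
        "violations": violations,
        "event_id": resp.get("event_id"),
    }
```

For the sample response above, this yields two violation labels, one per fired rule.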
04 / NVIDIA NIM + NEMOTRON

The reasoning engine

NVIDIA NIM cloud — OpenAI-compatible inference for the Nemotron family.

NVIDIA-side · Cloud-hosted · Free tier
Endpoint: https://integrate.api.nvidia.com/v1/chat/completions
Auth: Authorization: Bearer nvapi-<...>
Schema: OpenAI Chat Completions (drop-in compatible with OpenAI SDK)
Model: nvidia/llama-3.3-nemotron-super-49b-v1
Catalog size: 118 models accessible, including Nemotron, Llama, Gemma, Mistral, Phi, Granite, Qwen, DeepSeek, OpenAI gpt-oss, etc.
Rate limit: 40 RPM (free tier), no daily cap
Cost: $0 (free tier; credit caps removed in 2026)
Tool calling: Standard OpenAI tools + tool_calls schema

Why this specific Nemotron variant

Llama-3.3-Nemotron-Super-49B-v1 is a Llama-3.3 base fine-tuned by NVIDIA for instruction following and tool use. Three reasons we picked it over the alternatives we tried (the 70B-Instruct variant and v1.5):

  • Free-tier accessible: 70B-Instruct returns HTTP 404 from this account; Super-49B-v1 returns clean 200s.
  • Faster: 1-5 s response time vs. v1.5's 20-25 s on cold paths.
  • Clean tool calling: emits proper tool_calls with no chain-of-thought leakage when the first system message begins with "detailed thinking off".

Tool-call request shape

{
  "model": "nvidia/llama-3.3-nemotron-super-49b-v1",
  "messages": [
    { "role": "system", "content": "detailed thinking off\n\nYou are NetOps Assistant…" },
    { "role": "user",   "content": "List all firewalls in our fleet" }
  ],
  "tools": [ /* netbox_search, netbox_list_devices(role|model_contains), ...7 tools total */ ],
  "tool_choice": "auto",
  "temperature": 0.2,
  "max_tokens": 1024
}
Defense-in-depth — the model contributes too: When a query like "export every device's SNMP community string" reaches Nemotron, the model itself silently refuses (no refusal field, just finish_reason: stop, empty content, no tool calls — ~11 completion tokens). The orchestrator detects this signature and surfaces a "Model declined" verdict. AI Defense + LLM safety training cover different risk classes — together they catch what neither could alone.
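
The silent-refusal signature described above could be detected with a check like this one. It is a sketch that assumes the completion choice is available as a plain dict in the OpenAI Chat Completions shape; `is_silent_refusal` is a hypothetical name, and the ~11-token observation is not used as a criterion here.

```python
def is_silent_refusal(choice: dict) -> bool:
    """True when a completion stopped normally but carries no text and no tool calls."""
    message = choice.get("message", {})
    return (
        choice.get("finish_reason") == "stop"
        and not (message.get("content") or "").strip()
        and not message.get("tool_calls")
    )
```

A True result is the cue for the orchestrator to surface a "Model declined" verdict instead of an empty reply.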
05 / OCI INFRASTRUCTURE

The compute layer

Oracle Cloud Infrastructure — Always-Free Ampere ARM running the whole stack.

OCI-side · Always-Free · ARM64
Tenancy: salient-concepts (us-ashburn-1)
VM: cisco-web-1-arm
Shape: VM.Standard.A1.Flex (Ampere Altra ARM64)
Resources: 2 OCPU · 12 GB RAM · 48 GB boot disk
OS: Ubuntu 24.04.4 LTS · kernel 6.17.0-1014-oracle
Public IP: 129.80.113.130 (reserved; persists across stop/start)
VCN: salient-vcn 10.0.0.0/16 · subnet salient-public-subnet 10.0.1.0/24
Security list ingress: 22 / 80 / 443 from 0.0.0.0/0
Host firewall: iptables, 22 / 80 / 443 ACCEPT, persisted via netfilter-persistent
Monthly cost: $0 (Always-Free, no credit card billed)

Docker Compose stack — 9 containers

Container | Image | Purpose
caddy | caddy:2.10-alpine | Reverse proxy + auto-TLS
orchestrator | local build | FastAPI + SSE + 3-point gate loop
ai-defense-mcp | local build | Wrapper over Cisco AI Defense Inspection API
netbox-mcp | local build | 7-tool wrapper over NetBox REST
netbox | netboxcommunity/netbox:v4.4-3.4.0 | IPAM/DCIM web + API
netbox-worker | netboxcommunity/netbox:v4.4-3.4.0 | RQ background worker
postgres | postgres:16-alpine | NetBox primary DB
redis-queue | redis:7-alpine | NetBox job queue
redis-cache | redis:7-alpine | NetBox app cache

Free-tier budget compliance

This tenancy currently runs 3 VMs on Always-Free: salient-web-1 (AMD Micro, serves salient-concepts.com), cisco-web-1 (AMD Micro, serves the CiscoPulse splash), and cisco-web-1-arm (this demo). That puts it at the cap for AMD micros (2 of 2) and uses 2 OCPU / 12 GB of the 4 OCPU / 24 GB Ampere A1 budget. Both of the 2 allowed reserved IPs are in use.

06 / CLOUDFLARE DNS

The name resolution layer

Cloudflare Free plan — DNS-only mode (gray cloud).

DNS layer · Free plan · DNS-only
Zone: uppernyack.com
Account: Personal (separate from the Salient-Concepts.com Cloudflare account)
Nameservers: Cloudflare authoritative
Record: A aidefense-demo → 129.80.113.130
Proxy status: DNS-only (gray cloud); Cloudflare resolves the name but does NOT proxy traffic
TTL: Auto (Cloudflare default)
Cost: $0 (Free plan, unlimited DNS queries)
Why DNS-only and not proxied: Caddy issues TLS certificates via the tls-alpn-01 ACME challenge, which requires the public-facing IP to terminate TLS directly (port 443). If Cloudflare proxied, traffic would terminate at Cloudflare's edge first, breaking the challenge. To go proxied (orange cloud) we'd need to switch Caddy to dns-01 with a Cloudflare API token — possible but not necessary for a demo.
07 / NETBOX DATA

The customer's source of truth

NetBox 4.4 — IPAM/DCIM seeded with a realistic Cisco enterprise fleet.

Data layer · Source of truth · FOSS

NetBox is the de facto IPAM/DCIM tool among enterprise NetOps teams. Customers building AI assistants for network operations almost always have it. This seed mirrors a mid-size enterprise: two production data centers with full Nexus 9000 fabric, four branch sites, a corporate HQ, and a security/compute layer.

Total devices: 58
Sites: 6 (DC-1 ATL, DC-2 RTP, Branch-NYC, Branch-SJC, Branch-SFO, Branch-AMS)
Device types: 17 (full Cisco breadth; see below)
Device roles: 9 (core, spine, leaf, distribution, access, edge, wireless, firewall, server)
IP prefixes: 16 (production, OOB, VXLAN underlay, fabric loopbacks, branch VLANs)
WAN circuits: 11 (Lumen, AT&T, Verizon, Equinix Fabric; internet + MPLS + DCI)
Tenants: 3 (Salesforce-Eng, Workday-Prod, Internal-Corp)
Contacts: 13 (NOC desks, site leads, on-call rotations, with real-looking emails + phones)

Cisco device types in the seed

Category | Models
DC fabric (Nexus 9000) | 9332D-GX2B (400G spine), 9336C-FX2 (100G leaf), 93180YC-FX3 (10/25G leaf), 9504 (modular), 9508 (modular)
Campus core/dist | Catalyst 9500-32C, Catalyst 9410R (modular dist)
Access switches | Catalyst 9300-48UXM, Catalyst 9300X-48HX, Catalyst 9200L-48P-4G (smart switch SMB line)
Wireless | Catalyst 9800-CL (controller), Meraki MR46, Meraki MR56 (Wi-Fi 6)
WAN / SD-WAN | ASR 1001-X, Catalyst 8500L
Security | Cisco Secure Firewall 3110 (NGFW, 7 deployed across sites)
Compute | UCS-X210C-M7 (X-Series blade, 3 deployed)

Intentional PII / credential fodder

For the output-gate demo to fire, NetBox needs data Cisco AI Defense will flag. The seed embeds two classes of sensitive content:

  • Contacts with real-looking PII — names, emails (name@example-corp.com), phones in E.164 format. 13 contacts attached to sites as NOC / site-lead / on-call assignments.
  • SNMP community strings in device comments — every device has a unique pseudo-credential like atl-c0re-r0!, nyc-fw-corp!. When the LLM tries to dump them, the output gate catches the dense credential pattern.

netbox-mcp — 7 tools

Tool | Purpose
netbox_search | Global text search with 3-phase fallback (text → role keyword → model keyword)
netbox_list_devices | Filtered device list by site, role enum, or model_contains (two-step device-type lookup)
netbox_get_device | Full detail for one device, including the comments field (PII / credential fodder)
netbox_list_prefixes | IP prefixes filtered by site / tenant
netbox_list_circuits | WAN circuits filtered by provider / status
netbox_get_site_contacts | N+1 lookup returning full contact records with email + phone
netbox_delete_prefix | Destructive (exists only to demonstrate tool-arg gate firing on a delete verb)
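
One way to read the 3-phase fallback in netbox_search is sketched below. The page does not show the real implementation, so everything here is an assumption: `query` is a hypothetical callable that takes filter parameters and returns a list of hits, the parameter names `q`, `role`, and `model_contains` are illustrative, and the plural handling is deliberately naive.

```python
ROLES = ("core", "spine", "leaf", "distribution", "access",
         "edge", "wireless", "firewall", "server")


def search_with_fallback(text, query):
    """Try free text, then a role keyword, then a model keyword; return (phase, hits)."""
    # Phase 1: plain text search.
    hits = query({"q": text})
    if hits:
        return "text", hits

    words = text.lower().split()

    # Phase 2: a known role name (or its simple plural) appears in the query.
    for role in ROLES:
        if role in words or role + "s" in words:
            hits = query({"role": role})
            if hits:
                return "role", hits

    # Phase 3: treat each word as a possible fragment of a device model name.
    for word in words:
        hits = query({"model_contains": word})
        if hits:
            return "model", hits

    return "none", []
```

Returning the phase alongside the hits makes it easy to show in the UI which fallback actually matched.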
08 / ORCHESTRATOR + MCPs

The only custom code in the stack

FastAPI + SSE chat loop · two MCP wrappers · ~600 lines of Python total.

Custom · FastAPI · SSE
Orchestrator: FastAPI 0.115 · uvicorn · sse-starlette · openai-python SDK
Stream protocol: SSE (Server-Sent Events), one connection, 13+ event types per turn
UI: Single HTML page · Tailwind via CDN · vanilla JS · SVG flow diagram · ~400 lines
State: Stateless per turn; every submission resets the diagram
Retry guard: One transparent retry on empty Nemotron completions (silent-refusal pattern detection)
Tool loop depth: Capped at 5 hops
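
The retry guard and hop cap could fit together roughly as follows. This is a sketch, not the orchestrator's code: `call_llm` and `execute_tool` are hypothetical callables working on plain dicts, and two details are assumptions, namely that the single retry does not consume a hop and that hitting the cap ends the turn with an error.

```python
MAX_HOPS = 5  # tool loop depth cap from the table above


def tool_loop(call_llm, execute_tool, messages):
    """Run the LLM/tool loop with one retry on an empty completion and a hop cap."""
    retried = False
    hop = 0
    while hop < MAX_HOPS:
        msg = call_llm(messages)
        tool_calls = msg.get("tool_calls") or []
        content = (msg.get("content") or "").strip()

        if not tool_calls and not content:
            if not retried:
                retried = True  # one transparent retry per turn
                continue
            return {"reason": "model_declined"}

        if not tool_calls:
            return {"reason": "ok", "content": content}

        # Feed every tool result back to the model and go around again.
        messages.append(msg)
        for call in tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": execute_tool(call),
            })
        hop += 1

    return {"reason": "error", "detail": f"tool loop exceeded {MAX_HOPS} hops"}
```

In the real loop each proposed call would also pass through the tool-args gate before `execute_tool` runs.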

SSE event types emitted per turn

turn_start            // session id + model
gate_start            // {where, content[:200]}
gate_result           // {action, severity, attack_technique, violations[], latency_ms}
llm_call_start        // {hop, model}
tool_call_proposed    // {hop, idx, name, arguments}
tool_executing        // {hop, idx, name}
tool_result           // {hop, idx, name, result}
assistant_message     // {content}
blocked               // {where, severity, attack_technique, violations[]}
model_declined        // {finish_reason, explanation} — defense-in-depth signal
turn_end              // {reason: ok | input_blocked | output_blocked | model_declined | error}
error
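
For reference, each of these events travels as a standard SSE frame. The sketch below formats one by hand to show the wire shape; in the demo sse-starlette presumably does this formatting, and the helper names `sse_frame` and `gate_start` are hypothetical.

```python
import json


def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON data line, blank line."""
    # json.dumps escapes newlines, so the payload always fits on one data: line.
    body = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {body}\n\n"


def gate_start(where: str, content: str) -> str:
    """gate_start carries only the first 200 characters of the inspected text."""
    return sse_frame("gate_start", {"where": where, "content": content[:200]})
```

For example, `sse_frame("turn_end", {"reason": "ok"})` produces the two lines `event: turn_end` and `data: {"reason":"ok"}` followed by a blank line.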
09 / SECRETS & TLS

Production hygiene from day one

OpenBao for secrets · Let's Encrypt for TLS · no credentials in git or container images.

Secret store: OpenBao at vault.uppernyack.com (on-prem)
TLS issuer: Let's Encrypt E8 (ECDSA chain)
ACME challenge: tls-alpn-01 (Caddy negotiates over port 443 directly)
Cert renewal: Auto-renewed by Caddy 30 days before expiry
HSTS: Enabled (max-age=31536000; includeSubDomains)
Session cookie: none yet (to be added with auth gate)

Secrets in OpenBao

Path | Contents | Pulled by
infra/api/nvidia-build-netbox-demo | NIM API key | deploy.sh at deploy time
infra/api/cisco-ai-defense | AI Defense Inspection API key + base URL | deploy.sh
infra/ssh/cisco-web-1-arm | SSH key for the OCI VM | deploy.sh for rsync
infra/api/netbox-demo | Generated NetBox secrets (SECRET_KEY, superuser password, API token) | deploy.sh (auto-generated on first run)
infra/db/netbox-demo-pg | Postgres password | deploy.sh
infra/db/netbox-demo-redis-{queue,cache} | Redis passwords | deploy.sh

Secrets live ONLY in OpenBao + the OCI VM's compose/.env file (chmod 600). They never appear in git, in container images, in CLAUDE memory files, or in this About page's source.
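
The chmod 600 requirement on the .env file can be met at creation time rather than after the fact. The sketch below is a Python illustration of that idea only; the demo's deploy.sh is a shell script whose contents are not shown here, and `write_env_file` and its key names are hypothetical. Values are written unquoted, so secrets containing spaces or `#` would need escaping.

```python
import os


def write_env_file(path: str, secrets: dict) -> None:
    """Write KEY=value lines to path with owner-only (0600) permissions."""
    body = "".join(f"{key}={value}\n" for key, value in sorted(secrets.items()))
    # Create with 0600 so the file is never readable by others, even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        # Enforce the mode even if the file already existed with looser bits.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(body)
```

Opening with the restrictive mode first avoids the short window in which a file written normally and chmod-ed afterwards would be world-readable.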

10 / BILL OF MATERIALS

Cost breakdown

Every component is free-tier or self-hosted. Total monthly cost: $0.

Component | Provider | Tier | Cost
OCI Ampere A1.Flex VM (2 OCPU / 12 GB) | Oracle Cloud | Always-Free | $0
Reserved public IP | Oracle Cloud | Always-Free (2/2) | $0
200 GB block + 10 TB egress/mo | Oracle Cloud | Always-Free | $0
Cloudflare DNS for uppernyack.com | Cloudflare | Free plan | $0
Let's Encrypt TLS certs | Let's Encrypt / ISRG | Public CA | $0
NVIDIA NIM (Nemotron Super 49B) | NVIDIA Build | Free tier · 40 RPM · no daily cap | $0
Cisco AI Defense Inspection API | Cisco | SE entitlement | $0
NetBox | NetBox Community | FOSS Apache 2.0 | $0
Postgres / Redis / Caddy / Docker | OSS | FOSS | $0
GitPi private repo hosting | self-hosted on Pi 4 | | $0
OpenBao secret storage | self-hosted on atheneum | | $0
Monthly run-rate | | | $0
The customer story: every line item above maps to something the customer either already owns (their NetBox, their Cisco AI Defense entitlement) or can sign up for in minutes (NIM key, OCI free tenancy, Cloudflare DNS). The only thing the customer needs from us is the ~600 lines of orchestrator + MCP code — and that's open-source under their git account by end of demo if they want it.