True North

AI-powered compliance evaluation agent that autonomously assesses cloud environments against any compliance framework.

What it does

True North takes a compliance framework (e.g., ISO 42001) and an AWS environment, then:

Explores the environment by calling AWS APIs to discover resources and configurations
Evaluates each control by mapping evidence to requirements
Produces per-control verdicts (PASS / FAIL / NEEDS_ACTION) with evidence and reasoning
Verifies that the assessment is complete and correct
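To make the output shape concrete, here is a minimal Python sketch of a per-control result and a completeness check in the spirit of the verification step. The names `Verdict`, `ControlResult`, and `verify_assessment` are hypothetical illustrations, not True North's actual types, and the real verification step may check more than "every control has a verdict and every verdict cites evidence."

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    NEEDS_ACTION = "NEEDS_ACTION"


@dataclass
class ControlResult:
    # Hypothetical record for one control's verdict.
    control_id: str
    verdict: Verdict
    reasoning: str
    # API responses (or excerpts) cited as evidence for the verdict.
    evidence: list[str] = field(default_factory=list)


def verify_assessment(control_ids: list[str], results: list[ControlResult]) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    expected = set(control_ids)
    assessed = {r.control_id for r in results}
    # Completeness: every control in the framework needs a verdict.
    for cid in control_ids:
        if cid not in assessed:
            problems.append(f"missing verdict for {cid}")
    for r in results:
        # Reject verdicts for controls the framework does not define.
        if r.control_id not in expected:
            problems.append(f"{r.control_id}: not in framework")
        # Evidence-backed: every verdict must cite at least one piece of evidence.
        if not r.evidence:
            problems.append(f"{r.control_id}: verdict has no evidence")
    return problems
```

For example, with framework controls `A.2.2`, `A.6.1`, and `A.7.4` (illustrative IDs), a run that skipped `A.7.4` and gave `A.6.1` a verdict without evidence would yield two problems.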

Quick Start

Prerequisites

mwinit for Amazon credentials
uv for Python package management

Setup

# 1. Authenticate
mwinit -o

# 2. Clone the repository
git clone ssh://git.amazon.com/pkg/TrueNorthEngine
cd TrueNorthEngine

# 3. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 4. Install dependencies
uv sync

# 5. Configure credentials (copy and fill in your values)
cp .env.example .env   # or create .env manually
# Add: LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_BASE_URL, LANGFUSE_OTEL_ENDPOINT

# 6. Run
uv run true-north --framework data/ISO42001.csv
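The `--framework` flag points at a CSV file. This README does not document the file's columns, so the sketch below assumes hypothetical `control_id` and `requirement` columns and shows one way such a file could be loaded and sanity-checked with the standard `csv` module. Adjust the column names to match the actual `data/ISO42001.csv`.

```python
import csv
import io
from typing import TextIO

# Hypothetical schema: the real framework files may use different column names.
REQUIRED_COLUMNS = {"control_id", "requirement"}


def load_framework(stream: TextIO) -> list[dict[str, str]]:
    """Parse a framework CSV into one dict per control."""
    reader = csv.DictReader(stream)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"framework CSV is missing columns: {sorted(missing)}")
    # Drop rows whose control_id is empty (e.g. trailing ",," rows).
    return [row for row in reader if (row.get("control_id") or "").strip()]


if __name__ == "__main__":
    # In practice: with open("data/ISO42001.csv", newline="") as f: load_framework(f)
    sample = io.StringIO(
        "control_id,title,requirement\n"
        "A.2.2,AI policy,Document a policy for the development or use of AI systems\n"
    )
    for control in load_framework(sample):
        print(control["control_id"], "-", control["requirement"])
```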

.env file

LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_BASE_URL=https://observability.nexus.sso.aws.dev
LANGFUSE_OTEL_ENDPOINT=https://observability.nexus.sso.aws.dev/api/public/otel
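Before the first run, it can help to confirm that all four Langfuse variables are set. The following is a minimal, standard-library-only sketch; it is not part of True North, and its tiny `KEY=VALUE` parser is a simplification that does not handle every `.env` syntax (for example, multi-line values or `export` prefixes).

```python
import os
from pathlib import Path

# The four keys listed in the .env example above.
REQUIRED_KEYS = (
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_BASE_URL",
    "LANGFUSE_OTEL_ENDPOINT",
)


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser; skips blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not values.get(k)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env(env_path.read_text()) if env_path.exists() else {}
    # Fall back to variables already exported in the shell.
    values = {k: values.get(k) or os.environ.get(k, "") for k in REQUIRED_KEYS}
    missing = missing_keys(values)
    print("OK" if not missing else f"Missing: {', '.join(missing)}")
```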

Run with a specific environment

uv run true-north --framework data/ISO42001.csv \
  --environment benchmark/environments/post_sra.json \
  --ground-truth benchmark/ground_truth/iso42001_post_sra.json
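Comparing a run against `--ground-truth` amounts to matching each control's verdict with the expected one. The sketch below assumes, hypothetically, that both sides can be reduced to a JSON object mapping control IDs to verdict strings, and computes plain verdict accuracy plus a confusion count. It illustrates the idea only; it is not the benchmark's 5-layer scoring.

```python
import json
from collections import Counter


def load_verdicts(path: str) -> dict[str, str]:
    """Load a hypothetical {control_id: verdict} JSON file."""
    with open(path) as f:
        return json.load(f)


def score_against_ground_truth(predicted: dict[str, str], ground_truth: dict[str, str]) -> dict:
    """Compare predicted per-control verdicts with expected verdicts."""
    matches = 0
    confusion = Counter()
    for control_id, expected in ground_truth.items():
        # A control with no prediction counts as a miss.
        actual = predicted.get(control_id, "MISSING")
        confusion[(expected, actual)] += 1
        if actual == expected:
            matches += 1
    total = len(ground_truth)
    return {
        "accuracy": matches / total if total else 0.0,
        "missing": sorted(set(ground_truth) - set(predicted)),
        "extra": sorted(set(predicted) - set(ground_truth)),
        "confusion": dict(confusion),
    }
```

Keying the confusion count on `(expected, actual)` pairs makes it easy to see, for instance, whether the agent tends to report PASS where the ground truth expects FAIL.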

Run the full benchmark

uv run python benchmark/run_benchmark.py

Key Features

Framework-agnostic — works with any CSV/JSON compliance framework
Environment-agnostic — discovers resources dynamically via AWS APIs
Parallel evaluation — orchestrator spawns sub-agents for concurrent control evaluation
Evidence-backed verdicts — every verdict cites actual API responses
Applicability detection — distinguishes technical vs organizational controls, flags hallucinations
Langfuse observability — full OTEL tracing with session grouping
Benchmark suite — 3 environments, ground truth, 5-layer scoring
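The parallel-evaluation pattern (an orchestrator fanning out one sub-agent per control) can be sketched with `asyncio`. This is a generic illustration under stated assumptions, not True North's orchestrator: `fake_sub_agent` is a hypothetical stand-in for a real sub-agent call, and the semaphore caps how many evaluations run at once (useful when each one hits rate-limited AWS or model APIs).

```python
import asyncio
from typing import Awaitable, Callable


async def evaluate_all(
    control_ids: list[str],
    evaluate_control: Callable[[str], Awaitable[str]],
    max_concurrency: int = 8,
) -> dict[str, str]:
    """Evaluate every control concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(control_id: str) -> tuple[str, str]:
        async with semaphore:
            return control_id, await evaluate_control(control_id)

    # gather returns results in input order, regardless of completion order.
    results = await asyncio.gather(*(run_one(cid) for cid in control_ids))
    return dict(results)


async def fake_sub_agent(control_id: str) -> str:
    """Hypothetical stand-in for a sub-agent that inspects AWS and returns a verdict."""
    await asyncio.sleep(0.01)
    return "PASS" if control_id.endswith("1") else "NEEDS_ACTION"


if __name__ == "__main__":
    verdicts = asyncio.run(
        evaluate_all(["A.2.1", "A.2.2", "A.6.1"], fake_sub_agent, max_concurrency=2)
    )
    print(verdicts)
```

Bounding concurrency with a semaphore rather than launching everything at once keeps the fan-out predictable as frameworks grow to dozens or hundreds of controls.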