Discipline layer for AI coding agents

AI agents that can't
fake 'done'.

A local Python framework that gates Claude Code, Cursor, Qwen, and Windsurf at the two points where AI agents lie most: starting a task without a goal, and claiming completion without proof. Three messages cover the full cycle.


ed25519-signed receipts · Apache 2.0 · Python 3.11+ · 3,844 tests passing · 0 core dependencies

~/your-project — claude code

agent › Edit("src/auth.py", "...")
tausik › BLOCKED — no active task (SENAR Rule 9.1)
you › start working
tausik › session #74 opened · handoff loaded · memory tail refreshed
you › fix the mobile button bug
tausik › 4 edge cases collected → task T-219 · 3 AC drafted
         QG-0 passed · goal + AC locked
         pytest · ruff · tsc · 6 review agents · cached
         QG-2 passed · every AC has evidence
you › ship it
tausik › tausik verify · cached 10m · committed a91f3e2
         push? [y/N]

The problem · The mechanism

Without TAUSIK vs With TAUSIK

Enforcement, not suggestion. The agent literally cannot bypass the gate — the hook process blocks Write/Edit before the tool call lands.
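A minimal sketch of such a gate as a Claude Code PreToolUse hook, assuming the documented hook contract: the event arrives as JSON on stdin with `tool_name` and `tool_input`, and exit code 2 blocks the call while stderr is shown to the agent. This is not TAUSIK's `task_gate.py`; the `.tausik/active_task` marker file and the `decide` / `has_active_task` helpers are hypothetical stand-ins for its real task lookup.

```python
import json
import sys
from pathlib import Path
from typing import TextIO

# File-modifying tools the gate intercepts (Claude Code tool names).
GATED_TOOLS = {"Write", "Edit", "MultiEdit"}


def has_active_task(project_dir: Path) -> bool:
    # Hypothetical stand-in for the real task-DB lookup: a marker file in .tausik/.
    return (project_dir / ".tausik" / "active_task").exists()


def decide(event: dict, active: bool) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse event."""
    if event.get("tool_name") in GATED_TOOLS and not active:
        return 2, "BLOCKED: no active task (SENAR Rule 9.1)"
    return 0, ""


def main(stdin: TextIO = sys.stdin) -> int:
    event = json.load(stdin)  # the hook event arrives as JSON on stdin
    code, message = decide(event, has_active_task(Path.cwd()))
    if message:
        # With exit code 2, Claude Code blocks the call and shows stderr to the agent.
        print(message, file=sys.stderr)
    return code
```

Registered as the hook command, the script would end with `sys.exit(main())`. Keeping the decision in a pure `decide` function makes the gate testable without a live agent session.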

Without TAUSIK	With TAUSIK
Agent says "I'll quickly refactor this" and edits 30 files.	task_gate.py hook returns: BLOCKED — no active task (SENAR Rule 9.1). [QG-0]
Agent reports "Done — all green" without running tests.	task_done_verify rejects close: AC #2 has no evidence row in verification_runs. [QG-2]
Next session starts blank. The agent re-asks the same questions.	SessionStart hook injects handoff + memory tail. The last decision and dead-end load with CLAUDE.md.
Agent retries the same broken approach two days later.	tausik dead-end records the failed approach. Search surfaces it before the agent burns more tokens.
Agent runs "the obvious tests", which usually means none.	tausik verify runs the 25-stack matrix (pytest, ruff, tsc, eslint, cargo, go vet, hadolint…) and caches the result.
You ask "what changed?" and read 200 lines of chat.	tausik metrics prints throughput, defect rate, lead time, and cost-per-task. Every gate exit is logged in events.
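The QG-2 rejection in the second row reduces to a query: list the task's acceptance criteria that have no matching row in `verification_runs`. A minimal sketch with Python's built-in `sqlite3`; only the `verification_runs` name comes from the message above, while the `acceptance_criteria` table, every column, and the sample rows are hypothetical.

```python
import sqlite3


def missing_evidence(conn: sqlite3.Connection, task_id: str) -> list[int]:
    """Return the AC numbers of a task that have no verification_runs row."""
    rows = conn.execute(
        """
        SELECT ac.ac_no
        FROM acceptance_criteria AS ac
        WHERE ac.task_id = ?
          AND NOT EXISTS (
            SELECT 1 FROM verification_runs AS vr
            WHERE vr.task_id = ac.task_id AND vr.ac_no = ac.ac_no
          )
        ORDER BY ac.ac_no
        """,
        (task_id,),
    ).fetchall()
    return [ac_no for (ac_no,) in rows]


# Illustrative data: three ACs, evidence recorded for #1 and #3 only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE acceptance_criteria (task_id TEXT, ac_no INTEGER, text TEXT);
    CREATE TABLE verification_runs (task_id TEXT, ac_no INTEGER, evidence TEXT);
    INSERT INTO acceptance_criteria VALUES
      ('T-219', 1, 'button responds to tap'),
      ('T-219', 2, 'button visible at 320px width'),
      ('T-219', 3, 'desktop behavior unchanged');
    INSERT INTO verification_runs VALUES
      ('T-219', 1, 'pytest tests/test_button.py'),
      ('T-219', 3, 'pytest tests/test_desktop.py');
    """
)

gaps = missing_evidence(conn, "T-219")
if gaps:
    print("QG-2 rejects close: AC " + ", ".join(f"#{n}" for n in gaps) + " has no evidence row")
# QG-2 rejects close: AC #2 has no evidence row
```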

Verifiable trust · the differentiator

When the agent says green,
you get a receipt.

This is what separates TAUSIK from prompt-based rulesets. The green isn't a claim you take on faith: it's an ed25519-signed receipt bound to the exact gate and the HEAD commit. It cannot be forged without the signing key, cannot be replayed against a different commit, and can be verified offline.

tausik verify emits a signed receipt

Format tausik-signed/v1, ed25519, bound to the gate signature and the HEAD commit sha.

task done validates it before close

A green that wasn't actually produced — or was produced for a different commit — fails QG-2.

Receipts are portable

Export one and verify it elsewhere with no SDK: offline, through a stateless HTTP endpoint, or with the no-SDK example.

Releases are signed too

Skill and stack installs verify the signature before writing a byte to disk.
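The binding described in this list can be illustrated with a stdlib-only sketch. Two loud assumptions: it substitutes HMAC-SHA256 for ed25519 (HMAC is symmetric, so unlike a real ed25519 receipt it cannot be checked by a third party without the secret), and its field names are hypothetical rather than the actual tausik-signed/v1 layout. What it does show is why a green recorded for another commit or another gate fails the check.

```python
import hashlib
import hmac
import json


def _canonical(fields: dict) -> bytes:
    # Deterministic encoding so signer and verifier hash identical bytes.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_receipt(key: bytes, gate: str, head: str, passed: bool) -> dict:
    # Hypothetical layout; only the "tausik-signed/v1" label comes from the text.
    body = {"format": "tausik-signed/v1", "gate": gate, "head": head, "passed": passed}
    # STAND-IN: HMAC-SHA256 where the real format uses an ed25519 signature.
    sig = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "sig": sig}


def receipt_accepts(key: bytes, receipt: dict, gate: str, head: str) -> bool:
    """QG-2-style check: authentic, green, and bound to this gate and commit."""
    body = {k: v for k, v in receipt.items() if k != "sig"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, str(receipt.get("sig", ""))):
        return False  # forged or edited after signing
    # Replay protection: a valid receipt for another gate or commit is rejected.
    return body.get("gate") == gate and body.get("head") == head and body.get("passed") is True


key = b"local-secret"  # hypothetical key material
receipt = make_receipt(key, gate="a1b2c3", head="a91f3e2", passed=True)
print(receipt_accepts(key, receipt, "a1b2c3", "a91f3e2"))  # True
print(receipt_accepts(key, receipt, "a1b2c3", "b00c0de"))  # False: different commit
```

With real ed25519, `make_receipt` would sign with a private key and `receipt_accepts` would need only the public key, which is what makes verification by someone else, offline, possible.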

tausik verify → receipt

you › tausik verify --task T-219
tausik › pytest · ruff · tsc passed
         receipt  tausik-signed/v1
         alg      ed25519
         gate     a1b2c3… · head a91f3e2
         sig      3045022100…  ✓ valid
tausik › task done · QG-2 reads receipt ✓


When an agent tells you the build is green, you don't have to believe it. You hold a receipt that proves it — or proves it lied.

Task lifecycle

Three messages. Full lifecycle.

You describe what you want. The framework forces the steps you skip when you trust the agent too much.

01 session start

message 1 of 3

you › start working
tausik › session #74 opened
         handoff: fix(auth): retry on 401
         memory tail: 14 patterns · 3 gotchas
         CLAUDE.md refreshed
         last commit: a14c2bf · 2h ago
         ready.

Opens a session, loads the handoff from the previous one, and refreshes the memory tail in CLAUDE.md.

02 task lifecycle

message 2 of 3

you › fix the bug: button doesn't work on mobile
tausik › interviewing on edge cases…
         4 edge cases collected
         task T-219 · 3 AC drafted
         QG-0 passed
         writing code → pytest · ruff · tsc
         6 review agents · parallel
         AC evidence: 3 / 3
         QG-2 passed

Interviews you on edge cases, creates a task with acceptance criteria, writes the code, runs tests, lint, and 6 parallel review agents, then verifies that each AC has evidence in the DB.

03 ship

message 3 of 3

you › ship it
tausik › $ tausik verify
         cache hit · 10m TTL
         QG-2 passed
         staging 7 files
         commit a91f3e2 "fix(ui): mobile button"
         push to origin/main? [y/N]

Runs tausik verify (cached for 10 minutes), passes QG-2, commits, and asks before pushing.

What you get

Six things, none of them optional.

The framework is small on purpose. Every piece exists to enforce one specific behavior.

Quality gates

QG-0 blocks task start without goal + AC. QG-2 blocks task done without verify evidence.

Project memory

SQLite + FTS5 for patterns, gotchas, decisions, dead-ends. Re-injected at session start.

Verify-First

Heavy tests run in a separate verify step and are cached for 10 minutes, so closing a task takes milliseconds.

21 real-time hooks

Task gate, bash firewall, push gate, auto-format, memory audits — block bad actions before they happen.

105 MCP tools

Full programmatic access to the project DB. Works the same in Claude Code, Cursor, Qwen Code, and Windsurf.

Cross-project brain (optional)

Decisions, patterns, and gotchas mirrored to Notion, with projects identified by privacy-preserving hashes.

Quick start — 10 minutes (after your AI IDE is set up)

Four commands, then restart your IDE.

Bootstrap auto-detects your stack (Python, TS, Rust, Go) and enables matching quality gates.

```bash
# 1 · go to your project
cd your-project

# 2 · add tausik-core as a submodule
git submodule add https://github.com/Kibertum/tausik-core .tausik-lib

# 3 · bootstrap (detects stack, wires hooks)
python .tausik-lib/bootstrap/bootstrap.py --init

# 4 · ignore local state
echo ".tausik/" >> .gitignore
```

Restart your IDE and you're done.
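Auto-detection of this kind typically checks for marker files in the project root. A minimal sketch with an illustrative marker table covering the four stacks named above; the file list is an assumption, not what `bootstrap.py` actually checks.

```python
import tempfile
from pathlib import Path

# Illustrative marker files per stack (an assumption, not bootstrap.py's list).
STACK_MARKERS = {
    "python": ("pyproject.toml", "setup.py", "requirements.txt"),
    "typescript": ("tsconfig.json",),
    "rust": ("Cargo.toml",),
    "go": ("go.mod",),
}


def detect_stacks(root: Path) -> list[str]:
    """Return, sorted, the stacks whose marker files exist directly under root."""
    return sorted(
        stack
        for stack, markers in STACK_MARKERS.items()
        if any((root / name).is_file() for name in markers)
    )


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "pyproject.toml").write_text('[project]\nname = "demo"\n')
    (project / "go.mod").write_text("module example.com/demo\n")
    print(detect_stacks(project))  # ['go', 'python']
```

The detected list would then select which quality gates to enable; that mapping is not sketched here.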

Dogfooding

TAUSIK built TAUSIK.

Every feature, every refactor, every bug fix went through the same gates that ship with the framework. The numbers below are the dogfood project's own state.

800+

tasks closed — every one with a goal + AC

0

tasks closed without verify evidence

3,844

tests passing

0

core dependencies / phone-home calls

Snapshot at v1.5.0. Live numbers via tausik metrics.

Supported IDEs & agents

Six runtimes. One enforcement layer.

VSCode + Claude Extension

Officially tested

Cursor

Officially tested

Claude Code (CLI)

Expected · partial matrix

Qwen Code

Expected · partial matrix

Windsurf

Expected · partial matrix

Codex / OpenCode-style agents

Expected · manual validation

105 MCP tools and the 12 core skills work everywhere. Real-time hooks run in Claude Code and Qwen Code today; in Cursor and Windsurf, the same QG-0 and QG-2 checks are enforced at task transitions (task start and task done) rather than on every tool call.

Clarity

TAUSIK is not.

Setting expectations before you install.

Not a SaaS.

Everything runs locally. Your task DB lives in .tausik/ next to your code. No phone-home, no usage telemetry, no required account.

Not a model.

TAUSIK does not generate code. It guards an existing coding agent (Claude Code, Cursor, Qwen, Windsurf) and tracks its work.

Not a replacement for Cursor / Claude Code.

It runs inside them as MCP tools, hooks, and skills. You keep your existing IDE workflow.

Not a junior-onboarding tool.

It enforces practice for engineers who already know what good looks like — it does not teach you what an AC is.

Not auto-merging.

QG-0 and QG-2 make the agent supply a goal and proof; the agent still asks you to confirm before it pushes.

Landscape

How TAUSIK differs.

Each row is one capability; a dash in a tool's column means that tool does not address it natively.

Capability	TAUSIK	Aider	Cursor Rules	Continue	Claude Skills
Enforced task model (goal + AC)	✓ QG-0 hook blocks edits	—	—	—	—
Signed verify receipts (ed25519)	✓ tausik-signed/v1	—	—	—	—
Verify cache decoupled from close	✓ 10-min TTL	—	—	—	—
Tracked decisions / dead-ends	✓ SQLite + FTS5	—	—	—	—
Cross-project memory (opt-in)	✓ Notion-backed brain	—	—	—	—
Stack-aware verify suites	✓ 25 stacks	single-language	—	—	—
Multi-IDE same surface	✓ MCP + skills	CLI only	Cursor only	Continue only	Claude only
Editor-agnostic install	✓ Python script	✓	—	—	—

Answers

Common questions.

Do I need an extra API key on top of my AI IDE?

No. TAUSIK never calls any LLM directly. The agent (Claude Code / Cursor / Qwen / Windsurf) uses the API key you already configured for that IDE.

Does it phone home?

No. Everything is local: SQLite under .tausik/, hooks under .claude/. The optional Shared Brain only writes to your own Notion workspace if you wire it up.

Can my team share decisions and patterns?

Yes, via the optional Shared Brain. Per-project hashes keep names private; the cross-project content goes through a scrubbing linter before it lands in Notion.
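One way to give each project a stable but unreadable identifier is a keyed hash. A minimal sketch using HMAC-SHA256; the source does not say which scheme the Shared Brain uses, so the algorithm, the `project_hash` helper, and the 16-character truncation are all assumptions.

```python
import hashlib
import hmac
import secrets


def project_hash(project_name: str, key: bytes) -> str:
    """Opaque, stable ID for a project name (hypothetical scheme)."""
    # A keyed hash: without the key, nobody can confirm a name by hashing guesses,
    # which a plain SHA-256 of the name would allow.
    mac = hmac.new(key, project_name.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]  # 64-bit prefix; the length is an arbitrary choice


key = secrets.token_bytes(32)  # kept locally (or shared within the team), never uploaded
pid = project_hash("acme-billing", key)
print(pid == project_hash("acme-billing", key))  # True: same project, same ID
print(len(pid))  # 16
```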

Does it work on Windows?

Yes. The CLI ships .tausik/tausik.cmd for PowerShell/cmd. A few hooks (pre-commit shell, push gate) prefer Git Bash or WSL; the rest of the pipeline runs natively.

What about my existing AGENTS.md / CLAUDE.md?

TAUSIK manages a small dynamic block inside CLAUDE.md (session info and counts). Your existing instructions in CLAUDE.md or AGENTS.md stay intact: TAUSIK reads them but does not overwrite them.
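Managing a block inside a human-edited file usually means rewriting only the text between two marker comments. A minimal sketch of that approach; the `<!-- tausik:begin -->` / `<!-- tausik:end -->` markers and `upsert_block` are hypothetical, not what TAUSIK writes into CLAUDE.md.

```python
import re

BEGIN = "<!-- tausik:begin -->"  # hypothetical marker
END = "<!-- tausik:end -->"      # hypothetical marker
_BLOCK = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)


def upsert_block(document: str, block: str) -> str:
    """Replace the managed block in place, or append one; other text is untouched."""
    managed = f"{BEGIN}\n{block.rstrip()}\n{END}"
    if _BLOCK.search(document):
        # A callable replacement keeps backslashes in `managed` literal.
        return _BLOCK.sub(lambda _match: managed, document, count=1)
    if not document:
        return managed + "\n"
    separator = "\n" if document.endswith("\n") else "\n\n"
    return document + separator + managed + "\n"


original = "# Project rules\nAlways run tests before commit.\n"
once = upsert_block(original, "session: #74\nopen tasks: 3")
twice = upsert_block(once, "session: #75\nopen tasks: 2")
print(twice.startswith(original))  # True: the user's text is unchanged
print(twice.count(BEGIN))          # 1: the block was replaced, not duplicated
```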

Foundation

Built on SENAR.

SENAR

TAUSIK implements SENAR — an open engineering standard for AI-assisted development. Quality gates, session management, metrics, verification checklists — all defined in SENAR. See senar.tech for the spec.
