AI LAB · classroom simulation

Can you break the assistant?

Three levels. You'll attack a naive assistant, poison a document it reads, and then switch sides and fix a broken permission model. This is a personal assignment — work on your own. Every input is fake data — ACME Corp doesn't exist.

The whole lesson in one sentence:

A system prompt is not authorization. The model follows meaning, not keywords — which is why filtering can't save you, and why the only real defense is controlling what data the model is ever allowed to see.

Level 1

Direct injection

Break a naive assistant with intent, not keywords.

Level 2

Indirect injection

Feed the assistant a poisoned document to summarize.

Level 3

Authorization

You're the engineer now. Fix the leak.

Student sign-in required

Continue with Google

View live leaderboard →Professor login