AI LAB · classroom simulation
Can you break the assistant?
Three levels. You'll attack a naive assistant, poison a document it reads, and then switch sides and fix a broken permission model. This is a personal assignment — work on your own. Every input is fake data — ACME Corp doesn't exist.
The whole lesson in one sentence:
A system prompt is not authorization. The model follows meaning, not keywords — which is why filtering can't save you, and why the only real defense is controlling what data the model is ever allowed to see.
Level 1
Direct injection
Break a naive assistant with intent, not keywords.
Level 2
Indirect injection
Feed the assistant a poisoned document to summarize.
Level 3
Authorization
You're the engineer now. Fix the leak.
Student sign-in required
Sign in with your school Google account (*.ie.edu) to enter the lab.
Continue with Google