Faster and Safer Testing for AI Tutors
May 5, 2025
By Vassili Philippov
Imagine building the perfect virtual teacher: a tutor that knows how to teach any student.
But how do you test it? How do you compare teaching strategy A with strategy B? Randomized controlled trials (RCTs) are the gold standard, sure. But they’re like waiting for a glacier to melt: accurate, but slow (think hundreds of students and months of data collection).
What if, instead of real students, we had virtual ones?
Think of it like this: How do they train self-driving cars? They don’t just unleash them on the highway. They use incredibly detailed simulators—millions of miles driven, countless scenarios—before a single car hits the road.
A virtual student is the same idea. It’s a model that learns like a student.
“Wait, a what?” you may say. “Are we talking about sentient AI here? Is this even possible?”
Let’s break it down. Forget the sci-fi for a second.
What a Virtual Student Actually Does
At its core, a virtual student does one simple thing:
Predicts how a student will perform on the next learning task, based on their past learning history.
That’s it.
- Input: the learning history, i.e. (activity 1, response 1, time 1), (activity 2, response 2, time 2)…
- Output: the predicted response to activity 3.
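To make that input/output contract concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Interaction` record, the `predict_next_correct` function, and the smoothed success rate it uses are hypothetical stand-ins, far simpler than any real virtual student.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interaction:
    activity: str   # hypothetical activity identifier
    correct: bool   # the response, reduced to right/wrong for this sketch
    time: float     # when it happened, in hours since the first activity


def predict_next_correct(history: List[Interaction], next_activity: str) -> float:
    """Toy virtual student: probability of a correct response on next_activity.

    Assumption: the prediction is just a smoothed success rate over past
    attempts at the same activity. A real model would use far more signal.
    """
    attempts = [h for h in history if h.activity == next_activity]
    successes = sum(h.correct for h in attempts)
    # Laplace smoothing: with no relevant history the prediction is 0.5.
    return (successes + 1) / (len(attempts) + 2)


history = [
    Interaction("fractions", False, 0.0),
    Interaction("fractions", True, 1.0),
]
p = predict_next_correct(history, "fractions")
print(f"Predicted chance of a correct answer: {p:.2f}")
```

The point is the shape of the function, not the formula inside it: history goes in, a prediction for the next activity comes out.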
Beyond Right/Wrong: Rich Signals
And it’s not just about questions and answers. Think broader:
- Activities: Could be anything—reading a passage, solving a problem, watching a video, playing an educational game.
- Responses: Not just “right” or “wrong.” It’s the whole picture:
  - Did they get it correct?
  - How long did it take them? (Response time is a huge clue.)
  - Did they give up? (That tells you something even if they didn’t answer.)
  - When did they study? (Timing matters.)
If you can model that, you can try out a teaching strategy in simulation before it ever reaches a real student. You can run thousands of “virtual experiments” in hours, not months.
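Here is a rough sketch of what such a virtual experiment could look like in Python. Big caveat: the virtual student below is a hand-written toy with made-up learning dynamics (each practice closes 20% of the gap to full mastery), not a model trained on real data. The two strategies, `round_robin` and `weakest_first`, and every constant are assumptions chosen only to show the shape of the loop.

```python
import random
from statistics import mean

N_ITEMS = 5  # hypothetical number of items to learn


def round_robin(history):
    # Strategy A: cycle through the items in a fixed order.
    return len(history) % N_ITEMS


def weakest_first(history):
    # Strategy B: practice the item with the lowest observed success rate.
    def rate(item):
        results = [ok for it, ok in history if it == item]
        return (sum(results) + 1) / (len(results) + 2)
    return min(range(N_ITEMS), key=rate)


def run_virtual_student(strategy, seed, n_steps=40):
    """Simulate one toy student and return their final average mastery."""
    rng = random.Random(seed)
    mastery = [rng.uniform(0.05, 0.6) for _ in range(N_ITEMS)]
    history = []
    for _ in range(n_steps):
        item = strategy(history)
        correct = rng.random() < mastery[item]
        history.append((item, correct))
        # Assumed learning rule: each practice closes 20% of the remaining gap.
        mastery[item] += 0.2 * (1.0 - mastery[item])
    return mean(mastery)


def run_experiment(strategy, n_students=200):
    # Same seeds for both strategies, so both face the same virtual cohort.
    return mean(run_virtual_student(strategy, seed) for seed in range(n_students))


score_a = run_experiment(round_robin)
score_b = run_experiment(weakest_first)
print(f"A (round robin):   {score_a:.3f}")
print(f"B (weakest first): {score_b:.3f}")
```

Swap in a better student model and the loop stays the same. The comparison is only as trustworthy as that model, which is why the quality of the virtual student matters so much.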
This Isn’t Fantasy: It’s Knowledge Tracing (KT)
This approach has a name: Knowledge Tracing (KT).
KT models track a student’s knowledge state—like a GPS for learning.
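To give a feel for what “tracking a knowledge state” means, here is a sketch of one classic and deliberately simple KT model, Bayesian Knowledge Tracing. It is shown as an example of the family, not as the model behind any particular product, and the parameter values (`p_learn`, `p_slip`, `p_guess`) are illustrative assumptions.

```python
def bkt_update(p_known, correct, p_learn=0.2, p_slip=0.1, p_guess=0.25):
    """One Bayesian Knowledge Tracing step for a single skill.

    p_known is the current belief that the student knows the skill.
    Returns the updated belief after observing one response.
    """
    if correct:
        # Correct answer: either they knew it and didn't slip, or they guessed.
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        # Wrong answer: either they knew it and slipped, or they didn't know it.
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # The student may also learn the skill during this practice opportunity.
    return posterior + (1 - posterior) * p_learn


def p_correct(p_known, p_slip=0.1, p_guess=0.25):
    """Predicted probability that the next response is correct."""
    return p_known * (1 - p_slip) + (1 - p_known) * p_guess


p = 0.3  # assumed prior belief that the skill is already known
for answer in [False, True, True]:
    p = bkt_update(p, answer)
    print(f"answered {'right' if answer else 'wrong'} -> P(known) = {p:.2f}")
```

The “GPS” here is a single number per skill that moves up after correct answers and down after wrong ones. Modern KT models keep a much richer state, but the idea is the same.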
Sounds complex? It is. Predicting human behavior is hard. But in specific areas, it’s already working.
Example: learning vocabulary. KT-powered algorithms can beat traditional spaced repetition. They choose the optimal word to review now, based on your entire history. It’s like having a personal tutor who knows your brain.
Think about this:
- Traditional spaced repetition: show word after 1 day, 3 days, 7 days…
- KT-powered spaced repetition: choose the optimal word based on your full learning history and a simulation of possible futures.
The difference? It’s like a doctor prescribing a fixed dose vs. adjusting the dose based on your blood tests—personalized vs. one-size-fits-all.
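In code, the contrast might look like the sketch below. The fixed rule uses the 1, 3, 7 day intervals from the list above. The “personalized” side is a hypothetical stand-in for a KT model: an exponential forgetting curve whose strength doubles with each success and halves with each failure. That formula is an assumption made for illustration, and the sketch leaves out the simulation of possible futures entirely.

```python
import math

FIXED_INTERVALS_DAYS = [1, 3, 7]  # the fixed schedule from the text


def fixed_schedule_due(reviews_done, days_since_last_review):
    """Traditional rule: a word is due once its fixed interval has passed."""
    index = min(reviews_done, len(FIXED_INTERVALS_DAYS) - 1)
    return days_since_last_review >= FIXED_INTERVALS_DAYS[index]


def predicted_recall(days_since_last_review, successes, failures):
    """Hypothetical stand-in for a KT model's recall prediction.

    Assumption: exponential forgetting, with a memory strength (in days)
    that doubles per past success and halves per past failure.
    """
    strength_days = 2.0 ** (successes - failures)
    return math.exp(-days_since_last_review / strength_days)


def choose_word(words):
    """Model-based rule: review the word most likely to be forgotten now."""
    weakest = min(
        words,
        key=lambda w: predicted_recall(w["days"], w["successes"], w["failures"]),
    )
    return weakest["word"]


words = [
    {"word": "ubiquitous", "days": 2.0, "successes": 3, "failures": 0},
    {"word": "ephemeral", "days": 1.0, "successes": 0, "failures": 2},
    {"word": "candid", "days": 5.0, "successes": 4, "failures": 1},
]
print("Review next:", choose_word(words))
print("Fixed schedule says 'ephemeral' is due:", fixed_schedule_due(2, 1.0))
```

In this made-up example the learner has failed “ephemeral” twice, so the model-based rule picks it right away, while the fixed schedule would still make it wait for its 7-day slot.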
We’re not talking about replacing teachers. We’re talking about giving them superpowers. We’re talking about moving education from intuition to data.