Data Mining for Understanding and Detecting Student Behaviors
Students use interactive learning environments in a considerable variety of ways. I use data mining, in combination
with human observations, to understand student behavior better and to develop behavior detectors
that can drive adaptive responses to differences in student behavior.
Student behavior presents interesting challenges for data mining and machine learning. For one thing, it is often
valuable to model student behavior at multiple grain-sizes at once: it matters both which students engage in a
given behavior and when a student is engaging in that behavior. I use Latent Response Models (Maris, 1995), a
type of hierarchical regression model, to develop detectors that are effective at both grain-sizes at once.
In such a detector, a prediction is made about whether each individual student action is an instance of the
behavior being studied. These action-by-action predictions usually involve unlabeled data, since human observations
operate at larger, episode-by-episode grain-sizes. The action-by-action predictions are then aggregated into
episode-by-episode or student-by-student frequency predictions, which do involve labeled data.
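As a rough sketch of this two-level structure (in Python, with hypothetical features and illustrative weights; the actual Latent Response Models in my work fit their parameters against human-labeled observations), per-action probabilities can be aggregated into per-episode frequencies as follows:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_action_labels(action_features, weights):
    """Inner, action-level layer: probability that each individual
    action is an instance of the behavior (no action-level labels)."""
    return sigmoid(action_features @ weights)

def predict_episode_frequency(action_features, episode_ids, weights):
    """Outer layer: aggregate action-by-action predictions into
    per-episode frequencies, which can be fit to human observations."""
    p = predict_action_labels(action_features, weights)
    return {int(e): p[episode_ids == e].mean()
            for e in np.unique(episode_ids)}

# Hypothetical data: six actions across two observed episodes.
X = np.array([[1.0, 0.2], [1.0, 0.9], [1.0, 0.8],
              [1.0, 0.1], [1.0, 0.0], [1.0, 0.3]])
episodes = np.array([0, 0, 0, 1, 1, 1])
w = np.array([-2.0, 4.0])  # illustrative, not fitted

freq = predict_episode_frequency(X, episodes, w)
```

In practice the weights are chosen so that the aggregated frequencies best match the human-coded episode labels, which is what lets the model make action-level predictions without action-level training data.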
This sort of model can be used to determine what individual actions make up, and are associated with, a broader
category of behavior. This can be done either by inspecting the model itself, or by analyzing the predictions
it makes. My work includes examples of both types of analysis: I inspected features that compose models of
gaming the system and off-task behavior in order to understand these behaviors better. I
also investigated how likely a student is to know the skills he or she games on, by comparing predictions of which actions
students game on to predictions of the likelihood that the student knows the skill exercised at each step.
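A minimal sketch of that comparison, with made-up per-action probabilities (in the actual analysis, the skill-knowledge estimates came from the tutor's knowledge-tracing model and the gaming probabilities from a trained detector):

```python
import numpy as np

# Hypothetical per-action quantities for one student:
# p_game[i] -- detector's probability that action i is gaming
# p_know[i] -- estimated probability the student knows the skill
#              exercised at that step (e.g., from knowledge tracing)
p_game = np.array([0.9, 0.8, 0.1, 0.2, 0.7, 0.05])
p_know = np.array([0.2, 0.3, 0.9, 0.8, 0.25, 0.95])

# Compare skill knowledge on actions predicted to be gaming
# versus the remaining actions.
gamed = p_game > 0.5
mean_know_gamed = p_know[gamed].mean()
mean_know_other = p_know[~gamed].mean()
```

A gap between the two means would suggest that gaming is concentrated on steps whose skills the student has not yet mastered.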
In addition, model-generation can be used to distinguish whether an apparent category of behavior is unitary, or actually
consists of multiple separable categories of behavior. For example, by training a model of gaming the system, and determining
which students were poorly captured by that model, we determined that gaming the system actually splits into two
categories of behavior. In one category, "harmful" gaming, students repeatedly game the system on the same set of poorly-known problem steps across problems. In the other category, "non-harmful" gaming, students game well-known but
time-consuming steps in order to devote more time to poorly-known steps. If models are trained to treat these
categories of behavior as separate, each category is distinguished both from the other and from non-gaming behavior. If a model is instead trained to treat the two categories as identical, it captures most of the harmful gaming behavior but only
a small proportion of the non-harmful gaming behavior.
Another important issue is how well a detector transfers across tutor lessons, and even to different tutoring environments. A detector that works in only a
single context is unlikely to have a major impact, no matter how effective it is. Using meta-analytic techniques, my colleagues and I have determined that training a gaming detector on data from multiple lessons yields a detector that shows little degradation in performance from training lessons to test lessons. By comparison, a detector trained on a single lesson shows considerable degradation from the training lesson to test lessons.
Baker, R.S.J.d. (in press) Modeling and Understanding Students' Off-Task Behavior in Intelligent Tutoring Systems. To appear in Proceedings of ACM CHI 2007: Computer-Human Interaction. [pdf]
Baker, R.S.J.d., Corbett, A.T., Koedinger, K.R., Roll, I. (2006) Generalizing Detection of Gaming the System Across a Tutoring Curriculum. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, 402-411. [pdf]
Baker, R.S., Corbett, A.T., Koedinger, K.R. (2004) Detecting Student Misuse of Intelligent Tutoring Systems. Proceedings of the 7th International Conference on Intelligent Tutoring Systems.
Baker, R.S., Corbett, A., Koedinger, K., Roll, I. (2005) Detecting When Students Game The System, Across Tutor Subjects and Classroom Cohorts. Proceedings of User Modeling 2005, 220-224. [pdf]
Collaborators and co-authors