Ryan Shaun Joazeiro de Baker


Data Mining for Understanding and Detecting Student Behaviors

Project Description

Students use interactive learning environments in a considerable variety of ways. I use data mining, in combination with human observations, to better understand student behavior and to develop behavior detectors that can drive adaptive responses to differences in student behavior.

Student behavior presents interesting challenges for data mining and machine learning. For one thing, it is often valuable to model student behavior at multiple grain-sizes at once: it matters both which students engage in a given behavior and when a student is engaging in it. I use Latent Response Models (Maris, 1995), a type of hierarchical regression model, to develop detectors that are effective at both grain-sizes at once. In such a detector, a prediction is made about whether each individual student action is an instance of the behavior being studied. These action-by-action predictions usually involve unlabeled data, since human observations operate at larger, episode-by-episode grain-sizes. The action-by-action predictions are then aggregated into episode-by-episode or student-by-student frequency predictions, which do involve labeled data.
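The two-grain-size structure can be sketched as follows. This is a minimal illustration, not the actual detectors described above: it assumes a simple logistic action-level model, and all function and variable names are hypothetical.

```python
import numpy as np

def action_probs(features, w):
    # Action-level predictions: the probability that each individual
    # action is an instance of the behavior. No labels exist at this
    # grain size.
    return 1.0 / (1.0 + np.exp(-features @ w))

def episode_frequency(features, w, episode_ids):
    # Aggregate action-by-action predictions into per-episode predicted
    # frequencies -- the grain size at which human observations (labels)
    # are available.
    p = action_probs(features, w)
    return np.array([p[episode_ids == e].mean()
                     for e in np.unique(episode_ids)])

def loss(w, features, episode_ids, episode_labels):
    # Squared error between predicted and observed episode frequencies.
    # This is the only place labeled data enters the model, so fitting w
    # trains the action-level predictions indirectly.
    pred = episode_frequency(features, w, episode_ids)
    return float(((pred - episode_labels) ** 2).mean())
```

Minimizing `loss` over `w` (e.g., with any standard optimizer) fits the action-level model using only episode-level labels, which is the essential trick of this hierarchical setup.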

This sort of model can be used to determine which individual actions make up, and are associated with, a broader category of behavior. This can be done either by inspecting the model itself or by analyzing the predictions it makes. My work includes examples of both types of analysis: I inspected the features that compose models of gaming the system and off-task behavior in order to understand these behaviors better. I also investigated how likely a student is to know the skills he or she games on, by comparing predictions of which actions students game on to predictions of the likelihood that the student knows the skill exercised at each step.
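Skill-knowledge estimates of this kind are commonly produced by Bayesian knowledge tracing in these tutors; the sketch below assumes that approach, with illustrative (not fitted) parameter values, and then pairs the running knowledge estimate with per-step gaming flags. All names are hypothetical.

```python
def bkt_update(p_know, correct, p_guess=0.2, p_slip=0.1, p_learn=0.1):
    # One Bayesian knowledge tracing step: Bayesian posterior on whether
    # the student knows the skill, given a correct/incorrect response,
    # followed by the learning transition. Parameters are illustrative.
    if correct:
        post = (p_know * (1 - p_slip)
                / (p_know * (1 - p_slip) + (1 - p_know) * p_guess))
    else:
        post = (p_know * p_slip
                / (p_know * p_slip + (1 - p_know) * (1 - p_guess)))
    return post + (1 - post) * p_learn

def mean_knowledge(responses, gaming_flags, p_know0=0.3):
    # Track estimated knowledge across a response sequence and report
    # the mean estimate on steps flagged as gaming vs. the rest --
    # a simple way to ask how likely students are to know the skills
    # they game on.
    p = p_know0
    gamed, other = [], []
    for correct, gaming in zip(responses, gaming_flags):
        (gamed if gaming else other).append(p)
        p = bkt_update(p, correct)
    avg = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return avg(gamed), avg(other)
```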

In addition, model-generation can be used to distinguish whether an apparent category of behavior is unitary, or actually consists of multiple separable categories of behavior. For example, by training a model of gaming the system and determining which students were poorly captured by that model, we determined that gaming the system actually splits into two categories of behavior. In one category, "harmful" gaming, students repeatedly game the system on the same set of poorly known problem steps across problems. In another category, "non-harmful" gaming, students game well-known but time-consuming steps, in order to devote more time to poorly-known steps. If models are trained to treat these categories of behavior as separate, the two categories are separated both from each other and from non-gaming behavior. If a model is trained to treat the two categories as identical, the model captures most of the harmful gaming behavior and a small proportion of the non-gaming behavior.
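The "poorly captured students" step can be sketched as a per-student residual check: compare each student's predicted and observed gaming frequencies, and flag students whose error is large as candidates for a distinct behavior category. This is a minimal illustration; the threshold and names are hypothetical.

```python
import numpy as np

def poorly_captured_students(pred, obs, student_ids, threshold=0.25):
    # Per-student mean absolute error between predicted and observed
    # gaming frequencies. Students whose error exceeds the (illustrative)
    # threshold are poorly captured by the model and may be exhibiting
    # a separate category of behavior worth examining by hand.
    flagged = []
    for s in np.unique(student_ids):
        mask = student_ids == s
        err = np.abs(pred[mask] - obs[mask]).mean()
        if err > threshold:
            flagged.append(int(s))
    return flagged
```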

Another important issue is how well a detector will transfer across tutor lessons and even to different tutor environments. A detector which only works in a single context is unlikely to have a major impact, no matter how effective it is. Using meta-analytic techniques, my colleagues and I have determined that training a gaming detector with data from multiple lessons results in a system that shows little degradation of performance from training lessons to test lessons. By comparison, a detector trained on a single lesson shows considerable degradation from the training lesson to test lessons.
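A standard way to measure this kind of transfer is a leave-one-lesson-out evaluation: train the detector on all lessons but one, test on the held-out lesson, and compare test performance to training-lesson performance. The harness below is a generic sketch, not the meta-analytic procedure itself; the plugged-in `train` and `evaluate` functions are deliberately trivial placeholders.

```python
import numpy as np

def leave_one_lesson_out(lessons, train, evaluate):
    # For each tutor lesson, train a detector on data from all other
    # lessons and test it on the held-out lesson. `lessons` maps a
    # lesson name to a (features, labels) pair; `train` and `evaluate`
    # are supplied by the caller.
    results = {}
    for held_out in lessons:
        others = [lessons[name] for name in lessons if name != held_out]
        X = np.vstack([x for x, _ in others])
        y = np.concatenate([labels for _, labels in others])
        model = train(X, y)
        results[held_out] = evaluate(model, *lessons[held_out])
    return results

# Illustrative plug-ins: a majority-class "detector" and accuracy scoring.
def train(X, y):
    return 1 if y.mean() > 0.5 else 0

def evaluate(model, X, y):
    return float((y == model).mean())
```

Comparing these held-out results against within-lesson results gives the degradation estimate discussed above: little degradation suggests the detector generalizes across lessons.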


Baker, R.S.J.d. (in press) Modeling and Understanding Students' Off-Task Behavior in Intelligent Tutoring Systems. To appear in Proceedings of ACM CHI 2007: Computer-Human Interaction.

Baker, R.S.J.d., Corbett, A.T., Koedinger, K.R., Roll, I. (2006) Generalizing Detection of Gaming the System Across a Tutoring Curriculum. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, 402-411.

Baker, R.S., Corbett, A.T., Koedinger, K.R. (2004) Detecting Student Misuse of Intelligent Tutoring Systems. Proceedings of the 7th International Conference on Intelligent Tutoring Systems, 531-540.

Baker, R.S., Corbett, A.T., Koedinger, K.R., Roll, I. (2005) Detecting When Students Game the System, Across Tutor Subjects and Classroom Cohorts. Proceedings of User Modeling 2005, 220-224.

Collaborators and co-authors

Albert Corbett
Kenneth Koedinger
Ido Roll
Joseph Beck
Tom Mitchell
