Latent feature extraction for process data via multidimensional scaling


Computer-based interactive items have become prevalent in recent educational assessments. In such items, the detailed human-computer interaction process, known as the response process, is recorded in a log file. The recorded response processes provide great opportunities to understand individuals' problem-solving processes. However, these data are difficult to analyze because they are high-dimensional sequences in a nonstandard format. This paper aims to extract useful information from response processes. In particular, we consider an exploratory analysis that extracts latent variables from process data through a multidimensional scaling framework. A dissimilarity measure is described to quantify the discrepancy between two response processes. The proposed method is applied to both simulated data and real process data from 14 PSTRE items in PIAAC 2012. A prediction procedure is used to examine the information contained in the extracted latent variables. We find that the extracted latent variables preserve a substantial amount of the information in the response processes and have reasonable interpretability. We also show empirically that process data contain more information than classic binary item responses in terms of out-of-sample prediction of many variables.
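
To make the general idea concrete, the following is a minimal sketch of feature extraction by multidimensional scaling on action sequences. It is an illustration under stated assumptions, not the procedure of this paper: the dissimilarity is a plain edit distance standing in for the measure described here, the embedding uses classical (Torgerson) scaling rather than whatever optimization the paper adopts, and the action names and the function names `edit_distance`, `dissimilarity_matrix`, and `classical_mds` are hypothetical.

```python
import numpy as np


def edit_distance(a, b):
    """Levenshtein distance between two action sequences.

    Assumption: a stand-in dissimilarity, not the measure used in the paper.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def dissimilarity_matrix(processes):
    """Symmetric matrix of pairwise dissimilarities with a zero diagonal."""
    n = len(processes)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = edit_distance(processes[i], processes[j])
    return D


def classical_mds(D, k):
    """Classical MDS: embed n objects in k dimensions from dissimilarities D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical response processes: each is a sequence of recorded actions.
processes = [
    ["start", "open_mail", "reply", "submit"],
    ["start", "open_mail", "submit"],
    ["start", "open_folder", "move", "move", "submit"],
    ["start", "submit"],
]

D = dissimilarity_matrix(processes)
features = classical_mds(D, k=2)  # one row of latent features per process
```

Each row of `features` can then serve as a low-dimensional summary of one response process, for example as input to a regression model that predicts an external variable.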