An iterative approach to developing, refining and validating machine-scored constructed response assessments

PI Name
Luanna Prevost
University of South Florida


Abstract 1


Mark Urban-Lurain, Michigan State University, Overall Project PI
John Merrill, Michigan State University, PI
Melanie Cooper, Michigan State University, co-PI
Carl Lira, Michigan State University, co-PI
Kevin Haudek, Michigan State University, co-PI
Andrea Bierema, Michigan State University, Post-doctoral Researcher
Anne-Marie Hoskinson, Michigan State University, Post-doctoral Researcher
Rosa Moscarella, Michigan State University, Post-doctoral Researcher
Matthew Steele, Michigan State University, Post-doctoral Researcher
Alexandria Mazur, Michigan State University, Undergraduate Researcher
Paula Lemons, University of Georgia, PI
Jennifer Kaplan, University of Georgia, PI
Mark Farmer, University of Georgia, co-PI
Tessa Andrews, University of Georgia, Senior Personnel
Jill McCourt, University of Georgia, Post-doctoral Researcher
Kyle Jennings, University of Georgia, Graduate Researcher
Tre'cherie Crumbs, University of Georgia, Undergraduate Researcher
Alex Lyford, University of Georgia, Undergraduate Researcher
Luanna Prevost, University of South Florida, PI
Kelli Hayes, University of South Florida, Graduate Researcher
Michelle Smith, University of Maine, PI
Karen Pelletreau, University of Maine, Post-doctoral Researcher
Scott Merrill, University of Maine, Undergraduate Researcher
Jenny Knight, University of Colorado, Boulder, PI
Jeremy Rentsch, University of Colorado, Boulder, Post-doctoral Researcher
Ross Nehm, SUNY-Stony Brook, PI
Minsu Ha, SUNY-Stony Brook, Post-doctoral Researcher
Mary Anne Sydlik, Western Michigan University, PI - Project Evaluation
Eva Ngulo, Western Michigan University, Graduate Researcher


The Automated Analysis of Constructed Response (AACR) project seeks to develop a community of faculty who use evidence-based practices to improve instruction by giving them novel platforms for written assessment. Written assessments give faculty in-depth evidence of student learning because they capture student understanding in students' own words. However, written assessments are used infrequently in undergraduate biology courses, particularly courses with high student enrollment, because of the time and effort needed to read responses and provide feedback.


The primary AACR goals are to (1) provide the means for faculty to gather evidence on student learning using formative written assessments and computerized analysis tools and (2) facilitate widespread use of these written assessments. The goal of the question development group within AACR is twofold: (1) to develop a suite of formative written assessments in biology, chemistry and statistics that uncover student conceptual difficulties, and (2) to develop text analysis and machine learning models that automatically analyze student writing, providing faculty with immediate feedback.


Our approach is to use pre-existing concept inventories, the science education literature, and interviews with faculty to identify areas of biology, chemistry and statistics where students have persistent conceptual difficulties. We then develop questions that target these conceptual difficulties. Questions are refined based on input from faculty and data from student interviews. Questions are piloted and revised, so answers can be analyzed by computers. After we have developed a question, we use two approaches to analyze student answers: text analysis and machine learning. Both methods identify and extract words and phrases from student writing that are used to build models of human scoring. The models classify the key concepts or correctness of a response and do so in high agreement with human scoring. Finally, models are piloted in the classrooms of members of our faculty learning communities at six different institutions.
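
The modeling step described above can be illustrated with a minimal sketch. It assumes a bag-of-words multinomial Naive Bayes classifier, chosen here only for brevity and not necessarily the technique AACR uses. The example responses, the score labels, and all function names are hypothetical. Agreement between machine and human scores is measured with Cohen's kappa, one common chance-corrected agreement statistic.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a response and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(responses, human_scores):
    """Count word frequencies per human-assigned score (hypothetical helper)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(human_scores)
    vocab = set()
    for text, label in zip(responses, human_scores):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the most probable score using add-one (Laplace) smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in label_counts.items():
        logp = math.log(n / total)  # prior from the human-scored training set
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two equal-length lists of labels."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical human-scored responses to a question about plant mass.
train_responses = [
    "carbon dioxide from the air becomes sugar",
    "photosynthesis fixes carbon dioxide from air into glucose",
    "mass comes from soil nutrients",
    "roots absorb mass from the soil and water",
]
train_scores = ["correct", "correct", "incorrect", "incorrect"]
model = train_naive_bayes(train_responses, train_scores)

# Held-out responses that a human has also scored.
new_responses = [
    "the plant takes carbon dioxide from the air",
    "the roots take up nutrients from the soil",
]
human_scores = ["correct", "incorrect"]
machine_scores = [predict(model, r) for r in new_responses]
kappa = cohen_kappa(human_scores, machine_scores)
```

In practice a scoring model of this kind would be trained on far more responses than this toy set, and agreement would be checked on responses held out from training.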


We have developed 53 questions in biology, chemistry, chemical engineering, and statistics. We have collected responses from 7854 students and provided 123 reports to faculty. We have improved our question development process through the use of clustering and multinomial logistic regression analyses, and we have created more interactive and user-friendly feedback reports for faculty.

Broader Impacts

Currently, 19 faculty are using AACR assessments and participating in our faculty learning communities (FLCs). We have also recruited 12 new faculty members across our institutions to join our FLCs and use AACR assessments and resources. Additionally, we have expanded to collaborate with faculty in physics at Michigan State University and Stony Brook University and in statistics at Grand Valley State University. We disseminated our findings through 37 presentations and 12 journal articles. AACR products are currently available to faculty via two websites.

Unexpected Challenges

Members of our faculty learning communities would like to broaden their use of AACR questions to a wider range of biology topics. To meet this challenge, we are developing AACR questions for ecology, cell biology and other topics. We are also employing new statistical approaches to facilitate rubric development and improve our scoring models.


• Kaplan, J. J. and Haudek, K. C. and Ha, M. and Rogness, N. and Fisher, D. (2014). Using Lexical Analysis Software to Assess Student Writing in Statistics. Technology Innovations in Statistics Education. 8 (1).
• Urban-Lurain, Mark and Cooper, Melanie M. and Haudek, Kevin C. and Kaplan, Jennifer J. and Knight, J. K. and Lemons, Paula P. and Lira, Carl T. and Merrill, John E. and Nehm, Ross H. and Prevost, Luanna B. and Smith, Michelle K. and Sydlik, Maryanne (2015). Expanding a National Network for Automated Analysis of Constructed Response Assessments to Reveal Student Thinking in STEM. Computers in Education Journal. 6 (1), 65-81.
• Weston, Michele and Prevost, L. B. and Haudek, K. C. and Merrill, J. and Urban-Lurain, M. (2014). Examining the Impact of Question Surface Features on Students' Answers to Constructed Response Questions on Photosynthesis. CBE Life Sci Educ. 14 (2). DOI: 10.1187/cbe.14-07-0110
• Moharreri, Kayhan and Ha, Minsu and Nehm, Ross H. (2014). EvoGrader: an online formative assessment tool for automatically evaluating written evolutionary explanations. Evolution: Education and Outreach. 7 (1), 1-14. DOI: 10.1186/s12052-014-0015-2
