Inter-Rater Reliability Analysis of Good's Program Summary Analysis Scheme

Byckling P., Kuittinen M., Nevalainen S., Sajaniemi J. (2004)

Proceedings of the 16th Annual Workshop of the Psychology of Programming Interest Group (PPIG 2004). Institute of Technology Carlow, Ireland, 170-184.

Abstract: In computer science education and in research on the psychology of programming, program summary analysis has been used to characterize the mental models of novice and expert programmers and to measure learning outcomes for programs and programming concepts. This paper reports an investigation in which three raters applied Good's program summary analysis scheme, which consists of two independent classifications of program summary segments: information types and object description categories. Problems in using the scheme, as well as differences between the raters, were recorded and analyzed. The findings indicate that most of the observed inter-rater differences can be avoided by improving the scheme and its documentation. The only open problem concerns distinguishing descriptions of data from descriptions of activities in cases where the specific words used, or the abstractness of the expression, may affect raters' interpretation of the information type.

Last updated: July 14, 2005

saja.fi@gmail.com