ERIC - Search Results

Showing 1 to 15 of 19 results

An Examination of Individual Ability Estimation and Classification Accuracy under Rapid Guessing Misidentifications

Peer reviewed

Rios, Joseph – Applied Measurement in Education, 2022

To mitigate the deleterious effects of rapid guessing (RG) on ability estimates, several rescoring procedures have been proposed. Underlying many of these procedures is the assumption that RG is accurately identified. At present, there have been minimal investigations examining the utility of rescoring approaches when RG is misclassified, and…

Descriptors: Accuracy, Guessing (Tests), Scoring, Classification

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Evaluating Human Scoring Using Generalizability Theory

Peer reviewed

Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020

Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…

Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries

Coefficient β as Extension of KR-21 Reliability for Summed and Scaled Scores for Polytomously-Scored Tests

Peer reviewed

Almehrizi, Rashid S. – Applied Measurement in Education, 2021

KR-21 reliability and its extension (coefficient α) gives the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…

Descriptors: Test Reliability, Scores, Scoring, Computation

Partial Credit in Answer-Until-Correct Multiple-Choice Tests Deployed in a Classroom Setting

Peer reviewed

Slepkov, Aaron D.; Godfrey, Alan T. K. – Applied Measurement in Education, 2019

The answer-until-correct (AUC) method of multiple-choice (MC) testing involves test respondents making selections until the keyed answer is identified. Despite attendant benefits that include improved learning, broad student adoption, and facile administration of partial credit, the use of AUC methods for classroom testing has been extremely…

Descriptors: Multiple Choice Tests, Test Items, Test Reliability, Scores

Statistically Comparing the Performance of Multiple Automated Raters across Multiple Items

Peer reviewed

Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017

Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…

Descriptors: Automation, Scoring, Comparative Analysis, Test Items

Rater Language Background as a Source of Measurement Error in the Testing of English Language Learners

Peer reviewed

Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012

We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…

Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers

Detecting and Correcting Scale Drift in Test Equating: An Illustration from a Large Scale Testing Program

Peer reviewed

Puhan, Gautam – Applied Measurement in Education, 2009

The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…

Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory

An Empirical Examination of the Impact of Group Discussion and Examinee Performance Information on Judgments Made in the Angoff Standard-Setting Procedure

Peer reviewed

Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009

Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…

Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring

A Qualitative Investigation of Panelists' Experiences of Standard Setting Using Two Variations of the Bookmark Method

Peer reviewed

Hein, Serge F.; Skaggs, Gary E. – Applied Measurement in Education, 2009

Only a small number of qualitative studies have investigated panelists' experiences during standard-setting activities or the thought processes associated with panelists' actions. This qualitative study involved an examination of the experiences of 11 panelists who participated in a prior, one-day standard-setting meeting in which either the…

Descriptors: Focus Groups, Standard Setting, Cutting Scores, Cognitive Processes

Evaluating Scoring Procedures for Context-Dependent Item Sets.

Peer reviewed

Keller, Lisa A.; Swaminathan, Hariharan; Sireci, Stephen G. – Applied Measurement in Education, 2003

Evaluated two strategies for scoring context-dependent test items: ignoring the dependence and scoring dichotomously or modeling the dependence through polytomous scoring. Results for data from 38,965 examinees taking a professional examination show that dichotomous scoring may overestimate test information, but polytomous scoring may underestimate…

Descriptors: Adults, Licensing Examinations (Professions), Scoring, Test Items

Recommendations for Preparing and Scoring Constructed-Response Items: What the Experts Say

Peer reviewed

Hogan, Thomas P.; Murphy, Gavin – Applied Measurement in Education, 2007

We determined the recommendations for preparing and scoring constructed-response (CR) test items in 25 sources (textbooks and chapters) on educational and psychological measurement. The project was similar to Haladyna's (2004) analysis for multiple-choice items. We identified 12 recommendations for preparing CR items given by multiple sources,…

Descriptors: Test Items, Scoring, Test Construction, Educational Indicators

Partial-Credit Scoring Methods for Multiple-Choice Tests.

Peer reviewed

Frary, Robert B. – Applied Measurement in Education, 1989

Multiple-choice response and scoring methods that attempt to determine an examinee's degree of knowledge about each item in order to produce a total test score are reviewed. There is apparently little advantage to such schemes; however, they may have secondary benefits such as providing feedback to enhance learning. (SLD)

Descriptors: Knowledge Level, Multiple Choice Tests, Scoring, Scoring Formulas

Identifying Possible Sources of Differential Functioning Using Differential Bundle Functioning with Polytomously Scored Data

Peer reviewed

McCarty, F. A.; Oshima, T. C.; Raju, Nambury S. – Applied Measurement in Education, 2007

Oshima, Raju, Flowers, and Slinde (1998) described procedures for identifying sources of differential functioning for dichotomous data using differential bundle functioning (DBF) derived from the differential functioning of items and test (DFIT) framework (Raju, van der Linden, & Fleer, 1995). The purpose of this study was to extend the…

Descriptors: Rating Scales, Test Bias, Scoring, Test Items

Technological Innovations in Large-Scale Assessment.

Peer reviewed

Zenisky, April L.; Sireci, Stephen G. – Applied Measurement in Education, 2002

Reviews and illustrates some of the current technological developments in computer-based testing, focusing on novel item formats and automated scoring methodologies. The review shows a number of innovations being researched and implemented. (SLD)

Descriptors: Educational Innovation, Educational Technology, Elementary Secondary Education, Large Scale Assessment
