ERIC - Search Results

Publication Date

In 2026	0
Since 2025	1
Since 2022 (last 5 years)	1
Since 2017 (last 10 years)	7
Since 2007 (last 20 years)	14

Descriptor

Testing Problems	98
Test Items	22
Achievement Tests	18
Test Validity	18
Test Interpretation	17
Scores	14
Test Construction	14
Mathematical Models	13
Test Reliability	13
Measurement Techniques	12
Elementary Secondary Education	11
Item Response Theory	11
Multiple Choice Tests	10
Response Style (Tests)	10
Test Bias	10
Test Use	10
Educational Assessment	9
Educational Testing	9
Scoring	9
Test Wiseness	9
College Entrance Examinations	8
Guessing (Tests)	8
Item Analysis	8
Measurement	8
Statistical Analysis	8
More ▼

Source

Journal of Educational…

Publication Type

Journal Articles	75
Reports - Research	39
Reports - Evaluative	16
Book/Product Reviews	8
Opinion Papers	7
Information Analyses	2
Speeches/Meeting Papers	2
Reports - Descriptive	1

Education Level

Secondary Education	4
Elementary Secondary Education	1

Audience

Researchers	3
Practitioners	1

Location

Netherlands	1
New Jersey	1
Rhode Island	1

Laws, Policies, & Programs

Elementary and Secondary…

Assessments and Surveys

SAT (College Admission Test)	7
Iowa Tests of Basic Skills	3
National Assessment of…	3
Program for International…	3
Indiana Statewide Testing for…	2
ACT Assessment	1
Advanced Placement…	1
Bruininks Oseretsky Test of…	1
California Test of Mental…	1
Childrens Manifest Anxiety…	1
National Longitudinal Study…	1
Peabody Picture Vocabulary…	1
More ▼

What Works Clearinghouse Rating

Showing 1 to 15 of 98 results Save | Export

Incorporating Test-Taking Engagement into Multistage Adaptive Testing Design for Large-Scale Assessments

Peer reviewed

Direct link

Okan Bulut; Guher Gorgun; Hacer Karamese – Journal of Educational Measurement, 2025

The use of multistage adaptive testing (MST) has gradually increased in large-scale testing programs as MST achieves a balanced compromise between linear test design and item-level adaptive testing. MST works on the premise that each examinee gives their best effort when attempting the items, and their responses truly reflect what they know or can…

Descriptors: Response Style (Tests), Testing Problems, Testing Accommodations, Measurement

Item Calibration Methods with Multiple Subscale Multistage Testing

Peer reviewed

Direct link

Chun Wang; Ping Chen; Shengyu Jiang – Journal of Educational Measurement, 2020

Many large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait [theta] estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to…

Descriptors: Test Construction, Test Items, Adaptive Testing, Maximum Likelihood Statistics

How to Compare Parametric and Nonparametric Person-Fit Statistics Using Real Data

Peer reviewed

Direct link

Sinharay, Sandip – Journal of Educational Measurement, 2017

Person-fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person-fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed…

Descriptors: Goodness of Fit, Testing, Test Items, Scores

Assessing Individual-Level Impact of Interruptions during Online Testing

Peer reviewed

Direct link

Sinharay, Sandip; Wan, Ping; Choi, Seung W.; Kim, Dong-In – Journal of Educational Measurement, 2015

With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as…

Descriptors: Computer Assisted Testing, Testing Problems, Scores, Statistical Analysis

A New Statistic for Detection of Aberrant Answer Changes

Peer reviewed

Direct link

Sinharay, Sandip; Duong, Minh Q.; Wood, Scott W. – Journal of Educational Measurement, 2017

As noted by Fremer and Olson, analysis of answer changes is often used to investigate testing irregularities because the analysis is readily performed and has proven its value in practice. Researchers such as Belov, Sinharay and Johnson, van der Linden and Jeon, van der Linden and Lewis, and Wollack, Cohen, and Eckerly have suggested several…

Descriptors: Identification, Statistics, Change, Tests

Dealing with Item Nonresponse in Large-Scale Cognitive Assessments: The Impact of Missing Data Methods on Estimated Explanatory Relationships

Peer reviewed

Direct link

Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H. – Journal of Educational Measurement, 2017

Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical…

Descriptors: Test Items, Cognitive Measurement, Testing Problems, Regression (Statistics)

A Response Time Process Model for Not-Reached and Omitted Items

Peer reviewed

Direct link

Lu, Jing; Wang, Chun – Journal of Educational Measurement, 2020

Item nonresponses are prevalent in standardized testing. They happen either when students fail to reach the end of a test due to a time limit or quitting, or when students choose to omit some items strategically. Oftentimes, item nonresponses are nonrandom, and hence, the missing data mechanism needs to be properly modeled. In this paper, we…

Descriptors: Item Response Theory, Test Items, Standardized Tests, Responses

Determining the Overall Impact of Interruptions during Online Testing

Peer reviewed

Direct link

Sinharay, Sandip; Wan, Ping; Whitaker, Mike; Kim, Dong-In; Zhang, Litong; Choi, Seung W. – Journal of Educational Measurement, 2014

With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. There is a lack of research on this…

Descriptors: Computer Assisted Testing, Testing Problems, Scores, Regression (Statistics)

Modeling Skipped and Not-Reached Items Using IRTrees

Peer reviewed

Direct link

Debeer, Dries; Janssen, Rianne; De Boeck, Paul – Journal of Educational Measurement, 2017

When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…

Descriptors: Item Response Theory, Test Items, Responses, Testing Problems

Monitoring Rater Performance over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use

Peer reviewed

Direct link

Myford, Carol M.; Wolfe, Edward W. – Journal of Educational Measurement, 2009

In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition…

Descriptors: English Literature, Advanced Placement, Measures (Individuals), Writing (Composition)

Judges' Use of Examinee Performance Data in an Angoff Standard-Setting Exercise for a Medical Licensing Examination: An Experimental Study

Peer reviewed

Direct link

Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009

Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…

Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel

The Hierarchy Consistency Index: Evaluating Person Fit for Cognitive Diagnostic Assessment

Peer reviewed

Direct link

Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009

In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…

Descriptors: Test Length, Simulation, Correlation, Research Methodology

Detecting Differential Speededness in Multistage Testing

Peer reviewed

Direct link

van der Linden, Wim J.; Breithaupt, Krista; Chuah, Siang Chee; Zhang, Yanwei – Journal of Educational Measurement, 2007

A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed…

Descriptors: Adaptive Testing, Evaluation Methods, Test Items, Reaction Time

The Effect of Distractions on Sixth-Grade Students in a Testing Situation

Peer reviewed

Trentham, Landa L. – Journal of Educational Measurement, 1975

Descriptors: Comparative Testing, Educational Testing, Elementary Education, Grade 6

Impact of Diagnosticity on the Adequacy of Models for Cognitive Diagnosis under a Linear Attribute Structure: A Simulation Study

Peer reviewed

Direct link

de La Torre, Jimmy; Karelitz, Tzur M. – Journal of Educational Measurement, 2009

Compared to unidimensional item response models (IRMs), cognitive diagnostic models (CDMs) based on latent classes represent examinees' knowledge and item requirements using discrete structures. This study systematically examines the viability of retrofitting CDMs to IRM-based data with a linear attribute structure. The study utilizes a procedure…

Descriptors: Simulation, Item Response Theory, Psychometrics, Evaluation Methods

Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4 | 5 | 6 | 7

Linn, Robert L.	4
Sinharay, Sandip	4
Wainer, Howard	4
Budescu, David	2
Choi, Seung W.	2
Fitzpatrick, Anne R.	2
Hoover, H. D.	2
Hughes, David C.	2
Kim, Dong-In	2
Rowley, Glenn L.	2
Secolsky, Charles	2
Wan, Ping	2
Al-Karni, Ali	1
Askegaard, Lewis D.	1
Baglin, Roger F.	1
Baker, Frank B.	1
Baldwin, Su G.	1
Bar-Hillel, Maya	1
Berdie, Frances S.	1
Beuk, Cees H.	1
Bliss, Leonard B.	1
Braun, Henry I.	1
Breithaupt, Krista	1
Bridgeford, Nancy J.	1
More ▼