Showing 1 to 15 of 35 results
Peer reviewed
Goldhammer, Frank; Kroehne, Ulf; Hahnel, Carolin; Naumann, Johannes; De Boeck, Paul – Journal of Educational Measurement, 2024
The efficiency of cognitive component skills is typically assessed with speeded performance tests. Interpreting only effective ability or effective speed as efficiency may be challenging because of the within-person dependency between the two variables (speed-ability tradeoff, SAT). The present study measures efficiency as effective ability…
Descriptors: Timed Tests, Efficiency, Scores, Test Interpretation
Peer reviewed
Yaneva, Victoria; Clauser, Brian E.; Morales, Amy; Paniagua, Miguel – Journal of Educational Measurement, 2021
Eye-tracking technology can create a record of the location and duration of visual fixations as a test-taker reads test questions. Although the cognitive process the test-taker is using cannot be directly observed, eye-tracking data can support inferences about these unobserved cognitive processes. This type of information has the potential to…
Descriptors: Eye Movements, Test Validity, Multiple Choice Tests, Cognitive Processes
Peer reviewed
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
Peer reviewed
Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…
Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators
Peer reviewed
Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H. – Journal of Educational Measurement, 2017
Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical…
Descriptors: Test Items, Cognitive Measurement, Testing Problems, Regression (Statistics)
Peer reviewed
Clauser, Brian E.; Baldwin, Peter; Margolis, Melissa J.; Mee, Janet; Winward, Marcia – Journal of Educational Measurement, 2017
Validating performance standards is challenging and complex. Because of the difficulties associated with collecting evidence related to external criteria, validity arguments rely heavily on evidence related to internal criteria--especially evidence that expert judgments are internally consistent. Given its importance, it is somewhat surprising…
Descriptors: Evaluation Methods, Standard Setting, Cutting Scores, Expertise
Peer reviewed
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
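As background for the augmentation idea described in this entry (a classical reference point, not the specific model the article develops), the univariate building block is Kelley's regressed score, which shrinks an unreliable subscore toward its group mean in proportion to its reliability; multivariate augmentation extends this by also borrowing strength from the other subscales.

```latex
% Kelley's regressed estimate of a true subscore (illustrative only):
% x = observed subscore, \bar{x} = group mean, \rho_{xx'} = subscore reliability.
\[
\hat{\tau} \;=\; \bar{x} + \rho_{xx'}\,(x - \bar{x})
\]
```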
Peer reviewed
de la Torre, Jimmy – Journal of Educational Measurement, 2008
Most model fit analyses in cognitive diagnosis assume that a Q matrix is correct after it has been constructed, without verifying its appropriateness. Consequently, any model misfit attributable to the Q matrix cannot be addressed and remedied. To address this concern, this paper proposes an empirically based method of validating a Q matrix used…
Descriptors: Matrices, Validity, Models, Evaluation Methods
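For readers new to the terminology, a Q matrix is a binary item-by-attribute specification: entry q_{jk} is 1 if item j is assumed to require attribute k. The small matrix below is a made-up illustration, not one taken from the article.

```latex
% Hypothetical Q matrix for 3 items (rows) and 2 attributes (columns):
\[
Q = \begin{pmatrix}
1 & 0 \\ % item 1 requires attribute 1 only
0 & 1 \\ % item 2 requires attribute 2 only
1 & 1    % item 3 requires both attributes
\end{pmatrix}
\]
```

A misspecified entry distorts the item's model-implied response probabilities, which is the kind of Q-matrix-attributable misfit the proposed validation method targets.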
Peer reviewed
Pommerich, Mary – Journal of Educational Measurement, 2006
Domain scores have been proposed as a user-friendly way of providing instructional feedback about examinees' skills. Domain performance typically cannot be measured directly; instead, scores must be estimated using available information. Simulation studies suggest that IRT-based methods yield accurate group domain score estimates. Because…
Descriptors: Test Validity, Scores, Simulation, Evaluation Methods
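For context, the IRT-based estimate referred to in this line of work is typically the expected proportion correct over the domain's items, evaluated at the examinee's estimated ability; the sketch below states that quantity in generic form (the symbols are ours, not the article's).

```latex
% Expected domain score at estimated ability \hat{\theta},
% averaged over the n items that define the domain:
\[
\hat{d}(\hat{\theta}) \;=\; \frac{1}{n}\sum_{i=1}^{n} P_i(\hat{\theta})
\]
% P_i is item i's response function; group-level domain scores
% average \hat{d} over the examinees in the group.
```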
Peer reviewed
Ackerman, Terry A. – Journal of Educational Measurement, 1992
The difference between item bias and item impact and the way they relate to item validity are discussed from a multidimensional item response theory perspective. The Mantel-Haenszel procedure and the Simultaneous Item Bias strategy are used in a Monte Carlo study to illustrate detection of item bias. (SLD)
Descriptors: Causal Models, Computer Simulation, Construct Validity, Equations (Mathematics)
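For context on the first procedure named in this abstract (a standard formulation, not a result of the study), the Mantel-Haenszel approach compares reference- and focal-group odds of success on the studied item within matched-score strata:

```latex
% Common odds ratio across score strata k, and the ETS delta transform:
\[
\hat{\alpha}_{MH} \;=\; \frac{\sum_k A_k D_k / N_k}{\sum_k B_k C_k / N_k},
\qquad
\Delta_{MH} \;=\; -2.35\,\ln \hat{\alpha}_{MH}
\]
% Within stratum k: A_k, B_k = reference-group correct/incorrect counts;
% C_k, D_k = focal-group correct/incorrect counts; N_k = stratum total.
% Values of \hat{\alpha}_{MH} far from 1 (large |\Delta_{MH}|) flag potential DIF.
```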
Peer reviewed
Washington, William N.; Godfrey, R. Richard – Journal of Educational Measurement, 1974
Item statistics for illustrated and written items drawn from the same content areas were compared using F ratios. The results indicated that illustrated items performed slightly better than matched written items and that the best-performing category of illustrated items was tables. (Author/BB)
Descriptors: Achievement Tests, Illustrations, Test Construction, Test Items
Peer reviewed
Secolsky, Charles – Journal of Educational Measurement, 1987
For measuring the face validity of a test, Nevo suggested that test takers and nonprofessional users rate items on a five-point scale. This article questions the ability of those raters and the credibility of the aggregated judgment as evidence of the validity of the test. (JAZ)
Descriptors: Content Validity, Measurement Techniques, Rating Scales, Test Items
Peer reviewed
Embretson, Susan; Gorin, Joanna – Journal of Educational Measurement, 2001
Examines testing practices in (1) the past, in which the traditional paradigm left little room for cognitive psychology principles; (2) the present, in which testing research is enhanced by principles of cognitive psychology; and (3) the future, in which the potential of cognitive psychology should be fully realized through item design.…
Descriptors: Cognitive Psychology, Construct Validity, Educational Research, Educational Testing
Peer reviewed
Hartke, Alan R. – Journal of Educational Measurement, 1978
Latent partition analysis is shown to be useful in determining the conceptual homogeneity of an item population. Such item populations are useful for mastery testing. Applications of latent partition analysis in assessing content validity are suggested. (Author/JKS)
Descriptors: Higher Education, Item Analysis, Item Sampling, Mastery Tests
Peer reviewed
Beuchert, A. Kent; Mendoza, Jorge L. – Journal of Educational Measurement, 1979
Ten item discrimination indices were compared across a variety of item analysis situations, based on the validities of tests constructed by using each index to select 40 items from a 100-item pool. Item score data were generated by a computer program and included a simulation of guessing. (Author/CTM)
Descriptors: Item Analysis, Simulation, Statistical Analysis, Test Construction
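As one concrete example of the kind of index being compared in this entry (chosen here for familiarity, not because the article singles it out), the point-biserial correlation between an item score and the total test score is:

```latex
% Point-biserial item discrimination index:
\[
r_{pb} \;=\; \frac{\bar{X}_{+} - \bar{X}_{-}}{s_X}\,\sqrt{p\,(1-p)}
\]
% \bar{X}_{+}, \bar{X}_{-}: mean total scores of examinees answering the item
% correctly / incorrectly; s_X: SD of total scores; p: proportion answering correctly.
```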