Peer reviewed
Hyeryung Lee; Walter P. Vispoel – Journal of Educational Measurement, 2025
Traditional methods for detecting cheating on assessments tend to focus on identifying either cheaters or compromised items in isolation, overlooking their interconnection. In this study, we present a novel biclustering approach that simultaneously detects both cheaters and compromised items by identifying coherent subgroups of examinees and items…
Descriptors: Identification, Cheating, Test Wiseness, Test Items
Peer reviewed
Chun Wang; Ping Chen; Shengyu Jiang – Journal of Educational Measurement, 2020
Many large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait [theta] estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to…
Descriptors: Test Construction, Test Items, Adaptive Testing, Maximum Likelihood Statistics
Peer reviewed
Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H. – Journal of Educational Measurement, 2017
Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical…
Descriptors: Test Items, Cognitive Measurement, Testing Problems, Regression (Statistics)
Peer reviewed
Lu, Jing; Wang, Chun – Journal of Educational Measurement, 2020
Item nonresponses are prevalent in standardized testing. They occur either when students fail to reach the end of a test because of a time limit or because they quit, or when students choose to omit some items strategically. Oftentimes, item nonresponses are nonrandom, and hence, the missing data mechanism needs to be properly modeled. In this paper, we…
Descriptors: Item Response Theory, Test Items, Standardized Tests, Responses
Peer reviewed
Sinharay, Sandip – Journal of Educational Measurement, 2017
Person-fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person-fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed…
Descriptors: Goodness of Fit, Testing, Test Items, Scores
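As a point of reference for the parametric family mentioned in the abstract, one widely used person-fit statistic (not necessarily among those compared in this article) is the standardized log-likelihood statistic l_z, shown here only as an illustration:

l_0 = \sum_i [ u_i \ln P_i(\hat\theta) + (1 - u_i) \ln\{1 - P_i(\hat\theta)\} ], \qquad l_z = \frac{l_0 - E(l_0)}{\sqrt{\operatorname{Var}(l_0)}},

where E(l_0) = \sum_i [ P_i \ln P_i + (1 - P_i) \ln(1 - P_i) ], \operatorname{Var}(l_0) = \sum_i P_i (1 - P_i) [ \ln\{ P_i / (1 - P_i) \} ]^2, and the item response probabilities P_i are evaluated at the estimated trait level. Large negative values of l_z flag response patterns that are unlikely under the fitted IRT model.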
Peer reviewed
Debeer, Dries; Janssen, Rianne; De Boeck, Paul – Journal of Educational Measurement, 2017
When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process, the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…
Descriptors: Item Response Theory, Test Items, Responses, Testing Problems
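To make the tree idea concrete, the sketch below recodes each item response into two pseudo-items: a first node for responded versus omitted and a second node for correct versus incorrect, scored only when a response exists. This is a minimal illustration of the general tree-based coding under assumed conventions (NaN marks an omission; the function name is hypothetical), not the authors' exact model.

import numpy as np

# Recode responses (1 = correct, 0 = incorrect, NaN = omitted) into two pseudo-items
# for a two-node tree: node 1 = responded vs. omitted, node 2 = correct vs. incorrect.
def irtree_pseudo_items(responses):
    resp = np.asarray(responses, dtype=float)
    node_responded = (~np.isnan(resp)).astype(float)        # 1 = responded, 0 = omitted
    node_correct = np.where(np.isnan(resp), np.nan, resp)   # defined only when responded
    return node_responded, node_correct

responded, correct = irtree_pseudo_items([1, 0, np.nan, 1, np.nan])
print(responded)  # [1. 1. 0. 1. 0.]
print(correct)    # [ 1.  0. nan  1. nan]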
Peer reviewed
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
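The reported range follows from any index of the general form 1 - \frac{2m}{M}, where m is the number of violated model-based expectations out of M comparisons: m = 0 gives 1 (responses fully consistent with the cognitive model) and m = M gives -1 (every expectation violated). This note is offered only to explain the range; the HCI's precise definition of a violation comes from the cognitive model and is given in the article.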
Peer reviewed
Secolsky, Charles – Journal of Educational Measurement, 1987
For measuring the face validity of a test, Nevo suggested that test takers and nonprofessional users rate items on a five-point scale. This article questions the ability of those raters and the credibility of the aggregated judgment as evidence of the validity of the test. (JAZ)
Descriptors: Content Validity, Measurement Techniques, Rating Scales, Test Items
Peer reviewed
van der Linden, Wim J.; Breithaupt, Krista; Chuah, Siang Chee; Zhang, Yanwei – Journal of Educational Measurement, 2007
A potential undesirable effect of multistage testing is differential speededness, which occurs when some test takers run out of time because they receive subtests with items that are more time-intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed…
Descriptors: Adaptive Testing, Evaluation Methods, Test Items, Reaction Time
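A probabilistic response-time model of the kind referred to here is the lognormal model, under which the log response time of examinee j on item i is normally distributed,

\ln T_{ij} \sim N(\beta_i - \tau_j,\ \alpha_i^{-2}),

where \beta_i is the time intensity of item i, \tau_j the speed of examinee j, and \alpha_i a precision (discrimination) parameter. Differential speededness can then be assessed by comparing the summed time intensities of the subtests that different examinees receive. This is the standard lognormal parameterization, given here for orientation; the article's exact specification may differ in detail.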
Peer reviewed
Lord, Frederic M. – Journal of Educational Measurement, 1977
Two approaches currently in the literature for determining the optimal number of choices for a test item are compared with two new approaches. (Author)
Descriptors: Forced Choice Technique, Latent Trait Theory, Multiple Choice Tests, Test Items
Peer reviewed
Ebel, Robert L. – Journal of Educational Measurement, 1982
Reasonable and practical solutions to two major problems confronting the developer of any test of educational achievement (what to measure and how to measure it) are proposed, defended, and defined. (Author/PN)
Descriptors: Measurement Techniques, Objective Tests, Test Construction, Test Items
Peer reviewed
Wainer, Howard; Kiely, Gerard L. – Journal of Educational Measurement, 1987
The testlet, a bundle of test items, alleviates some problems associated with computerized adaptive testing: context effects, lack of robustness, and item difficulty ordering. While testlets may be linear or hierarchical, the most useful ones are four-level hierarchical units, containing 15 items and partitioning examinees into 16 classes. (GDC)
Descriptors: Adaptive Testing, Computer Assisted Testing, Context Effect, Item Banks
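The counts in the abstract follow from the binary branching of a four-level hierarchical testlet: 1 + 2 + 4 + 8 = 15 items are stored in the unit, each examinee answers one item per level (four items in all), and the 2^4 = 16 possible right/wrong paths through those four items define the 16 classes.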
Peer reviewed
Klein, Lawrence W.; Jarjoura, David – Journal of Educational Measurement, 1985
The test equating accuracy of content-representative anchors (subsets of items in common) versus nonrepresentative, but substantially longer, anchors was compared for a professional certification examination. Through a chain of equatings, it was found that content representation in anchors was critical. (Author/GDC)
Descriptors: Adults, Equated Scores, Higher Education, Item Banks
Peer reviewed
Jaeger, Richard M. – Journal of Educational Measurement, 1981
Five indices are discussed that should logically discriminate between situations in which: (1) the linear equating method (LEM) adequately adjusts for differences between the score distributions of two approximately parallel test forms; or (2) a method more complex than the linear equating method is needed. (RL)
Descriptors: College Entrance Examinations, Comparative Analysis, Difficulty Level, Equated Scores
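For reference, the linear equating method places a form-X score x on the scale of form Y by matching means and standard deviations,

y^{*}(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X),

so the indices in question address when this two-parameter adjustment is adequate and when a more flexible method (for example, equipercentile equating, named here only as the standard contrast) is warranted.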
Peer reviewed
Veale, James R.; Foreman, Dale I. – Journal of Educational Measurement, 1983
Statistical procedures for measuring heterogeneity of test item distractor distributions, or cultural variation, are presented. These procedures are based on the notion that examinees' responses to the incorrect options of a multiple-choice test provide more information concerning cultural bias than their correct responses. (Author/PN)
Descriptors: Ethnic Bias, Item Analysis, Mathematical Models, Multiple Choice Tests
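A minimal sketch of the general idea, assuming group labels and chosen distractors are available for examinees who answered an item incorrectly; the counts and variable names are hypothetical, and the article's own statistics are not reproduced here.

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation of distractor choice by group among examinees who
# missed the item (rows = groups, columns = distractors; the keyed option is excluded).
distractor_counts = np.array([
    [40, 25, 15],   # group 1
    [18, 35, 27],   # group 2
])

# Test whether the distractor-choice distributions differ across groups.
chi2, p, dof, expected = chi2_contingency(distractor_counts)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")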