ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	5
Since 2016 (last 10 years)	14
Since 2006 (last 20 years)	21

Descriptor

Foreign Countries	24
Test Items	24
Achievement Tests	7
International Assessment	7
Item Analysis	7
Mathematics Tests	7
Difficulty Level	6
Test Construction	6
Models	5
Correlation	4
English	4
Reading Tests	4
Scores	4
Secondary School Students	4
Test Bias	4
Comparative Analysis	3
Computation	3
Computer Assisted Testing	3
Error of Measurement	3
French	3
High School Students	3
Item Response Theory	3
Multiple Choice Tests	3
Psychometrics	3
Scoring	3
More ▼

Source

Applied Measurement in…

Publication Type

Journal Articles	24
Reports - Research	21
Reports - Evaluative	3
Speeches/Meeting Papers	1
Tests/Questionnaires	1

Education Level

Secondary Education	7
Elementary Secondary Education	6
Elementary Education	3
Grade 3	2
Grade 8	2
High Schools	2
Higher Education	2
Junior High Schools	2
Middle Schools	2
Postsecondary Education	2
Grade 11	1
Grade 4	1
Grade 6	1
Grade 9	1
Intermediate Grades	1
More ▼

Audience

Location

Canada	6
Germany	3
United Kingdom	3
Israel	2
Australia	1
Belgium	1
Finland	1
France	1
Iran	1
Italy	1
Japan	1
Jordan	1
Netherlands	1
Oman	1
Romania	1
Russia	1
Spain	1
United Kingdom (Northern…	1
United States	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

Program for International…	5
Trends in International…	3
Progress in International…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 24 results Save | Export

Evaluating Human Scoring Using Generalizability Theory

Peer reviewed

Direct link

Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020

Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…

Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries

Coefficient [beta] as Extension of KR-21 Reliability for Summed and Scaled Scores for Polytomously-Scored Tests

Peer reviewed

Direct link

Almehrizi, Rashid S. – Applied Measurement in Education, 2021

KR-21 reliability and its extension (coefficient [alpha]) gives the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…

Descriptors: Test Reliability, Scores, Scoring, Computation

Not-Reached Items: An Issue of Time and of Test-Taking Disengagement? The Case of PISA 2015 Reading Data

Peer reviewed

Direct link

Pools, Elodie – Applied Measurement in Education, 2022

Many low-stakes assessments, such as international large-scale surveys, are administered during time-limited testing sessions and some test-takers are not able to endorse the last items of the test, resulting in not-reached (NR) items. However, because the test has no consequence for the respondents, these NR items can also stem from quitting the…

Descriptors: Achievement Tests, Foreign Countries, International Assessment, Secondary School Students

Dissecting Knowledge, Guessing, and Blunder in Multiple Choice Assessments

Peer reviewed

Direct link

Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023

Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…

Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models

Computer-Based Listening Test with Full Video, Visual-Limited Video, and Audio: A Comparative Analysis Based on Difficulty, Discrimination Power, and Response Time

Peer reviewed

Direct link

Takahiro Terao – Applied Measurement in Education, 2024

This study aimed to compare item characteristics and response time between stimulus conditions in computer-delivered listening tests. Listening materials had three variants: regular videos, frame-by-frame videos, and only audios without visuals. Participants were 228 Japanese high school students who were requested to complete one of nine…

Descriptors: Computer Assisted Testing, Audiovisual Aids, Reaction Time, High School Students

The Trade-Off between Model Fit, Invariance, and Validity: The Case of PISA Science Assessments

Peer reviewed

Direct link

El Masri, Yasmine H.; Andrich, David – Applied Measurement in Education, 2020

In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item…

Descriptors: Models, Goodness of Fit, Test Validity, Achievement Tests

Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing

Peer reviewed

Direct link

Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022

When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…

Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis

Of Small Beauties and Large Beasts: The Quality of Distractors on Multiple-Choice Tests Is More Important than Their Quantity

Peer reviewed

Direct link

Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017

In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…

Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability

Item Parameter Drift in a Time-Varying Predictor

Peer reviewed

Direct link

Lee, HyeSun – Applied Measurement in Education, 2018

The current simulation study examined the effects of Item Parameter Drift (IPD) occurring in a short scale on parameter estimates in multilevel models where scores from a scale were employed as a time-varying predictor to account for outcome scores. Five factors, including three decisions about IPD, were considered for simulation conditions. It…

Descriptors: Test Items, Hierarchical Linear Modeling, Predictor Variables, Scores

Using Mixed Methods to Interpret Differential Item Functioning

Peer reviewed

Direct link

Benítez, Isabel; Padilla, José-Luis; Hidalgo Montesinos, María Dolores; Sireci, Stephen G. – Applied Measurement in Education, 2016

Analysis of differential item functioning (DIF) is often used to determine if cross-lingual assessments are equivalent across languages. However, evidence on the causes of cross-lingual DIF is still evasive. Expert appraisal is a qualitative method useful for obtaining detailed information about problematic elements in the different linguistic…

Descriptors: Test Bias, Mixed Methods Research, Questionnaires, International Assessment

Negative Keying Effects in the Factor Structure of TIMSS 2011 Motivation Scales and Associations with Reading Achievement

Peer reviewed

Direct link

Michaelides, Michalis P. – Applied Measurement in Education, 2019

The Student Background survey administered along with achievement tests in studies of the International Association for the Evaluation of Educational Achievement includes scales of student motivation, competence, and attitudes toward mathematics and science. The scales consist of positively- and negatively keyed items. The current research…

Descriptors: International Assessment, Achievement Tests, Mathematics Achievement, Mathematics Tests

Focusing on Interactions between Content and Cognition: A New Perspective on Gender Differences in Mathematical Sub-Competencies

Peer reviewed

Direct link

George, Ann Cathrice; Robitzsch, Alexander – Applied Measurement in Education, 2018

This article presents a new perspective on measuring gender differences in the large-scale assessment study Trends in International Science Study (TIMSS). The suggested empirical model is directly based on the theoretical competence model of the domain mathematics and thus includes the interaction between content and cognitive sub-competencies.…

Descriptors: Achievement Tests, Elementary Secondary Education, Mathematics Achievement, Mathematics Tests

Examinee Non-Effort on Contextualized and Non-Contextualized Mathematics Items in Large-Scale Assessments

Peer reviewed

Direct link

Nijlen, Daniel Van; Janssen, Rianne – Applied Measurement in Education, 2015

In this study it is investigated to what extent contextualized and non-contextualized mathematics test items have a differential impact on examinee effort. Mixture item response theory (IRT) models are applied to two subsets of items from a national assessment on mathematics in the second grade of the pre-vocational track in secondary education in…

Descriptors: Mathematics Tests, Measurement, Item Response Theory, Test Items

Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

Peer reviewed

Direct link

Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André – Applied Measurement in Education, 2016

Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…

Descriptors: Psychometrics, Multiple Choice Tests, Test Items, Item Analysis

Testing Expert-Based versus Student-Based Cognitive Models for a Grade 3 Diagnostic Mathematics Assessment

Peer reviewed

Direct link

Roduta Roberts, Mary; Alves, Cecilia B.; Chu, Man-Wai; Thompson, Margaret; Bahry, Louise M.; Gotzmann, Andrea – Applied Measurement in Education, 2014

The purpose of this study was to evaluate the adequacy of three cognitive models, one developed by content experts and two generated from student verbal reports for explaining examinee performance on a grade 3 diagnostic mathematics test. For this study, the items were developed to directly measure the attributes in the cognitive model. The…

Descriptors: Foreign Countries, Mathematics Tests, Cognitive Processes, Models

Previous Page | Next Page »

Pages: 1 | 2

Ercikan, Kadriye	2
Gierl, Mark J.	2
Abu-Ghazalah, Rashid M.	1
Abulela, Mohammed A. A.	1
Ainley, John	1
Allalouf, Avi	1
Almehrizi, Rashid S.	1
Alves, Cecilia B.	1
Andrich, David	1
Bahry, Louise M.	1
Benítez, Isabel	1
Bimpeh, Yaw	1
Bosker, Roel J.	1
Boulais, André-Philippe	1
Chis, Liliana	1
Chu, Man-Wai	1
Clauser, Brian E.	1
Cor, M. Kenneth	1
Cui, Ying	1
De Champlain, André	1
Deunk, Marjolein I.	1
Dubins, David N.	1
El Masri, Yasmine H.	1
Fraillon, Julian	1
Gebhardt, Eveline	1
More ▼