ERIC - Search Results

Showing 1 to 15 of 63 results

Evaluating Human Scoring Using Generalizability Theory

Peer reviewed

Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020

Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…

Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries

Prescribing Structure for Validation Arguments: Elemental, Structural, and Ecological Validity

Peer reviewed

Jacobson, Erik; Svetina, Dubravka – Applied Measurement in Education, 2019

Contingent argument-based approaches to validity require a unique argument for each use, in contrast to more prescriptive approaches that identify the common kinds of validity evidence researchers should consider for every use. In this article, we evaluate our use of an approach that is both prescriptive and argument-based to develop a…

Descriptors: Test Validity, Test Items, Test Construction, Test Interpretation

Effort Analysis: Individual Score Validation of Achievement Test Data

Peer reviewed

Wise, Steven L. – Applied Measurement in Education, 2015

Whenever the purpose of measurement is to inform an inference about a student's achievement level, it is important that we be able to trust that the student's test score accurately reflects what that student knows and can do. Such trust requires the assumption that a student's test event is not unduly influenced by construct-irrelevant factors…

Descriptors: Achievement Tests, Scores, Validity, Test Items

Probabilistic Approaches to Examining Linguistic Features of Test Items and Their Effect on the Performance of English Language Learners

Peer reviewed

Solano-Flores, Guillermo – Applied Measurement in Education, 2014

This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…

Descriptors: English Language Learners, Test Items, Probability, Test Bias

Second-Generation Challenges for Making Content Assessments Accessible for ELLs

Peer reviewed

Kopriva, Rebecca J. – Applied Measurement in Education, 2014

In this commentary, Rebecca Kopriva examines the articles in this special issue by drawing on her experience from three series of investigations examining how English language learners (ELLs) and other students perceive what test items ask and how they can successfully represent what they know. The first series examined the effect of different…

Descriptors: English Language Learners, Test Items, Educational Assessment, Access to Education

An Analytic Comparison of Effect Sizes for Differential Item Functioning

Peer reviewed

Demars, Christine E. – Applied Measurement in Education, 2011

Three types of effect sizes for DIF are described in this exposition: log of the odds-ratio (differences in log-odds), differences in probability-correct, and proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are impacted in…

Descriptors: Effect Size, Test Bias, Probability, Difficulty Level

Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

Peer reviewed

Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012

Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…

Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement

Are Inferential Reading Items More Susceptible to Cultural Bias than Literal Reading Items?

Peer reviewed

Banks, Kathleen – Applied Measurement in Education, 2012

The purpose of this article is to illustrate a seven-step process for determining whether inferential reading items were more susceptible to cultural bias than literal reading items. The seven-step process was demonstrated using multiple-choice data from the reading portion of a reading/language arts test for fifth and seventh grade Hispanic,…

Descriptors: Reading Tests, Test Items, Standardized Tests, Test Bias

Using a Taxonomy of Differential Step Functioning to Improve the Interpretation of DIF in Polytomous Items: An Illustration

Peer reviewed

Penfield, Randall D.; Alvarez, Karina; Lee, Okhee – Applied Measurement in Education, 2009

The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group…

Descriptors: Test Bias, Classification, Test Items, Criteria

Correlates of Rapid-Guessing Behavior in Low-Stakes Testing: Implications for Test Development and Measurement Practice

Peer reviewed

Wise, Steven L.; Pastor, Dena A.; Kong, Xiaojing J. – Applied Measurement in Education, 2009

Previous research has shown that rapid-guessing behavior can degrade the validity of test scores from low-stakes proficiency tests. This study examined, using hierarchical generalized linear modeling, examinee and item characteristics for predicting rapid-guessing behavior. Several item characteristics were found significant; items with more text…

Descriptors: Guessing (Tests), Achievement Tests, Correlation, Test Items

Item Position and Item Difficulty Change in an IRT-Based Common Item Equating Design

Peer reviewed

Meyers, Jason L.; Miller, G. Edward; Way, Walter D. – Applied Measurement in Education, 2009

In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change,…

Descriptors: Test Items, Test Content, Testing Programs, Simulation

Detecting and Correcting Scale Drift in Test Equating: An Illustration from a Large Scale Testing Program

Peer reviewed

Puhan, Gautam – Applied Measurement in Education, 2009

The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…

Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory

An Empirical Examination of the Impact of Group Discussion and Examinee Performance Information on Judgments Made in the Angoff Standard-Setting Procedure

Peer reviewed

Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009

Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…

Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring

Creating IRT-Based Parallel Test Forms Using the Genetic Algorithm Method

Peer reviewed

Sun, Koun-Tem; Chen, Yu-Jen; Tsai, Shu-Yen; Cheng, Chien-Fen – Applied Measurement in Education, 2008

In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem that involves the time-consuming selection of items to construct tests having approximately the same test information functions (TIFs) and constraints. This article proposes a novel method, genetic algorithm (GA), to construct parallel…

Descriptors: Test Format, Measurement Techniques, Equations (Mathematics), Item Response Theory

Conclusions about Frequently Studied Modified Angoff Standard-Setting Topics

Peer reviewed

Brandon, Paul R. – Applied Measurement in Education, 2004

This article reviews the empirical literature on 9 topics about the modified Angoff standard-setting method that have been studied repeatedly in the literature, while taking into consideration the methodological warrant for the findings on the topics. It concludes that we can be reasonably confident about selecting the appropriate number of judges…

Descriptors: Test Items, Standard Setting (Scoring), Research Methodology, Testing
