ERIC - Search Results

Publication Date

In 2025	1
Since 2024	1
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	8
Since 2006 (last 20 years)	18

Descriptor

Error of Measurement	19
Item Response Theory	19
Testing	19
Test Bias	8
Grade 7	6
Scoring	6
Test Construction	6
Test Reliability	6
Testing Programs	6
Academic Achievement	5
Achievement Tests	5
Data Collection	5
English	5
Grade 3	5
Grade 4	5
Grade 5	5
Grade 6	5
Grade 8	5
Language Arts	5
Language Tests	5
Mathematics Tests	5
Simulation	5
Test Items	5
Test Results	5
Test Validity	5

Source

New York State Education…	5
Educational and Psychological…	4
Applied Psychological…	2
Annenberg Institute for…	1
Behavioral Research and…	1
Council of Chief State School…	1
ETS Research Report Series	1
Educational Research	1
Journal of Educational…	1
Online Submission	1
Psicologica: International…	1

Publication Type

Journal Articles	10
Reports - Research	8
Numerical/Quantitative Data	6
Reports - Descriptive	6
Reports - Evaluative	5
Speeches/Meeting Papers	1

Education Level

Junior High Schools	7
Secondary Education	7
Grade 7	6
Middle Schools	6
Early Childhood Education	5
Elementary Education	5
Grade 3	5
Grade 4	5
Grade 5	5
Grade 6	5
Grade 8	5
Intermediate Grades	5
Primary Education	5
Elementary Secondary Education	1
High Schools	1
Higher Education	1
Postsecondary Education	1

Audience

Practitioners

Location

New York	5
Taiwan	1
United Kingdom (England)	1

Showing 1 to 15 of 19 results

The Sensitivity of Value-Added Estimates to Test Scoring Decisions. EdWorkingPaper No. 25-1226

Gilbert, Joshua B.; Soland, James G.; Domingue, Benjamin W. – Annenberg Institute for School Reform at Brown University, 2025

Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…

Descriptors: Value Added Models, Tests, Testing, Scoring
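The abstract contrasts mean scores with IRT-based scores. As a minimal sketch of why that choice can matter, assuming a two-parameter logistic (2PL) model with hypothetical, already-calibrated item parameters, the code below gives two students the same mean score but different maximum-likelihood ability estimates. It is not the authors' scoring procedure; the function names and item parameters are illustrative assumptions.

```python
import math
from typing import Sequence


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response for ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mean_score(responses: Sequence[int]) -> float:
    """Proportion of items answered correctly."""
    return sum(responses) / len(responses)


def mle_theta_2pl(
    responses: Sequence[int],
    discriminations: Sequence[float],
    difficulties: Sequence[float],
    tol: float = 1e-8,
    max_iter: int = 100,
) -> float:
    """Newton-Raphson maximum-likelihood ability estimate under the 2PL model."""
    # The MLE is infinite for all-correct or all-incorrect response patterns.
    if all(responses) or not any(responses):
        raise ValueError("MLE does not exist for a perfect or zero score")
    theta = 0.0
    for _ in range(max_iter):
        probs = [p_correct(theta, a, b) for a, b in zip(discriminations, difficulties)]
        # First derivative of the log-likelihood: sum of a_i * (x_i - P_i)
        gradient = sum(a * (x - p) for x, a, p in zip(responses, discriminations, probs))
        # Test information (negative second derivative): sum of a_i^2 * P_i * (1 - P_i)
        information = sum(a * a * p * (1 - p) for a, p in zip(discriminations, probs))
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical four-item test: equal difficulties, increasing discriminations.
a_params = [0.5, 1.0, 1.5, 2.0]
b_params = [0.0, 0.0, 0.0, 0.0]
low_a_right = [1, 1, 0, 0]
high_a_right = [0, 0, 1, 1]
print(mean_score(low_a_right), mean_score(high_a_right))  # 0.5 0.5
print(mle_theta_2pl(low_a_right, a_params, b_params))
print(mle_theta_2pl(high_a_right, a_params, b_params))
```

Both patterns have a mean score of 0.5, but the 2PL estimate is negative for the first and positive for the second, because the items answered correctly differ in discrimination. Under a Rasch model (all discriminations equal) the sum score is sufficient, so equal mean scores would give equal estimates.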

Robustness of Weighted Differential Item Functioning (DIF) Analysis: The Case of Mantel-Haenszel DIF Statistics. Research Report. ETS RR-21-12

Peer reviewed

Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021

Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…

Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
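The Mantel-Haenszel procedure named in the abstract pools 2x2 tables (group by right/wrong) across matched total-score levels. A minimal sketch of the standard common odds ratio and its ETS delta-scale transform, MH D-DIF = -2.35 ln(alpha_MH), is below; the class and function names and the counts are hypothetical. Cell totals are floats, so weighted counts could be passed in, but whether that matches the report's weighting scheme is an assumption, not something the abstract states.

```python
import math
from typing import NamedTuple, Sequence


class ScoreLevel(NamedTuple):
    """2x2 table for one item at one matched total-score level."""
    ref_right: float
    ref_wrong: float
    focal_right: float
    focal_wrong: float


def mh_common_odds_ratio(levels: Sequence[ScoreLevel]) -> float:
    """Mantel-Haenszel common odds ratio pooled across score levels."""
    numerator = 0.0
    denominator = 0.0
    for t in levels:
        n = t.ref_right + t.ref_wrong + t.focal_right + t.focal_wrong
        if n == 0:
            continue  # an empty score level contributes nothing
        numerator += t.ref_right * t.focal_wrong / n
        denominator += t.ref_wrong * t.focal_right / n
    return numerator / denominator


def mh_d_dif(alpha_mh: float) -> float:
    """ETS delta-scale index: MH D-DIF = -2.35 * ln(alpha_MH)."""
    return -2.35 * math.log(alpha_mh)


# Hypothetical counts at three matched score levels
levels = [
    ScoreLevel(ref_right=20, ref_wrong=30, focal_right=10, focal_wrong=40),
    ScoreLevel(ref_right=40, ref_wrong=20, focal_right=25, focal_wrong=35),
    ScoreLevel(ref_right=60, ref_wrong=5, focal_right=50, focal_wrong=15),
]
alpha = mh_common_odds_ratio(levels)
print(round(alpha, 3), round(mh_d_dif(alpha), 3))
```

An alpha_MH above 1 (a negative MH D-DIF) means that, among examinees matched on total score, the reference group has higher odds of answering the item correctly.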

Examining Measurement Invariance and Differential Item Functioning with Discrete Latent Construct Indicators: A Note on a Multiple Testing Procedure

Peer reviewed

Raykov, Tenko; Dimitrov, Dimiter M.; Marcoulides, George A.; Li, Tatyana; Menold, Natalja – Educational and Psychological Measurement, 2018

A latent variable modeling method for studying measurement invariance when evaluating latent constructs with multiple binary or binary scored items with no guessing is outlined. The approach extends the continuous indicator procedure described by Raykov and colleagues, utilizes similarly the false discovery rate approach to multiple testing, and…

Descriptors: Models, Statistical Analysis, Error of Measurement, Test Bias
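The abstract mentions a false discovery rate (FDR) approach to testing many items at once. The Benjamini-Hochberg step-up procedure is the most common FDR method; the sketch below implements it under the assumption that one p-value per item is already available. Whether the article uses exactly this variant is not stated in the snippet, and the p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; True means the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose sorted p-value satisfies p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


# Hypothetical per-item invariance-test p-values
print(benjamini_hochberg([0.02, 0.02, 0.20, 0.20]))  # [True, True, False, False]
```

This shows the step-up behavior: the smallest p-value fails its own threshold (0.0125) but is still rejected because the second-ranked one passes 0.025. A Bonferroni cutoff of 0.05 / 4 would reject neither.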

Improving Measures via Examining the Behavior of Distractors in Multiple-Choice Tests: Assessment and Remediation

Peer reviewed

Sideridis, Georgios; Tsaousis, Ioannis; Al Harbi, Khaleel – Educational and Psychological Measurement, 2017

The purpose of the present article was to illustrate, using an example from a national assessment, the value from analyzing the behavior of distractors in measures that engage the multiple-choice format. A secondary purpose of the present article was to illustrate four remedial actions that can potentially improve the measurement of the…

Descriptors: Multiple Choice Tests, Attention Control, Testing, Remedial Instruction
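A simple way to look at distractor behavior is to compare how often low and high total-score groups choose each option. The sketch below uses the classical upper/lower-group tabulation (27% extreme groups by default). It is not necessarily the article's method, and the function name and data are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def distractor_table(
    total_scores: Sequence[float],
    choices: Sequence[str],
    options: Sequence[str],
    group_fraction: float = 0.27,
) -> Dict[str, Tuple[float, float]]:
    """For each option, the proportion of the low and high scoring groups choosing it."""
    n = len(total_scores)
    k = max(1, round(n * group_fraction))  # size of each extreme group
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    low = Counter(choices[i] for i in ranked[:k])
    high = Counter(choices[i] for i in ranked[-k:])
    return {opt: (low[opt] / k, high[opt] / k) for opt in options}


# Hypothetical item with key "A": ten examinees ordered by total score
scores = list(range(1, 11))
picks = ["B", "B", "C", "B", "A", "A", "A", "A", "A", "A"]
for option, (low_p, high_p) in distractor_table(scores, picks, ["A", "B", "C", "D"], 0.3).items():
    print(option, round(low_p, 2), round(high_p, 2))
```

A working distractor draws more low than high scorers (B here). An option nobody chooses (D) contributes nothing to measurement and is a candidate for revision, and a distractor chosen more often by high scorers would flag a possible flaw in the item.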

Monitoring Items in Real Time to Enhance CAT Security

Peer reviewed

Zhang, Jinming; Li, Jie – Journal of Educational Measurement, 2016

An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed…

Descriptors: Computer Assisted Testing, Test Items, Difficulty Level, Item Response Theory
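To illustrate the general idea of checking during CAT administration whether an item's behavior has drifted from its calibration, the sketch below accumulates a standardized residual (observed minus IRT-expected correct responses) batch by batch and stops at the first flag. It is a hypothetical simplification, not Zhang and Li's sequential procedure: it assumes a 2PL item with known parameters, treats ability estimates as true values, and uses an arbitrary cutoff of 3.0. Repeated looks at the data inflate the Type I error rate, which is why a stricter-than-usual cutoff is assumed.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def monitor_item(
    batches: Iterable[Sequence[Tuple[float, int]]],
    a: float,
    b: float,
    z_crit: float = 3.0,
) -> List[float]:
    """Cumulative standardized residual after each batch; stops at the first flag.

    Each batch is a sequence of (ability_estimate, response) pairs for one item.
    """
    observed = expected = variance = 0.0
    history: List[float] = []
    for batch in batches:
        for theta, x in batch:
            p = p_correct_2pl(theta, a, b)
            observed += x
            expected += p
            variance += p * (1 - p)
        if variance == 0.0:
            continue  # no responses yet, nothing to test
        z = (observed - expected) / math.sqrt(variance)
        history.append(z)
        if z > z_crit:  # item performing easier than calibrated: possible exposure
            break
    return history


# Hypothetical batches: typical responses, then a batch where everyone answers correctly
typical = [(0.0, 1)] * 10 + [(0.0, 0)] * 10
suspicious = [(0.0, 1)] * 20
print(monitor_item([typical, suspicious, typical], a=1.0, b=0.0))
```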

New York State Testing Program 2018: English Language Arts and Mathematics Grades 3-8. Technical Report

New York State Education Department, 2018

This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 English Language Arts (ELA) and Mathematics 2018 Operational Tests. This report includes information about test content and test development, item (i.e., individual…

Descriptors: English, Language Arts, Language Tests, Mathematics Tests

Effect of Multiple Testing Adjustment in Differential Item Functioning Detection

Peer reviewed

Kim, Jihye; Oshima, T. C. – Educational and Psychological Measurement, 2013

In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…

Descriptors: Test Bias, Test Items, Statistical Analysis, Error of Measurement
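The abstract's point about per-item significance tests can be made concrete. If the m tests were independent (an assumption; DIF tests on one form are usually correlated), the chance of at least one Type I error is 1 - (1 - alpha)^m, about 0.64 for 20 items at alpha = .05. The list of adjustment methods in the abstract is truncated, so the sketch shows Bonferroni and Holm only as two common familywise corrections, not necessarily the ones the study compared. The p-values are hypothetical.

```python
from typing import List, Sequence


def familywise_error(alpha: float, m: int) -> float:
    """P(at least one Type I error) across m independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject when p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down: compare the j-th smallest p-value (0-based) with alpha / (m - j)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        if p_values[idx] > alpha / (m - step):
            break  # stop at the first non-rejection
        reject[idx] = True
    return reject


print(round(familywise_error(0.05, 20), 3))  # 0.642
p = [0.01, 0.015, 0.03, 0.2]
print(bonferroni(p))  # [True, False, False, False]
print(holm(p))  # [True, True, False, False]
```

Both corrections keep the familywise error rate at or below alpha, but Holm is uniformly at least as powerful as Bonferroni, which is the kind of Type I error versus power trade-off the study examines.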

New York State Testing Program 2017: English Language Arts and Mathematics Grades 3-8. Technical Report

New York State Education Department, 2017

This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 English Language Arts (ELA) and Mathematics 2017 Operational Tests. This report includes information about test content and test development, item (i.e., individual…

Descriptors: English, Language Arts, Language Tests, Mathematics Tests

New York State Testing Program 2016: English Language Arts and Mathematics Grades 3-8. Technical Report

New York State Education Department, 2016

This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 Common Core English Language Arts (ELA) and Mathematics 2016 Operational Tests. This report includes information about test content and test development, item (i.e.,…

Descriptors: Testing Programs, English, Language Arts, Mathematics Tests

Assessing Short-Term Individual Consistency Using IRT-Based Statistics

Peer reviewed

Ferrando, Pere J. – Psicologica: International Journal of Methodology and Experimental Psychology, 2010

This article proposes a procedure, based on a global statistic, for assessing intra-individual consistency in a test-retest design with a short-term retest interval. The procedure is developed within the framework of parametric item response theory, and the statistic is a likelihood-based measure that can be considered as an extension of the…

Descriptors: Item Response Theory, Intervals, Psychometrics, Testing

New York State Testing Program 2015: English Language Arts and Mathematics Grades 3-8. Technical Report

New York State Education Department, 2015

This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 Common Core English Language Arts (ELA) and Mathematics 2015 Operational Tests. This report includes information about test content and test development, item (i.e.,…

Descriptors: Testing Programs, English, Language Arts, Mathematics Tests

New York State Testing Program 2014: English Language Arts and Mathematics Grades 3-8. Technical Report

New York State Education Department, 2014

This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 Common Core English Language Arts (ELA) and Mathematics 2014 Operational Tests. This report includes information about test content and test development, item (i.e.,…

Descriptors: Testing Programs, English, Language Arts, Mathematics Tests

Ramsay-Curve Differential Item Functioning

Peer reviewed

Woods, Carol M. – Applied Psychological Measurement, 2011

Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…

Descriptors: Simulation, Item Response Theory, Testing, Questionnaires
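IRT-LR DIF testing compares a compact model, in which the studied item's parameters are constrained equal across groups, with an augmented model in which they are free. The statistic is G^2 = 2(ln L_augmented - ln L_compact), referred to a chi-square distribution whose degrees of freedom equal the number of freed parameters (for example, 2 for a 2PL item). The sketch below assumes both log-likelihoods come from models already fitted elsewhere; neither the model fitting nor the Ramsay-curve estimation of the latent density that the article adds is shown. The chi-square tail is computed in closed form for df = 1 or even df only, to stay dependency-free.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, for df == 1 or any even df."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half = x / 2.0
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("this sketch handles only df == 1 or even df")


def lr_dif_test(loglik_compact: float, loglik_augmented: float, df: int) -> Tuple[float, float]:
    """Return (G^2, p-value) for the compact-versus-augmented comparison."""
    g2 = 2.0 * (loglik_augmented - loglik_compact)
    return g2, chi2_sf(g2, df)


# Hypothetical log-likelihoods for a 2PL studied item (a and b freed, so df = 2)
print(lr_dif_test(loglik_compact=-5012.4, loglik_augmented=-5008.1, df=2))
```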

Addressing Two Commonly Unrecognized Sources of Score Instability in Annual State Assessments

Doorey, Nancy A. – Council of Chief State School Officers, 2011

The work reported in this paper reflects a collaborative effort of many individuals representing multiple organizations. It began during a session at the October 2008 meeting of TILSA when a representative of a member state asked the group if any of their programs had experienced unexpected fluctuations in the annual state assessment scores, and…

Descriptors: Testing, Sampling, Expertise, Testing Programs

The Development and Technical Adequacy of Seventh-Grade Reading Comprehension Measures in a Progress Monitoring Assessment System. Technical Report #1102

Park, Bitnara Jasmine; Alonzo, Julie; Tindal, Gerald – Behavioral Research and Teaching, 2011

This technical report describes the process of development and piloting of reading comprehension measures that are appropriate for seventh-grade students as part of an online progress screening and monitoring assessment system, http://easycbm.com. Each measure consists of an original fictional story of approximately 1,600 to 1,900 words with 20…

Descriptors: Reading Comprehension, Reading Tests, Grade 7, Test Construction

Author

Woods, Carol M.	2
Al Harbi, Khaleel	1
Alonzo, Julie	1
Bramley, Tom	1
Chang, Shun-Wen	1
Dimitrov, Dimiter M.	1
Domingue, Benjamin W.	1
Doorey, Nancy A.	1
Dorans, Neil J.	1
Ferrando, Pere J.	1
Gilbert, Joshua B.	1
Guo, Hongwen	1
Karkee, Thakur B.	1
Kim, Jihye	1
Li, Jie	1
Li, Tatyana	1
Lu, Ru	1
Marcoulides, George A.	1
Menold, Natalja	1
Oshima, T. C.	1
Park, Bitnara Jasmine	1
Raykov, Tenko	1
Sideridis, Georgios	1
Soland, James G.	1
Tindal, Gerald	1