ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	6
Since 2006 (last 20 years)	13

Descriptor

Scoring	18
Statistical Analysis	18
Simulation	17
Item Response Theory	8
Test Items	8
Models	6
Comparative Analysis	5
Reliability	4
Accuracy	3
Adaptive Testing	3
Computation	3
Computer Assisted Testing	3
Computer Software	3
Correlation	3
Item Analysis	3
Sample Size	3
Scores	3
Test Theory	3
Algebra	2
Classification	2
Computer Simulation	2
Difficulty Level	2
Error of Measurement	2
Evaluation Criteria	2
Evaluation Methods	2

Source

ETS Research Report Series	4
ProQuest LLC	4
American Journal of…	1
Assessment	1
International Educational…	1
Journal of Educational…	1
Journal of Educational and…	1
Journal of Research on…	1

Publication Type

Reports - Research	11
Journal Articles	9
Dissertations/Theses -…	4
Collected Works - Proceedings	1
Speeches/Meeting Papers	1

Education Level

Secondary Education	3
Junior High Schools	2
Middle Schools	2
Elementary Education	1
Grade 4	1
Grade 8	1
High Schools	1
Higher Education	1
Intermediate Grades	1
Postsecondary Education	1

Location

Afghanistan	1
Illinois (Chicago)	1
New York	1

Assessments and Surveys

National Assessment of…	1
Program for International…	1
Torrance Tests of Creative…	1

Showing 1 to 15 of 18 results

Improving Methods for Propensity Score Analysis with Mismeasured Variables by Incorporating Background Variables with Moderated Nonlinear Factor Analysis

Greifer, Noah – ProQuest LLC, 2018

There has been some research on the use of propensity scores when the confounding variables are measured with error; one recommended method is to generate estimates of the mismeasured covariate using a latent variable model and to use those estimates (i.e., factor scores) in place of the covariate. I describe a simulation study…

Descriptors: Evaluation Methods, Probability, Scores, Statistical Analysis

Accuracy of a Classical Test Theory-Based Procedure for Estimating the Reliability of a Multistage Test. Research Report. ETS RR-17-02

Peer reviewed
PDF on ERIC

Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017

The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…

Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers began investigating automatic scoring systems in writing assessments, they have examined the relationships between human and machine scoring and have proposed evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

Assessing Methods for Generalizing Experimental Impact Estimates to Target Populations

Peer reviewed

Kern, Holger L.; Stuart, Elizabeth A.; Hill, Jennifer; Green, Donald P. – Journal of Research on Educational Effectiveness, 2016

Randomized experiments are considered the gold standard for causal inference because they can provide unbiased estimates of treatment effects for the experimental participants. However, researchers and policymakers are often interested in using a specific experiment to inform decisions about other target populations. In education research,…

Descriptors: Educational Research, Generalization, Sampling, Participant Characteristics

Item Response Data Analysis Using Stata Item Response Theory Package

Peer reviewed

Yang, Ji Seung; Zheng, Xiaying – Journal of Educational and Behavioral Statistics, 2018

The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package, which has been available since Stata 14 (2015). Using a simulated data set and a publicly available item response data set extracted from the Programme for International Student Assessment, we review the IRT package from…

Descriptors: Item Response Theory, Item Analysis, Computer Software, Statistical Analysis

New and Improved? A Comparison of the Original and Revised Versions of the Structured Interview of Reported Symptoms

Peer reviewed

Green, Debbie; Rosenfeld, Barry; Belfi, Brian – Assessment, 2013

The current study evaluated the accuracy of the Structured Interview of Reported Symptoms, Second Edition (SIRS-2) in a criterion-group study using a sample of forensic psychiatric patients and a community simulation sample, comparing it to the original SIRS and to results published in the SIRS-2 manual. The SIRS-2 yielded an impressive…

Descriptors: Structured Interviews, Comparative Analysis, Patients, Simulation

An Item-Driven Adaptive Design for Calibrating Pretest Items. Research Report. ETS RR-14-38

Peer reviewed
PDF on ERIC

Ali, Usama S.; Chang, Hua-Hua – ETS Research Report Series, 2014

Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…

Descriptors: Adaptive Testing, Simulation, Pretests Posttests, Test Items

A Comparison of Item Calibration Procedures in the Presence of Test Speededness

Peer reviewed

Suh, Youngsuk; Cho, Sun-Joo; Wollack, James A. – Journal of Educational Measurement, 2012

In the presence of test speededness, the parameters of item response theory models can be poorly estimated because of conditional dependencies among items, particularly end-of-test items (i.e., speeded items). This article presents a systematic comparison of five item calibration procedures--a two-parameter logistic (2PL) model, a…

Descriptors: Response Style (Tests), Timed Tests, Test Items, Item Response Theory

Longitudinal Rater Modeling with Splines

Dobria, Lidia – ProQuest LLC, 2011

Performance assessments rely on the expert judgment of raters for the measurement of the quality of responses, and raters unavoidably introduce error in the scoring process. Defined as the tendency of a rater to assign higher or lower ratings, on average, than those assigned by other raters, even after accounting for differences in examinee…

Descriptors: Simulation, Performance Based Assessment, Performance Tests, Scoring

From Biology to Education: Scoring and Clustering Multilingual Text Sequences and Other Sequential. Research Report. ETS RR-12-25

Peer reviewed
PDF on ERIC

Sukkarieh, Jane Z.; von Davier, Matthias; Yamamoto, Kentaro – ETS Research Report Series, 2012

This document describes a solution to a problem in the automatic content scoring of the multilingual character-by-character highlighting item type. This solution is language independent and represents a significant enhancement. It not only facilitates automatic scoring but also plays an important role in clustering students' responses;…

Descriptors: Scoring, Multilingualism, Test Items, Role

Model Choice and Sample Size in Item Response Theory Analysis of Aphasia Tests

Peer reviewed

Hula, William D.; Fergadiotis, Gerasimos; Martin, Nadine – American Journal of Speech-Language Pathology, 2012

Purpose: The purpose of this study was to identify the most appropriate item response theory (IRT) measurement model for aphasia tests requiring 2-choice responses and to determine whether small samples are adequate for estimating such models. Method: Pyramids and Palm Trees (Howard & Patterson, 1992) test data that had been collected from…

Descriptors: Sample Size, Guessing (Tests), Aphasia, Item Response Theory

Evaluating IRT- and CTT-Based Methods of Estimating Classification Consistency and Accuracy Indices from Single Administrations

Deng, Nina – ProQuest LLC, 2011

Three decision consistency and accuracy (DC/DA) methods were evaluated: the Livingston and Lewis (LL) method, the LEE method, and the Hambleton and Han (HH) method. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true"…

Descriptors: Item Response Theory, Test Theory, Computation, Classification

Proceedings of the International Conference on Educational Data Mining (EDM) (9th, Raleigh, North Carolina, June 29-July 2, 2016)

Peer reviewed
PDF on ERIC

Barnes, Tiffany, Ed.; Chi, Min, Ed.; Feng, Mingyu, Ed. – International Educational Data Mining Society, 2016

The 9th International Conference on Educational Data Mining (EDM 2016) was held under the auspices of the International Educational Data Mining Society at the Sheraton Raleigh Hotel in downtown Raleigh, North Carolina, USA. The conference, held June 29-July 2, 2016, follows the eight previous editions (Madrid 2015, London 2014, Memphis…

Descriptors: Data Analysis, Evidence Based Practice, Inquiry, Science Instruction

Identifying Nonuniform DIF in Polytomously Scored Test Items. ACT Research Report Series 94-1.

Spray, Judith; Miller, Tim – 1994

Computer simulations under three conditions of polytomous differential item functioning (DIF) compared the ability of three different statistical procedures to detect nonuniform DIF. The procedures were a nominal and an ordinal extension of the Mantel-Haenszel statistic, and logistic discriminant function analysis. Results showed that only the…

Descriptors: Computer Simulation, Identification, Item Bias, Sample Size

Conditional Covariance Theory and DETECT for Polytomous Items. Research Report. ETS RR-04-50

Peer reviewed
PDF on ERIC

Zhang, Jinming – ETS Research Report Series, 2004

This paper extends the theory of conditional covariances to polytomous items. It has been mathematically proven that under some mild conditions, commonly assumed in the analysis of response data, the conditional covariance of two items, dichotomously or polytomously scored, is positive if the two items are dimensionally homogeneous and negative…

Descriptors: Test Items, Test Theory, Correlation, National Competency Tests

Author

Ali, Usama S.	1
Barnes, Tiffany, Ed.	1
Belfi, Brian	1
Chang, Hua-Hua	1
Chi, Min, Ed.	1
Cho, Sun-Joo	1
Deng, Nina	1
Dobria, Lidia	1
Feng, Mingyu, Ed.	1
Fergadiotis, Gerasimos	1
Green, Debbie	1
Green, Donald P.	1
Greene, John F.	1
Greifer, Noah	1
Harris, Dickie A.	1
Hill, Jennifer	1
Hula, William D.	1
Kern, Holger L.	1
Kim, Sooyeon	1
Livingston, Samuel A.	1
Martin, Nadine	1
Miller, Tim	1
Penell, Roger J.	1
Rosenfeld, Barry	1
Spray, Judith	1