ERIC - Search Results

Showing all 12 results

Accuracy and Sensitivity of Coefficient Alpha and Its Alternatives with Unidimensional and Contaminated Scales

Peer reviewed

Xiao, Leifeng; Hau, Kit-Tai – Applied Measurement in Education, 2023

We compared coefficient alpha with five alternatives (omega total, omega RT, omega h, GLB, and coefficient H) in two simulation studies. Results showed for unidimensional scales, (a) all indices except omega h performed similarly well for most conditions; (b) alpha is still good; (c) GLB and coefficient H overestimated reliability with small…

Descriptors: Test Theory, Test Reliability, Factor Analysis, Test Length

Asymptotic Standard Errors of Equating Coefficients Using the Characteristic Curve Methods for the Graded Response Model

Peer reviewed

Zhang, Zhonghua – Applied Measurement in Education, 2020

The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the…

Descriptors: Error of Measurement, Computation, Equated Scores, True Scores

Investigating Repeater Effects on Small Sample Equating: Include or Exclude?

Peer reviewed

Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020

Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…

Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems

Validating Human and Automated Scoring of Essays against "True" Scores

Peer reviewed

Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018

In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…

Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing

An Extension of IRT-Based Equating to the Dichotomous Testlet Response Theory Model

Peer reviewed

Tao, Wei; Cao, Yi – Applied Measurement in Education, 2016

Current procedures for equating number-correct scores using traditional item response theory (IRT) methods assume local independence. However, when tests are constructed using testlets, one concern is the violation of the local item independence assumption. The testlet response theory (TRT) model is one way to accommodate local item dependence.…

Descriptors: Item Response Theory, Equated Scores, Test Format, Models

Bi-Factor MIRT Observed-Score Equating for Mixed-Format Tests

Peer reviewed

Lee, Guemin; Lee, Won-Chan – Applied Measurement in Education, 2016

The main purposes of this study were to develop bi-factor multidimensional item response theory (BF-MIRT) observed-score equating procedures for mixed-format tests and to investigate relative appropriateness of the proposed procedures. Using data from a large-scale testing program, three types of pseudo data sets were formulated: matched samples,…

Descriptors: Test Format, Multidimensional Scaling, Item Response Theory, Equated Scores

A Comparison among IRT True- and Observed-Score Equatings and Traditional Equipercentile Equating

Peer reviewed

Han, Tianqi; And Others – Applied Measurement in Education, 1997

Stability among equating procedures was studied by comparing item response theory (IRT) true-score equating with IRT observed-score equating, IRT true-score equating with equipercentile equating, and IRT observed-score equating with equipercentile equating. On average, IRT true-score equating more frequently produced more stable conversions. (SLD)

Descriptors: Comparative Analysis, Equated Scores, Item Response Theory, Raw Scores

Simulation Results of Effects on Linear and Curvilinear Observed- and True-Score Equating Procedures of Matching on a Fallible Criterion

Peer reviewed

Eignor, Daniel R.; And Others – Applied Measurement in Education, 1990

Two independent replications of a sequence of simulations were conducted to aid in the diagnosis and interpretation of equating differences found between representative (random) and matched (nonrandom) samples for three commonly used conventional observed-score equating procedures and one item-response-theory-based equating procedure. (SLD)

Descriptors: Equated Scores, Item Response Theory, Sampling, Simulation

Variability in Reliability Coefficients and the Standard Error of Measurement from School District to District

Peer reviewed

Feldt, Leonard S.; Qualls, Audrey L. – Applied Measurement in Education, 1999

Examined the stability of the standard error of measurement and the relationship between the reliability coefficient and the variance of both true scores and error scores for 170 school districts in a state. As expected, reliability coefficients varied as a function of group variability, but the variation in split-half coefficients from school to…

Descriptors: Elementary Secondary Education, Error of Measurement, Reliability, School Districts

Evaluating the Effects of Multidimensionality on IRT True-Score Equating

Peer reviewed

Bolt, Daniel M. – Applied Measurement in Education, 1999

Examined whether the item response theory (IRT) true-score equating method is more adversely affected by the presence of multidimensionality than two conventional equating methods, linear and equipercentile equating. Results of two simulation studies suggest that the IRT method performs as well as the conventional methods when the correlation…

Descriptors: Correlation, Equated Scores, Item Response Theory, Simulation

Performance of SIBTEST When the Percentage of DIF Items Is Large

Peer reviewed

Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004

Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, like test translation and…

Descriptors: True Scores, Simulation, Test Bias, Student Evaluation

Information Retention as a Function of the Number of Intervals and the Reliability of Continuous Variables

Peer reviewed

Schiel, Jeffrey L.; Shaw, Dale G. – Applied Measurement in Education, 1992

Changes in information retention resulting from changes in reliability and number of intervals in scale construction were studied to provide quantitative information to help in decisions about choosing intervals. Information retention reached a maximum when the number of intervals was about 8 or more and reliability was near 1.0. (SLD)

Descriptors: Decision Making, Knowledge Level, Mathematical Models, Monte Carlo Methods
