Publication Date
In 2025: 1
Since 2024: 1
Since 2021 (last 5 years): 3
Since 2016 (last 10 years): 5
Since 2006 (last 20 years): 10
Descriptor
Error of Measurement: 16
Scores: 16
Test Length: 16
Test Items: 8
Simulation: 7
Test Reliability: 7
Comparative Analysis: 5
Item Response Theory: 5
Estimation (Mathematics): 4
Computation: 3
Computer Assisted Testing: 3
Author
Livingston, Samuel A.: 2
Allen, Jeff: 1
Burton, Richard F.: 1
Chon, Kyong Hee: 1
Cui, Zhongmin: 1
Dunbar, Stephen B.: 1
Emons, Wilco H. M.: 1
Gelbal, Selahattin: 1
Henning, Grant: 1
Kolen, Michael J.: 1
Wang, Chun: 1
Publication Type
Journal Articles: 11
Reports - Research: 10
Reports - Evaluative: 6
Speeches/Meeting Papers: 1
Education Level
Elementary Secondary Education: 1
High Schools: 1
Higher Education: 1
Postsecondary Education: 1
Secondary Education: 1
Location
Iran: 1
Assessments and Surveys
ACT Assessment: 1
Armed Forces Qualification Test: 1
California Psychological Inventory: 1
Test of English as a Foreign Language: 1
Trends in International Mathematics and Science Study: 1
Jeff Allen; Ty Cruce – ACT Education Corp., 2025
This report summarizes some of the evidence supporting interpretations of scores from the enhanced ACT, focusing on reliability, concurrent validity, predictive validity, and score comparability. The authors argue that the evidence presented in this report supports the interpretation of scores from the enhanced ACT as measures of high school…
Descriptors: College Entrance Examinations, Testing, Change, Scores
Xue Zhang; Chun Wang – Grantee Submission, 2022
Item-level fit analysis not only serves as a complementary check to global fit analysis but is also essential in scale development, because the fit results guide item revision and/or deletion (Liu & Maydeu-Olivares, 2014). During data collection, missing responses are likely to occur for various reasons. Chi-square-based item fit…
Descriptors: Goodness of Fit, Item Response Theory, Scores, Test Length
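For reference, one widely used chi-square-based item fit statistic of the kind this entry refers to is Orlando and Thissen's S-X² (the same statistic examined in the Chon, Lee, and Dunbar entry below). For item i on an n-item test it takes the form
\[
S\text{-}X^2_i \;=\; \sum_{k=1}^{n-1} N_k \,\frac{(O_{ik} - E_{ik})^2}{E_{ik}\,(1 - E_{ik})},
\]
where N_k is the number of examinees with summed score k, and O_ik and E_ik are the observed and model-implied proportions answering item i correctly within that score group.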
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
Computerized adaptive tests (CAT) apply an adaptive process in which the items administered are tailored to individuals' ability scores. Multidimensional CAT (MCAT) designs differ in the item selection, ability estimation, and termination methods they use. This study investigates the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
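For readers unfamiliar with the mechanics behind such designs, the sketch below shows a minimal unidimensional adaptive loop: maximum-information item selection, EAP ability estimation, and a standard-error termination rule. It is a generic illustration with a hypothetical simulated item pool, not one of the MCAT designs compared in this study.

import numpy as np

rng = np.random.default_rng(0)

def p2pl(theta, a, b):
    # 2PL probability of a correct response
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap(responses, a, b, grid=np.linspace(-4, 4, 161)):
    # EAP ability estimate and posterior SD under a standard normal prior
    prior = np.exp(-0.5 * grid**2)
    like = np.ones_like(grid)
    for u, aj, bj in zip(responses, a, b):
        p = p2pl(grid, aj, bj)
        like *= p**u * (1 - p)**(1 - u)
    post = prior * like
    post /= post.sum()
    theta = (grid * post).sum()
    se = np.sqrt(((grid - theta)**2 * post).sum())
    return theta, se

def run_cat(true_theta, a, b, se_target=0.30, max_len=40):
    used, resp = [], []
    theta, se = 0.0, np.inf
    while se > se_target and len(used) < max_len:
        p = p2pl(theta, a, b)
        info = a**2 * p * (1 - p)        # Fisher information at current theta
        info[used] = -np.inf             # never readminister an item
        j = int(np.argmax(info))         # maximum-information selection
        used.append(j)
        resp.append(int(rng.random() < p2pl(true_theta, a[j], b[j])))
        theta, se = eap(resp, a[used], b[used])
    return theta, se, len(used)          # estimate, its SE, items used

a = rng.uniform(0.8, 2.0, 500)   # hypothetical discrimination parameters
b = rng.normal(0.0, 1.0, 500)    # hypothetical difficulty parameters
print(run_cat(true_theta=1.0, a=a, b=b))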
Lee, HyeSun – Applied Measurement in Education, 2018
The current simulation study examined the effects of Item Parameter Drift (IPD) occurring in a short scale on parameter estimates in multilevel models where scores from a scale were employed as a time-varying predictor to account for outcome scores. Five factors, including three decisions about IPD, were considered for simulation conditions. It…
Descriptors: Test Items, Hierarchical Linear Modeling, Predictor Variables, Scores
Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Descriptors: Test Bias, Test Reliability, Performance, Scores
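The reliability ratio invoked here is the classical definition: the proportion of observed-score variance that is true-score variance,
\[
\rho_{XX'} \;=\; \frac{\sigma_T^2}{\sigma_X^2} \;=\; \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2},
\]
so any source of added error variance lowers the ratio even when true-score variance is unchanged.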
Yao, Lihua – Applied Psychological Measurement, 2013
Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012
Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length degrades decision quality due to the increased impact of…
Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement
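The vulnerability of short tests noted here follows from the Spearman-Brown formula, a standard classical result (not specific to this study): changing test length by a factor k changes reliability from ρ to
\[
\rho_k \;=\; \frac{k\,\rho}{1 + (k-1)\,\rho}.
\]
For example, cutting a 40-item test with ρ = 0.90 down to 10 items (k = 0.25) gives ρ_k = 0.225/0.325 ≈ 0.69, with correspondingly wider error bands around every decision cutoff.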
Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B. – Journal of Educational Measurement, 2010
In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G², Orlando and Thissen's S-X² and S-G², and Stone's χ²* and G²*. To investigate the…
Descriptors: Test Length, Goodness of Fit, Item Response Theory, Simulation
Cui, Zhongmin; Kolen, Michael J. – Applied Psychological Measurement, 2008
This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…
Descriptors: Test Length, Test Content, Simulation, Computation
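As a rough illustration of the nonparametric variant, the sketch below resamples examinees with replacement and recomputes a (deliberately simplified, unsmoothed) equipercentile function each time; the data and score point are hypothetical, and the study's actual procedures involve more than this.

import numpy as np

rng = np.random.default_rng(1)

def equipercentile(x_scores, y_scores, x_point):
    # Map x_point to the Y scale by matching percentile ranks (crude version)
    p = np.mean(x_scores <= x_point)      # percentile rank on form X
    return np.quantile(y_scores, p)       # Y score with the same rank

def bootstrap_se(x_scores, y_scores, x_point, reps=1000):
    # Nonparametric bootstrap: resample each group with replacement
    est = [equipercentile(rng.choice(x_scores, x_scores.size, replace=True),
                          rng.choice(y_scores, y_scores.size, replace=True),
                          x_point)
           for _ in range(reps)]
    return np.std(est, ddof=1)            # SD of replicates = bootstrap SE

x = rng.binomial(40, 0.60, 300)   # hypothetical form-X scores, n = 300
y = rng.binomial(40, 0.55, 300)   # hypothetical form-Y scores
print(bootstrap_se(x, y, x_point=25))

The parametric variant differs only in where the replicates come from: a model (for example, a smoothed score distribution) is fitted first, and the bootstrap samples are drawn from the fitted model rather than from the raw data.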
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models
Livingston, Samuel A.; Lewis, Charles – 1993
This paper presents a method for estimating the accuracy and consistency of classifications based on test scores. The scores can be produced by any scoring method, including the formation of a weighted composite. The estimates use data from a single form. The reliability of the score is used to estimate its effective test length in terms of…
Descriptors: Classification, Error of Measurement, Estimation (Mathematics), Reliability
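One common statement of the effective test length used in this approach (cf. the published version of the method, Livingston & Lewis, 1995) is
\[
\tilde{n} \;=\; \frac{(\mu - x_{\min})(x_{\max} - \mu) - \rho\,\sigma^2}{\sigma^2\,(1 - \rho)},
\]
where μ and σ² are the mean and variance of the scores, x_min and x_max are the lowest and highest possible scores, and ρ is the reliability; the composite is then treated as a test of ñ discrete items for the classification model.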
Livingston, Samuel A. – 1981
The standard error of measurement (SEM) is a measure of the inconsistency in the scores of a particular group of test-takers. It is largest for test-takers whose scores are near 50 percent correct and smaller for those with nearly perfect scores. On tests used to make pass/fail decisions, the test-takers' scores tend to cluster in the range…
Descriptors: Error of Measurement, Estimation (Mathematics), Mathematical Formulas, Pass Fail Grading
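The pattern described here is what the binomial error model predicts for a number-correct score on an n-item test: conditional on a true proportion-correct p, the SEM is
\[
\mathrm{SEM}(p) \;=\; \sqrt{n\,p\,(1-p)},
\]
which peaks at p = 0.5 and shrinks toward zero as p approaches 0 or 1. For a 100-item test, that is 5 raw-score points at 50 percent correct but only about 2.2 points at 95 percent correct.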

Woodruff, David – Journal of Educational Measurement, 1991
Improvements are made on previous estimates for the conditional standard error of measurement in prediction, the conditional standard error of estimation (CSEE), and the conditional standard error of prediction (CSEP). Better estimates of how test length affects CSEE and CSEP are derived. (SLD)
Descriptors: Equations (Mathematics), Error of Measurement, Estimation (Mathematics), Mathematical Models
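For orientation, the unconditional classical counterparts of these quantities are (with ρ the reliability of X and ρ_XY the correlation between predictor and criterion):
\[
\sigma_{\text{meas}} = \sigma_X\sqrt{1-\rho}, \qquad
\sigma_{\text{est}} = \sigma_X\sqrt{\rho(1-\rho)}, \qquad
\sigma_{\text{pred}} = \sigma_Y\sqrt{1-\rho_{XY}^2}.
\]
The "conditional" versions studied here let these errors vary with score level instead of treating them as constant across the scale.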
Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items
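The confidence-limits use of the SEM that this entry starts from is the standard one: with SEM = σ_X√(1 − ρ), an observed score x carries an approximate 95 percent band of x ± 1.96 × SEM. For example, σ_X = 10 and ρ = 0.84 give SEM = 4, hence roughly x ± 8. Because ρ depends on the spread of examinee attainment, however, the same test can show different reliability coefficients in different groups, which is the comparability problem Burton raises.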

Misanchuk, Earl R. – 1978
Multiple matrix sampling of three subscales of the California Psychological Inventory was used to investigate the effects of four variables on error estimates of the mean (EEM) and variance (EEV). The four variables were examinee population size (600, 450, 300, 150, 100, and 75); number of subtests (2, 3, 4, 5, 6, and 7), hence the number of…
Descriptors: Adults, Analysis of Variance, Error of Measurement, Item Sampling
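A minimal sketch of the sampling scheme itself (with hypothetical numbers, not the study's design): split the item pool into subtests, give each examinee one subtest at random, and estimate the full-test mean from the pooled item-level data.

import numpy as np

rng = np.random.default_rng(2)

n_examinees, n_items, n_subtests = 300, 30, 5
# hypothetical item responses for the full (never fully administered) test
p_items = rng.uniform(0.3, 0.9, n_items)
full = rng.binomial(1, p_items, (n_examinees, n_items))
true_mean = full.sum(axis=1).mean()          # full-test mean total score

items = rng.permutation(n_items).reshape(n_subtests, -1)  # item partition
assign = rng.integers(n_subtests, size=n_examinees)       # random subtest
# each examinee contributes only their own subtest's items
p_hat = np.mean([full[i, items[assign[i]]].mean()
                 for i in range(n_examinees)])
print(true_mean, p_hat * n_items)            # basis of the error estimate of the mean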