ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	2
Since 2017 (last 10 years)	5
Since 2007 (last 20 years)	11

Descriptor

Error of Measurement	14
Probability	14
Test Items	14
Item Response Theory	6
Simulation	5
Test Length	4
Equated Scores	3
Reliability	3
Sample Size	3
Scores	3
Statistical Analysis	3
Achievement Tests	2
Comparative Analysis	2
Cutting Scores	2
Generalizability Theory	2
Mathematical Models	2
Psychometrics	2
Standard Setting (Scoring)	2
Test Reliability	2
Ability	1
Academic Achievement	1
Accuracy	1
Adaptive Testing	1
Bayesian Statistics	1
Bilingual Education	1
More ▼

Source

Applied Measurement in…	3
Journal of Educational…	3
Educational Researcher	1
International Journal of…	1
International Journal of…	1
Measurement:…	1
Practical Assessment,…	1
Psychometrika	1

Publication Type

Journal Articles	12
Reports - Research	10
Reports - Evaluative	4
Speeches/Meeting Papers	2
Reports - Descriptive	1

Education Level

Higher Education	1
Postsecondary Education	1
Secondary Education	1

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

Program for International…

What Works Clearinghouse Rating

Showing all 14 results Save | Export

How to Obtain the Most Error-Free Estimate of Reliability? Eight Sources of Deflation in the Estimates of Reliability to Avoid

Peer reviewed
PDF on ERIC

Download full text

Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022

The reliability of a test score is usually underestimated and the deflation may be profound, 0.40 - 0.60 units of reliability or 46 - 71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…

Descriptors: Test Reliability, Scores, Test Items, Correlation

Interval Estimation of Item Response Probabilities along Studied Latent Dimensions

Peer reviewed

Direct link

Raykov, Tenko; Marcoulides, George A.; Pusic, Martin – Measurement: Interdisciplinary Research and Perspectives, 2021

An interval estimation procedure is discussed that can be used to evaluate the probability of a particular response for a binary or binary scored item at a pre-specified point along an underlying latent continuum. The item is assumed to: (a) be part of a unidimensional multi-component measuring instrument that may contain also polytomous items,…

Descriptors: Item Response Theory, Computation, Probability, Test Items

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Direct link

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Equality of Admission Tests Using Kernel Equating under the Non-Equivalent Groups with Covariates Design

Peer reviewed
PDF on ERIC

Download full text

Altintas, Ozge; Wallin, Gabriel – International Journal of Assessment Tools in Education, 2021

Educational assessment tests are designed to measure the same psychological constructs over extended periods. This feature is important considering that test results are often used for admittance to university programs. To ensure fair assessments, especially for those whose results weigh heavily in selection decisions, it is necessary to collect…

Descriptors: College Admission, College Entrance Examinations, Test Bias, Equated Scores

Sensitivity of the RMSD for Detecting Item-Level Misfit in Low-Performing Countries

Peer reviewed

Direct link

Tijmstra, Jesper; Bolsinova, Maria; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2020

Although the root-mean squared deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessment, this paper shows that its sensitivity to detect misfit may depend strongly on the proficiency distribution of the considered…

Descriptors: Test Items, Goodness of Fit, Probability, Accuracy

Asymptotic Standard Errors of Observed-Score Equating with Polytomous IRT Models

Peer reviewed

Direct link

Andersson, Björn – Journal of Educational Measurement, 2016

In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…

Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Direct link

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items

Testing Manifest Monotonicity Using Order-Constrained Statistical Inference

Peer reviewed

Direct link

Tijmstra, Jesper; Hessen, David J.; van der Heijden, Peter G. M.; Sijtsma, Klaas – Psychometrika, 2013

Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores,…

Descriptors: Item Response Theory, Statistical Inference, Probability, Psychometrics

Standard Error of Linear Observed-Score Equating for the NEAT Design with Nonnormally Distributed Data

Peer reviewed

Direct link

Zu, Jiyun; Yuan, Ke-Hai – Journal of Educational Measurement, 2012

In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…

Descriptors: Sample Size, Equated Scores, Test Items, Error of Measurement

Test Length and Decision Quality in Personnel Selection: When Is Short Too Short?

Peer reviewed

Direct link

Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012

Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…

Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement

Who Is Given Tests in What Language by Whom, When, and Where? The Need for Probabilistic Views of Language in the Testing of English Language Learners

Peer reviewed

Direct link

Solano-Flores, Guillermo – Educational Researcher, 2008

The testing of English language learners (ELLs) is, to a large extent, a random process because of poor implementation and factors that are uncertain or beyond control. Yet current testing practices and policies appear to be based on deterministic views of language and linguistic groups and erroneous assumptions about the capacity of assessment…

Descriptors: Generalizability Theory, Testing, Second Language Learning, Error of Measurement

An Alternative Interpretation of Three Stability Models. Measurement and Methodology, Work Unit 2: Technical Adequacy of Tests.

Wilcox, Rand R. – 1978

Two fundamental problems in mental test theory are to estimate true score and to estimate the amount of error when testing an examinee. In this report, three probability models which characterize a single test item in terms of a population of examinees are described. How these models may be modified to characterize a single examinee in terms of an…

Descriptors: Achievement Tests, Comparative Analysis, Error of Measurement, Mathematical Models

Assessing Inconsistencies in Standard Setting with the Angoff or Nedelsky Technique.

Download full text

van der Linden, Wim J. – 1982

A latent trait method is presented to investigate the possibility that Angoff or Nedelsky judges specify inconsistent probabilities in standard setting techniques for objectives-based instructional programs. It is suggested that judges frequently specify a low probability of success for an easy item but a large probability for a hard item. The…

Descriptors: Criterion Referenced Tests, Cutting Scores, Error of Measurement, Interrater Reliability

Altering the Level of Difficulty in Computer Adaptive Testing.

Peer reviewed

Bergstrom, Betty A.; And Others – Applied Measurement in Education, 1992

Effects of altering test difficulty on examinee ability measures and test length in a computer adaptive test were studied for 225 medical technology students in 3 test difficulty conditions. Results suggest that, with an item pool of sufficient depth and breadth, acceptable targeting to test difficulty is possible. (SLD)

Descriptors: Ability, Adaptive Testing, Change, College Students

Sijtsma, Klaas	2
Tijmstra, Jesper	2
Altintas, Ozge	1
Andersson, Björn	1
Bergstrom, Betty A.	1
Bolsinova, Maria	1
Emons, Wilco H. M.	1
Hessen, David J.	1
Kannan, Priya	1
Katz, Irvin R.	1
Kim, Stella Yun	1
Kruyen, Peter M.	1
Lee, Won-Chan	1
Liaw, Yuan-Ling	1
Marcoulides, George A.	1
Metsämuuronen, Jari	1
Pusic, Martin	1
Raykov, Tenko	1
Rutkowski, David	1
Rutkowski, Leslie	1
Sgammato, Adrienne	1
Solano-Flores, Guillermo	1
Tannenbaum, Richard J.	1
Wallin, Gabriel	1
More ▼