ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	1
Since 2017 (last 10 years)	1
Since 2007 (last 20 years)	10

Descriptor

Error of Measurement	20
Reliability	20
Test Items	20
Scores	8
Item Analysis	6
Statistical Analysis	5
Generalizability Theory	4
Item Response Theory	4
Comparative Analysis	3
Correlation	3
Equated Scores	3
Goodness of Fit	3
Mathematics Tests	3
Probability	3
Sampling	3
Statistical Bias	3
Test Construction	3
Test Length	3
Classification	2
College Students	2
Computer Assisted Testing	2
Cutting Scores	2
Decision Making	2
Definitions	2
Educational Assessment	2

Source

Educational and Psychological…	3
Applied Measurement in…	2
ETS Research Report Series	2
Assessment & Evaluation in…	1
Educational Researcher	1
International Journal of…	1
Practical Assessment,…	1
Psychological Methods	1
Research Papers in Education	1

Publication Type

Journal Articles	13
Reports - Research	11
Reports - Evaluative	7
Speeches/Meeting Papers	5
Reports - Descriptive	2
Tests/Questionnaires	1

Education Level

Elementary Education	2
Grade 5	2
Higher Education	2
Postsecondary Education	2
Elementary Secondary Education	1
Grade 3	1
Grade 4	1
Grade 8	1
Junior High Schools	1
Middle Schools	1

Location

Maryland	1
Portugal	1
United Kingdom (England)	1

Assessments and Surveys

National Assessment of…	1
SAT (College Admission Test)	1

Showing 1 to 15 of 20 results

The Impact of Measurement Model Misspecification on Coefficient Omega Estimates of Composite Reliability

Peer reviewed

Bell, Stephanie M.; Chalmers, R. Philip; Flora, David B. – Educational and Psychological Measurement, 2024

Coefficient omega indices are model-based composite reliability estimates that have become increasingly popular. A coefficient omega index estimates how reliably an observed composite score measures a target construct as represented by a factor in a factor-analysis model; as such, the accuracy of omega estimates is likely to depend on correct…

Descriptors: Influences, Models, Measurement Techniques, Reliability
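
For reference, when a one-factor model with uncorrelated errors is correctly specified, coefficient omega for an unweighted sum score is omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of unique variances). The minimal sketch below assumes the loadings and unique variances have already been estimated; the values are hypothetical, and the article's misspecification analysis is not reproduced.

```python
from typing import Sequence


def coefficient_omega(loadings: Sequence[float], unique_vars: Sequence[float]) -> float:
    """Omega for an unweighted sum score under a one-factor model.

    Assumes uncorrelated unique (error) terms, so the composite variance is
    (sum of loadings)^2 + sum of unique variances.
    """
    if len(loadings) != len(unique_vars):
        raise ValueError("need one unique variance per loading")
    common = sum(loadings) ** 2   # variance due to the common factor
    error = sum(unique_vars)      # summed unique variances
    return common / (common + error)


# Hypothetical standardized loadings; unique variance = 1 - loading**2.
lam = [0.7, 0.6, 0.8, 0.5]
theta = [1 - x ** 2 for x in lam]
print(round(coefficient_omega(lam, theta), 3))
```

If the true model contains correlated errors or a second factor, this formula omits those covariance terms, which is one way a misspecified model can bias an omega estimate.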

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
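
As a rough illustration of the G-theory logic, the sketch below treats raters and items as random facets in a fully crossed rater x item design with one Angoff rating per cell. It estimates variance components from ANOVA mean squares, sums the item mean ratings to get a raw-score cut, and projects the standard error of the mean rating for D-studies with fewer items. The design, function names, and simulated ratings are assumptions for illustration; the study's own design and estimators may differ.

```python
import numpy as np


def angoff_g_study(ratings: np.ndarray) -> dict:
    """Variance components for a random rater x item design, one rating per cell.

    ratings[r, i] is rater r's probability judgment for item i. Estimates come
    from expected mean squares; negative component estimates are set to 0.
    """
    n_r, n_i = ratings.shape
    if n_r < 2 or n_i < 2:
        raise ValueError("need at least two raters and two items")
    grand = ratings.mean()
    rater_means = ratings.mean(axis=1)
    item_means = ratings.mean(axis=0)

    ss_r = n_i * np.sum((rater_means - grand) ** 2)
    ss_i = n_r * np.sum((item_means - grand) ** 2)
    resid = ratings - rater_means[:, None] - item_means[None, :] + grand
    ss_ri = np.sum(resid ** 2)

    ms_r = ss_r / (n_r - 1)
    ms_i = ss_i / (n_i - 1)
    ms_ri = ss_ri / ((n_r - 1) * (n_i - 1))

    return {
        "cut_score": float(item_means.sum()),  # Angoff cut on the raw-score scale
        "var_r": max((ms_r - ms_ri) / n_i, 0.0),
        "var_i": max((ms_i - ms_ri) / n_r, 0.0),
        "var_ri": float(ms_ri),  # interaction confounded with residual error
    }


def error_var_of_mean(comp: dict, n_r: int, n_i: int) -> float:
    """Absolute error variance of the mean rating with n_r raters and n_i items."""
    return comp["var_r"] / n_r + comp["var_i"] / n_i + comp["var_ri"] / (n_r * n_i)


# Hypothetical ratings from 8 raters on 60 items, simulated for illustration.
rng = np.random.default_rng(7)
ratings = np.clip(
    0.6 + rng.normal(0, 0.05, (8, 1)) + rng.normal(0, 0.15, (1, 60)) + rng.normal(0, 0.05, (8, 60)),
    0.0,
    1.0,
)
comp = angoff_g_study(ratings)
for n_items in (15, 30, 60):
    print(n_items, round(error_var_of_mean(comp, 8, n_items) ** 0.5, 4))
```

Comparing the projected standard errors across item counts is the kind of evidence that bears on whether a subset of items can support a generalizable cut score.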

An Investigation of Measurement Invariance of the Key Stage 2 National Curriculum Science Sampling Test in England

Peer reviewed

He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014

Since 2010, the Key Stage 2 (KS2) National Curriculum science test in England, formerly taken by the whole national cohort, has been replaced with an annual sampling test taken by pupils aged 11 in a nationally representative sample of schools. The study reported in this paper compares the performance of different subgroups of the samples (classified by…

Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis

Test Length and Decision Quality in Personnel Selection: When Is Short Too Short?

Peer reviewed

Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012

Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…

Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement
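
Under classical test theory with parallel items, the Spearman-Brown formula relates test length to reliability: changing the length of a test by a factor k changes its reliability rho to k*rho / (1 + (k - 1)*rho). The sketch below is a textbook illustration with hypothetical values, not the decision-quality analysis reported in the article.

```python
def spearman_brown(rel: float, k: float) -> float:
    """Reliability after changing test length by factor k (parallel items assumed)."""
    return k * rel / (1 + (k - 1) * rel)


def length_factor_for(rel: float, target: float) -> float:
    """Factor k by which a test with reliability `rel` must change to reach `target`."""
    return target * (1 - rel) / (rel * (1 - target))


# Hypothetical: a 40-item test with reliability 0.90 shortened to 10 items (k = 0.25).
print(round(spearman_brown(0.90, 10 / 40), 3))
```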

Quality Control Charts in Large-Scale Assessment Programs

Peer reviewed

Schafer, William D.; Coverdale, Bradley J.; Luxenberg, Harlan; Jin, Ying – Practical Assessment, Research & Evaluation, 2011

There are relatively few examples of quantitative approaches to quality control in educational assessment and accountability contexts. Among the several techniques used in other fields, Shewhart charts have been found in a few instances to be applicable in educational settings. This paper describes Shewhart charts and gives examples of how…

Descriptors: Charts, Quality Control, Educational Assessment, Statistical Analysis
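
A Shewhart chart plots a statistic in time order against a center line and control limits, conventionally three standard deviations from the center, estimated from a baseline period judged to be in control. The sketch below uses an individuals (X) chart and estimates sigma as the average moving range divided by d2 = 1.128; the yearly mean scale scores are hypothetical, and the paper's charts may be built from different statistics.

```python
from statistics import mean


def individuals_chart_limits(baseline: list[float]) -> tuple[float, float, float]:
    """Lower limit, center line, and upper limit for a Shewhart individuals chart.

    Sigma is estimated from the average moving range of consecutive points
    divided by the bias-correction constant d2 = 1.128 (subgroups of size 2).
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = mean(moving_ranges) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma


def flag_out_of_control(values: list[float], lcl: float, ucl: float) -> list[int]:
    """Indices of points falling outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical yearly mean scale scores for a baseline period, then two new years.
lcl, center, ucl = individuals_chart_limits([250.1, 251.0, 249.6, 250.4, 250.8])
print(flag_out_of_control([250.5, 254.0], lcl, ucl))
```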

Rater Language Background as a Source of Measurement Error in the Testing of English Language Learners

Peer reviewed

Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012

We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…

Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers

Who Is Given Tests in What Language by Whom, When, and Where? The Need for Probabilistic Views of Language in the Testing of English Language Learners

Peer reviewed

Solano-Flores, Guillermo – Educational Researcher, 2008

The testing of English language learners (ELLs) is, to a large extent, a random process because of poor implementation and factors that are uncertain or beyond control. Yet current testing practices and policies appear to be based on deterministic views of language and linguistic groups and erroneous assumptions about the capacity of assessment…

Descriptors: Generalizability Theory, Testing, Second Language Learning, Error of Measurement

Conditional Standard Errors of Measurement for Tests Composed of Testlets.

Lee, Guemin – 1999

Previous studies have indicated that the reliability of test scores composed of testlets is overestimated by conventional item-based reliability estimation methods (S. Sireci, D. Thissen, and H. Wainer, 1991; H. Wainer, 1995; H. Wainer and D. Thissen, 1996; G. Lee and D. Frisbie). In light of these studies, it seems reasonable to ask whether the…

Descriptors: Definitions, Error of Measurement, Estimation (Mathematics), Reliability

Estimating Conditional Standard Errors of Measurement for Tests Composed of Testlets.

Lee, Guemin – 1998

The primary purpose of this study was to investigate the appropriateness and implications of incorporating a testlet definition into the estimation of the conditional standard error of measurement (SEM) for tests composed of testlets. The five conditional SEM estimation methods used in this study were classified into two categories: item-based and…

Descriptors: Definitions, Error of Measurement, Estimation (Mathematics), Reliability
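
One widely used item-based conditional SEM is Lord's binomial error model, in which the error variance at raw score x on an n-item test is x(n - x)/(n - 1). It treats items as locally independent, which is precisely the assumption testlets can violate. The sketch below is offered as a familiar item-based example under that assumption; it is not a reconstruction of the five methods compared in the study.

```python
import math


def binomial_csem(x: int, n: int) -> float:
    """Lord's binomial-error conditional SEM at raw score x on an n-item test.

    Treats the n items as independent Bernoulli trials for a fixed examinee,
    which is the local-independence assumption that testlets tend to violate.
    """
    if n < 2 or not 0 <= x <= n:
        raise ValueError("require n >= 2 and 0 <= x <= n")
    return math.sqrt(x * (n - x) / (n - 1))


# The conditional SEM is largest mid-scale and zero at the extremes.
print([round(binomial_csem(x, 40), 2) for x in (0, 10, 20, 30, 40)])
```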

Identifying and Managing Local Item Dependence in Context-Dependent Item Sets.

Allen, Sally; Sudweeks, Richard R. – 2001

A study was conducted to identify local item dependence (LID) in the context-dependent item sets used in an examination prepared for use in an introductory university physics class and to assess the effects of LID on estimates of the reliability and standard error of measurement. Test scores were obtained for 487 students in the physics class. The…

Descriptors: College Students, Error of Measurement, Higher Education, Physics
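
One common way to see how local item dependence affects reliability estimates is to compute Cronbach's alpha twice: once with each item as a part, and once with each context-dependent item set summed into a single testlet score. When items within a set share dependence beyond the common trait, the item-level alpha tends to be the larger of the two. The sketch below uses simulated data with an assumed testlet effect; it is not necessarily the procedure the authors used.

```python
import numpy as np


def cronbach_alpha(parts: np.ndarray) -> float:
    """Alpha for an examinee x part matrix (parts may be items or testlet sums)."""
    k = parts.shape[1]
    part_vars = parts.var(axis=0, ddof=1).sum()
    total_var = parts.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - part_vars / total_var))


def collapse_to_testlets(items: np.ndarray, set_ids: list[int]) -> np.ndarray:
    """Replace item columns with one summed column per item set (testlet)."""
    ids = np.asarray(set_ids)
    return np.column_stack([items[:, ids == s].sum(axis=1) for s in np.unique(ids)])


# Simulated 0/1 responses: 4 item sets x 5 items, with a person-by-set effect
# (the source of local dependence) added on top of a common ability.
rng = np.random.default_rng(1)
n, n_sets, per_set = 500, 4, 5
ability = rng.normal(size=(n, 1))
set_effect = rng.normal(scale=0.8, size=(n, n_sets)).repeat(per_set, axis=1)
noise = rng.normal(size=(n, n_sets * per_set))
items = (ability + set_effect + noise > 0).astype(float)
ids = [s for s in range(n_sets) for _ in range(per_set)]
print("item-level alpha:", round(cronbach_alpha(items), 3))
print("testlet-level alpha:", round(cronbach_alpha(collapse_to_testlets(items, ids)), 3))
```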

Reliability and the Nonequivalent Groups with Anchor Test Design. Research Report. ETS RR-07-16

Peer reviewed

Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007

This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…

Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis

On the Consistency of Individual Classification Using Short Scales

Peer reviewed

Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R. – Psychological Methods, 2007

Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level,…

Descriptors: Psychiatry, Patients, Error of Measurement, Test Length
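
A simple classical approximation shows why large measurement error threatens individual decisions: if the observed score equals the true score plus normal error with standard deviation SEM, the chance that an examinee with true score T is classified on the wrong side of cut c is Phi(-|T - c| / SEM). The sketch below applies this with hypothetical reliabilities and a score SD of 1; the normal-error model is an illustrative assumption, not the authors' procedure.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def misclassification_prob(true_score: float, cut: float, sem: float) -> float:
    """P(observed score falls on the opposite side of `cut` from `true_score`).

    Assumes observed = true + normal error with standard deviation `sem`.
    """
    if sem <= 0:
        raise ValueError("sem must be positive")
    return normal_cdf(-abs(true_score - cut) / sem)


def sem_from_reliability(sd: float, rel: float) -> float:
    """Classical SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - rel)


# Hypothetical: an examinee 0.5 SD above the cut, tested with a short vs a longer scale.
for rel in (0.70, 0.90):
    sem = sem_from_reliability(1.0, rel)
    print(rel, round(misclassification_prob(0.5, 0.0, sem), 3))
```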

The Information Function for the One-Parameter Logistic Model: Is it Reliability?

Peer reviewed

Doran, Harold C. – Educational and Psychological Measurement, 2005

The information function is an important statistic in item response theory (IRT) applications. Although the information function is often described as the IRT version of reliability, it differs from the classical notion of reliability from a critical perspective: replication. This article first explores the information function for the…

Descriptors: Item Response Theory, Error of Measurement, Evaluation Methods, Reliability
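
For the one-parameter logistic model with a common discrimination a, the probability of a correct response is P_i(theta) = 1 / (1 + exp(-a(theta - b_i))), the test information is I(theta) = sum of a^2 * P_i(theta) * (1 - P_i(theta)), and the conditional standard error of the ability estimate is 1/sqrt(I(theta)). The sketch below uses hypothetical item difficulties to show that information, and hence the standard error, varies with theta rather than being a single coefficient for the whole population.

```python
import math
from typing import Sequence


def p_correct(theta: float, b: float, a: float = 1.0) -> float:
    """1PL probability of a correct response, with common discrimination a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_1pl(theta: float, difficulties: Sequence[float], a: float = 1.0) -> float:
    """Test information I(theta): sum over items of a^2 * P * (1 - P)."""
    total = 0.0
    for b in difficulties:
        p = p_correct(theta, b, a)
        total += a * a * p * (1.0 - p)
    return total


def conditional_se(theta: float, difficulties: Sequence[float], a: float = 1.0) -> float:
    """Standard error of the ability estimate at theta: 1 / sqrt(I(theta))."""
    return 1.0 / math.sqrt(info_1pl(theta, difficulties, a))


# Hypothetical difficulties centered at 0; the SE is smallest near the item locations.
bs = [-1.0, -0.5, 0.0, 0.5, 1.0]
for t in (-2.0, 0.0, 2.0):
    print(t, round(info_1pl(t, bs), 3), round(conditional_se(t, bs), 3))
```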

E-Assessment within the Bologna Paradigm: Evidence from Portugal

Peer reviewed

Ferrao, Maria – Assessment & Evaluation in Higher Education, 2010

The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non-mathematics core curricula courses. This research project,…

Descriptors: Foreign Countries, Computer Assisted Testing, Educational Technology, Educational Assessment

When Can Subscores Have Value? Research Report. ETS RR-05-08

Peer reviewed

Haberman, Shelby J. – ETS Research Report Series, 2005

In educational tests, subscores are often generated from a portion of the items in a larger test. Guidelines based on mean-squared error are proposed to indicate whether subscores are worth reporting. Alternatives considered are direct reports of subscores, estimates of subscores based on total score, combined estimates based on subscores and…

Descriptors: Scores, Test Items, Error of Measurement, Computation
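
The mean-squared-error guideline is often summarized as a comparison of two proportional reductions in mean-squared error (PRMSE) for predicting the true subscore: one using the observed subscore (which equals the subscore's reliability) and one using the observed total score. A subscore is judged to have added value when the former is larger. The sketch below estimates both from sample moments, using Cronbach's alpha as the reliability estimate and assuming subscore errors are uncorrelated with the remaining items; these are simplifying assumptions of this illustration, and the combined estimators the report also considers are not shown.

```python
import numpy as np


def alpha(parts: np.ndarray) -> float:
    """Cronbach's alpha for an examinee x item score matrix."""
    k = parts.shape[1]
    return float(k / (k - 1) * (1 - parts.var(axis=0, ddof=1).sum() / parts.sum(axis=1).var(ddof=1)))


def subscore_prmse(items: np.ndarray, sub_cols: list[int]) -> tuple[float, float]:
    """Return (PRMSE using observed subscore, PRMSE using observed total score).

    Both describe prediction of the true subscore. Assumes subscore errors are
    uncorrelated with the other items that enter the total score.
    """
    s = items[:, sub_cols].sum(axis=1)   # observed subscore
    x = items.sum(axis=1)                # observed total score
    rel_s = alpha(items[:, sub_cols])    # PRMSE from the subscore = its reliability
    var_s = s.var(ddof=1)
    var_true = rel_s * var_s             # true-subscore variance
    var_err = var_s - var_true           # subscore error variance
    # cov(true subscore, total) = cov(subscore, total) - subscore error variance
    cov_true_x = np.cov(s, x)[0, 1] - var_err
    prmse_x = cov_true_x ** 2 / (var_true * x.var(ddof=1))
    return rel_s, float(prmse_x)


# Hypothetical Rasch-like data: 40 items, subscore formed from the first 5 items.
rng = np.random.default_rng(0)
theta = rng.normal(size=(2000, 1))
b = np.linspace(-1.5, 1.5, 40)
resp = (rng.random((2000, 40)) < 1 / (1 + np.exp(-(theta - b)))).astype(float)
rel_s, prmse_x = subscore_prmse(resp, list(range(5)))
print(f"PRMSE(subscore)={rel_s:.3f}  PRMSE(total)={prmse_x:.3f}  report subscore: {rel_s > prmse_x}")
```

With strictly unidimensional simulated data like this, the total score would be expected to predict the true subscore better, so the subscore would not be reported under this criterion.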

Author

Emons, Wilco H. M.	2
Lee, Guemin	2
Sijtsma, Klaas	2
Solano-Flores, Guillermo	2
Allen, Sally	1
Anwyll, Steve	1
Coverdale, Bradley J.	1
Doran, Harold C.	1
Ferrao, Maria	1
Flora, David B.	1
Glanville, Matthew	1
Graham, James M.	1
Gustafsson, Jan-Eric	1
Haberman, Shelby J.	1
He, Qingping	1
Jensen, Harald E.	1
Jin, Ying	1
Kachchaf, Rachel	1
Kannan, Priya	1
Katz, Irvin R.	1
Kim, Sooyeon	1
Kruyen, Peter M.	1
Luxenberg, Harlan	1
Meijer, Rob R.	1