Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 1 |
Since 2006 (last 20 years) | 11 |
Descriptor
Error of Measurement | 20 |
Reliability | 20 |
Test Items | 20 |
Scores | 8 |
Item Analysis | 6 |
Statistical Analysis | 5 |
Generalizability Theory | 4 |
Item Response Theory | 4 |
Comparative Analysis | 3 |
Correlation | 3 |
Equated Scores | 3 |
More ▼ |
Source
Author
Emons, Wilco H. M. | 2 |
Lee, Guemin | 2 |
Sijtsma, Klaas | 2 |
Solano-Flores, Guillermo | 2 |
Allen, Sally | 1 |
Anwyll, Steve | 1 |
Coverdale, Bradley J. | 1 |
David B. Flora | 1 |
Doran, Harold C. | 1 |
Ferrao, Maria | 1 |
Glanville, Matthew | 1 |
More ▼ |
Publication Type
Journal Articles | 13 |
Reports - Research | 11 |
Reports - Evaluative | 7 |
Speeches/Meeting Papers | 5 |
Reports - Descriptive | 2 |
Tests/Questionnaires | 1 |
Education Level
Elementary Education | 2 |
Grade 5 | 2 |
Higher Education | 2 |
Postsecondary Education | 2 |
Elementary Secondary Education | 1 |
Grade 3 | 1 |
Grade 4 | 1 |
Grade 8 | 1 |
Junior High Schools | 1 |
Middle Schools | 1 |
Audience
Location
Maryland | 1 |
Portugal | 1 |
United Kingdom (England) | 1 |
Laws, Policies, & Programs
Assessments and Surveys
National Assessment of… | 1 |
SAT (College Admission Test) | 1 |
What Works Clearinghouse Rating
Stephanie M. Bell; R. Philip Chalmers; David B. Flora – Educational and Psychological Measurement, 2024
Coefficient omega indices are model-based composite reliability estimates that have become increasingly popular. A coefficient omega index estimates how reliably an observed composite score measures a target construct as represented by a factor in a factor-analysis model; as such, the accuracy of omega estimates is likely to depend on correct…
Descriptors: Influences, Models, Measurement Techniques, Reliability
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014
Since 2010, the whole national cohort Key Stage 2 (KS2) National Curriculum test in science in England has been replaced with a sampling test taken by pupils at the age of 11 from a nationally representative sample of schools annually. The study reported in this paper compares the performance of different subgroups of the samples (classified by…
Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis
Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012
Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…
Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement
Schafer, William D.; Coverdale, Bradley J.; Luxenberg, Harlan; Jin, Ying – Practical Assessment, Research & Evaluation, 2011
There are relatively few examples of quantitative approaches to quality control in educational assessment and accountability contexts. Among the several techniques that are used in other fields, Shewart charts have been found in a few instances to be applicable in educational settings. This paper describes Shewart charts and gives examples of how…
Descriptors: Charts, Quality Control, Educational Assessment, Statistical Analysis
Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…
Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers
Solano-Flores, Guillermo – Educational Researcher, 2008
The testing of English language learners (ELLs) is, to a large extent, a random process because of poor implementation and factors that are uncertain or beyond control. Yet current testing practices and policies appear to be based on deterministic views of language and linguistic groups and erroneous assumptions about the capacity of assessment…
Descriptors: Generalizability Theory, Testing, Second Language Learning, Error of Measurement
Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007
This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…
Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis
Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R. – Psychological Methods, 2007
Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level,…
Descriptors: Psychiatry, Patients, Error of Measurement, Test Length
Ferrao, Maria – Assessment & Evaluation in Higher Education, 2010
The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non-mathematics core curricula courses. This research project,…
Descriptors: Foreign Countries, Computer Assisted Testing, Educational Technology, Educational Assessment
Lee, Guemin – 1999
Previous studies have indicated that the reliability of test scores composed of testlets is overestimated by conventional item-based reliability estimation methods (S. Sireci, D. Thissen, and H. Wainer, 1991; H. Wainer, 1995; H. Wainer and D. Thissen, 1996; G. Lee and D. Frisbie). In light of these studies, it seems reasonable to ask whether the…
Descriptors: Definitions, Error of Measurement, Estimation (Mathematics), Reliability
Lee, Guemin – 1998
The primary purpose of this study was to investigate the appropriateness and implication of incorporating a testlet definition into the estimation of the conditional standard error of measurement (SEM) for tests composed of testlets. The five conditional SEM estimation methods used in this study were classified into two categories: item-based and…
Descriptors: Definitions, Error of Measurement, Estimation (Mathematics), Reliability
Allen, Sally; Sudweeks, Richard R. – 2001
A study was conducted to identify local item dependence (LID) in the context-dependent item sets used in an examination prepared for use in an introductory university physics class and to assess the effects of LID on estimates of the reliability and standard error of measurement. Test scores were obtained for 487 students in the physics class. The…
Descriptors: College Students, Error of Measurement, Higher Education, Physics
Doran, Harold C. – Educational and Psychological Measurement, 2005
The information function is an important statistic in item response theory (IRT) applications. Although the information function is often described as the IRT version of reliability, it differs from the classical notion of reliability from a critical perspective: replication. This article first explores the information function for the…
Descriptors: Item Response Theory, Error of Measurement, Evaluation Methods, Reliability
Haberman, Shelby J. – ETS Research Report Series, 2005
In educational tests, subscores are often generated from a portion of the items in a larger test. Guidelines based on mean-squared error are proposed to indicate whether subscores are worth reporting. Alternatives considered are direct reports of subscores, estimates of subscores based on total score, combined estimates based on subscores and…
Descriptors: Scores, Test Items, Error of Measurement, Computation
Previous Page | Next Page ยป
Pages: 1 | 2