Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 4 |
Since 2006 (last 20 years) | 11 |
Descriptor
Classification | 16 |
Error of Measurement | 16 |
Reliability | 16 |
Item Response Theory | 6 |
Scores | 6 |
Accuracy | 3 |
Probability | 3 |
Psychometrics | 3 |
Test Length | 3 |
True Scores | 3 |
Academic Achievement | 2 |
Author
Harris, Deborah J. | 2 |
Livingston, Samuel A. | 2 |
Sijtsma, Klaas | 2 |
Wang, Tianyou | 2 |
Anwyll, Steve | 1 |
Betebenner, Damian W. | 1 |
Bramley, Tom | 1 |
Choi, Ikkyu | 1 |
Choi, Jiwon | 1 |
De Cock, P. | 1 |
Deane, Paul | 1 |
Publication Type
Journal Articles | 12 |
Reports - Evaluative | 9 |
Reports - Research | 6 |
Reports - Descriptive | 2 |
Speeches/Meeting Papers | 2 |
Guides - General | 1 |
Numerical/Quantitative Data | 1 |
Education Level
Adult Education | 1 |
Elementary Education | 1 |
Elementary Secondary Education | 1 |
High School Equivalency… | 1 |
High Schools | 1 |
Higher Education | 1 |
Postsecondary Education | 1 |
Secondary Education | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
Work Keys (ACT) | 2 |
ACT Assessment | 1 |
Najera, Hector – Measurement: Interdisciplinary Research and Perspectives, 2023
Measurement error affects the quality of population orderings of an index and, hence, increases the misclassification of the poor and the non-poor groups and affects statistical inferences from binary regression models. Hence, the conclusions about the extent, profile, and distribution of poverty are likely to be misleading. However, the size and…
Descriptors: Poverty, Error of Measurement, Classification, Statistical Inference
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
Choi, Ikkyu; Hao, Jiangang; Deane, Paul; Zhang, Mo – ETS Research Report Series, 2021
"Biometrics" are physical or behavioral human characteristics that can be used to identify a person. It is widely known that keystroke or typing dynamics for short, fixed texts (e.g., passwords) could serve as a behavioral biometric. In this study, we investigate whether keystroke data from essay responses can lead to a reliable…
Descriptors: Accuracy, High Stakes Tests, Writing Tests, Benchmarking
Nugent, William Robert; Moore, Matthew; Story, Erin – Educational and Psychological Measurement, 2015
The standardized mean difference (SMD) is perhaps the most important meta-analytic effect size. It is typically used to represent the difference between treatment and control population means in treatment efficacy research. It is also used to represent differences between populations with different characteristics, such as persons who are…
Descriptors: Error of Measurement, Error Correction, Predictor Variables, Monte Carlo Methods
Powers, Sonya; Li, Dongmei; Suh, Hongwook; Harris, Deborah J. – ACT, Inc., 2016
ACT reporting categories and ACT Readiness Ranges are new features added to the ACT score reports starting in fall 2016. For each reporting category, the number correct score, the maximum points possible, the percent correct, and the ACT Readiness Range, along with an indicator of whether the reporting category score falls within the Readiness…
Descriptors: Scores, Classification, College Entrance Examinations, Error of Measurement
He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014
Since 2010, the whole national cohort Key Stage 2 (KS2) National Curriculum test in science in England has been replaced with a sampling test taken by pupils at the age of 11 from a nationally representative sample of schools annually. The study reported in this paper compares the performance of different subgroups of the samples (classified by…
Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis
Sijtsma, Klaas – International Journal of Testing, 2009
This article reviews three topics from test theory that continue to raise discussion and controversy and capture test theorists' and constructors' interest. The first topic concerns the discussion of the methodology of investigating and establishing construct validity; the second topic concerns reliability and its misuse, alternative definitions…
Descriptors: Construct Validity, Reliability, Classification, Test Theory
Bramley, Tom – Educational Research, 2010
Background: A recent article published in "Educational Research" on the reliability of results in National Curriculum testing in England (Newton, "The reliability of results from national curriculum testing in England," "Educational Research" 51, no. 2: 181-212, 2009) suggested that: (1) classification accuracy can be…
Descriptors: National Curriculum, Educational Research, Testing, Measurement
Monbaliu, E.; Ortibus, E.; Roelens, F.; Desloovere, K.; Deklerck, J.; Prinzie, P.; De Cock, P.; Feys, H. – Developmental Medicine & Child Neurology, 2010
Aim: This study investigated the reliability and validity of the Barry-Albright Dystonia Scale (BADS), the Burke-Fahn-Marsden Movement Scale (BFMMS), and the Unified Dystonia Rating Scale (UDRS) in patients with bilateral dystonic cerebral palsy (CP). Method: Three raters independently scored videotapes of 10 patients (five males, five females;…
Descriptors: Content Validity, Cerebral Palsy, Validity, Interrater Reliability
Betebenner, Damian W.; Shang, Yi; Xiang, Yun; Zhao, Yan; Yue, Xiaohui – Journal of Educational Measurement, 2008
No Child Left Behind (NCLB) performance mandates, embedded within state accountability systems, focus school AYP (adequate yearly progress) compliance squarely on the percentage of students at or above proficient. The singular importance of this quantity for decision-making purposes has initiated extensive research into percent proficient as a…
Descriptors: Classification, Error of Measurement, Statistics, Reliability
Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R. – Psychological Methods, 2007
Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level,…
Descriptors: Psychiatry, Patients, Error of Measurement, Test Length
Milanowski, Anthony T. – Journal of Personnel Evaluation in Education, 1999
Describes the temporal consistency of school classification observed in the Kentucky, and secondarily in the Charlotte-Mecklenburg (North Carolina), school-based performance award programs. Data from the Kentucky Department of Education show the extent to which temporal inconsistency could be due to measurement error. (SLD)
Descriptors: Academic Achievement, Achievement Gains, Classification, Error of Measurement
Livingston, Samuel A.; Lewis, Charles – 1993
This paper presents a method for estimating the accuracy and consistency of classifications based on test scores. The scores can be produced by any scoring method, including the formation of a weighted composite. The estimates use data from a single form. The reliability of the score is used to estimate its effective test length in terms of…
Descriptors: Classification, Error of Measurement, Estimation (Mathematics), Reliability
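The Livingston and Lewis (1993) abstract is cut off, but the effective-test-length step it mentions is commonly stated as follows: for a score with mean μ, variance σ², reliability ρ, and possible range [a, b], the effective length is ñ = [(μ − a)(b − μ) − ρσ²] / [σ²(1 − ρ)]. The Python sketch below computes that quantity under this assumed formulation; the function names and example numbers are hypothetical illustrations, not taken from the report. As a sanity check, when ρ is the KR-21 coefficient of an n-item number-correct score, the formula returns exactly n.

```python
def effective_test_length(mean, variance, reliability, min_score, max_score):
    """Effective test length from score mean, variance, reliability, and range.

    Illustrative sketch: the caller supplies a reliability estimate (from any
    method) and the minimum and maximum possible scores on the reported scale.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    if not 0 < reliability < 1:
        raise ValueError("reliability must lie strictly between 0 and 1")
    numerator = (mean - min_score) * (max_score - mean) - reliability * variance
    denominator = variance * (1 - reliability)
    return numerator / denominator


def kr21(mean, variance, n_items):
    """KR-21 reliability of a number-correct score on n_items dichotomous items."""
    return (n_items / (n_items - 1)) * (1 - mean * (n_items - mean) / (n_items * variance))


# Hypothetical 40-item test: mean 28, variance 36.
rho = kr21(28, 36, 40)                              # about 0.786
n_eff = effective_test_length(28, 36, rho, 0, 40)   # recovers the 40 items
print(round(rho, 3), round(n_eff, 1))               # 0.786 40.0
```

The result is generally not an integer for real data; how it is rounded and then used to model true-score distributions is beyond this sketch.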
Wang, Tianyou; Kolen, Michael J.; Harris, Deborah J. – Journal of Educational Measurement, 2000
Describes procedures for calculating the conditional standard error of measurement (CSEM) and reliability of scale scores, and the classification consistency of performance levels. Applied these procedures to data from the American College Testing Program's Work Keys Writing Assessment with sample sizes of 7,097, 1,035, and 1,793. Results show that the…
Descriptors: Adults, Classification, Error of Measurement, Item Response Theory
Livingston, Samuel A. – 1976
A distinction is made between reliability of measurement and reliability of classification; the "criterion-referenced reliability coefficient" describes the former. Application of this coefficient to the probability distribution of possible scores for a single student yields a meaningful way to describe the reliability of a single score. (Author)
Descriptors: Classification, Criterion Referenced Tests, Error of Measurement, Measurement
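The 1976 abstract gives no formula, but the criterion-referenced reliability coefficient usually attributed to Livingston (K²) replaces deviations from the mean with deviations from a cut score C: K² = [ρσ² + (μ − C)²] / [σ² + (μ − C)²]. A minimal Python sketch under that assumed formulation follows; the function name and example values are hypothetical, and it shows only the group-level coefficient, not the single-student application the abstract describes.

```python
def livingston_k2(reliability, variance, mean, cut_score):
    """Criterion-referenced reliability K^2 for a given cut score.

    Numerator: expected squared distance of true scores from the cut
    (true-score variance plus squared mean-to-cut distance).
    Denominator: the same quantity for observed scores.
    K^2 equals the classical reliability when mean == cut_score and
    increases as the mean moves away from the cut.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    squared_distance = (mean - cut_score) ** 2
    return (reliability * variance + squared_distance) / (variance + squared_distance)


print(livingston_k2(0.80, 25.0, 30.0, 25.0))  # 0.9
print(livingston_k2(0.80, 25.0, 25.0, 25.0))  # 0.8
```

Because the squared mean-to-cut distance is added to both numerator and denominator, K² is never smaller than the classical reliability it starts from.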