Publication Date
In 2025: 0
Since 2024: 1
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 28
Since 2006 (last 20 years): 49
Descriptor
Scoring: 82
Statistical Analysis: 82
Test Reliability: 39
Reliability: 27
Correlation: 22
Interrater Reliability: 19
Test Validity: 19
Comparative Analysis: 17
Foreign Countries: 15
Test Items: 13
Test Construction: 12
Author
Braun, Henry I.: 2
Ebuoh, Casmir N.: 2
Lembke, Erica S.: 2
Livingston, Samuel A.: 2
Poch, Apryl L.: 2
Prevost, Luanna B.: 2
Steedle, Jeffrey T.: 2
Algina, James: 1
Alkahtani, Saif F.: 1
Allalouf, Avi: 1
Allen, Abigail A.: 1
Audience
Researchers: 2
Parents: 1
Practitioners: 1
Students: 1
Teachers: 1
Location
California: 3
China: 2
New York: 2
Nigeria: 2
Turkey: 2
Australia: 1
Delaware: 1
District of Columbia: 1
Estonia: 1
Florida: 1
Israel: 1
Laws, Policies, & Programs
Elementary and Secondary…: 1
Individuals with Disabilities…: 1
What Works Clearinghouse Rating
Meets WWC Standards without Reservations: 1
Meets WWC Standards with or without Reservations: 1
Donoghue, John R.; Eckerly, Carol – Applied Measurement in Education, 2024
Trend scoring of constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Fromm, Davida; Katta, Saketh; Paccione, Mason; Hecht, Sophia; Greenhouse, Joel; MacWhinney, Brian; Schnur, Tatiana T. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: Analysis of connected speech in the field of adult neurogenic communication disorders is essential for research and clinical purposes, yet time and expertise are often cited as limiting factors. The purpose of this project was to create and evaluate an automated program to score and compute the measures from the Quantitative Production…
Descriptors: Speech, Automation, Statistical Analysis, Adults
Pérez-Ferreirós, Alexandra; Kalén, Anton; Gómez, Miguel-Ángel; Rey, Ezequiel – Research Quarterly for Exercise and Sport, 2019
In basketball, game-related statistics are the most common measure of performance. However, the literature assessing their reliability is scarce. Purpose: Analyze the number of games required to obtain a good relative and absolute reliability of teams' game-related statistics. Method: A total of 884 games from the 2015-2016 to 2017-2018 seasons of…
Descriptors: Team Sports, Statistics, Reliability, Foreign Countries
Tingir, Seyfullah – ProQuest LLC, 2019
Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT-parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…
Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability
Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017
Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…
Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Kelleher, Leila K.; Beach, Tyson A. C.; Frost, David M.; Johnson, Andrew M.; Dickey, James P. – Measurement in Physical Education and Exercise Science, 2018
The scoring scheme for the functional movement screen implicitly assumes that the factor structure is consistent, stable, and congruent across different populations. To determine if this is the case, we compared principal components analyses of three samples: a healthy, general population (n = 100), a group of varsity athletes (n = 101), and a…
Descriptors: Factor Structure, Test Reliability, Screening Tests, Motion
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written in response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to each essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017
The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers began investigating automatic scoring systems in writing assessments, they have examined the relationships between human and machine scoring and have proposed evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Ebuoh, Casmir N. – World Journal of Education, 2018
The literature reveals that patterns/methods of scoring essay tests have been criticized as unreliable, and that this unreliability is more likely in internal examinations than in external examinations. The purpose of this study is to find out the effects of analytical and holistic scoring patterns on scorer reliability in…
Descriptors: Holistic Approach, Scoring, Essay Tests, Biology
Levine, William H.; Betzner, Michelle; Autry, Kevin S. – Discourse Processes: A multidisciplinary journal, 2016
Recent research has provided evidence that the information provided before a story--a spoiler--may increase the enjoyment of that story, perhaps by increasing the processing fluency experienced during reading. In one experiment, we tested the reliability of these findings by closely replicating existing methods and the generality of these findings…
Descriptors: Literary Genres, Reading Fluency, Reliability, Reading Processes
Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick; Schmitt, Neal – ETS Research Report Series, 2016
In this report, systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability. Data collected from a situational judgment test are used to facilitate the comparison. For a well-developed item with appropriate keys (i.e., the correct answers), agreement among various…
Descriptors: Scoring, Test Reliability, Statistical Analysis, Psychometrics
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Allen, Abigail A.; Poch, Apryl L.; Lembke, Erica S. – Learning Disability Quarterly, 2018
This manuscript describes two empirical studies of alternative scoring procedures used with curriculum-based measurement in writing (CBM-W). Study 1 explored the technical adequacy of a trait-based rubric in first grade. Study 2 explored the technical adequacy of a trait-based rubric, production-dependent, and production-independent scores in…
Descriptors: Scoring, Alternative Assessment, Curriculum Based Assessment, Emergent Literacy