Lingbo Tong; Wen Qu; Zhiyong Zhang – Grantee Submission, 2025
Factor analysis is widely utilized to identify latent factors underlying the observed variables. This paper presents a comprehensive comparative study of two widely used methods for determining the optimal number of factors in factor analysis, the K1 rule and parallel analysis, along with a more recently developed method, the bass-ackward method.…
Descriptors: Factor Analysis, Monte Carlo Methods, Statistical Analysis, Sample Size
Jacob M. Schauer; Kaitlyn G. Fitzgerald; Sarah Peko-Spicer; Mena C. R. Whalen; Rrita Zejnullahi; Larry V. Hedges – Grantee Submission, 2021
Several programs of research have sought to assess the replicability of scientific findings in different fields, including economics and psychology. These programs attempt to replicate several findings and use the results to say something about large-scale patterns of replicability in a field. However, little work has been done to understand the…
Descriptors: Statistical Analysis, Research Methodology, Evaluation Methods, Replication (Evaluation)
Castellano, Katherine E.; McCaffrey, Daniel F. – Journal of Educational Measurement, 2020
The residual gain score has been of historical interest, and its percentile rank has been of interest more recently given its close correspondence to the popular Student Growth Percentile. However, these estimators suffer from low accuracy and systematic bias (bias conditional on prior latent achievement). This article explores three…
Descriptors: Accuracy, Student Evaluation, Measurement Techniques, Evaluation Methods
Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2017
Assessing global interrater agreement is difficult as most published indices are affected by the presence of mixtures of agreements and disagreements. A previously proposed method was shown to be specifically sensitive to global agreement, excluding mixtures, but also negatively biased. Here, we propose two alternatives in an attempt to find what…
Descriptors: Interrater Reliability, Evaluation Methods, Statistical Bias, Accuracy
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Rajagopal, Prabha; Ravana, Sri Devi – Information Research: An International Electronic Journal, 2017
Introduction: The use of averaged topic-level scores can result in the loss of valuable data and can cause misinterpretation of the effectiveness of system performance. This study aims to use the scores of each document to evaluate document retrieval systems in a pairwise system evaluation. Method: The chosen evaluation metrics are document-level…
Descriptors: Information Retrieval, Documentation, Scores, Information Systems
Ades, A. E.; Lu, Guobing; Dias, Sofia; Mayo-Wilson, Evan; Kounali, Daphne – Research Synthesis Methods, 2015
Objective: Trials often may report several similar outcomes measured on different test instruments. We explored a method for synthesising treatment effect information both within and between trials and for reporting treatment effects on a common scale as an alternative to standardisation. Study design: We applied a procedure that simultaneously…
Descriptors: Research Methodology, Evaluation Methods, Metabolism, Accuracy
Suero, Manuel; Privado, Jesús; Botella, Juan – Psicologica: International Journal of Methodology and Experimental Psychology, 2017
A simulation study is presented to evaluate and compare three methods to estimate the variance of the estimates of the parameters d and C of the signal detection theory (SDT). Several methods have been proposed to calculate the variance of their estimators, d' and c. Those methods have been mostly assessed by…
Descriptors: Evaluation Methods, Theories, Simulation, Statistical Analysis
Szulewski, Adam; Kelton, Danielle; Howes, Daniel – Frontline Learning Research, 2017
Background: Pupillometry has been studied as a physiological marker for quantifying cognitive load since the early 1960s. It has been established that small changes in pupillary size can provide an index of the cognitive load of an individual as he/she performs a mental task. The utility of pupillometry as a measure of expertise is less well…
Descriptors: Expertise, Medicine, Eye Movements, Diagnostic Tests
Smolkowski, Keith; Cummings, Kelli D. – Assessment for Effective Intervention, 2015
Diagnostic tools can help schools more consistently and fairly match instructional resources to the needs of their students. To ensure the best educational outcome for each child, diagnostic decision-making systems seek to balance time, clarity, and accuracy. However, recent research notes that many educational decisions tend to be made using…
Descriptors: At Risk Students, Educational Diagnosis, Decision Making, Statistical Analysis
Peer versus Teacher Assessment: Implications for CAF Triad Language Ability and Critical Reflections
Ghahari, Shima; Farokhnia, Farzaneh – International Journal of School & Educational Psychology, 2018
Literature on the learning benefits and interpersonal mechanisms of peer assessment (PA) and teacher assessment (TA) has been inconsistent. As part of a large-scale study, the research reported here has addressed the effect of formative PA on language grammar uptake and complexity, accuracy, and fluency triad scale levels, in comparison both to TA…
Descriptors: Reflection, Formative Evaluation, Peer Evaluation, Grammar
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
Martínez, José Felipe; Schweig, Jonathan; Goldschmidt, Pete – Educational Evaluation and Policy Analysis, 2016
A key question facing teacher evaluation systems is how to combine multiple measures of complex constructs into composite indicators of performance. We use data from the Measures of Effective Teaching (MET) study to investigate the measurement properties of composite indicators obtained under various conjunctive, disjunctive (or complementary),…
Descriptors: Teacher Evaluation, Outcome Measures, Evaluation Methods, Educational Policy
Ostrow, Korinn; Donnelly, Christopher; Heffernan, Neil – International Educational Data Mining Society, 2015
As adaptive tutoring systems grow increasingly popular for the completion of classwork and homework, it is crucial to assess the manner in which students are scored within these platforms. The majority of systems, including ASSISTments, return the binary correctness of a student's first attempt at solving each problem. Yet for many teachers,…
Descriptors: Intelligent Tutoring Systems, Scoring, Testing, Credits
Solomon, Benjamin G.; Forsberg, Ole J. – School Psychology Quarterly, 2017
Bayesian techniques have become increasingly present in the social sciences, fueled by advances in computer speed and the development of user-friendly software. In this paper, we forward the use of Bayesian Asymmetric Regression (BAR) to monitor intervention responsiveness when using Curriculum-Based Measurement (CBM) to assess oral reading…
Descriptors: Bayesian Statistics, Regression (Statistics), Least Squares Statistics, Evaluation Methods
