Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 6 |
Since 2016 (last 10 years) | 11 |
Since 2006 (last 20 years) | 28 |
Descriptor
Source
Author
Publication Type
Education Level
Higher Education | 9 |
Postsecondary Education | 6 |
Secondary Education | 5 |
Elementary Education | 4 |
High Schools | 3 |
Grade 3 | 2 |
Grade 4 | 2 |
Grade 8 | 2 |
Intermediate Grades | 2 |
Junior High Schools | 2 |
Middle Schools | 2 |
More ▼ |
Audience
Researchers | 7 |
Practitioners | 2 |
Teachers | 2 |
Students | 1 |
Location
New York | 2 |
Alabama | 1 |
Australia | 1 |
California | 1 |
Canada | 1 |
Colorado | 1 |
Florida | 1 |
Illinois | 1 |
Indiana | 1 |
Jordan | 1 |
New York (New York) | 1 |
More ▼ |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Becker, Kirk A.; Kao, Shu-chuan – Journal of Applied Testing Technology, 2022
Natural Language Processing (NLP) offers methods for understanding and quantifying the similarity between written documents. Within the testing industry these methods have been used for automatic item generation, automated scoring of text and speech, modeling item characteristics, automatic question answering, machine translation, and automated…
Descriptors: Item Banks, Natural Language Processing, Computer Assisted Testing, Scoring
Gayle Geschwind; Michael Vignal; Marcos D. Caballero; H.? J. Lewandowski – Physical Review Physics Education Research, 2024
The Survey of Physics Reasoning on Uncertainty Concepts in Experiments (SPRUCE) was designed to measure students' proficiency with measurement uncertainty concepts and practices across ten different assessment objectives to help facilitate the improvement of laboratory instruction focused on this important topic. To ensure the reliability and…
Descriptors: Measurement, Ambiguity (Context), Scientific Concepts, Physics
Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability
Sophie Litschwartz – Society for Research on Educational Effectiveness, 2021
Background/Context: Pass/fail standardized exams frequently selectively rescore failing exams and retest failing examinees. This practice distorts the test score distribution and can confuse those who do analysis on these distributions. In 2011, the Wall Street Journal showed large discontinuities in the New York City Regent test score…
Descriptors: Standardized Tests, Pass Fail Grading, Scoring Rubrics, Scoring Formulas
Rainey, Katherine D.; Vignal, Michael; Wilcox, Bethany R. – Physical Review Physics Education Research, 2022
Currently there are no assessment instruments available for upper-division thermal physics, though several introductory assessments are currently available. Notably missing from these introductory assessment are items targeting statistical mechanics. This leaves a gap in the content that can be assessed by upper-division thermal physics faculty.…
Descriptors: Physics, Science Instruction, Thermodynamics, College Science
Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019
This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…
Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring
Ayanwale, Musa Adekunle; Adeleke, Joshua Oluwatoyin; Mamadelo, Titilayo Iyabode – Journal of the International Society for Teacher Education, 2019
A scoring framework that does not reflect true performance of an examinee would ultimately result in an abnormal score. This study assessed invariance person estimates of 2017 Nigerian National Examinations Council Basic Education Certificate Examination Mathematics Multiple Choice using classical test theory (CTT) and item response theory (IRT)…
Descriptors: Test Theory, Item Response Theory, Scoring, National Competency Tests
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017
The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing
Allalouf, Avi – International Journal of Testing, 2014
The Quality Control (QC) Guidelines are intended to increase the efficiency, precision, and accuracy of the scoring, analysis, and reporting process of testing. The QC Guidelines focus on large-scale testing operations where multiple forms of tests are created for use on set dates. However, they may also be used for a wide variety of other testing…
Descriptors: Quality Control, Scoring, Test Theory, Scores
Ramsay, James O.; Wiberg, Marie – Journal of Educational and Behavioral Statistics, 2017
This article promotes the use of modern test theory in testing situations where sum scores for binary responses are now used. It directly compares the efficiencies and biases of classical and modern test analyses and finds an improvement in the root mean squared error of ability estimates of about 5% for two designed multiple-choice tests and…
Descriptors: Scoring, Test Theory, Computation, Maximum Likelihood Statistics
Powell, J. C. – International Association for Development of the Information Society, 2013
This reflection paper challenges current test scoring practices on the grounds that most wrong-answer selections are thoughtful not random, presenting research supporting this proposition. An alternative test scoring system is presented, described and its outcomes discussed. This new scoring system increases the number of variables considered,…
Descriptors: Test Theory, Test Interpretation, Scoring, Multiple Choice Tests
Schoen, Robert C.; Liu, Sicong; Yang, Xiaotong; Paek, Insu – Grantee Submission, 2017
The Early Fractions Test is a paper-pencil test designed to measure mathematics achievement of third- and fourth-grade students in the domain of fractions. The purpose, or intended use, of the Early Fractions Test is to serve as a student pretest covariate and a test of baseline equivalence in the larger study. In this report, we discuss our…
Descriptors: Mathematics Achievement, Fractions, Mathematics Tests, Grade 3
France, Stephen L.; Batchelder, William H. – Educational and Psychological Measurement, 2015
Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…
Descriptors: Maximum Likelihood Statistics, Test Items, Difficulty Level, Test Theory
Mitchell, Alison M.; Truckenmiller, Adrea; Petscher, Yaacov – Communique, 2015
As part of the Race to the Top initiative, the United States Department of Education made nearly 1 billion dollars available in State Educational Technology grants with the goal of ramping up school technology. One result of this effort is that states, districts, and schools across the country are using computerized assessments to measure their…
Descriptors: Computer Assisted Testing, Educational Technology, Testing, Efficiency