Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 4 |
Since 2016 (last 10 years) | 6 |
Since 2006 (last 20 years) | 11 |
Descriptor
Scoring | 19 |
Test Reliability | 19 |
Test Theory | 19 |
Test Validity | 10 |
Item Response Theory | 7 |
Computer Assisted Testing | 6 |
Comparative Analysis | 5 |
Measurement Techniques | 5 |
Test Construction | 5 |
Test Interpretation | 5 |
Testing | 5 |
More ▼ |
Source
Author
Publication Type
Education Level
Higher Education | 3 |
Postsecondary Education | 3 |
High Schools | 1 |
Secondary Education | 1 |
Audience
Practitioners | 1 |
Researchers | 1 |
Students | 1 |
Teachers | 1 |
Location
Jordan | 1 |
New York | 1 |
New York (New York) | 1 |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
Assessments and Surveys
ACT Assessment | 1 |
Graduate Record Examinations | 1 |
Preliminary Scholastic… | 1 |
SAT (College Admission Test) | 1 |
Test of English as a Foreign… | 1 |
Thematic Apperception Test | 1 |
What Works Clearinghouse Rating
Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Gayle Geschwind; Michael Vignal; Marcos D. Caballero; H.? J. Lewandowski – Physical Review Physics Education Research, 2024
The Survey of Physics Reasoning on Uncertainty Concepts in Experiments (SPRUCE) was designed to measure students' proficiency with measurement uncertainty concepts and practices across ten different assessment objectives to help facilitate the improvement of laboratory instruction focused on this important topic. To ensure the reliability and…
Descriptors: Measurement, Ambiguity (Context), Scientific Concepts, Physics
Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability
Sophie Litschwartz – Society for Research on Educational Effectiveness, 2021
Background/Context: Pass/fail standardized exams frequently selectively rescore failing exams and retest failing examinees. This practice distorts the test score distribution and can confuse those who do analysis on these distributions. In 2011, the Wall Street Journal showed large discontinuities in the New York City Regent test score…
Descriptors: Standardized Tests, Pass Fail Grading, Scoring Rubrics, Scoring Formulas
Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019
This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…
Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017
The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing
Mitchell, Alison M.; Truckenmiller, Adrea; Petscher, Yaacov – Communique, 2015
As part of the Race to the Top initiative, the United States Department of Education made nearly 1 billion dollars available in State Educational Technology grants with the goal of ramping up school technology. One result of this effort is that states, districts, and schools across the country are using computerized assessments to measure their…
Descriptors: Computer Assisted Testing, Educational Technology, Testing, Efficiency
Badjadi, Nour El Imane – Online Submission, 2013
The current paper on writing assessment surveys the literature on the reliability and validity of essay tests. The paper aims to examine the two concepts in relationship with essay testing as well as to provide a snapshot of the current understandings of the reliability and validity of essay tests as drawn in recent research studies. Bearing in…
Descriptors: Essay Tests, Writing Evaluation, Test Validity, Test Reliability
Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie – Measurement and Evaluation in Counseling and Development, 2013
Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…
Descriptors: Item Response Theory, Test Theory, Measures (Individuals), Racial Identification
Dorans, Neil J. – Educational Measurement: Issues and Practice, 2012
Views on testing--its purpose and uses and how its data are analyzed--are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as a…
Descriptors: Testing, Test Theory, Item Response Theory, Test Reliability
Haberman, Shelby J. – Educational Testing Service, 2011
Alternative approaches are discussed for use of e-rater[R] to score the TOEFL iBT[R] Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…
Descriptors: Writing Tests, Scoring, Essays, Language Tests

Jaradat, Derar; Sawaged, Sari – Journal of Educational Measurement, 1986
The impact of the Subset Selection Technique (SST) for multiple-choice items on certain properties of a test was compared with that of two other methods, the Number Right and the Correction for Guessing Formula. Results indicated that SST outperformed the other two, producing higher reliability and validity without favoring high risk takers.…
Descriptors: Foreign Countries, Grade 9, Guessing (Tests), Measurement Techniques

Feldt, Leonard S.; Spray, Judith A. – Research Quarterly for Exercise and Sport, 1983
The reliabilities of two types of measurement plans were compared across six hypothetical distributions of true scores or abilities. The measurement plans were: (1) fixed-length, where the number of trials for all examinees is set in advance; and (2) trials-to-criterion, where examinees must keep trying until they complete a given number of trials…
Descriptors: Criterion Referenced Tests, Evaluation Methods, Higher Education, Measurement Techniques
Crocker, Linda; Algina, James – 1986
This text was written to help the reader acquire a base of knowledge about classical psychometrics and to integrate new ideas into that framework of knowledge. The material is organized into five units: (1) introduction to measurement theory; (2) reliability; (3) validity; (4) item analysis in test development; and (5) test scoring and…
Descriptors: Item Analysis, Measurement Techniques, Psychometrics, Scoring
Smith, Donald M. – 1976
The Kuder Richardson-20 Formula is shown to be a special case, where each examinee is given sufficient time to answer each item, of a more general formula where each examinee may not be allowed the necessary time. The formula is extended to allow two scores, knowledge and speed, to be extracted from each examinees test score. Using a sample of 82…
Descriptors: Career Development, Comparative Analysis, Grade Point Average, Predictive Measurement
Previous Page | Next Page »
Pages: 1 | 2