ERIC - Search Results

Showing 1 to 15 of 19 results

Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients

Peer reviewed

Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022

The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…

Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory

Evidence for Validity and Reliability of a Research-Based Assessment Instrument on Measurement Uncertainty

Peer reviewed

Gayle Geschwind; Michael Vignal; Marcos D. Caballero; H. J. Lewandowski – Physical Review Physics Education Research, 2024

The Survey of Physics Reasoning on Uncertainty Concepts in Experiments (SPRUCE) was designed to measure students' proficiency with measurement uncertainty concepts and practices across ten different assessment objectives to help facilitate the improvement of laboratory instruction focused on this important topic. To ensure the reliability and…

Descriptors: Measurement, Ambiguity (Context), Scientific Concepts, Physics

Establishing a Physics Concept Inventory Using Computer Marked Free-Response Questions

Peer reviewed

Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023

The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…

Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability

A General Method for Adjusting Test Score Distributions to Account for Rescoring and Retesting

Peer reviewed

Sophie Litschwartz – Society for Research on Educational Effectiveness, 2021

Background/Context: Pass/fail standardized testing programs frequently rescore failing exams selectively and retest failing examinees. This practice distorts the test score distribution and can confuse those who analyze these distributions. In 2011, the Wall Street Journal showed large discontinuities in the New York City Regents test score…

Descriptors: Standardized Tests, Pass Fail Grading, Scoring Rubrics, Scoring Formulas

A Design for Comparing CTT and IRT in Test Assembly, Scoring and Argumentation: Differences among Reliability, Information and Validation

Peer reviewed

Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019

This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…

Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring

Accuracy of a Classical Test Theory-Based Procedure for Estimating the Reliability of a Multistage Test. Research Report. ETS RR-17-02

Peer reviewed

Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017

The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…

Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing

Computer-Adaptive Assessments: Fundamentals and Considerations

Mitchell, Alison M.; Truckenmiller, Adrea; Petscher, Yaacov – Communique, 2015

As part of the Race to the Top initiative, the United States Department of Education made nearly 1 billion dollars available in State Educational Technology grants with the goal of ramping up school technology. One result of this effort is that states, districts, and schools across the country are using computerized assessments to measure their…

Descriptors: Computer Assisted Testing, Educational Technology, Testing, Efficiency

Conceptualizing Essay Tests' Reliability and Validity: From Research to Theory

Badjadi, Nour El Imane – Online Submission, 2013

The current paper on writing assessment surveys the literature on the reliability and validity of essay tests. The paper aims to examine the two concepts in relationship with essay testing as well as to provide a snapshot of the current understandings of the reliability and validity of essay tests as drawn in recent research studies. Bearing in…

Descriptors: Essay Tests, Writing Evaluation, Test Validity, Test Reliability

An Analysis of Cross Racial Identity Scale Scores Using Classical Test Theory and Rasch Item Response Models

Peer reviewed

Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie – Measurement and Evaluation in Counseling and Development, 2013

Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…

Descriptors: Item Response Theory, Test Theory, Measures (Individuals), Racial Identification

The Contestant Perspective on Taking Tests: Emanations from the Statue within

Peer reviewed

Dorans, Neil J. – Educational Measurement: Issues and Practice, 2012

Views on testing--its purpose and uses and how its data are analyzed--are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as a…

Descriptors: Testing, Test Theory, Item Response Theory, Test Reliability

Use of e-rater® in Scoring of the TOEFL iBT® Writing Test. Research Report. ETS RR-11-25

Haberman, Shelby J. – Educational Testing Service, 2011

Alternative approaches are discussed for use of e-rater® to score the TOEFL iBT® Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…

Descriptors: Writing Tests, Scoring, Essays, Language Tests

The Subset Selection Technique for Multiple-Choice Tests: An Empirical Inquiry.

Peer reviewed

Jaradat, Derar; Sawaged, Sari – Journal of Educational Measurement, 1986

The impact of the Subset Selection Technique (SST) for multiple-choice items on certain properties of a test was compared with that of two other methods, the Number Right and the Correction for Guessing Formula. Results indicated that SST outperformed the other two, producing higher reliability and validity without favoring high risk takers.…

Descriptors: Foreign Countries, Grade 9, Guessing (Tests), Measurement Techniques

A Theory-Based Comparison of the Reliabilities of Fixed-Length and Trials-to-Criterion Scoring of Physical Education Skills Tests.

Peer reviewed

Feldt, Leonard S.; Spray, Judith A. – Research Quarterly for Exercise and Sport, 1983

The reliabilities of two types of measurement plans were compared across six hypothetical distributions of true scores or abilities. The measurement plans were: (1) fixed-length, where the number of trials for all examinees is set in advance; and (2) trials-to-criterion, where examinees must keep trying until they complete a given number of trials…

Descriptors: Criterion Referenced Tests, Evaluation Methods, Higher Education, Measurement Techniques

Introduction to Classical and Modern Test Theory.

Crocker, Linda; Algina, James – 1986

This text was written to help the reader acquire a base of knowledge about classical psychometrics and to integrate new ideas into that framework of knowledge. The material is organized into five units: (1) introduction to measurement theory; (2) reliability; (3) validity; (4) item analysis in test development; and (5) test scoring and…

Descriptors: Item Analysis, Measurement Techniques, Psychometrics, Scoring

The KR-20 Reliability Coefficient as a Special Case of a More General Formula.

Smith, Donald M. – 1976

The Kuder Richardson-20 Formula is shown to be a special case, where each examinee is given sufficient time to answer each item, of a more general formula where each examinee may not be allowed the necessary time. The formula is extended to allow two scores, knowledge and speed, to be extracted from each examinee's test score. Using a sample of 82…

Descriptors: Career Development, Comparative Analysis, Grade Point Average, Predictive Measurement
