Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 2 |
Since 2006 (last 20 years) | 5 |
Descriptor
Error of Measurement | 9 |
Interrater Reliability | 9 |
Test Items | 9 |
Cutting Scores | 4 |
Scoring | 4 |
English | 3 |
Goodness of Fit | 3 |
Test Reliability | 3 |
Academic Standards | 2 |
Computation | 2 |
Correlation | 2 |
More ▼ |
Source
New Mexico Public Education… | 2 |
Educational Measurement:… | 1 |
Society for Research on… | 1 |
Sociological Methods &… | 1 |
Author
Arends-Tòth, Judit | 1 |
Bais, Frank | 1 |
Cope, Ronald T. | 1 |
Cox, Kyle | 1 |
Douhou, Salima | 1 |
Griph, Gerald W. | 1 |
Ito, Kyoko | 1 |
Kelcey, Ben | 1 |
Kieruj, Natalia | 1 |
Lugtig, Peter | 1 |
Morren, Mattijn | 1 |
More ▼ |
Publication Type
Reports - Research | 5 |
Speeches/Meeting Papers | 4 |
Reports - Descriptive | 3 |
Journal Articles | 2 |
Numerical/Quantitative Data | 2 |
Reports - Evaluative | 1 |
Education Level
Elementary Secondary Education | 2 |
Elementary Education | 1 |
Audience
Researchers | 1 |
Location
New Mexico | 2 |
Netherlands | 1 |
Laws, Policies, & Programs
Assessments and Surveys
Trends in International… | 1 |
What Works Clearinghouse Rating
Bais, Frank; Schouten, Barry; Lugtig, Peter; Toepoel, Vera; Arends-Tòth, Judit; Douhou, Salima; Kieruj, Natalia; Morren, Mattijn; Vis, Corrie – Sociological Methods & Research, 2019
Item characteristics can have a significant effect on survey data quality and may be associated with measurement error. Literature on data quality and measurement error is often inconclusive. This could be because item characteristics used for detecting measurement error are not coded unambiguously. In our study, we use a systematic coding…
Descriptors: Foreign Countries, National Surveys, Error of Measurement, Test Items
Kelcey, Ben; Wang, Shanshan; Cox, Kyle – Society for Research on Educational Effectiveness, 2016
Valid and reliable measurement of unobserved latent variables is essential to understanding and improving education. A common and persistent approach to assessing latent constructs in education is the use of rater inferential judgment. The purpose of this study is to develop high-dimensional explanatory random item effects models designed for…
Descriptors: Test Items, Models, Evaluators, Longitudinal Studies
Sykes, Robert C.; Ito, Kyoko; Wang, Zhen – Educational Measurement: Issues and Practice, 2008
Student responses to a large number of constructed response items in three Math and three Reading tests were scored on two occasions using three ways of assigning raters: single reader scoring, a different reader for each response (item-specific), and three readers each scoring a rater item block (RIB) containing approximately one-third of a…
Descriptors: Test Items, Mathematics Tests, Reading Tests, Scoring
Cope, Ronald T. – 1987
This study used generalizability theory and other statistical concepts to assess the application of the Angoff method to setting cutoff scores on two professional certification tests. A panel of ten judges gave pre- and post-feedback Angoff probability ratings of items of two forms of a professional certification test, and another panel of nine…
Descriptors: Certification, Correlation, Cutting Scores, Error of Measurement
New Mexico Public Education Department, 2007
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2007 NMSBA. The 2007 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Summary of student performance; (4) Statistical analyses of item and…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring
van der Linden, Wim J. – 1982
A latent trait method is presented to investigate the possibility that Angoff or Nedelsky judges specify inconsistent probabilities in standard setting techniques for objectives-based instructional programs. It is suggested that judges frequently specify a low probability of success for an easy item but a large probability for a hard item. The…
Descriptors: Criterion Referenced Tests, Cutting Scores, Error of Measurement, Interrater Reliability
Rothman, M. L.; And Others – 1982
A practical application of generalizability theory, demonstrating how the variance components contribute to understanding and interpreting the data collected to evaluate a program, is described. The evaluation concerned 120 learning modules developed for the Dental Auxiliary Education Project. The goals of the project were to design, implement,…
Descriptors: Correlation, Data Collection, Dental Schools, Educational Research
Smith, Teresa A. – 1997
The Third International Mathematics and Science Study (TIMSS) measured mathematics and science achievement of middle school students in more than 40 countries. About one quarter of the tests' nearly 300 items were free response items requiring students to generate their own answers. Scoring these responses used a two-digit diagnostic code rubric…
Descriptors: Comparative Education, English, Error of Measurement, Foreign Countries
Griph, Gerald W. – New Mexico Public Education Department, 2006
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2006 NMSBA. The 2006 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Calibration, scaling, and equating procedures; (4) Standard setting;…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring