Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 5
Since 2006 (last 20 years): 10
Descriptor
Scoring: 10
Simulation: 10
Item Response Theory: 5
Scores: 4
Statistical Analysis: 4
Test Items: 4
Comparative Analysis: 3
Computer Software: 3
Evaluation Methods: 3
Models: 3
Accuracy: 2
Source
ProQuest LLC: 10
Author
Brown, Robin T.: 1
Deng, Nina: 1
Dobria, Lidia: 1
Green, Jennifer L.: 1
Greifer, Noah: 1
Karamese, Hacer: 1
MacInnes, Jann Marie Wise: 1
Shin, Hyo Jeong: 1
Wang, Keyin: 1
Yun, Jiyeo: 1
Publication Type
Dissertations/Theses -…: 10
Education Level
Higher Education: 1
Karamese, Hacer – ProQuest LLC, 2022
Multistage adaptive testing (MST) has become popular in the testing industry because research has shown that it combines the advantages of both linear tests and item-level computerized adaptive testing (CAT). Previous research efforts focused primarily on MST design issues such as panel design, module length, test length, distribution of test…
Descriptors: Adaptive Testing, Scoring, Computer Assisted Testing, Design
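Since the abstract contrasts module-level adaptation with item-level CAT, a minimal sketch of the routing idea may help; the two-stage structure, module names, and cut scores below are illustrative assumptions, not the designs studied in the dissertation.

```python
# Two-stage MST routing sketch: examinees are routed to a second-stage
# module by their stage-1 number-correct score. Module names and cut
# scores are illustrative assumptions.

def route_stage2(stage1_correct: int, low_cut: int = 4, high_cut: int = 8) -> str:
    """Pick a stage-2 module from a stage-1 number-correct score."""
    if stage1_correct < low_cut:
        return "easy"
    if stage1_correct < high_cut:
        return "medium"
    return "hard"

# An examinee with 9 of 10 stage-1 items correct gets the hard module.
print(route_stage2(9))  # -> hard
```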
Greifer, Noah – ProQuest LLC, 2018
There has been some research on the use of propensity scores in the context of measurement error in the confounding variables; one recommended method is to generate estimates of the mis-measured covariate using a latent variable model and to use those estimates (i.e., factor scores) in place of the covariate. I describe a simulation study…
Descriptors: Evaluation Methods, Probability, Scores, Statistical Analysis
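A minimal sketch of the factor-score substitution the abstract describes, run on simulated data; the data-generating values, the one-factor model, and the logistic propensity model are all assumptions for illustration.

```python
# Replace a mis-measured confounder with factor scores estimated from
# several noisy indicators, then fit the propensity score model.
# Data are simulated; all settings are illustrative assumptions.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
latent = rng.normal(size=n)                                         # true confounder
indicators = latent[:, None] + rng.normal(scale=0.8, size=(n, 3))   # noisy measures
treat = rng.binomial(1, 1 / (1 + np.exp(-latent)))                  # treatment assignment

# Estimate factor scores and use them as the covariate in a logistic
# propensity score model.
scores = FactorAnalysis(n_components=1, random_state=0).fit_transform(indicators)
ps = LogisticRegression().fit(scores, treat).predict_proba(scores)[:, 1]
print(ps[:5])  # estimated propensity scores
```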
Wang, Keyin – ProQuest LLC, 2017
Item-level computerized adaptive testing (CAT) and multistage adaptive testing (MST) have been compared extensively (e.g., Kim & Plake, 1993; Luecht et al., 1996; Patsula, 1999; Jodoin, 2003; Hambleton & Xing, 2006; Keng, 2008; Zheng, 2012). Various CAT and MST designs have been investigated and compared under the same…
Descriptors: Comparative Analysis, Computer Assisted Testing, Adaptive Testing, Test Items
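For readers new to the contrast under study, a sketch of the item-level selection step that distinguishes CAT from MST: pick the pool item with maximum Fisher information at the current ability estimate. The 2PL item pool and the ability value are simulated assumptions.

```python
# Item-level CAT selection sketch: maximize 2PL Fisher information at
# the current ability estimate. Pool values are simulated assumptions.
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of 2PL items at ability theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, size=100)   # discriminations
b = rng.normal(size=100)              # difficulties

theta_hat = 0.3                       # current ability estimate
next_item = int(np.argmax(info_2pl(theta_hat, a, b)))
print(next_item, info_2pl(theta_hat, a, b)[next_item])
```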
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers began investigating automatic scoring systems for writing assessments, they have examined the relationship between human and machine scoring and proposed evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
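A sketch of three agreement indices of the kind the study compares, computed on made-up human and machine scores; the particular indices chosen here (exact agreement, Pearson r, quadratically weighted kappa) are common examples, not necessarily the study's full set.

```python
# Three common human-machine agreement indices on a pair of score
# vectors. The scores are made-up example data.
import numpy as np
from sklearn.metrics import cohen_kappa_score

human = np.array([3, 2, 4, 3, 1, 2, 4, 3])
machine = np.array([3, 2, 3, 3, 1, 3, 4, 2])

exact = float(np.mean(human == machine))            # exact agreement rate
pearson = float(np.corrcoef(human, machine)[0, 1])  # linear association
qwk = cohen_kappa_score(human, machine, weights="quadratic")
print(f"exact={exact:.2f}  r={pearson:.2f}  QWK={qwk:.2f}")
```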
Brown, Robin T. – ProQuest LLC, 2017
This scholarly project used a non-experimental pre/post-test design to (a) facilitate the voluntary adoption of the National Early Warning Score (NEWS) and (b) develop clinical decision making (CDM) in one cohort of junior-level nursing students participating in a simulation lab. NEWS is an evidence-based predictive scoring tool developed by the…
Descriptors: Nursing Students, Scoring, Evidence Based Practice, Prediction
Shin, Hyo Jeong – ProQuest LLC, 2015
This dissertation comprises three papers that propose and apply psychometric models to deal with complexities and challenges in large-scale assessments, focusing on modeling rater effects and complex learning progressions. In particular, the three papers investigate extensions and applications of multilevel and multidimensional item response…
Descriptors: Item Response Theory, Psychometrics, Models, Measurement
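As a rough illustration of the multilevel extension the abstract points to, a Rasch-type response function with ability split into group- and person-level components; the decomposition and all parameter values are assumptions, not the dissertation's actual models.

```python
# Rasch-type response probability with ability decomposed into a
# group-level and a person-level component (a simple multilevel IRT
# idea). All parameter values are illustrative assumptions.
import math

def p_correct(theta_group: float, theta_person: float, difficulty: float) -> float:
    """P(correct) when ability = group component + person component."""
    return 1 / (1 + math.exp(-(theta_group + theta_person - difficulty)))

# A student slightly below average in an above-average school.
print(round(p_correct(0.4, -0.1, 0.2), 3))
```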
Dobria, Lidia – ProQuest LLC, 2011
Performance assessments rely on the expert judgment of raters to measure the quality of responses, and raters unavoidably introduce error into the scoring process. Defined as the tendency of a rater to assign higher or lower ratings, on average, than those assigned by other raters, even after accounting for differences in examinee…
Descriptors: Simulation, Performance Based Assessment, Performance Tests, Scoring
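The severity definition above maps naturally onto a many-facet Rasch model, sketched here with assumed parameter values; this is one standard formulation, not necessarily the one used in the dissertation.

```python
# Many-facet Rasch sketch: the log-odds of a positive rating is
# ability minus item difficulty minus rater severity. Parameter
# values are illustrative assumptions.
import math

def p_positive(theta: float, difficulty: float, severity: float) -> float:
    """Probability of a positive rating under a many-facet Rasch model."""
    return 1 / (1 + math.exp(-(theta - difficulty - severity)))

# A severe rater (severity = +0.5) lowers the probability relative to
# a lenient one (severity = -0.5) for the same examinee and item.
print(p_positive(0.0, 0.0, 0.5), p_positive(0.0, 0.0, -0.5))
```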
Deng, Nina – ProQuest LLC, 2011
Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, the Lee method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true"…
Descriptors: Item Response Theory, Test Theory, Computation, Classification
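A minimal simulation sketch of what DC and DA measure at a single cut score; the normal true-score model, error level, and cut are illustrative assumptions rather than the study's conditions.

```python
# Decision consistency (DC): do two parallel forms classify an
# examinee the same way? Decision accuracy (DA): does an observed
# classification match the true one? All settings are assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, cut = 10_000, 0.0
true = rng.normal(size=n)
form1 = true + rng.normal(scale=0.5, size=n)   # observed score, form 1
form2 = true + rng.normal(scale=0.5, size=n)   # parallel form 2

consistency = np.mean((form1 >= cut) == (form2 >= cut))  # DC
accuracy = np.mean((form1 >= cut) == (true >= cut))      # DA
print(f"DC={consistency:.3f}  DA={accuracy:.3f}")
```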
MacInnes, Jann Marie Wise – ProQuest LLC, 2009
Multilevel data often exist in educational studies. The focus of this study is differential item functioning (DIF) for dichotomous items considered from a multilevel perspective. One of the most frequently used methods for detecting DIF in dichotomously scored items is the Mantel-Haenszel log odds-ratio. However, the Mantel-Haenszel reduces the…
Descriptors: Test Bias, Simulation, Item Response Theory, Test Items
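A sketch of the Mantel-Haenszel statistic named in the abstract: the common odds ratio pooled over total-score strata, reported on the log scale. The stratified 2x2 counts are made-up example data.

```python
# Mantel-Haenszel common odds ratio for DIF, pooled over score strata.
# Each stratum is a 2x2 table of reference/focal by correct/incorrect;
# the counts below are made-up example data.
import math

# (ref_correct, ref_incorrect, focal_correct, focal_incorrect) per stratum
strata = [
    (40, 10, 35, 15),
    (30, 20, 22, 28),
    (15, 35, 10, 40),
]

num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
alpha_mh = num / den          # MH common odds ratio
print(math.log(alpha_mh))     # log odds-ratio; 0 means no DIF
```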
Green, Jennifer L. – ProQuest LLC, 2010
Value-added modeling is an alternative to test-based accountability systems that rely on the proportions of students scoring at or above predetermined proficiency levels. Value-added modeling techniques provide opportunities to estimate an individual teacher's effect on student learning, while allowing for the possibility of controlling for the…
Descriptors: Simulation, Scoring, Psychometrics, Data Analysis
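A sketch of one common value-added formulation, a mixed model with a random intercept per teacher, fit to simulated data; the model form, the statsmodels usage, and all values are illustrative assumptions, not the techniques evaluated in the dissertation.

```python
# Value-added sketch: post-test regressed on prior score with a random
# intercept per teacher; the estimated random intercepts act as teacher
# value-added estimates. Data and model form are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_teachers, per_class = 30, 25
teacher = np.repeat(np.arange(n_teachers), per_class)
true_effect = rng.normal(scale=0.3, size=n_teachers)    # true teacher effects
prior = rng.normal(size=teacher.size)
post = 0.7 * prior + true_effect[teacher] + rng.normal(scale=0.5, size=teacher.size)

df = pd.DataFrame({"post": post, "prior": prior, "teacher": teacher})
fit = smf.mixedlm("post ~ prior", df, groups=df["teacher"]).fit()

# Each random intercept is that teacher's estimated value-added.
for t in range(3):
    print(t, round(float(fit.random_effects[t].iloc[0]), 3))
```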