Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 5 |
Since 2016 (last 10 years) | 9 |
Since 2006 (last 20 years) | 12 |
Descriptor
Bayesian Statistics | 19 |
Scores | 19 |
Test Items | 19 |
Item Response Theory | 8 |
Comparative Analysis | 5 |
Models | 5 |
Computer Assisted Testing | 4 |
Foreign Countries | 4 |
Statistical Analysis | 4 |
Test Construction | 4 |
Adaptive Testing | 3 |
Publication Type
Journal Articles | 13 |
Reports - Research | 12 |
Reports - Evaluative | 4 |
Dissertations/Theses -… | 2 |
Numerical/Quantitative Data | 1 |
Reports - Descriptive | 1 |
Speeches/Meeting Papers | 1 |
Assessments and Surveys
Program for International… | 1 |
Kreitchmann, Rodrigo S.; Sorrel, Miguel A.; Abad, Francisco J. – Educational and Psychological Measurement, 2023
Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in noncognitive assessments. Although FC has been considered problematic for providing ipsative scores under the classical test theory, item response theory (IRT) models enable the estimation of…
Descriptors: Measurement Techniques, Questionnaires, Social Desirability, Adaptive Testing
Chen, Yunxiao; Lee, Yi-Hsuan; Li, Xiaoou – Journal of Educational and Behavioral Statistics, 2022
In standardized educational testing, test items are reused in multiple test administrations. To ensure the validity of test scores, the psychometric properties of items should remain unchanged over time. In this article, we consider the sequential monitoring of test items, in particular, the detection of abrupt changes to their psychometric…
Descriptors: Standardized Tests, Test Items, Test Validity, Scores
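As a general-purpose aside (not the article's own monitoring procedure), the idea of sequentially flagging an abrupt change in an item statistic can be illustrated with a one-sided CUSUM chart; the target proportion, allowance k, and decision limit h below are illustrative values only.

    # Generic CUSUM sketch for flagging an abrupt upward shift in an item
    # statistic (here, proportion correct) across successive administrations.
    # Purely illustrative; it does not reproduce the article's statistics.
    def cusum_flags(values, target, k=0.02, h=0.1):
        """One-sided upper CUSUM: flag administrations where the cumulative
        deviation above (target + k) exceeds the decision limit h."""
        s, flags = 0.0, []
        for v in values:
            s = max(0.0, s + (v - target - k))
            flags.append(s > h)
        return flags

    # Proportion correct by administration; a jump may suggest item compromise
    p_correct = [0.52, 0.50, 0.53, 0.51, 0.63, 0.66, 0.65]
    print(cusum_flags(p_correct, target=0.52))  # last two administrations flagged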
Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023
Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…
Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models
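A minimal sketch of the knowledge/guessing/blunder idea, using a generic mixture rather than the models the authors evaluated; the blunder rate, the number of options, and the estimate_knowledge helper are illustrative assumptions.

    # Hypothetical knowledge / guessing / blunder mixture for multiple-choice
    # scores: known items are answered correctly unless a blunder occurs, and
    # unknown items are guessed at random.  Not the authors' model.
    from scipy.optimize import minimize_scalar
    from scipy.stats import binom

    def p_correct(k, n_options=4, blunder=0.05):
        """Probability of a correct response when the knowledge probability is k."""
        guess = 1.0 / n_options
        return k * (1.0 - blunder) + (1.0 - k) * guess

    def estimate_knowledge(n_correct, n_items, n_options=4, blunder=0.05):
        """Maximum-likelihood estimate of the knowledge probability k in [0, 1]."""
        def neg_loglik(k):
            return -binom.logpmf(n_correct, n_items,
                                 p_correct(k, n_options, blunder))
        return minimize_scalar(neg_loglik, bounds=(0.0, 1.0), method="bounded").x

    print(estimate_knowledge(n_correct=28, n_items=40))  # about 0.64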
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Foster, Colin – International Journal of Science and Mathematics Education, 2022
Confidence assessment (CA) involves students stating alongside each of their answers a confidence rating (e.g. 0 low to 10 high) to express how certain they are that their answer is correct. Each student's score is calculated as the sum of the confidence ratings on the items that they answered correctly, minus the sum of the confidence ratings on…
Descriptors: Mathematics Tests, Mathematics Education, Secondary School Students, Meta Analysis
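The CA scoring rule fits in a few lines; the truncated sentence above is assumed to end with the confidence ratings on the incorrectly answered items, which is the usual confidence-based marking convention.

    # Sketch of a confidence-assessment (CA) score: reward confidence on correct
    # answers, penalize confidence on incorrect ones (an assumed completion of
    # the truncated rule above).
    def ca_score(confidences, correct):
        """confidences: ratings 0-10 per item; correct: booleans per item."""
        reward = sum(c for c, ok in zip(confidences, correct) if ok)
        penalty = sum(c for c, ok in zip(confidences, correct) if not ok)
        return reward - penalty

    print(ca_score([9, 4, 7, 2], [True, True, False, True]))  # 9 + 4 + 2 - 7 = 8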
Pei-Hsuan Chiu – ProQuest LLC, 2018
Evidence of student growth is a primary outcome of interest for educational accountability systems. When three or more years of student test data are available, questions around how students grow and what their predicted growth is can be answered. Given that test scores contain measurement error, this error should be considered in growth and…
Descriptors: Bayesian Statistics, Scores, Error of Measurement, Growth Models
Luo, Yong; Dimitrov, Dimiter M. – Educational and Psychological Measurement, 2019
Plausible values can be used to either estimate population-level statistics or compute point estimates of latent variables. While it is well known that five plausible values are usually sufficient for accurate estimation of population-level statistics in large-scale surveys, the minimum number of plausible values needed to obtain accurate latent…
Descriptors: Item Response Theory, Monte Carlo Methods, Markov Processes, Outcome Measures
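For readers unfamiliar with plausible values, the sketch below shows the basic workflow under a placeholder normal posterior (no particular IRT model is implied): draw several values per examinee from the posterior of the latent trait, compute the population statistic on each draw, and pool across draws.

    # Illustrative use of plausible values (PVs) to estimate a population mean.
    # The normal posteriors are placeholders, not output from an IRT model.
    import numpy as np

    rng = np.random.default_rng(0)
    n_examinees, n_pv = 1000, 5

    post_mean = rng.normal(0.0, 1.0, n_examinees)   # posterior means (placeholder)
    post_sd = np.full(n_examinees, 0.4)             # posterior SDs (placeholder)

    # n_pv plausible values per examinee
    pvs = rng.normal(post_mean, post_sd, size=(n_pv, n_examinees))

    # Population mean estimated from each set of PVs, then pooled across sets;
    # the between-set variance feeds Rubin's combining rules for the uncertainty.
    estimates = pvs.mean(axis=1)
    print(estimates.mean(), estimates.var(ddof=1))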
Magis, David; Tuerlinckx, Francis; De Boeck, Paul – Journal of Educational and Behavioral Statistics, 2015
This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": a logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
Descriptors: Test Bias, Test Items, Regression (Statistics), Scores
Kim, Do-Hong; Lambert, Richard G.; Durham, Sean; Burts, Diane C. – Early Education and Development, 2018
Research Findings: This study builds on prior work related to the assessment of young dual language learners (DLLs). The purposes of the study were to (a) determine whether latent subgroups of preschool DLLs would replicate those found previously and (b) examine the validity of GOLD® by Teaching Strategies with empirically derived subgroups.…
Descriptors: Preschool Education, Teaching Methods, Bilingualism, Bilingual Education
Stiller, Jurik; Hartmann, Stefan; Mathesius, Sabrina; Straube, Philipp; Tiemann, Rüdiger; Nordmeier, Volkhard; Krüger, Dirk; Upmeier zu Belzen, Annette – Assessment & Evaluation in Higher Education, 2016
The aim of this study was to improve the criterion-related test score interpretation of a text-based assessment of scientific reasoning competencies in higher education by evaluating factors which systematically affect item difficulty. To provide evidence about the specific demands which test items of various difficulty make on pre-service…
Descriptors: Logical Thinking, Scientific Concepts, Difficulty Level, Test Items
Md Desa, Zairul Nor Deana – ProQuest LLC, 2012
In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores, to obtain subscore reliability, and to classify subscores. Both the compensatory and partially compensatory MIRT…
Descriptors: Item Response Theory, Computation, Reliability, Classification
Hooker, Giles; Finkelman, Matthew – Psychometrika, 2010
Hooker, Finkelman, and Schwartzman ("Psychometrika," 2009, in press) defined a paradoxical result as the attainment of a higher test score by changing answers from correct to incorrect and demonstrated that such results are unavoidable for maximum likelihood estimates in multidimensional item response theory. The potential for these results to…
Descriptors: Models, Scores, Item Response Theory, Psychometrics
van der Linden, Wim J.; Vos, Hans J. – 1994
This paper presents some Bayesian theories of simultaneous optimization of decision rules for test-based decisions. Simultaneous decision making arises when an institution has to make a series of selection, placement, or mastery decisions with respect to subjects from a population. An obvious example is the use of individualized instruction in…
Descriptors: Bayesian Statistics, Decision Making, Foreign Countries, Scores

Lin, Miao-Hsiang; Hsiung, Chao A. – Psychometrika, 1994
Two simple empirical approximate Bayes estimators are introduced for estimating domain scores under binomial and hypergeometric distributions, respectively. Criteria are established regarding the use of these estimators over their maximum likelihood counterparts. (SLD)
Descriptors: Adaptive Testing, Bayesian Statistics, Computation, Equations (Mathematics)
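A generic beta-binomial empirical Bayes sketch, not necessarily the estimators introduced in the paper: fit a Beta prior across examinees by the method of moments and shrink each observed proportion correct toward the group mean. The function name eb_domain_scores and the 1e-6 guards are illustrative choices.

    # Empirical Bayes domain scores under a binomial model: method-of-moments
    # Beta(a, b) prior, then the posterior mean (x + a) / (n + a + b) for each
    # examinee.  The plain maximum-likelihood counterpart would be x / n.
    import numpy as np

    def eb_domain_scores(x, n):
        """x: items correct per examinee; n: common test length."""
        x = np.asarray(x, dtype=float)
        p = x / n
        m, s2 = p.mean(), p.var(ddof=1)
        # Moment estimate of true-score variance, netting out binomial noise
        var_pi = max((s2 - m * (1 - m) / n) / (1 - 1 / n), 1e-6)
        strength = max(m * (1 - m) / var_pi - 1, 1e-6)  # a + b of the Beta prior
        a, b = m * strength, (1 - m) * strength
        return (x + a) / (n + a + b)

    print(np.round(eb_domain_scores([12, 15, 18, 20, 9, 14], n=20), 3))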
Wang, Xiaohui; Bradlow, Eric T.; Wainer, Howard – ETS Research Report Series, 2005
SCORIGHT is a very general computer program for scoring tests. It models tests that are made up of dichotomously or polytomously rated items or any kind of combination of the two through the use of a generalized item response theory (IRT) formulation. The items can be presented independently or grouped into clumps of allied items (testlets) or in…
Descriptors: Computer Assisted Testing, Statistical Analysis, Test Items, Bayesian Statistics