van der Linden, Wim J. – Journal of Educational and Behavioral Statistics, 2022
The current literature on test equating generally defines it as the process necessary to obtain score comparability between different test forms. This definition contrasts with Lord's foundational paper, which viewed equating as the process required to obtain comparability of the measurement scale between forms. The distinction between the notions…
Descriptors: Equated Scores, Test Items, Scores, Probability
Raykov, Tenko; Marcoulides, George A.; Pusic, Martin – Measurement: Interdisciplinary Research and Perspectives, 2021
An interval estimation procedure is discussed that can be used to evaluate the probability of a particular response for a binary or binary-scored item at a pre-specified point along an underlying latent continuum. The item is assumed to: (a) be part of a unidimensional multi-component measuring instrument that may also contain polytomous items,…
Descriptors: Item Response Theory, Computation, Probability, Test Items
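The quantity this entry concerns — the probability of a given response at a pre-specified point on the latent continuum — is, under a standard two-parameter logistic (2PL) IRT model, the value of the item response function at that point. A minimal sketch of that function (not the authors' interval-estimation procedure; the item parameters below are hypothetical):

```python
import math

def irf_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability of a correct (1) response
    for an item with discrimination a and difficulty b, evaluated at
    latent trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: discrimination a = 1.2, difficulty b = 0.5,
# evaluated at the pre-specified point theta = 1.0.
p = irf_2pl(theta=1.0, a=1.2, b=0.5)
```

The interval-estimation procedure in the article would place a confidence interval around this point probability; the sketch only shows the point value itself.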
The Reliability of the Posterior Probability of Skill Attainment in Diagnostic Classification Models
Johnson, Matthew S.; Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2020
One common score reported from diagnostic classification assessments is the vector of posterior means of the skill mastery indicators. As with any assessment, it is important to derive and report estimates of the reliability of the reported scores. After reviewing a reliability measure suggested by Templin and Bradshaw, this article suggests three…
Descriptors: Reliability, Probability, Skill Development, Classification
Wallin, Gabriel; Wiberg, Marie – Journal of Educational and Behavioral Statistics, 2019
When equating two test forms, the equated scores will be biased if the test groups differ in ability. To adjust for the ability imbalance between nonequivalent groups, a set of common items is often used. When no common items are available, it has been suggested to use covariates correlated with the test scores instead. In this article, we reduce…
Descriptors: Equated Scores, Test Items, Probability, College Entrance Examinations
Solano-Flores, Guillermo – Applied Measurement in Education, 2014
This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…
Descriptors: English Language Learners, Test Items, Probability, Test Bias
Raykov, Tenko; Marcoulides, George A.; Lee, Chun-Lung; Chang, Chi – Educational and Psychological Measurement, 2013
This note is concerned with a latent variable modeling approach for the study of differential item functioning in a multigroup setting. A multiple-testing procedure that can be used to evaluate group differences in response probabilities on individual items is discussed. The method is readily employed when the aim is also to locate possible…
Descriptors: Test Bias, Statistical Analysis, Models, Hypothesis Testing
Zu, Jiyun; Yuan, Ke-Hai – Journal of Educational Measurement, 2012
In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…
Descriptors: Sample Size, Equated Scores, Test Items, Error of Measurement
Ferrando, Pere J. – Psicologica: International Journal of Methodology and Experimental Psychology, 2012
Model-based attempts to rigorously study the broad and imprecise concept of "discriminating power" are scarce, and generally limited to nonlinear models for binary responses. This paper proposes a comprehensive framework for assessing the discriminating power of item and test scores which are analyzed or obtained using Spearman's…
Descriptors: Student Evaluation, Psychometrics, Test Items, Scores
Demars, Christine E. – Applied Measurement in Education, 2011
Three types of effect sizes for DIF are described in this exposition: log of the odds-ratio (differences in log-odds), differences in probability-correct, and proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are impacted in…
Descriptors: Effect Size, Test Bias, Probability, Difficulty Level
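The first two effect sizes named in this entry can be computed directly from the proportion answering an item correctly in each group (in practice, conditional on matched ability). A minimal sketch with hypothetical proportions, not taken from the article:

```python
import math

def dif_effect_sizes(p_ref: float, p_focal: float) -> tuple[float, float]:
    """Two DIF effect sizes: the log odds-ratio (difference in log-odds)
    and the difference in probability-correct, reference minus focal."""
    log_odds_ratio = (math.log(p_ref / (1.0 - p_ref))
                      - math.log(p_focal / (1.0 - p_focal)))
    prob_difference = p_ref - p_focal
    return log_odds_ratio, prob_difference

# Hypothetical matched-group proportions correct on one item.
lor, dp = dif_effect_sizes(p_ref=0.70, p_focal=0.60)
```

A log odds-ratio near zero and a probability difference near zero both indicate negligible DIF; the two indices can rank items differently because the log-odds scale stretches differences near 0 and 1.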
Thompson, Nathan A. – Practical Assessment, Research & Evaluation, 2011
Computerized classification testing (CCT) is an approach to designing tests with intelligent algorithms, similar to adaptive testing, but specifically designed for the purpose of classifying examinees into categories such as "pass" and "fail." Like adaptive testing for point estimation of ability, the key component is the…
Descriptors: Adaptive Testing, Computer Assisted Testing, Classification, Probability
Tseng, Mei-Hui; Fu, Chung-Pei; Wilson, Brenda N.; Hu, Fu-Chang – Research in Developmental Disabilities: A Multidisciplinary Journal, 2010
The aim of this study was to adapt and evaluate the Developmental Coordination Disorder Questionnaire (DCDQ) for use in Chinese-speaking countries. A total of 1082 parents completed the DCDQ and 35 parents repeated it after 2 weeks for test-retest reliability. Two items were deleted after examination of test consistency. Cronbach's α for the…
Descriptors: Test Validity, Measures (Individuals), Psychometrics, Probability
Arndt, Jason – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2010
Using 3 experiments, I examined false memory for encoding context by presenting Deese-Roediger-McDermott themes (Deese, 1959; Roediger & McDermott, 1995) in unusual-looking fonts and by testing related, but unstudied, lure items in a font that was shown during encoding. In 2 of the experiments, testing lure items in the font used to study their…
Descriptors: Testing, Recognition (Psychology), Experiments, Memory
Gierl, Mark J.; Zheng, Yinggan; Cui, Ying – Journal of Educational Measurement, 2008
The purpose of this study is to describe how the attribute hierarchy method (AHM) can be used to evaluate differential group performance at the cognitive attribute level. The AHM is a psychometric method for classifying examinees' test item responses into a set of attribute-mastery patterns associated with different components in a cognitive model…
Descriptors: Test Items, Student Reaction, Pattern Recognition, Psychometrics
Chandler, Steve – Journal of Psycholinguistic Research, 2008
Skousen's (1989, Analogical modeling of language, Kluwer Academic Publishers, Dordrecht) Analogical Model (AM) predicts behavior such as spelling pronunciation by comparing the characteristics of a test item (a given input word) to those of individual exemplars in a data set of previously encountered items. While AM and other exemplar-based models…
Descriptors: Test Items, Reaction Time, Psycholinguistics, Probability
Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009
Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…
Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring