Publication Type: Reports - Evaluative (45); Journal Articles (34); Speeches/Meeting Papers (5); Reports - Research (1)
Audience: Practitioners (1)
Laws, Policies, & Programs: No Child Left Behind Act 2001 (1)
Showing 1 to 15 of 45 results
Peer reviewed
Direct link
van der Linden, Wim J. – Journal of Educational and Behavioral Statistics, 2022
The current literature on test equating generally defines it as the process necessary to obtain score comparability between different test forms. This definition contrasts with Lord's foundational paper, which viewed equating as the process required to obtain comparability of measurement scale between forms. The distinction between the notions…
Descriptors: Equated Scores, Test Items, Scores, Probability
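For readers outside the equating literature, the "score comparability" view that the abstract mentions is usually operationalized by the equipercentile equating function. The definition below is the standard textbook one, not something taken from this paper:

```latex
% Equipercentile equating: map a score x on form X to the form-Y score
% with the same percentile rank, where F_X and F_Y are the observed-score
% distribution functions of the two forms.
\varphi(x) = F_Y^{-1}\!\bigl(F_X(x)\bigr)
```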
Peer reviewed
Direct link
Raykov, Tenko; Marcoulides, George A.; Pusic, Martin – Measurement: Interdisciplinary Research and Perspectives, 2021
An interval estimation procedure is discussed that can be used to evaluate the probability of a particular response for a binary or binary scored item at a pre-specified point along an underlying latent continuum. The item is assumed to: (a) be part of a unidimensional multi-component measuring instrument that may also contain polytomous items,…
Descriptors: Item Response Theory, Computation, Probability, Test Items
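The abstract is truncated, but the target quantity is straightforward to sketch. Assuming the binary item follows a 2PL model (the authors' latent variable framework is more general, and their interval is presumably constructed on a transformed scale rather than by the raw delta method used here), a rough version is:

```python
import numpy as np
from scipy.stats import norm

def response_prob_ci(a, b, theta0, cov, level=0.95):
    """P(X=1 | theta0) for a 2PL item, with a delta-method interval.
    a, b   : estimated discrimination and difficulty
    cov    : 2x2 covariance matrix of the (a, b) estimates
    theta0 : pre-specified point on the latent continuum
    """
    p = 1.0 / (1.0 + np.exp(-a * (theta0 - b)))
    # gradient of p with respect to (a, b)
    grad = np.array([p * (1 - p) * (theta0 - b), -p * (1 - p) * a])
    se = np.sqrt(grad @ cov @ grad)
    z = norm.ppf(0.5 + level / 2)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# hypothetical estimates and covariance matrix, purely for illustration
print(response_prob_ci(1.2, 0.3, 0.0, np.array([[0.04, 0.0], [0.0, 0.02]])))
```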
Peer reviewed
Direct link
Johnson, Matthew S.; Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2020
One common score reported from diagnostic classification assessments is the vector of posterior means of the skill mastery indicators. As with any assessment, it is important to derive and report estimates of the reliability of the reported scores. After reviewing a reliability measure suggested by Templin and Bradshaw, this article suggests three…
Descriptors: Reliability, Probability, Skill Development, Classification
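A minimal sketch of the reported score itself, the vector of posterior means of the skill-mastery indicators, assuming a DINA model with known item parameters (the article's reliability measures are not reproduced here):

```python
import numpy as np
from itertools import product

def posterior_skill_means(x, q, guess, slip, prior=None):
    """Posterior mean of each skill-mastery indicator under a DINA model.
    x           : observed 0/1 responses, shape (J,)
    q           : Q-matrix, shape (J, K); q[j, k] = 1 if item j needs skill k
    guess, slip : per-item guessing and slipping parameters, shape (J,)
    """
    x, q, guess, slip = map(np.asarray, (x, q, guess, slip))
    K = q.shape[1]
    profiles = np.array(list(product([0, 1], repeat=K)))  # all 2^K states
    prior = np.full(len(profiles), 1 / len(profiles)) if prior is None else prior
    post = np.empty(len(profiles))
    for i, alpha in enumerate(profiles):
        eta = np.all(alpha >= q, axis=1)       # has all skills item j needs?
        p = np.where(eta, 1 - slip, guess)     # P(X_j = 1 | alpha)
        post[i] = prior[i] * np.prod(np.where(x == 1, p, 1 - p))
    post /= post.sum()
    return post @ profiles  # posterior mean of each skill indicator
```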
Peer reviewed
Direct link
Wallin, Gabriel; Wiberg, Marie – Journal of Educational and Behavioral Statistics, 2019
When equating two test forms, the equated scores will be biased if the test groups differ in ability. To adjust for the ability imbalance between nonequivalent groups, a set of common items is often used. When no common items are available, it has been suggested to use covariates correlated with the test scores instead. In this article, we reduce…
Descriptors: Equated Scores, Test Items, Probability, College Entrance Examinations
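The truncation cuts off the authors' actual proposal, which is more refined than this, but the basic idea of letting a covariate stand in for common items can be conveyed by poststratification, assuming a single discrete covariate observed in both groups:

```python
import numpy as np

def covariate_equate(x_scores, x_cov, y_scores, y_cov, grid):
    """Crude poststratification equating: both score CDFs are reweighted
    to the pooled covariate distribution before equipercentile linking."""
    x_scores, x_cov, y_scores, y_cov = map(
        np.asarray, (x_scores, x_cov, y_scores, y_cov))
    strata = np.union1d(x_cov, y_cov)
    w = np.array([np.sum(x_cov == s) + np.sum(y_cov == s) for s in strata],
                 dtype=float)
    w /= w.sum()

    def wcdf(scores, cov, t):  # covariate-reweighted score CDF
        return sum(ws * np.mean(scores[cov == s] <= t)
                   for s, ws in zip(strata, w) if np.any(cov == s))

    ys = np.sort(y_scores)
    # phi(x): smallest y whose reweighted CDF reaches the reweighted CDF of x
    return np.array([next((y for y in ys
                           if wcdf(y_scores, y_cov, y)
                           >= wcdf(x_scores, x_cov, x)), ys[-1])
                     for x in grid])
```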
Peer reviewed
Direct link
Solano-Flores, Guillermo – Applied Measurement in Education, 2014
This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…
Descriptors: English Language Learners, Test Items, Probability, Test Bias
Peer reviewed
Direct link
Raykov, Tenko; Marcoulides, George A.; Lee, Chun-Lung; Chang, Chi – Educational and Psychological Measurement, 2013
This note is concerned with a latent variable modeling approach for the study of differential item functioning in a multigroup setting. A multiple-testing procedure that can be used to evaluate group differences in response probabilities on individual items is discussed. The method is readily employed when the aim is also to locate possible…
Descriptors: Test Bias, Statistical Analysis, Models, Hypothesis Testing
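As a rough stand-in for the procedure (which operates within a latent variable model and, unlike this sketch, accounts for ability), a Bonferroni-adjusted screen of per-item group differences in marginal response probabilities looks like this:

```python
import numpy as np
from scipy.stats import norm

def dif_screen(correct_g1, n1, correct_g2, n2, alpha=0.05):
    """Bonferroni-adjusted two-proportion z-tests of group differences
    in per-item response probabilities; flags items for DIF follow-up."""
    c1, c2 = np.asarray(correct_g1, float), np.asarray(correct_g2, float)
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    pvals = 2 * norm.sf(np.abs(z))
    return z, pvals, pvals < alpha / len(pvals)  # Bonferroni-flagged items
```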
Peer reviewed
Direct link
Zu, Jiyun; Yuan, Ke-Hai – Journal of Educational Measurement, 2012
In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…
Descriptors: Sample Size, Equated Scores, Test Items, Error of Measurement
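Zu and Yuan's general estimator is analytic; the sketch below instead shows the standard distribution-free alternative, a nonparametric bootstrap of chained linear equating in the NEAT design (score vectors are NumPy arrays; chained linear is just one linear method within the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)

def chained_linear(xp, vp, yq, vq):
    """Chained linear equating for the NEAT design: link X to the
    anchor V within group P, then V to Y within group Q."""
    def phi(x):
        v = vp.mean() + vp.std() / xp.std() * (x - xp.mean())
        return yq.mean() + yq.std() / vq.std() * (v - vq.mean())
    return phi

def bootstrap_se(xp, vp, yq, vq, x0, B=2000):
    """Bootstrap SE of the equated score at x0; unlike the normal-theory
    estimator, it does not assume multivariate normality."""
    est = np.empty(B)
    for b in range(B):
        i = rng.integers(0, len(xp), len(xp))  # resample (X, V) pairs
        j = rng.integers(0, len(yq), len(yq))  # resample (Y, V) pairs
        est[b] = chained_linear(xp[i], vp[i], yq[j], vq[j])(x0)
    return est.std(ddof=1)
```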
Peer reviewed
PDF on ERIC: Download full text
Ferrando, Pere J. – Psicologica: International Journal of Methodology and Experimental Psychology, 2012
Model-based attempts to rigorously study the broad and imprecise concept of "discriminating power" are scarce, and generally limited to nonlinear models for binary responses. This paper proposes a comprehensive framework for assessing the discriminating power of item and test scores which are analyzed or obtained using Spearman's…
Descriptors: Student Evaluation, Psychometrics, Test Items, Scores
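Ferrando's framework is model-based; as a classical point of contact, the corrected item-total correlation is a common proxy for an item's discriminating power and, under Spearman's one-factor model, tracks the standardized loading:

```python
import numpy as np

def item_discrimination(scores):
    """Corrected item-total correlation for each column of an
    (n_respondents, n_items) score matrix: each item is correlated
    with the total score of the remaining items."""
    scores = np.asarray(scores, float)
    total = scores.sum(axis=1)
    return np.array([np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
                     for j in range(scores.shape[1])])
```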
Peer reviewed
Direct link
DeMars, Christine E. – Applied Measurement in Education, 2011
Three types of effect sizes for DIF are described in this exposition: log of the odds-ratio (differences in log-odds), differences in probability-correct, and proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are impacted in…
Descriptors: Effect Size, Test Bias, Probability, Difficulty Level
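The first two indices are simple to compute from per-group counts on an item. Note that operational DIF statistics condition on ability (e.g., Mantel-Haenszel pools 2x2 tables over score strata); this sketch collapses over ability for brevity:

```python
import numpy as np

def dif_effect_sizes(ref, foc):
    """Log odds ratio and probability-correct difference for one item,
    from per-group (n_correct, n_total) counts."""
    (r1, n1), (r2, n2) = ref, foc
    p1, p2 = r1 / n1, r2 / n2
    log_or = np.log((p1 / (1 - p1)) / (p2 / (1 - p2)))
    return log_or, p1 - p2

# example: reference group 80/100 correct, focal group 65/100 correct
print(dif_effect_sizes((80, 100), (65, 100)))  # roughly (0.77, 0.15)
```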
Peer reviewed
Direct link
Thompson, Nathan A. – Practical Assessment, Research & Evaluation, 2011
Computerized classification testing (CCT) is an approach to designing tests with intelligent algorithms, similar to adaptive testing, but specifically designed for the purpose of classifying examinees into categories such as "pass" and "fail." Like adaptive testing for point estimation of ability, the key component is the…
Descriptors: Adaptive Testing, Computer Assisted Testing, Classification, Probability
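The classification engine in much of the CCT literature is Wald's sequential probability ratio test. A simplified version with a flat probability-correct under each hypothesis (real CCT would use IRT item response curves) is:

```python
import numpy as np

def sprt_classify(responses, p_pass=0.75, p_fail=0.60, alpha=0.05, beta=0.05):
    """Wald's SPRT for pass/fail classification: accumulate the log
    likelihood ratio over items until a decision bound is crossed."""
    lo, hi = np.log(beta / (1 - alpha)), np.log((1 - beta) / alpha)
    llr = 0.0
    for i, x in enumerate(responses, 1):
        llr += np.log((p_pass if x else 1 - p_pass) /
                      (p_fail if x else 1 - p_fail))
        if llr >= hi:
            return "pass", i
        if llr <= lo:
            return "fail", i
    return "undecided", len(responses)

print(sprt_classify([1] * 14))  # a run of correct answers ends in "pass"
```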
Peer reviewed
Direct link
Tseng, Mei-Hui; Fu, Chung-Pei; Wilson, Brenda N.; Hu, Fu-Chang – Research in Developmental Disabilities: A Multidisciplinary Journal, 2010
The aim of this study was to adapt and evaluate the Developmental Coordination Disorder Questionnaire (DCDQ) for use in Chinese-speaking countries. A total of 1082 parents completed the DCDQ and 35 parents repeated it after 2 weeks for test-retest reliability. Two items were deleted after examination of test consistency. Cronbach's [alpha] for the…
Descriptors: Test Validity, Measures (Individuals), Psychometrics, Probability
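The internal-consistency index reported for the Chinese DCDQ is Cronbach's alpha, computable from a respondents-by-items score matrix as follows (a generic implementation, not the study's own code):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)
```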
Peer reviewed
Direct link
Arndt, Jason – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2010
Using 3 experiments, I examined false memory for encoding context by presenting Deese-Roediger-McDermott themes (Deese, 1959; Roediger & McDermott, 1995) in unusual-looking fonts and by testing related, but unstudied, lure items in a font that was shown during encoding. In 2 of the experiments, testing lure items in the font used to study their…
Descriptors: Testing, Recognition (Psychology), Experiments, Memory
Peer reviewed
Direct link
Gierl, Mark J.; Zheng, Yinggan; Cui, Ying – Journal of Educational Measurement, 2008
The purpose of this study is to describe how the attribute hierarchy method (AHM) can be used to evaluate differential group performance at the cognitive attribute level. The AHM is a psychometric method for classifying examinees' test item responses into a set of attribute-mastery patterns associated with different components in a cognitive model…
Descriptors: Test Items, Student Reaction, Pattern Recognition, Psychometrics
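The AHM's classification step can be conveyed with a toy version: match an observed response vector to the closest expected pattern implied by an attribute-mastery state. The method's actual indices are likelihood-based; the Hamming distance and pattern labels below are illustrative only:

```python
import numpy as np

def classify_examinee(x, expected_patterns):
    """Assign an observed 0/1 response vector to the attribute-mastery
    state whose expected response pattern is closest in Hamming distance."""
    x = np.asarray(x)
    dists = [(np.sum(x != np.asarray(p)), label)
             for label, p in expected_patterns.items()]
    return min(dists)[1]

# hypothetical expected patterns for a 4-item test, two mastery states
patterns = {"masters_A1_only": [1, 1, 0, 0], "masters_A1_A2": [1, 1, 1, 1]}
print(classify_examinee([1, 0, 0, 0], patterns))  # -> masters_A1_only
```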
Peer reviewed
Direct link
Chandler, Steve – Journal of Psycholinguistic Research, 2008
Skousen's (1989, Analogical modeling of language, Kluwer Academic Publishers, Dordrecht) Analogical Model (AM) predicts behavior such as spelling pronunciation by comparing the characteristics of a test item (a given input word) to those of individual exemplars in a data set of previously encountered items. While AM and other exemplar-based models…
Descriptors: Test Items, Reaction Time, Psycholinguistics, Probability
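For orientation, the exemplar-based family that AM is compared against can be sketched as similarity-weighted voting over stored exemplars. Skousen's AM computes its supports quite differently (via supracontexts), so this is the contrast class, not AM itself:

```python
def exemplar_predict(test_item, exemplars, sim):
    """Each stored exemplar votes for its behavior, weighted by its
    similarity to the test item; the behavior with most support wins."""
    votes = {}
    for features, behavior in exemplars:
        votes[behavior] = votes.get(behavior, 0.0) + sim(test_item, features)
    return max(votes, key=votes.get)

# toy similarity: count of position-matched features
sim = lambda a, b: sum(x == y for x, y in zip(a, b))
data = [(("c", "a", "t"), "short_vowel"), (("c", "a", "k", "e"), "long_vowel")]
print(exemplar_predict(("b", "a", "t"), data, sim))  # -> short_vowel
```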
Peer reviewed
Direct link
Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009
Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…
Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring
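The object under study is the stability of the panel's cut score. A toy Angoff computation with a judge-sampling standard error (the article's generalizability-theory decomposition separates more error sources than this) is:

```python
import numpy as np

def angoff_cut_score(ratings):
    """Angoff cut score and a simple judge-sampling standard error.
    ratings: (n_judges, n_items) judged probabilities that a minimally
    competent examinee answers each item correctly."""
    ratings = np.asarray(ratings, float)
    judge_cuts = ratings.sum(axis=1)  # each judge's implied cut score
    cut = judge_cuts.mean()
    se = judge_cuts.std(ddof=1) / np.sqrt(len(judge_cuts))
    return cut, se

# hypothetical panel: 3 judges rating a 4-item test
print(angoff_cut_score([[0.6, 0.7, 0.5, 0.8],
                        [0.5, 0.6, 0.6, 0.7],
                        [0.7, 0.8, 0.5, 0.9]]))
```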