NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 10 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Lenhard, Wolfgang; Lenhard, Alexandra – Educational and Psychological Measurement, 2021
The interpretation of psychometric test results is usually based on norm scores. We compared semiparametric continuous norming (SPCN) with conventional norming methods by simulating results for test scales with different item numbers and difficulties via an item response theory approach. Subsequently, we modeled the norm scores based on random…
Descriptors: Test Norms, Scores, Regression (Statistics), Test Items
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Kim, Sooyeon; Lu, Ru – ETS Research Report Series, 2018
The purpose of this study was to evaluate the effectiveness of linking test scores by using test takers' background data to form pseudo-equivalent groups (PEG) of test takers. Using 4 operational test forms that each included 100 items and were taken by more than 30,000 test takers, we created 2 half-length research forms that had either 20…
Descriptors: Test Items, Item Banks, Difficulty Level, Comparative Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Taylor, Robin T.; Bishop, Pamela R.; Lenhart, Suzanne; Gross, Louis J.; Sturner, Kelly – CBE - Life Sciences Education, 2020
We describe the development and initial validity assessment of the 20-item BioCalculus Assessment (BCA), with the objective of comparing undergraduate life science students' understanding of calculus concepts in different courses with alternative emphases (with and without focus on biological applications). The development process of the BCA…
Descriptors: Test Construction, Mathematics Tests, Calculus, Test Validity
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Kim, Sooyeon; Moses, Tim – ETS Research Report Series, 2014
The purpose of this study was to investigate the potential impact of misrouting under a 2-stage multistage test (MST) design, which includes 1 routing and 3 second-stage modules. Simulations were used to create a situation in which a large group of examinees took each of the 3 possible MST paths (high, middle, and low). We compared differences in…
Descriptors: Comparative Analysis, Difficulty Level, Scores, Test Wiseness
Peer reviewed Peer reviewed
Direct linkDirect link
Facon, Bruno; Nuchadee, Marie-Laure; Bollengier, Therese – American Journal on Intellectual and Developmental Disabilities, 2012
This study aimed to discover whether general receptive vocabulary is qualitatively phenotypical in Down syndrome. Sixty-two participants with Down syndrome (M age = 16.74 years, SD = 3.28) were individually matched on general vocabulary raw total score with 62 participants with intellectual disability of undifferentiated etiology (M age = 16.20…
Descriptors: Down Syndrome, Adolescents, Etiology, Receptive Language
Powers, Sonya; Turhan, Ahmet; Binici, Salih – Pearson, 2012
The population sensitivity of vertical scaling results was evaluated for a state reading assessment spanning grades 3-10 and a state mathematics test spanning grades 3-8. Subpopulations considered included males and females. The 3-parameter logistic model was used to calibrate math and reading items and a common item design was used to construct…
Descriptors: Scaling, Equated Scores, Standardized Tests, Reading Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Lamprianou, Iasonas – International Journal of Testing, 2008
This study investigates the effect of reporting the unadjusted raw scores in a high-stakes language exam when raters differ significantly in severity and self-selected questions differ significantly in difficulty. More sophisticated models, introducing meaningful facets and parameters, are successively used to investigate the characteristics of…
Descriptors: High Stakes Tests, Raw Scores, Item Response Theory, Language Tests
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Liao, Chi-Wen; Livingston, Samuel A. – ETS Research Report Series, 2008
Randomly equivalent forms (REF) of tests in listening and reading for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test. Each replication generated 2 REFs. The equivalence of those 2…
Descriptors: Equated Scores, Item Analysis, Test Items, Difficulty Level
Pommerich, Mary; And Others – 1995
The Mantel-Haenszel (MH) statistic for identifying differential item functioning (DIF) commonly conditions on the observed test score as a surrogate for conditioning on latent ability. When the comparison group distributions are not completely overlapping (i.e., are incongruent), the observed score represents different levels of latent ability…
Descriptors: Ability, Comparative Analysis, Difficulty Level, Item Bias
Bell, Anita I. – 1979
An equating study was conducted on the Certifying Examination for Primary Care Physician's Assistants to compare the ability of current examinees with the standardization group and to determine if current test items are more difficult than previous items. Using 46 common items from the multiple choice section, the 1978 exam was equated to the 1976…
Descriptors: Comparative Analysis, Difficulty Level, Educational Trends, Equated Scores