Showing all 8 results
Peer reviewed
Gierl, Mark J.; Bulut, Okan; Guo, Qi; Zhang, Xinxin – Review of Educational Research, 2017
Multiple-choice testing is considered one of the most effective and enduring forms of educational assessment that remains in practice today. This study presents a comprehensive review of the literature on multiple-choice testing in education focused, specifically, on the development, analysis, and use of the incorrect options, which are also…
Descriptors: Multiple Choice Tests, Difficulty Level, Accuracy, Error Patterns
Peer reviewed
Bulut, Okan; Guo, Qi; Gierl, Mark J. – Large-scale Assessments in Education, 2017
Position effects may occur in both paper-pencil tests and computerized assessments when examinees respond to the same test items located in different positions on the test. To examine position effects in large-scale assessments, previous studies often used multilevel item response models within the generalized linear mixed modeling framework.…
Descriptors: Structural Equation Models, Educational Assessment, Measurement, Test Items
Peer reviewed
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André – Applied Measurement in Education, 2016
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Descriptors: Psychometrics, Multiple Choice Tests, Test Items, Item Analysis
Gierl, Mark J.; Leighton, Jacqueline P.; Wang, Changjiang; Zhou, Jiawen; Gokiert, Rebecca; Tan, Adele – College Board, 2009
The purpose of the study is to present research focused on validating the four algebra cognitive models in Gierl, Wang, et al., using student response data collected with protocol analysis methods to evaluate the knowledge structures and processing skills used by a sample of SAT test takers.
Descriptors: Algebra, Mathematics Tests, College Entrance Examinations, Student Attitudes
Peer reviewed
Gierl, Mark J.; Leighton, Jacqueline P.; Hunka, Stephen M. – Educational Measurement: Issues and Practice, 2000
Discusses the logic of the rule-space model (K. Tatsuoka, 1983) as it applies to test development and analysis. The rule-space model is a statistical method for classifying examinees' test item responses into a set of attribute-mastery patterns associated with different cognitive skills. Directs readers to a tutorial that may be downloaded. (SLD)
Descriptors: Item Analysis, Item Response Theory, Test Construction, Test Items
Peer reviewed
Gierl, Mark J.; Rogers, W. Todd; Klinger, Don A. – Alberta Journal of Educational Research, 1999
Evaluates the equivalence of translated achievement tests administered to 4,400 English- and French-speaking sixth-graders. Items displaying differential item functioning were flagged using three statistical methods; results were relatively consistent across methods, but not identical. Substantive review of French items via back-translation to…
Descriptors: Achievement Tests, Evaluation Methods, Evaluation Research, Foreign Countries
Peer reviewed
Gierl, Mark J.; Leighton, Jacqueline P.; Tan, Xuan – Journal of Educational Measurement, 2006
DETECT, the acronym for Dimensionality Evaluation To Enumerate Contributing Traits, is an innovative and relatively new nonparametric dimensionality assessment procedure used to identify mutually exclusive, dimensionally homogeneous clusters of items using a genetic algorithm (Zhang & Stout, 1999). Because the clusters of items are mutually…
Descriptors: Program Evaluation, Cluster Grouping, Evaluation Methods, Multivariate Analysis
Peer reviewed
Ercikan, Kadriye; Gierl, Mark J.; McCreith, Tanya; Puhan, Gautam; Koh, Kim – Applied Measurement in Education, 2004
This research examined the degree of comparability and sources of incomparability of English and French versions of reading, mathematics, and science tests that were administered as part of a survey of achievement in Canada. The results point to substantial psychometric differences between the 2 language versions. Approximately 18% to 36% of the…
Descriptors: Foreign Countries, Psychometrics, Science Tests, French